Page 145 - Proceedings of the 2017 ITU Kaleidoscope

Challenges for a data-driven society




11. If selectedAction = -1
12.   Generate a random number r in the range of the actions.
13.   selectedAction = r
14. Return selectedAction

4.3. Q-LEARNING BASED QUALITY ADAPTATION (QBQA)

Q-Learning is a model-free reinforcement learning algorithm. The QBQA is based on [13], where the authors designed and optimized a Q-Learning approach for video quality adaptation. The system state (s_k) is modeled with the bandwidth (bw_k), the buffer occupancy level (buf_k), and the quality level (q_{k-1}) of the previous segment. The action (a_k) of the system is the choice among the different qualities of the video segment, each expressed by its nominal bit rate. The reward for the action taken is formulated from three factors: the quality as affected by bandwidth and buffer, video freezes, and quality switching. The exploration policy used for action selection is value based differential Softmax. The adaptation algorithm based on Q-Learning [13] is as follows.

QBQA Algorithm

1.  Initialize the learning rate α, the discount factor γ, the Q-matrix, and the optimal bandwidth value.
2.  Read the current buffer occupancy level buf_k for the k-th segment and the quality level q_{k-1} for segment k-1 while streaming.
3.  For i = 1 to t // Training Phase
4.     Estimate the bandwidth bw_k
5.     Assign s_k = {bw_k, buf_k, q_{k-1}} // Current state
6.     a_k = Softmax(Q, s_k) // Exploration policy function to get the best possible action
7.     Calculate the quality factor related to bandwidth and buffer occupancy level using the equation

                         (… / …)
           … = −1.5 · … · … (… / …) − …                (11)

8.     Calculate the quality factor related to the switch in quality using the equation

           … = −|… − …|                                (12)

9.     Read the duration of the video freeze, the time elapsed since the last freeze, and the number of freezes n
10.    Calculate the quality factor related to video freezing using the equation

                 −100 · … · …,    … = …
           … =                                         (13)
                 −100 · … · …,    … ≠ …

11.    Calculate the reward as the sum of the three quality factors from Steps 7, 8, and 10

           … = … + … + …                               (14)

12. End
13. Determine the resultant state s_{k+1} using {bw_{k+1}, buf_{k+1}, q_k}
14. Update the Q-matrix using

           Q(s_k, a_k) ← (1 − α) Q(s_k, a_k) + α [R + γ max_a Q(s_{k+1}, a)]        (15)

    where R is the reward calculated in Step 11.
15. Estimate the bandwidth bw_k // Testing Phase begins
16. Assign s_k = {bw_k, buf_k, q_{k-1}}
17. a_k = argmax_a Q(s_k, a)
18. Send a_k as feedback to the server.
19. Repeat from Step 15 while streaming continues.

5. IMPLEMENTATION ENVIRONMENT

The Java programming environment based on the 64-bit JDK Version 7 was chosen for the implementation, and the code was developed using the Eclipse IDE. The 64-bit VLC media player was used for playing the media, as VLC can be easily controlled from Java with the help of the VLCJ framework. The Dshow API [18] was used for capturing live video for streaming, and the Jnetpcap [19] framework was used for packet capturing. The client and server were connected through 4G mobile hotspot devices in a typical cellular wireless network. Frame rates were varied over the values 20, 24, 27, and 30, with 24 as the default rate. Standard video resolutions such as QCIF (176*144), CIF (352*288), VGA (640*480), SQCIF (128*96), and QVGA (320*240) were used dynamically in the encoding/decoding process during the experiment. The server ran on a Windows 10 (64-bit) machine with a Core i3 processor and 8 GB RAM, and the client ran on a Windows 10 (64-bit) machine with a Core i5 processor and 4 GB RAM. The streaming was implemented on top of HTTP in a typical Internet environment.

The network bit rate carrying capacity of the Airtel Mobile Hotspot (4G-LTE TD) [20] dongle was analyzed using the online tool Speedof.me [21], and one instance of the result is shown in Figure 2. The Internet speed of the wireless connection was measured without using Flash or Java, which are currently used by many other speed test websites. The online tool provides a broadband speed test service that uses pure browser capabilities such as HTML5 and JavaScript. For the reliability of the measured data, it utilizes multiple test servers around the world, and the server is chosen automatically. The download and upload speeds of the network device are observed independently.

Fig. 2. Bitrate observed during stream of live video using Airtel 4G LTE TD Hotspot
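
The Q-matrix update of Step 14 (Equation 15) and the greedy selection of Step 17 can be illustrated with a minimal tabular sketch. It is written in Python purely for illustration (the paper's implementation is in Java), and it assumes the standard Q-Learning update form. The function names, the three-action set, and the default values of alpha and gamma are hypothetical and are not taken from the paper.

```python
from collections import defaultdict

# Hypothetical action set: indices of the available quality levels.
ACTIONS = [0, 1, 2]


def make_q_table():
    """Q-matrix as a dict: state -> {action: value}, initialised to zero."""
    return defaultdict(lambda: {a: 0.0 for a in ACTIONS})


def update_q(q, state, action, reward, next_state, alpha=0.1, gamma=0.9):
    """Apply the standard Q-Learning update (assumed form of Equation 15).

    alpha is the learning rate and gamma the discount factor; the default
    values are placeholders, not the values used in the paper.
    """
    best_next = max(q[next_state].values())
    q[state][action] = (1 - alpha) * q[state][action] + alpha * (
        reward + gamma * best_next
    )
    return q[state][action]


def greedy_action(q, state):
    """Testing phase (Step 17): pick the action with the largest Q-value."""
    values = q[state]
    return max(values, key=values.get)
```

A state here is any hashable value, for example a tuple (bw_k, buf_k, q_{k-1}); during the testing phase only greedy_action is called and the table is no longer updated.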


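
The exploration step (Step 6) and the random fallback used when selectedAction = -1 can be sketched as follows. This is a plain Boltzmann Softmax with a fixed temperature, which is a simplification: it is not the value based differential Softmax policy used in the paper. The function names and the temperature parameter are hypothetical.

```python
import math
import random


def softmax_probabilities(q_values, temperature=1.0):
    """Turn Q-values into selection probabilities (Boltzmann distribution)."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    top = max(q_values)  # subtract the maximum for numerical stability
    weights = [math.exp((v - top) / temperature) for v in q_values]
    total = sum(weights)
    return [w / total for w in weights]


def select_action(q_values, temperature=1.0, rng=random):
    """Sample an action index according to the Softmax probabilities."""
    probabilities = softmax_probabilities(q_values, temperature)
    threshold = rng.random()
    cumulative = 0.0
    selected = -1
    for action, p in enumerate(probabilities):
        cumulative += p
        if threshold < cumulative:
            selected = action
            break
    if selected == -1:
        # Rounding left no action selected: fall back to a uniformly
        # random action, mirroring the selectedAction = -1 branch.
        selected = rng.randrange(len(q_values))
    return selected
```

A high temperature makes the choice close to uniform (more exploration), while a low temperature concentrates the probability on the action with the highest Q-value.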


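
A tabular Q-matrix needs a finite set of states, so the state s_k = {bw_k, buf_k, q_{k-1}} of Steps 5 and 16 has to be made discrete in some way. The paper does not state how this is done; the sketch below assumes simple binning of the measured bandwidth and buffer level. The bin edges, units, and function name are hypothetical.

```python
from bisect import bisect_right

# Hypothetical bin edges; the paper does not give its discretisation.
BANDWIDTH_EDGES_KBPS = [500, 1000, 2000, 4000]
BUFFER_EDGES_SEC = [2, 5, 10]


def make_state(bandwidth_kbps, buffer_sec, prev_quality):
    """Build a hashable state (bw bin, buffer bin, previous quality level)."""
    bw_bin = bisect_right(BANDWIDTH_EDGES_KBPS, bandwidth_kbps)
    buf_bin = bisect_right(BUFFER_EDGES_SEC, buffer_sec)
    return (bw_bin, buf_bin, prev_quality)
```

For example, make_state(750, 3.0, 1) yields (1, 1, 1): 750 kbit/s falls in the second bandwidth bin, 3 s in the second buffer bin, and the previous quality level is carried over unchanged.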
