Proceedings of the 2017 ITU Kaleidoscope
Challenges for a data-driven society
11. If selectedAction = −1, $\ldots \leftarrow (1 - \ldots)\,\ldots$,
12. Generate a random number r in the range of actions.
13. selectedAction = r
14. Return selectedAction

4.3. Q-LEARNING BASED QUALITY ADAPTATION (QBQA)

Q-Learning is a model-free reinforcement learning algorithm. QBQA is based on [13], where the authors designed and optimized a Q-Learning approach for video quality adaptation. The system state $s_k$ is modeled by the bandwidth $bw_k$, the buffer occupancy level $buf_k$, and the quality level $q_{k-1}$ of the previous segment. The actions $a_k$ of the system are the different quality levels of a video segment, each expressed by its nominal bit rate. The reward for an action considers three factors: quality as affected by bandwidth and buffer, video freezes, and quality switching. The exploration policy used for action selection is value-based differential Softmax. The Q-Learning-based adaptation algorithm [13] is as follows.

QBQA Algorithm

1. Initialize the learning rate $\alpha$, the discount factor $\gamma$, the Q-matrix, and the optimal bandwidth value.
2. Read the current buffer occupancy level $buf_k$ for the $k$th segment and the quality level $q_{k-1}$ for segment $k-1$ while streaming.
3. For $i = 1$ to $t$ // Training Phase
4. Estimate the bandwidth $bw_k$.
5. Assign $s_k = \{bw_k, buf_k, q_{k-1}\}$ // Current state
6. $a_k = \mathrm{Softmax}(Q, s_k)$ // Exploration policy function to get the best possible action
7. Calculate the quality factor related to bandwidth and buffer occupancy level using the equation
$$R_{bb} = -1.5 \cdot \ldots \qquad (11)$$
8. Calculate the quality factor related to quality switching using the equation
$$R_{sw} = -\lvert q_k - q_{k-1} \rvert \qquad (12)$$
9. Read the duration of the video freeze, the time elapsed since the last freeze, and the number of freezes $n$.
10. Calculate the quality factor related to video freezing using the equation
$$R_{fr} = \begin{cases} -100 \cdot \ldots, & \text{if } \ldots = \ldots \\ -100 \cdot \ldots, & \text{if } \ldots \neq \ldots \end{cases} \qquad (13)$$
11. Calculate the reward
$$R_k = R_{bb} + R_{sw} + R_{fr} \qquad (14)$$
12. End
13. Determine the resulting state $s_{k+1}$ using $\{bw_{k+1}, buf_{k+1}, q_k\}$.
14. Update the Q-matrix using
$$Q(s_k, a_k) \leftarrow Q(s_k, a_k) + \alpha \left[ R_k + \gamma \max_{a} Q(s_{k+1}, a) - Q(s_k, a_k) \right] \qquad (15)$$
15. Estimate the bandwidth $bw_k$. // Testing Phase begins
16. Assign $s_k = \{bw_k, buf_k, q_{k-1}\}$.
17. $a_k = \arg\max_{a} Q(s_k, a)$
18. Send $a_k$ as feedback to the server.
19. Repeat from Step 15 while streaming continues.

5. IMPLEMENTATION ENVIRONMENT

The Java programming environment based on the 64-bit JDK Version 7 was chosen for the implementation, and the code was developed using the Eclipse IDE. The 64-bit VLC media player was used for playing the media, because VLC can easily be controlled from Java with the help of the VLCJ framework. The Dshow API [18] was used for capturing live video for streaming, and the Jnetpcap framework [19] was used for packet capturing. The client and server were connected through 4G mobile hotspot devices in a typical cellular wireless network. Frame rates were varied among 20, 24, 27, and 30 frames per second, with 24 as the default. Standard video resolutions were used dynamically in the encoding/decoding process during the experiment: QCIF (176×144), CIF (352×288), VGA (640×480), SQCIF (128×96), and QVGA (320×240). The server ran Windows 10 (64-bit) on a Core i3 processor with 8 GB RAM. The client ran Windows 10 (64-bit) on a Core i5 processor with 4 GB RAM. The streaming was implemented on top of HTTP in a typical Internet environment.

The bit-rate capacity of the Airtel Mobile Hotspot (4G-LTE TD) dongle [20] was analyzed using the online tool Speedof.me [21], and the result of one run is shown in Figure 2. The Internet speed of the wireless connection was measured without Flash or Java, which many other speed-test websites currently rely on. The online tool provides a broadband speed test service that uses only browser capabilities such as HTML5 and JavaScript. To make the measured data reliable, it uses multiple test servers around the world, and the server is chosen automatically. The download and upload speeds of the network device are observed independently.

Fig. 2. Bit rate observed while streaming live video using the Airtel 4G LTE TD Hotspot
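Step 6 of the QBQA algorithm in Section 4.3 picks actions with a Softmax exploration policy. Steps 11 to 14 of the preceding Softmax listing fall back to a random action when selectedAction is still −1. The Python sketch below shows plain Boltzmann (softmax) selection by roulette wheel, with that random fallback. It is a simplified stand-in under stated assumptions, not the paper's code. The function name `softmax_action` and the `temperature` parameter are hypothetical. The value-based differential adaptation of exploration used in [13] is not modeled here.

```python
import math
import random


def softmax_action(q_values, temperature=1.0, rng=random):
    """Boltzmann (softmax) action selection by roulette wheel.

    Sketch only: `temperature` is a hypothetical parameter, and the
    value-based differential adaptation from [13] is NOT modeled.
    """
    # Subtract the largest Q-value before exponentiating for numerical stability.
    q_max = max(q_values)
    weights = [math.exp((q - q_max) / temperature) for q in q_values]
    total = sum(weights)

    # Roulette wheel: walk the cumulative probabilities until r falls inside one.
    r = rng.random()
    cumulative = 0.0
    selected = -1
    for action, w in enumerate(weights):
        cumulative += w / total
        if r < cumulative:
            selected = action
            break

    # Fallback (cf. Steps 11-14 of the Softmax listing): if rounding left no
    # action selected, choose a uniformly random action index.
    if selected == -1:
        selected = rng.randrange(len(q_values))
    return selected
```

A high temperature makes the choice close to uniform (more exploration). A low temperature makes it close to greedy (more exploitation).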
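Steps 7 to 11 of the QBQA algorithm build the reward from three quality factors, Eqs. (11) to (14). The Python sketch below covers only the parts of that reward that can be stated with confidence. It assumes the switching factor of Eq. (12) is the negative absolute difference between consecutive quality levels, and that Eq. (14) simply sums the three factors. The bandwidth/buffer factor (Eq. 11) and the freeze factor (Eq. 13) are passed in as precomputed numbers, since their full formulas are not reproduced here. The function names are hypothetical.

```python
def switching_factor(q_k: int, q_prev: int) -> float:
    """Quality-switch factor, assumed to be -|q_k - q_{k-1}| (Eq. 12)."""
    return -float(abs(q_k - q_prev))


def reward(r_bw_buf: float, q_k: int, q_prev: int, r_freeze: float) -> float:
    """Total reward as the sum of the three factors (Eq. 14).

    r_bw_buf (Eq. 11) and r_freeze (Eq. 13) are supplied by the caller,
    because this sketch does not reproduce their formulas.
    """
    return r_bw_buf + switching_factor(q_k, q_prev) + r_freeze
```

As a worked example, dropping from quality level 3 to level 1 costs −2. With a bandwidth/buffer factor of −0.5 and no freeze penalty, the reward is then −2.5.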
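Steps 5, 13, and 14 of the QBQA algorithm keep a Q-matrix indexed by the state $\{bw_k, buf_k, q_{k-1}\}$ and update it after each action. The Python sketch below assumes that Eq. (15) is the standard Q-learning update. A tabular Q-matrix needs discrete states, so the sketch also assumes bandwidth and buffer level are bucketed into bins. The bin edges, the number of quality levels, the default `alpha` and `gamma`, and all helper names are hypothetical; the paper does not state them.

```python
from collections import defaultdict

# Hypothetical discretization; the paper does not give these values.
BW_BINS_KBPS = [250, 500, 1000, 2000]   # bandwidth bin edges in kbit/s (assumed)
BUF_BINS_S = [2, 4, 8, 16]              # buffer-level bin edges in seconds (assumed)
NUM_QUALITIES = 5                       # number of quality levels / actions (assumed)


def bin_index(value, edges):
    """Index of the first edge that `value` does not exceed (len(edges) if none)."""
    for i, edge in enumerate(edges):
        if value <= edge:
            return i
    return len(edges)


def make_state(bw_kbps, buf_s, prev_quality):
    """State s_k = {bw_k, buf_k, q_{k-1}} as a hashable tuple of bin indices."""
    return (bin_index(bw_kbps, BW_BINS_KBPS), bin_index(buf_s, BUF_BINS_S), prev_quality)


def new_q_matrix():
    """Q-matrix mapping each state to one value per quality level, initialized to 0."""
    return defaultdict(lambda: [0.0] * NUM_QUALITIES)


def q_update(Q, s, a, reward, s_next, alpha=0.3, gamma=0.95):
    """Standard Q-learning update, assumed for Eq. (15); returns the new Q(s, a)."""
    td_target = reward + gamma * max(Q[s_next])
    Q[s][a] += alpha * (td_target - Q[s][a])
    return Q[s][a]
```

For example, `make_state(800, 6.0, 2)` yields `(2, 2, 2)`. A first update with reward −1.0 from an all-zero table moves that Q-value to −0.3 when `alpha` is 0.3.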
– 129 –