Page 45 - Proceedings of the 2018 ITU Kaleidoscope


Machine learning for a 5G future

7. Compute the new throughput Th and identify the new state s' from the mapping between throughput and states.
8. Identify the new action a' = softmax(Q,s) //Exploration policy function.
9. Update Q_A(s,a) ← Q_A(s,a) + α[r + γQ_B(s',a') − Q_A(s,a)] //Double Sarsa optimization [17]
10. Set s ← s' //Update the new state as the current state.
11. Set a ← a' //Update the new action as the current action.
12. Send feedback (a) to the server.
13. Repeat steps 2-12 while streaming continues.
14. End

4.2 EXPLORATION POLICY ALGORITHMS

Softmax(Q,s)

1. Initialize τ = 1, limit = 0, tot = 0, check = 0 and an array prb.
2. For i = 1 to prb.length
3.     prb[i] = e^((Q_A[s,i] + Q_B[s,i]) / τ)
4.     tot = tot + prb[i]
5. For i = 1 to prb.length
6.     prb[i] = prb[i] / tot
7. Generate a random value rand.
8. For i = 1 to prb.length
9.     If rand > limit and rand < limit + prb[i]
10.        actionSelected = i
11.        check = 1
12.    limit = limit + prb[i]
13. If check = 0
14.    Set limit = 0 and repeat from step 7
15. Else
16.    Return actionSelected
17. End.

ε-greedy(Q,s)

1. Initialize a fixed probability ε, max = −∞ (to store the maximum value in row s of the Q-matrix), max_action, limit = 0, check = 0 and an array prb.
2. For j = 1 to no_of_actions
3.     If Q_A[s,j] + Q_B[s,j] >= max
4.         max = Q_A[s,j] + Q_B[s,j]
5.         max_action = j
6. For i = 1 to no_of_actions
7.     If i equals max_action
8.         prb[i] = 1 − ε
9.     Else
10.        prb[i] = ε / (no_of_actions − 1)
11. Generate a random value rand.
12. For i = 1 to prb.length
13.    If rand > limit and rand < limit + prb[i]
14.        actionSelected = i
15.        check = 1
16.    limit = limit + prb[i]
17. If check = 0
18.    Set limit = 0 and repeat from step 11
19. Else
20.    Return actionSelected
21. End.

4.2 QoE DRIVEN VIDEO STREAMING STRATEGY WITH FUTURE INFORMATION

A probabilistic bandwidth prediction model combined with QoE optimization [17] can be used for bitrate adaptation of future media segments in streaming. In the start-up phase, the receiver requests the first segment at the lowest quality to reduce the start-up delay. For every subsequent segment, the adaptation technique chooses the quality level from the outcome of a sub-optimization problem, which selects the quality level that maximizes the expected internal QoE score. This maximization is based on a greedy search over all possible quality patterns the client can request.

The expected QoE is obtained by combining the internal QoE score with the probability of each bandwidth pattern. A normalized buffer change factor is incorporated into the internal QoE. Once the decision is made, the request is sent to the server. If the client buffer starves, the lowest quality level is requested to reduce delay. The algorithm is as follows:

1. Initialize the number of segments N, the total number of available quality levels M, the total number of available bandwidth states L, the transition matrix A, and the number l of future segments involved in the decision for the current segment.
2. For the first segment (i = 1), select the lowest quality level, $q_i = Q_1$, with bandwidth level $b_i = r_{i1}$, i.e., the bitrate of that segment.
3. If starvation occurs, go to step 2.
4. Select the quality level that yields the maximum expected internal QoE score:

$$\max \; \overline{QoE}_{inter}\left(\Psi^{l+1}\right) \quad \text{s.t.} \quad j \in \{1, 2, \ldots, M\} \tag{21}$$

5. For all requested quality patterns {Ψ_1, Ψ_2, Ψ_3, …}, calculate the expected internal QoE score:

$$\overline{QoE}_{inter}(\Psi) = \sum_{\Theta} QoE_{inter}(\Theta, \Psi) \, P(\Theta) \tag{22}$$

where

$$QoE_{inter}(\Theta, \Psi) = E(\Psi) - w_1 V(\Psi) - w_2 P(\Theta_i, \Psi) + \lambda \, \Delta T_i(\Theta, \Psi) \tag{23}$$

and

$$P(\Theta) = x \prod_{k} a_{\Theta_k, \Theta_{k+1}} \tag{24}$$

6. Request the quality pattern with the maximum expected internal QoE score.
7. Feed the requested quality level and the actual network bandwidth state into the next round of decision-making.
8. Calculate the average requested media quality E(Ψ) using

$$E(\Psi) = \frac{1}{l+1} \sum_{k=i}^{i+l} q_k \tag{25}$$

9. Calculate the quality switching frequency V(Ψ) as

$$V(\Psi) = \sum_{k=i}^{i+l-1} \left| q_{k+1} - q_k \right| \tag{26}$$
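The expected-QoE search of the QoE driven streaming strategy (steps 4 to 9) can be sketched in Python as an exhaustive evaluation of every quality pattern against every bandwidth pattern. This is a minimal sketch, not the authors' implementation, and it rests on several assumptions not given in the text: quality levels and bandwidth states are numeric bitrates in Mbps; the penalty term w_2 P(Θ_i, Ψ) is modeled as total stall time from a simple hypothetical buffer model; ΔT is the buffer change normalized by segment duration; and the factor x in Eq. (24) is replaced by conditioning on the last observed bandwidth state. All function and parameter names (`best_pattern`, `simulate`, `seg_dur`, `lam`, and so on) are hypothetical.

```python
import itertools
from typing import Sequence, Tuple

def avg_quality(q: Sequence[float]) -> float:
    # E(Psi): mean requested quality over the pattern (Eq. 25).
    return sum(q) / len(q)

def switching(q: Sequence[float]) -> float:
    # V(Psi): total absolute quality change between consecutive segments (assumed form of Eq. 26).
    return sum(abs(b - a) for a, b in zip(q, q[1:]))

def pattern_prob(theta: Sequence[int], start: int, A: Sequence[Sequence[float]]) -> float:
    # P(Theta): product of Markov transitions along the bandwidth pattern, starting from
    # the last observed state (assumption: stands in for the factor x in Eq. 24).
    p, prev = 1.0, start
    for state in theta:
        p *= A[prev][state]
        prev = state
    return p

def simulate(q: Sequence[float], bw: Sequence[float], buffer: float,
             seg_dur: float) -> Tuple[float, float]:
    # Hypothetical buffer model: returns (total stall time, normalized buffer change).
    start, stall = buffer, 0.0
    for rate, b in zip(q, bw):
        dl = rate * seg_dur / b             # download time of one segment
        stall += max(dl - buffer, 0.0)      # playback stalls when the buffer drains
        buffer = max(buffer - dl, 0.0) + seg_dur
    return stall, (buffer - start) / seg_dur

def best_pattern(levels: Sequence[float], bw_states: Sequence[float],
                 A: Sequence[Sequence[float]], start: int, l: int, buffer: float,
                 seg_dur: float = 2.0, w1: float = 1.0, w2: float = 1.0,
                 lam: float = 0.1) -> Tuple[Tuple[int, ...], float]:
    # Exhaustive search over all M**(l+1) quality patterns (steps 4-6).
    best: Tuple[int, ...] = ()
    best_score = float("-inf")
    for psi in itertools.product(range(len(levels)), repeat=l + 1):
        q = [levels[j] for j in psi]
        e, v = avg_quality(q), switching(q)
        score = 0.0  # expected internal QoE of this pattern (Eq. 22)
        for theta in itertools.product(range(len(bw_states)), repeat=l + 1):
            p = pattern_prob(theta, start, A)
            if p == 0.0:
                continue
            stall, d_t = simulate(q, [bw_states[k] for k in theta], buffer, seg_dur)
            # Eq. 23, with stall time standing in for the w2 penalty term (assumption)
            score += p * (e - w1 * v - w2 * stall + lam * d_t)
        if score > best_score:
            best, best_score = psi, score
    return best, best_score
```

For example, `best_pattern([1.0, 3.0], [0.5, 4.0], [[0.8, 0.2], [0.3, 0.7]], start=1, l=2, buffer=4.0)` returns the index pattern with the highest expected score; a client following steps 6 and 7 would request `levels[best[0]]` for the current segment and rerun the search for the next one. The cost grows as M^(l+1) · L^(l+1), so this brute-force form is only practical for a short look-ahead l.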
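Steps 7 to 11 of the Double SARSA streaming loop, together with the Softmax(Q,s) exploration routine, can be sketched as a small tabular Python example. It is a minimal sketch under stated assumptions, not the authors' code: Q-tables are plain dicts keyed by (state, action), actions are 0-based rather than 1-based, the throughput thresholds in the hypothetical `throughput_to_state` helper are invented for illustration, and the hypothetical `roulette` helper samples with a single random draw instead of the retry loop of steps 13 and 14. Subtracting the maximum preference before exponentiating leaves the softmax probabilities unchanged but avoids overflow for large Q values.

```python
import math
import random
from typing import Dict, Hashable, List, Sequence, Tuple

QTable = Dict[Tuple[Hashable, int], float]  # (state, action) -> value

def softmax_probs(qa_row: Sequence[float], qb_row: Sequence[float],
                  tau: float = 1.0) -> List[float]:
    # Softmax steps 2-6: Boltzmann weights over Q_A[s,i] + Q_B[s,i], then normalization.
    prefs = [(a + b) / tau for a, b in zip(qa_row, qb_row)]
    m = max(prefs)  # shift by the max: same probabilities, no overflow
    prb = [math.exp(p - m) for p in prefs]
    tot = sum(prb)
    return [p / tot for p in prb]

def roulette(prb: Sequence[float], r: float) -> int:
    # Softmax steps 7-12: index whose interval [limit, limit + prb[i]) contains r.
    limit = 0.0
    for i, p in enumerate(prb):
        if r < limit + p:
            return i
        limit += p
    return len(prb) - 1  # rounding guard: sum(prb) may fall slightly below 1

def softmax_action(qa: QTable, qb: QTable, s: Hashable, n_actions: int,
                   tau: float, rng: random.Random) -> int:
    qa_row = [qa.get((s, i), 0.0) for i in range(n_actions)]
    qb_row = [qb.get((s, i), 0.0) for i in range(n_actions)]
    return roulette(softmax_probs(qa_row, qb_row, tau), rng.random())

def double_sarsa_update(q_upd: QTable, q_boot: QTable, s: Hashable, a: int, r: float,
                        s_next: Hashable, a_next: int,
                        alpha: float = 0.1, gamma: float = 0.9) -> float:
    # Step 9: Q_A(s,a) <- Q_A(s,a) + alpha * [r + gamma * Q_B(s',a') - Q_A(s,a)].
    old = q_upd.get((s, a), 0.0)
    target = r + gamma * q_boot.get((s_next, a_next), 0.0)
    q_upd[(s, a)] = old + alpha * (target - old)
    return q_upd[(s, a)]

def throughput_to_state(th: float, edges: Sequence[float] = (1.0, 3.0, 6.0)) -> int:
    # Hypothetical step-7 mapping from throughput (Mbps) to a discrete state index.
    return sum(th >= e for e in edges)

if __name__ == "__main__":
    rng = random.Random(0)
    qa: QTable = {}
    qb: QTable = {}
    s, a = throughput_to_state(2.5), 0
    # Made-up (throughput, reward) pairs, purely for illustration.
    for th, reward in [(4.0, 1.0), (0.8, -1.0), (5.0, 0.5)]:
        s_next = throughput_to_state(th)                           # step 7
        a_next = softmax_action(qa, qb, s_next, 3, 1.0, rng)       # step 8
        double_sarsa_update(qa, qb, s, a, reward, s_next, a_next)  # step 9
        s, a = s_next, a_next                                      # steps 10-11
    print(qa)
```

Only the Q_A update written in step 9 is shown. In the usual Double SARSA formulation the two tables swap roles at random on each step; with this helper that amounts to calling `double_sarsa_update(qb, qa, ...)` instead of `double_sarsa_update(qa, qb, ...)` on about half of the steps.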

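The ε-greedy(Q,s) exploration routine can likewise be sketched in Python. This is a hedged sketch with hypothetical function names and 0-based actions, not the authors' implementation. `max` starts at −∞ so that the greedy action is well defined even when every Q value is negative, ties go to the later action because of the `>=` test in step 3, and a single-action case is handled separately to avoid dividing by no_of_actions − 1 = 0. Sampling uses one random draw instead of the retry loop of steps 17 and 18.

```python
import random
from typing import List, Sequence

def epsilon_greedy_probs(qa_row: Sequence[float], qb_row: Sequence[float],
                         eps: float) -> List[float]:
    totals = [a + b for a, b in zip(qa_row, qb_row)]
    n = len(totals)
    max_val, max_action = float("-inf"), 0  # step 1: max starts at -infinity
    for j, v in enumerate(totals):          # steps 2-5
        if v >= max_val:                    # ">=" as in step 3: ties go to the later action
            max_val, max_action = v, j
    if n == 1:
        return [1.0]                        # a single action is always chosen
    # Steps 6-10: 1 - eps for the greedy action, eps split evenly over the others.
    return [1.0 - eps if i == max_action else eps / (n - 1) for i in range(n)]

def roulette(prb: Sequence[float], r: float) -> int:
    # Steps 11-16: index whose interval [limit, limit + prb[i]) contains r in [0, 1).
    limit = 0.0
    for i, p in enumerate(prb):
        if r < limit + p:
            return i
        limit += p
    return len(prb) - 1  # rounding guard: sum(prb) may fall slightly below 1

def epsilon_greedy(qa_row: Sequence[float], qb_row: Sequence[float],
                   eps: float, rng: random.Random) -> int:
    return roulette(epsilon_greedy_probs(qa_row, qb_row, eps), rng.random())

if __name__ == "__main__":
    rng = random.Random(42)
    picks = [epsilon_greedy([1.0, 0.0, 0.5], [0.0, 2.0, 0.0], 0.3, rng) for _ in range(10000)]
    print(picks.count(1) / len(picks))  # share of the greedy action, close to 1 - eps
```

Dividing ε over the non-greedy actions, as the pseudocode does, gives the greedy action exactly 1 − ε; a common alternative spreads ε over all n actions, so the greedy action receives 1 − ε + ε/n.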