Proceedings of the 2018 ITU Kaleidoscope

Machine learning for a 5G future




               7.  Compute the new throughput Th and identify the new state s' from the mapping between throughput and states.
               8.  Identify the new action a' = softmax(Q,s') //Exploration policy function.
               9.  Update Q^A(s,a) ← Q^A(s,a) + α[r + γQ^B(s',a') − Q^A(s,a)] // Double Sarsa
               10. Set s ← s' //Update new state as the current state
               11. Set a ← a' //Update new action as the current action
               12. Send feedback (a) to the server.
               13. Repeat steps 2-12 while streaming continues.
               14. End

           4.2    EXPLORATION POLICY ALGORITHMS

           Softmax(Q,s)

               1.  Initialize τ = 1, limit = 0, tot = 0, check = 0 and an array prb.
               2.  For i = 1 to prb.length
               3.        prb[i] = exp((Q^A[s,i] + Q^B[s,i]) / τ)
               4.        tot = tot + prb[i]
               5.  For i = 1 to prb.length
               6.        prb[i] = prb[i] / tot
               7.  Generate a random value rand uniformly in [0, 1).
               8.  For i = 1 to prb.length
               9.        If rand > limit and rand < limit + prb[i]
               10.             actionSelected = i
               11.             check = 1
               12.       limit = limit + prb[i]
               13. If check = 0
               14.       Set limit = 0 and repeat from step 7
               15. Else
               16.       Return actionSelected
               17. End.

           ε-greedy(Q,s)

               1.  Initialize a fixed probability ε, max = −∞ (to store the maximum value in the s-th row of the Q-matrix), max_action, limit = 0, check = 0 and an array prb.
               2.  For j = 1 to no_of_actions
               3.        If Q^A[s,j] + Q^B[s,j] >= max
               4.              max = Q^A[s,j] + Q^B[s,j]
               5.              max_action = j
               6.  For i = 1 to no_of_actions
               7.        If i equals max_action
               8.              prb[i] = 1 − ε
               9.        Else
               10.             prb[i] = ε / (no_of_actions − 1)
               11. Generate a random value rand uniformly in [0, 1).
               12. For i = 1 to prb.length
               13.       If rand > limit and rand < limit + prb[i]
               14.             actionSelected = i
               15.             check = 1
               16.       limit = limit + prb[i]
               17. If check = 0
               18.       Set limit = 0 and repeat from step 11
               19. Else
               20.       Return actionSelected
               21. End

           4.2   QoE DRIVEN VIDEO STREAMING STRATEGY WITH FUTURE INFORMATION

           A probabilistic bandwidth prediction model combined with QoE optimization [17] can be used for bitrate adaptation of future media segments in streaming. In the startup phase, the receiver requests the first segment at the lowest quality to reduce the startup delay. For each subsequent segment, the adaptation technique chooses the quality level from the outcome of a sub-optimization problem. This sub-optimization selects the quality level that maximizes the expected internal QoE score, using a greedy search approach that considers all possible quality patterns the client can request.

           The expected QoE is obtained by weighting the internal QoE score by the probability of each bandwidth pattern. A normalized buffer change factor is incorporated into the internal QoE. Once the decision is made, the request is communicated to the server. If the client buffer starves, the lowest quality level is requested to reduce delay. The algorithm is as follows:

               1.  Initialize the number of segments N, the total number of available quality levels M, the total number of available bandwidth states L, the transition matrix A, and the number of future segments l involved in the decision for the current segment.
               2.  Select the lowest quality level, $q_i = Q_1$, and the bandwidth level $b_i = r_{i1}$, i.e., the bitrate of the segment, for $i = 1$.
               3.  If starvation occurs, go to step 2.
               4.  Select the quality level that results in the maximum expected internal QoE score:

                   $$\max_{\Psi_j} \overline{\mathrm{QoE}}_{inter}(\Psi_j) \quad \text{s.t. } j \in \{1,2,\dots,M\}^{l+1} \qquad (21)$$

               5.  For all the requested quality patterns $\{\Psi_1, \Psi_2, \Psi_3, \dots, \Psi_{M^{l+1}}\}$, calculate the expected internal QoE score:

                   $$\overline{\mathrm{QoE}}_{inter}(\Psi_j) = \sum_{i} \mathrm{QoE}_{inter}(\Theta_i, \Psi_j) \cdot P(\Theta_i) \qquad (22)$$

                   where

                   $$\mathrm{QoE}_{inter}(\Theta_i, \Psi_j) = E(\Psi_j) - w_1 V(\Psi_j) - w_2 P_s(\Theta_i, \Psi_j) + \lambda\,\Delta T(\Theta_i, \Psi_j) \qquad (23)$$

                   and, with $a_{x,y}$ the entries of the transition matrix A,

                   $$P(\Theta_i) = p_{\theta_{i,1}} \times \prod_{k=2}^{l+1} a_{\theta_{i,k-1},\,\theta_{i,k}} \qquad (24)$$

               6.  Request the quality pattern with the maximum expected internal QoE score.
               7.  Feed the requested quality level and the actual network bandwidth state into the next round of decision.
               8.  Calculate the average requested media quality E(Ψ) using

                   $$E(\Psi_j) = \frac{1}{l+1} \sum_{k=1}^{l+1} q_{j,k} \qquad (25)$$

               9.  Calculate the quality switching frequency V(Ψ) as

                   $$V(\Psi_j) = \sum_{k=2}^{l+1} \left| q_{j,k} - q_{j,k-1} \right| \qquad (26)$$
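The Double Sarsa loop in steps 7-14 can be sketched in Python as below. This is a minimal sketch under stated assumptions, not the authors' implementation. The Q-matrices Q^A and Q^B are assumed to be NumPy arrays indexed [state, action]. `throughput_to_state` (binning throughput by hypothetical bin edges), `step_env` (sends the request and returns a reward and the measured throughput) and `select_action` are all hypothetical helpers. The coin flip that decides whether Q^A or Q^B is updated follows the usual Double Sarsa formulation; it is not shown in this excerpt.

```python
import numpy as np


def double_sarsa_update(QA, QB, s, a, r, s_next, a_next, alpha=0.1, gamma=0.9):
    """Step 9: Q^A(s,a) <- Q^A(s,a) + alpha * [r + gamma * Q^B(s',a') - Q^A(s,a)].

    Updates QA in place and returns the TD error. Passing the tables in the
    opposite order, (QB, QA), performs the symmetric update of Q^B.
    """
    td_error = r + gamma * QB[s_next, a_next] - QA[s, a]
    QA[s, a] += alpha * td_error
    return td_error


def throughput_to_state(th, edges):
    """Hypothetical step-7 mapping: index of the throughput bin that contains th."""
    return int(np.digitize(th, edges))


def run_double_sarsa(QA, QB, s, a, step_env, select_action, edges, n_steps,
                     alpha=0.1, gamma=0.9, rng=None):
    """Steps 7-13 as a loop over n_steps segment requests.

    step_env(a) -> (reward, throughput) and select_action(QA, QB, state) -> action
    are hypothetical callbacks standing in for the player and the exploration policy.
    """
    rng = np.random.default_rng() if rng is None else rng
    for _ in range(n_steps):
        r, th = step_env(a)                        # send action a (step 12), observe reward and Th
        s_next = throughput_to_state(th, edges)    # step 7
        a_next = select_action(QA, QB, s_next)     # step 8: exploration policy on the new state
        # Step 9. Picking Q^A or Q^B at random is the usual Double Sarsa
        # scheme; it is an assumption here, not shown in the excerpt.
        if rng.random() < 0.5:
            double_sarsa_update(QA, QB, s, a, r, s_next, a_next, alpha, gamma)
        else:
            double_sarsa_update(QB, QA, s, a, r, s_next, a_next, alpha, gamma)
        s, a = s_next, a_next                      # steps 10-11
    return s, a
```

Either exploration policy below can be passed as `select_action`.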
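The Softmax(Q,s) policy is Boltzmann exploration followed by a roulette-wheel draw, and can be sketched as follows. As in the ε-greedy pseudocode, the sketch assumes action values are the sum Q^A[s] + Q^B[s]. Subtracting the row maximum before exponentiating is a numerical-stability choice that leaves the probabilities unchanged. With the half-open test `limit <= rand < limit + p`, one uniform draw in [0, 1) always selects an action. The retry path (check = 0) is therefore replaced by a small round-off guard. The function name is illustrative.

```python
import numpy as np


def softmax_action(QA, QB, s, tau=1.0, rng=None):
    """Boltzmann exploration over the combined row Q^A[s] + Q^B[s] (0-based actions).

    Steps 2-6: prb[i] = exp(q_i / tau) / sum_k exp(q_k / tau).
    Steps 7-16: roulette-wheel selection with a single uniform draw.
    """
    rng = np.random.default_rng() if rng is None else rng
    q = QA[s] + QB[s]
    # Shifting by the max does not change the normalized probabilities but avoids overflow.
    weights = np.exp((q - q.max()) / tau)
    prb = weights / weights.sum()
    rand = rng.random()                  # uniform in [0, 1)
    limit = 0.0
    for i, p in enumerate(prb):
        if limit <= rand < limit + p:    # half-open interval for action i
            return i
        limit += p
    return len(prb) - 1                  # guard: cumulative sum may fall just short of 1.0
```

A larger τ flattens the distribution toward uniform exploration. A smaller τ concentrates it on the best-valued action.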
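The ε-greedy(Q,s) policy can be written more compactly with NumPy. The sketch keeps the pseudocode's distribution: 1 − ε for the greedy action and ε/(n − 1) for each other action. Because the pseudocode uses `>=`, ties go to the last maximizer. The roulette-wheel draw of steps 11-20 is delegated to `numpy.random.Generator.choice` with its `p` argument, which samples from the same distribution. The function name and defaults are illustrative.

```python
import numpy as np


def epsilon_greedy_action(QA, QB, s, epsilon=0.1, rng=None):
    """epsilon-greedy over Q^A[s] + Q^B[s] (0-based actions), as in the pseudocode."""
    rng = np.random.default_rng() if rng is None else rng
    q = QA[s] + QB[s]
    n = len(q)
    if n == 1:                           # a single action: nothing to explore
        return 0
    # Steps 2-5: ">=" keeps the LAST maximizer on ties, so search the reversed row.
    max_action = n - 1 - int(np.argmax(q[::-1]))
    # Steps 6-10: greedy action gets 1 - epsilon, the rest share epsilon equally.
    prb = np.full(n, epsilon / (n - 1))
    prb[max_action] = 1.0 - epsilon
    # Steps 11-20: one draw from the categorical distribution prb.
    return int(rng.choice(n, p=prb))
```

This differs slightly from the textbook variant, which gives the greedy action probability 1 − ε + ε/n.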
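The QoE-driven selection (steps 2-6 with Eqs. (21)-(26)) amounts to an exhaustive search over M^{l+1} quality patterns and L^{l+1} bandwidth patterns, sketched below under several assumptions.

- Eq. (24) is read as a Markov-chain path probability, with the first factor taken from the row of A for the last observed bandwidth state `b0`.
- Each quality level's value in `rates` is treated as its bitrate.
- The excerpt does not define the stall term P_s or the buffer change ΔT, so both come from a deliberately simple, HYPOTHETICAL buffer model.
- The weights w1, w2 and λ are free parameters.

All function names are illustrative.

```python
import itertools


def pattern_probability(theta, A, b0):
    """Eq. (24), assumed Markov form: A[b0][t1] * prod_k A[t_{k-1}][t_k]."""
    p = A[b0][theta[0]]
    for prev, cur in zip(theta, theta[1:]):
        p *= A[prev][cur]
    return p


def avg_quality(psi, rates):
    """Eq. (25): average requested quality of the pattern psi (level indices)."""
    return sum(rates[q] for q in psi) / len(psi)


def switching(psi, rates):
    """Eq. (26): sum of absolute quality changes between consecutive segments."""
    return sum(abs(rates[b] - rates[a]) for a, b in zip(psi, psi[1:]))


def buffer_terms(theta, psi, rates, bws, buffer0, seg_dur):
    """HYPOTHETICAL buffer model returning (P_s, Delta T) for one (Theta, Psi) pair.

    P_s is 1.0 if any segment takes longer to download than the buffered playback
    time, else 0.0; Delta T is the buffer change divided by the pattern's duration.
    """
    buf, stalled = buffer0, False
    for q, b in zip(psi, theta):
        download = rates[q] * seg_dur / bws[b]        # seconds to fetch this segment
        stalled = stalled or download > buf
        buf = max(buf - download, 0.0) + seg_dur
    return float(stalled), (buf - buffer0) / (seg_dur * len(psi))


def choose_quality(rates, bws, A, b0, buffer0, seg_dur, l, w1=1.0, w2=1.0, lam=1.0):
    """Steps 2-6: lowest level on starvation, otherwise the first level of the
    pattern with the highest expected internal QoE, Eqs. (21)-(23)."""
    if buffer0 <= 0:                                   # steps 2-3: startup or starvation
        return 0
    thetas = list(itertools.product(range(len(bws)), repeat=l + 1))
    probs = [pattern_probability(t, A, b0) for t in thetas]
    best_psi, best_score = None, float("-inf")
    for psi in itertools.product(range(len(rates)), repeat=l + 1):   # M^(l+1) patterns
        e, v = avg_quality(psi, rates), switching(psi, rates)
        score = 0.0
        for theta, p in zip(thetas, probs):
            p_s, d_t = buffer_terms(theta, psi, rates, bws, buffer0, seg_dur)
            score += p * (e - w1 * v - w2 * p_s + lam * d_t)         # Eq. (23) weighted as in (22)
        if score > best_score:
            best_psi, best_score = psi, score
    return best_psi[0]                                 # only the current segment is requested
```

The cost grows as M^{l+1} · L^{l+1} evaluations, which is why the look-ahead l is normally kept small.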