
2021 ITU Kaleidoscope Academic Conference




[Figure 8 – Reward obtained by the B-BeamOracle agent for a given episode. The traffic load switches every 1000 time steps between "heavy" and "light".]

samples, between the "heavy" and the "light" data traffic. The sequential scheduling proves to be sufficient to meet the demand in light-traffic situations; however, in heavy-traffic periods, even when the best beam index î is used, the reward tends to be negative without proper scheduling.

[Figure 9 – Histogram of the total sum of rewards achieved in the test episodes.]

Figure 9 shows a reward histogram for the different agents over 20 test episodes. As expected, B-BeamOracle presents the best performance, while B-RL achieves performance close to that of B-Dummy, which simply takes random actions. One reason for the poor performance of B-RL is the choice of its input parameters: none of the seven features helps the agent directly learn the user and beam index used in its previous decision. Better modeling of the agent can substantially improve its performance.

                            6.  CONCLUSION

This paper presented a framework for research on RL applied to scheduling and MIMO beam selection. Using the framework, we provided statistics from an experiment in which an RL agent faces the problems of user scheduling and beam selection. The experiment allowed us to validate the designed environment for RL training and testing. Future development will focus on rendering the 3D scenarios while training the RL agent, as well as on using more realistic channels obtained via ray tracing.

                          ACKNOWLEDGEMENTS

This work was supported in part by the Innovation Center, Ericsson Telecomunicações S.A., Brazil, CNPq and the Capes Foundation.
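The qualitative shape of the agent comparison in Figure 9 can be mimicked with a toy experiment. The sketch below is purely illustrative and is not the paper's actual environment: `run_episode`, the ±1 reward, and the dimensions `N_USERS`/`N_BEAMS` are assumptions. It only shows why an oracle policy and a uniformly random policy, the analogues of B-BeamOracle and B-Dummy, separate cleanly in total episode reward.

```python
import random

# Toy stand-in for the scheduling/beam-selection environment (hypothetical:
# the paper's actual environment, features, and reward are not reproduced).
# Each step has one "correct" (user, beam) pair; matching it earns +1, else -1.
N_USERS, N_BEAMS, STEPS = 3, 4, 1000

def run_episode(policy, rng):
    """Total reward collected by `policy` over one episode."""
    total = 0.0
    for _ in range(STEPS):
        best = (rng.randrange(N_USERS), rng.randrange(N_BEAMS))
        total += 1.0 if policy(best, rng) == best else -1.0
    return total

def oracle_policy(best, rng):
    # Analogue of B-BeamOracle: always picks the correct (user, beam) pair.
    return best

def random_policy(best, rng):
    # Analogue of B-Dummy: picks a (user, beam) pair uniformly at random.
    return (rng.randrange(N_USERS), rng.randrange(N_BEAMS))

rng = random.Random(0)
oracle_totals = [run_episode(oracle_policy, rng) for _ in range(20)]
dummy_totals = [run_episode(random_policy, rng) for _ in range(20)]
# oracle_totals are all +1000, while dummy_totals cluster far below zero.
```

An RL agent whose observations carry no information about the correct action, as discussed for B-RL above, can do no better in expectation than `random_policy`, which is why its histogram overlaps B-Dummy's.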



