Figure 8 – Reward obtained by the B-BeamOracle agent for a given episode. The traffic load switches every 1000 time steps between “heavy” and “light”.

samples, between the “heavy” and the “light” data traffic. The sequential scheduling proves sufficient to meet the demand in light traffic situations; in moments of intense traffic, however, even when the best beam index î is used, the reward tends to be negative without proper scheduling.
Figure 9 – Histogram of the total sum of rewards achieved in the test episodes.

Figure 9 shows a reward histogram for the different agents over 20 test episodes. As expected, the B-BeamOracle presents the best performance, while the B-RL achieves performance close to that of the B-Dummy, which simply takes random actions. One reason for the poor performance of B-RL is the choice of its input parameters: none of the seven features helps the agent directly learn the user and beam index used in its previous decision. Better modeling of the agent can substantially improve its performance.
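One way to act on this observation, sketched below under assumed interfaces, is to append the previous scheduling decision to the agent's observation. The wrapper name PrevDecisionWrapper, the (user index, beam index) action format, and the Gym-style Box observation space are illustrative assumptions, not the framework's documented API.

import numpy as np
import gym

class PrevDecisionWrapper(gym.Wrapper):
    """Appends the previously chosen (user, beam index) pair to the observation.

    Illustrative sketch only: the (user, beam) action format and the Box
    observation space are assumptions about the environment.
    """

    def __init__(self, env, num_users, num_beams):
        super().__init__(env)
        low = np.append(env.observation_space.low, [0.0, 0.0])
        high = np.append(env.observation_space.high,
                         [float(num_users - 1), float(num_beams - 1)])
        self.observation_space = gym.spaces.Box(low=low, high=high,
                                                dtype=np.float32)
        self._prev = np.zeros(2, dtype=np.float32)

    def reset(self, **kwargs):
        self._prev = np.zeros(2, dtype=np.float32)
        return self._augment(self.env.reset(**kwargs))

    def step(self, action):
        obs, reward, done, info = self.env.step(action)
        user, beam = action  # assumed (user index, beam index) pair
        self._prev = np.array([user, beam], dtype=np.float32)
        return self._augment(obs), reward, done, info

    def _augment(self, obs):
        # Concatenate the original features with the last decision.
        return np.concatenate([np.asarray(obs, dtype=np.float32), self._prev])

Whether this change helps in practice depends on the environment's actual observation design; the sketch only illustrates the kind of feature the discussion above refers to.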
6. CONCLUSION

This paper presented a framework for research on RL applied to scheduling and MIMO beam selection. Using the framework, we provided statistics of an experiment in which an RL agent faces the problems of user scheduling and beam selection. The experiment allowed us to validate the designed environment for RL training and testing. Future development will focus on rendering the 3D scenarios while training the RL agent, as well as on using more realistic channels via ray tracing.

ACKNOWLEDGEMENTS

This work was supported in part by the Innovation Center, Ericsson Telecomunicações S.A., Brazil, CNPq, and the CAPES Foundation.