Kaleidoscope Academic Conference Proceedings 2021
REINFORCEMENT LEARNING FOR SCHEDULING AND MIMO BEAM SELECTION
USING CAVIAR SIMULATIONS
João Paulo Tavares Borges¹; Ailton Pinto de Oliveira¹; Felipe Henrique Bastos e Bastos¹; Daniel Takashi Né do Nascimento Suzuki¹; Emerson Santos de Oliveira Junior¹; Lucas Matni Bezerra²; Cleverson Veloso Nahum¹; Pedro dos Santos Batista³; Aldebaro Barreto da Rocha Klautau Júnior¹
¹ Universidade Federal do Pará, Belém 66075-110, Brazil
² Universidade Estácio de Sá, Belém 66055-260, Brazil
³ Ericsson Research, 164 80 Stockholm, Sweden
ABSTRACT
This paper describes a framework for research on
Reinforcement Learning (RL) applied to scheduling and
MIMO beam selection. This framework consists of asking
the RL agent to schedule a user and then choose the index
of a beamforming codebook to serve it. A key aspect of this
problem is that the simulation of the communication system
and the artificial intelligence engine is based on a virtual
world created with AirSim and the Unreal Engine. These
components enable the so-called CAVIAR methodology,
which leads to highly realistic 3D scenarios. This paper
describes the communication and RL modeling adopted in
the framework and also presents statistics concerning the
implemented RL environment, such as data traffic, as well as
results for three baseline systems.

Keywords - 5G, 6G, beam selection, MIMO, mmWave, RL

Figure 1 – CAVIAR simulation scenario, depicting the radiation pattern (in light green) corresponding to the chosen beamforming codebook index to serve a drone (at the right).
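To make the joint decision described in the abstract more concrete, the following is a minimal, illustrative Python sketch of a combined scheduling and beam-selection action. It is not the CAVIAR implementation. It assumes a base station with a uniform linear array (half-wavelength spacing), a DFT beamforming codebook, single-path line-of-sight channels, and a reward equal to the beamforming gain |w^H h|^2. The virtual world, data traffic, and channel models of CAVIAR are not represented. The array and codebook sizes and all function names (dft_codebook, los_channel, beam_gain, decode_action, step, best_action) are hypothetical.

```python
import cmath
import math
from typing import List, Tuple

# Hypothetical sizes, for illustration only (not taken from the paper).
NUM_ANTENNAS = 8   # elements of the base-station uniform linear array (ULA)
CODEBOOK_SIZE = 8  # number of beams in the DFT codebook

Vector = List[complex]


def dft_codebook(n_ant: int, n_beams: int) -> List[Vector]:
    """Unit-norm steering vectors on a uniform grid of spatial frequencies u in [-1, 1)."""
    book = []
    for k in range(n_beams):
        u = -1 + 2 * k / n_beams  # u plays the role of sin(theta) for a half-wavelength ULA
        book.append([cmath.exp(1j * math.pi * n * u) / math.sqrt(n_ant) for n in range(n_ant)])
    return book


def los_channel(n_ant: int, angle_rad: float) -> Vector:
    """Assumed single-path line-of-sight channel seen by a half-wavelength ULA."""
    return [cmath.exp(1j * math.pi * n * math.sin(angle_rad)) for n in range(n_ant)]


def beam_gain(w: Vector, h: Vector) -> float:
    """Beamforming gain |w^H h|^2 (conjugate of w times h, summed over antennas)."""
    return abs(sum(wi.conjugate() * hi for wi, hi in zip(w, h))) ** 2


def decode_action(action: int, n_beams: int) -> Tuple[int, int]:
    """Split a flat action (action = user * n_beams + beam) into (user, codebook index)."""
    return divmod(action, n_beams)


def step(action: int, user_angles: List[float], codebook: List[Vector]) -> float:
    """Hypothetical reward: gain of the chosen beam on the scheduled user's channel."""
    user, beam = decode_action(action, len(codebook))
    h = los_channel(len(codebook[beam]), user_angles[user])
    return beam_gain(codebook[beam], h)


def best_action(user_angles: List[float], codebook: List[Vector]) -> int:
    """Exhaustive search over all user-beam pairs (an illustrative oracle reference)."""
    n_actions = len(user_angles) * len(codebook)
    return max(range(n_actions), key=lambda a: step(a, user_angles, codebook))


if __name__ == "__main__":
    book = dft_codebook(NUM_ANTENNAS, CODEBOOK_SIZE)
    angles = [0.0, 0.3, -0.6]  # hypothetical angles (radians) of three users
    a = best_action(angles, book)
    user, beam = decode_action(a, CODEBOOK_SIZE)
    print(f"schedule user {user} with beam {beam}, gain {step(a, angles, book):.2f}")
```

Flattening the (user, beam) pair into a single discrete index keeps the action space compatible with standard discrete-action RL agents. The best_action function is only an exhaustive-search reference for this toy model and is not one of the three baseline systems evaluated in the paper.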
1. INTRODUCTION

Reinforcement Learning (RL) is a learning paradigm suited to problems in which an agent must maximize a given reward while interacting with an ever-changing environment. Problems of this class arise in several areas of interest in 5th Generation (5G) and 6th Generation (6G) mobile networks, such as congestion control [1], network slicing [2], resource allocation [3], and the 5G Physical Layer (PHY) [4]. However, the lack of freely available data sets or environments to train and assess RL agents is a practical obstacle that delays the widespread adoption of RL in 5G and future networks.

To address this challenge, some works explore the use of virtual worlds to generate data sets, creating environments for communications in general [5] and for Artificial Intelligence (AI) / Machine Learning (ML) applied to 5G/6G [6]. These works leverage the fact that 5G and beyond systems will benefit from rich contextual information to improve performance and to reduce the loss of radio resources when supporting their services [4, 7, 8]. The key idea in this paper is therefore to use realistic representations of deployment sites, together with physics and sensor simulations, to generate a virtual representation that, combined with the communication network simulator, enables training RL agents for tasks such as beam selection.

Systems such as IEEE 802.11ad are usually designed for worst-case scenarios and, in most situations, continuously send signals that do not carry information (overhead) [9]. This overhead may represent a significant portion of the channel capacity, and reducing it is a fundamental problem, since doing so enables systems to make better use of physical resources (e.g., achieving lower latency and higher bit rates) [10, 11, 12].

In this work, the beam selection and user scheduling problems are posed as a game that must be solved with RL. The game is based on a simulation methodology named Communication Networks, Artificial Intelligence and Computer Vision with 3D Computer-Generated Imagery (CAVIAR), a preliminary version of which was proposed in [13]. The CAVIAR simulation integrates three subsystems: the communication system, the AI and ML models, and the virtual world components. In this paper, the problem is based on simulating a communication system immersed in a virtual world created with AirSim [14] and Unreal Engine [15].

More specifically, the goal is to schedule and allocate resources to Unmanned Aerial Vehicles (UAVs), cars and pedestrians, composing a scenario with aerial and terrestrial
978-92-61-33881-7/CFP2168P © ITU 2021 – 141 – Kaleidoscope