Page 207 - Kaleidoscope Academic Conference Proceedings 2021

Connecting physical and virtual worlds




of the scheduled user are transmitted, while the buffers of the other two users are empty.

4.2  Possible inputs to RL agents

The inputs (also known as states or observations) can be selected both from information provided in CSV files (position (x, y, z), velocities, etc.) and from information obtained from the environment, such as the buffer state and the channel information for a previously chosen beam index.

More specifically, the RL agent can use: the UE's geographic position in X, Y, Z, with the origin of the coordinate system placed at the BS; the UE's orientation in the three rotation coordinates (the front and side roll angles, as well as the rotation around the UE's own axis); the numbers of dropped, transmitted and buffered packets; and, finally, the bit rate and the channel magnitude at each step of the simulation.

Note that we assume the BS (more specifically, the RL agent) does not know the best index î. In practice, obtaining it would require a full beam sweep, which is assumed to be infeasible in our model due to stringent time requirements. Similarly, given that the RL agent chose user u and beam index j at time t, it learns only the magnitude |y_j| and the spectral efficiency S_{u,t,j} for this specific user and beam index at time t.

The channel throughput T_{u,t,j} = S_{u,t,j} BW is obtained by multiplying the spectral efficiency by the bandwidth BW and indicates the maximum number of bits that can be transmitted. An empirical factor is used to adjust T_{u,t,j} in order to define the network load, such that, for the given input traffic, some packets have to be dropped.

Algorithm 1 summarizes the steps for executing an already trained RL agent.

 Algorithm 1: High-level algorithm of the RL-based scheduling and beam selection problem.

  Initialization for a given episode e;
  while t_e ≤ N_s do
     1) Based on the number of bits in the buffers of the users and other input information, the RL agent schedules user u and selects beam index i;
     2) Environment calculates the combined channel magnitude |y_i| and the corresponding throughput T_i;
     3) The number of transmitted bits is R_i = min(T_i, b_u);
     4) Update buffers;
     5) Receive new packets;
     6) Drop packets, if needed;
     7) Environment calculates rewards and updates its state;
     8) Update buffers again;
  end

4.3  Experiment description

We developed an experiment using CAVIAR for the problem of scheduling and beam selection. Given that a complete episode file contains information about all moving objects in a scene (all pedestrians, cars, etc.), we simplified the data generated by the simulation, assuming that the beam selection RL agent, named B-RL, only uses data from the three served users (uav1, simulation_car1 and simulation_pedestrian1).

Figure 7 – Channel maximum throughput when always using the best beam index î and a simple scheduling strategy that chooses users sequentially (1-2-3-1-2-3, ...), in a round-robin fashion.

The following results are extracted from an Advantage Actor Critic (A2C) agent from Stable-Baselines [18], trained with default parameters. The states of the agent are defined by seven features: X, Y, Z, packets dropped, packets transmitted, buffered packets and bit rate. The action space is composed of a vector with two integers: a numeric identity of the user being allocated at the specific timestamp, in the range [0, 2]; and the codebook index of the beam used to serve it, an integer in the range [0, 63]. Finally, the reward is given by Eq. (5).

Because the RL agent was designed to play the role of a simple example and not to optimize performance, two other agents were developed: B-Dummy and B-BeamOracle. The B-Dummy agent chooses random actions for both the scheduled user and the beam index to use. The B-BeamOracle agent follows a sequential user scheduling pattern (1-2-3-1-2-3, ...) in a round-robin fashion, and always uses the optimum beam index î for the selected user. In Figure 7 we characterize the channel maximum throughput of this experiment when using B-BeamOracle.

          5.  EXPERIMENT RESULTS

The CAVIAR environment was used to generate 70 episodes, from which 50 were used for training the RL agent and 20 for testing. We present results for the three agents: B-Dummy, B-BeamOracle and the RL-based A2C agent.

In Figure 8, it is possible to verify the switching at every 1000
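As a concrete illustration of these inputs, the observation vector for one UE could be assembled as below. This is a minimal sketch: the function and field names (`build_state`, `dropped_packets`, etc.) are hypothetical and the actual CAVIAR CSV column names may differ.

```python
import numpy as np

def build_state(ue):
    """Assemble one UE's observation vector from per-step episode data.

    `ue` is a hypothetical record holding the features named in the text;
    orientation and channel magnitude could be appended the same way.
    """
    return np.array([
        ue["x"], ue["y"], ue["z"],       # position, origin at the BS
        ue["dropped_packets"],
        ue["transmitted_packets"],
        ue["buffered_packets"],
        ue["bit_rate"],
    ], dtype=np.float32)

state = build_state({"x": 10.0, "y": -3.5, "z": 40.0,
                     "dropped_packets": 2, "transmitted_packets": 118,
                     "buffered_packets": 64, "bit_rate": 1.2e8})
print(state.shape)  # (7,)
```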
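The throughput computation above is straightforward; a sketch follows, with `load_factor` as a hypothetical name for the empirical adjustment factor mentioned in the text.

```python
def throughput(spectral_efficiency, bandwidth_hz, load_factor=1.0):
    """Channel throughput T_{u,t,j} = S_{u,t,j} * BW, in bits per second,
    optionally scaled by an empirical factor that sets the network load."""
    return spectral_efficiency * bandwidth_hz * load_factor

# e.g. 4 bit/s/Hz over 100 MHz of bandwidth:
t = throughput(4.0, 100e6)
print(t)  # 400000000.0
```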
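In Gym-based frameworks such as Stable-Baselines, a pair of bounded integers like this is typically modeled as a `gym.spaces.MultiDiscrete([3, 64])` action space. A dependency-free sketch of the same action set, which is also what a random baseline would sample:

```python
import random

N_USERS = 3    # uav1, simulation_car1, simulation_pedestrian1
N_BEAMS = 64   # codebook size

def random_action():
    """One action = (scheduled user in [0, 2], beam index in [0, 63])."""
    return (random.randrange(N_USERS), random.randrange(N_BEAMS))

user, beam = random_action()
assert 0 <= user <= 2 and 0 <= beam <= 63
```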
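The loop of Algorithm 1 can be sketched in Python as below. This is a toy stand-in, not the CAVIAR implementation: the environment class, its method names, and the packet counts are all hypothetical, and the reward computation (steps 7-8) is omitted.

```python
import random

class ToyEnv:
    """Minimal stand-in for the scheduling environment (names hypothetical)."""
    def __init__(self, n_users=3):
        self.buffers = [0] * n_users          # b_u: bits waiting per user
        self.dropped = 0
    def channel(self, user, beam):
        magnitude = random.random()           # |y_i|
        return magnitude, int(1000 * magnitude)  # (|y_i|, T_i in bits/step)
    def receive_new_packets(self):
        for u in range(len(self.buffers)):
            self.buffers[u] += random.randint(0, 500)
    def drop_excess(self, capacity=2000):
        for u in range(len(self.buffers)):
            if self.buffers[u] > capacity:
                self.dropped += self.buffers[u] - capacity
                self.buffers[u] = capacity

def run_episode(agent_act, env, n_steps):
    for t in range(n_steps):
        user, beam = agent_act()                          # step 1
        magnitude, throughput = env.channel(user, beam)   # step 2
        sent = min(throughput, env.buffers[user])         # step 3: R_i = min(T_i, b_u)
        env.buffers[user] -= sent                         # step 4: update buffers
        env.receive_new_packets()                         # step 5
        env.drop_excess()                                 # step 6: drop if needed
    return env

# A random agent (B-Dummy-like) running one 100-step episode:
env = run_episode(lambda: (random.randrange(3), random.randrange(64)),
                  ToyEnv(), 100)
assert all(b >= 0 for b in env.buffers)
```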



