Page 103 - Proceedings of the 2018 ITU Kaleidoscope

Machine learning for a 5G future




3.3  Proposed Algorithm

In this section, we describe the proposed deep RL approach shown in Algorithm 1. Line 1 sets up the replay memory E, a data structure containing the experiences made by the agent. It plays a very important role since it stores all the data necessary for the DNN training. At the beginning, the weights of the two neural networks are set to the same values (lines 2-3), then other parameters are set in lines 4-9. At each for-loop iteration, the agent observes the current state (line 11) and selects the action to perform depending on the exploration rate, which establishes whether the action is chosen randomly (line 12) from the action set defined in Section 3.2 or is returned by the main DNN (line 16). Then, after taking the action, the agent waits for x seconds (line 19) and observes again the state reached by the environment and the corresponding reward (lines 20-21). At this point the algorithm stores the experience made by the agent inside E (line 22). The core of the code is in lines 24-27. First of all, the main Q network predicts the Q-values for the given state s_j (line 24). In particular, y is an array with a number of elements equal to the number of possible actions that can be executed. Then, target Q-values are evaluated through the target DNN (line 25) and used in the Bellman update formula (line 26); to be more specific, only the Q-value related to the action sampled from the batch is updated, leaving the other values untouched. After the update, the main DNN is trained by executing one training step on the cost function (line 27) and, if U steps have been executed, the target DNN weights are set equal to those of the main DNN (line 28).

4. A SIMULATION ENVIRONMENT

Designing and testing deep RL algorithms requires an operative environment where the status can be sensed and actions can be performed, receiving the corresponding rewards.

Figure 2 – Deep RL environment.

The general schema of a deep RL environment is depicted in Fig. 2 and consists of a deep RL engine, where the algorithm (composed of the two DNNs) is running, and an environment that in our case is composed of an LTE MEC with different MEC servers where different mobile users move while accessing some service. Due to the complexity of the LTE MEC environment and to the high number of actions that have to be tried by the deep RL algorithm before learning a good policy, we decided to simulate such an environment by using OMNeT++ [11]. The latter is a well-known event-driven simulator, written in C++, which allows us to model the network at a system level and with a good level of detail, while scaling efficiently with the number of simulated nodes. Moreover, thanks to its large community of researchers and developers, OMNeT++ comes with a large set of pre-made and tested frameworks to simulate various portions of the network.

To model our MEC-enabled scenario, we used and integrated two of these simulation frameworks: SimuLTE [12] and INET¹. The former models two main aspects of an LTE network, i.e., the 4G-based radio-access network and the EPC. The latter, instead, implements all the relevant TCP/IP protocols and layers, application and mobility models. In Fig. 3 we represent the high-level architecture and layering of the main communicating nodes, where grayed elements are from SimuLTE, whereas white ones are from INET. The UE and the eNB nodes are provided with an LTE NIC, which provides wireless connectivity through the radio-access network and implements a model of the LTE protocol stack, i.e., the PHY, MAC, RLC and PDCP layers. Each UE can be configured with multiple TCP- or UDP-based applications, which can communicate with the internet or with the MEC server(s). Moreover, the UE includes a module to model the mobility of the node itself. In the scenarios simulated within this paper, we used a waypoint-based mobility model, wherein UEs move linearly and at constant speed between randomly generated waypoints. The eNB is connected with the UE through the LTE NIC on one side, and with the EPC through the GTP layer on the other. The EPC also has a model of the PGW (not shown in Fig. 3) and of the MEC server, which includes a complete TCP/IP stack. EPC nodes can be configured to have various L1/L2 types, e.g., based on PPP, Ethernet, etc. Moreover, the parameters of the physical connections among nodes, such as bandwidth, delay, etc., can be configured to model various degrees of congestion within the network.

[Figure 3: protocol stacks of the UE (Application, UDP/TCP, IP, Mobility, LTE NIC), the eNB (LTE NIC, GTP, UDP, IP, L1/L2) and the MEC Server (MEC Application, UDP, IP, L1/L2).]

Figure 3 – High-level view of the simulator’s architecture and layering. Grayed elements are from SimuLTE, white ones are from INET.

With respect to the deep RL engine, we used Keras [13], an open source library written in Python which runs on top of

¹ Available at "https://inet.omnetpp.org/", last accessed Jul 2018
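
The training loop described in Section 3.3 can be sketched in Python as follows. This is only a minimal illustration under stated assumptions, not the paper's Algorithm 1 or its Keras implementation: `LinearQ` (a lookup-table stand-in for the two DNNs), `ToyEnv` (a two-state toy environment in place of the simulated LTE MEC), the linear decay of the exploration rate, and all default parameter values are hypothetical names and choices introduced here for illustration.

```python
import random
import time
from collections import deque


class LinearQ:
    """Hypothetical stand-in for a DNN: Q(s, a) = w[a][s] for an integer state s."""

    def __init__(self, n_states, n_actions):
        self.w = [[0.0] * n_states for _ in range(n_actions)]

    def predict(self, s):
        # Returns a fresh list with one Q-value per action.
        return [row[s] for row in self.w]

    def train_step(self, s, y, lr):
        # One gradient step on the squared error between predict(s) and y.
        for a, row in enumerate(self.w):
            row[s] += lr * (y[a] - row[s])

    def get_weights(self):
        return [row[:] for row in self.w]

    def set_weights(self, w):
        self.w = [row[:] for row in w]


class ToyEnv:
    """Hypothetical 2-state environment: reward 1 when the action equals the state."""

    def __init__(self, rng):
        self.rng = rng
        self.state = 0
        self.last_reward = 0.0

    def observe(self):
        return self.state

    def act(self, a):
        self.last_reward = 1.0 if a == self.state else 0.0
        self.state = self.rng.randrange(2)  # next state is drawn at random


def train(steps=3000, gamma=0.5, lr=0.1, batch=8, U=50, x=0.0, seed=0):
    rng = random.Random(seed)
    env = ToyEnv(rng)
    E = deque(maxlen=1000)                   # replay memory E
    main, target = LinearQ(2, 2), LinearQ(2, 2)
    target.set_weights(main.get_weights())   # both networks start identical
    for t in range(steps):
        eps = max(0.1, 1.0 - t / 1000)       # exploration rate (assumed linear decay)
        s = env.observe()                    # observe the current state
        if rng.random() < eps:
            a = rng.randrange(2)             # random action
        else:
            q = main.predict(s)
            a = q.index(max(q))              # action returned by the main network
        env.act(a)
        if x > 0:
            time.sleep(x)                    # wait x seconds before observing
        s_next, r = env.observe(), env.last_reward
        E.append((s, a, r, s_next))          # store the experience in E
        for s_j, a_j, r_j, s_j1 in rng.sample(list(E), min(batch, len(E))):
            y = main.predict(s_j)                    # Q-values from the main network
            q_next = target.predict(s_j1)            # Q-values from the target network
            y[a_j] = r_j + gamma * max(q_next)       # Bellman update, sampled action only
            main.train_step(s_j, y, lr)              # one training step
        if (t + 1) % U == 0:
            target.set_weights(main.get_weights())   # sync target network every U steps
    return main
```

Calling `train()` returns the trained main network; in this toy setting `predict(s)` should rank the action equal to `s` highest. Because only `y[a_j]` is overwritten before the training step, the error on the other actions is zero and their values are left untouched, which mirrors the behaviour described for line 26.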



