Page 103 - Proceedings of the 2018 ITU Kaleidoscope
Machine learning for a 5G future
3.3 Proposed Algorithm

In this section, we describe the proposed deep RL approach shown in Algorithm 1. Line 1 is dedicated to the set-up of the replay memory E, a data structure containing the experiences made by the agent. It plays a very important role, since it stores all the data necessary for the DNN training. At the beginning, the weights of the two neural networks are set to the same values (lines 2-3); the other parameters are then set in lines 4-9. At each for-loop iteration, the agent observes the current state (line 11) and selects the action to perform depending on the exploration rate, which establishes whether the action is chosen randomly (line 12) from the action set previously defined in Section 3.2 or is returned by the main DNN (line 16). Then, after taking the action, the agent waits for x seconds (line 19) and observes again the state reached by the environment and the corresponding reward (lines 20-21). At this point the algorithm stores the experience made by the agent inside E (line 22). The core of the code is in lines 24-27. First of all, the main Q-network predicts the Q-values for the given state s_j (line 24); in particular, y is an array with a number of elements equal to the number of possible actions that can be executed. Then, target Q-values are evaluated through the target DNN (line 25) and used in the Bellman update formula (line 26); to be more specific, only the Q-value related to the action sampled from the batch is updated, leaving the other values untouched. After the update, the main DNN is trained by executing one training step on the cost function (line 27) and, if U steps have been executed, the target DNN weights are set equal to those of the main DNN (line 28).

4. A SIMULATION ENVIRONMENT

Designing and testing deep RL algorithms requires the presence of an operative environment whose state can be sensed and in which actions can be performed, receiving the corresponding rewards.

Figure 2 – Deep RL environment.

The general schema of a deep RL environment is depicted in Fig. 2 and consists of a deep RL engine, where the algorithm (composed of the two DNNs) runs, and an environment, which in our case is an LTE MEC network with different MEC servers, where different mobile users move while accessing some service. Due to the complexity of the LTE MEC environment and to the high number of actions that have to be tried by the deep RL algorithm before learning a good policy, we decided to simulate such an environment by using OMNeT++ [11]. The latter is a well-known event-driven simulator, written in C++, which allows us to model the network at a system level and with a good level of detail, while scaling efficiently with the number of simulated nodes. Moreover, thanks to its large community of researchers and developers, OMNeT++ comes with a large set of pre-made and tested frameworks to simulate various portions of the network.

To model our MEC-enabled scenario, we used and integrated two of these simulation frameworks: SimuLTE [12] and INET¹. The former models the two main aspects of an LTE network, i.e., the 4G-based radio-access network and the EPC. The latter, instead, implements all the relevant TCP/IP protocols and layers, as well as application and mobility models. In Fig. 3 we represent the high-level architecture and layering of the main communicating nodes, where grayed elements are from SimuLTE, whereas white ones are from INET. The UE and the eNB nodes are provided with an LTE NIC, which provides wireless connectivity through the radio-access network and implements a model of the LTE protocol stack, i.e., the PHY, MAC, RLC and PDCP layers. Each UE can be configured with multiple TCP- or UDP-based applications, which can communicate both with the internet and with the MEC server(s). Moreover, the UE includes a module to model the mobility of the node itself. In the scenarios simulated within this paper, we used a waypoint-based mobility model, wherein UEs move linearly and at constant speed between randomly generated waypoints. The eNB is connected with the UE through the LTE NIC on one side, and with the EPC through the GTP layer on the other. The EPC also has a model of the PGW (not shown in Fig. 3) and of the MEC server, which includes a complete TCP/IP stack. EPC nodes can be configured to have various L1/L2 types, e.g., based on PPP, Ethernet, etc. Moreover, the parameters of the physical connections among nodes, such as bandwidth, delay, etc., can be configured to model various degrees of congestion within the network.

Figure 3 – High-level view of the simulator's architecture and layering. Grayed elements are from SimuLTE, white ones are from INET.

With respect to the deep RL engine, we used Keras [13], an open-source library written in Python which runs on top of

¹ Available at "https://inet.omnetpp.org/", last accessed Jul 2018.
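As a minimal illustrative sketch of the update step described in Section 3.3 (not the authors' implementation: the function names and the simplified interface are assumed here), the Bellman target is applied only to the Q-value of the action sampled from the batch, leaving the others untouched:

```python
def dqn_training_targets(q_main, q_target, batch, gamma=0.99):
    """Build the regression targets y for one DQN training step.

    q_main and q_target are stand-ins for the two DNNs: callables
    mapping a state to a list of Q-values, one per action. batch is a
    list of (s, a, r, s_next) experiences sampled from the replay
    memory E. All names here are illustrative, not from the paper.
    """
    targets = []
    for s, a, r, s_next in batch:
        y = list(q_main(s))                # main DNN predicts Q-values for s_j (line 24)
        best_next = max(q_target(s_next))  # target DNN evaluates the next state (line 25)
        y[a] = r + gamma * best_next       # Bellman update on the sampled action only (line 26)
        targets.append(y)                  # other Q-values are left untouched
    return targets
```

The main DNN would then take one gradient step towards these targets on the cost function (line 27), and every U steps the target-network weights would be overwritten with those of the main network (line 28).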
– 87 –
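The waypoint-based mobility model used in the simulated scenarios (linear movement at constant speed between randomly generated waypoints) can be sketched as follows; this is a self-contained approximation, not the INET/SimuLTE module, and the class and parameter names are ours:

```python
import math
import random

class RandomWaypointMobility:
    """Move a node linearly, at constant speed, between random waypoints.

    A simplified, self-contained sketch of the mobility model described
    in the text; not the actual INET/SimuLTE implementation.
    """

    def __init__(self, area=(1000.0, 1000.0), speed=1.5, seed=None):
        self.area = area              # rectangular playground, in metres
        self.speed = speed            # constant speed, metres per second
        self.rng = random.Random(seed)
        self.pos = self._random_point()
        self.target = self._random_point()

    def _random_point(self):
        return (self.rng.uniform(0.0, self.area[0]),
                self.rng.uniform(0.0, self.area[1]))

    def step(self, dt):
        """Advance the node by dt seconds towards the current waypoint."""
        dx = self.target[0] - self.pos[0]
        dy = self.target[1] - self.pos[1]
        dist = math.hypot(dx, dy)
        travel = self.speed * dt
        if travel >= dist:            # waypoint reached: draw a new one
            self.pos = self.target
            self.target = self._random_point()
        else:                         # linear motion at constant speed
            self.pos = (self.pos[0] + travel * dx / dist,
                        self.pos[1] + travel * dy / dist)
        return self.pos
```

Calling step() once per simulated time-step yields the piecewise-linear, constant-speed trajectories described above.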