[Fig. 2 shows the RouteNet message passing scheme: the initial path state h_p^0 and the initial link state h_l^0 feed a message passing neural network; its output is aggregated per link by a summation and passed to the link state update neural network, while the path state update neural network produces the new path state h_p^{t+1}; a readout neural network maps the final path state to the average per-path delay.]

Fig. 2 – Schematic representation of RouteNet
All these states are combined to a sequence of path states for each path. Therefore, the output of the message passing neural network M is aggregated for each link with a function that is denoted by A. In RouteNet, this is equal to a summation. The path state update function U_p reduces the output that is returned by M to the last state only, which is considered to be the new path state information. This message passing between these two neural networks is repeated T = 8 times. The number of repetitions should be of the order of the average shortest path length [2]. The final path information is then an approximation of the fixed point of this message passing procedure. It is then used to predict the average delay with an additional neural network. Figure 2 gives a simplified overview of this message passing. Rusek et al. [4] give more details about the RouteNet architecture.
Data: path state h_p and link state vector h_l
Result: predicted per-path delay ŷ
for t = 0 to T do
    m^{t+1} = M(h_p^t, h_l^t)
    h_l^{t+1} = U_l(A(m^{t+1}), h_l^t)
    h_p^{t+1} = U_p(m^{t+1})
end
ŷ = R(h_p^T)

Algorithm 1: RouteNet architecture, with message passing neural network M, aggregation function A, link state update U_l, path state update U_p, and readout neural network R
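For illustration, a minimal TensorFlow sketch of this loop is given below. It is a simplified sketch rather than our exact implementation: the GRU cells, the state size, the fixed path length, and names such as predict_delay and link_ids are illustrative assumptions, and the masking needed for variable-length paths is omitted.

import tensorflow as tf

# Illustrative sizes only; not the exact configuration of the model.
STATE_DIM, T = 32, 8

message_rnn = tf.keras.layers.GRUCell(STATE_DIM)  # message passing NN (M)
link_update = tf.keras.layers.GRUCell(STATE_DIM)  # link state update (U_l)
readout = tf.keras.Sequential([                   # readout NN (R)
    tf.keras.layers.Dense(STATE_DIM, activation="relu"),
    tf.keras.layers.Dense(1),
])

def predict_delay(h_path, h_link, link_ids):
    """h_path: (n_paths, D) initial path states; h_link: (n_links, D) initial
    link states; link_ids: (n_paths, path_len) indices of each path's links."""
    n_links = tf.shape(h_link)[0]
    path_len = link_ids.shape[1]
    for _ in range(T):
        states, messages = [h_path], []
        for step in range(path_len):
            # M: feed the next link state on every path into the message RNN.
            out, states = message_rnn(
                tf.gather(h_link, link_ids[:, step]), states)
            messages.append(out)
        h_path = states[0]  # U_p: keep only the last state as new path state
        # A: aggregate the messages per link by summation ...
        agg = tf.math.unsorted_segment_sum(
            tf.concat(messages, axis=0),
            tf.reshape(tf.transpose(link_ids), [-1]),
            num_segments=n_links)
        # ... and U_l: update each link state with its aggregated messages.
        _, new_states = link_update(agg, [h_link])
        h_link = new_states[0]
    return readout(h_path)  # predicted average per-path delay ŷ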
4. PROPOSED SOLUTION

Our proposed solution is a modification of RouteNet [3], which is based on message passing and graph neural networks. Instead of just providing the final architecture, we give an overview of all changes we applied to the original RouteNet model and provide intermediate results for the delay predictions. That way, it is possible to see and evaluate the impact that different changes had on the results. All variants have been repeated 5 times to also assess the stability and variability of each model. Note that this number of 5 replications is arbitrary and no sample size calculation was done to compare different variants with each other given a pre-specified power for the statistical analysis. We use 600 000 training steps for each run and an exponential decay of the learning rate after every 60 000 steps. That means the initial learning rate of 0.001 is multiplied by the factor 0.6 after every 60 000 training steps. Regularization is the same as in the RouteNet implementation provided for the challenge [18]; that is, the regularization is set to 0.1 and 0.01 for all neural networks in the first hidden layer and the second hidden layer of the readout neural network. In the following we illustrate the impact of all changes on the mean absolute percentage error.
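This training configuration can be sketched in TensorFlow/Keras as follows; the optimizer choice (Adam), the hidden layer width, and the reading of "regularization" as L2 kernel regularization are assumptions for illustration only.

import tensorflow as tf

# Learning rate 0.001, multiplied by 0.6 after every 60 000 steps
# (staircase-wise exponential decay, as described above).
lr_schedule = tf.keras.optimizers.schedules.ExponentialDecay(
    initial_learning_rate=0.001,
    decay_steps=60_000,
    decay_rate=0.6,
    staircase=True)

# The optimizer choice is an assumption; the schedule matches the text.
optimizer = tf.keras.optimizers.Adam(learning_rate=lr_schedule)

# Readout with regularization 0.1 and 0.01 on its two hidden layers
# (assumed to be L2 kernel regularization; the width 32 is illustrative).
readout = tf.keras.Sequential([
    tf.keras.layers.Dense(32, activation="relu",
                          kernel_regularizer=tf.keras.regularizers.l2(0.1)),
    tf.keras.layers.Dense(32, activation="relu",
                          kernel_regularizer=tf.keras.regularizers.l2(0.01)),
    tf.keras.layers.Dense(1)])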
4.1 Baseline

The task was to minimize the mean absolute percentage error of per-path delays. Hence, we decided to change the loss function in the original implementation from Mean Squared Error (MSE) to Mean Absolute Percentage Error (MAPE) to use the same metric for training and evaluation. We compare this first change with the baseline code where the optimization is done with respect to the mean squared error. The results are displayed in Table 1 as Step 0 and Step 1. They show that without any modifications the model does not perform well, as the average error over 5 runs is over 200%. This is not surprising, as the original RouteNet model was developed for networks with a different scheduling policy. Using the mean absolute percentage error as the target function improved the model significantly. The grand mean of all results was about 46% (with a 95% Confidence Interval (CI) of [26.5%, 66.29%]). This improvement was expected, as the results were evaluated by the mean absolute percentage error and the training was done with the same target function.
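In code, this change amounts to swapping the loss object. The sketch below uses the standard Keras losses; the explicit mape_loss definition and its eps guard are illustrative details, not our exact implementation.

import tensorflow as tf

# Step 0 (baseline): optimize the mean squared error.
mse_loss = tf.keras.losses.MeanSquaredError()

# Step 1: optimize the evaluation metric itself.
mape_loss_keras = tf.keras.losses.MeanAbsolutePercentageError()

# Equivalent explicit definition of the objective, in percent; the small
# eps only guards against division by zero.
def mape_loss(y_true, y_pred, eps=1e-8):
    return 100.0 * tf.reduce_mean(tf.abs((y_true - y_pred) / (y_true + eps)))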
4.2 Normalization

For neural networks, it is common and advised to standardize the input variables [19, 20]. Therefore, all variables were shifted into [0, 1] such that they are on the same scale. No centering was applied. This modification significantly improved the results given in Table 1. The grand mean is about 23% (95% CI [23.7%, 23.74%]). This confirms what is already known in the literature: normalizing or standardizing input variables is crucial and should be done, not only to improve prediction but also to improve the stability of training the model, which is reflected in a small standard deviation of those 5 runs.
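A sketch of one common way to achieve this, min-max scaling, is given below; the text above only specifies a shift into [0, 1] without centering, so the exact transformation and all names here are assumptions. The key points are the [0, 1] range and fitting the ranges on the training data only.

import numpy as np

def minmax_scale(x, x_min, x_max):
    # Shift features into [0, 1] without centering on the mean.
    return (x - x_min) / (x_max - x_min)

# Fit the ranges on the training set and reuse them for validation/test data.
train = np.array([[10.0, 0.2], [40.0, 0.8], [25.0, 0.5]])
x_min, x_max = train.min(axis=0), train.max(axis=0)
train_scaled = minmax_scale(train, x_min, x_max)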
4.3 Adding variables

In Step 3 we added all variables that are provided in the data set from the challenge to either the path state information h_p or the link state information h_l. When referring to such variables, we will provide the names of the variables as named in the data sets in parentheses to make cross-referencing the source code easier. As the state dimension is still greater than the number of variables, all unused components of h_p and h_l are again initialized with 0. To be precise, we added the link capacity (bandwidth), the scheduling policy (schedulingPolicy) and the weights for scheduling as link information. As there are three different ToS,