Fig. 2 – Schematic representation of RouteNet. (The figure shows the initial path state h_p^0 and the initial link state h_l^0 entering a message-passing loop between a path state update neural network and a link state update neural network; the per-path messages are aggregated per link by a summation, and a readout neural network maps the final path state to the average per-path delay.)
All these states are combined to a sequence of path states for each path. Therefore, the output from the path state update network, denoted RNN in Algorithm 1, is aggregated for each link with a function that is denoted by A. In RouteNet, this A is equal to a summation. The function Q reduces the output that is returned by RNN to the last state only, which is considered to be the new path state information. This message passing between the two neural networks is repeated T = 8 times. The number of repetitions should be of the order of the average shortest path length [2]. The final path information is then an approximation of the fixed point of this message-passing procedure. It is then used to predict the average delay with an additional neural network. Figure 2 gives a simplified overview of this message passing. Rusek et al. [4] give more details about the RouteNet architecture.
Data: path state h_p and link state vector h_l
Result: predicted per-path delay ŷ_p
for t = 0 to T do
    m_p^{t+1} = RNN(h_p^t, h_l^t)
    h_l^{t+1} = U(A(m_p^{t+1}), h_l^t)
    h_p^{t+1} = Q(m_p^{t+1})
end
ŷ_p = F(h_p^T)

Algorithm 1: RouteNet architecture
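As an illustration, the message-passing loop of Algorithm 1 could be sketched in TensorFlow as follows. The GRU cells, the state dimension and the readout layer widths are our assumptions for the sketch and are not taken from the challenge implementation; only the loop structure mirrors the algorithm: an RNN pass over the links of each path, a per-link summation of the resulting messages, and a readout on the final path states.

import tensorflow as tf

T = 8          # message-passing iterations, as in the text
DIM = 32       # state dimension (illustrative; not taken from the paper)

path_cell = tf.keras.layers.GRUCell(DIM)    # path state update network
link_cell = tf.keras.layers.GRUCell(DIM)    # link state update network
readout = tf.keras.Sequential([             # readout network F
    tf.keras.layers.Dense(8, activation="relu"),
    tf.keras.layers.Dense(1),               # average per-path delay
])

def message_passing_step(h_path, h_link, paths):
    """One iteration of Algorithm 1.
    h_path: list of [DIM] path states, h_link: list of [DIM] link states,
    paths[p]: list of link indices traversed by path p."""
    inbox = [[] for _ in h_link]                 # messages collected per link
    new_h_path = []
    for p, links in enumerate(paths):
        state = tf.reshape(h_path[p], (1, DIM))
        for l in links:
            # RNN over the links of the path: produces m_p^{t+1}
            out, [state] = path_cell(tf.reshape(h_link[l], (1, DIM)), [state])
            inbox[l].append(out)
        new_h_path.append(tf.reshape(state, (DIM,)))   # Q: keep last state only
    new_h_link = []
    for l, messages in enumerate(inbox):
        agg = tf.add_n(messages) if messages else tf.zeros((1, DIM))  # A: summation
        _, [state] = link_cell(agg, [tf.reshape(h_link[l], (1, DIM))])
        new_h_link.append(tf.reshape(state, (DIM,)))
    return new_h_path, new_h_link

def predict_delay(h_path, h_link, paths):
    for _ in range(T):                        # repeat message passing T times
        h_path, h_link = message_passing_step(h_path, h_link, paths)
    return readout(tf.stack(h_path))          # ŷ_p = F(h_p^T)

# Example: two paths over three links; input features would occupy the
# first components of the otherwise zero-initialized state vectors.
paths = [[0, 1], [1, 2]]
h_path = [tf.zeros(DIM) for _ in paths]
h_link = [tf.zeros(DIM) for _ in range(3)]
print(predict_delay(h_path, h_link, paths).shape)   # (2, 1)

An actual implementation would vectorize these loops (e.g., with gather and segment-sum operations); the explicit loops here only make the correspondence with Algorithm 1 visible.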
4.  PROPOSED SOLUTION

Our proposed solution is a modification of RouteNet [3], which is based on message passing and graph neural networks. Instead of just providing the final architecture, we give an overview of all changes we applied to the original RouteNet model and provide intermediate results for the delay predictions. That way, it is possible to see and evaluate the impact that the different changes had on the results. All variants have been repeated 5 times to also assess the stability and variability of each model. Note that this number of 5 replications is arbitrary; no sample size calculation was done to compare the different variants with each other at a pre-specified power for the statistical analysis. We use 600 000 training steps for each run and an exponential decay after every 60 000 steps. That means the learning rate of 0.001 is multiplied by the factor 0.6 after every 60 000 training steps. Regularization is the same as in the RouteNet implementation provided for the challenge [18]; that is, the L2 regularization is set to 0.1 and 0.01 for the first and the second hidden layer of the readout neural network, respectively. In the following we illustrate the impact of all changes on the mean absolute percentage error.
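This training setup can be sketched as follows; the optimizer choice (Adam) and the layer widths are assumptions of ours, while the learning-rate schedule and the regularization factors are those stated above.

import tensorflow as tf

# Learning rate 0.001, multiplied by 0.6 after every 60 000 steps.
lr_schedule = tf.keras.optimizers.schedules.ExponentialDecay(
    initial_learning_rate=0.001,
    decay_steps=60_000,
    decay_rate=0.6,
    staircase=True,      # step-wise decay, not continuous
)
optimizer = tf.keras.optimizers.Adam(learning_rate=lr_schedule)  # optimizer assumed

# L2 regularization of 0.1 (first hidden layer) and 0.01 (second hidden
# layer) on the readout network; the layer widths are illustrative.
readout = tf.keras.Sequential([
    tf.keras.layers.Dense(8, activation="relu",
                          kernel_regularizer=tf.keras.regularizers.l2(0.1)),
    tf.keras.layers.Dense(8, activation="relu",
                          kernel_regularizer=tf.keras.regularizers.l2(0.01)),
    tf.keras.layers.Dense(1),
])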
4.1 Baseline

The task was to minimize the mean absolute percentage error of per-path delays. Hence, we decided to change the loss function in the original implementation from Mean Squared Error (MSE) to Mean Absolute Percentage Error (MAPE) in order to use the same metric for training and evaluation. We compare this first change with the baseline code, where the optimization is done with respect to the mean squared error. The results are displayed in Table 1 as Step 0 and Step 1. Table 1 shows that without any modifications the model does not perform well, as the average error over 5 runs is over 200%. This is not surprising, as the original RouteNet model was developed for networks with a different scheduling policy. Using the mean absolute percentage error as the target function improved the model significantly. The grand mean of all results was about 46% (with a 95% Confidence Interval (CI) of [26.5%, 66.29%]). This improvement was expected, as the results were evaluated by the mean absolute percentage error and the training was done with the same target function.
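Both loss functions ship with Keras, so the change from Step 0 to Step 1 amounts to swapping the loss object; a minimal sketch with made-up delay values:

import tensorflow as tf

mse = tf.keras.losses.MeanSquaredError()               # baseline loss (Step 0)
mape = tf.keras.losses.MeanAbsolutePercentageError()   # new loss (Step 1)

y_true = tf.constant([[1.0], [2.0], [4.0]])   # toy per-path delays
y_pred = tf.constant([[1.1], [1.8], [5.0]])
print(float(mse(y_true, y_pred)))    # 0.35
print(float(mape(y_true, y_pred)))   # ≈ 15.0, i.e. 100 * mean(|err| / |true|)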
4.2 Normalization

For neural networks, it is common and advised to standardize the input variables [19, 20]. Therefore, all variables were shifted into [0, 1] such that they are on the same scale; no centering was applied. This modification significantly improved the results given in Table 1. The grand mean is about 23% (95% CI [23.7%, 23.74%]). This confirms what is already known in the literature: normalizing or standardizing input variables is crucial, not only to improve prediction but also to improve the stability of model training, which is reflected in the small standard deviation of those 5 runs.
                                                                            
                                                                                   
4.3 Adding variables

In Step 3 we added all variables that are provided in the data set from the challenge to either the path state information h_p or the link state information h_l. When referring to such variables, we provide the names of the variables as named in the data sets in parentheses to make cross-referencing the source code easier. As the dimension is still greater than the number of variables, all unused components of h_p and h_l are again initialized with 0. To be precise, we added the link capacity (bandwidth), the scheduling policy (schedulingPolicy) and weights for scheduling as link information. As there are three different ToS,

