3.2.7  Summary of machine learning methods

In this subsection, we give a brief summary of all the algorithms described above.

The MLP performs well when evaluating a probabilistic performance metric. However, it is not particularly better at handling high-dimensional data sets than other methods, due to the large number of parameters that need to be tuned.
                                                               ture importance bar graph plots based on RF and XGBoost
          The  main  strengths  of  SVM  are  its  effects  on  high‑
                                                               modeling are shown in Fig. 9. The features are sorted
          dimensional data and on data sets in which the number
                                                               based on their importance.
          of features is greater than the number of observations. It
                                                               In both RF and XGBoost,     activities/prefixes,
          takes less memory consumption due to the use of support
                                                               sent/current-prefixes,    prefixes/total-entries
          vector and the use of various kernel functions, which are
                                                               and as-path/total-entries show signi icant impor‑
          used in the decision function. However, SVM would over‑
                                                               tance compared to the other features. And these feature
           it the model if the differences between the number of fea‑   importance ranking results seem reasonable: 1) when
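As a brief illustration of this point (our own sketch on synthetic data, not an experiment from this paper), scikit-learn's SVC keeps only the support vectors, and its kernel parameter selects the kernel used in the decision function:

    # Illustrative sketch: SVC stores only the support vectors; the kernel
    # shapes the decision function. Data and parameters are placeholders.
    from sklearn.datasets import make_classification
    from sklearn.svm import SVC

    X, y = make_classification(n_samples=200, n_features=30, random_state=0)
    clf = SVC(kernel="rbf", C=1.0).fit(X, y)  # "linear", "poly", "sigmoid" also work
    print("support vectors kept:", clf.support_vectors_.shape[0], "of", len(X))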
One of the Decision Tree's major strengths is that it is easy to understand and to analyze. Its disadvantage is that it is prone to overfitting and has low generalization performance.
Random Forest, XGBoost, and LightGBM all belong to ensemble methods. Ensemble methods combine the predicted results of multiple base estimators, so the results are improved compared to those of the individual estimators. There are two main kinds of ensemble methods. The first kind, such as Random Forest, considers the results of many independent estimators and combines them by averaging. The second kind, such as LightGBM and XGBoost, combines many weak estimators sequentially to obtain a decisive result for the ensemble.
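To make the distinction concrete, the following sketch (an illustration on synthetic data, not the paper's experiment) instantiates one averaging ensemble and two boosting ensembles through their scikit-learn-style APIs:

    # Averaging vs. sequential (boosting) ensembles, as described above.
    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier
    from xgboost import XGBClassifier
    from lightgbm import LGBMClassifier

    X, y = make_classification(n_samples=500, n_features=20, random_state=0)

    # First kind: many independent trees whose predictions are averaged.
    rf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

    # Second kind: weak trees added sequentially, each correcting the last.
    xgb = XGBClassifier(n_estimators=100, random_state=0).fit(X, y)
    lgbm = LGBMClassifier(n_estimators=100, random_state=0).fit(X, y)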
4.  EXPERIMENTATION AND EVALUATION

In this section, we use the differential data as input features and train the ML model with multiple ML algorithms. After training, we use the evaluation data and different evaluation metrics to evaluate the model's prediction capacity.
Note that there is no way to know the best values for the hyper-parameters in advance, so ideally we would need to try all possible values to find the optimal ones. Doing this manually could take a considerable amount of time and resources, so we use GridSearchCV to automate the tuning of the hyper-parameters, which improves training efficiency.
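As a sketch of this tuning step (the grid below is an illustrative assumption; the paper does not list its grids), GridSearchCV exhaustively evaluates every combination of candidate values with cross-validation and reports the best one:

    # Hedged sketch of hyper-parameter tuning with GridSearchCV.
    from sklearn.datasets import make_classification
    from sklearn.model_selection import GridSearchCV
    from xgboost import XGBClassifier

    X, y = make_classification(n_samples=500, n_features=20, random_state=0)
    param_grid = {                       # candidate values: illustrative only
        "max_depth": [3, 6, 9],
        "learning_rate": [0.01, 0.1, 0.3],
        "n_estimators": [100, 300],
    }
    search = GridSearchCV(XGBClassifier(), param_grid, cv=5, scoring="accuracy")
    search.fit(X, y)                     # tries all 18 combinations with 5-fold CV
    print(search.best_params_, search.best_score_)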
4.1  Feature reduction

Ensembles of decision-tree methods such as XGBoost may not perform well if the input features are noisy, as this can result in overfitting. In the feature vectors we use, the large number of features makes model training inefficient and the training time very long. Some features not only fail to contribute to the model but also increase its complexity. From the machine-learning point of view, the reason for feature reduction is to shorten the model's computational time and to decrease the number of observations needed for a statistically accurate model. In terms of network management, reducing the feature set means that we can reduce the overhead of network monitoring. Reducing the number of features therefore becomes very important.
Both RF and XGBoost have a built-in function that evaluates the importance of features. Feature importance bar graphs based on the RF and XGBoost models are shown in Fig. 9, with the features sorted by their importance.
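Both libraries expose these scores through the same attribute; a minimal sketch of producing a ranking like Fig. 9 follows (the feature names are placeholders, since the real inputs are this paper's telemetry counters):

    # Reading and sorting the built-in feature importances of RF and XGBoost.
    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier
    from xgboost import XGBClassifier

    X, y = make_classification(n_samples=500, n_features=10, random_state=0)
    names = [f"feature_{i}" for i in range(X.shape[1])]  # placeholder names

    for label, model in [("RF", RandomForestClassifier(random_state=0)),
                         ("XGBoost", XGBClassifier(random_state=0))]:
        model.fit(X, y)
        order = np.argsort(model.feature_importances_)[::-1]  # highest first
        print(label, [names[i] for i in order[:4]])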
          tors  we  use,  due  to  the  large  number  of  features,  the
                                                               Fig.11showstheaccuracyandrecall. Asthenumberofin‑
          model training is not ef icient and the training time be‑
However, there are large differences between the RF and XGBoost feature importance rankings. For example, address-family/total-memory has the highest rank in the RF result, while it is not a key feature for XGBoost. Likewise, network-outgoing-bytes plays an important role in XGBoost, while the RF method ranks it much lower than many other features. Some studies have reported that the built-in feature importance ranking of RF is biased and unreliable [24]. Considering that, we use the features ranked by XGBoost instead of the ranking produced by the RF method.

Consider the XGBoost ranking, taking a peer router going down as an instance. When the peer device is down, then according to the default BGP configuration on the experiment routers the link is declared down after 180 seconds. Consequently, the number of next-hops definitely changes, as does the total number of octets received in input packets from the specified address family (including those received in error). These changes not only affect the incoming and outgoing bytes but also cause packet drops and a reduction of the current prefixes. Some differences in the feature importance rankings could be a result of feature dependency. In general, however, the feature rankings obtained in this paper are reasonable and beneficial for future studies.
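The reduction step itself can be sketched as follows (the cut-off k = 10 and the data are illustrative assumptions): keep only the top-ranked XGBoost features, retrain, and compare accuracy and training time, mirroring the trade-off shown in Fig. 10 to Fig. 12:

    # Retraining on the top-k features ranked by XGBoost importance.
    import time
    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.model_selection import train_test_split
    from xgboost import XGBClassifier

    X, y = make_classification(n_samples=2000, n_features=50,
                               n_informative=10, random_state=0)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

    full = XGBClassifier(random_state=0).fit(X_tr, y_tr)
    top = np.argsort(full.feature_importances_)[::-1][:10]  # top-10 indices

    t0 = time.perf_counter()
    reduced = XGBClassifier(random_state=0).fit(X_tr[:, top], y_tr)
    print("accuracy:", reduced.score(X_te[:, top], y_te),
          "train time (s):", round(time.perf_counter() - t0, 2))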
Dropping features with a low score (no contribution or low contribution to the model) does not influence the accuracy (Fig. 10 and Fig. 11) but benefits the model by reducing the training time (Fig. 12). Fig. 10 shows the accuracy and precision using different numbers of features, and Fig. 11 shows the accuracy and recall. As the number of input features changes, the precision for the failures Node Down, Interface Down, BGP Injection, and BGP Hijack remains relatively high, while it is relatively low for Packet Loss and Packet Delay. Recall of the