3.2.7  Summary of machine learning methods

In this subsection, we give a brief summary of all the algorithms described above.

The MLP performs well when evaluating a probabilistic performance metric. However, it is not particularly better at handling high-dimensional data sets than other methods, due to the large number of parameters that need to be tuned.
                                                               ture importance bar graph plots based on RF and XGBoost
          The  main  strengths  of  SVM  are  its  effects  on  high‑
                                                               modeling are shown in Fig. 9. The features are sorted
          dimensional data and on data sets in which the number
                                                               based on their importance.
          of features is greater than the number of observations. It
                                                               In both RF and XGBoost,     activities/prefixes,
          takes less memory consumption due to the use of support
                                                               sent/current-prefixes,    prefixes/total-entries
          vector and the use of various kernel functions, which are
                                                               and as-path/total-entries show signi icant impor‑
          used in the decision function. However, SVM would over‑
                                                               tance compared to the other features. And these feature
           it the model if the differences between the number of fea‑   importance ranking results seem reasonable: 1) when
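As a brief illustration of this point (our own sketch on synthetic data, not an experiment from this paper), scikit-learn's SVC keeps only the support vectors, and its kernel parameter selects the kernel used in the decision function:

    # Illustrative sketch: SVC stores only the support vectors; the kernel
    # shapes the decision function. Data and parameters are placeholders.
    from sklearn.datasets import make_classification
    from sklearn.svm import SVC

    X, y = make_classification(n_samples=200, n_features=30, random_state=0)
    clf = SVC(kernel="rbf", C=1.0).fit(X, y)  # "linear", "poly", "sigmoid" also work
    print("support vectors kept:", clf.support_vectors_.shape[0], "of", len(X))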
One of the Decision Tree's major strengths is that it is easy to understand and to analyze. Its disadvantage is that it is prone to overfitting and has low generalization performance.
Random Forest, XGBoost, and LightGBM all belong to ensemble methods. Ensemble methods combine the predicted results of multiple base estimators, so the results are improved compared to those of the individual estimators. There are two main kinds of ensemble methods. The first kind, such as Random Forest, considers the results of many independent estimators and combines them by averaging. The second kind, such as LightGBM and XGBoost, combines many weak estimators sequentially to obtain a decisive result for the ensemble.
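To make the distinction concrete, the following sketch (an illustration on synthetic data, not the paper's experiment) instantiates one averaging ensemble and two boosting ensembles through their scikit-learn-style APIs:

    # Averaging vs. sequential (boosting) ensembles, as described above.
    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier
    from xgboost import XGBClassifier
    from lightgbm import LGBMClassifier

    X, y = make_classification(n_samples=500, n_features=20, random_state=0)

    # First kind: many independent trees whose predictions are averaged.
    rf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

    # Second kind: weak trees added sequentially, each correcting the last.
    xgb = XGBClassifier(n_estimators=100, random_state=0).fit(X, y)
    lgbm = LGBMClassifier(n_estimators=100, random_state=0).fit(X, y)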
4.  EXPERIMENTATION AND EVALUATION

In this section, we use the differential data as input features and train the ML model with multiple ML algorithms. After training, we use the evaluation data and different evaluation metrics to evaluate the model's prediction capacity.
Note that there is no way to know the best values for the hyper-parameters in advance, so ideally we would need to try all possible values to find the optimal ones. Doing this manually could take a considerable amount of time and resources, so we use GridSearchCV to automate the tuning of the hyper-parameters, which improves training efficiency.
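As a sketch of this tuning step (the grid below is an illustrative assumption; the paper does not list its grids), GridSearchCV exhaustively evaluates every combination of candidate values with cross-validation and reports the best one:

    # Hedged sketch of hyper-parameter tuning with GridSearchCV.
    from sklearn.datasets import make_classification
    from sklearn.model_selection import GridSearchCV
    from xgboost import XGBClassifier

    X, y = make_classification(n_samples=500, n_features=20, random_state=0)
    param_grid = {                       # candidate values: illustrative only
        "max_depth": [3, 6, 9],
        "learning_rate": [0.01, 0.1, 0.3],
        "n_estimators": [100, 300],
    }
    search = GridSearchCV(XGBClassifier(), param_grid, cv=5, scoring="accuracy")
    search.fit(X, y)                     # tries all 18 combinations with 5-fold CV
    print(search.best_params_, search.best_score_)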
4.1  Feature reduction

Ensembles of decision-tree methods such as XGBoost may not perform well if the input features are noisy, as this can result in overfitting. In the feature vectors we use, the large number of features makes model training inefficient and the training time very long. Some features not only fail to contribute to the model but also increase its complexity. From the machine-learning point of view, the reason for feature reduction is to shorten the model's computational time and to decrease the number of observations needed for a statistically accurate model. In terms of network management, reducing the feature set means that we can reduce the overhead of network monitoring. Reducing the number of features therefore becomes very important.
Both RF and XGBoost have a built-in function that evaluates the importance of features. Feature importance bar graphs based on the RF and XGBoost models are shown in Fig. 9, with the features sorted by their importance.
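Both libraries expose these scores through the same attribute; a minimal sketch of producing a ranking like Fig. 9 follows (the feature names are placeholders, since the real inputs are this paper's telemetry counters):

    # Reading and sorting the built-in feature importances of RF and XGBoost.
    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier
    from xgboost import XGBClassifier

    X, y = make_classification(n_samples=500, n_features=10, random_state=0)
    names = [f"feature_{i}" for i in range(X.shape[1])]  # placeholder names

    for label, model in [("RF", RandomForestClassifier(random_state=0)),
                         ("XGBoost", XGBClassifier(random_state=0))]:
        model.fit(X, y)
        order = np.argsort(model.feature_importances_)[::-1]  # highest first
        print(label, [names[i] for i in order[:4]])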
          tors  we  use,  due  to  the  large  number  of  features,  the
                                                               Fig.11showstheaccuracyandrecall. Asthenumberofin‑
          model training is not ef icient and the training time be‑
However, there are large differences between the RF and XGBoost feature importance rankings. For example, address-family/total-memory has the highest rank in the RF result, while it is not a key feature for XGBoost. Likewise, network-outgoing-bytes plays an important role in XGBoost, while the RF method ranks it much lower than many other features. Some studies have reported that the built-in feature importance ranking of RF is biased and unreliable [24]. Considering that, we use the features ranked by XGBoost instead of the ranking produced by the RF method.

Consider the XGBoost ranking, taking a peer router going down as an instance. When the peer device is down, then according to the default BGP configuration on the experiment routers the link is declared down after 180 seconds. Consequently, the number of next-hops definitely changes, as does the total number of octets received in input packets from the specified address family (including those received in error). These changes not only affect the incoming and outgoing bytes but also cause packet drops and a reduction of the current prefixes. Some differences in the feature importance rankings could be a result of feature dependency. In general, however, the feature rankings obtained in this paper are reasonable and beneficial for future studies.
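The reduction step itself can be sketched as follows (the cut-off k = 10 and the data are illustrative assumptions): keep only the top-ranked XGBoost features, retrain, and compare accuracy and training time, mirroring the trade-off shown in Fig. 10 to Fig. 12:

    # Retraining on the top-k features ranked by XGBoost importance.
    import time
    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.model_selection import train_test_split
    from xgboost import XGBClassifier

    X, y = make_classification(n_samples=2000, n_features=50,
                               n_informative=10, random_state=0)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

    full = XGBClassifier(random_state=0).fit(X_tr, y_tr)
    top = np.argsort(full.feature_importances_)[::-1][:10]  # top-10 indices

    t0 = time.perf_counter()
    reduced = XGBClassifier(random_state=0).fit(X_tr[:, top], y_tr)
    print("accuracy:", reduced.score(X_te[:, top], y_te),
          "train time (s):", round(time.perf_counter() - t0, 2))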
Dropping features with a low score (no contribution or low contribution to the model) does not influence the accuracy (Fig. 10 and Fig. 11) but benefits the model by reducing the training time (Fig. 12). Fig. 10 shows the accuracy and precision using different numbers of features, and Fig. 11 shows the accuracy and recall. As the number of input features changes, the precision for the failures Node Down, Interface Down, BGP Injection, and BGP Hijack remains relatively high, while it is relatively low for Packet Loss and Packet Delay. Recall of the