Page 53 - Proceedings of the 2018 ITU Kaleidoscope

P. 53

Machine learning for a 5G future

OSS contexts labeling
counters
augmented
learning

KPIs trend pattern root cause
profiles changes clusters analysis

anomaly anomaly anomaly
event
values patterns
detection

Figure 1 – Anomaly detection and diagnosis

Figure 1 shows an overview of our anomaly detection and • The implementation architecture, e.g. distributed or
diagnosis function for Radio Access Networks (RANs). centralized
Profiling, detection and diagnosis are done per selected • Is any labelled data available or is the learning fully
contexts, for example per cell and distinguishing between unsupervised, i.e. based on the assumption that
workdays and weekends. The intended deployment resides common or average network states are normal
on NM-level and analyzes Performance Management (PM) • The scope of the profiling: e.g. individual network
data collected from a Network Management System (NMS). element, subsets of similar network elements or one
The collected Key Performance Indicators (KPIs) are baseline for all network elements
typically aggregated with minutely or hourly granularity. • The profiled features and their distributions
Note that the concept allows also other deployment options. • Is the whole feature set considered as one high
dimensional distribution or only subsets of them?
Once the profiles are created, an anomaly level is calculated • Should the profiles be understandable and
for each KPI in each cell against the profiles for the collected intuitively interpretable
time series samples. Based on the anomaly levels, distinct
anomaly events are detected. An anomaly event only For the RAN anomaly detection algorithm that is based on
indicates that something unusual has occurred, but not [7], we decided that profiling would be on individual cell-
necessarily a network performance degradation or other level and based on sub-sets of features. These choices made
event that would require corrective actions. Therefore, the the algorithm applicable for distributed implementation and
detected anomaly events are analyzed by a diagnosis computationally less demanding. The consideration of sub-
function, which connects the detected anomalies to the most sets of features for a profile also made the results and the
like root cause(s). Once the causes of the anomaly are known, profiles more interpretable. Additionally, individual profiles
they may be connected to corrective workflows. In the next were created for work days and weekends. This means that
two sections we look closer at the two main steps of this we create two profiles for each cell.
process, the anomaly detection and the diagnosis, for RAN
self-healing.
The algorithm uses two kinds of profiles: the diurnal profile
considers the daily seasonality of the KPIs, while the cross-
4 ANOMALY DETECTION IN RADIO ACCESS correlational profile captures the correlation relationship of
NETWORKS KPIs. The underlying algorithm is the same for the two types
of profiles, the difference is the input data for the algorithms.
The basis on which an anomaly detection algorithm marks The diurnal profiling considers one KPI for one profile and
an event anomalous is subject to learning. The model that for each hour of the day captures the distribution with
captures the learned normal behavior is called a profile. statistical models, whereas the cross-correlational profile
Depending on the anomaly detection algorithm profiles may considers two or more KPIs together and captures the joint
be of various kinds: in [7] the profiles are statistical models distribution of the KPIs.
of normal distributions with fitted set of parameters, whereas
in [8] the profile consists of cluster centroids in an encoded The Figure 2 illustrates the two types of profiles for a cell:
feature space. The choice of profiling algorithm is dependent the diurnal pattern caused by the human life-cycle is
on application specific design choices that need to be
considered, some of these considerations are:

– 37 –

48 49 50 51 52 53 54 55 56 57 58