Proceedings of the 2018 ITU Kaleidoscope

2018 ITU Kaleidoscope Academic Conference




management areas, can improve the resilience of the system. A demonstrator evaluating the presented concepts is introduced in section 7, and in section 8 we present our conclusions and discuss future work.

2 ROBUSTNESS AND RESILIENCY IN MOBILE NETWORKS

Resiliency is the capability of a system to recover to a stable, functioning state after failures or adverse events [3]. It is not the same as robustness. A robust system is designed to withstand foreseen problems or failures, but it may be too rigid to survive and adapt in unforeseen circumstances, which are bound to occur in complex systems. For example, a farmer may protect a crop against fire, flooding and local pests, yet the crop can still be destroyed by a foreign plant virus introduced into the environment. Paradoxically, a very robust system can be more susceptible to failure because of its increased rigidity and complexity [3]. Modern telephone networks are often said to be (together with electric power grids) among the largest and most complex human-made systems, and their distributed nature makes them even harder to manage and predict. Therefore, simple robust design principles (such as redundancy) are not sufficient to ensure the ultra-reliable, highly available network performance required by many critical future use cases, for example remote surgery.

Resilient systems, on the other hand, follow principles that allow them to recover even from completely unforeseen disastrous events. For example, by diversifying the crop, a farmer can ensure that a new plant virus will not be able to wipe out the entire production. Typically, resilient systems follow a number of design principles [3]:

Monitoring and adaptation: Resilient systems must be responsive to change, and for that they need to monitor the system and detect changes early. An automatic anomaly detection system can profile and learn the normal behavior at runtime and detect deviations from it, giving an early warning even in unforeseen circumstances. If connected to a diagnosis function, it can also trigger automatic corrective or mitigating actions. SON self-healing functions based on anomaly detection and diagnosis are discussed in the following sections.

Redundancy, decoupling and modularity: In addition to duplicating capacity for redundancy, resilient systems often have a decoupled and decentralized structure. In 5G RAN, one such approach is RAN multi-connectivity. It is often used to increase throughput, but it can also exploit the inherent macro-diversity of multiple simultaneous connections, increasing the probability that at least one connection is sufficiently strong [6].

Focusing: When changes are detected, resilient systems may focus on the problematic area to respond to a problem or a change. In network management, excess resources can be deployed where unexpected events are detected. Especially in telco cloud deployments, additional fault management functions can be placed in the more problematic areas.

Diverse at the edge, but simple at the core: Resilient systems may be very complex, but they share simple common properties. The Internet, for example, consists of a very large number of very diverse services, some of them extremely complex, but they all communicate and interface through a simple set of shared protocols. In future mobile networks, a common data plane can provide such simplicity at the core. Data providers and consumers may operate in vastly different scopes and on vastly different time scales, yet still communicate with each other following Service-Oriented Architecture (SOA) principles in a cloud-native architecture that uses a common data sharing bus.

3 SELF-HEALING IN MOBILE NETWORKS

The simplest self-healing solutions are rule-based systems, in which predefined automated corrective workflows are triggered when given trigger conditions are fulfilled. Such systems, however, work reliably only for anticipated problems and typically perform poorly in completely unforeseen circumstances. Furthermore, creating and maintaining the rule base is expensive and laborious, and it may even make the system more rigid and thus less resilient.

The rules determining which corrective actions to trigger could instead be learned with machine learning, as a classification problem: each state is classified either as normal or as a degraded state connected to one of the corrective workflows. However, since anomalous states are, by definition, rare, the detection model is learned from a skewed training dataset. It may also fail to recognize new, unforeseen problematic states. A further problem is the availability of such labelled training datasets.

Therefore, self-healing functions are often implemented as a four-stage process: profiling the normal states of the system, detecting deviations from the normal behavior (anomalies), diagnosis, and acting. The advantage of learning the normal behavior is that any deviation from it, even an unforeseen one, can be detected. On the other hand, not all deviations are degradations, so a diagnosis function is required to diagnose the detected anomalies and connect them to possible corrective actions. Additionally, to adapt to trends and seasonal changes in the normal network behavior, for example the evolution of network traffic characteristics, the profiles of the normal states need to be continuously updated.

In Radio Access Networks (RANs), resources are typically scarcer, and it is often not possible to achieve the desired level of resilience simply by overprovisioning resources. The available spectrum, for example, is limited and cannot be extended. Therefore, in addition to methods like multi-connectivity [5], self-healing solutions can be especially important in RAN to achieve the required level of reliability.