ITU Journal on Future and Evolving Technologies, Volume 2 (2021), Issue 4 – AI and machine learning solutions in 5G and future networks


NETXPLAIN: REAL-TIME EXPLAINABILITY OF GRAPH NEURAL NETWORKS APPLIED TO NETWORKING

David Pujol-Perich¹, José Suárez-Varela¹, Shihan Xiao², Bo Wu², Albert Cabellos-Aparicio¹, Pere Barlet-Ros¹

¹Barcelona Neural Networking Center, Universitat Politècnica de Catalunya, ²Network Technology Lab., Huawei Technologies Co., Ltd.

NOTE: Corresponding author: David Pujol-Perich, david.pujol.perich@upc.edu

Abstract – Recent advancements in Deep Learning (DL) have revolutionized the way we can efficiently tackle complex optimization problems. However, existing DL-based solutions are often considered black boxes with high inner complexity. As a result, there is still some skepticism in the networking industry about their practical viability to operate data networks. In this context, explainability techniques have recently emerged to unveil why DL models make each decision. This paper focuses on the explainability of Graph Neural Networks (GNNs) applied to networking. GNNs are a novel DL family with unique properties to generalize over graphs. As a result, they have shown unprecedented performance in solving complex network optimization problems. This paper presents NetXplain, a novel real-time explainability solution that uses a GNN to interpret the output produced by another GNN. In the evaluation, we apply the proposed explainability method to RouteNet, a GNN model that predicts end-to-end QoS metrics in networks. We show that NetXplain operates more than 3 orders of magnitude faster than state-of-the-art explainability solutions when applied to networks of up to 24 nodes, which makes it compatible with real-time applications, while demonstrating strong capabilities to generalize to network scenarios not seen during training.

Keywords – AI/ML for networks, explainability, graph neural networks

1. INTRODUCTION

In recent years, Deep Learning (DL) has revolutionized the way we are able to solve a vast number of problems by finding meaningful patterns in large amounts of data. This acquired knowledge then enables us to make highly accurate predictions, systematically outperforming state-of-the-art solutions in many different problems [1, 2]. However, in the field of networking, DL-based techniques still face an important technological barrier to market adoption. In general, Machine Learning (ML) solutions provide probabilistic performance guarantees, which typically degrade as the data deviates from the distribution observed during training. Moreover, neural networks have very complex internal architectures, often with thousands or even millions of parameters not interpretable by humans. As a result, they are treated as black boxes [3]. This limits the viability of applying these solutions to networks, as these are critical infrastructures where it is essential to deploy fully reliable solutions. Otherwise, a potential misconfiguration could lead to temporary service disruptions with serious economic damages for network operators.

In this vein, we need mechanisms that can delimit the safe operational ranges of DL models. This makes it fundamental to understand why and in what situations a DL-based solution can fail. This can be achieved by producing human-readable interpretations of the decisions made by these models (e.g., interpreting a routing decision given a traffic matrix and a network topology). This would not only enable us to achieve more mature and reliable DL solutions but also to enhance their performance by making ad-hoc adjustments for a particular network scenario (e.g., hyper-parameter tuning).

In this context, explainability solutions [4] have recently emerged as practical tools to systematically interpret the decisions produced by DL models. In particular, these recently proposed solutions analyze trained DL models from a black-box perspective (i.e., they only analyze their inputs and outputs) and aim to discover which elements mainly drive the output produced by these models. As a result, they can determine which input elements are most critical to reaching the final decisions. These kinds of techniques have been intensely examined in the field of computer vision, showing promising results [5].

At the same time, the last few years have seen the explosion of Graph Neural Networks (GNNs) [6], a new neural network family that has attracted large interest given its numerous applications to different fields where the information is fundamentally represented as graphs (e.g., chemistry [7], physics [8], biology [9], information science [10, 11]). This newly introduced mechanism has proven, to date, to be the only DL technique capable of generalizing with high accuracy to graphs of different sizes and structures not seen during the training phase.

In this context, GNNs have shown good properties to be applied in the field of computer networks, as many key components in network control and management problems are fundamentally represented as graphs (e.g., topology, routing). Indeed, we have already witnessed some successful GNN-based applications to network modeling and optimization [12, 13, 14, 15]. However, the fact that we are not able to understand the inner architecture of GNNs is currently a major barrier that may hinder their adoption in real-world networks.

© International Telecommunication Union, 2021