Page 93 - Proceedings of the 2018 ITU Kaleidoscope

Machine learning for a 5G future





[Figure 3: "Machine learning" splits into three branches, each with example algorithms:
- Supervised learning: support vector machine, Bayesian learning, regression models, k-nearest neighbor, gradient boosting decision tree
- Unsupervised learning: principal and independent component analysis (PCA/ICA), k-means clustering, spectral clustering, replicator neural network
- Reinforcement learning: Q-learning, Markov decision process, multi-armed bandits]
                                         Figure 3.  Classification of machine learning techniques.

large and too complex [4,9]. Thus, the autonomic adaptation of network functions through interaction with the internal and external environments can be carried out by machine learning techniques.

In this section, we first provide an overview of machine learning techniques and then list the relevant techniques that can be used for the automation of 5G network slicing functions. As shown in Fig. 3, machine learning techniques can be broadly classified into three categories: supervised learning, unsupervised learning, and reinforcement learning [10].

Supervised learning techniques learn (or deduce) a function from training data, which comprise pairs of inputs and desired outputs. The output of the function can be a continuous value (called regression) or a class label of the input values (called classification). After training, the learning agent or element predicts the value of the function for any valid input from unseen situations in a reasonably valid way. Thus, the supervised learning principle can be expressed as follows: given a training set of N examples {x_i, y_i = f(x_i)}, where each y_i was generated by an unknown function f, discover a function h (a hypothesis) such that it approximates the true function f (i.e., h ≈ f). Support vector machines (SVM), Bayesian learning, and regression models are popular supervised learning techniques applicable to solving network problems.

Unsupervised learning is based on unstructured or unlabeled data. The learning agent learns patterns in the input data even though no explicit feedback is provided. Unsupervised learning techniques are used for data clustering, dimensionality reduction, density estimation, etc. K-means clustering, principal component analysis (PCA), and independent component analysis (ICA) are often-used algorithms for clustering and dimensionality reduction of system data collected for network control and management.

Reinforcement learning performs iterative learning through a series of reinforcements by rewards or punishments. It learns to achieve its goal from its own experience. Unlike supervised learning, reinforcement learning does not require the provisioning of correct input/output data pairs or explicit correction of sub-optimal actions. As shown in Fig. 4, the reinforcement learning agent receives percepts containing the state of the environment (or system) through its sensors and performs actions through its actuators in such a way that it maximizes the cumulative reward. The agent interacts with its environment in discrete time steps. At each time t, the agent receives a percept p_t, which includes the reward r_t. It then chooses an action a_t from the set of available actions and sends it to the environment. The environment moves to a new state s_t+1, and the reward r_t+1 associated with the transition (s_t+1|s_t, a_t) is determined. The goal of a reinforcement learning agent is to collect as much reward as possible. Two factors characterize reinforcement learning techniques: the transition model from s to s' with action a, i.e., the probability P(s'|s,a) = Pr(s_t+1=s'|s_t=s, a_t=a), and the policy for applying an action in a given state. The transition model is either known (e.g., in a Markov decision process) or unknown (in Q-learning). The goal of Q-learning is to learn a policy that maximizes the total (future) reward. It does this by adding the maximum reward attainable from future states to the reward in its current state, thus effectively influencing the current action by the potential reward in the future. This reward is a weighted

[Figure 4: the agent receives percepts (state, reward) from the environment (the controlled system, with states s1...s4) through its sensors and applies actions to it through its actuators.]

Figure 4.  Reinforcement learning components.
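As an illustration of the supervised learning principle, the following sketch (hypothetical data, not from the paper) fits a least-squares linear hypothesis h to training pairs {x_i, y_i = f(x_i)} generated by a known f, so that h ≈ f on unseen inputs:

```python
# Supervised learning sketch: fit a hypothesis h to labeled pairs
# {x_i, y_i = f(x_i)} so that h approximates the "unknown" function f.
# Here f(x) = 3x + 2 and h is a closed-form least-squares line.

def fit_linear(xs, ys):
    """Return slope a and intercept b of the least-squares line h(x) = a*x + b."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    a = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / \
        sum((x - mean_x) ** 2 for x in xs)
    b = mean_y - a * mean_x
    return a, b

f = lambda x: 3 * x + 2                 # the function to be recovered
xs = [0.0, 1.0, 2.0, 3.0, 4.0]          # training inputs
ys = [f(x) for x in xs]                 # desired outputs (labels)

a, b = fit_linear(xs, ys)               # exact fit on noiseless data: a == 3, b == 2
h = lambda x: a * x + b                 # learned hypothesis, h ≈ f
```

With noisy labels the same procedure yields the best linear approximation rather than an exact fit; classification follows the same principle with class labels as outputs.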


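For unsupervised learning, a minimal k-means sketch (hypothetical 1-D data, e.g. made-up per-slice load samples) shows how unlabeled points are grouped with no feedback, by alternating nearest-centroid assignment and centroid update:

```python
# K-means clustering sketch on unlabeled 1-D data: no labels, no feedback;
# structure is discovered by alternating assignment and update steps.

def kmeans_1d(points, centroids, iters=20):
    for _ in range(iters):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster
        # (keep the old centroid if a cluster ends up empty).
        centroids = [sum(c) / len(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return centroids, clusters

# Two well-separated groups of samples (hypothetical values).
data = [1.0, 1.2, 0.8, 9.0, 9.5, 8.7]
centroids, clusters = kmeans_1d(data, centroids=[0.0, 5.0])
```

Here the centroids converge near 1.0 and 9.1, splitting the data into its two natural groups; PCA/ICA would instead reduce the dimensionality of such collected system data before clustering.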


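The Q-learning behavior described above can be sketched on a toy 4-state chain (a hypothetical example, not from the paper): the agent never sees the transition model and learns only from experienced (state, action, reward, next-state) tuples, pulling each estimate Q(s,a) toward the immediate reward plus the discounted maximum reward attainable from the next state.

```python
# Tabular Q-learning sketch (hypothetical toy MDP): update rule
#   Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
# adds the weighted maximum future reward to the current estimate.

import random

N_STATES, GOAL = 4, 3        # states 0..3; reward 1.0 only on reaching state 3
ACTIONS = [-1, +1]           # move left / move right along the chain
alpha, gamma, eps = 0.5, 0.9, 0.3

def step(s, a):
    """Environment transition: clamp to the chain ends, reward at the goal."""
    s2 = min(max(s + a, 0), N_STATES - 1)
    return s2, (1.0 if s2 == GOAL else 0.0)

random.seed(0)
Q = [[0.0, 0.0] for _ in range(N_STATES)]
for _ in range(300):                       # training episodes
    s = 0
    while s != GOAL:
        # Epsilon-greedy choice: mostly exploit, sometimes explore.
        ai = random.randrange(2) if random.random() < eps \
             else max(range(2), key=lambda i: Q[s][i])
        s2, r = step(s, ACTIONS[ai])
        Q[s][ai] += alpha * (r + gamma * max(Q[s2]) - Q[s][ai])
        s = s2

# Greedy policy after training: action index per state.
policy = [max(range(2), key=lambda i: Q[s][i]) for s in range(N_STATES)]
```

After training, the greedy policy moves right (action index 1) in states 0 through 2, i.e. toward the rewarding goal state, even though the transition model was never given to the agent.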

