Page 93 - Proceedings of the 2018 ITU Kaleidoscope
Machine learning for a 5G future
Machine learning techniques fall into three branches, each with example algorithms:
- Supervised learning: support vector machine, Bayesian learning, regression model, K-nearest neighbor, gradient boosting decision tree
- Unsupervised learning: principal and independent component analysis (PCA/ICA), K-means clustering, spectral clustering, replicator neural network
- Reinforcement learning: Q-learning, Markov decision process, multi-armed bandits
Figure 3. Classification of machine learning techniques.
large and too complex [4,9]. Thus, the autonomic adaptation of network functions through interaction with the internal and external environments can be carried out by machine learning techniques.

In this section, we first provide an overview of machine learning techniques and then list the relevant techniques that can be used for the automation of 5G network slicing functions. As shown in Fig. 3, machine learning techniques can be broadly classified into three categories: supervised learning, unsupervised learning, and reinforcement learning [10].

Supervised learning techniques learn (or deduce) a function from training data, which comprise pairs of inputs and desired outputs. The output of the function can be continuous values (called regression) or a class label of the input values (called classification). After training, the learning agent or element predicts the value of the function for any valid input from unseen situations in a reasonably valid way. Thus, the supervised learning principle can be expressed as follows: given a training set of N examples {x_i, y_i = f(x_i)}, where each y_i was generated by an unknown function f, discover a function h (the hypothesis) such that it approximates the true function f (i.e., h ≈ f). Support vector machines (SVM), Bayesian learning, and regression models are popular supervised learning techniques applicable to solving network problems.

Reinforcement learning performs iterative learning through a series of reinforcements by rewards or punishments. It learns to achieve its goal from its own experience. Unlike supervised learning, reinforcement learning does not require the provisioning of correct input/output data pairs or explicit correction of sub-optimal actions. As shown in Fig. 4, the reinforcement learning agent receives percepts containing the state of the environment (or system) through its sensors and performs actions through its actuators in such a way that it maximizes the cumulative reward. The agent interacts with its environment in discrete time steps. At each time t, the agent receives a percept p_t, which includes the reward r_t. It then chooses an action a_t from the set of available actions and sends it to the environment. The environment moves to a new state s_t+1, and the reward r_t+1 associated with the transition (s_t+1 | s_t, a_t) is determined. The goal of a reinforcement learning agent is to collect as much reward as possible. Two factors characterize reinforcement learning techniques: the transition model from s to s' under action a, i.e., the probability P(s'|s,a) = Pr(s_t+1 = s' | s_t = s, a_t = a), and the policy for applying an action in a given state. The transition model is either known (e.g., in a Markov decision process) or unknown (as in Q-learning). The goal of Q-learning is to learn a policy that maximizes the total (future) reward. It does this by adding the maximum reward attainable from future states to the reward in its current state, thus effectively influencing the current action by the potential reward in the future. This reward is a weighted
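The Q-learning loop described in this section (choose an action, observe the reward and next state, fold the best attainable future value back into the current estimate) can be sketched as follows. This is a minimal illustration, not code from the paper: the five-state chain environment, the learning rate, the discount factor, and the epsilon-greedy policy are all assumptions added for the example.

```python
import random

# Minimal tabular Q-learning sketch (illustrative only).
# Toy environment: a 5-state chain; action 1 moves right, action 0 moves
# left; reaching the last state yields reward 1 and ends the episode.
N_STATES = 5
ACTIONS = [0, 1]
ALPHA = 0.1    # learning rate
GAMMA = 0.9    # discount factor weighting future rewards
EPSILON = 0.1  # exploration probability

def step(state, action):
    """Transition model of the toy environment (unknown to the agent)."""
    next_state = max(0, state - 1) if action == 0 else min(N_STATES - 1, state + 1)
    reward = 1.0 if next_state == N_STATES - 1 else 0.0
    done = next_state == N_STATES - 1
    return next_state, reward, done

def train(episodes=500, seed=0):
    random.seed(seed)
    q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    for _ in range(episodes):
        s, done = 0, False
        while not done:
            # Epsilon-greedy policy: mostly exploit, occasionally explore.
            if random.random() < EPSILON:
                a = random.choice(ACTIONS)
            else:
                a = max(ACTIONS, key=lambda act: q[(s, act)])
            s_next, r, done = step(s, a)
            # Q-update: current reward plus discounted best future value.
            best_next = max(q[(s_next, act)] for act in ACTIONS)
            q[(s, a)] += ALPHA * (r + GAMMA * best_next - q[(s, a)])
            s = s_next
    return q

q = train()
# Greedy policy per state; it should learn to move right toward the reward.
policy = {s: max(ACTIONS, key=lambda a: q[(s, a)]) for s in range(N_STATES)}
print(policy)
```

The update line is the standard Q-learning rule, Q(s,a) ← Q(s,a) + α(r + γ·max_a' Q(s',a') − Q(s,a)); the discount factor γ is the weight the paper alludes to when it says future rewards are folded into the current estimate.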
[Figure 4 depicts the reinforcement learning loop: the agent receives percepts (the environment state and reward) through its sensors and applies actions through its actuators to the environment (the controlled system, with states s1, s2, s3, s4).]
Figure 4. Reinforcement learning components.

Unsupervised learning is based on unstructured or unlabeled data. The learning agent learns patterns in the input data even though no explicit feedback is provided. Unsupervised learning techniques are used for data clustering, dimensionality reduction, density estimation, etc. K-means clustering, principal component analysis (PCA), and independent component analysis (ICA) are frequently used algorithms for clustering and dimensionality reduction of system data collected for network control and management.
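The K-means clustering mentioned above can be sketched in a few lines. This is an illustrative example, not code from the paper; the two-dimensional measurement pairs are hypothetical stand-ins for collected system data.

```python
import random

def kmeans(points, k, iters=20, seed=0):
    """Minimal K-means sketch: alternate assignment and centroid update."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initialize from the data points
    for _ in range(iters):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            idx = min(range(k),
                      key=lambda i: (p[0] - centroids[i][0]) ** 2
                                    + (p[1] - centroids[i][1]) ** 2)
            clusters[idx].append(p)
        # Update step: move each centroid to the mean of its cluster.
        for i, members in enumerate(clusters):
            if members:
                centroids[i] = (sum(p[0] for p in members) / len(members),
                                sum(p[1] for p in members) / len(members))
    return centroids, clusters

# Hypothetical measurements forming two well-separated groups.
data = [(1.0, 1.1), (0.9, 1.0), (1.2, 0.8), (8.0, 8.1), (7.9, 8.3), (8.2, 7.9)]
centroids, clusters = kmeans(data, k=2)
print(sorted(len(c) for c in clusters))  # prints [3, 3]: one cluster per group
```

In a network-management setting the same alternation (assign samples to the nearest centroid, then recompute centroids) would run over collected telemetry rather than this toy data; PCA or ICA would typically reduce the dimensionality of the samples first.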