framework). The computation of σ(k,n) and the potential update of the rRMPolicyDedicatedRatio attribute is done every t minutes. The determination of the resource usage quota σ(k,n) is realized through the following functions:

• Per-slice action selection policies

The action selection policy of slice k takes the network state s(k) observed for this slice at the time the policy is executed and determines the action a(k) to be applied for this slice. The action a(k) is composed of N per-cell actions, each taking one of three possible values: increase the resource usage quota σ(k,n) of slice k in cell n by an amount Δ for the next time step, maintain the same resource usage quota, or decrease it by an amount Δ.

In turn, the state s(k) includes N per-cell components, each one given by the triple <ρ(k,n), σ(k,n), σava(n)>, where ρ(k,n) is the fraction of PRBs occupied by slice k in cell n, σ(k,n) is the current resource usage quota allocated to the slice, and σava(n) is the total amount of resource usage quota in the cell not allocated to any slice. While the values of σ(k,n) and σava(n) are directly available at the RAN cross-slice manager, the value of ρ(k,n) is obtained from the different cells through the performance management (PM) services offered by the O1 interface. In particular, using the performance measurements defined in [49], it corresponds to the ratio between the "DL PRB used for data traffic" measurement, which gives the average number of PRBs used for data traffic by a given slice in a cell, and the "DL total available PRB" measurement, which gives the number of PRBs available in the cell. Both measurements are collected from the gNB-DU every time step, so their average is computed over the time step duration t.
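For illustration, the construction of s(k) from these quantities could look like the following sketch; the function and data-structure names are assumptions and not part of the paper, with ρ(k,n) simply derived as the quotient of the two PM counters reported over the last time step.

```python
# Illustrative sketch of the state construction at the RAN cross-slice manager.
# All function/variable names are assumptions; only the two PM measurements and
# the triple <rho(k,n), sigma(k,n), sigma_ava(n)> come from the text above.
def build_slice_state(pm_reports, quotas, slice_id, cells):
    """Return s(k): one <rho, sigma, sigma_available> triple per cell.

    pm_reports[cell]: PM counters from the gNB-DU, averaged over the time step t.
    quotas[cell][slice]: resource usage quotas currently configured in the cell.
    """
    state = []
    for cell in cells:
        used_prb = pm_reports[cell]["dl_prb_used_for_data_traffic"][slice_id]
        total_prb = pm_reports[cell]["dl_total_available_prb"]
        rho = used_prb / total_prb                    # fraction of PRBs occupied by slice k
        sigma = quotas[cell][slice_id]                # quota currently allocated to slice k
        sigma_ava = 1.0 - sum(quotas[cell].values())  # quota not allocated to any slice
        state.append((rho, sigma, sigma_ava))
    return state
```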
Following the DQN approach, the action selection policy π(k) of the k-th slice seeks to maximize a cumulative reward that captures the desired optimization target. In particular, the action selection policy π(k) for a given state s(k) is defined as argmax_a(k) Qk(s(k), a(k), θk), where Qk(s(k), a(k), θk) is the output of a deep NN for the input state s(k) and the output action a(k), providing the maximum expected cumulative reward when starting at s(k) and triggering a(k). The internal structure of the NN is specified by the vector of parameters θk, which contains the weights of the different neuron connections. The optimum values of θk that determine the policies to be followed by the different slices in order to maximize the cumulative reward are learnt offline by the ML training host, which provides them to the ML inference host. Further details about this training process and the reward formulation are given in Section 4.2.
• Resource usage quota computation

This function computes the value of the resource usage quota σ(k,n) to be allocated to each slice and cell for the next time step by applying the increase/maintain/decrease actions provided by the action selection policies of all the slices, and it configures the resulting σ(k,n) values in the O-DU through the rRMPolicyDedicatedRatio attribute. To make the configuration on a per-slice basis, an rRMPolicyMemberList is specified for each RAN slice, composed of a single member with the S-NSSAI and PLMNid of the RAN slice. Then, the rRMPolicyDedicatedRatio is configured per rRMPolicyMemberList in each cell.
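Purely as an illustration, the configuration pushed for one slice in one cell could look like the structure below; only the attribute names come from the text above, while the encoding and example values are assumptions rather than the actual O1 payload.

```python
# Hypothetical rRMPolicyRatio configuration for one RAN slice in one cell.
# Attribute names follow the text above; the S-NSSAI/PLMN values and the use of
# a percentage for the dedicated ratio are illustrative assumptions.
rrm_policy_ratio = {
    "rRMPolicyMemberList": [
        {"sNSSAI": "01-000001", "pLMNId": {"mcc": "001", "mnc": "01"}},  # single member per RAN slice
    ],
    "rRMPolicyDedicatedRatio": 45,  # sigma(k,n) for this slice and cell, e.g. 45%
}
```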
When applying the actions, this function ensures that the maximum cell bit rate associated with the termDensity and dlThptPerUe parameters is not exceeded. Moreover, since the action selection policies of the different slices operate independently, this function also checks that the aggregated resource usage quota of all the slices in a cell after applying the actions does not exceed 1, so that the cell capacity is not exceeded. If it would be, the function first applies the actions of the slices involving a reduction or maintenance of the resource usage quota, and then distributes the remaining capacity among the slices with increase actions. This distribution is proportional to their dlThptPerSlice values, as long as their current throughput is not already higher than dlThptPerSlice. For this adjustment, the measured throughput per slice across all the cells in the last time step is needed; it can be obtained from the PM services of the O1 interface using the "Downstream throughput for Single Network Slice Instance" Key Performance Indicator of [50].
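The capacity check and the proportional redistribution can be sketched per cell as follows; this is a simplified illustration rather than the authors' implementation, the names are assumptions, and the maximum cell bit rate check tied to termDensity/dlThptPerUe is omitted.

```python
# Simplified per-cell sketch of the capacity check described above (not the
# authors' code; names are illustrative and the maximum-cell-bit-rate check
# tied to termDensity/dlThptPerUe is omitted).
def update_cell_quotas(quotas, actions, delta, dl_thpt_per_slice, measured_thpt):
    """quotas, actions: dicts indexed by slice id for one cell; actions in {-1, 0, +1}."""
    # Tentatively apply every slice's action independently.
    tentative = {k: max(0.0, quotas[k] + actions[k] * delta) for k in quotas}
    if sum(tentative.values()) <= 1.0:
        return tentative  # aggregated quota within the cell capacity
    # Capacity would be exceeded: apply decrease/maintain actions first ...
    new_quotas = {k: tentative[k] for k in quotas if actions[k] <= 0}
    # ... and only let slices below their dlThptPerSlice target increase.
    increasing = [k for k in quotas
                  if actions[k] > 0 and measured_thpt[k] < dl_thpt_per_slice[k]]
    for k in quotas:
        if actions[k] > 0 and k not in increasing:
            new_quotas[k] = quotas[k]  # increase withheld: throughput target already met
    # Share the remaining cell capacity proportionally to dlThptPerSlice.
    headroom = 1.0 - sum(new_quotas.values()) - sum(quotas[k] for k in increasing)
    total_target = sum(dl_thpt_per_slice[k] for k in increasing) or 1.0
    for k in increasing:
        extra = max(0.0, headroom) * dl_thpt_per_slice[k] / total_target
        new_quotas[k] = quotas[k] + min(delta, extra)
    return new_quotas
```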
4.2 Trainer of RAN cross-slice management policies

This component constitutes the training part of the DQN model, intended to learn the NN parameters θk that determine the per-slice action selection policies used by the RAN cross-slice manager. The training process makes use of a multi-agent DQN approach in which each DQN agent learns the optimum policy of a different RAN slice by



