Jošilo and Dán [67] provide a resource allocation model where edge service providers and devices interact in a Stackelberg game. The devices are the leaders and want to minimize their tasks’ completion time by choosing to which edge server they offload their tasks and through which access point. Sardellitti et al. [74] use matching theory to assign users to a MEC server and to communication and computational resources, according to the users’ preferences.
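
As a rough illustration of the leaders’ side of such a Stackelberg formulation, the sketch below enumerates the (access point, edge server) pairs available to a single device and picks the one minimizing its estimated completion time. The cost model and all names and values are illustrative assumptions, not the formulation of [67].

    # Minimal sketch of a device's (leader's) choice in a Stackelberg-style
    # offloading game: pick the (access point, edge server) pair that minimizes
    # the estimated completion time of its task, given the load currently
    # announced by the providers. The cost model and all values are illustrative.
    from itertools import product

    def completion_time(task_bits, task_cycles, ap_rate_bps, server_speed, server_load):
        upload = task_bits / ap_rate_bps                             # transmission delay
        compute = task_cycles / (server_speed / (1 + server_load))   # load-inflated compute delay
        return upload + compute

    def best_response(task_bits, task_cycles, access_points, servers):
        # access_points: {name: uplink rate in bit/s}
        # servers: {name: (CPU cycles per second, current load factor)}
        best = None
        for (ap, rate), (srv, (speed, load)) in product(access_points.items(), servers.items()):
            t = completion_time(task_bits, task_cycles, rate, speed, load)
            if best is None or t < best[2]:
                best = (ap, srv, t)
        return best

    # Example: a 2 Mb task that needs 1e9 CPU cycles.
    print(best_response(2e6, 1e9,
                        {"AP1": 10e6, "AP2": 25e6},
                        {"edge1": (5e9, 0.5), "edge2": (3e9, 0.1)}))
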
4.4.4   Learning methods

Learning methods learn from the past and/or from the environment. They are faster than classic methods, but can be less precise. Each one has its own advantages and drawbacks.
Evolutionary Computation (EA): EA is inspired by biology. Many algorithms exist under the EA umbrella and are more or less adapted to certain problems, each with its own pros and cons. For example, genetic algorithms tend not to get trapped in local optima [99, 100], but they are hard to tune to a given problem. Wan et al. [100] propose a different use of EA for task-driven resource assignment, including the hybridization of different EA algorithms. Li et al. [99] use a genetic algorithm to minimize completion time for mobile devices and an edge server.
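
As a simplified illustration of this family of methods, the following sketch evolves a task-to-server assignment with a basic genetic algorithm that minimizes the makespan (the completion time of the most loaded server). The encoding, operators and parameters are assumptions for illustration, not the algorithms of [99] or [100].

    # Basic genetic algorithm that assigns tasks to edge servers so that the
    # most loaded server finishes as early as possible (makespan). All values,
    # operators and parameters are illustrative.
    import random

    TASKS = [4, 7, 2, 9, 5, 3, 8]      # CPU demand of each task (arbitrary units)
    SPEEDS = [3, 5, 2]                 # processing speed of each edge server

    def makespan(assignment):          # assignment[i] = index of the server running task i
        loads = [0.0] * len(SPEEDS)
        for demand, srv in zip(TASKS, assignment):
            loads[srv] += demand / SPEEDS[srv]
        return max(loads)

    def evolve(pop_size=30, generations=200, mutation_rate=0.1):
        pop = [[random.randrange(len(SPEEDS)) for _ in TASKS] for _ in range(pop_size)]
        for _ in range(generations):
            pop.sort(key=makespan)                         # lower makespan = fitter
            survivors = pop[: pop_size // 2]               # selection
            children = []
            while len(survivors) + len(children) < pop_size:
                a, b = random.sample(survivors, 2)
                cut = random.randrange(1, len(TASKS))      # one-point crossover
                child = a[:cut] + b[cut:]
                for i in range(len(child)):                # mutation
                    if random.random() < mutation_rate:
                        child[i] = random.randrange(len(SPEEDS))
                children.append(child)
            pop = survivors + children
        return min(pop, key=makespan)

    best = evolve()
    print(best, makespan(best))
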




Reinforcement learning: Resource allocation schemes can use a reinforcement learning method, more specifically Q-learning. Its advantage is that it is model-free and adapts itself to a stochastic environment. It is therefore a solution for the dynamic contexts found in mission-critical MEC scenarios [70]. It can also be tuned to favor more or less long-term decisions. Wang et al. [73] propose a multi-stack reinforcement learning algorithm for resource allocation in mobile edge computing. They use multiple stacks to take advantage of historical resource allocation schemes and avoid learning the same scheme again. However, a disadvantage of reinforcement learning is the Q-table: it becomes excessively large for large-scale systems because of the many possible states, rendering its storage and the Q-value lookup complex [53, 69]. Alternatively, we can use a deep reinforcement learning method, with a deep neural network estimating the Q-value of a state-action pair, but we then lose the “model-free” simplicity of tabular Q-learning and need to train a model. Chen et al. [70] propose deep reinforcement learning for CoMEC networks, where collaborative edge servers are connected. Li et al. [53] use deep reinforcement learning for allocating the computational resources of a MEC server to mobile devices by minimizing execution delay and energy consumption. Wang et al. [69] introduce a deep reinforcement learning based resource allocation algorithm to minimize the computing and routing delay in edge networks. They also consider balancing the resource allocation to reduce localized pressure on the network and improve delays. Yang et al. [56] propose a deep reinforcement learning agent for the trade-off between downlink data reliability and delay, through CPU allocation and data blocklength selection in ultra-reliable low-latency communication networks.
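
For reference, the core of a tabular Q-learning allocator reduces to the value update sketched below; the state and action encodings and the reward signal (for instance, the negative of the observed task delay) are placeholder assumptions rather than the designs of [73] or [53].

    # Skeleton of tabular Q-learning for resource allocation. The state and
    # action encodings and the reward signal are placeholders.
    import random
    from collections import defaultdict

    ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.1       # learning rate, discount, exploration
    ACTIONS = range(4)                          # e.g., how many CPU shares to grant
    Q = defaultdict(float)                      # Q[(state, action)], grows with visited states

    def choose_action(state):
        if random.random() < EPSILON:                        # explore
            return random.choice(list(ACTIONS))
        return max(ACTIONS, key=lambda a: Q[(state, a)])     # exploit

    def update(state, action, reward, next_state):
        best_next = max(Q[(next_state, a)] for a in ACTIONS)
        Q[(state, action)] += ALPHA * (reward + GAMMA * best_next - Q[(state, action)])

The growing Q dictionary also makes the scalability issue above concrete: every visited (state, action) pair gets its own entry, which is precisely what a deep Q-network replaces with a trained function approximator.
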


The Q-learning method is suitable when there is not much communication or interaction with the other agents in the system, i.e., the mobile devices in a MEC environment. However, if we assume that the mobile devices interact and are intelligent agents, Q-learning lacks a mechanism to adapt to the other agents’ (the mobile devices’) actions. Feng et al. [51] employ WoLF-PHC reinforcement learning for resource allocation to reduce energy consumption and prioritize tasks in mission-critical applications. The WoLF-PHC algorithm adapts its learning rate: it learns more slowly when it is winning, to give the other agents time to adapt their strategies and let the whole system reach an equilibrium; conversely, the learning rate is higher when another agent is winning, so as to catch up with it [101].
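
The “win or learn fast” principle behind WoLF-PHC can be stated in a few lines: the policy update step is small while the current policy outperforms the historical average policy, and large otherwise. The sketch below shows only that learning-rate switch; it is a simplification of the generic WoLF-PHC scheme, not the exact mechanism used in [51].

    # Simplified "Win or Learn Fast" learning-rate switch behind WoLF-PHC:
    # use a small policy step while winning and a large one while losing.
    # The policy-hill-climbing update itself is omitted.
    DELTA_WIN, DELTA_LOSE = 0.01, 0.04          # with DELTA_WIN < DELTA_LOSE

    def wolf_delta(policy, avg_policy, q_values):
        """Pick the policy step size for one state.

        policy, avg_policy: dicts mapping action -> probability
        q_values: dict mapping action -> current Q estimate
        """
        expected_now = sum(policy[a] * q_values[a] for a in policy)
        expected_avg = sum(avg_policy[a] * q_values[a] for a in policy)
        return DELTA_WIN if expected_now > expected_avg else DELTA_LOSE
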







Deep neural network: Li and Lv [48] use a Deep Neural Network (DNN) for resource allocation to minimize the network energy consumption. They train DNNs to imitate the behavior of a sequential quadratic programming algorithm. One DNN is trained with a fixed number of devices in the data set and another with a random number of devices, which makes the latter more flexible than the specialized one. The DNN then takes less time to solve the problem, returning an approximation of the optimal result. However, the environment is highly dynamic and brings various uncertainties. A training set might be under-representative of the complex system, and the trained DNN may not be flexible enough to tackle some situations since it does not adapt at run time [69]. Moreover, it can be difficult to find good training data beforehand.
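
In other words, the network is trained with plain supervised learning on pairs of problem instances and solver outputs computed offline. A minimal sketch of that idea is given below; the architecture, dimensions and reference solver are assumptions for illustration, not the setup of [48].

    # Supervised training of a DNN that imitates an optimization solver:
    # inputs are problem instances (e.g., device demands and channel states),
    # targets are the allocations computed offline by the reference solver.
    import torch
    import torch.nn as nn

    N_FEATURES, N_OUTPUTS = 16, 8               # illustrative dimensions

    model = nn.Sequential(                      # small fully connected network
        nn.Linear(N_FEATURES, 64), nn.ReLU(),
        nn.Linear(64, 64), nn.ReLU(),
        nn.Linear(64, N_OUTPUTS),
    )
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
    loss_fn = nn.MSELoss()

    def train(instances, solver_outputs, epochs=100):
        # instances: (N, N_FEATURES) tensor; solver_outputs: (N, N_OUTPUTS) tensor
        for _ in range(epochs):
            optimizer.zero_grad()
            loss = loss_fn(model(instances), solver_outputs)
            loss.backward()
            optimizer.step()

    # At run time, model(new_instance) approximates the solver's allocation
    # in a single forward pass instead of an iterative optimization.
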

5.    MOBILE RESOURCE DEPLOYMENT

When MEC servers are mounted on UAVs or robots, they are suited to cover the needs of mobile users in temporary events or emergency responses. Indeed, fixed resources might instead be too costly, too inflexible to deploy, or only needed for a limited time. Particularly in emergency responses and post-disaster management, deploying temporary additional computing resources can help rescuers, victims and wireless devices process critical tasks with critical delay constraints. However, mobile edge resource deployment comes with many challenges.

For deploying one mobile resource, we have to optimize its trajectory between a starting and an ending point so that it serves the mobile devices while minimizing the delay [104] or the system energy consumption [83, 78, 86]. For deploying multiple mobile resources, we need to optimize their number (i.e., minimize it while still satisfying the goal), their locations, and their association with mobile users. Indeed, with multiple mobile resources, we do not have a starting and an ending point, so we cannot plan the entire trajectory but rather compute the next location point. Goals can be minimizing energy consumption [92], minimizing the number of deployed nodes [82] or balancing the workload between resources [108]. Also, the deployment scheme is often joint with another problem-
