Jošilo and Dán [67] provide a resource allocation model where edge service providers and devices interact in a Stackelberg game. The devices are the leaders and want to minimize their tasks’ completion time by choosing to which edge server they offload their tasks and through which access point. Sardellitti et al. [74] use matching theory to assign users to a MEC server and to communication and computational resources, according to the users’ preferences.
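
As a rough illustration of the leaders’ side of such a Stackelberg formulation, the sketch below enumerates the (access point, edge server) pairs available to a single device and picks the one minimizing its estimated completion time. The cost model and all names and values are illustrative assumptions, not the formulation of [67].

    # Minimal sketch of a device's (leader's) choice in a Stackelberg-style
    # offloading game: pick the (access point, edge server) pair that minimizes
    # the estimated completion time of its task, given the load currently
    # announced by the providers. The cost model and all values are illustrative.
    from itertools import product

    def completion_time(task_bits, task_cycles, ap_rate_bps, server_speed, server_load):
        upload = task_bits / ap_rate_bps                             # transmission delay
        compute = task_cycles / (server_speed / (1 + server_load))   # load-inflated compute delay
        return upload + compute

    def best_response(task_bits, task_cycles, access_points, servers):
        # access_points: {name: uplink rate in bit/s}
        # servers: {name: (CPU cycles per second, current load factor)}
        best = None
        for (ap, rate), (srv, (speed, load)) in product(access_points.items(), servers.items()):
            t = completion_time(task_bits, task_cycles, rate, speed, load)
            if best is None or t < best[2]:
                best = (ap, srv, t)
        return best

    # Example: a 2 Mb task that needs 1e9 CPU cycles.
    print(best_response(2e6, 1e9,
                        {"AP1": 10e6, "AP2": 25e6},
                        {"edge1": (5e9, 0.5), "edge2": (3e9, 0.1)}))
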
4.4.4   Learning methods

Learning methods learn from the past and/or from the environment. They are faster than classic methods, but can be less precise. Each one has its own advantages and drawbacks.
Evolutionary Computation (EA): EA is inspired by biology. Many algorithms exist under the EA umbrella and are more or less adapted to certain problems, each with its own pros and cons. For example, genetic algorithms tend not to get trapped in local optima [99, 100], but they are hard to tune to a given problem. Wan et al. [100] propose a different use of EA for task-driven resource assignment, including the hybridization of different EA algorithms. Li et al. [99] use a genetic algorithm to minimize completion time for mobile devices and an edge server.
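
As a simplified illustration of this family of methods, the following sketch evolves a task-to-server assignment with a basic genetic algorithm that minimizes the makespan (the completion time of the most loaded server). The encoding, operators and parameters are assumptions for illustration, not the algorithms of [99] or [100].

    # Basic genetic algorithm that assigns tasks to edge servers so that the
    # most loaded server finishes as early as possible (makespan). All values,
    # operators and parameters are illustrative.
    import random

    TASKS = [4, 7, 2, 9, 5, 3, 8]      # CPU demand of each task (arbitrary units)
    SPEEDS = [3, 5, 2]                 # processing speed of each edge server

    def makespan(assignment):          # assignment[i] = index of the server running task i
        loads = [0.0] * len(SPEEDS)
        for demand, srv in zip(TASKS, assignment):
            loads[srv] += demand / SPEEDS[srv]
        return max(loads)

    def evolve(pop_size=30, generations=200, mutation_rate=0.1):
        pop = [[random.randrange(len(SPEEDS)) for _ in TASKS] for _ in range(pop_size)]
        for _ in range(generations):
            pop.sort(key=makespan)                         # lower makespan = fitter
            survivors = pop[: pop_size // 2]               # selection
            children = []
            while len(survivors) + len(children) < pop_size:
                a, b = random.sample(survivors, 2)
                cut = random.randrange(1, len(TASKS))      # one-point crossover
                child = a[:cut] + b[cut:]
                for i in range(len(child)):                # mutation
                    if random.random() < mutation_rate:
                        child[i] = random.randrange(len(SPEEDS))
                children.append(child)
            pop = survivors + children
        return min(pop, key=makespan)

    best = evolve()
    print(best, makespan(best))
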




Reinforcement learning: Resource allocation schemes can use a reinforcement learning method, more specifically Q-learning. Its advantage is that it is model-free and adapts itself to a stochastic environment. It is therefore a solution for the dynamic contexts found in mission-critical MEC scenarios [70]. It can also be tuned to favor more or less long-term decisions. Wang et al. [73] propose a multi-stack reinforcement learning algorithm for resource allocation in mobile edge computing. They use multiple stacks to take advantage of historical resource allocation schemes and avoid learning the same scheme again. However, a disadvantage of reinforcement learning is the Q-table: it becomes excessively large for large-scale systems because of the many possible states, rendering its storage and the Q-value lookup complex [53, 69]. Alternatively, we can use a deep reinforcement learning method, with a deep neural network estimating the Q-value of a state-action pair, but we then lose the “model-free” simplicity of tabular Q-learning and need to train a model. Chen et al. [70] propose deep reinforcement learning for CoMEC networks, where collaborative edge servers are connected. Li et al. [53] use deep reinforcement learning for allocating the computational resources of a MEC server to mobile devices by minimizing execution delay and energy consumption. Wang et al. [69] introduce a deep reinforcement learning based resource allocation algorithm to minimize the computing and routing delay in edge networks. They also consider balancing the resource allocation to reduce localized pressure on the network and improve delays. Yang et al. [56] propose a deep reinforcement learning agent for the trade-off between downlink data reliability and delay, through CPU allocation and data blocklength selection in ultra-reliable low-latency communication networks.
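
For reference, the core of a tabular Q-learning allocator reduces to the value update sketched below; the state and action encodings and the reward signal (for instance, the negative of the observed task delay) are placeholder assumptions rather than the designs of [73] or [53].

    # Skeleton of tabular Q-learning for resource allocation. The state and
    # action encodings and the reward signal are placeholders.
    import random
    from collections import defaultdict

    ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.1       # learning rate, discount, exploration
    ACTIONS = range(4)                          # e.g., how many CPU shares to grant
    Q = defaultdict(float)                      # Q[(state, action)], grows with visited states

    def choose_action(state):
        if random.random() < EPSILON:                        # explore
            return random.choice(list(ACTIONS))
        return max(ACTIONS, key=lambda a: Q[(state, a)])     # exploit

    def update(state, action, reward, next_state):
        best_next = max(Q[(next_state, a)] for a in ACTIONS)
        Q[(state, action)] += ALPHA * (reward + GAMMA * best_next - Q[(state, action)])

The growing Q dictionary also makes the scalability issue above concrete: every visited (state, action) pair gets its own entry, which is precisely what a deep Q-network replaces with a trained function approximator.
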


The Q-learning method is suitable when there is not much communication or interaction with the other agents in the system, i.e., the mobile devices in a MEC environment. However, if we assume that the mobile devices interact and are intelligent agents, Q-learning lacks a mechanism to adapt to the other agents’ (the mobile devices’) actions. Feng et al. [51] employ WoLF-PHC reinforcement learning for resource allocation to reduce energy consumption and prioritize tasks in mission-critical applications. The WoLF-PHC algorithm adapts its learning rate: it learns more slowly when it is winning, to give the other agents time to adapt their strategies and let the whole system reach an equilibrium; conversely, the learning rate is higher when another agent is winning, so as to catch up with it [101].
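
The “win or learn fast” principle behind WoLF-PHC can be stated in a few lines: the policy update step is small while the current policy outperforms the historical average policy, and large otherwise. The sketch below shows only that learning-rate switch; it is a simplification of the generic WoLF-PHC scheme, not the exact mechanism used in [51].

    # Simplified "Win or Learn Fast" learning-rate switch behind WoLF-PHC:
    # use a small policy step while winning and a large one while losing.
    # The policy-hill-climbing update itself is omitted.
    DELTA_WIN, DELTA_LOSE = 0.01, 0.04          # with DELTA_WIN < DELTA_LOSE

    def wolf_delta(policy, avg_policy, q_values):
        """Pick the policy step size for one state.

        policy, avg_policy: dicts mapping action -> probability
        q_values: dict mapping action -> current Q estimate
        """
        expected_now = sum(policy[a] * q_values[a] for a in policy)
        expected_avg = sum(avg_policy[a] * q_values[a] for a in policy)
        return DELTA_WIN if expected_now > expected_avg else DELTA_LOSE
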







Deep neural network: Li and Lv [48] use a Deep Neural Network (DNN) for resource allocation to minimize the network energy consumption. They train DNNs to imitate the behavior of a sequential quadratic programming algorithm. One DNN is trained with a fixed number of devices in the data set and another with a random number of devices, which makes the latter more flexible than the specialized one. The DNN then takes less time to solve the problem, returning an approximation of the optimal result. However, the environment is highly dynamic and brings various uncertainties. A training set might be under-representative of the complex system, and the trained DNN may not be flexible enough to tackle some situations since it does not adapt at run time [69]. Moreover, it can be difficult to find good training data beforehand.
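
In other words, the network is trained with plain supervised learning on pairs of problem instances and solver outputs computed offline. A minimal sketch of that idea is given below; the architecture, dimensions and reference solver are assumptions for illustration, not the setup of [48].

    # Supervised training of a DNN that imitates an optimization solver:
    # inputs are problem instances (e.g., device demands and channel states),
    # targets are the allocations computed offline by the reference solver.
    import torch
    import torch.nn as nn

    N_FEATURES, N_OUTPUTS = 16, 8               # illustrative dimensions

    model = nn.Sequential(                      # small fully connected network
        nn.Linear(N_FEATURES, 64), nn.ReLU(),
        nn.Linear(64, 64), nn.ReLU(),
        nn.Linear(64, N_OUTPUTS),
    )
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
    loss_fn = nn.MSELoss()

    def train(instances, solver_outputs, epochs=100):
        # instances: (N, N_FEATURES) tensor; solver_outputs: (N, N_OUTPUTS) tensor
        for _ in range(epochs):
            optimizer.zero_grad()
            loss = loss_fn(model(instances), solver_outputs)
            loss.backward()
            optimizer.step()

    # At run time, model(new_instance) approximates the solver's allocation
    # in a single forward pass instead of an iterative optimization.
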

5.    MOBILE RESOURCE DEPLOYMENT

When MEC servers are mounted on UAVs or robots, they are suited to cover the needs of mobile users in temporary events or emergency responses. Indeed, fixed resources might instead be too costly, too inflexible to deploy, or only needed for a limited time. Particularly in emergency responses and post-disaster management, deploying temporary additional computing resources can help rescuers, victims and wireless devices process critical tasks with critical delay constraints. However, mobile edge resource deployment comes with many challenges.

For deploying one mobile resource, we have to optimize its trajectory between a starting and an ending point so that it serves the mobile devices while minimizing the delay [104] or the system energy consumption [83, 78, 86]. For deploying multiple mobile resources, we need to optimize their number (i.e., minimize it while still satisfying the goal), their locations, and their association with mobile users. Indeed, with multiple mobile resources, we do not have a starting and an ending point, so we cannot plan the entire trajectory but rather compute the next location point. Goals can be minimizing energy consumption [92], minimizing the number of deployed nodes [82] or balancing the workload between resources [108]. Also, the deployment scheme is often joint with another problem-
