Page 115 - ITU Journal, Future and evolving technologies - Volume 1 (2020), Issue 1, Inaugural issue
vector σ(t) is never fully revealed to the learner².

Test and Quarantine: At each time t ∈ [1, T], the learner has a unit budget to choose an individual v ∈ [1, N] in order to "sample" (test for infection). Sampling an individual v at t reveals the state σ_v(t). We let u(t) ∈ [0, N] denote the sampling decision at time t. In case no one is sampled at t, we let u(t) = 0. We let y_v(t) denote the test result at time t; y_v(t) = +1 means the person tested positive, y_v(t) = −1 means the test was negative, and y_v(t) = 0 means that the individual was not tested at time t. The vector comprising the observations y_v(t) is denoted y(t).

If sampled individuals are found to be infected, then they are isolated, i.e., kept out of the population, and hence cannot spread the disease to their neighbors. We let ℓ(t) denote the set of individuals who are isolated at time t.

State Transition: Let us now look at the controlled transition probabilities of the controlled Markov process σ(t). We first introduce some notation. For x, y ∈ X, define

    Δ₁(x, y) = 1{ ∑_{v=1}^{N} |x_v − y_v| = 1 },    (2)

    Δ₂(x, y) = { v if x_v ≠ y_v and Δ₁(x, y) = 1;  ∅ otherwise.    (3)

Clearly, Δ₁(x, y) assumes the value 1 only if x and y differ in a single position; since the disease can spread to only one more person during two consecutive times, this function is 0 if x cannot evolve to y in a single time-step. Δ₂ provides us with the node that "transitioned" to the diseased state when the system evolved in a unit step from x to y. Thus, the single-step transition probabilities are given as

    p(x, y) = Δ₁(x, y) ∑_{v′ ∈ 𝒩(Δ₂(x,y))} p(v′, Δ₂(x, y)).    (4)

Objective: Let ℱ_t := ∪_{s=1}^{t} (u(s), y(s), ℓ(s)) be the observation history of the learner [18]. Then, the policy π is a sampling decision at t on the basis of ℱ_{t−1}, i.e., π : ℱ_{t−1} ↦ u(t), t ∈ [1, T]. Our goal is to find a policy that solves the following problem,

    min_π 𝔼( ∑_{t=1}^{T} ‖σ(t)‖₁ ),    (5)

    s.t. 𝔼( ∑_{t=1}^{T} 1(u(t) ≠ 0) ) ≤ C,    (6)

where ‖⋅‖₁ denotes the ℓ₁ norm and C is the total testing capacity. The instantaneous cost ‖σ(t)‖₁ encourages the policy to keep the infections as low as possible, as early as possible. The capacity constraints are crucial because not many testing kits are available during epidemics. An alternative, somewhat equivalent and simpler objective is to remove the capacity constraints altogether and include a cost for using testing kits,

    min_π 𝔼( ∑_{t=1}^{T} ‖σ(t)‖₁ + λ 1(u(t) ≠ 0) ),    (7)

where λ > 0. We now briefly discuss how to solve the MDP above. Theoretical results on the existence of optimal policies, and methods to solve constrained MDPs or POMDPs, can be found in [4, 21, 22]. In case the parameters describing the environment are unknown, we can use the machine learning techniques developed in [20].

Belief State MDP: We introduce a belief state, which is a posterior distribution over the state space X. This transforms the POMDP into a continuous-state MDP on the belief state. We denote the belief state by ℐ(t) = {ℐ(t, x)}_{x∈X}, where ℐ(t, x) := ℙ(σ(t) = x | ℱ_t). By Bayes' rule, the terms ℐ_t(x) are computed recursively as

    ℐ_{t+1}(y) = ∑_{x∈X} ℐ_t(x) ℙ(y(t) | σ(t) = x) p(x, y),    (8)

where the state transition probabilities p(x, y) are as discussed in (4).

Optimal Policy: The optimal sampling policy can be obtained by solving the following set of non-linear dynamic programming equations [12],

    V_t(ℐ_t) = ∑_{x∈X} ‖x‖₁ ℐ_t(x) + min_{u∈[0,N]} ( 𝔼 V_{t+1}(ℐ_{t+1}) + λ 1{u ≠ 0} ),    (9)

    V_T(ℐ) = ∑_{x∈X} ‖x‖₁ ℐ(x),  ∀ ℐ ∈ Δ(X),    (10)

where Δ(X) denotes the simplex on X and ℐ_t denotes a representative belief state at time t. The optimal sampling action at time t in state ℐ_t corresponds to a minimizer of the r.h.s. of (9). However, equations (9), (10) are computationally intractable because the number of required computations is O(2^N). Thus, we propose tractable, provably approximate solutions next.

4.1.2 Hidden Markov Model

Since we might not observe the susceptibility graph G, we can model it as a hidden Markov process [17]. Assume that the system evolves at discrete time-instants t = 1, 2, …, T. Each slot could be of a duration equal to half an hour, one hour, or one day. For each time slot t, we use σ_v(t) to denote the infection state of the v-th individual. In this section we slightly tweak the binary-valued state model described earlier, and allow it to assume 4 values. This allows us to design a more accurate algorithm. Thus, we let σ_v(t) = −1 if

² So this problem is a partially observable MDP (POMDP), which is non-trivial to solve in the general case.
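As an illustration of the belief-state recursion (8), here is a minimal numerical sketch for a toy population of N = 3, so that |X| = 2^N = 8. The transition kernel, the noiseless-test likelihood, and all names (P, belief_update, lik) are illustrative assumptions, not the paper's implementation.

```python
import numpy as np
from itertools import product

# Minimal sketch of the belief-state recursion (8):
#   I_{t+1}(y)  is proportional to  sum_x I_t(x) * P(obs | sigma(t) = x) * p(x, y).
# The transition kernel and the noiseless-test likelihood below are
# illustrative placeholders, not the paper's model.

N = 3                                      # toy population size
STATES = list(product([0, 1], repeat=N))   # X = {0,1}^N, |X| = 2^N = 8

def delta1(x, y):
    """Indicator from (2): x and y differ in exactly one coordinate."""
    return int(sum(abs(a - b) for a, b in zip(x, y)) == 1)

# Placeholder single-step kernel consistent with (2)-(4): positive mass only
# on self-transitions and single new infections; rows normalised.
P = np.zeros((len(STATES), len(STATES)))
for i, x in enumerate(STATES):
    for j, y in enumerate(STATES):
        if x == y:
            P[i, j] = 1.0
        elif delta1(x, y) and sum(y) > sum(x):   # infection only spreads
            P[i, j] = 0.2
P /= P.sum(axis=1, keepdims=True)

def belief_update(belief, lik, P):
    """One step of (8), renormalised to a probability distribution."""
    unnorm = (belief * lik) @ P            # sum_x I_t(x) L(x) p(x, y)
    return unnorm / unnorm.sum()

belief = np.full(len(STATES), 1.0 / len(STATES))   # uniform prior
# Individual 0 is tested and found positive: with a noiseless test the
# likelihood is 1 for states with x_0 = 1 and 0 otherwise.
lik = np.array([float(x[0] == 1) for x in STATES])
belief = belief_update(belief, lik, P)
print(round(belief.sum(), 6))              # → 1.0
```

Even in this toy, the belief vector has 2^N entries, which is exactly the source of the O(2^N) intractability noted after (9), (10).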
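The Test and Quarantine loop with the Lagrangian cost (7) can likewise be sketched by direct simulation. The path contact graph, the transmission probability p_inf, and the uniform-random testing policy below are toy assumptions used only to make the dynamics concrete, not the paper's model or policy.

```python
import random

# Toy simulation of the Test and Quarantine loop with the Lagrangian
# cost (7): sum_t ( ||sigma(t)||_1 + lam * 1{u(t) != 0} ).  The path
# graph, p_inf and the uniform-random testing policy are assumptions.
random.seed(1)
N, T, lam, p_inf = 6, 20, 0.5, 0.3
edges = [(v, v + 1) for v in range(N - 1)]      # path contact graph

sigma = [1] + [0] * (N - 1)                     # one initial infection
isolated = set()
cost = 0.0

for t in range(T):
    # Policy: test one uniformly random, not-yet-isolated individual.
    candidates = [v for v in range(N) if v not in isolated]
    if not candidates:
        break
    u = random.choice(candidates)
    if sigma[u] == 1:
        isolated.add(u)                         # positive test -> quarantine
    cost += sum(sigma) + lam                    # ||sigma(t)||_1 + lam * 1{u != 0}
    # Spread: an infectious, non-isolated endpoint of an edge infects
    # the other endpoint with probability p_inf.
    nxt = sigma[:]
    for a, b in edges:
        for src, dst in ((a, b), (b, a)):
            if sigma[src] == 1 and src not in isolated and sigma[dst] == 0:
                if random.random() < p_inf:
                    nxt[dst] = 1
    sigma = nxt

print(cost)
```

Swapping the random policy for one that minimises the r.h.s. of (9) over the belief state would recover the (intractable) optimal sampler; simulations of this kind are the natural testbed for the approximate solutions proposed next.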
© International Telecommunication Union, 2020