vector $X(t)$ is never fully revealed to the learner.

Test and Quarantine: At each time $t \in [1, T]$, the learner has a unit budget to choose an individual $i \in [1, N]$ in order to "sample" (test for infection). Sampling an individual $i$ at time $t$ reveals the state $X_i(t)$. We let $u(t) \in [0, N]$ denote the sampling decision at time $t$. In case no one is sampled at $t$, we let $u(t) = 0$. We let $y(t)$ denote the test result at time $t$; $y(t) = +1$ means the person tested positive, $y(t) = -1$ means the test was negative, and $y(t) = 0$ means that the individual was not tested at time $t$. The vector comprising the observations $y(t)$ is denoted $Y(t)$.

If sampled individuals are found to be infected, then they are isolated, i.e., kept out of the population, and hence cannot spread the disease to their neighbors. We let $R(t)$ denote the set of individuals who are isolated at time $t$.
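To make the sampling and isolation mechanics concrete, the following is a minimal Python sketch of one test-and-quarantine step. The function name `test_and_isolate` and the concrete data representations are our own illustrative choices, not part of the model specification above.

```python
def test_and_isolate(X, u, isolated):
    """One test-and-quarantine step at a single time t.

    X        : length-N sequence, X[i] = 1 if individual i+1 is infected, else 0
    u        : sampling decision in [0, N]; u = 0 means no one is tested
    isolated : set of indices already removed from the population
    Returns the test result y(t) in {-1, 0, +1} and the updated isolated set.
    """
    if u == 0:                     # no test performed at this time step
        return 0, isolated
    i = u - 1                      # individuals are indexed 1..N in the text
    if X[i] == 1:                  # positive test: isolate the individual
        return +1, isolated | {i}
    return -1, isolated            # negative test: nothing changes
```

Running this once per time step, with $u$ chosen by the sampling policy, reproduces the observation model $y(t) \in \{-1, 0, +1\}$ described above.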
State Transition: Let us now look at the controlled transition probabilities of the controlled Markov process $X(t)$. We first introduce some notation. For $x, y \in \mathcal{X}$, define

$$\Delta_1(x, y) = \mathbf{1}\Big\{\sum_{i=1}^{N} |x_i - y_i| = 1\Big\}, \qquad (2)$$

and

$$\Delta_2(x, y) = \begin{cases} i & \text{if } x_i \neq y_i \text{ and } \Delta_1(x, y) = 1, \\ \emptyset & \text{otherwise}. \end{cases} \qquad (3)$$

Clearly, $\Delta_1(x, y)$ assumes the value 1 only if $x$ and $y$ differ in a single position; since the disease can spread to only one more person during two consecutive times, this function is 0 if $x$ cannot evolve to $y$ in a single time step. $\Delta_2$ provides us with the node that "transitioned" to the diseased state when the system evolved in a unit step from $x$ to $y$. Thus, the single-step transition probabilities are given as

$$p(x, y) = \Delta_1(x, y) \sum_{i'} w\big(i', \Delta_2(x, y)\big). \qquad (4)$$
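A direct Python transcription of (2)-(4) might look as follows. The callable `w(i, j)` stands in for the paper's infection-weight function, and taking the sum in (4) over all individuals $i'$ is an assumption on our part.

```python
import numpy as np

def delta1(x, y):
    """Indicator from (2): x and y differ in exactly one coordinate."""
    return int(np.sum(np.abs(x - y)) == 1)

def delta2(x, y):
    """Index of the flipped node from (3); None plays the role of the empty set."""
    if delta1(x, y) == 1:
        return int(np.flatnonzero(x != y)[0])
    return None

def transition_prob(x, y, w, N):
    """Single-step transition probability p(x, y) from (4).

    w(i, j) is assumed to return the weight with which individual i
    infects individual j (zero for non-neighbors).
    """
    if delta1(x, y) == 0:
        return 0.0
    j = delta2(x, y)
    # Sum the infection weights into the newly diseased node j, as in (4).
    return sum(w(i, j) for i in range(N))
```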
Objective: Let $\mathcal{F}_t := \cup_{s=1}^{t} \big(u(s), y(s), \ell(s)\big)$ be the observation history of the learner [18]. Then, a policy $\pi$ is a sampling decision at time $t$ made on the basis of $\mathcal{F}_{t-1}$, i.e., $\pi : \mathcal{F}_{t-1} \mapsto u(t)$, $t \in [1, T]$. Our goal is to find a policy that solves the following problem:

$$\min_{\pi} \ \mathbb{E}\Big(\sum_{t=1}^{T} \|X(t)\|_1\Big), \qquad (5)$$

$$\text{s.t.} \ \mathbb{E}\Big(\sum_{t=1}^{T} \mathbf{1}\big(u(t) \neq 0\big)\Big) \leq C, \qquad (6)$$

where $\|\cdot\|_1$ denotes the $\ell_1$ norm and $C$ is the total testing capacity. Since the state $X(t)$ is only partially observed, this problem is a partially observable MDP (POMDP), which is non-trivial to solve in the general case. The instantaneous cost $\|X(t)\|_1$ encourages the policy to keep the infections as low as possible, as early as possible. The capacity constraint is crucial because not many testing kits are available during epidemics. An alternative, somewhat equivalent and simpler objective is to remove the capacity constraint altogether and include a cost for using testing kits:

$$\min_{\pi} \ \mathbb{E}\Big(\sum_{t=1}^{T} \|X(t)\|_1 + \lambda\, \mathbf{1}\big(u(t) \neq 0\big)\Big), \qquad (7)$$

where $\lambda > 0$. We now briefly discuss how to solve the above MDP. Theoretical results on the existence of optimal policies, and methods to solve constrained MDPs or POMDPs, can be found in [4, 21, 22]. In case the parameters describing the environment are unknown, we can use the machine learning techniques developed in [20].
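For concreteness, the relaxed objective in (7) can be evaluated empirically along a simulated trajectory, as in the sketch below; the trajectory representation is an illustrative assumption.

```python
def relaxed_cost(trajectory, lam):
    """Empirical value of the relaxed objective (7) along one trajectory.

    trajectory : list of (X, u) pairs, where X is the 0/1 infection vector
                 at time t and u is the sampling decision (0 = no test)
    lam        : per-test cost, the lambda > 0 in (7)
    """
    return sum(sum(X) + lam * (u != 0) for X, u in trajectory)
```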
Belief State MDP: We introduce a belief state, which is a posterior distribution over the state space $\mathcal{X}$. This transforms the POMDP into a continuous-state MDP on the belief state. We denote the belief state by $\mathcal{I}(t) = \{\mathcal{I}(t, x)\}_{x \in \mathcal{X}}$, where $\mathcal{I}(t, x) := \mathbb{P}\big(X(t) = x \mid \mathcal{F}_t\big)$. By Bayes' rule, the terms $\mathcal{I}_{t+1}(y)$ are computed recursively as

$$\mathcal{I}_{t+1}(y) = \sum_{x \in \mathcal{X}} \mathcal{I}_t(x)\, \mathbb{P}\big(y(t) \mid X(t) = x\big)\, p(x, y), \qquad (8)$$

where the state transition probabilities $p(x, y)$ are as discussed in (4).
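A literal, brute-force implementation of the Bayes recursion (8) over an explicitly enumerated state space is sketched below; the function `obs_lik` is an assumed stand-in for the likelihood $\mathbb{P}(y(t) \mid X(t) = x)$. Enumerating all $2^N$ states is exactly what makes this approach intractable for large $N$, as noted after (9)-(10).

```python
def belief_update(belief, states, obs, obs_lik, trans_prob):
    """One step of the Bayes recursion (8).

    belief     : dict mapping each state x (e.g., a tuple of 0/1 entries) to I_t(x)
    states     : iterable over the whole state space
    obs        : the observation y(t)
    obs_lik    : obs_lik(obs, x) stands in for P(y(t) | X(t) = x)
    trans_prob : trans_prob(x, y) stands in for p(x, y) from (4)
    """
    states = list(states)  # the state space is traversed twice below
    return {
        y: sum(belief[x] * obs_lik(obs, x) * trans_prob(x, y) for x in states)
        for y in states
    }
```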
Optimal Policy: The optimal sampling policy can be obtained by solving the following set of non-linear dynamic programming equations [12]:

$$V_t(\mathcal{I}_t) = \sum_{x \in \mathcal{X}} \|x\|_1\, \mathcal{I}_t(x) + \min_{u \in [0, N]} \Big(\mathbb{E}\, V_{t+1}(\mathcal{I}_{t+1}) + \lambda\, \mathbf{1}\{u \neq 0\}\Big), \qquad (9)$$

$$V_T(\mathcal{I}) = \sum_{x \in \mathcal{X}} \|x\|_1\, \mathcal{I}(x), \quad \forall\, \mathcal{I} \in \Delta(\mathcal{X}), \qquad (10)$$

where $\Delta(\mathcal{X})$ denotes the simplex on $\mathcal{X}$ and $\mathcal{I}_t$ denotes a representative belief state at time $t$. The optimal sampling action at time $t$ in state $\mathcal{I}_t$ corresponds to a minimizer of the r.h.s. of (9). However, equations (9) and (10) are computationally intractable because the number of required computations is $O(2^N)$. Thus, we propose tractable, provably approximate solutions next.
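Equations (9)-(10) cannot be solved exactly for large $N$, so the sketch below only illustrates the structure of the minimization on the r.h.s. of (9): a one-step lookahead that, for each candidate action, adds the testing penalty to an (approximate) expected next-stage value. The helper `expected_next_value` is hypothetical and would itself rely on the belief update (8).

```python
def best_action(belief, N, lam, expected_next_value):
    """Minimizer of the bracketed term on the r.h.s. of (9).

    expected_next_value(belief, u) is a hypothetical helper returning an
    (approximate) value of E[ V_{t+1}(I_{t+1}) ] under action u.
    """
    # u = 0 means "test no one"; u = 1..N tests individual u.
    return min(
        range(N + 1),
        key=lambda u: expected_next_value(belief, u) + lam * (u != 0),
    )
```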
4.1.2 Hidden Markov Model

Since we might not observe the susceptibility graph $G$, we can model it as a hidden Markov process [17]. Assume that the system evolves at discrete time instants $t = 1, 2, \ldots, T$. Each slot could be of a duration equal to half an hour, one hour, or one day. For each time slot $t$, we use $X_i(t)$ to denote the infection state of the $i$-th individual. In this section we slightly tweak the binary-valued state model that was described earlier and allow it to assume 4 values; this allows us to design an algorithm that is more accurate. Thus, we let $X_i(t) = -1$ if