Page 85 - ITU Journal Future and evolving technologies Volume 2 (2021), Issue 7

Page 85 - ITU Journal Future and evolving technologies Volume 2 (2021), Issue 7 – Terahertz communications

P. 85

ITU Journal on Future and Evolving Technologies, Volume 2 (2021), Issue 7

In each time slot , the transmitter selects beam codes for Algorithm 2 HAB Procedure in SC‑FDMA
each subcarrier from the hierarchical k‑means codebook. Input: , , ,
2
1
At the receiver, the receive power for the th subcarrier Output: b ∗

is measured and fed back to the transmitter. At the end of 1: Initialization: Set T = {(1, 1)}, = = +∞

time slot , the transmitter obtains the measured rewards and = (1, 1); 2,1 2,2

of all subcarriers for this time slot and decides which arm 2: while ≠ −1 && < do

to select for which subcarrier based on a speci ic rule for 3: ℎ = 1, = 1 and ℙ = {(ℎ, )};
the next time slot +1. In the NLOS scenario, the expected 4: New Node Selection
cumulative regrets ( ) for th subcarrier is given by 5: while (ℎ, ) ∈ T do

6: Select new node (ℎ, ) based on (36);

∗
( ) = ∑ ( (w ) − (w ( ))) . (35) 7: ℙ = ℙ ∪ (ℎ, );

=1 8: end while
9: Selected new code: ( , ) = (ℎ, );

∗
where w is the optimal beam code for the th subcar‑ 10: Decision Tree Update: T +1 = T ∪ ( , );

rier. The objective of the algorithm design for the NLOS 11: Measure the reward ;
case is to ind a selection policy that minimizes the sum 12: Attribute Update

expected cumulative regret ( ) over all subcarri‑ 13: for all (ℎ, ) ∈ T do
0 + −1 14: if (ℎ, ) ∈ ℙ then
ers, i.e., ( ) = ∑ ( ).

15: update ℎ, ( ), ℎ, ( ) and ℎ, ( ) with (37),
= 0
(38) and (39), respectively;
4.2 HBA procedure 16: end if
17: update ℎ, ( ) with (39);
In the following, we extend the HBA algorithm proposed
18: end for
in [9] to the considered LOS and NLOS SC‑FDMA trans‑
19: ( ) = ( ) = +∞;
mission. As mentioned, the BA problems for both LOS +1,2 −1 +1,2 −1
20: for all (ℎ, ) ∈ T do
and NLOS scenarios, respectively, can be decomposed
21: update ( ) with (40);
into several independent sub‑BA problems, which can be ℎ,
22: end for
solved in parallel. The HBA algorithm discussed in the fol‑
23: end while
lowing provides an ef icient approach to solve one inde‑
pendent BA problem, such as a sub‑band BA problem in New node selection
the LOS case and a subcarrier BA problem in the NLOS In the new node selection phase, the HBA will select the
case, respectively.
beam code w with maximal estimated mean reward—Q‑
The procedure of the HBA algorithm is shown in Algo‑
value for one subcarrier or sub‑band. The procedure of
rithm 2. The algorithm is designed based on the hierar‑
node selection can be found from line 3 to line 10 in Al‑
chical structure of the average payoff function with re‑
gorithm 2. By exploiting the hierarchical codebook struc‑
spect to the code words. For a single BA problem, con‑
ture, a binary tree search is implemented to ind the new
sider the beam code w( , ) in the hierarchical codebook.
node. The binary search tree T is initialized in the begin‑
If w( , ) performs well, its left child w( + 1, 2 − 1) and
ning. A node in T is represented by (ℎ, ), where ℎ is the
right child w( + 1, 2 ) are highly likely to perform well, ℎ−1
depth from the root node and , 1 ≤ ≤ 2 is the index
too. Hence, the algorithm will explore intensively within
at depth ℎ. The corresponding beam code of node (ℎ, )
the beam coverage (w( , )) with a good reward and
1
is w(ℎ, ). After initialization, T contains only the root
loosely in others. For this proposal, a search tree is gene-

(1, 1). Assume T is the search tree at time . Starting
rated with nodes associated with the beam coverage,
from the root node, the Q‑values of two child nodes are
independently for each sub‑BA problem. The node in a
compared until a new node ( , ) ∉ T is selected. For
higher layer covers a smaller region as discussed in the
a node (ℎ, ), the selection criterion is to choose its child
codebook design part. The algorithm operates in discrete ∗
(ℎ + 1, ) with largest Q‑value, that is:
time slots, and in each time slot , a new beam code is
chosen by a deterministic rule based on the attributes of
⎧ 2 , ( ) > ( )
the search tree. The code selection procedure is executed ∗ { ℎ+1,2 ℎ+1,2 −1
= 2 − 1, ℎ+1,2 ( ) < ℎ+1,2 −1 ( )
independently for each sub‑band or subcarrier in the ⎨ 2 − Ber(0.5), ( ) = ( )
{
LOS or NLOS scenario, respectively. After measurement ⎩ ℎ+1,2 ℎ+1,2 −1
at the receiver, the observed rewards will be fed back to (36)
the transmitter. Afterwards, the transmitter updates the where Ber(0.5) represents a Bernoulli distributed ran‑
attributes of the entire search tree based on the newly dom variable with a parameter of 0.5. All selected nodes
during the compare‑select procedure are saved in ,
observed rewards and the newly selected code is
added to the search tree. By repeating this selection‑ which is the path from the root node to the selected node.

update process, the HBA algorithm narrows the sear- The inally selected node ( , ) is added to the search

ching region intelligently until a close‑to‑optimal beam tree T to obtain the search tree T +1 for the next time slot

80 81 82 83 84 85 86 87 88 89 90