on the training SNR, we divide the target range of forward SNRs into (small) non-overlapping intervals and select a single training SNR within each interval.
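A minimal sketch of this interval-based selection follows, assuming the SNR range is given in dB and that the midpoint of each interval is taken as its training SNR; the midpoint choice, the function name `training_snrs` and its arguments are illustrative assumptions rather than details from the paper.

```python
import numpy as np

def training_snrs(snr_min_db, snr_max_db, num_intervals):
    """Split the target forward-SNR range [snr_min_db, snr_max_db] (dB) into
    non-overlapping intervals and pick one training SNR per interval.
    Taking the midpoint is an assumption; the paper only states that a single
    training SNR is selected within each interval."""
    edges = np.linspace(snr_min_db, snr_max_db, num_intervals + 1)
    return 0.5 * (edges[:-1] + edges[1:])  # one midpoint per interval

# Example: a 0..6 dB target range split into three 2 dB intervals
print(training_snrs(0.0, 6.0, 3))  # -> [1. 3. 5.]
```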
Encoder and decoder are implemented as two separate DNNs whose coefficients are determined through a joint training procedure. The training procedure consists of the transmission of batches of randomly generated messages. The number of batches is 2 × 10⁴, where each batch contains 2 × 10³ messages. DNN coefficients are updated by an Adaptive Moment Estimation (Adam) optimizer based on the Binary Cross-Entropy (BCE) loss function. For each batch, a loss value is obtained by computing the BCE between the messages in that batch and the corresponding decoder outputs. The learning rate is initially set to 0.02 and divided by 10 after the first group of 10³ batches. The gradient magnitude is clipped to 1.
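The per-batch update just described can be summarized by the following PyTorch-style sketch. It only illustrates the stated hyperparameters: `encoder`, `decoder` and `run_link` (which would pass a batch of messages through the encoder, the forward and feedback channels and the decoder, returning per-bit probabilities) are placeholders, and whether the gradient clipping acts on the overall norm or on individual gradient values is an assumption.

```python
import torch
import torch.nn as nn

def train(encoder, decoder, run_link, num_batches=20_000, batch_size=2_000,
          msg_len=50, lr=0.02):
    params = list(encoder.parameters()) + list(decoder.parameters())
    optimizer = torch.optim.Adam(params, lr=lr)   # Adam optimizer
    bce = nn.BCELoss()                            # binary cross-entropy loss

    for batch_idx in range(num_batches):
        if batch_idx == 1_000:                    # learning rate divided by 10
            for g in optimizer.param_groups:      # after the first 10^3 batches
                g["lr"] = lr / 10.0

        # Batch of randomly generated messages (0/1 bits)
        msgs = torch.randint(0, 2, (batch_size, msg_len)).float()
        bit_probs = run_link(encoder, decoder, msgs)  # decoder outputs in [0, 1]
        loss = bce(bit_probs, msgs)               # BCE(messages, decoder outputs)

        optimizer.zero_grad()
        loss.backward()
        # "Gradient magnitude clipped to 1": norm clipping assumed here;
        # torch.nn.utils.clip_grad_value_(params, 1.0) would clip per element.
        torch.nn.utils.clip_grad_norm_(params, 1.0)
        optimizer.step()
```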
By monitoring the BCE loss value throughout the entire training session, we noticed that the loss trajectory has high peaks which appear more frequently during the initial phases of training. Those peaks indicate that the training process is driving the encoder/decoder NNs away from their optimal performance. In order to mitigate the detrimental effect of these events, the following countermeasures have been taken:

  • usage of a larger batch size, 10 times larger than in [1]. Using large batches stabilizes training¹ and accelerates convergence of the NN weights towards values that produce good performance;

  • implementation of a training roll-back mechanism that discards the NN weight updates of the last epoch if the loss value produced by the NNs with the updated weights is at least 10 times larger than the loss produced by the NNs with the previous weights (a sketch of this mechanism is given after this list).

¹ By training stabilization we mean that the loss function produces smoother trajectories during training.
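A minimal sketch of such an epoch-level roll-back is shown below, assuming that a snapshot of the encoder/decoder weights from the last accepted epoch is kept, and that `run_epoch` (a placeholder, not from the paper) performs one epoch of training and returns its loss value.

```python
import copy

def train_with_rollback(encoder, decoder, run_epoch, num_epochs=2_000):
    """Epoch-level roll-back: if the loss after an epoch is at least 10x the
    loss obtained with the previous weights, discard that epoch's updates."""
    prev_loss = float("inf")
    snapshot = (copy.deepcopy(encoder.state_dict()),
                copy.deepcopy(decoder.state_dict()))

    for epoch in range(num_epochs):
        loss = run_epoch(encoder, decoder, epoch)
        if loss >= 10.0 * prev_loss:
            # Roll back: restore the weights recorded before this epoch.
            encoder.load_state_dict(snapshot[0])
            decoder.load_state_dict(snapshot[1])
        else:
            # Accept the update; refresh the snapshot and the reference loss.
            snapshot = (copy.deepcopy(encoder.state_dict()),
                        copy.deepcopy(decoder.state_dict()))
            prev_loss = loss
```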
As we observed that the outcome of training is sensitive to the initialization of the random number generators, training is repeated three times with different initialization seeds. For each repetition, we record the final NN weights and the NN weights that produced the smallest loss during training. After training, Link-Level Simulations (LLS) are performed using all the recorded weights. The set of weights that provides the lowest Block Error Rate (BLER) is kept and the others are discarded.
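This selection procedure could be organized as in the following sketch, where `train_once` (returning the final and the lowest-loss weight sets of one training session) and `evaluate_bler` (running the LLS) are placeholders for the actual routines.

```python
import torch

def select_best_weights(train_once, evaluate_bler, seeds=(0, 1, 2)):
    """Repeat training with different RNG seeds, keep the final and the
    lowest-loss weights from each run, and retain the set that gives the
    lowest BLER in link-level simulation."""
    candidates = []
    for seed in seeds:
        torch.manual_seed(seed)                    # different initialization per run
        final_w, lowest_loss_w = train_once()      # weights recorded during training
        candidates += [final_w, lowest_loss_w]

    blers = [evaluate_bler(w) for w in candidates] # LLS with every recorded weight set
    return candidates[blers.index(min(blers))]     # keep the lowest-BLER weights
```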
As described in Subsection 2.3 and illustrated in Fig. 2, the PSG output is normalized so that each coded symbol has zero mean and unit variance. During NN training, normalization subtracts the batch mean from the PSG output and divides the result of the subtraction by the batch standard deviation. After training, encoder calibration is performed in order to compute the mean and the variance of the RNN outputs over a given number of codewords. Calibration is done over 10⁶ codewords in the simulations reported here. In LLS, normalization is done using the mean and variance values computed during calibration.
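The following sketch illustrates this calibration step under simplifying assumptions: scalar mean/variance statistics are estimated (per-symbol statistics would follow the same pattern), and `generate_psg_outputs` is a placeholder that returns the unnormalized encoder outputs for a batch of random messages.

```python
import torch

def calibrate(encoder, generate_psg_outputs, num_codewords=1_000_000,
              batch_size=2_000):
    """After training, estimate the mean and variance of the PSG (RNN) outputs
    over num_codewords codewords."""
    total, total_sq, count = 0.0, 0.0, 0
    with torch.no_grad():
        for _ in range(num_codewords // batch_size):
            x = generate_psg_outputs(encoder, batch_size)
            total += x.sum()
            total_sq += (x ** 2).sum()
            count += x.numel()
    mean = total / count
    var = total_sq / count - mean ** 2
    return mean, var

def normalize(x, mean, var, eps=1e-8):
    """In LLS, normalize the coded symbols with the calibrated statistics
    (during training the batch mean/std are used instead)."""
    return (x - mean) / torch.sqrt(var + eps)
```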
The training strategy for the encoder's codeword and symbol power levels has been optimized empirically. The levels are initialized to unit value and kept constant for a given number of epochs, as starting their training too early produces codes with poor performance. On the other hand, if training of the levels is started too late, they remain close to their initial unit value and therefore produce no benefit. It has been found empirically that starting to train the codeword power levels at epoch 100 and the symbol power levels at epoch 200 provides the best results.
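One way to realize such a delayed start is sketched below, assuming the power levels are trainable parameters whose gradients are simply disabled until their starting epoch; the names `codeword_power` and `symbol_power` and their shapes are illustrative, not from the paper.

```python
import torch

# Stand-ins for the encoder's trainable power levels, initialized to unit value
# and registered with the optimizer together with the other encoder parameters.
codeword_power = torch.nn.Parameter(torch.ones(10), requires_grad=False)
symbol_power = torch.nn.Parameter(torch.ones(3), requires_grad=False)

def update_power_training(epoch):
    """Keep the power levels frozen at their unit initialization, then start
    training codeword-level weights at epoch 100 and symbol-level weights at
    epoch 200, as found empirically."""
    if epoch >= 100:
        codeword_power.requires_grad_(True)
    if epoch >= 200:
        symbol_power.requires_grad_(True)
```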
As suggested in [1], it may be beneficial to perform training with messages that are longer than those used in link-level evaluation, as training with short messages does not produce good codes. According to our observations, training with longer messages, twice the length of the LLS messages, is indeed beneficial; however, this benefit vanishes when training with larger batches. Therefore, in our evaluations the length of the training messages and of the LLS messages is the same.

The above training method produces codes with better performance compared to the method of [1], as the performance evaluations of Section 4 will show. The training parameters are summarized in Table 3.

Table 3 – Training parameters.

  Training parameter                                    Value
  Number of epochs                                       2000
  Number of batches per epoch                              10
  Number of codewords per batch                          2000
  Training message length [bits]                           50
  Starting epoch for codeword-level weights training      100
  Starting epoch for symbol-level weights training        200

4.   PERFORMANCE EVALUATIONS

In this section, we assess the BLER performance of DEF codes and compare it with the performance of the NR LDPC code reported in [12] and with the performance of Deepcode [1] for the same Spectral Efficiency (SE). The SE is defined as the ratio of the number of information bits $K$ over the number of forward-channel time-frequency resources used for transmission of the corresponding codeword. As each time-frequency resource carries a complex symbol, and since each complex symbol is produced by combining two consecutive real symbols, we have
$$
\eta \;\triangleq\; \frac{K}{N_{\mathrm{F}}/2} \;=\; \frac{2K}{N_{\mathrm{F}}} \quad \text{[bits/s/Hz]}, \qquad (32)
$$

where $N_{\mathrm{F}}$ is the number of real-valued symbols transmitted over the forward channel for the corresponding codeword.

The forward-channel and feedback-channel impairments are modeled as AWGN with variances $\sigma^2_{\mathrm{F}} = 1/\mathrm{SNR}_{\mathrm{F}}$ and $\sigma^2_{\mathrm{FB}} = 1/\mathrm{SNR}_{\mathrm{FB}}$, respectively. The training forward SNR and the LLS forward SNR are the same; the feedback channel is noiseless.
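Under these assumptions (unit-power symbols, noise variance equal to the reciprocal of the SNR, and a noiseless feedback link), the channel impairments and the SE of Eq. (32) can be reproduced with a small helper sketch; the function names are illustrative and not part of the paper's simulator.

```python
import torch

def snr_db_to_noise_var(snr_db):
    """Noise variance for a unit-power symbol stream: sigma^2 = 1 / SNR."""
    return 1.0 / (10.0 ** (snr_db / 10.0))

def awgn(x, snr_db):
    """Add white Gaussian noise at the given SNR; a noiseless link (here the
    feedback channel) is modeled by passing snr_db=None."""
    if snr_db is None:
        return x
    sigma = snr_db_to_noise_var(snr_db) ** 0.5
    return x + sigma * torch.randn_like(x)

def spectral_efficiency(k_bits, num_real_symbols):
    """SE = information bits / complex time-frequency resources, with one
    complex resource per two consecutive real symbols, as in Eq. (32)."""
    return k_bits / (num_real_symbols / 2.0)
```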
The set of parameters used in the performance evaluations is shown in Table 4. For DEF code performance evaluations, we show that even the shortest feedback extensions – corresponding to the extension parameters of Table 4