Page 51 - ITU Journal, ICT Discoveries, Volume 3, No. 1, June 2020 Special issue: The future of video and immersive media


all MIP-prediction modes is equal to 8 kilobytes. This corresponds to a memory reduction by a factor of 1000 and by a factor of 100 in comparison to the methods of section 3 and section 4, respectively. The key idea for meeting the two aforementioned complexity constraints is to use down-sampling and up-sampling operations in the domain of the prediction input and output.

For predicting the samples of a $W \times H$-block, with $W$ and $H$ integer powers of two between 4 and 64, MIP takes as input one line of $H$ reconstructed neighboring boundary samples to the left of the block and one line of $W$ such samples above it. Then, the prediction signal is generated using the following three steps, which are also summarized in Figure 6:

 1. From the boundary samples, four samples in the case $W = H = 4$ and eight samples otherwise are extracted by averaging.
 2. A matrix-vector multiplication, followed by addition of an offset, is carried out with the averaged samples as an input. The result is a reduced prediction signal on a subsampled set of samples in the original block.
 3. The prediction signal at the remaining positions is generated from the prediction signal on the subsampled set by linear interpolation.

Fig. 6 – The flowchart of matrix-based intra-prediction for a W × H-block.

The averaging step on the boundary, which is performed for all MIP-modes, could be interpreted as a low-complexity version of the joint feature extraction that was part of the neural-network-based intra-prediction; see section 3. Moreover, one could rephrase the linear interpolation step by saying that each MIP-mode predicts into the transform domain of the (5, 3)-wavelet transform, where only low subbands are predicted to be non-zero. Thus, conceptually, this part of MIP-prediction is similar to the prediction into the DCT-domain described in the previous section 3. However, note that for the predictors predicting into the DCT-domain, not all high-frequency components of the prediction signal were set to zero; rather, a more flexible sparsity pattern was used, whereas the MIP-predictors are constrained to generate only low-pass signals.

We now describe each of the three steps in the MIP prediction in more detail. In the first step, the top and left input boundaries $\mathrm{bdry}^{\mathrm{top}}$ and $\mathrm{bdry}^{\mathrm{left}}$ are reduced to smaller boundaries $\mathrm{bdry}^{\mathrm{top}}_{\mathrm{red}}$ and $\mathrm{bdry}^{\mathrm{left}}_{\mathrm{red}}$. Here, $\mathrm{bdry}^{\mathrm{top}}_{\mathrm{red}}$ and $\mathrm{bdry}^{\mathrm{left}}_{\mathrm{red}}$ both consist of 2 samples in the case of a $4 \times 4$-block and both consist of 4 samples in all other cases.

In the case of a $4 \times 4$-block, for $0 \le i < 2$, one defines

$$
\mathrm{bdry}^{\mathrm{top}}_{\mathrm{red}}[i] = \frac{1}{2}\left(\mathrm{bdry}^{\mathrm{top}}[2i] + \mathrm{bdry}^{\mathrm{top}}[2i + 1]\right).
$$

In all other cases, if the block-width $W$ is given as $W = 4 \cdot 2^{n}$, for $0 \le i < 4$ one defines

$$
\mathrm{bdry}^{\mathrm{top}}_{\mathrm{red}}[i] = \frac{1}{2^{n}} \sum_{j=0}^{2^{n}-1} \mathrm{bdry}^{\mathrm{top}}[2^{n} \cdot i + j].
$$

The reduced left boundary $\mathrm{bdry}^{\mathrm{left}}_{\mathrm{red}}$ is defined analogously. The two boundaries $\mathrm{bdry}^{\mathrm{top}}_{\mathrm{red}}$ and $\mathrm{bdry}^{\mathrm{left}}_{\mathrm{red}}$ are concatenated to form the reduced boundary

$$
\mathrm{bdry}_{\mathrm{red}} = \left[\mathrm{bdry}^{\mathrm{left}}_{\mathrm{red}},\ \mathrm{bdry}^{\mathrm{top}}_{\mathrm{red}}\right]; \tag{2}
$$

see Fig. 7. It has size 4 for $4 \times 4$ blocks and size 8 elsewhere.

Fig. 7 – The averaging step for an 8 × 8-block. This results in four samples (two in the case of 4 × 4-blocks) along each axis.

In the second step, out of the reduced input vector $\mathrm{bdry}_{\mathrm{red}}$ one generates a reduced prediction signal $\mathrm{pred}_{\mathrm{red}}$. The latter signal is a signal on the downsampled block of width $W_{\mathrm{red}}$ and height $H_{\mathrm{red}}$. Here, $W_{\mathrm{red}}$ and $H_{\mathrm{red}}$ are defined as:

$$
(W_{\mathrm{red}}, H_{\mathrm{red}}) =
\begin{cases}
(4,\ 4) & \text{if } \max(W, H) \le 8, \\
(\min(W, 8),\ \min(H, 8)) & \text{elsewhere.}
\end{cases}
$$

The reduced prediction signal $\mathrm{pred}_{\mathrm{red}}$ of the $i$-th prediction mode is computed by calculating a matrix-vector product and adding an offset:

$$
\mathrm{pred}_{\mathrm{red}} = A_{i} \cdot \mathrm{bdry}_{\mathrm{red}} + b_{i}. \tag{3}
$$

Here, $A_{i}$ is a matrix that has $W_{\mathrm{red}} \cdot H_{\mathrm{red}}$ rows and 4 columns if $W = H = 4$ and 8 columns in all other cases. Moreover, $b_{i}$ is a vector of size $W_{\mathrm{red}} \cdot H_{\mathrm{red}}$.

The matrices and offset vectors needed to generate the prediction signal are taken from three sets $S_{0}$, $S_{1}$, $S_{2}$. The set $S_{0}$ consists of 18 matrices, each of which has 16 rows and 4 columns, and 18 offset vectors of size 16. Matrices and offset vectors of that set are used for blocks of size $4 \times 4$. The set $S_{1}$ consists of 10 matrices, each of which has 16 rows and 8 columns, and 10 offset vectors of size 16. Matrices and offset vectors of that set are used for blocks of sizes $4 \times 8$, $8 \times 4$ and $8 \times 8$. Finally, the set $S_{2}$ consists of 6 matrices, each of which has 64 rows and 8 columns and
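As an illustration of the first step described on this page (boundary averaging followed by the concatenation in (2)), the following is a minimal Python sketch. The function names `average_groups` and `reduce_boundary` are hypothetical, and the sketch assumes that each reduced sample is the plain mean of its group, including the two-sample groups of a 4 × 4 block. It uses floating-point means and leaves out any integer rounding a real codec would apply.

```python
def average_groups(samples, reduced_len):
    """Average consecutive, equally sized groups so that reduced_len values remain."""
    if reduced_len <= 0 or len(samples) % reduced_len != 0:
        raise ValueError("boundary length must be a positive multiple of reduced_len")
    group = len(samples) // reduced_len  # 2 for a 4x4 block, 2**n when the side is 4 * 2**n
    return [
        sum(samples[group * i : group * (i + 1)]) / group
        for i in range(reduced_len)
    ]


def reduce_boundary(bdry_top, bdry_left):
    """Return bdry_red = [bdry_red_left, bdry_red_top], following equation (2)."""
    is_4x4 = len(bdry_top) == 4 and len(bdry_left) == 4
    per_side = 2 if is_4x4 else 4  # reduced samples kept on each side
    return average_groups(bdry_left, per_side) + average_groups(bdry_top, per_side)


# 8x8 block: eight samples per side shrink to four per side, eight in total.
top = [10, 20, 30, 40, 50, 60, 70, 80]
left = [4, 4, 8, 8, 12, 12, 16, 16]
bdry_red = reduce_boundary(top, left)
```

For the 8 × 8 example, `bdry_red` holds the four left averages followed by the four top averages, which matches the size-8 reduced boundary described in the text.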

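The second step, equation (3), is an affine map from the reduced boundary to the reduced prediction signal. The sketch below shows only that arithmetic, under stated assumptions: `predict_reduced` is a hypothetical name, and the matrix and offset in the example are placeholder values chosen for easy checking, not trained MIP coefficients from the sets described above.

```python
def predict_reduced(matrix, offset, bdry_red):
    """Compute pred_red = A_i * bdry_red + b_i for one mode.

    matrix   -- A_i as a list of rows (W_red * H_red rows, 4 or 8 columns)
    offset   -- b_i, one entry per row of A_i
    bdry_red -- reduced boundary, one entry per column of A_i
    """
    if len(matrix) != len(offset):
        raise ValueError("A_i and b_i must have the same number of rows")
    if any(len(row) != len(bdry_red) for row in matrix):
        raise ValueError("each row of A_i needs one coefficient per boundary sample")
    return [
        sum(a * x for a, x in zip(row, bdry_red)) + b
        for row, b in zip(matrix, offset)
    ]


# 4x4 block: 16 rows, 4 columns. Placeholder coefficients, NOT real MIP data.
A_demo = [[0.25, 0.25, 0.25, 0.25] for _ in range(16)]
b_demo = [1.0] * 16
pred_red = predict_reduced(A_demo, b_demo, [4.0, 8.0, 12.0, 16.0])
```

With these placeholder values every one of the 16 reduced samples equals the mean of the boundary plus the offset. A trained matrix would instead give each row its own weights, one row per position of the reduced block.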

                                             © International Telecommunication Union, 2020                    29