ITU Journal, ICT Discoveries, Volume 3, No. 1, June 2020 Special issue: The future of video and immersive media





of 6 offset vectors of size 64. Matrices and offset vectors of that set, or parts of these matrices and offset vectors, are used for all other block shapes.

In the third step, at the sample positions that were left out in the generation of $\mathrm{pred}_{\mathrm{red}}$, the final prediction signal arises by linear interpolation from $\mathrm{pred}_{\mathrm{red}}$. This linear interpolation is not needed if $W = H = 4$. To describe it, assume without loss of generality that $W \geq H$. One extends the prediction signal to the top by the reconstructed values and writes $\mathrm{pred}_{\mathrm{red}}[x][-1]$ for this first line. Then the signal $\mathrm{pred}_{\mathrm{red}}^{\mathrm{ups,ver}}$ on a block of width $W_{\mathrm{red}}$ and height $2 \cdot H_{\mathrm{red}}$ is given as

$$
\begin{aligned}
\mathrm{pred}_{\mathrm{red}}^{\mathrm{ups,ver}}[x][2y+1] &= \mathrm{pred}_{\mathrm{red}}[x][y],\\
\mathrm{pred}_{\mathrm{red}}^{\mathrm{ups,ver}}[x][2y] &= \tfrac{1}{2}\left(\mathrm{pred}_{\mathrm{red}}[x][y-1] + \mathrm{pred}_{\mathrm{red}}[x][y]\right).
\end{aligned}
$$

This vertical up-sampling step is carried out $k$ times, until $2^k \cdot H_{\mathrm{red}} = H$. Next, a horizontal up-sampling operation is applied to the result of the vertical up-sampling. This second up-sampling operation uses the full boundary to the left of the prediction signal; see Fig. 8.

Fig. 8 – The final interpolation step for an 8 × 8-block. The second up-sampling operation uses the full boundary.

For each prediction mode generated by $A_i, b_i \in S_{0/1/2}$ with $i > 0$, the transposed prediction mode is also supported. This means that one interchanges $\mathrm{bdry}_{\mathrm{red}}^{\mathrm{top}}$ and $\mathrm{bdry}_{\mathrm{red}}^{\mathrm{left}}$, computes the matrix-vector product and the offset addition as before, and then interchanges the $x$ and $y$ coordinates in the resulting reduced prediction signal. The up-sampling step is then carried out as before.

As a consequence, a total of 35 MIP modes is supported for blocks of size 4 × 4. For blocks of size 8 × 4, 4 × 8 and 8 × 8, a total of 19 MIP modes is supported. For all other block shapes, a total of 11 MIP modes is supported.

The MIP prediction mode was signalled using a most-probable-mode scheme based on the intra-prediction modes of neighboring blocks, similar to the well-known signalling of conventional intra-prediction modes. The neural network that predicts the conditional probability of an intra-prediction mode from neighboring reconstructed samples was removed for complexity reasons.

In order to determine the matrices of the MIP prediction modes, a training algorithm similar to the one outlined in section 3 was used. Here, the constraints given by the input down-sampling, the output up-sampling and the sharing of the predictors across different block shapes were incorporated into the training algorithm.

The MIP tool gave a luma BD-rate change of −0.79%, i.e., a luma bit-rate saving of 0.79% [23, Table 1]. The measured decoder runtime was 99%, which means that MIP did not cause any decoder runtime overhead. The measured encoder runtime was 138%, corresponding to an encoder runtime overhead of 38%. As for the other variants of our data-driven intra-prediction modes, different trade-offs between compression performance and encoder runtime overhead are possible and were developed subsequently. In this paper, the complexity issue is mainly considered from a decoder perspective. A software reference for the current version of MIP can be found in document [32].

After its adoption into VVC, several further modifications were made to arrive at the final design of MIP in the current VVC Draft International Standard [10]. Most importantly, all coefficients of the involved matrices are represented by 8-bit integers, and the offset vectors $b_i$ from (3) are set to zero. For an efficient 8-bit implementation, the matrix-vector multiplication $A_i \cdot \mathrm{bdry}_{\mathrm{red}}$ from (3) is replaced by the matrix-vector multiplication

$$
\tilde{A}_i \cdot y_{\mathrm{red}} + \mathrm{bdry}_{\mathrm{red}}[0] \cdot \mathbf{1}, \tag{4}
$$

where the vector $y_{\mathrm{red}}$ is defined by

$$
\begin{aligned}
y_{\mathrm{red}}[0] &= \mathrm{bdry}_{\mathrm{red}}[0] - 2^{B-1},\\
y_{\mathrm{red}}[i] &= \mathrm{bdry}_{\mathrm{red}}[i] - \mathrm{bdry}_{\mathrm{red}}[0], \quad i > 0.
\end{aligned}
$$

Here, $\mathbf{1}$ denotes the vector of ones and $B$ denotes the bit depth. Since the entries of $y_{\mathrm{red}}$ are typically smaller in magnitude than the entries of $\mathrm{bdry}_{\mathrm{red}}$, this modification of the matrix-vector multiplication reduces the impact of the approximation error that arises when passing from the trained floating-point matrices to the 8-bit integer matrices. The matrix-vector product in (4) is right-shifted by 6 to generate the final prediction signal. The constant right-shift of 6 was achieved by smoothly restricting the dynamic range of the matrix entries already during the training process. In addition, several non-normative encoder speedups for MIP were included in the reference software.

6. CONCLUSION

In this paper, several variants of data-driven intra-prediction modes were presented. Such modes can improve the compression efficiency of state-of-the-art video codecs. However, a standard like the emerging Versatile Video Coding is targeted both to enable high compression rates and to be implementable on multiple types of consumer devices at moderate complexity and cost. The latter requirement forms a particular challenge for the presented approach since, a priori, the resulting intra-prediction modes are much less structured than conventional ones and thus require a large number of parameters to be stored. As a consequence, architectural constraints that reflect some well-known image processing methods were incorporated into the training and design of the predictors. In particular, sparsification in the transform domain and





© International Telecommunication Union, 2020