Page 52 - ITU Journal, ICT Discoveries, Volume 3, No. 1, June 2020 Special issue: The future of video and immersive media

of 6 offset vectors of size 64. Matrices and offset vectors of that set, or parts of these matrices and offset vectors, are used for all other block shapes.

In the third step, at the sample positions that were left out in the generation of $\mathrm{pred}_{\mathrm{red}}$, the final prediction signal arises by linear interpolation from $\mathrm{pred}_{\mathrm{red}}$. This linear interpolation is not needed if $W = H = 4$. To describe it, assume without loss of generality that $W \geq H$. One extends the prediction signal to the top by the reconstructed values and writes $\mathrm{pred}_{\mathrm{red}}[x][-1]$ for the first line. Then the signal $\mathrm{pred}^{\mathrm{ups,ver}}_{\mathrm{red}}$ on a block of width $W_{\mathrm{red}}$ and height $2 \cdot H_{\mathrm{red}}$ is given as

$$\mathrm{pred}^{\mathrm{ups,ver}}_{\mathrm{red}}[x][2y+1] = \mathrm{pred}_{\mathrm{red}}[x][y],$$
$$\mathrm{pred}^{\mathrm{ups,ver}}_{\mathrm{red}}[x][2y] = \frac{1}{2}\left(\mathrm{pred}_{\mathrm{red}}[x][y-1] + \mathrm{pred}_{\mathrm{red}}[x][y]\right).$$

The latter process is carried out $k$ times until $2^k \cdot H_{\mathrm{red}} = H$. Next, a horizontal up-sampling operation is applied to the result of the vertical up-sampling. The latter up-sampling operation uses the full boundary left of the prediction signal; see Fig. 8.

Fig. 8 – The final interpolation step for an 8 × 8-block. The second up-sampling operation uses the full boundary.

For each prediction mode generated by $A_i, b_i \in S_{0/1/2}$ with $i > 0$, the transposed prediction mode is also supported. This means that one interchanges $\mathrm{bdry}^{\mathrm{top}}_{\mathrm{red}}$ and $\mathrm{bdry}^{\mathrm{left}}_{\mathrm{red}}$, computes the matrix-vector product and the offset addition as before, and then interchanges the $x$ and the $y$ coordinate in the resulting reduced prediction signal. The up-sampling step is then carried out as before. As a consequence, for blocks of size 4 × 4, a total number of 35 MIP modes is supported. For blocks of size 8 × 4, 4 × 8 and 8 × 8, a total number of 19 MIP modes is supported. For all other block shapes, a total number of 11 MIP modes is supported.

The MIP-prediction mode was signalled using a most probable mode scheme that is based on intra-prediction modes of neighboring blocks, similar to the well-known signalling of conventional intra-prediction modes. The neural network that predicts the conditional probability of an intra-prediction mode out of neighboring reconstructed samples was removed for complexity reasons.

In order to determine the matrices of the MIP-prediction modes, a training algorithm similar to the algorithm outlined in section 3 was used. Here, the constraints given by the input down-sampling, the output up-sampling and the sharing of the predictors across different block shapes were incorporated into the training algorithm.

The MIP-tool gave a compression benefit of −0.79% in luma BD-rate [23, Table 1]. The measured decoder runtime was 99%, which means that MIP did not cause any decoder runtime overhead. The measured encoder runtime overhead was 138%. As for the other variants of our data-driven intra-prediction modes, different trade-offs between compression performance and encoder runtime overhead are possible and were developed subsequently. In this paper, the complexity issue is mainly considered from a decoder perspective. A software reference for the current version of MIP can be found in the document [32].

After its adoption into VVC, several further modifications were performed for the final design of MIP in the current VVC draft international standard [10]. Most importantly, all matrix coefficients of the involved matrices are represented by 8-bit integers and the offset vectors $b_i$ from (3) are set to zero. For an efficient 8-bit implementation, the matrix-vector multiplication $A_i \cdot \mathrm{bdry}_{\mathrm{red}}$ from (3) is replaced by the matrix-vector multiplication

$$\tilde{A}_i \cdot y_{\mathrm{red}} + \mathrm{bdry}_{\mathrm{red}}[0] \cdot \mathbf{1}, \qquad (4)$$

where the vector $y_{\mathrm{red}}$ is defined by

$$y_{\mathrm{red}}[0] = \mathrm{bdry}_{\mathrm{red}}[0] - 2^{B-1},$$
$$y_{\mathrm{red}}[i] = \mathrm{bdry}_{\mathrm{red}}[i] - \mathrm{bdry}_{\mathrm{red}}[0], \quad i > 0.$$

Here, $\mathbf{1}$ denotes the vector of ones and $B$ denotes the bit-depth. Since the entries of $y_{\mathrm{red}}$ are typically smaller than the entries of $\mathrm{bdry}_{\mathrm{red}}$, this modification of the matrix-vector multiplication leads to a smaller impact of the approximation error that arises when one passes from the trained floating point matrices to the 8-bit integer matrices. The result of the matrix-vector multiplication (4) is right-shifted by 6 to generate the final prediction signal. The constant right-shift 6 was achieved by smoothly restricting the dynamic range of the matrix entries already during the training process. Also, several non-normative encoder speedups for MIP were included in the reference software.

6. CONCLUSION

In this paper, several variants of data-driven intra-prediction modes were presented. Such modes can improve the compression efficiency of state-of-the-art video codecs. However, a standard like the emerging Versatile Video Coding is intended both to enable high compression rates and to be implementable on multiple types of consumer devices at moderate complexity and cost. The latter requirement forms a particular challenge for the presented approach since, a priori, the resulting intra-prediction modes are much less structured than conventional ones and thus require a lot of parameters to be stored. As a consequence, architectural constraints that reflect some well-known image processing methods were incorporated into the training and design of the predictors. In particular, sparsification in the transform domain and
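
As an illustration of the modified matrix-vector multiplication (4) from the final MIP design, the following minimal Python sketch builds the vector y_red from a reduced boundary and evaluates the expression in (4) with integer arithmetic. The function names, the sample boundary values and the small matrix `a_tilde` are hypothetical and chosen only for illustration; they are not trained MIP coefficients. The subsequent right-shift by 6 is deliberately not modelled, since its exact rounding is not specified in the text.

```python
def reduced_input(bdry_red, bit_depth):
    """Build y_red from the reduced boundary, following the definition below (4)."""
    first = bdry_red[0]
    # y_red[0] = bdry_red[0] - 2^(B-1)
    y_red = [first - (1 << (bit_depth - 1))]
    # y_red[i] = bdry_red[i] - bdry_red[0] for i > 0
    y_red.extend(value - first for value in bdry_red[1:])
    return y_red


def mip_product(a_tilde, bdry_red, bit_depth):
    """Evaluate (matrix . y_red) + bdry_red[0] . 1 as in (4), without the final shift."""
    y_red = reduced_input(bdry_red, bit_depth)
    first = bdry_red[0]
    return [sum(c * y for c, y in zip(row, y_red)) + first for row in a_tilde]


# Hypothetical 10-bit reduced boundary taken from a smooth image region.
bdry_red = [520, 524, 530, 527]
print(reduced_input(bdry_red, bit_depth=10))  # [8, 4, 10, 7]

# Hypothetical 2 x 4 integer matrix, for illustration only.
a_tilde = [[1, 0, 2, 0], [0, 3, 0, 1]]
print(mip_product(a_tilde, bdry_red, bit_depth=10))  # [548, 539]
```

In this example the boundary samples are around 520, whereas the entries of y_red are at most 10 in magnitude, which is the effect the text relies on to reduce the impact of the 8-bit quantization error.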

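The vertical up-sampling step of MIP described earlier can be sketched in a similar way. The sketch below is only an illustration under stated assumptions: the block is stored as a list of lines, so that `rows[y][x]` holds pred_red[x][y], the list `top` plays the role of the line pred_red[x][−1], the division by 2 is carried out in floating point because the text does not specify a rounding rule, and the target height is assumed to be 2^k times the reduced height. All names are hypothetical.

```python
def double_height(rows, top):
    """One vertical doubling step; rows[y][x] holds pred_red[x][y]."""
    out = []
    previous = top  # plays the role of pred_red[x][-1]
    for row in rows:
        # Even output line 2y: average of lines y-1 and y.
        out.append([(p + c) / 2 for p, c in zip(previous, row)])
        # Odd output line 2y+1: copy of line y.
        out.append(list(row))
        previous = row
    return out


def upsample_vertical(rows, top, height):
    """Repeat the doubling step until the block has `height` lines.

    Assumes height == 2**k * len(rows) for some integer k >= 0.
    """
    while len(rows) < height:
        rows = double_height(rows, top)
    return rows


# Hypothetical reduced prediction of width 2 and height 2, with its top boundary.
top = [100, 100]
rows = [[104, 108], [112, 120]]
for line in upsample_vertical(rows, top, height=4):
    print(line)
```

The horizontal up-sampling that follows in the text would apply the same rule along the other axis, using the full left boundary; it is omitted here.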
© International Telecommunication Union, 2020
