Page 48 - ITU Journal, ICT Discoveries, Volume 3, No. 1, June 2020 Special issue: The future of video and immersive media
modes which are already used in traditional video codecs. Other components of the surrounding video codec, such as block partitioning, transform, and residual coding, are not altered by our method.

This paper is organized as follows. In section 2, we describe the general setup for designing data-driven intra-prediction modes. In section 3, we present their realization by fully connected neural networks. In section 4, a simplification of the neural networks via prediction into the transform domain is outlined. MIP is described in section 5. Finally, some conclusions are drawn in section 6.

2. DATA-DRIVEN DESIGN OF INTRA-PREDICTION MODES

Typical block-based hybrid video codecs use predictive coding. When a receiver of a video signal wants to reconstruct the content of a transmitted video on a given block, it first generates a prediction signal from information that is already available. This prediction signal serves as a first approximation of the video signal to be reconstructed. In a second step, a prediction residual is added to generate the reconstructed video signal. This residual needs to be transmitted in the bitstream; thus, the quality of the prediction signal greatly influences the compression efficiency.

There are two methods to generate a prediction signal: inter- and intra-picture prediction. In the case of inter-picture prediction, the prediction signal is generated by motion-compensated prediction, where already decoded video frames other than the current frame serve as the input.

Conversely, in the case of intra-prediction, the prediction signal is generated from already reconstructed sample values that belong to the same frame and are typically spatially adjacent to the current block. Thus, as shown in Fig. 1, the input for intra-prediction consists of the reconstructed samples $r$ above and to the left of the block of samples to be predicted.

Fig. 1 – Intra-prediction on a single block. In principle, all reconstructed samples are available.

In conventional video codecs like HEVC, and also in the JEM, the intra-prediction signal is generated either by angular prediction or by the DC and planar modes. The angular prediction modes copy the already reconstructed sample values on the lines to the left of and above the block along a specific direction that is parametrized by an angular parameter. For fractional angular positions, an interpolation filter is applied to the reference samples. The DC mode generates a constant prediction signal that corresponds to the mean sample value of the adjacent samples, while the planar mode interpolates between predictions along the horizontal and the vertical direction. In the JEM, an additional post-filtering step, called position dependent prediction combination (PDPC) [25], is optionally applied to the intra-prediction signal.

In our approach to intra-prediction, we tried to design more general intra-prediction modes using data-driven methods. A priori, it was only assumed that the $i$-th intra-prediction mode should generate the prediction signal $\mathrm{pred}_i$ as

$$\mathrm{pred}_i = F_i(r;\theta_i), \tag{1}$$

see Fig. 2. Here, $F_i$ is a predefined function which, however, depends on parameters $\theta_i$ that are determined by a training algorithm using a large set of training data. Note that when the prediction is used in the final codec, the parameters $\theta_i$ are fixed. To determine them, we developed a training algorithm that tries to simulate several aspects of modern video codecs. In this training, we applied recent machine learning techniques such as [15]. Key parts of our training algorithm are independent of the specific form of the prediction function $F_i$.

Fig. 2 – Design of intra-prediction modes with a fixed function $F_i$ and its trained parameters $\theta_i$. The index $i$ is transmitted.

A central problem one faces in the above design of more flexible intra-prediction modes is their complexity in comparison to the traditional intra-prediction techniques described above. The reason is that, since the optimal form of the intra-prediction modes in (1) is unknown, the neural networks are given a rather large capacity so that a larger set of functions can be approximated. In the VVC standardization process, the complexity of the prediction modes was assessed in two ways. First, the complexity of executing the function $F_i$ was taken into account. This complexity can be measured, for example, in the number of multiplications per sample or in terms of decoder runtime. Second, the memory requirement, i.e., the size of the parameters $\theta_i$ that need to be stored, turned out to be a very important aspect of the complexity evaluation of the method. In the sequel, intra-prediction modes
26 © International Telecommunication Union, 2020