Page 48 - ITU Journal, ICT Discoveries, Volume 3, No. 1, June 2020 Special issue: The future of video and immersive media




modes which are already used in traditional video codecs. Other components of the surrounding video codec, like block-partitioning or transform and residual-coding, are not altered by our method.

This paper is organized as follows. In section 2, we describe the general setup for designing data-driven intra-prediction modes. In section 3, we depict their realization by fully connected neural-networks. In section 4, a simplification of the neural-networks via prediction into the transform domain is outlined. MIP is described in section 5. In the final section 6, some conclusions are drawn.

2.  DATA-DRIVEN DESIGN OF INTRA-PREDICTION MODES

In typical block-based hybrid video codecs, predictive coding is used. Thus, when a receiver of a video signal wants to reconstruct the content of a transmitted video on a given block, it generates a prediction signal out of information that is already available. This prediction signal serves as a first approximation of the video signal to be reconstructed. In a second step, a prediction residual is added to generate the reconstructed video signal. This prediction residual needs to be transmitted in the bitstream, and thus the quality of the prediction signal greatly influences the compression efficiency.

There are two methods to generate a prediction signal: inter- and intra-picture prediction. In the case of inter-picture prediction, the prediction signal is generated by motion-compensated prediction, where already decoded video frames which are different from the current frame serve as the input.

Conversely, in the case of intra-prediction, the prediction signal is generated out of already reconstructed sample values that belong to the same frame and are typically spatially adjacent to the current block. Thus, as shown in Fig. 1, the input for intra-prediction consists of the reconstructed samples $r$ above and left of a block of samples to be predicted.

Fig. 1 – Intra-prediction on a single block. In principle, all reconstructed samples are available.

In conventional video codecs like HEVC and also in the JEM, the intra-prediction signal is generated either by angular prediction or by the DC and planar modes. The angular prediction modes copy the already reconstructed sample values on the lines to the left of and above the block along a specific direction that is parametrized by an angular parameter. Here, for fractional angular positions, an interpolation filtering is applied to the reference samples. The DC mode generates a constant prediction signal that corresponds to the mean sample value of the adjacent samples, while the planar mode interpolates between a prediction along the horizontal and the vertical direction. In the JEM, an additional post-filtering step, called position dependent prediction combination, PDPC [25], is optionally applied to the intra-prediction signal.

In our approach to intra-prediction, we tried to design $n$ more general intra-prediction modes using data-driven methods. A priori, it was only assumed that the $i$-th intra-prediction mode should generate the prediction signal $\mathrm{pred}_i$ as

$$\mathrm{pred}_i = F_i(r; \theta_i); \tag{1}$$

see Fig. 2. Here, the function $F_i$ is a predefined function which, however, depends on parameters $\theta_i$ that are determined in a training algorithm using a large set of training data. Note that when the prediction is used in the final codec, the parameters $\theta_i$ are fixed. For their determination, we developed a training algorithm that tries to simulate several aspects of modern video codecs. When executing it, we applied recent machine learning techniques like [15]. Key parts of our training algorithm are independent of the specific form of the prediction function $F_i$.

Fig. 2 – Design of intra-prediction modes with fixed function $F_i$ and its trained parameters $\theta_i$. The index $i$ is transmitted.

A central problem one faces in the above design of more flexible intra-prediction modes is their complexity in comparison to the traditional intra-prediction techniques described above. The reason is that, since the optimal form of the intra-prediction modes in (1) is unknown, a rather large capacity of the neural-networks is assumed, by which a larger set of functions can be approximated. In the VVC standardization process, the complexity of the prediction modes was assessed in two ways. First, the complexity of executing the function $F_i$ was taken into account. This complexity can be measured, for example, in number of multiplications per sample or in terms of decoder runtime. Second, the memory requirement, i.e. the size of the parameters $\theta_i$ which need to be stored, turned out to be a very important aspect for a complexity evaluation of the method. In the sequel, intra-prediction modes
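
To make equation (1) and the two complexity measures concrete, the following is a minimal sketch, not the method of this paper. It assumes, purely for illustration, that $F_i$ is a single affine map with parameters $\theta_i = (A, b)$; the networks actually used are described in later sections. The function names `dc_mode`, `affine_mode`, `complexity` and `reconstruct` are hypothetical.

```python
# Illustrative sketch only. ASSUMPTION: F_i is a single affine map
# F_i(r; theta_i) = A r + b. All function names here are hypothetical.

def dc_mode(r, num_samples):
    """Conventional DC mode: constant prediction equal to the mean of r."""
    mean = sum(r) / len(r)
    return [mean] * num_samples

def affine_mode(r, theta):
    """pred_i = F_i(r; theta_i) with theta_i = (A, b), fixed after training."""
    A, b = theta
    return [sum(a * x for a, x in zip(row, r)) + off
            for row, off in zip(A, b)]

def complexity(theta):
    """Return (multiplications per predicted sample, stored parameter count)."""
    A, b = theta
    mults_per_sample = len(A[0])            # one multiply per reference sample
    num_params = len(A) * len(A[0]) + len(b)
    return mults_per_sample, num_params

def reconstruct(pred, residual):
    """Decoder side: reconstructed block = prediction + transmitted residual."""
    return [p + e for p, e in zip(pred, residual)]

if __name__ == "__main__":
    r = [100, 102, 104, 106]                # reference samples above and left
    # Parameters chosen by hand so that the affine mode reproduces the DC mode.
    theta = ([[0.25] * 4 for _ in range(4)], [0.0] * 4)
    print(dc_mode(r, 4))
    print(affine_mode(r, theta))
    print(complexity(theta))
    print(reconstruct(affine_mode(r, theta), [-3, -1, 1, 3]))
```

Even in this toy setting, the parameter count grows with the product of the number of reference samples and the number of predicted samples, which suggests why the memory requirement for $\theta_i$ weighs heavily in a complexity evaluation.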





© International Telecommunication Union, 2020