Page 47 - ITU Journal, ICT Discoveries, Volume 3, No. 1, June 2020 Special issue: The future of video and immersive media








          DATA-DRIVEN INTRA-PREDICTION MODES IN THE DEVELOPMENT OF THE VERSATILE VIDEO
                                                  CODING STANDARD

          Jonathan Pfaff¹, Philipp Helle¹, Philipp Merkle¹, Michael Schäfer¹, Björn Stallenberger¹, Tobias Hinz¹, Heiko Schwarz¹,²,
                                             Detlev Marpe¹, Thomas Wiegand¹,³
                      ¹ Fraunhofer HHI, Berlin, Germany; ² FU Berlin, Berlin, Germany; ³ TU Berlin, Berlin, Germany
          Abstract – In this paper, intra-prediction modes for video coding that were designed using data-driven methods are
          presented. These predictors were incorporated into a test model of the emerging versatile video coding (VVC) standard and
          yield a compression benefit over state-of-the-art intra-prediction tools. However, most use cases for video coding impose
          severe complexity and memory restrictions, in particular at the decoder side. As data-driven methods typically result in
          predictors that are described by a large set of parameters and operations, satisfying these constraints turned out to be a
          difficult task. The purpose of this paper is to outline key steps in the complexity reduction of the trained
          intra-prediction modes that were discussed in the VVC standardization activity. These simplifications finally led to
          matrix-based intra-prediction (MIP), which is part of the current VVC draft international standard.
          Keywords  –  Video Coding, intra-prediction.

          1.  INTRODUCTION

          In recent years, the demand for broadcasting, streaming and storing of video content has significantly increased,
          but memory and transmission capacities are limited resources. As a consequence, in 2017, a Call for Proposals (CfP)
          for new video coding technologies with increased compression capabilities compared to state-of-the-art codecs was
          issued by the Joint Video Experts Team (JVET) [27].

          One of the responses given to that call was a video codec submitted by Fraunhofer HHI [2, 20]. This codec has a
          hybrid block-based design and includes several advanced tools. Some of these advanced concepts were contained in the
          Joint Exploration Model (JEM) developed by the JVET [11], while others were newly proposed. Among these newly
          proposed tools were intra-prediction modes that were designed as the outcome of a training experiment based on a
          large set of training data. These intra-prediction modes provide significant coding gains over state-of-the-art
          video coding technologies. They are represented by fully connected neural networks with several layers.

          After the results of the CfP were received, experts of the JVET collaboratively initiated a standardization process
          for a new video coding standard called versatile video coding (VVC) [19]. The target was a standard that enables
          substantial compression benefits compared to existing technologies, in particular within the emerging scenario of
          coding UHD or HDR content. In the VVC standardization activity, individual coding tools with promising compression
          performance were investigated by the JVET within so-called core experiments. Among these tools were the
          aforementioned data-driven intra-prediction modes.

          In the course of their investigation, several modifications of the initially proposed intra-prediction modes, which
          mainly target a complexity reduction, were developed. The final variant, called matrix-based intra-prediction (MIP),
          represents a low-complexity version. MIP has a small memory requirement and does not increase the number of
          multiplications in comparison to existing intra-prediction modes. It was included in working draft 5 of the VVC
          standard at the 14th JVET meeting in Geneva in March 2019 [9].

          Recently, several interesting machine-learning-based approaches to image compression have been developed. Without
          aiming at completeness, we mention the work of Ballé et al. [3], [4], Agustsson, Mentzer et al. [1], [18], Minnen
          et al. [17], Rippel et al. [24], Theis et al. [30] and Toderici et al. [31]. In these approaches, image compression
          systems were designed which do not use a block-based approach and which do not use intra-prediction in a traditional
          sense. Rather, they extract several features from the input image via a convolutional neural network. These features
          are quantized into symbols and then transmitted in the bitstream. The decoder reconstructs the image by a
          deconvolutional neural network which is applied to the dequantized symbols. Parts of this network might also be used
          in an arithmetic coding engine to model conditional probabilities of coded symbols. The parameters of the neural
          networks are obtained on a large set of training data.

          In our work, we used machine-learning techniques to develop a compression tool which still fits into a hybrid
          block-based architecture. Such an architecture is used in many existing video codecs like advanced video coding
          (AVC) [12, 33] or high efficiency video coding (HEVC) [13, 29] and also forms the basis of the emerging VVC [8].
          Within this architecture, our intra-prediction modes simply replace or complement the classical intra-prediction





                                             © International Telecommunication Union, 2020