Page 47 - ITU Journal, ICT Discoveries, Volume 3, No. 1, June 2020 Special issue: The future of video and immersive media
DATA-DRIVEN INTRA-PREDICTION MODES IN THE DEVELOPMENT OF THE VERSATILE VIDEO CODING STANDARD
Jonathan Pfaff¹, Philipp Helle¹, Philipp Merkle¹, Michael Schäfer¹, Björn Stallenberger¹, Tobias Hinz¹, Heiko Schwarz¹,², Detlev Marpe¹, Thomas Wiegand¹,³
¹Fraunhofer HHI, Berlin, Germany; ²FU Berlin, Berlin, Germany; ³TU Berlin, Berlin, Germany
Abstract – In this paper, intra-prediction modes for video coding that were designed using data-driven methods are presented. These predictors were incorporated into a test model of the emerging versatile video coding (VVC) standard and yield compression benefits over state-of-the-art intra-prediction tools. However, most use cases for video coding impose severe complexity and memory restrictions, in particular at the decoder side. As data-driven methods typically result in predictors that are described by a large set of parameters and operations, satisfying these constraints turned out to be a difficult task. The purpose of this paper is to outline key steps in the complexity reduction of the trained intra-prediction modes that were discussed in the VVC standardization activity. These simplifications finally led to matrix-based intra-prediction (MIP), which is part of the current VVC draft international standard.
Keywords – Video Coding, intra-prediction.
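The abstract attributes the difficulty to predictors that are "described by a large set of parameters and operations". As a rough, non-authoritative illustration of that point, the sketch below runs a hypothetical multi-layer fully connected predictor that maps boundary samples to an 8x8 block and counts the multiplications per predicted sample, comparing it with a single matrix-vector product. The block size, the 16-sample boundary (8 above, 8 left), the hidden-layer widths, the ReLU activation and the helper names `dense_mults` and `fc_predict` are all assumptions chosen for illustration; they are not the configuration of the trained modes or of MIP.

```python
import numpy as np


def dense_mults(layer_widths):
    """Multiplications for one forward pass through dense layers whose
    input/output widths are listed in order (one multiply per weight)."""
    return sum(a * b for a, b in zip(layer_widths[:-1], layer_widths[1:]))


def fc_predict(boundary, weights, biases):
    """Hypothetical multi-layer fully connected predictor: ReLU on the
    hidden layers, affine output layer producing the block samples."""
    x = boundary
    for i, (W, b) in enumerate(zip(weights, biases)):
        x = W @ x + b
        if i < len(weights) - 1:  # no nonlinearity on the output layer
            x = np.maximum(x, 0.0)
    return x


# Hypothetical 8x8 block predicted from 16 boundary samples (8 above, 8 left).
n_in, n_out = 16, 64
deep = [n_in, 64, 64, n_out]  # assumed hidden widths, not from the paper
linear = [n_in, n_out]        # a single matrix-vector product

# Random weights only to check that the forward pass produces an 8x8 block.
rng = np.random.default_rng(0)
ws = [rng.standard_normal((b, a)) for a, b in zip(deep[:-1], deep[1:])]
bs = [np.zeros(b) for b in deep[1:]]
pred = fc_predict(rng.standard_normal(n_in), ws, bs).reshape(8, 8)

for name, widths in (("multi-layer", deep), ("single matrix", linear)):
    m = dense_mults(widths)
    print(f"{name}: {m} multiplications, {m / n_out:.0f} per sample")
```

With these assumed sizes the script prints 9216 multiplications (144 per sample) for the multi-layer predictor and 1024 (16 per sample) for the single matrix. The further reductions needed to reach the multiplication count of existing intra-prediction modes are not modeled in this sketch.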
1. INTRODUCTION

In recent years, the demand for broadcasting, streaming and storing of video content has significantly increased, but memory and transmission capacities are limited resources. As a consequence, in 2017, a Call for Proposals (CfP) for new video coding technologies with increased compression capabilities compared to state-of-the-art codecs was issued by the Joint Video Experts Team (JVET), [27].

One of the responses to that call was a video codec submitted by Fraunhofer HHI, [2, 20]. This codec has a hybrid block-based design and includes several advanced tools. Some of these advanced concepts were contained in the Joint Exploration Model (JEM) developed by the JVET [11], while others were newly proposed. Among these newly proposed tools were intra-prediction modes that were designed as the outcome of a training experiment based on a large set of training data. These intra-prediction modes provide significant coding gains over state-of-the-art video coding technologies. They are represented by fully connected neural networks with several layers.

After the responses to the CfP were received, experts of the JVET collaboratively initiated a standardization process for a new video coding standard called versatile video coding (VVC), [19]. The target was a standard that enables substantial compression benefits compared to existing technologies, in particular for the emerging scenario of coding UHD or HDR content. In the VVC standardization activity, individual coding tools with promising compression performance were investigated by the JVET within so-called core experiments. Among these tools were the aforementioned data-driven intra-prediction modes.

In the course of their investigation, several modifications of the initially proposed intra-prediction modes were developed, mainly targeting a reduction in complexity. The final variant, called matrix-based intra-prediction (MIP), represents a low-complexity version. MIP has a small memory requirement and does not increase the number of multiplications in comparison to existing intra-prediction modes. It was included into working draft 5 of the VVC standard at the 14th JVET meeting in Geneva in March 2019, [9].

Recently, several interesting machine-learning based approaches to image compression have been developed. Without aiming at completeness, we mention the work of Ballé et al. [3], [4], Agustsson, Mentzer et al. [1], [18], Minnen et al. [17], Rippel et al. [24], Theis et al. [30] and Toderici et al. [31]. In these approaches, image compression systems were designed which neither follow a block-based approach nor use intra-prediction in the traditional sense. Rather, they extract several features from the input image via a convolutional neural network. These features are quantized into symbols and then transmitted in the bitstream. The decoder reconstructs the image by a deconvolutional neural network which is applied to the dequantized symbols. Parts of this network might also be used in an arithmetic coding engine to model conditional probabilities of coded symbols. The parameters of the neural networks are obtained from a large set of training data.

In our work, we used machine-learning techniques to develop a compression tool which still fits into a hybrid block-based architecture. Such an architecture is used in many existing video codecs like advanced video coding (AVC) [12, 33] or high efficiency video coding (HEVC) [13, 29] and also forms the basis of the emerging VVC [8]. Within this architecture, our intra-prediction modes simply replace or complement the classical intra-prediction
© International Telecommunication Union, 2020 25