Page 87 - ITU Journal, ICT Discoveries, Volume 3, No. 1, June 2020 Special issue: The future of video and immersive media






             A STUDY OF THE EXTENDED PERCEPTUALLY WEIGHTED PEAK SIGNAL-TO-NOISE RATIO
             (XPSNR) FOR VIDEO COMPRESSION WITH DIFFERENT RESOLUTIONS AND BIT DEPTHS

Christian R. Helmrich¹, Sebastian Bosse¹, Heiko Schwarz¹˒², Detlev Marpe¹, and Thomas Wiegand¹˒³

¹ Video Coding and Analytics Department, Fraunhofer Heinrich Hertz Institute (HHI), Berlin, Germany
² Institute for Computer Science, Free University of Berlin, Germany
³ Image Communication Group, Technical University of Berlin, Germany

Abstract – Fast and accurate estimation of the visual quality of compressed video content, particularly for quality-of-experience (QoE) monitoring in video broadcasting and streaming, has become important. Given the relatively poor performance of the well-known peak signal-to-noise ratio (PSNR) for such tasks, several video quality assessment (VQA) methods have been developed. In this study, the authors’ own recent work on an extension of the perceptually weighted PSNR, termed XPSNR, is analyzed in terms of its suitability for objectively predicting the subjective quality of videos with different resolutions (up to UHD) and bit depths (up to 10 bits/sample). Performance evaluations on various subjective-MOS annotated video databases and investigations of the computational complexity in comparison with state-of-the-art VQA solutions like VMAF and (MS-)SSIM confirm the merit of the XPSNR approach. The use of XPSNR as a reference model for visually motivated control of the bit allocation in modern video encoders for, e.g., HEVC and VVC is outlined as well.

Keywords – Data compression, HD, HEVC, PSNR, QoE, SSIM, UHD, video coding, VMAF, VQA, VVC, WPSNR


1.  INTRODUCTION

The consumption of compressed digital video content via over-the-air broadcasting or Internet Protocol (IP) based streaming services is steadily increasing. This, in turn, leads to a rapid increase in the amount of content distributed using these services. Thus, it is desirable to make use of schemes for automated monitoring of the instantaneous fidelity of the distributed audio-visual signals in order to maintain a certain degree of quality of service (QoS) or, as pursued more recently, quality of experience (QoE) [1]. With regard to the video signal part, such monitoring is realized by way of automated video quality assessment (VQA) algorithms which analyze each distributed moving-picture sequence frame-by-frame with the objective of providing a frame-wise or scene-wise estimate of the subjective visual quality of the tested video, as it would be perceived by a group of human observers. Full-reference VQA methods are generally employed, which means that the distributed video (here, the coded and decoded signal) is evaluated in relation to the spatio-temporally synchronized, uncoded reference video. In other words, the reference video represents the input sequence to the video encoder while the distributed video is the output sequence of the video decoder, as illustrated in Fig. 1.

Given the well-known inaccuracy of the peak signal-to-noise ratio (PSNR) in predicting an average subjective judgment of perceptual coding quality [2] for a specific codec (coder-decoder) c and image or video stimulus (or simply, signal) s, various better performing models have been devised over recent years. The most widely employed are the structural similarity measure (SSIM) [3] and its multiscale extension, MS-SSIM [4], as well as a more recently proposed video multimethod assessment fusion (VMAF) approach combining several other measures using a support vector machine [5]. Further VQA metrics worth noting are [6]–[9], which account for frequency dependence in the human visual system.

Although VMAF was found to be a feasible tool for the evaluation of video coding technology [10], [11], its use for direct encoder control is challenging since it is not differentiable [12]. Furthermore, VMAF currently does not allow for local quality prediction below frame level and, owing to its reliance on several other VQA calculations, is quite complex computationally. The aspect of relatively high complexity is shared by the approach of [6]–[8], utilizing block-wise 8×8 DCTs. However, low-complexity reliable metrics which avoid the use of DCT or multiscale algorithms and which can easily be integrated into video encoders as a control model for visually optimized bit allocation purposes, as is the case with PSNR and SSIM based approaches, are highly desirable.
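
To make the full-reference setup described in the introduction concrete, the following minimal Python sketch computes the conventional PSNR between an uncoded reference frame and its decoded counterpart. It is an illustration only and is not the XPSNR studied in this paper: the helper name `psnr`, the use of NumPy, and the synthetic test frames are assumptions made for this example. The peak value is taken as 2^bit_depth - 1, so the same function covers 8-bit and 10-bit content.

```python
import numpy as np


def psnr(reference, decoded, bit_depth=8):
    """Conventional full-reference PSNR in dB for one frame (hypothetical helper)."""
    ref = np.asarray(reference, dtype=np.float64)
    dec = np.asarray(decoded, dtype=np.float64)
    if ref.shape != dec.shape:
        raise ValueError("reference and decoded frames must have the same shape")

    # Mean squared error between reference (encoder input) and decoder output.
    mse = np.mean((ref - dec) ** 2)
    if mse == 0.0:
        return float("inf")  # identical frames: no distortion

    # Peak sample value for the given bit depth, e.g. 255 for 8 bits, 1023 for 10 bits.
    peak = float((1 << bit_depth) - 1)
    return 10.0 * np.log10(peak * peak / mse)


# Synthetic 8-bit frames (assumed data, for illustration only).
rng = np.random.default_rng(0)
reference = rng.integers(0, 256, size=(64, 64))
decoded = np.clip(reference + rng.integers(-2, 3, size=(64, 64)), 0, 255)
print(f"PSNR: {psnr(reference, decoded):.2f} dB")
```

Because this measure weights every squared sample error equally, it carries no model of visual perception, which is the shortcoming that SSIM, VMAF, and perceptually weighted PSNR variants aim to address.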

[Figure 1: block diagram. Distributor side: in, Video Encoder, Video Decoder, VQA System. Consumer side: Video Decoder, out.]
Fig. 1 – Location of automatic VQA on the video distribution side.

1.1  Prior work by the authors

In JVET-H0047 [13] the authors proposed a block-wise perceptually weighted distortion model as an improve-




© International Telecommunication Union, 2020