Page 101 - ITU Journal, ICT Discoveries, Volume 3, No. 1, June 2020 Special issue: The future of video and immersive media




The meaning of “group of frames” (GOF) in AV1 can be compared to “group of pictures” in the other coding solutions. The GOF size was fixed to 16, as in the JVET CTC. This was verified by instrumenting the libaom decoder to output the quantization parameter per picture. A similar type of quantization parameter offset appears per type of picture in a hierarchical GOF structure of 16 in libaom, as in the JVET CTC configurations. The results of this instrumentation are reported in [10].

In order to compute the gain (BD-rate [11]) of libaom with regard to the HM, the “cq-level” was fixed at 32, 40, 48 and 56, providing the range of bit rates used to compute the VTM and ETM gains.

4.2.3 Encoding parameters

The difference between the two test scenarios lies in the parameters kf-min-dist and kf-max-dist, which are the minimum and maximum distances between key frames (intra frames). For the Broadcast scenario both values are set equal to the number of frames in an integer number of GOPs of length below 1.1 seconds, while for the Streaming scenario both values are set equal to the number of frames in an integer number of GOPs of length below 2.2 seconds.

The libaom software has a parameter that sets the encoding speed, which varies inversely with the quality of the encoding algorithm. This parameter, called “cpu-used”, is fixed to 0, meaning the best encoding quality but also the slowest encoding time.

5.   RESULTS

5.1  Objective quality

In the following tables the results are provided considering four different objective metrics.

PSNR (Peak Signal-to-Noise Ratio) is calculated as:

$$\mathrm{PSNR} = 10 \log_{10}\left(\mathrm{Max}^2 / \mathrm{MSE}\right) \qquad (1)$$

with MSE being the mean square error between the source and the decoded pictures, and Max the peak sample value of the content. PSNR is computed separately for the three components of each picture, then averaged across all pictures of a sequence (PSNR_Y, PSNR_U, PSNR_V).

In order to get an easier interpretation of the measured performance, a weighted sum of the PSNR on the three components Y (luma), U and V (chroma), for the complete sequence, is used as the first objective metric:

$$\mathrm{PSNR}_{YUV} = (6 \times \mathrm{PSNR}_Y + \mathrm{PSNR}_U + \mathrm{PSNR}_V) / 8 \qquad (2)$$

This formula is commonly used as a global performance metric by video compression experts (see [3], [12], [14], [17]).

Additional results are provided with PSNR_Y and two other well-known full-reference objective metrics (computed against the source pictures), which include some subjective factors:

•   Multi-Scale Structural Similarity (MS-SSIM) [15],
•   Video Multi-method Assessment Fusion (VMAF) [16].

As these two metrics use only the luma component, they have to be compared with PSNR_Y. For each scenario the results are given in four tables corresponding to the four objective metrics: PSNR_YUV, PSNR_Y, VMAF and MS-SSIM. The tables below show the performance according to the “Bjøntegaard Delta-Rate” (BD-rate) metric (see [11] and [12]) in percentages with regard to the HM. It measures the bit-rate reduction provided by each solution at the same quality, here for each of the four measures. The rate change is computed as the average percentage difference in rate over a range of Quantization Parameters (QP). A negative percentage represents a gain relative to the HM. The results are given as the average value over all the sequences and are also split per picture size: UHD, HD, WVGA, WQVGA. The last line of each table gives the difference between the overall measure of the ETM and libaom and that of the VTM.

5.1.1 Broadcast

Table 1 reports the PSNR_YUV BD-rate variations, compared to the HM, of the three tested solutions. It is observed that the VTM outperforms the other video coding solutions, achieving nearly 42% gain over the HM in the UHD format. The performance of all three solutions increases as the picture size increases. Over all picture resolutions, the ETM performed roughly 14.4% behind the VTM, and the libaom reference encoder 21.3% behind.

The performance of libaom can be discussed in terms of software maturity. Clearly the libaom reference encoder, with two-pass encoding and the encoding algorithm improvements introduced since 2018, is the most mature encoding solution. It has been observed that the encoding runtime of the highest quality configuration (cpu-used=0) has been reduced significantly from version to version. In 2017 [14], libaom in its best two-pass configuration was found to lag significantly behind the HM, by −9.5%. The AV1 specification was
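As an illustration of Equations (1) and (2) in Section 5.1, the following minimal sketch computes the PSNR of one colour plane and the weighted three-component PSNR. The function names are hypothetical, and the peak value is assumed to be 2^bit_depth − 1 (255 for 8-bit content); the measurement tools actually used for the tests may differ in detail.

```python
import math
from typing import Sequence


def psnr(source: Sequence[int], decoded: Sequence[int], bit_depth: int = 8) -> float:
    """PSNR in dB of one colour plane of one picture, following Eq. (1)."""
    if len(source) != len(decoded) or len(source) == 0:
        raise ValueError("planes must be non-empty and of equal size")
    # Assumption: Max is the largest value representable at this bit depth.
    peak = (1 << bit_depth) - 1
    mse = sum((s - d) ** 2 for s, d in zip(source, decoded)) / len(source)
    if mse == 0:
        return math.inf  # identical planes
    return 10.0 * math.log10(peak ** 2 / mse)


def sequence_psnr(per_picture_psnr: Sequence[float]) -> float:
    """Average the per-picture PSNR of one component over a sequence."""
    return sum(per_picture_psnr) / len(per_picture_psnr)


def psnr_yuv(psnr_y: float, psnr_u: float, psnr_v: float) -> float:
    """Weighted PSNR over Y, U and V, following Eq. (2)."""
    return (6.0 * psnr_y + psnr_u + psnr_v) / 8.0
```

The 6:1:1 weighting means that a sequence with PSNR_Y = 40 dB and PSNR_U = PSNR_V = 44 dB is reported at 41 dB, so the combined figure stays close to the luma value.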

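The key-frame distance rule of Section 4.2.3 can be sketched as follows. This is one possible reading of the rule, taking "below" as strictly below and the GOP length as the GOF size of 16; the function name and the frame rates in the usage note are hypothetical and are not taken from the test conditions.

```python
def keyframe_distance(frame_rate: float, gop_size: int = 16,
                      max_seconds: float = 1.1) -> int:
    """Frames in the largest whole number of GOPs lasting less than max_seconds."""
    max_frames = frame_rate * max_seconds
    n_gops = int(max_frames // gop_size)
    # Assumption: "below" is strict, so drop one GOP if the limit is hit exactly.
    if n_gops * gop_size >= max_frames:
        n_gops -= 1
    if n_gops < 1:
        raise ValueError("no whole GOP fits within the duration limit")
    return n_gops * gop_size
```

Under these assumptions, a 50 frames-per-second sequence would give 48 frames for the Broadcast limit (1.1 seconds) and 96 frames for the Streaming limit (2.2 seconds), and the same value would be given to both key-frame distance parameters.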

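To make the BD-rate figures of Section 5.1 easier to interpret, the sketch below follows the classic Bjøntegaard formulation: a cubic polynomial models log-rate as a function of quality for each codec, both curves are averaged over the overlapping quality interval, and the difference is converted to a percentage. It assumes exactly four rate points per curve, so that the cubic passes through all points, and the function names are hypothetical. It is not necessarily the exact tool used to produce the tables; other implementations use piecewise interpolation instead.

```python
import math
from typing import Sequence


def _lagrange(xs: Sequence[float], ys: Sequence[float], x: float) -> float:
    """Evaluate at x the polynomial through the points (xs, ys)."""
    total = 0.0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        term = yi
        for j, xj in enumerate(xs):
            if j != i:
                term *= (x - xj) / (xi - xj)
        total += term
    return total


def _mean_log_rate(quality: Sequence[float], rates: Sequence[float],
                   lo: float, hi: float) -> float:
    """Mean of log10(rate) over [lo, hi]; Simpson's rule is exact for a cubic."""
    log_rates = [math.log10(r) for r in rates]
    mid = (lo + hi) / 2.0
    integral = (hi - lo) / 6.0 * (
        _lagrange(quality, log_rates, lo)
        + 4.0 * _lagrange(quality, log_rates, mid)
        + _lagrange(quality, log_rates, hi)
    )
    return integral / (hi - lo)


def bd_rate(ref_rates: Sequence[float], ref_quality: Sequence[float],
            test_rates: Sequence[float], test_quality: Sequence[float]) -> float:
    """BD-rate in percent; a negative value is a gain over the reference."""
    if not (len(ref_rates) == len(ref_quality) == 4
            and len(test_rates) == len(test_quality) == 4):
        raise ValueError("this sketch expects four rate points per curve")
    # Only the quality interval covered by both curves is compared.
    lo = max(min(ref_quality), min(test_quality))
    hi = min(max(ref_quality), max(test_quality))
    if hi <= lo:
        raise ValueError("the two curves do not overlap in quality")
    diff = (_mean_log_rate(test_quality, test_rates, lo, hi)
            - _mean_log_rate(ref_quality, ref_rates, lo, hi))
    return (10.0 ** diff - 1.0) * 100.0
```

As a sanity check, a codec that reaches the same quality values as the reference at half the bit rate on every point yields a BD-rate of −50%. The quality axis can be any of the four metrics (PSNR_YUV, PSNR_Y, VMAF or MS-SSIM).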

© International Telecommunication Union, 2020