Page 101 - ITU Journal, ICT Discoveries, Volume 3, No. 1, June 2020 Special issue: The future of video and immersive media
The meaning of “group of frames” (GOF) in AV1 can be compared to “group of pictures” in the other coding solutions. The GOF size was fixed to 16, as in the JVET CTC. This was verified by instrumenting the libaom decoder to output the quantization parameter per picture. A similar type of quantization parameter offset appears per type of picture in a hierarchical GOF structure of 16 in libaom, as in the JVET CTC configurations. The results of this instrumentation are reported in [10].

In order to compute the gain (BD-rate [11]) of libaom with regard to the HM, the “cq-level” was fixed at 32, 40, 48 and 56, providing the range of bit rates to compute VTM and ETM gains.

4.2.3 Encoding parameters

The difference between the two test scenarios lies in the parameters kf-min-dist and kf-max-dist, which are the minimum and maximum distances between key frames (intra frames). For the Broadcast scenario both values are set equal to the number of frames in an integer number of GOPs of length below 1.1 seconds, while for the Streaming scenario both values are set equal to the number of frames in an integer number of GOPs of length below 2.2 seconds.

The libaom software has a parameter to set the encoding speed, which is the inverse of the encoding algorithm quality. This parameter, called “cpu-used”, is fixed to 0, meaning the best encoding quality but also the slowest encoding time.

5. RESULTS

5.1 Objective quality

In the following tables the results are provided considering four different objective metrics.

PSNR (Peak Signal to Noise Ratio) is calculated as:

PSNR = 10 log10(Max² / MSE) (1)

with MSE being the Mean Square Error between the source and the decoded pictures, and Max the peak sample value of the content. PSNR is computed separately for the three components of each picture, then averaged across all pictures of a sequence (PSNRY, PSNRU, PSNRV).

In order to get an easier interpretation of the measured performance, a weighted sum of the PSNR on the three components Y (luma), U and V (chroma), for the complete sequence, is used as the first objective metric:

PSNRYUV = (6×PSNRY + PSNRU + PSNRV) / 8 (2)

This formula is commonly used as a global performance metric by video compression experts (see [3], [12], [14], [17]).

Additional results are provided with PSNRY and two other well-known full-reference objective metrics (computed against the source pictures) which include some subjective factors:

• Multi-Scale Structural Similarity (MS-SSIM) [15],
• Video Multi-method Assessment Fusion (VMAF) [16].

As these two metrics use only the luma component, the comparison has to be made with PSNRY. For each scenario the results are given in four tables corresponding to the four objective metrics: PSNRYUV, PSNRY, VMAF, and MS-SSIM. The tables below show the performance according to the “Bjøntegaard Delta-Rate” (BD-rate) metric (see [11] and [12]) in percentages with regard to the HM. It measures the bit-rate reduction provided by each solution at the same quality, here on four measures: PSNRYUV, PSNRY, VMAF and MS-SSIM. The rate change is computed as the average percentage difference in rate over a range of Quantization Parameters (QP). A negative percentage represents a gain relative to the HM. The results are given as the average value over all the sequences and also split per picture size: UHD, HD, WVGA, WQVGA. The last line of each table gives the difference between the overall measure of the ETM and libaom and that of the VTM.

5.1.1 Broadcast

Table 1 reports the PSNRYUV BD-rate variations of the three tested solutions compared to the HM. It is observed that the VTM outperforms the other video coding solutions, achieving nearly 42% gain over the HM in the UHD format. The performance of all three solutions increases as the picture size increases. Over all picture resolutions, the ETM performed roughly 14.4% behind the VTM, and the libaom reference encoder 21.3% behind.

The performance of libaom can be discussed in terms of software maturity level. Clearly the libaom reference encoder, with two-pass encoding and encoding algorithm improvements brought in since 2018, is the most mature encoding solution. It has been observed that the encoding runtime of the highest quality configuration (cpu-used=0) has been accelerated significantly from version to version. In 2017 [14] libaom, in its best two-pass configuration, was measured to lag significantly behind the HM, by −9.5%. The AV1 specification was
© International Telecommunication Union, 2020
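As an illustration of equations (1) and (2), the following minimal Python sketch computes the PSNR of one colour component and the weighted PSNRYUV. The function names, the flat-list sample representation and the default peak value of 255 (8-bit content) are assumptions made for this example only; they are not taken from the test software used in the paper.

```python
import math


def mse(source, decoded):
    """Mean square error between two equally sized lists of samples."""
    if len(source) != len(decoded) or not source:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((s - d) ** 2 for s, d in zip(source, decoded)) / len(source)


def psnr(source, decoded, max_value=255):
    """Equation (1): PSNR = 10 log10(Max^2 / MSE), in dB.

    max_value=255 assumes 8-bit samples (hypothetical default);
    use 1023 for 10-bit content.
    """
    error = mse(source, decoded)
    if error == 0:
        return math.inf  # identical pictures
    return 10.0 * math.log10(max_value ** 2 / error)


def psnr_yuv(psnr_y, psnr_u, psnr_v):
    """Equation (2): weighted sum with a 6:1:1 weighting of Y, U and V."""
    return (6.0 * psnr_y + psnr_u + psnr_v) / 8.0
```

For instance, `psnr_yuv(40.0, 44.0, 44.0)` gives 41.0 dB, which shows how strongly the luma component dominates the combined figure.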
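The BD-rate figure described in Section 5.1 can be sketched as follows. This is a simplified illustration of the classic Bjøntegaard approach, not the tool used by the authors. It assumes exactly four rate points per codec (matching the four cq-level values), fits a cubic through the logarithm of the bit rate as a function of quality, and averages the difference over the overlapping quality range. All function names are hypothetical. Simpson's rule is used for the integral because it is exact for cubic polynomials.

```python
import math


def _lagrange(xs, ys, x):
    """Evaluate at x the polynomial passing through the points (xs, ys)."""
    total = 0.0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        term = yi
        for j, xj in enumerate(xs):
            if j != i:
                term *= (x - xj) / (xi - xj)
        total += term
    return total


def _simpson(f, a, b):
    """Simpson's rule on [a, b]; exact when f is a cubic polynomial."""
    return (b - a) / 6.0 * (f(a) + 4.0 * f((a + b) / 2.0) + f(b))


def bd_rate(ref_rates, ref_quality, test_rates, test_quality):
    """Average bit-rate difference in percent at equal quality.

    A negative value means the test codec needs fewer bits than the
    reference (a gain). Assumes four distinct quality values per codec.
    """
    for seq in (ref_rates, ref_quality, test_rates, test_quality):
        if len(seq) != 4:
            raise ValueError("this sketch expects exactly four rate points")
    log_ref = [math.log(r) for r in ref_rates]
    log_test = [math.log(r) for r in test_rates]
    # Integrate only over the quality interval covered by both curves.
    low = max(min(ref_quality), min(test_quality))
    high = min(max(ref_quality), max(test_quality))
    if high <= low:
        raise ValueError("quality ranges do not overlap")
    area_ref = _simpson(lambda q: _lagrange(ref_quality, log_ref, q), low, high)
    area_test = _simpson(lambda q: _lagrange(test_quality, log_test, q), low, high)
    avg_log_diff = (area_test - area_ref) / (high - low)
    return (math.exp(avg_log_diff) - 1.0) * 100.0
```

As a sanity check, a test codec that reaches the same quality values with 20% fewer bits at every point yields a BD-rate of about −20%. Production tools often use piecewise cubic interpolation rather than a single cubic, so their results can differ slightly from this sketch.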
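The key-frame distance rule of Section 4.2.3 can be read as "the largest whole number of GOPs whose duration stays below the time limit". The sketch below follows that reading. The function name, the default GOP size of 16 (taken from the GOF size stated in the text) and the example frame rates are assumptions for illustration. The paper does not give this computation explicitly.

```python
def key_frame_distance(frame_rate, max_seconds, gop_size=16):
    """Number of frames in the largest integer number of GOPs that is
    strictly shorter than max_seconds at the given frame rate.

    gop_size=16 is an assumption based on the GOF size used in the tests.
    """
    max_frames = frame_rate * max_seconds
    n_gops = int(max_frames // gop_size)
    # "Below" is read as strictly shorter than the limit.
    if n_gops * gop_size >= max_frames:
        n_gops -= 1
    if n_gops < 1:
        raise ValueError("no whole GOP fits below the time limit")
    return n_gops * gop_size


# Broadcast scenario (limit 1.1 s) and Streaming scenario (limit 2.2 s)
broadcast_60 = key_frame_distance(60, 1.1)   # 64 frames
streaming_60 = key_frame_distance(60, 2.2)   # 128 frames
```

Under this reading, the resulting value would be passed as both the minimum and the maximum key-frame distance, so that intra frames occur at a fixed period.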