Page 128 - ITU Journal, ICT Discoveries, Volume 3, No. 1, June 2020 Special issue: The future of video and immersive media


To compute Equation (7), ITU-R BS.1770 short-term loudness values, taken with a three-second integration window with one-second overlaps, were measured from the binaural signals recorded by the HATS at the calibration stage. As for Equation (5), signals were integrated in their first 80 ms, when binaural sound pressures were produced by direct and early reflected sounds. Scatterplots of correlations between averaged DLS scores per direction and computed localization cues are shown in Fig. 4. Correlations are strong enough to proceed with these metrics as predictors of a regression model, although 3 dB and 6 dB summations would result in redundant predictors due to their almost identical scattering patterns in Fig. 4. The choice here was to consider only one binaural summation predictor, with the same overall gain estimated in Section 3.1.

Fig. 4 – Scatterplots of correlations between DLS and localization cues.

A series of known regression model types, with and without Principal Component Analysis (PCA) preprocessing, were trained in a k-fold cross-validation scheme. Data was partitioned into k = 5 disjoint folds. For each fold, out-of-fold observations were used for training and in-fold observations for validation. Root Mean Square Error (RMSE), the Euclidean distance between a set of predictions and the actual observations, was computed for each fold and then averaged over all folds. Plain linear regression resulted in the smallest error (RMSE = 1.7872).

For each i-th direction, the resulting model can be written in the form:

    y_i = α + β_1 x_{1,i} + β_2 x_{2,i} + β_3 x_{3,i} + ε_i    (8)

where y_i are the predictions of the response variable, α is the intercept term, x_{1,i} is the binaural summation formula predictor, x_{2,i} is the binaural inhibition model predictor, x_{3,i} is the spatial impression predictor, and ε_i is the model residual. Intercept (α = −0.302) and betas (β = [−0.262 0.597 0.980]) were estimated from 523 observations. Predictions for the 22-channel source directions are listed in Table 6. This time a small zero correction was made and no normalization was needed. Also, all gains are within ±1.5 dB and elevation effects on the estimated gains are now more clearly defined.

Table 6 – Directional weights estimated by solving a linear regression problem

    Azimuth θ (°)    Elevation φ (°)    Gain g̃ (dB)
    −45              −30                +0.26
    0                −30                −0.68
    +45              −30                +0.26
    −135             0                  +0.12
    −90              0                  +1.28
    −60              0                  +1.08
    −30              0                  +0.60
    0                0                  0.00
    +30              0                  +0.60
    +60              0                  +1.08
    +90              0                  +1.28
    +135             0                  +0.12
    +180             0                  −0.31
    −135             +30                −0.09
    −90              +30                +0.88
    −45              +30                +1.12
    0                +30                +0.44
    +45              +30                +1.12
    +90              +30                +0.88
    +135             +30                −0.09
    +180             +30                −0.26
    0                +90                −0.62

Weightings for sound source directions not included in the 22.2 reproduction layout were estimated by smoothing the response data using local regression. ITU-R BS.2051 labels M±110 (θ = ±110°, φ = 0°) for 5.1 and 9.1 systems and U±110 (θ = ±110°, φ = 30°) for 9.1 systems yielded gains of 0.66 dB and 0.47 dB, respectively.

A modified version of ITU-R BS.1770 loudness with this set of weights is compared with the reference algorithm by taking the differences between their measurements of the stimuli presented in Section 2.3 and plotting them against the means and confidence intervals of participants. This is done in Fig. 5. Blue squares refer to differences in Loudness Units (LU) between measurements taken with the modified and the original algorithms, and jumps in the dashed line refer to ITU-R BS.1770 +1.5 dB gains in lateral incidences.

Differences between algorithms are more pronounced with sound sources on the upper plane, where measurements with the directional weights listed in Table 6 fall into subjects' confidence intervals in 8 out of 9 directions, against 3 out of 9 directions with the weights in Table 1. On the other hand, the modified algorithm performed worse than the original algorithm with sound sources in the median sagittal plane, where predictors related to localization cues were
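As a rough illustration of how the two weight sets can be compared for a given stimulus, the following minimal Python sketch computes an ungated, BS.1770-style channel-weighted loudness twice, once with the Table 6 gains and once with position-dependent reference gains, and returns their difference in LU. Several things here are assumptions rather than the authors' implementation: the function names are hypothetical, the input mean-square values are taken to be already K-weighted, gating is omitted, and the reference rule (+1.5 dB for |φ| < 30° and 60° ≤ |θ| ≤ 120°, 0 dB otherwise) is our reading of ITU-R BS.1770-4, which may not match Table 1 exactly.

```python
import math

# Table 6 directional gains in dB, keyed by (azimuth, elevation) in degrees.
TABLE6_GAIN_DB = {
    (-45, -30): 0.26, (0, -30): -0.68, (45, -30): 0.26,
    (-135, 0): 0.12, (-90, 0): 1.28, (-60, 0): 1.08, (-30, 0): 0.60,
    (0, 0): 0.00, (30, 0): 0.60, (60, 0): 1.08, (90, 0): 1.28,
    (135, 0): 0.12, (180, 0): -0.31,
    (-135, 30): -0.09, (-90, 30): 0.88, (-45, 30): 1.12, (0, 30): 0.44,
    (45, 30): 1.12, (90, 30): 0.88, (135, 30): -0.09, (180, 30): -0.26,
    (0, 90): -0.62,
}


def reference_gain_db(azimuth, elevation):
    """Assumed BS.1770-4 rule: +1.5 dB for lateral incidence, else 0 dB."""
    if abs(elevation) < 30 and 60 <= abs(azimuth) <= 120:
        return 1.5
    return 0.0


def modified_gain_db(azimuth, elevation):
    """Directional gain from Table 6 (only the 22 listed directions)."""
    return TABLE6_GAIN_DB[(azimuth, elevation)]


def weighted_loudness(mean_squares, gain_db):
    """Ungated loudness: -0.691 + 10*log10(sum_i G_i * z_i).

    mean_squares maps (azimuth, elevation) to the mean-square value z_i of a
    channel that is assumed to be K-weighted already. Gains given in dB are
    converted to power factors G_i = 10**(g/10).
    """
    total = sum(
        10 ** (gain_db(az, el) / 10) * z for (az, el), z in mean_squares.items()
    )
    return -0.691 + 10 * math.log10(total)


def loudness_difference(mean_squares):
    """Modified minus reference measurement, in LU."""
    return (weighted_loudness(mean_squares, modified_gain_db)
            - weighted_loudness(mean_squares, reference_gain_db))


if __name__ == "__main__":
    # Hypothetical single source at theta = -90 deg, phi = 0 deg.
    diff = loudness_difference({(-90, 0): 0.01})
    print(f"{diff:+.2f} LU")  # prints "-0.22 LU" (1.28 dB - 1.5 dB)
```

For a single active channel the difference reduces to the difference between the two gains in dB, independent of signal level; with several active channels the result depends on how energy is distributed across directions.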

106 © International Telecommunication Union, 2020
