Page 128 - ITU Journal, ICT Discoveries, Volume 3, No. 1, June 2020 Special issue: The future of video and immersive media






To compute Equation (7), ITU-R BS.1770 short-term loudness values, taken with a three-second integration window with one-second overlaps, were measured from the binaural signals recorded by the HATS at the calibration stage. As for Equation (5), signals were integrated over their first 80 ms, when binaural sound pressures were produced by direct and early reflected sounds. Scatterplots of correlations between averaged DLS scores per direction and computed localization cues are shown in Fig. 4. Correlations are strong enough to proceed with these metrics as predictors of a regression model, although 3 dB and 6 dB summations would result in redundant predictors due to their almost identical scattering patterns in Fig. 4. The choice here was to consider only one binaural summation predictor with the same overall gain estimated in Section 3.1.

Fig. 4 – Scatterplots of correlations between DLS and localization cues.

A series of known regression model types, with and without Principal Component Analysis (PCA) preprocessing, were trained in a k-fold cross-validation scheme. Data were partitioned into k = 5 disjoint folds. For each fold, out-of-fold observations were used for training and in-fold observations for validation. Root Mean Square Error (RMSE), the square root of the mean squared difference between a set of predictions and the actual observations, was computed over all folds and then averaged. Plain linear regression resulted in the smallest error (RMSE = 1.7872).

For each i-th direction, the resulting model can be written in the form:

$$y_i = \alpha + \beta_1 x_{1,i} + \beta_2 x_{2,i} + \beta_3 x_{3,i} + \varepsilon_i \qquad (8)$$

where $y_i$ are the predictions of the response variable, $\alpha$ is the intercept term, $x_{1,i}$ is the binaural summation formula predictor, $x_{2,i}$ is the binaural inhibition model predictor, $x_{3,i}$ is the spatial impression predictor, and $\varepsilon_i$ is the model residual. Intercept ($\alpha = -0.302$) and betas ($\beta = [-0.262 \;\; 0.597 \;\; 0.980]$) were estimated from 523 observations. Predictions for the 22-channel source directions are listed in Table 6. This time a small zero correction was made and no normalization was needed. Also, all gains are within ±1.5 dB and elevation effects on the estimated gains are now more clearly defined.

Table 6 – Directional weights estimated by solving a linear regression problem

Azimuth θ (°)   Elevation φ (°)   Gain g̃ (dB)
    −45              −30             +0.26
      0              −30             −0.68
    +45              −30             +0.26
   −135                0             +0.12
    −90                0             +1.28
    −60                0             +1.08
    −30                0             +0.60
      0                0              0.00
    +30                0             +0.60
    +60                0             +1.08
    +90                0             +1.28
   +135                0             +0.12
   +180                0             −0.31
   −135              +30             −0.09
    −90              +30             +0.88
    −45              +30             +1.12
      0              +30             +0.44
    +45              +30             +1.12
    +90              +30             +0.88
   +135              +30             −0.09
   +180              +30             −0.26
      0              +90             −0.62

Weightings for sound source directions not included in the 22.2 reproduction layout were estimated by smoothing the response data using local regression. ITU-R BS.2051 labels M±110 (θ = ±110°, φ = 0°) for 5.1 and 9.1 systems and U±110 (θ = ±110°, φ = 30°) for 9.1 systems yielded gains of 0.66 dB and 0.47 dB, respectively.

A modified version of ITU-R BS.1770 loudness with this set of weights is compared with the reference algorithm by taking the differences between their measurements of the stimuli presented in Section 2.3, and plotting them against the means and confidence intervals of participants. This is done in Fig. 5. Blue squares refer to differences in Loudness Units (LU) between measurements taken with the modified and the original algorithms, and jumps in the dashed line correspond to the +1.5 dB gains that ITU-R BS.1770 applies to lateral incidences.

Differences between algorithms are more pronounced with sound sources on the upper plane, where measurements with the directional weights listed in Table 6 fall into subjects' confidence intervals in 8 out of 9 directions, against 3 out of 9 directions with the weights in Table 1. On the other hand, the modified algorithm performed worse than the original algorithm with sound sources in the median sagittal plane, where predictors related to localization cues were
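The gains in Table 6 are given in dB, whereas the channel weights of ITU-R BS.1770 multiply the mean-square energy of each channel before summation. The following is a minimal sketch of how such gains could enter a BS.1770-style loudness sum. It assumes that the gains act as power weights (as the standard's +1.5 dB surround weight does) and that each input is already the mean square of a K-weighted channel signal; the dictionary holds only a few rows of Table 6, and the function name `weighted_loudness` is hypothetical.

```python
import math

# Directional gains in dB, keyed by (azimuth, elevation) in degrees.
# Only a subset of the Table 6 rows is copied here for illustration.
TABLE6_GAIN_DB = {
    (0, 0): 0.00,
    (30, 0): 0.60,
    (-30, 0): 0.60,
    (90, 0): 1.28,
    (-90, 0): 1.28,
    (180, 0): -0.31,
    (0, 90): -0.62,
}


def weighted_loudness(mean_squares):
    """Return a BS.1770-style loudness from per-direction mean-square values.

    mean_squares maps (azimuth, elevation) to the mean square of the
    K-weighted channel signal (assumption: K-weighting is done upstream).
    """
    total = 0.0
    for direction, z in mean_squares.items():
        # Assumption: dB gains are converted to power weights, 10^(g/10).
        weight = 10.0 ** (TABLE6_GAIN_DB[direction] / 10.0)
        total += weight * z
    # -0.691 is the constant offset of the BS.1770 loudness formula.
    return -0.691 + 10.0 * math.log10(total)


front = weighted_loudness({(0, 0): 1.0})   # frontal source, 0 dB gain
side = weighted_loudness({(90, 0): 1.0})   # lateral source, +1.28 dB gain
```

Under these assumptions, the same signal placed at θ = +90° measures 1.28 LU higher than at the front, which is exactly the Table 6 gain for that direction.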
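The k-fold cross-validation of a plain linear regression described above can be sketched as follows. This is an illustration only, not the authors' code: it uses NumPy least squares, the data are synthetic with arbitrary coefficients, and the names `kfold_rmse`, `X` and `y` are hypothetical. Only the number of folds (k = 5), the number of observations (523) and the number of predictors (3) are taken from the text.

```python
import numpy as np


def kfold_rmse(X, y, k=5, seed=0):
    """Average validation RMSE of ordinary least squares over k folds."""
    rng = np.random.default_rng(seed)
    # Shuffle the observation indices, then split them into k disjoint folds.
    folds = np.array_split(rng.permutation(len(y)), k)
    rmses = []
    for i in range(k):
        val = folds[i]                                  # in-fold: validation
        train = np.concatenate(
            [folds[j] for j in range(k) if j != i]      # out-of-fold: training
        )
        # Prepend a column of ones so the first coefficient is the intercept.
        A_train = np.column_stack([np.ones(len(train)), X[train]])
        coef, *_ = np.linalg.lstsq(A_train, y[train], rcond=None)
        A_val = np.column_stack([np.ones(len(val)), X[val]])
        residuals = y[val] - A_val @ coef
        rmses.append(np.sqrt(np.mean(residuals ** 2)))
    return float(np.mean(rmses))


# Synthetic data (assumption): 523 observations, 3 predictors,
# arbitrary true coefficients and Gaussian noise of standard deviation 0.1.
rng = np.random.default_rng(1)
X = rng.normal(size=(523, 3))
y_clean = 0.5 + X @ np.array([1.0, -2.0, 0.5])
y = y_clean + rng.normal(scale=0.1, size=523)

cv_rmse = kfold_rmse(X, y)
```

With noise of standard deviation 0.1, the averaged RMSE should come out close to 0.1; on noise-free data it drops to numerical zero, which is a quick sanity check that the folds and the fit are wired correctly.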


© International Telecommunication Union, 2020