Page 28 - Kaleidoscope Academic Conference Proceedings 2022

[Figure 7 diagram. Recoverable labels: Resource Restrictions; Bandwidth, Energy, RTT; Video Coding; Throughput; End-to-end Delay; Quality (I_V); Latency (I_D); Fusion; R, Q; Map to MOS.]

Figure 7 – Simplified QoE model for the realverse. Restrictions on bandwidth, delay and energy affect visual quality and latency. Statistical quality models for them should be provided and fused into a global QoE model.

In conventional 2D video, this relationship is normally modeled as exponential, following the IQX hypothesis [40]:

\[ I_V = 1 - e^{-K_0 \frac{B}{P}} \tag{1} \]

where $K_0$ represents the compression efficiency, $B$ is the bit rate of the coded stream, and $P$ is the number of pixels per second. This $K_0$ value captures the dependency on the codec efficiency (including energy considerations) and the spatio-temporal content complexity [41].

One important property is that, if we use 360-degree video in equirectangular projection, the video can be analyzed using the same tools as 2D video [42]. Since, in most immersive communication scenarios, the camera is typically set in a fixed position and the scene does not usually have intense motion, the spatio-temporal complexity of the resulting content is moderately low. This also results in relatively low bit rates: less than 10 Mbps for a 4K video in equirectangular projection [43].

Point cloud transmission scenarios are less mature, and therefore the ranges of the total number of points in the representation, its refresh rate (and therefore the total $P$), as well as the compression efficiency for realistic DR scenarios, are still under research. Typical representations of 800 thousand to one million points require bit rates from 5 to 100 Mbps [27]. Bit rate can also be reduced using adaptive schemes, where the part of the scene that the remote user is looking at is transmitted at a higher bit rate than the rest of the scene.

3.4  Latency requirements

The effect of end-to-end delay on QoE has been modeled for conversational and interactive applications, and it is typically characterized by a function with three regions: a first threshold below which delay is not important, a fast and linear decay, and a longer tail. The mathematical form of such a function may be piecewise linear [44], logistic [45], log-logistic [37], or algebraic [36].

For illustration purposes, we select the latter, which comes from Recommendation ITU-T G.107:

\[ I_D = \begin{cases} 1 - \dfrac{1}{2}\left[ \left(1 + x^6\right)^{1/6} - 3\left(1 + (x/3)^6\right)^{1/6} + 2 \right], & \text{if } x \ge 0 \\ 1, & \text{otherwise} \end{cases} \tag{2} \]

\[ x = \log_2 \frac{T}{T_m} \tag{3} \]

where $T$ is the "interaction lag" (the application-level end-to-end delay), and $T_m$ is the only model parameter, which can be considered as the threshold where latency starts to be noticeable. A property of this function is that $I_D(T = 4T_m) \approx 0.5$. Other models may have a second parameter controlling the decay.

The model to be applied will strongly depend on the type of flow where the delay happens. For voice conversation, $T_m$ is established around 100 ms, and it can be even higher if there is visual feedback from the other side. For the performance of adaptive compression schemes, users tend to be more tolerant, and the value can be increased to 200-500 ms [46]. Tasks which require quick interactivity and response, such as driving, may use lower values (e.g. 30 ms [36]), even though trained operators can quickly adapt to perform under much worse conditions [47].

In general, we can say that the detailed understanding of the effect of latency on QoE still leaves room for research before good parametric models are developed that can safely apply to a large range of XR communication scenarios.

4.  OPPORTUNITIES FOR STANDARDIZATION

The development of the realverse and, in general, of communications based on extended reality, is still in its infancy. In fact, although existing technology has already shown that it is possible to create communication experiences between people in immersive environments, there is still some way to go before it reaches mass adoption. In this scenario, several opportunities open up to standardize key elements of the development of the metaverse in any of its interpretations. We discuss two of them.

On the one hand, it would be desirable to address the interoperability of the various systems from the outset, so that it would be possible to build communication solutions that are not fully captured by (and vertically integrated into) a particular platform. Since text messaging and video calling services are mainly provided by major social media platforms and hyperscalers, a new opportunity arises for the new generation of communications to be built again on open and interoperable systems, as happened with telephony. In this context, our realverse proposal is particularly relevant, as it focuses more on peer-to-peer communications than on integrating users into a specific platform (or metaverse). Of course, the need to implement part of the processing capacity at the edge means that it is necessary to standardize not only communication protocols, but also Virtual Network Functions (VNFs) capable of offering specific services.

On the other hand, there is a need to develop new standards for evaluating and monitoring the quality of experience. To
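
As a minimal sketch, equations (1) to (3) could be evaluated numerically as shown next. This is an illustration under stated assumptions, not the authors' implementation: the function names `visual_quality` and `delay_quality` are hypothetical, and the value of K_0 used in the usage lines is an arbitrary placeholder, since the paper does not give one.

```python
import math


def visual_quality(bit_rate_bps: float, pixels_per_second: float, k0: float) -> float:
    """Equation (1): I_V = 1 - exp(-K_0 * B / P)."""
    if pixels_per_second <= 0:
        raise ValueError("pixels_per_second must be positive")
    return 1.0 - math.exp(-k0 * bit_rate_bps / pixels_per_second)


def delay_quality(t_ms: float, t_m_ms: float) -> float:
    """Equations (2)-(3): I_D as a function of the interaction lag T.

    t_m_ms is the threshold T_m where latency starts to be noticeable.
    """
    if t_ms <= 0 or t_m_ms <= 0:
        raise ValueError("delays must be positive")
    x = math.log2(t_ms / t_m_ms)  # equation (3)
    if x < 0:
        return 1.0  # below the threshold, delay has no effect
    term = (1 + x**6) ** (1 / 6) - 3 * (1 + (x / 3) ** 6) ** (1 / 6) + 2
    return 1.0 - 0.5 * term  # equation (2)


if __name__ == "__main__":
    # 4K equirectangular video at 30 frames per second, 10 Mbps.
    # k0 = 100.0 is an assumed placeholder, not a value from the paper.
    pixels = 3840 * 2160 * 30
    print("I_V:", visual_quality(10e6, pixels, k0=100.0))

    # Voice conversation with T_m = 100 ms, as quoted in the text.
    for lag_ms in (50, 100, 200, 400, 800):
        print(lag_ms, "ms -> I_D:", round(delay_quality(lag_ms, 100.0), 3))
```

With T_m = 100 ms, the lag of 400 ms corresponds to T = 4T_m, where the function evaluates to roughly 0.52, in line with the property stated for equation (2). How I_V and I_D are fused into a global QoE score is left out here, since the text does not specify the fusion rule.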



