Page 28 - Kaleidoscope Academic Conference Proceedings 2022
[Figure 7: block diagram. Recoverable labels: Resource Restrictions (Bandwidth, Delay, Energy); Video Coding; Throughput; RTT; End-to-end Latency; Quality; $I_V$; $I_D$; Fusion; $R$; $Q$; Map to MOS.]

Figure 7 – Simplified QoE model for the realverse. Restrictions on bandwidth, delay and energy would affect visual quality and latency. Statistical quality models for them should be provided and fused into a global QoE model.

In conventional 2D video, this relationship is normally modeled as exponential, following the IQX hypothesis [40]:

$$
I_V = 1 - e^{-K_0 \frac{B}{P}} \tag{1}
$$

where $K_0$ represents the compression efficiency, $B$ is the bit rate of the coded stream, and $P$ is the number of pixels per second. This $K_0$ value captures the dependency on the codec efficiency (including energy considerations) and the spatio-temporal content complexity [41].

One important property is that, if we use 360-degree video in equirectangular projection, the video can be analyzed using the same tools as 2D video [42]. Since, in most immersive communication scenarios, the camera is typically set in a fixed position and the scene does not usually have intense motion, the spatio-temporal complexity of the resulting content is moderately low. This also results in relatively low bit rates: less than 10 Mbps for a 4K video in equirectangular projection [43].

Pointcloud transmission scenarios are less mature. The ranges of the total number of points in the representation and of its refresh rate (and therefore the total $P$), as well as the compression efficiency for realistic DR scenarios, are therefore still under research. Typical representations of 800 thousand to one million points require bit rates from 5 to 100 Mbps [27]. Bit rate can also be reduced using adaptive schemes, in which the part of the scene the remote user is looking at is transmitted at a higher bit rate than the rest of the scene.

3.4 Latency requirements

The effect of end-to-end delay on QoE has been modeled for conversational and interactive applications, and it is typically characterized by a function with three regions: a first region, below a threshold, where delay is not important, a fast and linear decay, and a longer tail. The mathematical form of such a function may be piecewise linear [44], logistic [45], log-logistic [37], or algebraic [36].

For illustration purposes, we select the latter, which comes from Recommendation ITU-T G.107:

$$
I_D =
\begin{cases}
1 - \dfrac{1}{2}\left\{ \left(1 + x^6\right)^{1/6} - 3\left(1 + (x/3)^6\right)^{1/6} + 2 \right\}, & \text{if } x \ge 0 \\
1, & \text{otherwise}
\end{cases}
\tag{2}
$$

$$
x = \log_2 \frac{T}{T_m} \tag{3}
$$

where $T$ is the "interaction lag" (the application-level end-to-end delay), and $T_m$ is the only model parameter, which can be considered as the threshold where latency starts to be noticeable. A property of this function is that $I_D(T = 4T_m) \approx 0.5$. Other models may have a second parameter controlling the decay.

The model to be applied will strongly depend on the type of flow where the delay happens. For voice conversation, $T_m$ is established around 100 ms, and it can be even higher if there is visual feedback from the other side. For the performance of adaptive compression schemes, users tend to be more tolerant, and the value can be increased to 200-500 ms [46]. Tasks which require quick interactivity and response, such as driving, may use lower values (e.g. 30 ms [36]), even though trained operators can quickly adapt to perform under much worse conditions [47].

In general, we can say that the detailed understanding of the effect of latency on QoE still leaves room for research before good parametric models are developed that can safely apply to a large range of XR communication scenarios.

4. OPPORTUNITIES FOR STANDARDIZATION

The development of the realverse and, in general, of communications based on extended reality, is still in its infancy. Although existing technology has already shown that it is possible to create communication experiences between people in immersive environments, there is still some way to go before it is widely adopted. In this scenario, several opportunities open up to standardize key elements of the development of the metaverse in any of its interpretations. We now discuss two of them.

On the one hand, it would be desirable to address the interoperability of the various systems from the outset, so that it would be possible to build communication solutions that are not fully captured by (and vertically integrated into) a particular platform. Since text messaging and video calling services are mainly provided by major social media platforms and hyperscalers, a new opportunity arises for the new generation of communications to be built again on open and interoperable systems, as happened with telephony. In this context, our realverse proposal is particularly relevant, as it focuses more on peer-to-peer communications than on integrating users into a specific platform (or metaverse). Of course, the need to implement part of the processing capacity at the edge means that it is necessary to standardize not only communication protocols, but also Virtual Network Functions (VNFs) capable of offering specific services.

On the other hand, there is a need to develop new standards for evaluating and monitoring the quality of experience. To
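As a rough numerical sketch of the two impairment terms on this page, the visual-quality term of Eq. (1) and the delay term of Eqs. (2)-(3) can be evaluated as below. The function names are hypothetical, the value used for $K_0$ is purely illustrative (the text gives no numeric value), and the 1/6 exponents and 1/2 scaling in the delay term are read from the reconstructed form of Eq. (2), so this is an assumption-laden illustration rather than a reference implementation.

```python
import math


def visual_quality(bitrate_bps: float, pixels_per_s: float, k0: float) -> float:
    """Eq. (1): I_V = 1 - exp(-K0 * B / P), the IQX-style exponential model."""
    if pixels_per_s <= 0:
        raise ValueError("pixels_per_s must be positive")
    return 1.0 - math.exp(-k0 * bitrate_bps / pixels_per_s)


def delay_quality(t_ms: float, tm_ms: float) -> float:
    """Eqs. (2)-(3): delay term I_D with a single threshold parameter T_m.

    Assumes the reconstructed form of Eq. (2): 1/6 exponents and a 1/2 scaling.
    """
    if t_ms <= 0 or tm_ms <= 0:
        raise ValueError("delays must be positive")
    x = math.log2(t_ms / tm_ms)  # Eq. (3)
    if x < 0:
        return 1.0  # below the threshold, delay is not noticeable
    bracket = (1 + x**6) ** (1 / 6) - 3 * (1 + (x / 3) ** 6) ** (1 / 6) + 2
    return 1.0 - 0.5 * bracket


if __name__ == "__main__":
    # 4K equirectangular video at 30 fps and 10 Mbps; K0 = 50 is a made-up value.
    pixels = 3840 * 2160 * 30
    print("I_V:", round(visual_quality(10e6, pixels, k0=50.0), 3))

    # Voice-like threshold T_m = 100 ms, as mentioned in the text.
    for t in (50, 100, 200, 400, 1600):
        print(f"T = {t:4d} ms -> I_D = {delay_quality(t, 100):.3f}")
```

With $T_m$ = 100 ms, the delay term stays at 1 up to the threshold, drops to roughly one half around $T = 4T_m$, and approaches 0 in the long tail, which matches the three-region behaviour described in Section 3.4.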