Page 27 - Kaleidoscope Academic Conference Proceedings 2022
where they occur. Although it is not represented in the figure, these flows may require additional processing (for example, creating a pointcloud representation of the avatar from the set of view+depth cameras), which could also be carried out at the edge of the network, close to the capture. Specifically relevant for the optimization of XR communication is the viewport-dependent processing of the remote video flows (either 360 or pointcloud) in the edge: transmitting only the part of the scene which is actually being seen by the HMD user, thus saving bandwidth on the downlink channel.

Offloading the algorithms to the edge cloud, e.g. by using MEC, is necessary in order to guarantee sufficient processing capacity to execute them, for example in the case of semantic segmentation neural networks [21]. However, the requirement that this segmentation be done in real time and with a sufficiently high frame rate implies that the network must support uplink traffic peaks in the order of gigabits per second, and have Round-Trip Times (RTTs) to the MEC server of a few milliseconds [32]. When testing such demanding algorithms with currently deployed 5G networks, even those with the highest capacity operating in the millimeter band, the algorithms frequently need to work at a reduced frame resolution or frame rate to fit within the network capacity [24]. In order to study in detail the interaction of networks with XR systems, we have developed a full-stack emulator of the 5G access network (FikoRE), with which we can test configurations not yet available on the market [33].

Emulating the network with FikoRE, we have been able to test the operation of the segmentation algorithms described in Section 2.2 for different configurations of 5G and B5G networks [34]. In order to carry out the segmentation with sufficiently low latency and, therefore, to be able to deploy a DR service with sufficient QoE, it is necessary to use a radio access network in the millimeter band, with at least 400 MHz of bandwidth and a symmetrical Time-Division Duplex (TDD) configuration for uplink and downlink.

3.2 Towards a quality of experience model

As mentioned above, XR communications systems, and the realverse is no exception, operate under constraints of bandwidth, latency, and computing power (or, equivalently, energy). It is therefore necessary to know the relationship between these restrictions and their impact on the quality of experience, in order to dimension, operate and monitor the network effectively.

QoE assessment in XR communications is a complex task. Recommendation ITU-T P.1320, recently published, advises on aspects of importance for QoE assessment of telemeetings with extended reality elements [35]. The goal is to define the human, context, and system factors that affect the choice of the QoE assessment procedure and metrics when XR communication systems are under evaluation. Among the System Influencing Factors (SIFs) for QoE, the Recommendation addresses three categories: the representation of the user and the world, the effect of rendering, and the restrictions of the communication network, which loosely match the three main system blocks described in Figure 6: local, edge, and remote.

Recommendations such as P.1320 are relevant, but still insufficient. The ultimate goal of QoE modeling is to have statistical models that allow the proper design of the network. This is a technically complicated challenge, due to the complexity and heterogeneity of the flows involved. Currently, the ITU-T has drafted Recommendations with parametric models (opinion models) for the most common telecommunications services: voice (G.107), video call (G.1070), IP television (G.1071), or online gaming (G.1072). In all cases, these are quite complex models, designed and developed with a multitude of parameters, even though they model much simpler communications systems.

Therefore, it is necessary to start with simpler models as a first approximation, which capture the main interactions involved in the XR service and evaluate the order of magnitude of the relationship between the network restriction (for example, bandwidth) and the QoE. For this, we rely on previous versions of this exercise, for the use cases of tele-operated driving [36] and virtual reality [37]. As a starting point, ITU-T has recently published the technical report GSTR-5GQoE, which describes the most relevant factors to perform this analysis in several use cases involving real-time video transmission over 5G networks [38]. The methodology applied in this technical report can be used to identify the most relevant QoE requirements of the service, and to use them to build a simplified parametric model.

Figure 7 shows a reference model for our approach. The main restrictions to which the system is subjected are bandwidth, latency and energy. These fundamentally affect two elements of QoE: visual quality and end-to-end latency. The visual quality is directly affected by the compression level, which is related to the throughput available for transmission, although the processing capacity (energy) and the execution time of the algorithms will also influence it. In the same way, the latency is determined by the round-trip time of the network, to which are added other factors that affect the transmission, such as the relationship between the coding bitrate and the network throughput. This analysis process is executed in parallel for the different flows involved, giving rise to a degradation factor I for each of them. The final step consists of merging the different contributions I to obtain a rating factor R, which can be translated into an expected value of the Mean Opinion Score (MOS) [39].

3.3 Throughput requirements

The throughput requirement for a video stream (B), regardless of the format, is simply the number of pixels (or image points) per second that need to be transmitted (P), multiplied by the average number of bits used to transmit each pixel (K). For a given scene, capture and compression technology, the rate K represents the degree of compression achieved. Since most video coding techniques use lossy compression (due to some quantization process), for a given P, increasing K (and therefore increasing B = P × K) results in an improvement of the perceived quality.