
data that need to be transmitted in real time (Figure 4):

• Free-Viewpoint Video (FVV) avatars [24]. An FVV system generates synthetic views of a 3D scene from a virtual viewpoint chosen by the user, combining the video information from several real reference cameras that capture the scene synchronously from different viewpoints. In this solution, the cameras transmit color and depth information to the remote end, where the projection of the avatar into the virtual scene from the user's point of view is generated in real time. This simplifies image capture, but requires high bandwidth and processing power at the remote end. This approach has recently been standardized as MPEG Immersive Video (MIV) in ISO/IEC 23090-12 [25].
• Point cloud avatars [26]. A point cloud is a set of individual 3D points, each of which, in addition to having a 3D position, may also contain a number of other attributes, particularly color. At capture, the color and depth images obtained by the cameras are used to form a point cloud, which is compressed and transmitted using a Point Cloud Compression (PCC) technology such as MPEG-I V-PCC (ISO/IEC 23090-5) [27]. On reception, the point cloud is decoded and rendered within the immersive scene (see the back-projection sketch after this list).
• Digital person model avatars [28]. In this case, a very detailed graphical representation of the captured person is generated at capture time, based on some model such as SMPL [29]. This description is transmitted and decoded at the remote end, where a previously modeled user avatar is animated. This option requires the least transmission bandwidth and reception processing.
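
To make the point cloud option above concrete, the following sketch back-projects one aligned color and depth image pair into a colored point cloud using a pinhole camera model. It is only an illustration under assumed values (the camera intrinsics, depth scale, image size and the synthetic test frame are all placeholders), and the V-PCC compression step that would follow is only mentioned in a comment.

    import numpy as np

    def rgbd_to_point_cloud(color, depth, fx, fy, cx, cy, depth_scale=1000.0):
        """Back-project an aligned RGB-D pair into a colored point cloud.

        color: (H, W, 3) uint8 image; depth: (H, W) uint16 map in 1/depth_scale meters;
        fx, fy, cx, cy: pinhole intrinsics of the registered depth camera.
        Returns an (N, 6) array of [X, Y, Z, R, G, B] for the valid depth pixels.
        """
        h, w = depth.shape
        u, v = np.meshgrid(np.arange(w), np.arange(h))   # pixel coordinates
        z = depth.astype(np.float32) / depth_scale       # depth in meters
        valid = z > 0                                     # discard holes in the depth map
        x = (u - cx) * z / fx                             # pinhole back-projection
        y = (v - cy) * z / fy
        points = np.stack([x[valid], y[valid], z[valid]], axis=1)
        colors = color[valid].astype(np.float32)
        return np.hstack([points, colors])

    # Synthetic test frame: a flat surface 1.5 m away. In a real rig the frames come
    # from each calibrated camera, and the merged cloud is compressed (e.g. with
    # MPEG-I V-PCC) before transmission to the remote end.
    depth = np.full((480, 640), 1500, dtype=np.uint16)
    color = np.zeros((480, 640, 3), dtype=np.uint8)
    cloud = rgbd_to_point_cloud(color, depth, fx=525.0, fy=525.0, cx=319.5, cy=239.5)
    print(cloud.shape)  # (307200, 6)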

2.4 Meet: shared immersion

The last step to compose a complete distributed reality experience is to integrate the different components in the scene so that the result is consistent, and the different users have the sensation of sharing the same space (Figure 5).

Figure 5 – Representation of the different video flows (visit, face, move) that need to be merged to create a consistent immersive shared experience (meet).

From an implementation point of view, the solution is known. It is enough to simply compose the virtual scene from its different elements (the remote capture, the egocentric capture and the avatars of the remote users) and render it on the HMD using a graphics engine such as Unity3D. In fact, when the shared scene is purely virtual, the insertion of the different elements is trivial, since the 3D geometry is shared by all users.
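
Geometrically, inserting a captured element into the shared scene amounts to re-expressing its points in the local scene frame through a rigid transform to a chosen anchor pose. The sketch below illustrates only that step; the y-up coordinate convention, the simplified translation-plus-yaw pose helper and the example anchor values are assumptions made for the illustration, not part of the system described here.

    import numpy as np

    def pose_matrix(position, yaw_deg):
        """4x4 rigid transform: translation plus a rotation about the y (up) axis."""
        t = np.radians(yaw_deg)
        m = np.eye(4)
        m[:3, :3] = [[np.cos(t), 0.0, np.sin(t)],
                     [0.0,       1.0, 0.0      ],
                     [-np.sin(t), 0.0, np.cos(t)]]
        m[:3, 3] = position
        return m

    def insert_into_scene(points, local_anchor_pose, remote_capture_pose):
        """Re-express (N, 3) points from the remote capture's world frame in the local
        scene frame, placing the remote capture rig at the local anchor."""
        capture_to_local = local_anchor_pose @ np.linalg.inv(remote_capture_pose)
        homogeneous = np.hstack([points, np.ones((points.shape[0], 1))])
        return (homogeneous @ capture_to_local.T)[:, :3]

    # Place a decoded remote avatar 2 m in front of the local user, facing them
    # (illustrative poses; a real client would use the anchor pose from the runtime).
    avatar_points = np.random.rand(1000, 3)                     # stand-in for a decoded point cloud
    local_anchor = pose_matrix([0.0, 0.0, 2.0], yaw_deg=180.0)  # assumed local anchor pose
    placed = insert_into_scene(avatar_points, local_anchor, np.eye(4))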

However, when it comes to mixing captured sources (video streams), the problem is much more complex, since there does not have to be a correspondence between the physical spaces that surround the different people. In this sense, the solutions proposed in the state of the art are always partial: they represent the users separated by a window [30] or use digital twins of the physical environment of the remote user in the environment of the local user [31]. Further research is needed to create a distributed reality scene with enough QoE to actually provide the user with the sense of teleportation needed to fulfill the vision of the realverse.

3. IMPLEMENTATION AND QUALITY OF EXPERIENCE

The implementation of a distributed reality solution requires, as has been seen, capturing, transmitting, processing and composing a set of 2D and 3D video streams in real time, and rendering the result on an HMD. It therefore needs an adequate communications and processing infrastructure. In addition, DR is an intensive and demanding technology in terms of bandwidth, latency and processing resources, so it is necessary to understand the trade-offs between the use of these resources and the QoE obtained.
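
To give a feeling for the bandwidth side of that trade-off, the back-of-envelope estimate below compares the raw bitrate of a single captured color-plus-depth stream with a compressed version. Every number in it (resolution, frame rate, bit depths and the assumed compression ratio) is an illustrative assumption, not a measurement or requirement stated in this paper.

    # Back-of-envelope bitrate estimate for one captured color+depth stream.
    # All parameter values are illustrative assumptions.
    width, height = 1280, 720        # per-camera capture resolution (assumed)
    fps = 30                         # capture rate (assumed)
    color_bits = 24                  # 8-bit RGB per pixel
    depth_bits = 16                  # 16-bit depth per pixel

    raw_bps = width * height * (color_bits + depth_bits) * fps
    print(f"raw color+depth stream: {raw_bps / 1e6:.0f} Mbit/s")      # ~1106 Mbit/s

    # Even with an optimistic end-to-end compression ratio, several such streams
    # per user quickly add up on both uplink and downlink.
    assumed_compression_ratio = 100
    print(f"compressed ({assumed_compression_ratio}:1 assumed): "
          f"{raw_bps / assumed_compression_ratio / 1e6:.1f} Mbit/s")  # ~11 Mbit/s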

3.1 The realverse over the 5G Network

Figure 6 represents our reference architecture to deploy the realverse over a 5G network, and its relationship to the building blocks described in the previous section. To be mobile, the HMD needs to connect wirelessly to the network, integrating the 5G User Equipment (UE) function. We assume, in principle, that the composition and rendering of the final image are done on the device itself, since the latency requirements of remote rendering at 90 frames per second are challenging to achieve, in a scalable manner, on any mobile network in the short term. However, part of the processing associated with the rendering of the experience, such as the segmentation of the egocentric image or the detection of objects in it, can be offloaded to a system at the edge of the network (Multi-access Edge Computing, MEC).

[Figure 6 diagram: Local Rendering (HMD/UE), Edge Cloud Processing (MEC), Remote Video; UE – gNB – 5GC/UPF on the FilkoRE RAN Emulator (5G island)]

Figure 6 – Reference architecture to implement the realverse over a 5G/B5G network.
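
As an example of the kind of processing that can be moved to the MEC while rendering stays on the HMD, the sketch below sends one egocentric camera frame to an edge segmentation service and falls back to the previous mask if the edge does not answer in time. The endpoint URL, the JPEG-over-HTTP transport, the raw-mask response format and the timeout value are all assumptions made for the illustration, not an interface defined by this architecture.

    import cv2
    import numpy as np
    import requests

    # Hypothetical endpoint exposed by the edge cloud (MEC) of Figure 6.
    MEC_SEGMENTATION_URL = "http://mec.example.local:8080/segment"

    def offload_segmentation(frame, timeout_s=0.05):
        """Send one egocentric frame to the edge and return the per-pixel mask, or None.

        The HMD keeps rendering locally at 90 fps; only this auxiliary step is offloaded,
        and a tight timeout ensures a slow or lost response never blocks the render loop
        (the caller simply reuses the previous mask).
        """
        ok, jpeg = cv2.imencode(".jpg", frame)          # compress the frame before sending
        if not ok:
            return None
        try:
            resp = requests.post(
                MEC_SEGMENTATION_URL,
                data=jpeg.tobytes(),
                headers={"Content-Type": "image/jpeg"},
                timeout=timeout_s,
            )
            resp.raise_for_status()
        except requests.RequestException:
            return None                                  # edge unavailable: keep the old mask
        # Assumed response: a raw uint8 mask with the same resolution as the frame.
        h, w = frame.shape[:2]
        return np.frombuffer(resp.content, dtype=np.uint8).reshape(h, w)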

Streams representing remote users and locations, on the other hand, must be transmitted in real time from the location



