
data that need to be transmitted in real time (Figure 4):

• Free-Viewpoint Video (FVV) avatars [24]. An FVV system generates synthetic views of a 3D scene from a virtual viewpoint chosen by the user, combining the video information from several real reference cameras that capture the scene synchronously from different viewpoints. In this solution, the cameras transmit color and depth information to the remote end, where the projection of the avatar into the virtual scene from the user's point of view is generated in real time. This simplifies image capture, but requires high bandwidth and processing power at the remote end. This approach has recently been standardized as MPEG Immersive Video (MIV) in ISO/IEC 23090-12 [25].
• Point cloud avatars [26]. A point cloud is a set of individual 3D points, each of which, in addition to having a 3D position, may also contain a number of other attributes, particularly color. At capture, the color and depth images obtained by the cameras are used to form a point cloud, which is compressed and transmitted using a Point Cloud Compression (PCC) technology such as MPEG-I V-PCC (ISO/IEC 23090-5) [27]. On reception, the point cloud is decoded and rendered within the immersive scene (see the back-projection sketch after this list).
• Digital person model avatars [28]. In this case, a very detailed graphical representation of the captured person is generated at capture time, based on some model such as SMPL [29]. This description is transmitted and decoded at the remote end, where a previously modeled user avatar is animated. This option requires the least transmission bandwidth and reception processing.
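
To make the point cloud option above concrete, the following sketch back-projects one aligned color and depth image pair into a colored point cloud using a pinhole camera model. It is only an illustration under assumed values (the camera intrinsics, depth scale, image size and the synthetic test frame are all placeholders), and the V-PCC compression step that would follow is only mentioned in a comment.

    import numpy as np

    def rgbd_to_point_cloud(color, depth, fx, fy, cx, cy, depth_scale=1000.0):
        """Back-project an aligned RGB-D pair into a colored point cloud.

        color: (H, W, 3) uint8 image; depth: (H, W) uint16 map in 1/depth_scale meters;
        fx, fy, cx, cy: pinhole intrinsics of the registered depth camera.
        Returns an (N, 6) array of [X, Y, Z, R, G, B] for the valid depth pixels.
        """
        h, w = depth.shape
        u, v = np.meshgrid(np.arange(w), np.arange(h))   # pixel coordinates
        z = depth.astype(np.float32) / depth_scale       # depth in meters
        valid = z > 0                                     # discard holes in the depth map
        x = (u - cx) * z / fx                             # pinhole back-projection
        y = (v - cy) * z / fy
        points = np.stack([x[valid], y[valid], z[valid]], axis=1)
        colors = color[valid].astype(np.float32)
        return np.hstack([points, colors])

    # Synthetic test frame: a flat surface 1.5 m away. In a real rig the frames come
    # from each calibrated camera, and the merged cloud is compressed (e.g. with
    # MPEG-I V-PCC) before transmission to the remote end.
    depth = np.full((480, 640), 1500, dtype=np.uint16)
    color = np.zeros((480, 640, 3), dtype=np.uint8)
    cloud = rgbd_to_point_cloud(color, depth, fx=525.0, fy=525.0, cx=319.5, cy=239.5)
    print(cloud.shape)  # (307200, 6)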

2.4 Meet: shared immersion

The last step to compose a complete distributed reality experience is to integrate the different components in the scene so that the result is consistent, and the different users have the sensation of sharing the same space (Figure 5).

Figure 5 – Representation of the different video flows (visit, face, move) that need to be merged to create a consistent immersive shared experience (meet).

From an implementation point of view, the solution is known. It is enough to simply compose the virtual scene from its different elements (the remote capture, the egocentric capture and the avatars of the remote users) and render it on the HMD using a graphics engine such as Unity3D. In fact, when the shared scene is purely virtual, the insertion of the different elements is trivial, since the 3D geometry is shared by all users.
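
Geometrically, inserting a captured element into the shared scene amounts to re-expressing its points in the local scene frame through a rigid transform to a chosen anchor pose. The sketch below illustrates only that step; the y-up coordinate convention, the simplified translation-plus-yaw pose helper and the example anchor values are assumptions made for the illustration, not part of the system described here.

    import numpy as np

    def pose_matrix(position, yaw_deg):
        """4x4 rigid transform: translation plus a rotation about the y (up) axis."""
        t = np.radians(yaw_deg)
        m = np.eye(4)
        m[:3, :3] = [[np.cos(t), 0.0, np.sin(t)],
                     [0.0,       1.0, 0.0      ],
                     [-np.sin(t), 0.0, np.cos(t)]]
        m[:3, 3] = position
        return m

    def insert_into_scene(points, local_anchor_pose, remote_capture_pose):
        """Re-express (N, 3) points from the remote capture's world frame in the local
        scene frame, placing the remote capture rig at the local anchor."""
        capture_to_local = local_anchor_pose @ np.linalg.inv(remote_capture_pose)
        homogeneous = np.hstack([points, np.ones((points.shape[0], 1))])
        return (homogeneous @ capture_to_local.T)[:, :3]

    # Place a decoded remote avatar 2 m in front of the local user, facing them
    # (illustrative poses; a real client would use the anchor pose from the runtime).
    avatar_points = np.random.rand(1000, 3)                     # stand-in for a decoded point cloud
    local_anchor = pose_matrix([0.0, 0.0, 2.0], yaw_deg=180.0)  # assumed local anchor pose
    placed = insert_into_scene(avatar_points, local_anchor, np.eye(4))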

However, when it comes to mixing captured sources (video streams), the problem is much more complex, since there does not have to be a correspondence between the physical spaces that surround the different people. In this sense, the solutions proposed in the state of the art are always partial: they represent the users separated by a window [30] or use digital twins of the physical environment of the remote user in the environment of the local user [31]. Further research is needed to create a distributed reality scene with enough QoE to actually provide the user with the sense of teleportation needed to fulfill the vision of the realverse.

3. IMPLEMENTATION AND QUALITY OF EXPERIENCE

The implementation of a distributed reality solution requires, as has been seen, capturing, transmitting, processing and composing a set of 2D and 3D video streams in real time, and rendering the result on an HMD. It therefore needs an adequate communications and processing infrastructure. In addition, DR is an intensive and demanding technology in terms of bandwidth, latency and processing resources, so it is necessary to understand the trade-offs between the use of these resources and the QoE obtained.
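
To give a feeling for the bandwidth side of that trade-off, the back-of-envelope estimate below compares the raw bitrate of a single captured color-plus-depth stream with a compressed version. Every number in it (resolution, frame rate, bit depths and the assumed compression ratio) is an illustrative assumption, not a measurement or requirement stated in this paper.

    # Back-of-envelope bitrate estimate for one captured color+depth stream.
    # All parameter values are illustrative assumptions.
    width, height = 1280, 720        # per-camera capture resolution (assumed)
    fps = 30                         # capture rate (assumed)
    color_bits = 24                  # 8-bit RGB per pixel
    depth_bits = 16                  # 16-bit depth per pixel

    raw_bps = width * height * (color_bits + depth_bits) * fps
    print(f"raw color+depth stream: {raw_bps / 1e6:.0f} Mbit/s")      # ~1106 Mbit/s

    # Even with an optimistic end-to-end compression ratio, several such streams
    # per user quickly add up on both uplink and downlink.
    assumed_compression_ratio = 100
    print(f"compressed ({assumed_compression_ratio}:1 assumed): "
          f"{raw_bps / assumed_compression_ratio / 1e6:.1f} Mbit/s")  # ~11 Mbit/s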

3.1 The realverse over the 5G Network

Figure 6 represents our reference architecture to deploy the realverse over a 5G network, and its relationship to the building blocks described in the previous section. To be mobile, the HMD needs to connect wirelessly to the network, integrating the 5G User Equipment (UE) function. We assume, in principle, that the composition and rendering of the final image are done on the device itself, since the latency requirements of remote rendering at 90 frames per second are challenging to achieve, in a scalable manner, on any mobile network in the short term. However, part of the processing associated with the rendering of the experience, such as the segmentation of the egocentric image or the detection of objects in it, can be offloaded to a system at the edge of the network (Multi-access Edge Computing, MEC).

[Figure 6 diagram: Local Rendering (HMD/UE), Edge Cloud Processing (MEC), Remote Video; UE – gNB – 5GC/UPF on the FilkoRE RAN Emulator (5G island)]

Figure 6 – Reference architecture to implement the realverse over a 5G/B5G network.
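
As an example of the kind of processing that can be moved to the MEC while rendering stays on the HMD, the sketch below sends one egocentric camera frame to an edge segmentation service and falls back to the previous mask if the edge does not answer in time. The endpoint URL, the JPEG-over-HTTP transport, the raw-mask response format and the timeout value are all assumptions made for the illustration, not an interface defined by this architecture.

    import cv2
    import numpy as np
    import requests

    # Hypothetical endpoint exposed by the edge cloud (MEC) of Figure 6.
    MEC_SEGMENTATION_URL = "http://mec.example.local:8080/segment"

    def offload_segmentation(frame, timeout_s=0.05):
        """Send one egocentric frame to the edge and return the per-pixel mask, or None.

        The HMD keeps rendering locally at 90 fps; only this auxiliary step is offloaded,
        and a tight timeout ensures a slow or lost response never blocks the render loop
        (the caller simply reuses the previous mask).
        """
        ok, jpeg = cv2.imencode(".jpg", frame)          # compress the frame before sending
        if not ok:
            return None
        try:
            resp = requests.post(
                MEC_SEGMENTATION_URL,
                data=jpeg.tobytes(),
                headers={"Content-Type": "image/jpeg"},
                timeout=timeout_s,
            )
            resp.raise_for_status()
        except requests.RequestException:
            return None                                  # edge unavailable: keep the old mask
        # Assumed response: a raw uint8 mask with the same resolution as the frame.
        h, w = frame.shape[:2]
        return np.frombuffer(resp.content, dtype=np.uint8).reshape(h, w)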

Streams representing remote users and locations, on the other hand, must be transmitted in real time from the location



