Page 26 - ITU Journal, ICT Discoveries, Volume 3, No. 1, June 2020 Special issue: The future of video and immersive media




The MIV [4] encoder takes as input texture and depth videos from multiple source views, each at a particular position and orientation. It optimizes them by identifying a few as basic views, and it prunes the non-basic ones by projecting them one by one against the basic views (and the previously pruned views) to extract the non-redundant occluded regions.

The aggregator then accumulates the pruning results over an intra-period (i.e., a preset collection of frames) to account for motion, which helps encode the content efficiently. At the end of the intra-period, clustering is applied to extract rectangular patches. These patches are in turn packed into atlases (composed of texture and depth components) whose content is updated per frame across the processed intra-period.

The occupancy maps (indicating the valid regions within the patches packed in atlases) are embedded within the lower range of the depth component of the atlases during the depth occupancy coding stage. This replaces the separate occupancy signaling used in V-PCC. The atlases are finally encoded using the existing HEVC video codec.

The associated camera parameters list (describing how views are placed and oriented in space) and the atlas parameters list (indicating how patches are mapped between the atlases and the views) are carried as metadata within the bitstream.

The encoding process of MIV compression is summarized in Fig. 6.

[Fig. 6 – MIV encoding process. Blocks: Source Views (Texture + Depth), Source Camera Parameter List, View Optimizer, Atlas Constructor (Pruner, Aggregator, Patch Packer, Atlas Generator), Depth Occupancy Coder, Texture and Depth Encoders, Metadata Composer, Bitstream]

At the decoding stage, video decoding is applied to retrieve the atlases, the metadata is parsed, and the block to patch maps are generated. These maps, also known as patch ID maps, have the same size as the atlases. Each entry gives the ID of the patch to which the corresponding atlas pixel belongs, which helps resolve overlapping patches. The MIV decoder does not specify the reference renderer but supplies it with the required metadata and decoded streams.

The intended output of the reference renderer is a perspective viewport of the texture. The viewport is selected based on the viewer's position and orientation and generated using the outputs of the immersive media decoder.

The MIV decoding process is illustrated in Fig. 7.

[Fig. 7 – MIV decoding process. Blocks: Bitstream, Metadata Parser, Texture and Depth Decoders, Block to Patch Map Decoder (Patch ID Map), Controller (incl. viewing space and depth/occupancy handling), Synthesizer, Inpainter; input: Viewing Position & Orientation; output: Viewport]

4.   DELIVERING OBJECT-BASED IMMERSIVE MEDIA EXPERIENCE

The implementation of object indexing input provides a solution for delivering object-based features for immersive media experiences.

4.1  Object indexing input

The object-based coding solution requires the ability to relate points and pixels in the scene to their objects. For point-cloud representation [5], we annotate each input point with an object ID as part of the point-cloud object attributes, as shown in Fig. 8. The object ID is set to uniquely identify each point-cloud object in a scene within a finite time period.

Fig. 8 – Object IDs annotation for point-cloud objects
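The object ID annotation for point clouds can be illustrated with a minimal Python sketch. It stores the object ID as one more per-point attribute next to geometry and color. The sketch assumes that per-point object labels are already available from some upstream segmentation step, which the text does not describe. All names here (`Point`, `annotate_object_ids`, `split_by_object`, the `-1` "unannotated" value) are hypothetical and illustrative only. They are not taken from the MPEG reference software.

```python
from dataclasses import dataclass, replace
from typing import Dict, Hashable, List, Sequence


@dataclass(frozen=True)
class Point:
    """One point-cloud point: geometry, color, and an object ID attribute."""
    x: float
    y: float
    z: float
    r: int
    g: int
    b: int
    object_id: int = -1  # hypothetical convention: -1 means "not annotated yet"


def annotate_object_ids(points: Sequence[Point],
                        labels: Sequence[Hashable],
                        registry: Dict[Hashable, int]) -> List[Point]:
    """Return copies of `points` with the object_id attribute filled in.

    labels[i] names the object that points[i] belongs to (assumed to come
    from an upstream segmentation step). `registry` maps object labels to
    integer IDs and is shared by every frame of one time period, so an object
    keeps its ID across frames and two objects never share an ID in that period.
    """
    if len(points) != len(labels):
        raise ValueError("expected exactly one object label per point")
    annotated = []
    for point, label in zip(points, labels):
        if label not in registry:
            registry[label] = len(registry)  # assign the next unused ID
        annotated.append(replace(point, object_id=registry[label]))
    return annotated


def split_by_object(points: Sequence[Point]) -> Dict[int, List[Point]]:
    """Group annotated points by object ID so each object can be handled alone."""
    groups: Dict[int, List[Point]] = {}
    for point in points:
        groups.setdefault(point.object_id, []).append(point)
    return groups


if __name__ == "__main__":
    registry: Dict[Hashable, int] = {}  # start a new registry for each time period
    frame0 = [Point(0.0, 0.0, 0.0, 255, 0, 0),
              Point(1.0, 0.0, 0.0, 255, 0, 0),
              Point(5.0, 5.0, 5.0, 0, 0, 255)]
    frame1 = [Point(5.0, 5.0, 6.0, 0, 0, 255)]
    out0 = annotate_object_ids(frame0, ["person", "person", "chair"], registry)
    out1 = annotate_object_ids(frame1, ["chair"], registry)
    print([p.object_id for p in out0], [p.object_id for p in out1])  # [0, 0, 1] [1]
```

The sketch starts a fresh registry for each time period. This mirrors the statement that object IDs are unique only within a finite time period. How that period is bounded (for example, aligned with an intra-period) is left open here.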




© International Telecommunication Union, 2020