Page 27 - ITU Journal, ICT Discoveries, Volume 3, No. 1, June 2020 Special issue: The future of video and immersive media



For immersive multi-view videos [6], pixels from different views that belong to the same object are assigned the same object ID in the form of object maps. Object maps have the same resolution as the texture and depth maps, but their bit depth depends on the number of objects that require indexing in the scene. Fig. 9 shows the components of immersive content made available at the MIV encoder input.

Fig. 9 – Immersive data composed of texture views, depth maps, and object maps (showing 3 views for simplicity)

Fig. 10 – Object-based MIV & V-PCC encoding process

A summary of the object-based encoding process for MIV and V-PCC is illustrated in Fig. 10.
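As a concrete illustration of the bit-depth remark above, the minimum bit depth of an object map follows directly from the number of objects K to be indexed. The function name and the assumption that IDs run from 0 to K-1 are ours, not from the MIV specification:

```python
import math

def object_map_bit_depth(num_objects: int) -> int:
    """Minimum bits per pixel needed to index `num_objects` distinct
    object IDs (assumed to run from 0 to num_objects - 1)."""
    if num_objects < 1:
        raise ValueError("need at least one object")
    return max(1, math.ceil(math.log2(num_objects)))

# A scene with up to 256 objects fits in 8-bit object maps,
# while 257 objects already require 9 bits per pixel.
print(object_map_bit_depth(256))  # -> 8
print(object_map_bit_depth(257))  # -> 9
```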
Object IDs can be generated by using machine learning, a conventional classifier, or a segmentation algorithm running across all points in the point cloud or across all views in the immersive content to identify the different objects and assign the same object ID to all points belonging to the same object. Alternatively, objects can be captured separately and then populated in the same scene, making it simple to tag the points or pixels of each object with the related object ID.

4.2  Implementation in immersive standards

With object maps and object attributes available at the input, the object-based encoder aims to extract patches where each includes content from a single object. Thus the patches can be tagged with the associated object ID, whether added as part of the patch metadata or sent within a supplemental enhancement information (SEI) message.

In the V-PCC case [5], the point cloud is segmented and projected (with all its attributes, including the object ID) onto the surrounding cube faces, forming geometry and texture views along with the object maps. In the MIV case [6], the view optimizer labels the source views (and possibly novel views) as basic or not, and the object maps are carried through.

Object separators are used to turn the views (texture and geometry/depth) into multiple layers based on the associated object maps, where each layer only has regions belonging to a single object. Then the geometry-texture-patch generation in V-PCC (explained in Section 3.1) and the pruner-aggregator-clustering in MIV (explained in Section 3.2) are applied to the layers belonging to one object at a time.

This results in patches where each patch has content from a single object, although they may be packed together in the atlases/canvases and encoded as previously done. Note that, in the case of limited bandwidth or a need to highlight certain regions of action, the encoder may choose to drop (or blur) patches of non-related objects from the atlases or dedicate higher resolution to patches of objects of interest. This is only made feasible by adopting the object-based immersive coding solution.

The decoding process of MIV and V-PCC remains the same. The only difference is that the renderer can now make use of the object ID per patch to render only the objects of interest or replace others with synthetic content, enabling innovative use cases.
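A minimal sketch of the object separator step described above, using plain Python lists of samples in place of real texture and depth maps (all names are illustrative, not taken from the reference software):

```python
def separate_objects(texture, depth, object_map, num_objects):
    """Split one view into per-object layers: layer k keeps only the
    pixels whose object-map value equals k; all other pixels are
    zeroed out, so each layer holds regions of a single object."""
    h, w = len(object_map), len(object_map[0])
    layers = []
    for obj_id in range(num_objects):
        tex = [[texture[y][x] if object_map[y][x] == obj_id else 0
                for x in range(w)] for y in range(h)]
        dep = [[depth[y][x] if object_map[y][x] == obj_id else 0
                for x in range(w)] for y in range(h)]
        layers.append((tex, dep))
    return layers

# Toy 2x2 view with two objects: object 0 occupies the left column,
# object 1 the right column.
texture = [[1, 2], [3, 4]]       # e.g. luma samples
depth = [[10, 20], [30, 40]]     # e.g. depth samples
object_map = [[0, 1], [0, 1]]

layers = separate_objects(texture, depth, object_map, num_objects=2)
print(layers[1][1])  # depth layer of object 1 -> [[0, 20], [0, 40]]
```

Per-object patch generation (V-PCC) or pruning and aggregation (MIV) would then run on each layer in turn, so every resulting patch carries content from exactly one object.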

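The renderer-side use of the per-patch object ID can be sketched as follows; the patch records and field names here are hypothetical stand-ins for the patch metadata, not the actual bitstream syntax:

```python
# Hypothetical decoded patch records carrying the per-patch object ID
# that the object-based bitstream provides (field names are ours).
patches = [
    {"patch_id": 0, "object_id": 2},  # e.g. a foreground object
    {"patch_id": 1, "object_id": 5},  # e.g. background
    {"patch_id": 2, "object_id": 2},
]

def select_patches(patches, objects_of_interest):
    """Keep only patches whose object ID is of interest; a real
    renderer would synthesize the viewport from these patches alone,
    optionally substituting synthetic content for the rest."""
    return [p for p in patches if p["object_id"] in objects_of_interest]

rendered = select_patches(patches, objects_of_interest={2})
print([p["patch_id"] for p in rendered])  # -> [0, 2]
```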





© International Telecommunication Union, 2020