For immersive multi-view videos [6], pixels from different views that belong to the same object are assigned the same object ID in the form of maps. Object maps have the same resolution as the texture and depth maps, but their bit depth depends on the number of objects that require indexing in the scene. Fig. 9 shows the components of immersive content made available at the MIV encoder input.
Fig. 9 – Immersive data composed of texture views, depth maps, and object maps (showing 3 views for simplicity)
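Since an object map only has to index the objects present in the scene, the required bit depth follows directly from the object count. A minimal sketch of this relation (NumPy and the helper name are illustrative assumptions, not part of the MIV specification):

```python
import math
import numpy as np

def object_map_bit_depth(max_objects: int) -> int:
    """Smallest bit depth able to index max_objects distinct object IDs."""
    return max(1, math.ceil(math.log2(max_objects)))

# A scene with K = 25 objects needs ceil(log2(25)) = 5 bits, so the
# object map fits in an 8-bit plane at the same resolution as the
# texture and depth maps.
K = 25
print(object_map_bit_depth(K))                        # 5
object_map = np.zeros((1080, 1920), dtype=np.uint8)   # per-pixel object IDs
```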
Object IDs can be generated by machine learning, a conventional classifier, or a segmentation algorithm running across all points in the point cloud or across all views in the immersive content to identify the different objects and assign the same object ID to the various points belonging to the same object.
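As one illustration of the view-based variant, per-view instance segmentation can be lifted to scene-consistent object IDs by matching instances back to a reference view. The helpers segment_view() and match_to_reference() below are hypothetical stand-ins for whatever ML or conventional method is used:

```python
def build_object_maps(views, segment_view, match_to_reference):
    """Assign consistent per-pixel object IDs across all source views.

    views: iterable of (texture, depth) arrays. segment_view() and
    match_to_reference() are hypothetical hooks for any segmentation
    and cross-view matching method (e.g. reprojection via depth).
    """
    object_maps = []
    reference = None
    for texture, depth in views:
        labels = segment_view(texture, depth)   # per-view instance labels
        if reference is None:
            reference = labels                  # first view fixes the IDs
        else:
            # Relabel so the same physical object keeps one ID everywhere.
            labels = match_to_reference(labels, reference)
        object_maps.append(labels)
    return object_maps
```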
Alternatively, objects can be captured separately and then populated in the same scene, making it simple to tag the points or pixels of each object with the related object ID.
4.2 Implementation in immersive standards
With object maps and object attributes available at the input, the object-based encoder aims to extract patches where each patch includes content from a single object. The patches can thus be tagged with the associated object ID, either added as part of the patch metadata or sent within a supplemental enhancement information (SEI) message.
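A sketch of how such a tag could ride along with each patch (the class and field names are illustrative assumptions; the normative syntax lives in the patch data units or in an SEI message of the respective standard):

```python
from dataclasses import dataclass

@dataclass
class Patch:
    # Placement of the patch in the atlas (illustrative fields only).
    atlas_x: int
    atlas_y: int
    width: int
    height: int
    source_view_or_projection: int
    object_id: int        # tag: the single object this patch belongs to

def patches_of_object(patches, object_id):
    """All patches carrying one object ID, for encoder or renderer use."""
    return [p for p in patches if p.object_id == object_id]
```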
In the V-PCC case [5], the point cloud is segmented and projected (with all its attributes, including the object ID) onto the surrounding cube faces, forming geometry and texture views along with the object maps. For the MIV case [6], the view optimizer labels the source views (and possibly novel views) as basic or not, and the object maps are carried through.
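For intuition, the V-PCC projection step can be caricatured as an orthogonal projection of attributed points (X, Y, Z, R, G, B, O) onto one cube face, with the object ID travelling alongside the colour. This is a heavily simplified sketch (single face, nearest point wins, NumPy assumed); real V-PCC additionally forms patches, layers, and occupancy:

```python
import numpy as np

def project_to_face(points, size):
    """Orthogonal projection of (X, Y, Z, R, G, B, O) points along +Z.

    Keeps the nearest point per pixel and returns the texture view,
    the geometry (depth) view, and the object map for one cube face.
    """
    texture = np.zeros((size, size, 3), dtype=np.uint8)
    geometry = np.full((size, size), np.inf)
    object_map = np.zeros((size, size), dtype=np.uint8)
    for x, y, z, r, g, b, o in points:
        u, v = int(x), int(y)
        if 0 <= u < size and 0 <= v < size and z < geometry[v, u]:
            geometry[v, u] = z
            texture[v, u] = (r, g, b)
            object_map[v, u] = o   # object ID projected like any attribute
    return texture, geometry, object_map
```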
Fig. 10 – Object-based MIV & V-PCC encoding process

A summary of the object-based encoding process for MIV and V-PCC is illustrated in Fig. 10.

Object separators are used to turn the views (texture and geometry/depth) into multiple layers based on the associated object maps, where each layer only has regions belonging to a single object. Then the geometry-texture-patch generation in V-PCC (explained in section 3.1) and the pruner-aggregator-clustering in MIV (explained in section 3.2) are applied to the layers belonging to one object at a time.
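The object separator itself reduces to masking each view with one object ID at a time. A minimal sketch under that reading (NumPy assumed, names illustrative):

```python
import numpy as np

def separate_view(texture, depth, object_map, max_objects):
    """Split one view into per-object layers using its object map.

    Each layer keeps texture/depth only where the object map equals
    that object's ID, so the downstream V-PCC patch generation or
    MIV pruning/aggregation sees a single object per layer.
    """
    layers = []
    for obj_id in range(max_objects):
        mask = object_map == obj_id
        layer_texture = np.where(mask[..., None], texture, 0)
        layer_depth = np.where(mask, depth, 0)
        layers.append((layer_texture, layer_depth, mask))
    return layers
```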
This results in patches where each patch has content from a single object, although they may be packed together in the atlases/canvases and encoded as previously done. Note that in the case of limited bandwidth, or a need to highlight certain regions of action, the encoder may choose to drop (or blur) patches of non-related objects from the atlases, or dedicate higher resolution to patches of objects of interest. This is only made feasible by adopting the object-based immersive coding solution.

The decoding process of MIV and V-PCC remains the same. The only difference is that the renderer can now make use of the object ID per patch to render only the objects of interest, or replace others with synthetic content, enabling innovative use cases.
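On the decoder side, object-aware rendering then amounts to filtering the decoded patch list before view synthesis. The render_patch() and render_placeholder() hooks below are hypothetical stand-ins for the regular MIV/V-PCC renderer and a synthetic substitute:

```python
def render_objects_of_interest(patches, objects_of_interest,
                               render_patch, render_placeholder=None):
    """Render only patches whose object ID is of interest.

    patches: decoded patches, each tagged with an object_id (see the
    Patch sketch above). Non-interesting objects are skipped or, if a
    placeholder hook is given, replaced by synthetic content.
    """
    for patch in patches:
        if patch.object_id in objects_of_interest:
            render_patch(patch)
        elif render_placeholder is not None:
            render_placeholder(patch)     # e.g. synthetic background
```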