The MIV [4] encoder takes texture and depth videos from multiple source views, each at a particular position and orientation, as input; optimizes them by identifying a few as basic views; and prunes the non-basic ones by projecting them one by one against the basic views (and the previously pruned views) to extract the non-redundant occluded regions.
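As a concrete picture of this pruning step, the following minimal Python sketch reprojects each pixel of a non-basic view into a basic view and discards pixels whose surface is already represented there. The camera helpers `unproject` and `project` and the relative depth threshold are hypothetical simplifications, not the MIV reference process.

```python
import numpy as np

def prune_view(view_depth, view_pose, basic_depth, basic_pose,
               unproject, project, depth_eps=0.01):
    """Return a keep-mask for a non-basic view: True where the pixel
    is NOT already represented by the basic view."""
    keep = np.ones(view_depth.shape, dtype=bool)
    rows, cols = view_depth.shape
    for v in range(rows):
        for u in range(cols):
            # Lift the pixel to a 3D point using its depth ...
            point = unproject(u, v, view_depth[v, u], view_pose)
            # ... and reproject that point into the basic view.
            ub, vb, zb = project(point, basic_pose)
            ub, vb = int(round(ub)), int(round(vb))
            if 0 <= ub < basic_depth.shape[1] and 0 <= vb < basic_depth.shape[0]:
                # Similar depth means the basic view already covers
                # this surface point, so the pixel is redundant.
                if abs(basic_depth[vb, ub] - zb) <= depth_eps * zb:
                    keep[v, u] = False
    return keep
```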
The aggregator then accumulates the pruning results over an intra-period (i.e., a preset collection of frames) to account for motion, which helps in efficiently encoding the content. By the end of the intra-period, clustering is applied to extract rectangular patches, which in turn are packed into atlases (composed of texture and depth components), with content updated per frame across the processed intra-period.
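A minimal sketch of the aggregation and clustering idea, assuming binary per-frame pruning masks: the masks are OR-accumulated over the intra-period, connected regions are labeled, and each region's bounding box becomes a rectangular patch. The use of `scipy.ndimage` connected components is an illustrative shortcut; the actual MIV clustering is more elaborate.

```python
import numpy as np
from scipy import ndimage

def extract_patches(pruning_masks):
    """pruning_masks: list of (H, W) boolean masks, one per frame of
    the intra-period (True = pixel survived pruning).
    Returns a list of (top, left, height, width) patch rectangles."""
    # Accumulate over the intra-period so a patch covers the region
    # in every frame where it is visible (accounts for motion).
    aggregated = np.logical_or.reduce(pruning_masks)
    # Cluster the aggregated mask into connected regions and take
    # each region's bounding box as a rectangular patch.
    labels, num_regions = ndimage.label(aggregated)
    patches = []
    for region in ndimage.find_objects(labels):
        top, left = region[0].start, region[1].start
        height = region[0].stop - top
        width = region[1].stop - left
        patches.append((top, left, height, width))
    return patches
```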
The occupancy maps (indicating the valid regions within the patches packed in atlases) are embedded within the lower range of the depth component of the atlases during the depth-occupancy coding stage, rather than being signaled separately as in the V-PCC case. The atlases are finally encoded using the existing HEVC video codec.
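The depth-occupancy embedding can be pictured as follows: a guard range at the bottom of the depth signal is reserved, unoccupied pixels are coded as 0, and valid depths are remapped above a threshold, so the decoder recovers occupancy by thresholding. The 10-bit range and the threshold below are illustrative assumptions, not normative values.

```python
import numpy as np

def embed_occupancy(depth, occupancy, threshold=64, max_val=1023):
    """Encoder side: fold the occupancy map into a 10-bit depth plane.
    depth     : (H, W) normalized depth in [0, 1]
    occupancy : (H, W) boolean, True = valid pixel"""
    coded = np.zeros(depth.shape, dtype=np.uint16)
    # Remap valid depth into [threshold, max_val]; the lower range
    # stays free to signal "unoccupied" (coded value < threshold).
    coded[occupancy] = (threshold
                        + depth[occupancy] * (max_val - threshold)).astype(np.uint16)
    return coded

def split_occupancy(coded, threshold=64, max_val=1023):
    """Decoder side: recover depth and occupancy by thresholding."""
    occupancy = coded >= threshold
    depth = np.where(occupancy,
                     (coded.astype(np.float64) - threshold) / (max_val - threshold),
                     0.0)
    return depth, occupancy
```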
The associated camera parameters list (describing how the views are placed and oriented in space) and the atlas parameters list (indicating how patches are mapped between the atlases and the views) are carried as metadata within the bitstream.
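The two lists can be pictured as the following records; the field names and types are illustrative assumptions, not the normative MIV metadata syntax.

```python
from dataclasses import dataclass

@dataclass
class CameraParameters:
    """One entry of the camera parameters list: where a source view
    sits in space and how it is oriented."""
    view_id: int
    position: tuple        # (x, y, z) in scene units
    orientation: tuple     # e.g. (yaw, pitch, roll)
    projection: str        # e.g. "perspective" or "equirectangular"

@dataclass
class PatchParameters:
    """One entry of the atlas parameters list: how a patch maps
    between an atlas and the view it was extracted from."""
    patch_id: int
    view_id: int           # source view the patch came from
    atlas_pos: tuple       # (x, y) of the patch in the atlas
    view_pos: tuple        # (x, y) of the patch in the source view
    size: tuple            # (width, height)
    rotation: int          # packing rotation, in degrees
```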
The encoding process of MIV compression is summarized in Fig. 6.
Fig. 6 – MIV encoding process
At the decoding stage, video decoding is applied to retrieve the atlases, the metadata is parsed, and the block-to-patch maps (also known as patch ID maps; of the same size as the atlases, they indicate the patch ID that each pixel within the atlas belongs to, which helps resolve overlapped patches) are generated. The MIV decoder does not specify the reference renderer but supplies it with the required metadata and decoded streams.
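A sketch of how such a map could be generated from the parsed patch list: patches are rasterized into an atlas-sized buffer in decoding order, so later patches overwrite earlier ones and every pixel ends up with a single patch ID. It reuses the hypothetical `PatchParameters` record from the metadata sketch above.

```python
import numpy as np

def build_block_to_patch_map(patches, atlas_height, atlas_width, unused=0xFFFF):
    """patches: iterable of PatchParameters in decoding order.
    Returns an (atlas_height, atlas_width) map of patch IDs."""
    patch_map = np.full((atlas_height, atlas_width), unused, dtype=np.uint16)
    for p in patches:
        x, y = p.atlas_pos
        w, h = p.size
        # Later patches overwrite earlier ones, which is how
        # overlapping patches resolve to one ID per pixel.
        patch_map[y:y + h, x:x + w] = p.patch_id
    return patch_map
```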
The intended output of the reference renderer is a perspective viewport of the texture, selected based upon a viewer's position and orientation, generated using the outputs of the immersive media decoder. The MIV decoding process is illustrated in Fig. 7.

Fig. 7 – MIV decoding process
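A heavily reduced sketch of what a renderer in this position does: every atlas pixel carrying a patch is unprojected to 3D with its decoded depth and splatted into the requested viewport, keeping the nearest sample per target pixel. The camera helpers are the same hypothetical ones as in the pruning sketch; the atlas-to-view coordinate remapping and the inpainting of remaining holes are omitted for brevity.

```python
import numpy as np

def synthesize_viewport(atlas_texture, atlas_depth, patch_map, patches,
                        cameras, viewer_pose, out_shape,
                        unproject, project, unused=0xFFFF):
    """Warp decoded atlas samples into a perspective viewport at
    viewer_pose. Returns the synthesized texture (holes stay 0)."""
    H, W = out_shape
    viewport = np.zeros((H, W, 3), dtype=atlas_texture.dtype)
    zbuf = np.full((H, W), np.inf)
    by_id = {p.patch_id: p for p in patches}
    for v in range(atlas_depth.shape[0]):
        for u in range(atlas_depth.shape[1]):
            pid = patch_map[v, u]
            if pid == unused:
                continue  # pixel carries no patch
            cam = cameras[by_id[pid].view_id]
            # Atlas pixel -> 3D point -> viewport pixel.
            point = unproject(u, v, atlas_depth[v, u], cam)
            ut, vt, zt = project(point, viewer_pose)
            ut, vt = int(round(ut)), int(round(vt))
            if 0 <= ut < W and 0 <= vt < H and zt < zbuf[vt, ut]:
                zbuf[vt, ut] = zt   # keep the nearest sample
                viewport[vt, ut] = atlas_texture[v, u]
    return viewport
```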
4. DELIVERING OBJECT-BASED IMMERSIVE MEDIA EXPERIENCE

The implementation of an object indexing input provides a solution for delivering object-based features for immersive media experiences.
4.1 Objects indexing input

The object-based coding solution requires the ability to relate points and pixels in the scene to their objects. For the point-cloud representation [5], we annotate each input point with an object ID as part of the point-cloud object attributes, as shown in Fig. 8. The object ID is set to uniquely identify each point-cloud object in a scene within a finite time period.

Fig. 8 – Object IDs annotation for point-cloud objects
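Assuming a simple array-of-points layout, the annotation can be pictured as one extra per-point attribute next to geometry and color; the structure below is illustrative, not the codec's actual attribute syntax.

```python
import numpy as np

# Hypothetical layout: one record per point, with the object ID
# stored as an extra per-point attribute alongside geometry/color.
point_dtype = np.dtype([
    ("xyz", np.float32, 3),    # geometry
    ("rgb", np.uint8, 3),      # color attribute
    ("object_id", np.uint16),  # unique per object within a time period
])

def annotate(points_xyz, points_rgb, object_id):
    """Tag every point of one segmented object with its object ID."""
    cloud = np.empty(len(points_xyz), dtype=point_dtype)
    cloud["xyz"] = points_xyz
    cloud["rgb"] = points_rgb
    cloud["object_id"] = object_id
    return cloud
```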