Page 112 - ITU Journal, ICT Discoveries, Volume 3, No. 1, June 2020 Special issue: The future of video and immersive media
4.2 JPEG Pleno Part 2: Light field coding

An effective plenoptic modality is given by light fields, which define light rays in space by their (constant) intensity and their intersection with two planes. This is equivalent to representing the plenoptic function as a 2D array of 2D views. JPEG Pleno light field coding defines two coding modes: one exploits the redundancy using a 4D prediction process, the other exploits the redundancy in 4D light field data using a 4D transform technique [13]. It is important to note that the two coding modes are independent. The light fields are input to the JPEG Pleno codec as 2D arrays of RGB 2D views [14].

The 4D Transform Mode (4DTM) exploits the 4D redundancy of a light field by first partitioning it into variable-size 4D blocks. Each block is then transformed using a 4D-DCT. The bit planes of the resulting 4D array of transform coefficients are first partitioned and encoded using hexadeca-trees, followed by an adaptive arithmetic encoder. The partitioning process and the bit-plane clustering can be jointly determined by a Rate-Distortion (R-D) Lagrangian optimization procedure, although this is an encoder issue and hence not prescribed by the standard. The 4DTM also provides random access capabilities.

In the 4D Prediction Mode (4DPM) a subset of views is selected as reference views, while the rest of the views are referred to as intermediate views. The texture and depth of the reference views are encoded using the JPEG 2000 standard. The pixel correspondence information between the reference views and an intermediate view is obtained from the depth maps and camera parameters. The pixels of each reference view are warped to the intermediate view location. This is followed by the prediction stage, where the multiple warped views are merged into a complete view using predictors that are optimal in the least-squares sense over a set of occlusion-based regions. Being depth-based, the 4DPM can efficiently encode light fields obtained with a variety of light field imaging technologies, such as micro-lens based plenoptic cameras and camera arrays.

The 4DPM can encode light fields very efficiently when reliable depth information is available. The 4DTM, on the other hand, does not need depth information for encoding, but is efficient only for light fields with very high angular view density, such as the ones acquired by plenoptic cameras. More details can be found in sections 4.2.1 and 4.2.2 of this paper.

4.2.1 4D-Transform Mode (4DTM)

The parameterization $(u, v, s, t)$ is a 4D simplification of the plenoptic function that considers the intensity of each light ray constant along its path. Using the two-plane parameterization of light fields [15], a sample (pixel) of the light field is referenced in a 4D coordinate system along the $t$, $s$, $v$, and $u$-axes, where $t$ and $s$ represent the coordinates of the addressed view, and $v$ and $u$ represent the sample (spatial) coordinates within the images (views).

The encoder block diagram of the 4D-Transform Mode (4DTM), introduced in Section 4.2, is pictured in Fig. 5.

The partitioning of the 4D blocks into sub-blocks is signalled with a binary tree using ternary flags indicating whether a block is transformed as is, is split into 4 blocks in the $s, t$ (view) dimensions, or is split into 4 blocks in the $u, v$ (spatial) dimensions. Next, a separable 4D-DCT is applied.

The optimized partitioning for each transformed block may be calculated by obtaining, for example, the Lagrangian encoding cost $J_N$, defined as $J_N = D + \lambda R$, where $D$ is the distortion incurred when representing the original block by its quantized version and $R$ is the rate necessary to encode it. The other possible R-D costs are calculated whenever a 4D block is partitioned in its spatial or view dimensions. For example, the left-hand side of Fig. 6 pictures a $t_k \times s_k \times v_k \times u_k$ 4D block subdivided into four sub-blocks of sizes $t_k \times s_k \times \lfloor v_k/2 \rfloor \times \lfloor u_k/2 \rfloor$, $t_k \times s_k \times \lfloor v_k/2 \rfloor \times (u_k - \lfloor u_k/2 \rfloor)$, $t_k \times s_k \times (v_k - \lfloor v_k/2 \rfloor) \times \lfloor u_k/2 \rfloor$ and $t_k \times s_k \times (v_k - \lfloor v_k/2 \rfloor) \times (u_k - \lfloor u_k/2 \rfloor)$, respectively. The optimized partitioning for each sub-block is computed by a recursive procedure, and the Lagrangian costs of the four sub-blocks are added to compute the spatial R-D cost $J_S$. The block can also be partitioned in the view directions, with sub-blocks of the sizes defined in the right-hand side of Fig. 6. The optimized partitioning for each sub-block is again computed using a recursive procedure, and the Lagrangian costs of the four sub-blocks are added to compute the view R-D cost $J_V$. One should note that if the recursive procedure were unrolled into a non-recursive one, it would be equivalent to a bottom-up optimization of the tree.

Fig. 7 shows the hierarchical recursive partitioning. The algorithm keeps track of this tree, returning a partitionString that represents the optimized tree. When the lowest cost is chosen, the current value of partitionString is augmented by appending to it the flag corresponding to that cost (Fig. 7: transformFlag, spatialSplitFlag or viewSplitFlag). The string returned by the recursive call that leads to the minimum cost is also appended to the end of partitionString, and the procedure returns both the minimum cost ($J_N$, $J_S$ or $J_V$) and the updated partitionString.
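The recursive R-D partitioning of a 4DTM block (transform as is, split in the spatial dimensions, or split in the view dimensions, appending the chosen flag and the sub-blocks' strings to partitionString) can be illustrated with the minimal sketch below. It is not the normative procedure or a reference encoder: the one-character flags "T", "S" and "V", the function names, the (t, s, v, u) tuple layout and the toy cost model are all assumptions made for illustration. A real encoder would obtain the leaf cost from the quantized 4D-DCT coefficients and would typically also account for the rate of the flags themselves.

```python
from typing import Callable, List, Optional, Tuple

Vec4 = Tuple[int, int, int, int]          # components ordered (t, s, v, u)
LeafCost = Callable[[Vec4, Vec4], float]  # (origin, size) -> J_N

# Hypothetical one-character flags; the standard defines its own signalling.
TRANSFORM, SPATIAL_SPLIT, VIEW_SPLIT = "T", "S", "V"
VIEW_AXES, SPATIAL_AXES = (0, 1), (2, 3)


def split_block(origin: Vec4, size: Vec4,
                axes: Tuple[int, int]) -> List[Tuple[Vec4, Vec4]]:
    """Split a block along two axes into four (origin, size) sub-blocks.

    A length n along a split axis becomes floor(n/2) and n - floor(n/2).
    """
    a, b = axes
    half_a, half_b = size[a] // 2, size[b] // 2
    children = []
    for off_a, len_a in ((0, half_a), (half_a, size[a] - half_a)):
        for off_b, len_b in ((0, half_b), (half_b, size[b] - half_b)):
            o, n = list(origin), list(size)
            o[a], n[a] = origin[a] + off_a, len_a
            o[b], n[b] = origin[b] + off_b, len_b
            children.append((tuple(o), tuple(n)))
    return children


def optimize(origin: Vec4, size: Vec4, leaf_cost: LeafCost) -> Tuple[float, str]:
    """Return (minimum Lagrangian cost, partition string) for one block."""
    best_cost, best_string = leaf_cost(origin, size), TRANSFORM  # J_N
    for flag, axes in ((SPATIAL_SPLIT, SPATIAL_AXES), (VIEW_SPLIT, VIEW_AXES)):
        if min(size[axes[0]], size[axes[1]]) < 2:
            continue  # a length-1 axis cannot be halved
        cost, string = 0.0, flag
        for child_origin, child_size in split_block(origin, size, axes):
            child_cost, child_string = optimize(child_origin, child_size, leaf_cost)
            cost += child_cost      # J_S or J_V: sum over the four sub-blocks
            string += child_string  # flag first, then the sub-blocks' strings
        if cost < best_cost:
            best_cost, best_string = cost, string
    return best_cost, best_string


def toy_leaf_cost(detail: Optional[Vec4], lam: float = 1.0) -> LeafCost:
    """Toy J_N = D + lam * R (an assumption, not a real codec cost).

    R is 1 per block; D is (number of samples - 1) if the block contains
    the `detail` sample and 0 otherwise.
    """
    def cost(origin: Vec4, size: Vec4) -> float:
        inside = detail is not None and all(
            o <= d < o + n for o, n, d in zip(origin, size, detail))
        samples = size[0] * size[1] * size[2] * size[3]
        distortion = samples - 1 if inside else 0
        return distortion + lam * 1.0
    return cost
```

With this toy cost, a block with no detail is cheapest when transformed as is, while a block containing a detail sample is split so that the detail ends up in a smaller sub-block. For instance, `optimize((0, 0, 0, 0), (1, 1, 4, 4), toy_leaf_cost((0, 0, 0, 0)))` selects one spatial split followed by four transformed sub-blocks. The recursion is exhaustive, so it is only practical for small blocks or with a minimum block size.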
© International Telecommunication Union, 2020