Page 112 - ITU Journal, ICT Discoveries, Volume 3, No. 1, June 2020 Special issue: The future of video and immersive media
4.2 JPEG Pleno Part 2: Light field coding

An effective plenoptic modality is given by light fields, which define light rays in space by their (constant) intensity and their intersection with two planes. This is equivalent to representing the plenoptic function as a 2D array of 2D views. JPEG Pleno light field coding defines two coding modes: one exploits the redundancy in 4D light field data using a 4D prediction process, the other using a 4D transform technique [13]. It is important to note that the two coding modes are independent. The light fields are input to the JPEG Pleno codec as 2D arrays of RGB 2D views [14].

The 4D Transform Mode (4DTM) exploits the 4D redundancy of a light field by first partitioning it into variable-size 4D blocks. Each block is then transformed using a 4D-DCT. The bit planes of the resulting 4D array of transform coefficients are partitioned and encoded using hexadeca-trees followed by an adaptive arithmetic encoder. The partitioning process and the clustering of the bit planes can be jointly determined by a Rate-Distortion (R-D) Lagrangian optimization procedure, although this is an encoder issue and hence not prescribed by the standard. The 4DTM also provides random access capabilities.

In the 4D Prediction Mode (4DPM), a subset of views is selected as reference views, while the remaining views are referred to as intermediate views. The texture and depth of the reference views are encoded using the JPEG 2000 standard. The pixel correspondence information between the reference views and an intermediate view is obtained from the depth maps and camera parameters. The pixels of each reference view are warped to the intermediate view location. A prediction stage follows, in which the multiple warped views are merged into a complete view using predictors that are optimal in the least-squares sense over a set of occlusion-based regions. Being depth-based, the 4DPM can efficiently encode light fields obtained with a variety of light field imaging technologies, such as micro-lens-based plenoptic cameras and camera arrays.

The 4DPM can encode light fields very efficiently when reliable depth information is available. The 4DTM, on the other hand, does not need depth information for encoding, but is efficient only for light fields with very high angular view density, such as those acquired by plenoptic cameras. More details can be found in Sections 4.2.1 and 4.2.2 of this paper.

4.2.1  4D-Transform Mode (4DTM)

The parameterization $L(t, s, v, u)$ is a 4D simplification of the plenoptic function that considers the intensity of each light ray constant along its path. Using the two-plane parameterization of light fields [15], a sample (pixel) of the light field is referenced in a 4D coordinate system along the $t$, $s$, $v$ and $u$-axes, where $t$ and $s$ represent the coordinates of the addressed view, and $v$ and $u$ represent the sample (spatial) coordinates within the images (views).

The encoder block diagram of the 4D-Transform Mode (4DTM), introduced in Section 4.2, is pictured in Fig. 5.

The partitioning of the 4D blocks into sub-blocks is signalled with a binary tree using ternary flags indicating whether a block is transformed as is, is split into 4 blocks in the $s$, $t$ (view) dimensions or is split into 4 blocks in the $u$, $v$ (spatial) dimensions. Next, a separable 4D-DCT is applied.

The optimized partitioning for each transformed block may be calculated by obtaining, for example, the Lagrangian encoding cost $J_N$, defined as $J_N = D + \lambda R$, where $D$ is the distortion incurred when representing the original block by its quantized version and $R$ is the rate necessary to encode it. The other possible R-D costs are calculated whenever a 4D block is partitioned in its spatial or view dimensions. For example, the left-hand side of Fig. 6 pictures a $t_k \times s_k \times v_k \times u_k$ 4D block subdivided into four sub-blocks of sizes $t_k \times s_k \times \lfloor v_k/2 \rfloor \times \lfloor u_k/2 \rfloor$, $t_k \times s_k \times \lfloor v_k/2 \rfloor \times (u_k - \lfloor u_k/2 \rfloor)$, $t_k \times s_k \times (v_k - \lfloor v_k/2 \rfloor) \times \lfloor u_k/2 \rfloor$ and $t_k \times s_k \times (v_k - \lfloor v_k/2 \rfloor) \times (u_k - \lfloor u_k/2 \rfloor)$, respectively. The optimized partitioning for each sub-block is computed by a recursive procedure, and the Lagrangian costs of the four sub-blocks are added to compute the spatial R-D cost $J_S$. The block can also be partitioned in the view directions, with sub-blocks of the sizes defined in the right-hand side of Fig. 6. The optimized partitioning for each sub-block is again computed using a recursive procedure, and the Lagrangian costs of the four sub-blocks are added to compute the view R-D cost $J_V$. Note that if the recursive procedure were unrolled into a non-recursive one, it would be equivalent to a bottom-up optimization of the tree.

Fig. 7 shows the hierarchical recursive partitioning. The algorithm keeps track of this tree, returning a partitionString that represents the optimized tree. When the lowest cost is chosen, the current value of partitionString is augmented by appending to it the flag corresponding to that cost (Fig. 7: transformFlag, spatialSplitFlag or viewSplitFlag). The string returned by the recursive call that leads to the minimum cost is also appended to the end of partitionString, and the procedure returns both the minimum cost ($J_N$, $J_S$ or $J_V$) and the updated partitionString.
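
The recursive R-D partitioning described in Section 4.2.1 can be illustrated with a minimal Python sketch. It is not the normative procedure, which the standard leaves to the encoder, and several parts are assumptions made only for illustration: the function names, the one-character flags "T", "S" and "V" standing in for transformFlag, spatialSplitFlag and viewSplitFlag, the uniform quantizer with step `step`, and the crude bit-count proxy used as the rate in place of the hexadeca-tree and arithmetic coder. The cost of signalling the flags is ignored, and a split is attempted only when both dimensions involved have at least two samples. Blocks are indexed as (t, s, v, u).

```python
import numpy as np

# Hypothetical one-character stand-ins for the three partition flags.
TRANSFORM, SPATIAL, VIEW = "T", "S", "V"

def dct_matrix(n):
    """Orthonormal DCT-II matrix of size n x n."""
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    m = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    m[0, :] /= np.sqrt(2.0)
    return m

def dct4d(block):
    """Separable 4D-DCT: a 1D DCT-II applied along each of the four axes."""
    out = np.asarray(block, dtype=float)
    for axis in range(4):
        m = dct_matrix(out.shape[axis])
        out = np.moveaxis(np.tensordot(m, out, axes=(1, axis)), 0, axis)
    return out

def cost_no_split(block, lam, step):
    """J_N = D + lambda * R for transforming the block as is."""
    coeffs = dct4d(block)
    q = np.round(coeffs / step)
    # The transform is orthonormal, so the squared error can be measured on coefficients.
    distortion = float(np.sum((coeffs - q * step) ** 2))
    # Assumed rate proxy (not the real entropy coder): bits grow with log2 of magnitude.
    rate = float(np.sum(1.0 + 2.0 * np.log2(1.0 + np.abs(q))))
    return distortion + lam * rate

def split_in_four(block, axes):
    """Split along two axes into sizes floor(n/2) and n - floor(n/2) on each axis."""
    a, b = axes
    ha, hb = block.shape[a] // 2, block.shape[b] // 2
    subs = []
    for sa in (slice(0, ha), slice(ha, None)):
        for sb in (slice(0, hb), slice(hb, None)):
            idx = [slice(None)] * 4
            idx[a], idx[b] = sa, sb
            subs.append(block[tuple(idx)])
    return subs

def optimize_partition(block, lam, step):
    """Return (minimum Lagrangian cost, partition string) for a 4D block."""
    best_cost, best_string = cost_no_split(block, lam, step), TRANSFORM
    # Spatial split acts on (v, u) = axes (2, 3); view split on (t, s) = axes (0, 1).
    for flag, axes in ((SPATIAL, (2, 3)), (VIEW, (0, 1))):
        if min(block.shape[axes[0]], block.shape[axes[1]]) < 2:
            continue  # simplification: never create empty sub-blocks
        results = [optimize_partition(sub, lam, step)
                   for sub in split_in_four(block, axes)]
        cost = sum(c for c, _ in results)  # J_S or J_V
        if cost < best_cost:
            best_cost = cost
            best_string = flag + "".join(s for _, s in results)
    return best_cost, best_string

# Usage on a small synthetic block (values are arbitrary test data).
rng = np.random.default_rng(0)
light_field_block = rng.integers(0, 256, size=(2, 2, 4, 4))
cost, partition_string = optimize_partition(light_field_block, lam=10.0, step=8.0)
```

The returned string lists the flags in depth-first order: each split flag is followed by the strings of its four sub-blocks, mirroring how partitionString is built up by appending the chosen flag and then the strings returned by the recursive calls. Because every candidate is evaluated exhaustively, the run time grows quickly with block size, so the sketch is only practical for small blocks.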
© International Telecommunication Union, 2020