Page 114 - ITU Journal, ICT Discoveries, Volume 3, No. 1, June 2020 Special issue: The future of video and immersive media

                                      Fig. 7 – 4DTM 4D-block hierarchical spatial and view partitioning.


bit-rate applications, one may choose to use fewer reference views compared to high bit-rate applications.

For a given intermediate view, the texture prediction of 4DPM consists of two sparse prediction stages. In the first stage, the depth information is used to warp the pixels of the reference views to their corresponding locations in the camera position of the target intermediate view. Each warped reference view produces a distinct pattern of disoccluded pixels. The aggregation of this information over all warped reference views can be used to infer a useful segmentation of the predicted intermediate view. The segmentation used in 4DPM is generated by splitting the set of pixels into various occlusion classes. Each occlusion class defines a specific subset of warped reference views. These subsets are used to obtain the optimal linear predictor in the least-squares sense, involving only the relevant warped reference views at each pixel location of the predicted view. The occlusion classes are specially designed to handle the depth-based prediction of wide baseline light fields, where disocclusions in warped reference views need to be considered during view merging.

The view merging stage is followed by inpainting, after which the second sparse prediction stage is applied, involving a 2D convolution with a spatially sparse filter whose template is chosen adaptively by a greedy sparse filter design. The intermediate-depth views are also synthesized using a depth-based view warping algorithm, allowing each already processed hierarchical level to act as a source of reference views for subsequent hierarchical levels. Since each hierarchical level contributes a set of reference views for the prediction of higher hierarchical levels, the views on the higher hierarchical levels are coded more efficiently than those on the lower hierarchical levels.

The 4DPM uses sample-based warping and merging operations, which are very efficient when the disparity information is precise enough, differing in this respect from the block-based approaches used in the multi-view extensions of HEVC, namely MV-HEVC and 3D-HEVC. The occlusion classes provide a localized prediction for each of the regions, with each region having an arbitrary contour based on depth features. Therefore, the contours of the regions follow the borders of different depth levels in the scene, so that the predictors are specifically designed for different objects or groups of objects. The subsequent 2D convolution with an adaptively chosen filter template adjusts the predicted view using global features of the image (such as global illumination conditions).

The entire texture and depth components of any intermediate view are reconstructed using the previously mentioned prediction stages. The view prediction stage of the texture component is optionally followed by coding of the view prediction error. When the application requires low distortion of   , the view prediction error is usually encoded at every   (  ,   ) > 1, while for applications targeting low rates and moderate distortion, the view prediction error may not be encoded at all.

4.2.3  Part 2 Performance Evaluation

The coding tools described in Section 4.2 are currently implemented as verification model software, while design activities for the reference software are ongoing. Different technologies, initially proposed in response to the call for proposals for the JPEG Pleno standardization, have gone through several exploration studies and core experiments aimed at critically evaluating the coding performance against state-of-the-art encoders and determining actions for the performance and quality improvement of the JPEG Pleno coding framework. Details about the experimental analysis, obtained results, and discussions are reported in several research papers [13], [17].

The JPEG Committee provided a Common Test Conditions (CTC) document [14], as well as a publicly available
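
The view merging step of the first 4DPM prediction stage, as described on this page, can be illustrated with a toy sketch. It labels each pixel with an occlusion class (here encoded, as an assumption, as a bitmask of the warped reference views that are valid at that pixel) and fits one least-squares linear predictor per class using only the views valid in that class. This is not the normative JPEG Pleno procedure: the 1D "views", the function names (occlusion_classes, fit_predictors, merge), the bitmask encoding and the small normal-equation solver are all hypothetical choices made for illustration, and the sketch assumes every class has enough non-degenerate pixels for the fit to be well posed.

```python
from collections import defaultdict

def occlusion_classes(warped):
    """Label each pixel with a bitmask of the warped views valid there (assumed encoding)."""
    n = len(warped[0])
    return [sum(1 << k for k, view in enumerate(warped) if view[i] is not None)
            for i in range(n)]

def solve(a, b):
    """Solve the small system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def fit_predictors(warped, target):
    """Encoder side: per occlusion class, fit least-squares weights over its valid views."""
    labels = occlusion_classes(warped)
    pixels = defaultdict(list)
    for i, label in enumerate(labels):
        pixels[label].append(i)
    weights = {}
    for label, idx in pixels.items():
        views = [k for k in range(len(warped)) if (label >> k) & 1]
        if not views:
            continue  # disoccluded in every view: left to a later inpainting step
        # Normal equations (A^T A) w = A^T b; A holds the valid views' samples.
        ata = [[sum(warped[p][i] * warped[q][i] for i in idx) for q in views]
               for p in views]
        atb = [sum(warped[p][i] * target[i] for i in idx) for p in views]
        weights[label] = (views, solve(ata, atb))
    return labels, weights

def merge(warped, labels, weights):
    """Predict each pixel from the views of its occlusion class; None where no view is valid."""
    out = []
    for i, label in enumerate(labels):
        if label not in weights:
            out.append(None)
        else:
            views, w = weights[label]
            out.append(sum(wk * warped[k][i] for k, wk in zip(views, w)))
    return out

# Two warped reference views of a 6-pixel row; None marks a disoccluded pixel.
warped = [[10, 20, 30, None, 50, None],
          [12, 18, None, 40, 54, None]]
target = [11, 19, 27, 44, 52, 60]  # original view, known only at the encoder

labels, weights = fit_predictors(warped, target)
pred = merge(warped, labels, weights)
```

In this sketch the last pixel is visible in no warped view, so it receives no prediction and would be filled by the inpainting step that the text places after view merging. A decoder would receive the per-class weights rather than fit them, since it has no access to the original view.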





© International Telecommunication Union, 2020