Overlays have been researched in past years in several application areas but, to the best of the authors' knowledge, minimally in the area of 360-degree video. In [6], the authors present an image overlay system to aid procedures in computerized tomography. The system in [7] shows a way to display dynamic image overlays during surgical operations using a stereo camera and augmented reality visualization. The authors of [8] describe a method for real-time overlay insertion at a predefined location of a pre-encoded ITU-T H.264/AVC video sequence. The detection and extraction of text overlaid on top of complex video scenes and news content are studied in [9, 10]. Augmented reality-based image overlays on optical see-through displays mounted on the front glass of a car have been studied in [11]. Similar research, performed using a Virtual Reality (VR) HMD with a video-based see-through display, is presented in [12]. Here the authors use an accelerometer to compensate for the motion-to-photon delay between the image overlay and the reality displayed on the HMD's screen, and give a method to improve the registration between the two. An implementation of a system that displays overlays in unmanned aerial vehicles is presented in [13]. An intelligent video overlay system for advertisements is presented in [14]. Here the overlays are not placed in fixed positions, but are located such that their intrusiveness to the user is minimized; this is achieved by detecting faces, text and salient areas in the video.

Multi-viewpoint 360-degree video streaming is a relatively new area. For traditional mobile 2D video, multi-camera video remixing has been extensively researched by some of the authors of this paper; see [15, 16, 17], for example. The work in [18] presents streaming from multiple 360-degree viewpoints that capture the same scene from different angles. A challenge described by the authors is viewpoint switching and how to minimize disruption after a switch. The authors also emphasize the importance of switching prediction in order to minimize the impact on the Quality of Experience (QoE). The research in [19] focuses on low-latency multi-viewpoint 360-degree interactive video. The authors use multimodal learning and a deep reinforcement learning technique to detect events (visual, audio, text) and to predict future bandwidths, head rotation and viewpoint selection, in order to improve media quality and reduce latency.

The present paper focuses on two of the main new features included in the second edition of the MPEG OMAF standard, namely overlays and multi-viewpoints. The newly enabled use cases are also introduced.

The structure of the remaining parts of this paper is as follows. Section 2 presents the OMAF system architecture. Section 3 focuses on the overlay capabilities and functionalities in OMAFv2. Section 4 describes how multiple omnidirectional viewpoints can be utilized in OMAFv2. Finally, section 5 concludes the paper.

2. MPEG OMAF SYSTEM ARCHITECTURE

This section introduces the general MPEG OMAF system architecture, depicted in Fig. 1, which is extracted from the draft OMAFv2 standard specification [5]. The figure shows the end-to-end content flow process from acquisition up to display/playback for live and on-demand streaming use cases. The specification applies to projected omnidirectional video (equirectangular and cube map) as well as to fisheye video. It defines media storage and metadata signaling in the ISO Base Media File Format (ISOBMFF) [25] (i.e., interfaces F and F' in Fig. 1). It also defines media encapsulation and signaling in DASH and MMT.

OMAF also specifies audio, video, image and timed text media profiles, i.e., the interfaces E'a, E'v and E'i. All other interfaces depicted in the figure are not normatively specified. Additionally, OMAF defines different presentation profiles for viewport-independent and viewport-dependent streaming. For further details on these two concepts, the reader may refer to [22].
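
Since the content exchanged at interfaces F and F' is an ISOBMFF file, its coarse structure can be examined with a generic box parser. The following Python sketch, given for illustration only, walks the top-level boxes of such a file; the file name is a placeholder, and OMAF-specific metadata (projection, packing, overlays) lives in nested boxes that the sketch does not descend into.

```python
import struct

def iter_top_level_boxes(path):
    """Yield (box_type, size) for the top-level boxes of an ISOBMFF file.

    An ISOBMFF box header is a 32-bit big-endian size followed by a
    4-character type. A size of 1 means a 64-bit 'largesize' follows;
    a size of 0 means the box extends to the end of the file.
    """
    with open(path, "rb") as f:
        while True:
            header = f.read(8)
            if len(header) < 8:
                break  # end of file
            size, box_type = struct.unpack(">I4s", header)
            header_len = 8
            if size == 1:  # 64-bit largesize follows the type field
                size = struct.unpack(">Q", f.read(8))[0]
                header_len = 16
            yield box_type.decode("ascii", errors="replace"), size
            if size == 0:
                break  # box runs to the end of the file
            f.seek(size - header_len, 1)  # skip the box payload

# Example: list the top-level boxes of an OMAF file (placeholder name).
for box_type, size in iter_top_level_boxes("omaf_content.mp4"):
    print(f"{box_type}: {size} bytes")
```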
                                                               decoder(s), and subsequently rendered on a display
Following Fig. 1, media content is initially captured. Audio is encoded using 3D Audio with the MPEG-H [26] audio low complexity profile at level 1/2/3 or the MPEG-4 High Efficiency AACv2 codec at Level 4 [27]. Visual content is first stitched, possibly rotated, projected and packed. Subsequently, it is encoded using the MPEG High Efficiency Video Codec (HEVC) Main 10 profile at Level 5.1 [28] or the MPEG Advanced Video Codec (AVC) Progressive/High profile at Level 5.1 [29]. Images are encoded using the HEVC image profile Main 10 at Level 5.1 or as Joint Photographic Experts Group (JPEG) images [30]. The encoded streams are then placed into an ISOBMFF file for storage or encapsulated into media segments for streaming. The segments are delivered to the receiver via the DASH or MMT protocols. At the receiver side (player), the media is decapsulated, decoded with the respective decoder(s), and subsequently rendered on a display (e.g., an HMD) or loudspeakers. The head/eye tracking orientation/viewport metadata determine the user's viewing orientation within the omnidirectional scene.
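
To illustrate how a streaming client might recognize these codecs, the following sketch maps RFC 6381 codecs strings, such as those carried in a DASH MPD, to the codec families listed above. The sample strings and the prefix-to-codec mapping are assumptions made for illustration; OMAF itself normatively constrains the exact profiles and levels.

```python
# Codec families named by the OMAF media profiles, keyed by the prefix of
# their RFC 6381 'codecs' parameter (e.g., from a DASH MPD @codecs attribute).
# This prefix mapping is an illustrative assumption, not a normative table.
OMAF_CODEC_FAMILIES = {
    "hvc1": "HEVC (Main 10 profile, Level 5.1)",
    "hev1": "HEVC (Main 10 profile, Level 5.1)",
    "avc1": "AVC (Progressive/High profile, Level 5.1)",
    "mhm1": "MPEG-H 3D Audio (low complexity profile)",
    "mp4a": "MPEG-4 High Efficiency AACv2 (Level 4)",
}

def describe_codec(codecs_param: str) -> str:
    """Map an RFC 6381 codecs string to the codec family it claims."""
    prefix = codecs_param.split(".")[0]
    return OMAF_CODEC_FAMILIES.get(prefix, "not an OMAF media-profile codec")

# Illustrative codecs strings (assumed values, not taken from the paper).
for s in ("hvc1.2.4.L153.B0",   # HEVC Main 10; L153 encodes Level 5.1
          "avc1.640033",        # AVC High profile, Level 5.1 (0x33 = 51)
          "mp4a.40.29"):        # AAC with SBR+PS, i.e., HE-AACv2
    print(f"{s} -> {describe_codec(s)}")
```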



