




                         MULTI-VIEWPOINT AND OVERLAYS IN THE MPEG OMAF STANDARD
                                    Igor D.D. Curcio, Kashyap Kammachi Sreedhar, Sujeet S. Mate
                                             Nokia Technologies, Tampere, Finland


Abstract – Recent developments in immersive media have made possible the rise of new multimedia applications and services that complement traditional ones, such as media streaming and conferencing. Omnidirectional video (a.k.a. 360-degree video) enables one such new service, which is progressively being made available also by large media distribution portals (e.g., YouTube). With the aim of creating a standardized solution for 360-degree video streaming, the Moving Picture Experts Group (MPEG) has developed the second edition, or version 2, of the Omnidirectional MediA Format (OMAF), which is close to completion. The major new features of OMAFv2, compared to the first version, include (but are not limited to) the capability of using overlays and multiple omnidirectional cameras situated at different physical points (i.e., viewpoints). This paper focuses on the description of two of the new OMAFv2 features, overlays and multi-viewpoints, including the 360-degree video use cases enabled by these two features.

          Keywords – Immersive media, MPEG OMAF, multimedia streaming, multi-viewpoints, omnidirectional
          video, overlays.

1.   INTRODUCTION

Immersive media is one of the current buzzwords in media technologies. It refers to the capability of making the user feel immersed in the audio, video and other media, while at the same time increasing the level of interactivity. The Reality-Virtuality continuum [1] allows a wide spectrum of immersion and interactivity levels, leaning more towards either the real environment or the virtual environment, depending on the actual application or service considered.
Watching 360-degree videos is one way to consume immersive media. 360-degree video content is typically played back in a virtual environment using a Head Mounted Display (HMD). When the user can explore the content only by changing the HMD orientation, i.e., by varying the yaw, pitch and roll of the head (rotational movements), this is defined as 3 Degrees of Freedom (3DoF) media [2]. YouTube already offers omnidirectional video on its portal, and this type of medium is becoming more and more popular. If the consumer is also allowed to move in the 360-degree space and navigate, walk, and see behind the objects in the scene (i.e., translational movements), this is typically defined as 6 Degrees of Freedom (6DoF) media [2].
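As an illustration of this distinction, the minimal sketch below models a 3DoF pose as head orientation only and a 6DoF pose as orientation plus position. The class and field names are hypothetical and chosen only for this example; they are not part of the OMAF or 3GPP specifications.

```python
from dataclasses import dataclass

@dataclass
class Pose3DoF:
    """Rotational movements only: the user looks around from a fixed point."""
    yaw: float    # rotation around the vertical axis, degrees
    pitch: float  # rotation around the lateral axis, degrees
    roll: float   # rotation around the viewing axis, degrees

@dataclass
class Pose6DoF(Pose3DoF):
    """Adds translational movements: the user can also move within the scene."""
    x: float = 0.0  # position offsets in the scene, metres (illustrative)
    y: float = 0.0
    z: float = 0.0

# A 3DoF player only needs the HMD orientation to select the viewport;
# a 6DoF player additionally tracks the viewer's position.
pose = Pose6DoF(yaw=45.0, pitch=-10.0, roll=0.0, x=0.5, y=0.0, z=1.2)
```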
The Moving Picture Experts Group (MPEG) has defined the first standard for an Omnidirectional MediA Format (OMAF) [3] to enable the easy deployment of interoperable, standardized streaming services for 360-degree video. OMAF is also the basis of the technology adopted by the Third Generation Partnership Project (3GPP) in their specification for omnidirectional video streaming since Release 15 [4]. OMAF defines the basic storage format as well as the transport over Dynamic Adaptive Streaming over HTTP (DASH) [23] and MPEG Media Transport (MMT) [24] for audio, video, image, and timed text. Lately, MPEG has been working on the second version of the OMAF standard [5] with the aim of extending the functionalities already enabled by the first version and making its adoption more appealing for service providers and the media industry in general.

The major features specified in OMAFv2 are overlays, multi-viewpoints, sub-pictures and new tiling profiles for viewport-dependent streaming. This paper focuses on the first two. Overlays are a way to enhance the information content of 360-degree video. They allow another piece of content (e.g., a picture, another video carrying news, advertisements, text or other material) to be superimposed on top of the main (background) omnidirectional video. Overlays also allow the creation of interactivity points or areas. The content captured by an omnidirectional capture device, or an omnidirectional media item corresponding to one omnidirectional camera, is called a viewpoint in OMAFv2 terminology. A multi-viewpoint setup is a set of capture devices which, for example, may be scattered around a stadium. The OMAFv2 specification enables a streaming format with multiple viewpoints to allow, for example, switching from one viewpoint to another, as done by multi-camera directors in traditional video productions.
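To make these two concepts concrete, the following minimal sketch shows how a hypothetical player-side data model could represent a background viewpoint with overlays and switch between viewpoints. All class, field and method names, as well as the URIs, are illustrative assumptions; they do not reproduce the OMAFv2 signalling or any particular player API.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Overlay:
    """A piece of content (image, video, text) rendered on top of the background."""
    content_uri: str
    azimuth_deg: float       # where the overlay is anchored on the sphere
    elevation_deg: float
    interactive: bool = False  # e.g., selecting it could trigger an action

@dataclass
class Viewpoint:
    """Media from one omnidirectional camera at a given physical position."""
    viewpoint_id: str
    media_uri: str
    overlays: List[Overlay] = field(default_factory=list)

class MultiViewpointSession:
    """Keeps track of the available viewpoints and the one currently played."""
    def __init__(self, viewpoints: List[Viewpoint]):
        self.viewpoints = {vp.viewpoint_id: vp for vp in viewpoints}
        self.current: Optional[Viewpoint] = next(iter(viewpoints), None)

    def switch_to(self, viewpoint_id: str) -> Viewpoint:
        """Switch playback to another viewpoint, e.g. another stadium camera."""
        self.current = self.viewpoints[viewpoint_id]
        return self.current

# Example: two cameras around a stadium, one of them carrying a score overlay.
score = Overlay("https://example.com/score.png", azimuth_deg=0.0,
                elevation_deg=-30.0, interactive=True)
session = MultiViewpointSession([
    Viewpoint("north-stand", "https://example.com/north.mpd", overlays=[score]),
    Viewpoint("south-stand", "https://example.com/south.mpd"),
])
session.switch_to("south-stand")
```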




