Page 60 - ITU Journal, ICT Discoveries, Volume 3, No. 1, June 2020 Special issue: The future of video and immersive media




not shown where they should be. They might appear either higher (as if the objects are floating) or lower (as if sunk into the floor) than their ideal positions. Therefore, the positions of the objects (the object images produced by the real-time object extraction) are transformed by the viewpoint transform for Kirari! for Arena [34].

Fig. 8 – Object position perceived by a viewpoint different from a camera

The parameters of the transform are acquired in advance from the physical positions of the cameras and their calibration with the performing stage.

The position of the object reconstructed with the 3D position information was more stable than in previous studies [35][36] using background subtraction and cropped input images.

5.   EVALUATION AND DISCUSSION

Kirari! for Arena was demonstrated at an exhibition in December 2018. There were more than 200 exhibits, and the total guest count was more than 14,000 people.

Commercially available PCs (specifications varied, up to two 22-core 2.2-GHz CPUs, 32 GB RAM, GPU) were used to construct the Kirari! for Arena system. The system could process 4K (3840 × 2160 pixels) images at a frame rate of 59.94 fps.

Evaluating an experience is a difficult task, but the responses of audiences who had the experience can be used to approximate an objective evaluation of the system. A total of 98 relevant responses were collected by the exhibitors through interviews at the exhibition, and 20 of them were useful for evaluating the system. First, the responses were sorted into positive and negative stances. Then, they were further divided into smaller categories within each stance. Table 1 shows the categorization results.

As these are free-comment responses, the fact that 5 were categorized as “Realistic” does not necessarily mean that the remaining 15 respondents did not find the system realistic. On the contrary, the fact that at least 5 people out of 20 actively expressed their impression that the display was “Realistic” means that the system gave realistic sensations to a fair number of people. Similarly, the positive responses (16) were four times as many as the negative responses (4). This indicates the efficacy of the system. There are 2 responses categorized as “Not stereoscopic enough.” It is natural to have a certain number of such responses because the display is 2D in principle. Providing a stronger sense of depth is left for future work.

Table 1 – Audience responses

Stance     Category                  Quantity  Subtotal  Total
Positive   Useful                    10        16        20
           Realistic                 5
           Low latency               1
Negative   Not stereoscopic enough   2         4
           Has latency               2

Table 2 shows some examples of the responses categorized as “Useful” and “Realistic.” They are shown because these categories have a wide variety of responses. The “Realistic” responses show that the performer looked as if he was really in the device. Such realistic sensations were stimulated because the image from the source site was effectively fused into the surroundings in the display device at the remote site by combining real-time image extraction and depth expression. These responses confirm that the atmosphere was reconstructed by the system.

Table 2 – Examples of responses

Category   Responses
Useful     It would be great if we have this in our home.
Useful     It would be good for streaming of live concert.
Useful     I want to watch Sumo on this.
Realistic  Looks as if he’s really there.
Realistic  It’s too real it’s scary.

5.1  Conformance to ITU-T H.430 series (ILE) of standards

Kirari! for Arena conforms to the ITU-T standards by fulfilling all of the mandatory requirements. Table 3 summarizes the requirements specified in ITU-T H.430.1 and the conformance status of Kirari! for Arena. The numbers in the column “Kirari! for Arena” indicate the subclauses in this article describing the functions and evaluation results that fulfill the requirements. Although wave field synthesis technology with a speaker array was not




© International Telecommunication Union, 2020