Page 38 - Kaleidoscope Academic Conference Proceedings 2021

Figure 6 – Narikiri AI Kyomachi Seika

5.  CREATING NEW FORMS OF COMMUNICATION

The famous slogan “Reach out and touch someone” was used by AT&T, the American telecommunications company, in its TV commercials of the 1970s. The “Mega-futuristic experiential public telephones” shown in Figure 7 were proposed in 2018 for sharing the sense of touch, as if bringing the slogan into reality. They are touch-based communication systems in which pressing the push buttons of a telephone causes a variety of tactile sensations to stimulate the other party’s body [9]. The “Tactile TV” was proposed as a device for receiving audiovisual content with vibrotactile signals [10]. More recently, new systems that share tactile sensations across distance, such as Remote High Five and Public Booth for Vibrotactile Communication, have been proposed. The standardization of haptics in multimedia systems has been discussed with the International Electrotechnical Commission (IEC) so that content providers can transmit vibrotactile-assisted audiovisual content via a network. The vibrotactile channels are defined as additional channels of the audio stream in the High-Definition Multimedia Interface (HDMI), in which the Tactile TV is referenced as one of a number of potential applications [11]. The vibrotactile signals may be compressed by lossless audio coding such as MPEG-4 ALS, which does not employ a psychoacoustic model.

Figure 7 – Mega-futuristic experiential public telephones (Nos. 3 and 4)

Speech conversion technologies modify speech signals into one’s desired form of expression for transmission and reception without altering the content of the spoken words. These technologies are expected to create new forms of communication that extend human vocal and auditory functions [12]. Humans are able to imagine a person’s voice from the person’s appearance and imagine the person’s appearance from his/her voice. A cross-modal voice conversion model, consisting of a speech converter, a face encoder/decoder, and a voice encoder, converts input speech (a voice) into a different voice that matches an input face image, and also generates a face image that matches the voice of the input speech, by leveraging the correlation between faces and voices.

6.  AUGMENTING THE REAL WORLD BY ILLUSIONS

Studying illusions is important because they provide clues to understanding the brain functions of humans. It is also important because illusion-based technologies can augment the real world and deliver heart-touching experiences to users. For example, a new light projection technique named Hengento (deformation lamps) was proposed to add a variety of realistic movement impressions to a static color object [13].

Figure 8 – Hengento

In the Hengento technique, a grayscale moving pattern is projected onto a static color object such as a printed picture or photograph, thus making the object appear to be moving. The moving pattern is a series of dynamic luminance signals whose shape, intensity, and frequency correspond to the shape, color intensity, and movement of the target object. Figure 9 shows the mechanism of Hengento. Whenever a person watches a movie or TV, the brain analyzes an input image separately for color, form, and motion components, and then binds them together into a coherent visual representation.

Figure 9 – Mechanism of Hengento

When the person views the pattern produced by Hengento, the brain receives color and form information from the static



