Page 38 - Kaleidoscope Academic Conference Proceedings 2021


Figure 6 – Narikiri AI Kyomachi Seika

5. CREATING NEW FORMS OF COMMUNICATION

The famous slogan “Reach out and touch someone” was used
by AT&T, the American telecommunications company, in
its TV commercials of the 1970s. The “Mega-futuristic
experiential public telephones” shown in Figure 7 were
proposed in 2018 for sharing the sense of touch, as if
bringing the slogan into reality. They are touch-based
communication systems in which pressing the push buttons
of a telephone causes a variety of tactile sensations to
stimulate the other party’s body [9]. The “Tactile TV” was
proposed as a device for receiving audiovisual content with
vibrotactile signals [10]. More recently, new systems such as
Remote High Five and Public Booth for Vibrotactile
Communication, which share tactile sensations across
distance, have been proposed. The standardization of haptics in
multimedia systems has been discussed with the
International Electrotechnical Commission (IEC) so that
content providers can transmit vibrotactile-assisted
audiovisual content via a network. The vibrotactile channels
are defined as additional channels of the audio stream in the
High-Definition Multimedia Interface (HDMI), in which the
Tactile TV is referenced as one of a number of potential
applications [11]. The vibrotactile signals may be
compressed by lossless audio coding such as MPEG-4 ALS,
which does not employ a psychoacoustic model.

Figure 7 – Mega-futuristic experiential public telephones
(Nos. 3 and 4)

Speech conversion technologies modify speech signals to
one’s desired form of expression for transmitting and
receiving without altering the content of the spoken words.
These technologies are expected to create new forms of
communication that extend human vocal and auditory
functions [12]. Humans are able to imagine a person’s voice
from the person’s appearance and imagine the person’s
appearance from his/her voice. A cross-modal voice
conversion model, consisting of a speech converter, a face
encoder/decoder, and a voice encoder, converts input
speech (a voice) into a different voice that matches an input
face image, and also generates a face image that matches the
voice of the input speech, by leveraging the correlation
between faces and voices.

6. AUGMENTING THE REAL WORLD BY ILLUSIONS

Studying illusions is important because they provide
clues to understanding the brain functions of
humans. It is also important because illusion-based
technologies can augment the real world and deliver
heart-touching experiences to users. For example, a new light
projection technique named Hengento (deformation lamps)
was proposed to add a variety of realistic movement
impressions to a static color object [13].

Figure 8 – Hengento

In the Hengento technique, a grayscale moving pattern is
projected onto a static color object such as a printed picture or
photograph, thus making the object appear to be moving. The
moving pattern is a series of dynamic luminance signals
whose shape, intensity, and frequency correspond to the
shape, color intensity, and movement of the target object.
Figure 9 shows the mechanism of Hengento. Whenever a
person watches a movie or TV, the brain analyzes an input
image separately for color, form, and motion components,
and then binds them together into a coherent visual
representation.

Figure 9 – Mechanism of Hengento

When the person views the pattern produced by Hengento,
the brain receives color and form information from the static
