Page 36 - Kaleidoscope Academic Conference Proceedings 2021

2.2   Automatic concept acquisition

A concept is a mental representation of a category stored in memory. It is a set of information that a category points to, and it consists of what is known about that category. For example, the concept of “cats” that humans hold is not limited to the shape or form of cats. It is rather an integrated abstraction of various aspects of cats, such as the sounds they emit (meowing, etc.), their behavior, the feel of their fur, and the language used to express such aspects. Therefore, a concept can be acquired by observing a set of attributes in the same category through different types of media information or modalities, and it can be understood as an abstract form independent of those different modalities. The abstract form is thus expressed as coordinates in a common conceptual space.

Toward artificial intelligence (AI) that understands concepts as humans do, cross-media processing technologies based on deep learning are used to develop a system that autonomously acquires concepts without having to be trained with correct answers. By focusing on the fact that different types of media information originating from the same thing appear not in a random manner but with specific relationships, the system autonomously learns concepts through the co-occurrence of different types of media information. Figure 2 shows the proposed system being trained with a data set consisting of English, Hindi, and Japanese speech captions for a common image set. The system learns to place the extracted features of related images and sounds closer together than those of unrelated ones in the common embedding space. As a result, the system automatically acquires translation knowledge between the three languages using images as pivots [2].

Figure 2 – Cross-lingual translation knowledge acquisition

2.3   AI in auscultation

The cross-media processing technologies are also used to make a stethoscope more intelligent. A “tele-stethoscope” is a wearable acoustic sensor array system for listening to sounds from various parts of a human body using multiple acoustic sensors and remotely transmitting the collected acoustic signals over a wireless network. The system developed at NTT consists of an examination vest worn by the patient and equipped with multichannel acoustic sensors, a transmitter, and a receiver, as shown in Figure 3. The acoustic sensors on the examination vest collect sounds from the patient’s body, and the transmitter sends those sounds to a remotely located receiver [3]. A doctor can operate the receiver to listen to or record sounds coming from various parts of the patient’s body. The current prototype has eighteen acoustic channels and one electrocardiogram (ECG) channel. In addition to the frequency range commonly used in conventional auscultation, this system is designed to capture other frequency bands to acquire richer multidimensional information about the patient.

Figure 3 – Tele-stethoscope

The cross-media processing technologies will bring about “AI-auscultation” that analyzes and diagnoses a patient’s internal physical condition by listening to body sounds captured by the tele-stethoscope. For example, a method based on a conditional sequence-to-sequence caption generator can automatically convert the body sounds into sentences that describe the sounds’ origins, such as “The heart sounds are abnormal. There may be a problem with one of the heart valves.” The amount of information contained in the output can be controlled by a “specificity” parameter. Visualization of physical states generated from the body sounds is also being developed as another example of AI-auscultation.

3.  LANGUAGE ACQUISITION IN INFANTS AND CHILDREN

Do human infants autonomously learn from the co-occurrence of phenomena in the natural world? For infants, communication is an important means of recognizing objects and promoting the acquisition of knowledge, concepts, and vocabulary. Infants accumulate various types of knowledge from information obtained from the surrounding environment, such as by listening to a parent’s conversation or speech from a television. For example, they learn groups of syllables that co-occur with high frequency as words based on statistical learning. However, this does not mean that the infant indiscriminately processes a huge amount of information. It has been found that learning in infants is promoted by explicit communication signals or ostensive cues from a parent, such as utterances directed toward the infant, rather than by cues that only attract the infant’s attention (attentional cues), such as shivering and a beep [4]. The infant uses such communication signals as a learning cue to focus appropriately on learning targets and sort out what to learn



