In line with the ITU-T FG-AI4H [6] and Recommendation Y.4220 [7], the proposed work focuses on developing AI-driven solutions to enhance fall risk assessment and prevention among the elderly. By leveraging machine learning techniques and data fusion, models are developed that integrate multi-modal data sources to enable standardized multifactorial assessment of health risk factors. Through a comprehensive evaluation of the implemented modules and testing scenarios, this research demonstrates the effectiveness and feasibility of our approach in providing comprehensive care for the elderly population. Our proposed work not only considers multi-dimensional input but also runs on low-power edge devices (e.g., a Raspberry Pi-based system) while producing a better precision score. By advancing AI-driven healthcare systems, our work contributes to improving the quality of life and promoting independent living among elderly individuals, addressing critical needs in eldercare, and fostering societal well-being.

The rest of the paper is organized as follows. Section 2 outlines the related works and standards. Section 3 presents the architectural details and algorithm of the proposed system, and Section 4 describes the implementation and experimentation. Results and discussions are presented in Section 5, and Section 6 provides concluding remarks with the major findings of the proposed work.
2. RELATED WORKS

The existing works [8-9] on elderly personalized healthcare cover abnormal behaviors such as falls, tumbles, aggression, and wandering, based on wearable sensors or the video feed captured by consumer networked cameras and processed on a local GPU server. In machine learning-based techniques, attention mechanisms and advanced models enhance the accuracy of anomaly detection, particularly in identifying critical events like falls, thereby mitigating the risk of false alarms or overlooked health concerns. However, standardized approaches to system development require interoperability with home networks, IP networks, and the health service provider network while meeting the critical requirements of healthcare.

The ITU Focus Group on Artificial Intelligence for Health (FG-AI4H) Topic Group on Falls Amongst the Elderly (TG-Falls) [6] addresses the critical issue of preventing falls among the elderly, a common health problem with significant repercussions. Falls among community-dwelling adults aged 65 years or older account for a substantial portion of hospitalizations and lead to loss of independence. AI techniques offer a promising solution by generating models that combine various data sources, enabling standardized multifactorial assessment of fall risk factors and facilitating their implementation in clinical practice. Recommendation ITU-T Y.4220 [7] delineates the criteria and functionalities essential for an Abnormal Event Detection System (AEDS) implemented within smart homes, with a primary focus on identifying health-related occurrences such as falls or strokes. Central to its directives is the establishment of standards aimed at enhancing system efficacy, minimizing false alarms, and addressing privacy apprehensions. The AEDS platform assumes a pivotal role in integrating Internet of Things (IoT) devices with emergency contacts and medical practitioners.

3. PROPOSED SYSTEM

The proposed system comprises two interconnected modules: the Real-Time Voice Emotion Recognition System and the Video-Based Anomaly Detection System. The voice emotion recognition module utilizes a deep learning technique, namely a Convolutional Neural Network-Long Short-Term Memory (CNN-LSTM) network, together with a diverse range of datasets for training and testing the system, to accurately detect and interpret health-related emotional states in real time. Concurrently, the video-based anomaly detection module employs MobileNet, a lightweight deep neural network for embedded systems, to identify abnormal behaviors and trigger timely interventions. As illustrated in Figure 1, these modules together provide a robust architecture for continuous monitoring, personalized support, and prompt response in a home environment for elderly individuals living independently.
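As a rough illustration of the voice emotion recognition module, the following sketch builds a CNN-LSTM classifier of the kind named above, assuming MFCC-style input of shape (frames, coefficients) and a small set of emotion classes; the layer sizes, input shape, and class count are illustrative assumptions rather than the exact configuration used in this work.

# Minimal sketch of a CNN-LSTM voice emotion classifier (illustrative sizes,
# not the exact configuration of the proposed system).
from tensorflow.keras import layers, models

TIME_STEPS, N_FEATURES = 128, 40   # assumed MFCC frames x coefficients
NUM_CLASSES = 4                    # assumed number of emotion/health states

def build_cnn_lstm():
    inputs = layers.Input(shape=(TIME_STEPS, N_FEATURES))
    # 1-D convolutions learn local spectral patterns from the acoustic features
    x = layers.Conv1D(64, kernel_size=5, padding="same", activation="relu")(inputs)
    x = layers.MaxPooling1D(pool_size=2)(x)
    x = layers.Conv1D(128, kernel_size=3, padding="same", activation="relu")(x)
    x = layers.MaxPooling1D(pool_size=2)(x)
    # The LSTM captures how the emotional cues evolve over time
    x = layers.LSTM(64)(x)
    x = layers.Dropout(0.3)(x)
    outputs = layers.Dense(NUM_CLASSES, activation="softmax")(x)
    model = models.Model(inputs, outputs)
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model

model = build_cnn_lstm()
model.summary()

For deployment on a low-power edge device such as a Raspberry Pi, a model of this kind would typically be converted to a lighter inference format (e.g., TensorFlow Lite) rather than run in the full training framework.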
At its core is the smart health device, serving as the central hub for data processing and decision-making. The voice signal captured through microphones on the user's premises initiates the process, providing real-time emotional cues for analysis. This input undergoes preprocessing and chunk formation to facilitate efficient processing and analysis. Subsequently, the Acoustic Feature Extraction Module extracts relevant features from the voice input, capturing nuances essential for health state recognition. Following feature extraction, dimensionality reduction techniques are applied to streamline the data, enhancing computational efficiency without sacrificing accuracy. The processed voice data then enters the voice-based state recognition model, the CNN-LSTM, a deep learning-based algorithm trained to interpret health states. Predicted health states are generated, indicating the severity of the detected emotions. If the severity is low, the system continues to monitor the voice input, maintaining vigilance for any significant changes. However, if a high-severity emotion is detected, the system triggers the video capturing module, transitioning to visual analysis to obtain additional context.
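A minimal sketch of this voice-side flow is given below, assuming librosa for MFCC extraction and a trained classifier such as the one sketched above; the chunk length, feature settings, label names, and severity mapping are assumptions made for illustration, and the dimensionality reduction step is only indicated by a comment.

# Sketch of the voice-side flow: chunking, acoustic feature extraction,
# state prediction, and severity-based triggering of the video module.
# Chunk length, label names, and the severity mapping are illustrative assumptions.
import numpy as np
import librosa

SAMPLE_RATE = 16000
CHUNK_SECONDS = 3                              # assumed chunk length
HIGH_SEVERITY_STATES = {"distress", "pain"}    # hypothetical high-severity labels

def chunk_signal(signal, sr=SAMPLE_RATE, seconds=CHUNK_SECONDS):
    # Split the incoming voice stream into fixed-length chunks
    step = sr * seconds
    return [signal[i:i + step] for i in range(0, len(signal) - step + 1, step)]

def extract_features(chunk, sr=SAMPLE_RATE, n_mfcc=40, n_frames=128):
    # MFCCs as a (frames, coefficients) matrix; a dimensionality reduction
    # step (e.g., PCA over the coefficients) could follow but is omitted here
    mfcc = librosa.feature.mfcc(y=chunk, sr=sr, n_mfcc=n_mfcc).T
    if mfcc.shape[0] < n_frames:
        mfcc = np.pad(mfcc, ((0, n_frames - mfcc.shape[0]), (0, 0)))
    return mfcc[:n_frames]

def monitor(stream, model, class_names):
    # Returns the high-severity state that triggered escalation, or None
    for chunk in chunk_signal(stream):
        feats = extract_features(chunk)[np.newaxis, ...]   # add batch axis
        probs = model.predict(feats, verbose=0)[0]
        state = class_names[int(np.argmax(probs))]
        if state in HIGH_SEVERITY_STATES:
            return state        # hand over to the video capturing module
    return None                 # low severity: keep listening

In a live deployment the chunks would come from a streaming microphone capture rather than a pre-recorded buffer, and the escalation decision could additionally consider the predicted probability rather than only the arg-max label.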
Upon activation, the video input undergoes preprocessing to enhance image quality and reduce noise, preparing it for analysis. The Person Localization Model identifies and tracks individuals within the video feed, facilitating targeted analysis. Concurrently, the Voice-based Emotion Recognition Model operates in tandem, correlating voice cues with visual observations to refine emotion predictions. The overall algorithm of the proposed system is described in Algorithm 1. The YOLO-based segmentation and localization on the video feed is used to continuously monitor health-related activities over extended periods, providing valuable insights into patient behavior, movement patterns, and activity levels. This information aids in assessing patient well-being and identifying any deviations from normal behavior.
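As an illustration of the person localization step, the sketch below detects and crops the person region from a captured frame using an off-the-shelf YOLO model via the ultralytics package; the model checkpoint, confidence threshold, and camera source are assumptions, and the YOLO-based segmentation actually used in the proposed system may differ.

# Sketch of YOLO-based person localization on the video feed (assumed:
# the ultralytics package with a pretrained "yolov8n.pt" checkpoint).
import cv2
from ultralytics import YOLO

detector = YOLO("yolov8n.pt")          # COCO-pretrained; class 0 is "person"

def localize_person(frame, conf=0.5):
    # Return the crop of the highest-confidence person detection, or None
    result = detector(frame, classes=[0], conf=conf, verbose=False)[0]
    if len(result.boxes) == 0:
        return None
    best = max(result.boxes, key=lambda box: float(box.conf))
    x1, y1, x2, y2 = map(int, best.xyxy[0].tolist())
    return frame[y1:y2, x1:x2]

cap = cv2.VideoCapture(0)              # camera activated after a high-severity event
ok, frame = cap.read()
if ok:
    person_crop = localize_person(frame)
    # person_crop, if found, is then passed to the posture classification model
cap.release()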
The MobileNet-based enhanced transfer learning model is used on video data to classify postures (e.g.,




