Page 139 - Kaleidoscope Academic Conference Proceedings 2024

Innovation and Digital Transformation for a Sustainable World





[Figure 1: block diagram. Voice branch: Smart Health Device, Voice Input, Preprocessing & Chunks Formation, Acoustic Feature Extraction (MFCC, ZCR, RMS), Feature Optimisation, Emotion / Health State Recognition (CNN-LSTM Deep Neural Network), Predicted Health State, Severity Estimation, Audio / Video Triggering Module. Video branch: Video Input, Pre-Processing, Person Localization, Posture Recognition (Attention-based Enhanced Transfer Learning Model), Predicted Posture, Fall Detection, Update Report in Database, Alert Linked Healthcare Provider.]

Figure 1 - Proposed system architecture of voice and video-based health anomaly detection
Standing, Sitting, or Fallen) and further analysis. Predicted postures are evaluated for signs of potential health issues, enabling timely anomaly detection and intervention. The developed algorithm is tailored to run on an edge device connected to an IP network. A Raspberry Pi-based single-board computer is used to implement and test the proposed system architecture and its associated machine learning algorithms.

Algorithm 1 – Elderly Health Anomaly Detection algorithm

Input: Real-time voice input, Video feed
Output: Detection and Reporting of Elderly Health Anomalies
   1.  Initialize microphone and camera sensors for real-time voice and video input.
   2.  Load pre-trained models for voice emotion recognition and posture classification.
   3.  Preprocess voice input, segmenting it into short frames.
   4.  Extract acoustic features from audio.
   5.  Utilize the CNN-LSTM Deep Neural Network to detect emotional states.
   6.  Check for severity in detected emotional states.
   7.  Activate the video module upon detection of severe anomalous emotional states.
   8.  Segment video frames using YOLO segmentation to isolate the person from the background.
   9.  Employ the MobileNet-based enhanced transfer learning model to classify postures (Standing, Sitting, or Fallen).
  10.  Check for high-severity conditions (e.g., a fall) and trigger alerts.
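
The control flow of Algorithm 1 can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function names, the frame length and hop, the severity threshold of 0.7, and the rule that an alert is raised only when the posture is "Fallen" are all assumptions made for illustration. The emotion and posture models are passed in as plain callables standing in for the CNN-LSTM network and the MobileNet-based classifier. Only RMS and ZCR are computed here; MFCC extraction and YOLO segmentation are omitted.

```python
import math
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence, Tuple


def split_frames(samples: Sequence[float], frame_len: int, hop: int) -> List[Sequence[float]]:
    """Step 3: cut the signal into short, possibly overlapping frames."""
    return [samples[i:i + frame_len]
            for i in range(0, len(samples) - frame_len + 1, hop)]


def rms(frame: Sequence[float]) -> float:
    """Root-mean-square energy of one frame."""
    return math.sqrt(sum(x * x for x in frame) / len(frame))


def zcr(frame: Sequence[float]) -> float:
    """Zero-crossing rate: fraction of adjacent sample pairs that change sign."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / max(len(frame) - 1, 1)


def extract_features(samples: Sequence[float], frame_len: int = 400,
                     hop: int = 160) -> List[Tuple[float, float]]:
    """Step 4: one (RMS, ZCR) pair per frame. Frame sizes are assumed values."""
    return [(rms(f), zcr(f)) for f in split_frames(samples, frame_len, hop)]


@dataclass
class Report:
    emotion: str
    severity: float
    posture: Optional[str]  # None when the video module was not activated
    alert: bool


def detect_anomaly(samples: Sequence[float],
                   video_frame: object,
                   emotion_model: Callable[[List[Tuple[float, float]]], Tuple[str, float]],
                   posture_model: Callable[[object], str],
                   severity_threshold: float = 0.7) -> Report:
    """Steps 3-10: voice analysis gates the video analysis."""
    features = extract_features(samples)
    emotion, severity = emotion_model(features)      # step 5 (stand-in model)
    if severity < severity_threshold:                # step 6 (assumed threshold)
        return Report(emotion, severity, None, False)
    posture = posture_model(video_frame)             # steps 7-9 (stand-in model)
    alert = posture == "Fallen"                      # step 10 (assumed alert rule)
    return Report(emotion, severity, posture, alert)
```

Gating the video stage on the voice stage means the posture model runs only when the voice analysis reports a severe state, which suits a resource-constrained edge device such as the Raspberry Pi mentioned above.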
