Page 141 - Kaleidoscope Academic Conference Proceedings 2024

Innovation and Digital Transformation for a Sustainable World




performance on a validation dataset ceases to improve, thereby preventing unnecessary computation and ensuring efficient convergence. This approach enables the system to strike a balance between maximizing training accuracy and generalizing well to unseen data. By monitoring performance metrics across epochs and halting training when further improvements are deemed unlikely, early stopping enhances the system's robustness and efficiency in emotion recognition tasks.

During development, the system captures audio input from a connected microphone and segments it into short frames for analysis. By leveraging a lightweight model architecture and feature extraction techniques, the system processes these audio frames efficiently, extracting pertinent features and making emotion predictions with minimal latency. Integration with the Raspberry Pi's hardware ensures smooth operation and meets the stringent latency requirements. The hardware experimental setup for the code development is shown in Figure 2.

Figure 2 – Experimental setup for system development

4.2   Video-based fall detection

The video module of the Elderly Wellness Companion system plays a critical role in ensuring timely intervention and support for elderly individuals, particularly in scenarios where anomalous emotions are detected through the voice module. Upon detection of such anomalies, the video module is triggered, activating the networked camera to capture video footage for further analysis. The camera is enabled only when necessary, which preserves privacy and minimizes intrusion into the user's space.

Once the video footage is captured, it is processed to extract relevant information about the person's posture and activity. The first step involves segmenting the video frames to identify the person's location using YOLO-based [14] segmentation. This segmentation technique efficiently isolates the individual from the background, facilitating accurate analysis of their movements and posture.

Upon segmenting the individual in the video frames, the system integrates a MobileNet-based enhanced transfer learning model to analyze the person's posture. Renowned for its lightweight and efficient design, MobileNet ensures swift processing of video frames in real time while maintaining high performance standards. The model is trained to categorize postures into three distinct states (standing, sitting, or fallen down), as shown in Figure 3, and assumes a critical role in monitoring the individual's condition and swiftly detecting potential emergencies, particularly falls.

The accurate classification of posture serves as a fundamental indicator of the individual's well-being and safety. By discerning between various postures in real time, the system enables proactive intervention in critical situations such as falls. This proactive approach is particularly valuable for elderly individuals, who may be at increased risk of accidents. Moreover, the adaptability and generalization capabilities of the MobileNet-based enhanced transfer learning model help it deal with diverse environments and individuals. By leveraging pre-existing knowledge from extensive datasets, the model effectively recognizes subtle variations in posture and movement patterns, even under challenging circumstances. The incorporation of MobileNet-based posture detection enhances the system's responsiveness and effectiveness in a multi-modal environment.

To refine the model's efficacy in identifying falls, attention and hourglass layers are integrated. Attention mechanisms play a pivotal role by dynamically assigning weights to different features within the input sequence. This dynamic allocation enables the model to prioritize cues indicative of fall events, thereby bolstering the accuracy of fall detection. By amplifying the importance of features associated with falling, such as sudden alterations in acceleration patterns or specific movement traits, the attention model distinguishes falls from other activities or gestures, mitigating false alarms and facilitating prompt interventions. Furthermore, the hourglass model, originally devised for human pose estimation tasks, contributes significantly to accurately pinpointing the person's body joints or key points within the video frames. Leveraging an encoder-decoder architecture, the hourglass model captures multi-scale features while preserving spatial information, enabling precise estimation of the person's posture. Through systematic downsampling and subsequent upsampling of the input data, spatial context is maintained throughout the encoding process. This fidelity to spatial detail is further reinforced by skip connections between corresponding encoder and decoder stages, ensuring the propagation of fine-grained information crucial for posture localization.

The integration of attention and hourglass models gives our system high precision in identifying falls and monitoring the individual's posture. By scrutinizing the temporal and spatial attributes of video frames, the model swiftly detects anomalous events and triggers appropriate responses, such as alerting caregivers or initiating emergency protocols. This proactive stance not only enhances the safety and well-being of elderly individuals but also instills confidence and assurance among users and their families. When visual data (Figure 4) is





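As an illustration of the early-stopping rule described above (halt training once the validation metric stops improving), the following is a minimal sketch and not the authors' implementation. The class name `EarlyStopping`, the helper `epochs_run`, and the parameters `patience` and `min_delta` are assumptions introduced for this example; the paper does not state which values or framework were used.

```python
class EarlyStopping:
    """Signal a stop once the validation metric has not improved for
    `patience` consecutive epochs (hypothetical helper, higher is better)."""

    def __init__(self, patience=3, min_delta=0.0):
        self.patience = patience      # epochs to wait after the last improvement
        self.min_delta = min_delta    # smallest gain that counts as an improvement
        self.best = float("-inf")     # best validation metric seen so far
        self.stale_epochs = 0         # epochs since the last improvement

    def should_stop(self, val_metric):
        if val_metric > self.best + self.min_delta:
            self.best = val_metric
            self.stale_epochs = 0
        else:
            self.stale_epochs += 1
        return self.stale_epochs >= self.patience


def epochs_run(val_history, patience=3, min_delta=0.0):
    """Return how many epochs would run before early stopping fires."""
    stopper = EarlyStopping(patience, min_delta)
    for epoch, val_metric in enumerate(val_history, start=1):
        if stopper.should_stop(val_metric):
            return epoch
    return len(val_history)
```

With an assumed validation-accuracy history that plateaus after the third epoch, `epochs_run` stops three epochs later rather than consuming the full training budget; a larger `patience` trades extra computation for tolerance to noisy validation scores.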
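The attention weighting described above, in which features are assigned dynamic weights so that fall-related cues dominate, can be illustrated with a small softmax-weighted pooling sketch. This is a simplified stand-in under stated assumptions: in the paper's model the relevance scores would be learned inside the network, whereas here they are supplied by hand, and the names `softmax` and `attention_pool` are hypothetical.

```python
import math


def softmax(scores):
    """Turn raw relevance scores into weights that are positive and sum to 1."""
    peak = max(scores)                          # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(features, scores):
    """Combine per-frame feature vectors into one vector, weighting each
    frame by its softmax-normalized score (assumed to be given)."""
    weights = softmax(scores)
    dim = len(features[0])
    pooled = [
        sum(w * frame[d] for w, frame in zip(weights, features))
        for d in range(dim)
    ]
    return weights, pooled
```

A frame with a much higher score than its neighbours receives most of the weight, so its features dominate the pooled vector; equal scores reduce the operation to a plain average.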