Page 138 - Kaleidoscope Academic Conference Proceedings 2024


2024 ITU Kaleidoscope Academic Conference

In line with the ITU-T FG-AI4H [6] and Recommendation Y.4220 [7], the proposed work focuses on developing AI-driven solutions to enhance fall risk assessment and prevention among the elderly. By leveraging machine learning techniques and data fusion, models are developed that integrate multi-modal data sources to enable standardized multifactorial assessment of health risk factors. Through a comprehensive evaluation of the implemented modules and testing scenarios, this research demonstrates the effectiveness and feasibility of our approach in providing comprehensive care for the elderly population. Our proposed work not only considers multi-dimensional input but also works on low-power edge devices (e.g., a Raspberry Pi-based system) while producing a better precision score. By advancing AI-driven healthcare systems, our work contributes to improving the quality of life and promoting independent living among elderly individuals, addressing critical needs in eldercare, and fostering societal well-being.

The rest of the paper is organized as follows. Section 2 outlines the related works and standards. Section 3 presents the architectural details and algorithm of the proposed system, and Section 4 describes the implementation and experimentation. Results and discussions are presented in Section 5, and Section 6 provides concluding remarks with major findings of the proposed work.

2. RELATED WORKS

The existing works [8-9] on elderly personalized healthcare cover abnormal behaviors such as falls, tumbles, aggression, and wandering, based on wearable sensors or the video feed captured on consumer networked cameras and processed on a local GPU server. In machine learning based techniques, attention mechanisms and advanced models enhance the accuracy of anomaly detection, particularly in identifying critical events like falls, thereby mitigating the risk of false alarms or overlooked health concerns. However, standardized approaches in system development require interoperability with home networks, IP networks, and health service provider networks while meeting the critical requirements of healthcare.

The ITU Focus Group on Artificial Intelligence for Health (FG-AI4H) Topic Group on Falls Amongst the Elderly (TG-Falls) [6] addresses the critical issue of preventing falls among the elderly, a common health problem with significant repercussions. Falls among community-dwelling adults aged 65 years or older account for a substantial portion of hospitalizations and lead to loss of independence. AI techniques offer a promising solution by generating models that combine various data sources, enabling standardized multifactorial assessment of fall risk factors and facilitating their implementation in clinical practice. Recommendation ITU-T Y.4220 [7] delineates the criteria and functionalities essential for an Abnormal Event Detection System (AEDS) implemented within smart homes, with a primary focus on identifying health-related occurrences such as falls or strokes. Central to its directives is the establishment of standards aimed at enhancing system efficacy, minimizing false alarms, and addressing privacy apprehensions. The AEDS platform assumes a pivotal role in integrating Internet of Things (IoT) devices with emergency contacts and medical practitioners.

3. PROPOSED SYSTEM

The proposed system comprises two interconnected modules, the Real-Time Voice Emotion Recognition System and the Video-Based Anomaly Detection System. The voice emotion recognition module utilizes a deep learning technique, namely Convolutional Neural Network-Long Short-Term Memory (CNN-LSTM), and a diverse range of datasets (for training and testing the system) to accurately detect and interpret health-related emotional states in real time. Concurrently, the video-based anomaly detection module employs a lightweight deep neural network called MobileNet for embedded systems to identify abnormal behaviors and trigger timely interventions. As illustrated in Figure 1, these modules together provide a robust architecture for continuous monitoring, personalized support, and prompt response in a home environment for elderly individuals living independently.

At its core is the smart health device, serving as the central hub for data processing and decision-making. The voice signal captured through microphones from the user’s premises initiates the process, providing real-time emotional cues for analysis. This input undergoes preprocessing and chunk formation to facilitate efficient processing and analysis. Subsequently, the Acoustic Feature Extraction Module extracts relevant features from the voice input, capturing nuances essential for health state recognition. Following feature extraction, dimensionality reduction techniques are applied to streamline the data, enhancing computational efficiency without sacrificing accuracy. The processed voice data then enters the voice-based state recognition model CNN-LSTM, which is a deep learning-based algorithm trained to interpret health states. Predicted health states are generated, indicating the severity of detected emotions. If the severity is low, the system continues to monitor voice input, maintaining vigilance for any significant changes. However, if a high-severity emotion is detected, the system triggers the video capturing module, transitioning to visual analysis for obtaining additional context.

Upon activation, the video input undergoes preprocessing to enhance image quality and reduce noise, preparing it for analysis. The Person Localization Model identifies and tracks individuals within the video feed, facilitating targeted analysis. Concurrently, the Voice-based Emotion Recognition Model operates in tandem, correlating voice cues with visual observations to refine emotion predictions. The overall algorithm of the proposed system is described in Algorithm 1. The YOLO-based segmentation and localization on the video feed is used to continuously monitor health-related activities over extended periods, providing valuable insights into patient behavior, movement patterns, and activity levels. This information aids in assessing patient well-being and identifying any deviations from normal behavior. The MobileNet-based enhanced transfer learning model is used on video data to classify postures (e.g.,

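The voice-first escalation described in Section 3 (chunk formation, a per-chunk severity prediction, continued listening on low severity, and a switch to video capture on high severity) can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration and does not come from the paper: the function names, the fixed non-overlapping chunking, the two-level severity labels, and in particular energy_stub, which is only a stand-in for the CNN-LSTM classifier so that the control flow can run without a trained model.

```python
from dataclasses import dataclass
from typing import Callable, Iterator, Sequence

# Hypothetical labels: the text only distinguishes low from high severity.
LOW, HIGH = "low", "high"


def form_chunks(samples: Sequence[float], chunk_size: int) -> Iterator[Sequence[float]]:
    """Split a preprocessed voice signal into fixed-size, non-overlapping chunks."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    for start in range(0, len(samples), chunk_size):
        yield samples[start:start + chunk_size]


@dataclass
class MonitorResult:
    chunks_processed: int   # chunks examined before stopping
    video_triggered: bool   # True if a high-severity chunk was found


def monitor(
    samples: Sequence[float],
    chunk_size: int,
    predict_severity: Callable[[Sequence[float]], str],
    trigger_video: Callable[[], None],
) -> MonitorResult:
    """Keep listening while severity is low; hand over to video on high severity."""
    processed = 0
    for chunk in form_chunks(samples, chunk_size):
        processed += 1
        if predict_severity(chunk) == HIGH:
            trigger_video()  # escalate to visual analysis for more context
            return MonitorResult(processed, True)
    return MonitorResult(processed, False)


def energy_stub(chunk: Sequence[float], threshold: float = 0.5) -> str:
    """Placeholder for the CNN-LSTM model: thresholds mean absolute amplitude."""
    mean_abs = sum(abs(x) for x in chunk) / len(chunk)
    return HIGH if mean_abs > threshold else LOW


if __name__ == "__main__":
    events = []
    signal = [0.1] * 8 + [0.9] * 4 + [0.1] * 4
    print(monitor(signal, 4, energy_stub, lambda: events.append("video_on")))
    print(events)
```

Passing the predictor and the video trigger in as callables keeps the gating logic independent of any particular model or camera interface, so a real classifier could replace energy_stub without changing monitor.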