Page 141 - Kaleidoscope Academic Conference Proceedings 2024
Innovation and Digital Transformation for a Sustainable World
performance on a validation dataset ceases to improve, thereby preventing unnecessary computation and supporting efficient convergence. This approach enables the system to strike a balance between maximizing training accuracy and generalizing well to unseen data. By monitoring performance metrics across epochs and halting training when further improvement is unlikely, early stopping enhances the system's robustness and efficiency in emotion recognition tasks.

During development, the system captures audio input from a connected microphone and segments it into short frames for analysis. Using a lightweight model architecture and feature extraction techniques, the system processes these audio frames efficiently, extracting pertinent features and making emotion predictions with minimal latency. Integration with the Raspberry Pi hardware ensures smooth operation and meets the system's latency requirements. The hardware setup used for code development is shown in Figure 2.

Figure 2 – Experimental setup for system development

4.2 Video-based fall detection

The video module of the Elderly Wellness Companion system plays a critical role in ensuring timely intervention and support for elderly individuals, particularly when anomalous emotions are detected through the voice module. When such an anomaly is detected, the video module is triggered and activates the networked camera to capture video footage for further analysis. The camera is enabled only when necessary, which preserves privacy and minimizes intrusion into the user's space.

Once the video footage is captured, it is processed to extract relevant information about the person's posture and activity. The first step segments the video frames to identify the person's location using YOLO-based [14] segmentation. This step isolates the individual from the background, facilitating accurate analysis of their movements and posture.

After the individual has been segmented in the video frames, the system applies a MobileNet-based enhanced transfer learning model to analyze the person's posture. Known for its lightweight and efficient design, MobileNet ensures swift processing of video frames in real time while maintaining high performance standards. The model is trained to categorize postures into three distinct states, namely standing, sitting, or fallen down, as shown in Figure 3. It plays a critical role in monitoring the individual's condition and swiftly detecting potential emergencies, particularly falls.

Accurate classification of posture serves as a fundamental indicator of the individual's well-being and safety. By discerning between postures in real time, the system enables proactive intervention in critical situations such as falls. This proactive approach is particularly valuable for elderly individuals, who may be at increased risk of accidents. Moreover, the adaptability and generalization capabilities of the MobileNet-based enhanced transfer learning model help it deal with diverse environments and individuals. By leveraging pre-existing knowledge from extensive datasets, the model recognizes subtle variations in posture and movement patterns, even under challenging circumstances. Incorporating MobileNet-based posture detection enhances the system's responsiveness and effectiveness in a multi-modal environment.

To further improve the model's ability to identify falls, attention and hourglass layers are integrated. Attention mechanisms dynamically assign weights to different features within the input sequence. This dynamic allocation enables the model to prioritize cues indicative of fall events, thereby improving the accuracy of fall detection. By amplifying the importance of features associated with falling, such as sudden changes in acceleration patterns or specific movement traits, the attention model distinguishes falls from other activities or gestures, mitigating false alarms and facilitating prompt intervention. Furthermore, the hourglass model, originally devised for human pose estimation, contributes to accurately locating the person's body joints or key points within the video frames. Using an encoder-decoder architecture, the hourglass model captures multi-scale features while preserving spatial information, enabling precise estimation of the person's posture. Through systematic downsampling and subsequent upsampling of the input data, spatial context is maintained throughout the encoding process. This fidelity to spatial detail is further reinforced by skip connections between encoder stages, ensuring the propagation of fine-grained information crucial for posture localization.

The combined use of attention and hourglass models enables our system to identify falls and monitor the individual's posture with high precision. By analyzing the temporal and spatial attributes of video frames, the model swiftly detects anomalous events and triggers appropriate responses, such as alerting caregivers or initiating emergency protocols. This proactive stance not only enhances the safety and well-being of elderly individuals but also instills confidence and assurance among users and their families. When visual data (Figure 4) is
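
As an illustration of the three-state posture classification described in this section, the following minimal sketch maps the raw scores of a classifier head to one of the states standing, sitting, or fallen. It is not the authors' implementation: the function names, the example scores, and the `min_confidence` threshold used to decide whether to raise an alert are all assumptions made for the example.

```python
import math

# The three posture states named in the paper.
POSTURES = ("standing", "sitting", "fallen")


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify_posture(logits):
    """Return (label, probability) for the most likely posture."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return POSTURES[best], probs[best]


def needs_alert(logits, min_confidence=0.6):
    """Hypothetical rule: alert only on a confident 'fallen' prediction."""
    label, prob = classify_posture(logits)
    return label == "fallen" and prob >= min_confidence


if __name__ == "__main__":
    frame_logits = [0.2, 0.5, 2.9]  # made-up scores for a single frame
    print(classify_posture(frame_logits))
    print(needs_alert(frame_logits))
```

In a deployed system the scores would come from the posture model for each segmented frame; the confidence threshold is one simple way to trade missed falls against false alarms.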
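
The idea of dynamically weighting features can be sketched with a generic scaled dot-product attention pooling step. The paper does not state which attention formulation it uses, so this form is an assumption, as are the names `attention_pool` and `query` and the toy feature values. In a trained model the query would be learned rather than fixed.

```python
import math


def softmax(scores):
    """Normalize scores into non-negative weights that sum to 1."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(features, query):
    """Weight each time step's feature vector by its similarity to `query`.

    features: list of equal-length feature vectors, one per time step.
    query: vector of the same length (assumed fixed for this sketch).
    Returns (weights, context), where context is the weighted sum of features.
    """
    scale = math.sqrt(len(query))
    scores = [
        sum(f_d * q_d for f_d, q_d in zip(frame, query)) / scale
        for frame in features
    ]
    weights = softmax(scores)
    context = [
        sum(w * frame[d] for w, frame in zip(weights, features))
        for d in range(len(query))
    ]
    return weights, context


if __name__ == "__main__":
    # Toy sequence: the third time step carries a sudden, large change.
    frames = [[0.1, 0.0], [0.2, 0.1], [3.0, 2.5], [0.1, 0.2]]
    weights, context = attention_pool(frames, query=[1.0, 1.0])
    print([round(w, 3) for w in weights])
    print([round(c, 3) for c in context])
```

The time step with the abrupt change receives most of the weight, so it dominates the pooled representation passed to later layers.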