Page 139 - Kaleidoscope Academic Conference Proceedings 2024
Innovation and Digital Transformation for a Sustainable World
[Figure 1 block diagram. Voice branch blocks: Smart Health Device, Voice Input, Preprocessing & Chunks Formation, Acoustic Feature Extraction (MFCC, ZCR, RMS), Feature Optimisation, CNN-LSTM Deep Neural Network for Emotion / Health State Recognition, Predicted Health State, Severity Estimation, Audio / Video Triggering Module. Video branch blocks: Video Input, Pre-Processing, Person Localization, Attention-based Enhanced Transfer Learning Model for Posture Recognition, Predicted Posture, Fall Detection, Update Report in Database, Alert Linked Healthcare Provider.]
Figure 1 - Proposed system architecture of voice and video-based health anomaly detection
Standing, Sitting, or Fallen) and further analysis. Predicted postures are evaluated for signs of potential health issues, enabling timely anomaly detection and intervention. The developed algorithm is tailored to run on an edge device connected to an IP network system. A Raspberry Pi-based single-board computer is used to implement and test the proposed system architecture with associated machine learning algorithms.

Algorithm 1 – Elderly Health Anomaly Detection algorithm
Input: Real-time voice input, Video feed
Output: Detection and Reporting of Elderly Health Anomalies
1. Initialize microphone and camera sensors for real-time voice and video input.
2. Load pre-trained models for voice emotion recognition and posture classification.
3. Preprocess voice input, segmenting it into short frames.
4. Extract acoustic features from audio.
5. Use the CNN-LSTM Deep Neural Network to detect emotional states.
6. Check the severity of the detected emotional states.
7. Activate the video module upon detection of severe anomalous emotional states.
8. Segment video frames using YOLO segmentation to isolate the person from the background.
9. Employ the MobileNet-based enhanced transfer learning model to classify postures (Standing, Sitting, or Fallen).
10. Check for high-severity events (e.g., a fall) and trigger alerts.
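The voice-side steps of Algorithm 1 (steps 3, 4, 6 and 7) and the final alert check (step 10) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. It computes only ZCR and RMS with the Python standard library and omits MFCC. The CNN-LSTM is replaced by a hypothetical `estimate_severity` callable. The frame length, hop length, and severity threshold are illustrative values that do not come from the paper.

```python
import math
from typing import Callable, List, Sequence, Tuple

Features = List[Tuple[float, float]]  # one (ZCR, RMS) pair per frame


def split_into_frames(samples: Sequence[float], frame_len: int, hop_len: int) -> List[List[float]]:
    """Step 3: cut the signal into fixed-length frames (overlapping if hop_len < frame_len)."""
    if frame_len <= 0 or hop_len <= 0:
        raise ValueError("frame_len and hop_len must be positive")
    last_start = len(samples) - frame_len
    return [list(samples[i:i + frame_len]) for i in range(0, last_start + 1, hop_len)]


def zero_crossing_rate(frame: Sequence[float]) -> float:
    """Step 4 (ZCR): fraction of adjacent sample pairs whose signs differ."""
    if len(frame) < 2:
        return 0.0
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)


def rms_energy(frame: Sequence[float]) -> float:
    """Step 4 (RMS): root-mean-square amplitude of the frame."""
    if not frame:
        return 0.0
    return math.sqrt(sum(x * x for x in frame) / len(frame))


def extract_features(frames: Sequence[Sequence[float]]) -> Features:
    """Step 4: per-frame (ZCR, RMS) pairs; MFCC is intentionally left out of this sketch."""
    return [(zero_crossing_rate(f), rms_energy(f)) for f in frames]


def should_trigger_video(severity: float, threshold: float = 0.7) -> bool:
    """Steps 6-7: wake the video module only when severity reaches the threshold.

    The 0.7 default is an assumed value, not one reported in the paper.
    """
    return severity >= threshold


def should_alert(posture: str) -> bool:
    """Step 10: raise an alert when the predicted posture is 'Fallen'."""
    return posture == "Fallen"


def run_voice_stage(
    samples: Sequence[float],
    estimate_severity: Callable[[Features], float],  # hypothetical stand-in for the CNN-LSTM
    frame_len: int = 400,   # assumed: 25 ms at 16 kHz
    hop_len: int = 160,     # assumed: 10 ms at 16 kHz
    threshold: float = 0.7,
) -> bool:
    """Run steps 3, 4, 6 and 7 and report whether the video module should start."""
    frames = split_into_frames(samples, frame_len, hop_len)
    features = extract_features(frames)
    severity = estimate_severity(features)
    return should_trigger_video(severity, threshold)
```

Gating the video branch on the voice result means the camera pipeline, the heavier part of the workload, runs only when needed. That suits a resource-constrained edge device such as the Raspberry Pi-based board described above.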