Page 768 - AI for Good Innovate for Impact
Item Details
Model Development: Either ensemble current Convolutional Neural Network
(CNN)/Transformer models, or identify the key characteristics of each and
design a new model, to implement a lightweight, real-time ISL recognition
model optimised for efficiency on edge devices, guaranteeing low latency
and a seamless user experience. Language models can help preserve emotion
and context. Recent advancements in continuous sign language recognition,
such as motor attention mechanisms [1] and multi-scale feature enhancement
[2], suggest potential improvements in real-time detection. A recent study
by Hirooka et al. [3] demonstrates the use of Stack Transformer models with
spatial-temporal attention for dynamic multi-culture sign language
recognition, which can offer significant improvements in dynamic ISL
detection by preserving temporal context and motion features.
Tech Stack: TensorFlow, PyTorch, OpenCV, CNNs (ResNet, VGG),
Transformer models (Bidirectional Encoder Representations from
Transformers (BERT), Generative Pre-trained Transformer (GPT)), Keras,
MobileNet for real-time inference on edge devices
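The efficiency case for a MobileNet-style backbone on edge devices rests on depthwise-separable convolutions. A minimal sketch of the parameter saving (pure Python; the 3x3 kernel and 64-to-128 channel counts are illustrative assumptions, not the project's actual layer shapes):

```python
def conv_params(k, c_in, c_out):
    """Parameter count of a standard k x k convolution layer."""
    return k * k * c_in * c_out

def depthwise_separable_params(k, c_in, c_out):
    """Depthwise (k x k per channel) + pointwise (1 x 1) convolution."""
    return k * k * c_in + c_in * c_out

# Example layer: 3x3 kernel, 64 -> 128 channels (assumed shapes)
standard = conv_params(3, 64, 128)                  # 73728 parameters
separable = depthwise_separable_params(3, 64, 128)  # 8768 parameters
print(standard, separable, round(standard / separable, 1))
```

For these shapes the separable layer needs roughly 8x fewer parameters, which is why such blocks suit the low-latency edge deployment described above.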
Intermediate Representation: Extract intermediate representations in latent
and embedding space, such as:
– Skeletal keypoint sequences for tracking the user.
– Encodings for mapping signs to linguistic structures.
– Temporal feature representations capturing the timing of, and transitions
between, signs from detected hand movements.
– Facial feature embeddings to track and preserve emotion and expression
(with de-personification to ensure privacy).
Tech Stack: OpenPose, Dlib (for facial feature extraction), TensorFlow/Keras
(for embedding), PyTorch (for dynamic embeddings)
Privacy-Preserving Mechanisms: Anonymize data by using pose estimation
for skeletal keypoints and avoiding facial or personal identifiers. Encrypt
data during transmission to prevent unauthorized access.
Tech Stack: Cryptography libraries (PyCryptodome, OpenSSL), secure data
transmission (Hyper Text Transfer Protocol Secure (HTTPS), Transport Layer
Security (TLS)), General Data Protection Regulation (GDPR)-compliant data
storage systems
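The anonymization step can be sketched as a filter that lets only de-identified skeletal data leave the device; the field names and salted-hash session pseudonym are illustrative assumptions, and transport encryption (HTTPS/TLS) would be handled separately by the serving stack:

```python
import hashlib
import secrets

def anonymize_sample(sample, salt):
    """Strip personal identifiers, keeping only skeletal keypoints.

    sample: dict of raw capture fields (names assumed for illustration);
    facial crops and user identity are deliberately discarded.
    """
    session_id = hashlib.sha256(salt + sample["user_id"].encode()).hexdigest()
    return {
        "session": session_id,             # unlinkable pseudonym
        "keypoints": sample["keypoints"],  # pose skeleton only
        # dropped: "face_crop", "user_id", raw video frames
    }

salt = secrets.token_bytes(16)             # per-deployment secret
raw = {"user_id": "alice", "keypoints": [[0.1, 0.2]], "face_crop": b"..."}
anon = anonymize_sample(raw, salt)
```

Keeping the salt out of transmitted data means the pseudonym cannot be reversed to a user identity by a recipient, in line with the GDPR-compliant storage requirement.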
Model Evaluation and Fine-Tuning: Compare the performance of existing
Transformer and CNN models against our novel ISL recognition model using
interpretability techniques and feature-importance analysis.
Tech Stack: Shapley Additive Explanations (SHAP), Local Interpretable
Model-agnostic Explanations (LIME)
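SHAP and LIME are full libraries; as a minimal model-agnostic stand-in, permutation feature importance illustrates the same idea of feature-importance analysis. This sketch uses a toy model and synthetic data (both assumptions, not the project's):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "model": only feature 0 matters (assumed for illustration)
def model(X):
    return (X[:, 0] > 0.5).astype(int)

X = rng.random((200, 3))
y = model(X)  # labels from the model itself, so baseline accuracy is 1.0

def permutation_importance(model, X, y, feature, rng):
    """Accuracy drop when one feature's column is shuffled."""
    base = (model(X) == y).mean()
    Xp = X.copy()
    rng.shuffle(Xp[:, feature])
    return base - (model(Xp) == y).mean()

scores = [permutation_importance(model, X, y, f, rng) for f in range(3)]
# feature 0 should show a large drop; features 1 and 2 show none
```

SHAP and LIME refine this idea with principled attribution, but the comparison workflow between candidate ISL models is the same: score each input feature's contribution and compare profiles across models.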
Technology Keywords: AI, Natural Language Processing (NLP), Sign Language
Recognition, Pose Estimation, Real-time Translation, Embedding Space
Data Availability: Public
Metadata (Type of Data): Video, Text (ISL videos annotated with linguistic
mapping)
Model Training and Fine-Tuning: CNN, Transformer-based NLP, Pose Estimation,
Adversarial Augmentation
Testbeds or Pilot Deployments: P2SLR: Privacy-Preserving Sign Language
Recognition [4]

