Page 768 - AI for Good Innovate for Impact
                      (continued)

                       Item              Details
Model Development: Either ensemble existing Convolutional Neural Network
(CNN)/Transformer models, or identify key characteristics from each and
build a new model, to implement a lightweight, real-time ISL recognition
model that is optimised for efficiency on edge devices, ensuring low
latency and a seamless user experience. Language models can help preserve
emotion and context. Recent advances in continuous sign language
recognition, such as motor attention mechanisms [1] and multi-scale
feature enhancement [2], suggest potential improvements in real-time
detection. A recent study by Hirooka et al. [3] demonstrates Stack
Transformer models with spatial-temporal attention for dynamic
multi-culture sign language recognition, which can significantly improve
dynamic ISL detection by preserving temporal context and motion features.

Tech Stack: TensorFlow, PyTorch, OpenCV, CNNs (ResNet, VGG), Transformer
models (Bidirectional Encoder Representations from Transformers (BERT),
Generative Pre-trained Transformer (GPT)), Keras, MobileNet for real-time
inference on edge devices
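The real-time, low-latency loop described above can be sketched as a sliding-window frame buffer; the `predict` stub, window size, and stride below are illustrative assumptions standing in for an actual MobileNet or Transformer model:

```python
from collections import deque

def predict(frames):
    """Stand-in for a lightweight CNN/Transformer model (e.g. MobileNet)."""
    return f"sign-{len(frames)}"

class SlidingWindowRecognizer:
    """Buffers incoming video frames and runs inference on a fixed-size
    window, advancing by a stride so per-frame latency stays bounded on
    edge devices."""
    def __init__(self, window=16, stride=8):
        self.buf = deque(maxlen=window)
        self.stride = stride
        self._since_last = 0

    def push(self, frame):
        self.buf.append(frame)
        self._since_last += 1
        if len(self.buf) == self.buf.maxlen and self._since_last >= self.stride:
            self._since_last = 0
            return predict(list(self.buf))
        return None  # still filling the window / between strides

rec = SlidingWindowRecognizer()
outputs = [rec.push(i) for i in range(32)]
predictions = [o for o in outputs if o is not None]
```

Overlapping windows (stride smaller than the window) let consecutive predictions share temporal context, which matters for continuous signing where sign boundaries are not known in advance.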
Intermediate Representation: Extract intermediate representations in latent
and embedding space, such as:
–  Skeletal keypoint sequences for tracking the user.
–  Encodings mapping signs to linguistic structures.
–  Temporal feature representations capturing the timing and transitions
   between signs via detection of hand movements.
–  Facial feature embeddings to track and preserve emotion and expression
   (with de-identification to ensure privacy).

Tech Stack: OpenPose, Dlib (for facial feature extraction), TensorFlow/Keras
(for embeddings), PyTorch (for dynamic embeddings)
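The temporal-feature item above can be illustrated with plain NumPy: frame-to-frame keypoint displacements approximate the motion signal between signs. The keypoint layout here is a toy assumption; in practice OpenPose or a similar estimator would supply the coordinates.

```python
import numpy as np

def motion_features(keypoints):
    """keypoints: (T, J, 2) array of J 2-D joints over T frames.
    Returns (T-1, J, 2) frame-to-frame displacements, a simple temporal
    representation of the timing and transitions between signs."""
    keypoints = np.asarray(keypoints, dtype=float)
    return np.diff(keypoints, axis=0)

# toy sequence: a single joint moving right by 1 unit per frame
seq = np.array([[[0.0, 0.0]], [[1.0, 0.0]], [[2.0, 0.0]]])
deltas = motion_features(seq)  # shape (2, 1, 2)
```

A model would typically consume these deltas alongside the raw keypoints, so both pose and motion are available to the temporal encoder.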
Privacy-Preserving Mechanisms: Ensure anonymization of data by using pose
estimation for skeletal key points and avoiding facial or personal
identifiers. Encrypt data during transmission to prevent unauthorized
access.

Tech Stack: Cryptography libraries (PyCryptodome, OpenSSL), secure data
transmission (Hypertext Transfer Protocol Secure (HTTPS), Transport Layer
Security (TLS)), General Data Protection Regulation (GDPR)-compliant data
storage systems
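One hedged sketch of the anonymization idea: once the signer is reduced to skeletal key points, the pose can additionally be re-expressed relative to a root joint and normalised by scale, stripping absolute location and body-size cues before transmission. This is an illustrative assumption, not the full pipeline; transport encryption (TLS/HTTPS) would still be applied on top.

```python
import numpy as np

def anonymize_pose(keypoints, root=0):
    """Re-express (J, 2) skeletal keypoints relative to a root joint and
    normalise by overall scale, removing absolute position and body-size
    cues. Identity-bearing pixels (face, skin) are never transmitted;
    only this skeletal abstraction leaves the device."""
    kp = np.asarray(keypoints, dtype=float)
    centred = kp - kp[root]                      # drop absolute location
    scale = np.linalg.norm(centred, axis=1).max()
    return centred / scale if scale > 0 else centred

pose = np.array([[100.0, 200.0], [110.0, 200.0], [100.0, 220.0]])
anon = anonymize_pose(pose)  # root joint at origin, scale normalised to 1
```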
Model Evaluation and Fine-Tuning: Compare the performance of existing
Transformer and CNN models against our novel ISL recognition model using
various interpretability techniques and feature importance analysis.

Tech Stack: Shapley Additive Explanations (SHAP), Local Interpretable
Model-agnostic Explanations (LIME)
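The feature-importance analysis can be illustrated without the SHAP/LIME libraries themselves: permutation importance, a simpler model-agnostic technique in the same spirit, scores each input feature by how much shuffling it degrades accuracy. The toy data and threshold "model" below are assumptions for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)

# toy data: feature 0 drives the label, feature 1 is pure noise (assumed)
X = rng.normal(size=(200, 2))
y = (X[:, 0] > 0).astype(int)

def model(X):
    """Stand-in classifier: thresholds feature 0 (in place of a CNN/Transformer)."""
    return (X[:, 0] > 0).astype(int)

def permutation_importance(model, X, y, n_rounds=10):
    """Mean drop in accuracy when each feature is shuffled;
    a larger drop means the feature matters more to the model."""
    base = (model(X) == y).mean()
    scores = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        for _ in range(n_rounds):
            Xp = X.copy()
            Xp[:, j] = rng.permutation(Xp[:, j])
            scores[j] += base - (model(Xp) == y).mean()
    return scores / n_rounds

imp = permutation_importance(model, X, y)  # imp[0] large, imp[1] ~ 0
```

SHAP and LIME provide finer-grained, per-prediction attributions, but the same shuffle-and-measure intuition underlies comparing which features each candidate model actually relies on.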
Technology Keywords: AI, Natural Language Processing (NLP), Sign Language
Recognition, Pose Estimation, Real-time Translation, Embedding Space

Data Availability: Public

Metadata (Type of Data): Video, Text (ISL videos annotated with linguistic
mapping)

Model Training and Fine-Tuning: CNN, Transformer-based NLP, Pose Estimation,
Adversarial Augmentation

Testbeds or Pilot Deployments: P2SLR: Privacy-Preserving Sign Language
Recognition [4]



