Kaleidoscope Academic Conference Proceedings 2024

2024 ITU Kaleidoscope Academic Conference













Figure 7 – Activity Diagram for Image and Text Processing

platform. This allows users to perform various actions, such as opening materials, navigating through lessons, or closing applications, using simple voice commands. This deep integration empowers users to control their learning environment independently, enhancing both accessibility and ease of use.
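A minimal sketch of how a transcribed voice command might be mapped to one of the actions mentioned here (opening materials, navigating through lessons, turning pages, closing applications). It assumes speech has already been transcribed to text. The `Command` type, the regex patterns, `parse_command`, `execute`, and the toy `state` dictionary are all hypothetical illustrations, not part of UnSight as described; a production system would more likely use a trained intent classifier than keyword patterns.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class Command:
    intent: str                   # hypothetical intent label
    target: Optional[str] = None  # e.g. a lesson number or material name


# Hypothetical command patterns, tried in order; the first match wins.
PATTERNS = [
    ("go_to_lesson", re.compile(r"\b(?:go|navigate) to lesson (?P<target>\d+)\b")),
    ("next_page", re.compile(r"\b(?:next page|turn (?:the )?page)\b")),
    ("open_material", re.compile(r"\bopen (?P<target>.+)$")),
    ("close_application", re.compile(r"\b(?:close|exit|quit)\b")),
]


def parse_command(transcript: str) -> Optional[Command]:
    """Map a transcribed utterance to a Command, or return None if nothing matches."""
    text = transcript.lower().strip().rstrip(".!?")
    for intent, pattern in PATTERNS:
        match = pattern.search(text)
        if match:
            # groupdict() is empty for patterns without a named "target" group.
            target = match.groupdict().get("target")
            return Command(intent, target.strip() if target else None)
    return None


def execute(command: Command, state: dict) -> str:
    """Apply a command to a toy application state and return a spoken confirmation."""
    if command.intent == "go_to_lesson":
        state["lesson"], state["page"] = int(command.target), 1
        return f"Opening lesson {state['lesson']}."
    if command.intent == "next_page":
        state["page"] += 1
        return f"Page {state['page']}."
    if command.intent == "open_material":
        state["material"] = command.target
        return f"Opening {command.target}."
    if command.intent == "close_application":
        state["material"] = None
        return "Closing."
    raise ValueError(f"Unknown intent: {command.intent}")


state = {"lesson": 1, "page": 1, "material": None}
for utterance in ["Go to lesson 3.", "Turn the page", "Open chapter two notes", "Close"]:
    command = parse_command(utterance)
    print(execute(command, state) if command else "Sorry, I did not understand.")
```

Keeping parsing separate from execution means the spoken confirmation returned by `execute` can be passed straight to text-to-speech, so the user hears the outcome of every command.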
Architecture and Workflow: The voice-to-action feature captures voice commands and processes them to determine the intended action. The system then executes the command, such as navigating to a lesson, turning a page, or opening/closing materials. This streamlined workflow lets users interact with the platform intuitively and efficiently.

Figure 8 – Activity Diagram for Voice and Action Processing

3.4 AI Voice-Based Tutor – Reinforcement Learning with Human Feedback (RLHF)

Architecture of Implementation: To provide an inclusive learning experience for visually impaired individuals, UnSight’s AI Voice-Based Tutor employs a dual-model architecture consisting of a generator model (GPT-4) and a validator model (Llama). The user’s spoken questions are first transcribed into text, which the GPT-4 model then processes to generate a response. The Llama model checks this response for factual accuracy and consistency before it is converted back into speech and delivered to the user, helping ensure that the information provided is reliable.

Personalized Learning: RLHF models tailor the learning pace and content to each user based on their interactions and feedback. By observing user responses, engagement levels, and the types of questions asked, the model adjusts the difficulty level, provides additional explanations, and recommends relevant learning materials. This personalization gives each user a learning experience suited to their specific needs and progress.

3.5 Attention Theory Considerations

To keep studying immersive and engaging, UnSight integrates adaptive pacing mechanisms designed to mitigate attentional fatigue and manage cognitive load. After periods of intensive learning (for example, 15–20 minutes of focused auditory instruction), the system introduces short, light-hearted breaks. These breaks, which might include a brief, contextually relevant joke or a calming auditory interlude, aim to refresh the user’s mind and prevent cognitive overload.

Figure 9 – Validator Generator Architecture

This approach is complemented by breaking complex information into smaller, manageable chunks and using scaffolded learning techniques. By presenting information step by step and reinforcing key concepts through repetition and varied examples, the system supports sustained attention and reduces the risk of attentional drift. Through these elements, UnSight addresses the unique attentional needs of visually impaired learners, fostering engagement and supporting meaningful learning outcomes.

Figure 10 – Novel Architecture of UnSight

4. IMPLEMENTATION PLAN

This section details the technical architecture, AI integration, and testing methodologies used in developing the proposed system, as well as its compliance with educational standards, data security, scalability, and feedback mechanisms.
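The dual-model pipeline of Section 3.4 (GPT-4 generates a response, Llama validates it, and only then is it spoken) can be sketched as a generate-then-validate loop. This is a minimal sketch under stated assumptions: no real model API is called; `generate` and `validate` are hypothetical callables standing in for GPT-4 and Llama; speech-to-text and text-to-speech are omitted; and the retry-with-feedback loop and the fallback reply are illustrative choices, since the text does not say what happens when the validator rejects a response.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Verdict:
    approved: bool
    feedback: str = ""


# Hypothetical interfaces: in UnSight the generator role is played by GPT-4 and the
# validator role by Llama; plain callables stand in for them here.
Generator = Callable[[str, str], str]      # (question, validator feedback) -> draft answer
Validator = Callable[[str, str], Verdict]  # (question, draft answer) -> verdict

FALLBACK = "I am not sure about that yet. Could you rephrase the question?"


def answer_question(question: str, generate: Generator, validate: Validator,
                    max_attempts: int = 3) -> str:
    """Return the first draft the validator approves, passing its critique back to
    the generator on each retry; fall back to a safe reply if none is approved."""
    feedback = ""
    for _ in range(max_attempts):
        draft = generate(question, feedback)
        verdict = validate(question, draft)
        if verdict.approved:
            return draft  # only approved text would be handed to text-to-speech
        feedback = verdict.feedback
    return FALLBACK


# Toy stand-ins so the control flow can be exercised without any model API.
def toy_generate(question: str, feedback: str) -> str:
    return "Paris is the capital of France." if feedback else "Lyon is the capital of France."


def toy_validate(question: str, draft: str) -> Verdict:
    ok = draft.startswith("Paris")
    return Verdict(ok, "" if ok else "Incorrect city; the capital is Paris.")


print(answer_question("What is the capital of France?", toy_generate, toy_validate))
```

Bounding the number of attempts keeps response latency predictable for a spoken interface, and returning a fallback instead of an unapproved draft preserves the guarantee that only validated text reaches the user.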
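Section 3.4 attributes personalization to RLHF models. Full RLHF (training a reward model on human preference data and fine-tuning a language model against it) does not fit in a short example, so the sketch below swaps in a much simpler reinforcement-learning technique, an epsilon-greedy multi-armed bandit, to show how user feedback can steer the difficulty level. The `DifficultyBandit` class, the difficulty labels, the simulated learner, and the reward definition (a score in [0, 1] derived from feedback such as comprehension checks) are hypothetical and not taken from UnSight.

```python
import random


class DifficultyBandit:
    """Epsilon-greedy multi-armed bandit over difficulty levels.

    Each level is an arm; the reward is a hypothetical score derived from user
    feedback (e.g. comprehension-check results), assumed to lie in [0, 1].
    """

    def __init__(self, levels, epsilon=0.1, seed=None):
        self.levels = list(levels)
        self.epsilon = epsilon
        self.counts = {level: 0 for level in self.levels}
        self.values = {level: 0.0 for level in self.levels}  # running mean reward
        self.rng = random.Random(seed)

    def choose(self):
        # Explore a random level with probability epsilon; otherwise exploit the
        # level with the highest estimated reward (ties go to the earliest level).
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.levels)
        return max(self.levels, key=lambda level: self.values[level])

    def update(self, level, reward):
        # Incremental mean: value <- value + (reward - value) / count
        self.counts[level] += 1
        self.values[level] += (reward - self.values[level]) / self.counts[level]


# Usage with simulated feedback: this simulated learner succeeds most often at "medium".
true_success = {"easy": 0.5, "medium": 0.8, "hard": 0.3}
bandit = DifficultyBandit(true_success, epsilon=0.1, seed=0)
sim = random.Random(1)
for _ in range(500):
    level = bandit.choose()
    reward = 1.0 if sim.random() < true_success[level] else 0.0
    bandit.update(level, reward)
print(bandit.choose(), bandit.counts)
```

Because each estimate is a plain sample average, this sketch assumes the learner's needs are stationary; replacing `1 / count` with a constant step size would weight recent feedback more heavily as the learner progresses.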
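The adaptive pacing and chunking described in Section 3.5 can be sketched as two small steps: split lesson text into short chunks, then insert a break marker once the estimated listening time reaches a threshold. Assumptions: the 15-minute default is the lower end of the 15–20 minute range given in the text; listening time is estimated from word count at a hypothetical 150 words per minute rather than measured; and the three-sentence chunk size, the naive regex sentence splitter, and the `BREAK` marker are illustrative, not UnSight's implementation.

```python
import re
from typing import List

BREAK = "<break>"  # placeholder for a short joke or calming auditory interlude


def chunk_sentences(text: str, max_sentences: int = 3) -> List[str]:
    """Split text into chunks of at most max_sentences sentences (naive splitter)."""
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    return [" ".join(sentences[i:i + max_sentences])
            for i in range(0, len(sentences), max_sentences)]


def schedule(chunks: List[str], break_after_min: float = 15.0,
             words_per_minute: float = 150.0) -> List[str]:
    """Insert BREAK once the estimated listening time since the last break
    reaches break_after_min minutes."""
    plan, elapsed = [], 0.0
    for chunk in chunks:
        plan.append(chunk)
        elapsed += len(chunk.split()) / words_per_minute  # estimated minutes of speech
        if elapsed >= break_after_min:
            plan.append(BREAK)
            elapsed = 0.0
    return plan


# Usage with a tiny threshold so a break appears in a short demo lesson.
lesson = ("Plants make food through photosynthesis. Light energy is captured by chlorophyll. "
          "Water and carbon dioxide are converted into glucose. Oxygen is released as a by-product.")
for item in schedule(chunk_sentences(lesson, max_sentences=2), break_after_min=0.1):
    print(item)
```

Breaks are inserted only at chunk boundaries, so a break never interrupts a sentence; this pairs the pacing mechanism with the chunking technique described in the same section.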



