Page 383 - AI for Good Innovate for Impact



               4.3 - 5G

               (continued)

                Item                              Details
                Metadata (Type of Data)           Visual
                Model Training and Fine-Tuning    In the initialization phase, a 3D body model is
                                                  initialized through Parametric Human Model Creation
                                                  using the SMPL-X [2] parametric human model. To
                                                  create the SMPL-X model from image sequences, we
                                                  use the method in [11]. The initial body model is
                                                  shared with the receiving end before transmission
                                                  begins. For human motion capture, we extract
                                                  temporally consistent 3D human pose and shape from
                                                  monocular video. Spatio-temporal context is
                                                  enhanced by extracting body-aware deep features [3]
                                                  from individual frames while simultaneously
                                                  predicting initial per-frame estimates of body
                                                  pose, shape, and camera pose using a standard
                                                  method [5]. The final motion-capture output is
                                                  serialized in JSON format and transmitted to the
                                                  receiving end for rendering.
                Testbeds or Pilot Deployments     The initial proof of concept (PoC) and related
                                                  experience are published in [1].
                Code repositories                 N/A
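               The table describes a split in which the parametric body model is shared once before
               transmission and only the per-frame motion-capture output is sent as JSON. The Python
               sketch below illustrates that exchange under stated assumptions: the JSON field names
               (frame, t, go, bp, tr), the helpers encode_frame, decode_frame and apply_to_avatar, and
               the Avatar class are hypothetical, and only the SMPL-X root orientation, the 21 body-joint
               rotations (axis-angle) and the root translation are sent. Hand, face and camera
               parameters are omitted for brevity. This is not the format used in [1].

```python
import json
from dataclasses import dataclass, field

# SMPL-X body parameter sizes (axis-angle rotations). The subset sent per frame is an assumption.
GLOBAL_ORIENT_DIM = 3      # root rotation
BODY_POSE_DIM = 21 * 3     # 21 body joints x 3 axis-angle values
TRANSL_DIM = 3             # root translation
NUM_BETAS = 10             # shape coefficients, shared once at initialization


@dataclass
class Avatar:
    """Hypothetical receiver-side avatar, preloaded with the teacher's shape (betas)."""
    betas: list
    global_orient: list = field(default_factory=lambda: [0.0] * GLOBAL_ORIENT_DIM)
    body_pose: list = field(default_factory=lambda: [0.0] * BODY_POSE_DIM)
    transl: list = field(default_factory=lambda: [0.0] * TRANSL_DIM)
    last_frame: int = -1


def encode_frame(frame, timestamp_ms, global_orient, body_pose, transl, decimals=4):
    """Sender (edge) side: pack one frame of pose parameters as compact JSON bytes."""
    dims = (len(global_orient), len(body_pose), len(transl))
    if dims != (GLOBAL_ORIENT_DIM, BODY_POSE_DIM, TRANSL_DIM):
        raise ValueError("unexpected pose parameter dimensions")
    payload = {
        "frame": frame,
        "t": timestamp_ms,
        "go": [round(v, decimals) for v in global_orient],
        "bp": [round(v, decimals) for v in body_pose],
        "tr": [round(v, decimals) for v in transl],
    }
    # separators=(",", ":") drops the default spaces to shrink the payload.
    return json.dumps(payload, separators=(",", ":")).encode("utf-8")


def decode_frame(data):
    """Receiver (school) side: parse one frame and check its dimensions."""
    frame = json.loads(data.decode("utf-8"))
    dims = (len(frame["go"]), len(frame["bp"]), len(frame["tr"]))
    if dims != (GLOBAL_ORIENT_DIM, BODY_POSE_DIM, TRANSL_DIM):
        raise ValueError("malformed pose frame")
    return frame


def apply_to_avatar(avatar, frame):
    """Copy a decoded pose onto the avatar; ignore stale or duplicate frames."""
    if frame["frame"] <= avatar.last_frame:
        return False
    avatar.global_orient = frame["go"]
    avatar.body_pose = frame["bp"]
    avatar.transl = frame["tr"]
    avatar.last_frame = frame["frame"]
    return True


# Usage: shape is shared once, then only pose frames travel over the network.
avatar = Avatar(betas=[0.0] * NUM_BETAS)
msg = encode_frame(0, 0.0, [0.1, 0.0, 0.0], [0.01] * BODY_POSE_DIM, [0.0, 0.9, 2.5])
apply_to_avatar(avatar, decode_frame(msg))
print(len(msg), "bytes per frame")
```

               Because the shape coefficients travel only once, each frame in this sketch carries just
               69 pose and translation values, so the stream rate is roughly the payload size times the
               frame rate, independent of the camera resolution. A binary encoding would be smaller
               still, at the cost of the human-readable JSON that the pipeline specifies.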



               2      Use Case Description


               2.1     Description

               A lack of good teachers and expert trainers is one of the obstacles to quality education
               and training in many developing and underdeveloped nations [5]. The present use case
               addresses this gap using AI-native semantic communications. Remote teaching with
               telepresence robots has been considered, but it suffers from infrastructure costs and
               challenges [6], lacks a human touch, and does not convey the non-verbal body cues of
               realistic communication. Demand is therefore shifting toward 3D telepresence [7], which is
               expected to bring more realism to teacher-student interaction. However, 3D holoportation, as
               described in [8], is not scalable, requires very high bandwidth, and is prohibitively costly for
               widespread use because of its dedicated infrastructure requirements. Inspired by
               forward-looking education roadmaps in countries such as India that embrace augmented reality
               (AR) and virtual reality (VR) glasses [9], we propose the following solution, illustrated
               through a remote teaching scenario.

               As shown in Fig. 1, the teacher stands in front of a simple RGB camera attached to a computer
               that streams live video and audio of the teacher to an edge computer natively integrated with
               the 5G/6G network service. The edge computer runs an AI algorithm that extracts the teacher’s
               3D body posture in real time at a specified frame rate. The extracted posture is encoded as
               semantic body-pose information and transmitted over the network to the computing device at
               the remote school, which is connected to the students’ VR glasses. This computing device runs
               a local AR engine preloaded with a parametric 3D avatar of the teacher. The live body-posture
               information it receives is decoded and applied to the 3D avatar. As a result, each student
               sees the remote teacher’s avatar in situ, as if the teacher were present in the classroom.
               We call this ‘Semantic Live Streaming’. Since the teacher does not need to see the students in
               3D, a camera attached to the school’s computing device can send a view of the class back to
               the teacher through conventional real-time streaming. The teacher does not need a



