approach for 2-D multi-individual human posture assessment. This could not achieve accuracy beyond 76.4%. Convolutional Neural Networks did push the accuracy beyond 80%: using several layers of neurons together with techniques such as Batch Normalization and Dropout, the CNN achieved an accuracy of 82.84%. This accuracy did not prove sufficient for effective training of the agent, but it did become the basis of many state-of-the-art learning techniques.
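
As a concrete illustration of such a baseline, the sketch below builds a small Keras classifier that uses Batch Normalization and Dropout. The layer sizes, input resolution, and num_classes are assumptions for illustration, not the exact network that reached 82.84%.

import tensorflow as tf
from tensorflow.keras import layers, models

def build_baseline_cnn(num_classes, input_shape=(224, 224, 3)):
    # Small convolutional classifier with Batch Normalization and Dropout.
    model = models.Sequential([
        layers.Input(shape=input_shape),
        layers.Conv2D(32, 3, padding="same", activation="relu"),
        layers.BatchNormalization(),
        layers.MaxPooling2D(),
        layers.Conv2D(64, 3, padding="same", activation="relu"),
        layers.BatchNormalization(),
        layers.MaxPooling2D(),
        layers.Flatten(),
        layers.Dense(128, activation="relu"),
        layers.Dropout(0.5),   # Dropout as regularization
        layers.Dense(num_classes, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model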
EfficientNet was the first technique to combine a CNN with a compound scaling method: it uniformly scales network width, depth, and resolution with a set of fixed scaling coefficients. It also transfers well to other problems, achieving 91.7% accuracy on CIFAR-100 and 98.8% on Flowers, among other datasets. When this approach was applied to yoga poses, however, it showed an accuracy of 85%.
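
For reference, the compound scaling rule introduced in the original EfficientNet paper can be written as follows (quoted here as background; the constraint and coefficients are those of the published method, not values tuned in this work):

d = \alpha^{\phi}, \quad w = \beta^{\phi}, \quad r = \gamma^{\phi}, \qquad \text{subject to } \alpha \cdot \beta^{2} \cdot \gamma^{2} \approx 2, \;\; \alpha, \beta, \gamma \geq 1,

where d, w, and r scale network depth, width, and input resolution, and the single compound coefficient \phi controls how much additional compute is spent.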
Our next approach was to explore Deep Residual Learning. Since deeper neural networks are more difficult to train, a residual learning framework reformulates each layer as learning a residual function with reference to its input, which makes the network easier to optimize while still gaining accuracy from the increased number of hidden layers. This approach provided an accuracy of 70% on the Yoga Dataset.
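
The core idea is the identity shortcut: a block only has to learn the correction F(x) added to its input. A minimal sketch in Keras is given below; the layer widths and input shape are illustrative assumptions, not the configuration used in the experiment above.

import tensorflow as tf
from tensorflow.keras import layers

def residual_block(x, filters=64):
    # y = F(x) + x: the block only has to learn the residual F(x).
    shortcut = x
    y = layers.Conv2D(filters, 3, padding="same")(x)
    y = layers.BatchNormalization()(y)
    y = layers.ReLU()(y)
    y = layers.Conv2D(filters, 3, padding="same")(y)
    y = layers.BatchNormalization()(y)
    y = layers.Add()([y, shortcut])   # identity skip connection
    return layers.ReLU()(y)

# Illustrative usage on a feature map with a matching channel count.
inputs = tf.keras.Input(shape=(56, 56, 64))
outputs = residual_block(inputs, filters=64)
block = tf.keras.Model(inputs, outputs)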
Table 1 – PoseNet and MoveNet

PoseNet: PoseNet is a deep learning structure that detects human postures by distinguishing joint areas in the human body. Even with this structure, however, accuracy kept fluctuating between 0.5 and 0.9, depending on the pose.
MoveNet: Using MoveNet, the model detects 17 landmarks on the image of the human body. After detecting the landmarks, the model estimates the pose using the distance of each landmark from the centre point of the image. An average of the individual landmark scores is taken and considered to be the score of that image.

PoseNet: PoseNet is an older-generation pose estimation model released in 2017. It is trained on the standard COCO dataset and provides single-pose and multi-pose estimation variants.
MoveNet: MoveNet is the latest-generation pose estimation model, released in 2021. It is an ultra-fast and accurate model that detects 17 key-points of the body. The model is offered on TF Hub with two variants, known as Lightning and Thunder. Lightning is intended for latency-critical applications, while Thunder is intended for applications that require high accuracy.

PoseNet: The single-pose variant can detect only one person in an image/video, while the multi-pose variant can detect multiple persons in an image/video. Both variants have their own set of parameters and methodology. Single-pose estimation is simpler and faster, but it requires a single person in the image/video; otherwise, key-points from multiple persons will likely be estimated as belonging to a single subject.
MoveNet: Both variants run faster than real time (30+ FPS) on most modern desktops, laptops, and phones, which proves crucial for live fitness, health, and wellness applications. MoveNet is a bottom-up estimation model, using heat-maps to accurately localize human key-points.

PoseNet: PoseNet again has two variants in terms of model architecture: MobileNetV1 and ResNet50. The MobileNetV1 model is smaller and faster but has lower accuracy; the ResNet50 variant is larger and slower but more accurate. Both variants support single-pose and multi-person pose estimation. The model returns the coordinates of the 17 key-points along with a confidence score.
MoveNet: MoveNet is a TensorFlow project that provides two variants, Thunder and Lightning. Since our project requires high accuracy, we used Thunder; for latency-critical applications, Lightning is considered the better option. Using this model, we achieve an accuracy between 0.87 and 0.90.
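
To illustrate how the Thunder variant described in Table 1 is typically used, the sketch below loads MoveNet from TF Hub and runs single-image inference. The hub handle and version, the 256x256 input size, and the file name yoga_pose.jpg follow the public TF Hub documentation and are illustrative assumptions rather than this project's exact setup.

import tensorflow as tf
import tensorflow_hub as hub

# Load MoveNet SinglePose Thunder (handle/version per the public TF Hub docs).
model = hub.load("https://tfhub.dev/google/movenet/singlepose/thunder/4")
movenet = model.signatures["serving_default"]

# Read and resize an image; Thunder expects a 256x256 int32 input batch.
image = tf.io.read_file("yoga_pose.jpg")          # hypothetical input file
image = tf.image.decode_jpeg(image, channels=3)
image = tf.expand_dims(image, axis=0)
image = tf.image.resize_with_pad(image, 256, 256)
image = tf.cast(image, dtype=tf.int32)

# Output shape [1, 1, 17, 3]: (y, x, confidence) per key-point, normalized.
keypoints = movenet(image)["output_0"].numpy()[0, 0]
for idx, (y, x, score) in enumerate(keypoints):
    print(f"key-point {idx}: y={y:.3f}, x={x:.3f}, score={score:.2f}")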
7. METHODOLOGY/TECHNIQUES

The MoveNet architecture consists of two components: a feature extractor and a set of prediction heads. The prediction scheme loosely follows CenterNet, with notable changes that improve both speed and accuracy. All models are trained using the TensorFlow Object Detection API.

The feature extractor in MoveNet is MobileNetV2 with an attached feature pyramid network (FPN), which allows for a high-resolution (output stride 4), semantically rich feature map output. Four prediction heads are attached to the feature extractor, responsible for densely predicting:

Person center heat-map: predicts the geometric center of person instances.

Landmark regression field: predicts the full set of key-points for a person, used for grouping key-points into instances.

Person key-point heat-map: predicts the location of all key-points, independent of person instances.
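
To make the role of these heads concrete, the sketch below decodes a single pose from the center heat-map, the regression field, and the per-key-point heat-maps using inverse-distance weighting. The array shapes, the weighting scheme, and the function name decode_single_pose are illustrative assumptions; the actual MoveNet post-processing differs in its details.

import numpy as np

def decode_single_pose(center_heatmap, kpt_regression, kpt_heatmaps):
    # center_heatmap: (H, W); kpt_regression: (H, W, 17, 2) as (dy, dx);
    # kpt_heatmaps: (H, W, 17). Returns 17 (y, x) key-point locations.
    H, W = center_heatmap.shape
    ys, xs = np.mgrid[0:H, 0:W]

    # 1. Person center = argmax of the center heat-map.
    cy, cx = np.unravel_index(np.argmax(center_heatmap), (H, W))

    # 2. Initial key-point estimates = regression offsets read at the center.
    initial = np.stack([cy + kpt_regression[cy, cx, :, 0],
                        cx + kpt_regression[cy, cx, :, 1]], axis=-1)

    # 3. Refine each key-point: weight its heat-map by inverse distance to
    #    the regressed estimate, then take the per-key-point argmax.
    keypoints = np.zeros((17, 2))
    for k in range(17):
        dist = np.sqrt((ys - initial[k, 0]) ** 2 + (xs - initial[k, 1]) ** 2)
        weighted = kpt_heatmaps[:, :, k] / (dist + 1.0)
        keypoints[k] = np.unravel_index(np.argmax(weighted), (H, W))
    return keypoints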




