2D per-key point offset field: predicts local offsets from each output feature map pixel to the precise sub-pixel location of each key point.
Figure 2 – MoveNet Architecture

Although these predictions are computed in parallel, one can gain insight into the model's operation by considering the following sequence of operations:

Step 1: The person center heat-map is used to identify the centers of all individuals in the frame, defined as the arithmetic mean of all key points belonging to a person. The location with the highest score (weighted by the inverse distance from the frame center) is selected.

Step 2: An initial set of key points for the person is produced by slicing the key point regression output from the pixel corresponding to the object center.

Step 3: Each pixel in the key point heat-map is multiplied by a weight that is inversely proportional to the distance from the corresponding regressed key point. This ensures that we do not accept key points from background people, since they will typically not be in the proximity of the regressed key points and hence will have low resulting scores.

Step 4: The final set of key point predictions is selected by retrieving the coordinates of the maximum heat-map values in each key point channel. The local 2D offset predictions are then added to these coordinates to give refined estimates. See Figure 3, which illustrates these four steps. Since this is a center-out prediction – which must operate over different scales – the quality of the regressed key points alone would not be very accurate.
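
These four steps can also be written down compactly. The NumPy sketch below is illustrative only: the tensor names, shapes and the exact form of the inverse-distance weighting are assumptions made for clarity, not MoveNet's actual output signature.

import numpy as np

def decode_single_pose(center_heatmap, kp_regression, kp_heatmaps, kp_offsets):
    # Assumed (hypothetical) shapes:
    #   center_heatmap: (H, W)        person-center scores
    #   kp_regression:  (H, W, K, 2)  per-pixel (dy, dx) displacement to each key point
    #   kp_heatmaps:    (H, W, K)     per-key-point scores
    #   kp_offsets:     (H, W, K, 2)  local sub-pixel (dy, dx) offsets
    # Returns a (K, 2) array of (y, x) key point estimates in feature-map units.
    H, W, K = kp_heatmaps.shape
    ys, xs = np.mgrid[0:H, 0:W]

    # Step 1: pick the person center - the highest center score, weighted by the
    # inverse distance from the frame center (the +1 avoids division by zero).
    dist_to_frame_center = np.sqrt((ys - H / 2) ** 2 + (xs - W / 2) ** 2)
    cy, cx = np.unravel_index(
        np.argmax(center_heatmap / (dist_to_frame_center + 1.0)), (H, W))

    # Step 2: slice the regression output at the center pixel to obtain an
    # initial, coarse estimate of each key point.
    init_kp = np.array([cy, cx]) + kp_regression[cy, cx]                 # (K, 2)

    # Step 3: re-weight each key point heat-map by the inverse distance to the
    # corresponding regressed key point, suppressing background people.
    grid = np.stack([ys, xs], axis=-1)[:, :, None, :]                    # (H, W, 1, 2)
    dist_to_init = np.linalg.norm(grid - init_kp[None, None], axis=-1)   # (H, W, K)
    scores = kp_heatmaps / (dist_to_init + 1.0)

    # Step 4: take the arg-max of each re-weighted heat-map and refine it with
    # the local 2D offset prediction.
    ky, kx = np.unravel_index(scores.reshape(-1, K).argmax(axis=0), (H, W))
    return np.stack([ky, kx], axis=-1) + kp_offsets[ky, kx, np.arange(K)]

As noted above, the model produces the four predictions in parallel; the sequential decoding shown here is only a way to reason about how they are combined.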




Figure 3 – Sequence of MoveNet Model Operations

8. DATASET PREPARATION

We started by collecting various images of different Yoga poses from the Internet. After collecting a sufficient number of images, we categorized the data into 10 different Yoga poses. For the sake of ease, we decided to split our data manually into training and testing datasets. We classified the collected images into sub-folders based on the poses. Then, keeping in mind a ratio of 3:7 between the testing and training datasets, we split the entire data into two parts. Each part has sub-folders corresponding to each pose.
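
Although the split in this work was done manually, the procedure maps directly to a short script. In the sketch below the directory names are hypothetical; only the per-pose sub-folders and the 7:3 training-to-testing ratio come from the description above.

import os
import random
import shutil

SOURCE_DIR = "yoga_poses"   # hypothetical layout: one sub-folder per pose
OUTPUT_DIR = "dataset"      # receives train/ and test/ sub-folders
TRAIN_FRACTION = 0.7        # 3:7 split between testing and training data

random.seed(42)
for pose in os.listdir(SOURCE_DIR):
    images = os.listdir(os.path.join(SOURCE_DIR, pose))
    random.shuffle(images)
    cut = int(len(images) * TRAIN_FRACTION)
    for split, names in (("train", images[:cut]), ("test", images[cut:])):
        dest = os.path.join(OUTPUT_DIR, split, pose)
        os.makedirs(dest, exist_ok=True)
        for name in names:
            shutil.copy(os.path.join(SOURCE_DIR, pose, name), dest)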

9. DATA AUGMENTATION

Data augmentation is a set of techniques that artificially increases the amount of data by generating new data points from existing data. This includes making small changes to the data or using deep learning models to generate new data points. Data augmentation helps to improve the performance and outcomes of machine learning models by adding new and different examples to the training dataset. If the dataset used to train a machine learning model is rich and sufficient, the model performs better and more accurately.
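
As a minimal illustration of such small changes, the sketch below applies a horizontal flip, a brightness jitter and a mild crop using TensorFlow's tf.image utilities; these particular transforms and the 256 x 256 output size are our own choices for illustration, not necessarily those used in this work.

import tensorflow as tf

def augment(image):
    # `image` is assumed to be an H x W x 3 tensor; scale values to [0, 1] first.
    image = tf.image.convert_image_dtype(image, tf.float32)
    image = tf.image.random_flip_left_right(image)               # mirror the pose
    image = tf.image.random_brightness(image, max_delta=0.1)     # small lighting change
    image = tf.image.central_crop(image, central_fraction=0.9)   # mild crop
    return tf.image.resize(image, (256, 256))                    # fixed output size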
Figure 4 – Data Augmentation

In this project, we mainly focused on cropping the images such that the landmarks necessary for pose detection are easy to visualize and analyze. We used a separate class, known as the Pre-processor, for data augmentation and imported the MoveNet pre-trained model to predict the landmarks. The landmarks of the images are written to a




