2D per-key point offset field: predicts local offsets from each output feature map pixel to the precise sub-pixel location of each key point.
Figure 2 – MoveNet Architecture

Although these predictions are computed in parallel, one can gain insight into the model's operation by considering the following sequence of operations:

Step 1: The person center heat-map is used to identify the centers of all individuals in the frame, defined as the arithmetic mean of all key points belonging to a person. The location with the highest score (weighted by the inverse distance from the frame center) is selected.

Step 2: An initial set of key points for the person is produced by slicing the key point regression output from the pixel corresponding to the object center.

Step 3: Each pixel in the key point heat-map is multiplied by a weight that is inversely proportional to the distance from the corresponding regressed key point. This ensures that we do not accept key points from background people, since they will typically not be in the proximity of the regressed key points and hence will have low resulting scores.

Step 4: The final set of key point predictions is selected by retrieving the coordinates of the maximum heat-map values in each key point channel. The local 2D offset predictions are then added to these coordinates to give refined estimates. See Figure 3, which illustrates these four steps. Since this is a center-out prediction – which must operate over different scales – the quality of the regressed key points alone would not be very accurate.
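
These four steps can also be written down compactly. The NumPy sketch below is illustrative only: the tensor names, shapes and the exact form of the inverse-distance weighting are assumptions made for clarity, not MoveNet's actual output signature.

import numpy as np

def decode_single_pose(center_heatmap, kp_regression, kp_heatmaps, kp_offsets):
    # Assumed (hypothetical) shapes:
    #   center_heatmap: (H, W)        person-center scores
    #   kp_regression:  (H, W, K, 2)  per-pixel (dy, dx) displacement to each key point
    #   kp_heatmaps:    (H, W, K)     per-key-point scores
    #   kp_offsets:     (H, W, K, 2)  local sub-pixel (dy, dx) offsets
    # Returns a (K, 2) array of (y, x) key point estimates in feature-map units.
    H, W, K = kp_heatmaps.shape
    ys, xs = np.mgrid[0:H, 0:W]

    # Step 1: pick the person center - the highest center score, weighted by the
    # inverse distance from the frame center (the +1 avoids division by zero).
    dist_to_frame_center = np.sqrt((ys - H / 2) ** 2 + (xs - W / 2) ** 2)
    cy, cx = np.unravel_index(
        np.argmax(center_heatmap / (dist_to_frame_center + 1.0)), (H, W))

    # Step 2: slice the regression output at the center pixel to obtain an
    # initial, coarse estimate of each key point.
    init_kp = np.array([cy, cx]) + kp_regression[cy, cx]                 # (K, 2)

    # Step 3: re-weight each key point heat-map by the inverse distance to the
    # corresponding regressed key point, suppressing background people.
    grid = np.stack([ys, xs], axis=-1)[:, :, None, :]                    # (H, W, 1, 2)
    dist_to_init = np.linalg.norm(grid - init_kp[None, None], axis=-1)   # (H, W, K)
    scores = kp_heatmaps / (dist_to_init + 1.0)

    # Step 4: take the arg-max of each re-weighted heat-map and refine it with
    # the local 2D offset prediction.
    ky, kx = np.unravel_index(scores.reshape(-1, K).argmax(axis=0), (H, W))
    return np.stack([ky, kx], axis=-1) + kp_offsets[ky, kx, np.arange(K)]

As noted above, the model produces the four predictions in parallel; the sequential decoding shown here is only a way to reason about how they are combined.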




Figure 3 – Sequence of MoveNet Model Operations

8. DATASET PREPARATION

We started by collecting various images of different Yoga poses from the Internet. After collecting a sufficient number of images, we categorized the data into 10 different Yoga poses. For the sake of ease, we decided to split our data manually into training and testing datasets. We classified the collected images into sub-folders based on the poses. Then, keeping in mind a ratio of 3:7 between the testing and training datasets, we split the entire data into two parts. Each part has sub-folders corresponding to each pose.
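
Although the split in this work was done manually, the procedure maps directly to a short script. In the sketch below the directory names are hypothetical; only the per-pose sub-folders and the 7:3 training-to-testing ratio come from the description above.

import os
import random
import shutil

SOURCE_DIR = "yoga_poses"   # hypothetical layout: one sub-folder per pose
OUTPUT_DIR = "dataset"      # receives train/ and test/ sub-folders
TRAIN_FRACTION = 0.7        # 3:7 split between testing and training data

random.seed(42)
for pose in os.listdir(SOURCE_DIR):
    images = os.listdir(os.path.join(SOURCE_DIR, pose))
    random.shuffle(images)
    cut = int(len(images) * TRAIN_FRACTION)
    for split, names in (("train", images[:cut]), ("test", images[cut:])):
        dest = os.path.join(OUTPUT_DIR, split, pose)
        os.makedirs(dest, exist_ok=True)
        for name in names:
            shutil.copy(os.path.join(SOURCE_DIR, pose, name), dest)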

9. DATA AUGMENTATION

Data augmentation is a set of techniques that artificially increases the amount of data by generating new data points from existing data. This includes making small changes to the data or using deep learning models to generate new data points. Data augmentation helps to improve the performance and outcomes of machine learning models by adding new and different examples to the training dataset. If the dataset used to train a machine learning model is rich and sufficient, the model performs better and more accurately.
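
As a minimal illustration of such small changes, the sketch below applies a horizontal flip, a brightness jitter and a mild crop using TensorFlow's tf.image utilities; these particular transforms and the 256 x 256 output size are our own choices for illustration, not necessarily those used in this work.

import tensorflow as tf

def augment(image):
    # `image` is assumed to be an H x W x 3 tensor; scale values to [0, 1] first.
    image = tf.image.convert_image_dtype(image, tf.float32)
    image = tf.image.random_flip_left_right(image)               # mirror the pose
    image = tf.image.random_brightness(image, max_delta=0.1)     # small lighting change
    image = tf.image.central_crop(image, central_fraction=0.9)   # mild crop
    return tf.image.resize(image, (256, 256))                    # fixed output size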
Figure 4 – Data Augmentation

In this project, we mainly focused on cropping the images such that the landmarks necessary for pose detection are easy to visualize and analyze. We used a separate class, known as the Pre-processor, for data augmentation and imported the MoveNet pre-trained model to predict the landmarks. The landmarks of the images are written to a




