CSV file for each individual image. Each CSV file is then combined into a single file for each folder (pose) given. A function named ‘class_names’ is used to return the names of the classes (poses). A final function named ‘all_landmarks_as_dataframe’ is used to combine all the CSV files corresponding to the poses into a single data frame. The training data set CSV is then split into Validation and Training datasets in a ratio of 3:17 to train the model.
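The following is a minimal sketch of this combine-and-split step, assuming pandas and scikit-learn. It mirrors the function names mentioned above, but the directory layout, the label columns (class_no, class_name) and the split parameters are illustrative assumptions rather than the exact code shown in Figure 5.

import os
import glob
import pandas as pd
from sklearn.model_selection import train_test_split

def class_names(dataset_dir):
    # Each sub-folder of the dataset directory corresponds to one pose (class).
    return sorted(d for d in os.listdir(dataset_dir)
                  if os.path.isdir(os.path.join(dataset_dir, d)))

def all_landmarks_as_dataframe(dataset_dir):
    # Combine the per-pose CSV files into a single data frame with label columns.
    frames = []
    for label, pose in enumerate(class_names(dataset_dir)):
        for csv_path in glob.glob(os.path.join(dataset_dir, pose, "*.csv")):
            df = pd.read_csv(csv_path)
            df["class_no"] = label       # numeric label (assumed column name)
            df["class_name"] = pose      # pose name (assumed column name)
            frames.append(df)
    return pd.concat(frames, ignore_index=True)

landmarks = all_landmarks_as_dataframe("train_data")   # assumed directory name
X = landmarks.drop(columns=["class_no", "class_name"]).values
y = landmarks["class_no"].values

# Split the training CSV into Training and Validation sets in a 17:3 ratio,
# i.e. 15% of the samples are held out for validation.
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.15, stratify=y, random_state=42)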














Figure 5 – Pre-process code snippet
                       10. MODEL DESCRIPTION

After data augmentation in CSV format, we need to convert the data into tensors for the model. We first computed the various centre points relevant to the pose, using a function ‘get_center_point’. Our next step was to obtain a normalized pose size. This is achieved by calculating the distances of the landmarks from the various centre points and then taking an average; the average is then normalized to determine the pose size, and the landmarks are normalized according to that pose size. Since each landmark is denoted by two coordinates and a score, the initial input size for our model is 51 (17 * 3). We embed the input by first expanding it, normalizing the landmarks based on their coordinates, and then flattening the result to a size of 34. The final input size for the model is therefore 34 (17 * 2).
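A sketch of this landmark embedding is given below, assuming 17 MoveNet-style keypoints supplied as (x, y, score) triples. The choice of centre points (hip and shoulder centres) and the use of the hip centre as the pose centre are assumptions made for illustration; only the overall flow (centre points, averaged distances as the pose size, normalization, flattening to 34 values) follows the description above.

import numpy as np

# Keypoint indices (MoveNet ordering is assumed here).
LEFT_SHOULDER, RIGHT_SHOULDER = 5, 6
LEFT_HIP, RIGHT_HIP = 11, 12

def get_center_point(landmarks, left_idx, right_idx):
    # Centre point = midpoint of a left/right pair of keypoints.
    return (landmarks[left_idx] + landmarks[right_idx]) * 0.5

def get_pose_size(landmarks):
    # Pose size: average distance of the landmarks from the centre points
    # (the hip centre and the shoulder centre are used as centre points here).
    hips_center = get_center_point(landmarks, LEFT_HIP, RIGHT_HIP)
    shoulders_center = get_center_point(landmarks, LEFT_SHOULDER, RIGHT_SHOULDER)
    dists = np.concatenate([
        np.linalg.norm(landmarks - hips_center, axis=1),
        np.linalg.norm(landmarks - shoulders_center, axis=1),
    ])
    return dists.mean()

def landmarks_to_embedding(raw):
    # raw: flat vector of 51 values = 17 keypoints * (x, y, score).
    keypoints = raw.reshape(17, 3)[:, :2]               # drop the score column
    center = get_center_point(keypoints, LEFT_HIP, RIGHT_HIP)
    keypoints = keypoints - center                      # move the pose centre to the origin
    keypoints = keypoints / get_pose_size(keypoints)    # scale by the pose size
    return keypoints.flatten()                          # final embedding of size 34

embedding = landmarks_to_embedding(np.random.rand(51).astype(np.float32))
assert embedding.shape == (34,)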
We introduced 3 hidden layers into our model. The first hidden layer consists of 128 neurons; in the second and third layers, we reduce the number of neurons to half the size of the previous layer, so they have 64 and 32 neurons, respectively. Each hidden layer uses ReLU as its activation function. In addition, we added Batch Normalization after each hidden layer, with the intention of normalizing the outputs of each hidden layer and tackling the problem of Vanishing Gradients. This approach ensures that we can avoid over-fitting on various Yoga Pose Datasets.
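A minimal Keras sketch of this classifier is shown below, with a 34-dimensional embedding as input and a softmax output over the pose classes. The number of classes, the optimizer, the loss and the training settings (epochs, batch size) are assumptions, and X_train/X_val refer to the arrays from the preprocessing sketch above.

from tensorflow import keras

NUM_CLASSES = 5  # assumed number of yoga poses; in practice len(class_names(...))

model = keras.Sequential([
    keras.Input(shape=(34,)),                        # 17 keypoints * 2 coordinates
    keras.layers.Dense(128, activation="relu"),      # first hidden layer
    keras.layers.BatchNormalization(),               # normalize the hidden outputs
    keras.layers.Dense(64, activation="relu"),       # half of the previous layer
    keras.layers.BatchNormalization(),
    keras.layers.Dense(32, activation="relu"),       # half again
    keras.layers.BatchNormalization(),
    keras.layers.Dense(NUM_CLASSES, activation="softmax"),
])

model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

history = model.fit(X_train, y_train,
                    validation_data=(X_val, y_val),
                    epochs=100, batch_size=16)       # assumed training settings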
Figure 6 – Model Description

11. CONFUSION MATRIX AND RESULT VISUALIZATION

After training and compilation, this model gives a validation accuracy of 0.87. The same model, when evaluated on the Testing data set, gives an accuracy of 0.86. We also calculated and plotted a Confusion Matrix to visualize the correct and incorrect predictions on the Testing Dataset.

Figure 7 – Accuracy

Figure 8 – Confusion Matrix

As an extra measure to visualize the correct and incorrect predictions, we added separate code to investigate them. The separation is done by comparing the y_true and y_predicted labels: images whose labels do not match are shown as incorrect predictions, and the remaining images are shown as correct predictions.
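A short sketch of this separation, together with the confusion matrix computation, is given below, assuming scikit-learn and test arrays named X_test and y_test (these names are illustrative).

import numpy as np
from sklearn.metrics import confusion_matrix

# Predicted class for each sample in the Testing Dataset.
y_predicted = np.argmax(model.predict(X_test), axis=1)
y_true = y_test

# Confusion matrix of true vs. predicted labels.
cm = confusion_matrix(y_true, y_predicted)
print(cm)

# Separate correct and incorrect predictions by comparing the labels.
correct_idx = np.where(y_true == y_predicted)[0]
incorrect_idx = np.where(y_true != y_predicted)[0]
print(f"{len(correct_idx)} correct, {len(incorrect_idx)} incorrect predictions")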








