The displacement of the body is computed by dividing the displacement of the neck by the body height H. The normalized joint positions are obtained by normalizing the coordinates with the body height H. The displacement of the joints is computed from the normalized coordinates; with 13 joints, 2 displacement components per joint, and 4 inter-frame steps, this gives 13 × 2 × 4 = 104 dimensions for N = 5 frames.
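To make the computation concrete, the following minimal sketch (the helper name and the NumPy formulation are illustrative assumptions, not the authors' code) normalizes the joints by the body height H and concatenates the inter-frame displacements over a window of N = 5 frames:

    import numpy as np

    def joint_displacement_features(frames, H):
        """Normalize each frame's 13 (x, y) joints by the body height H and
        concatenate the inter-frame displacements over the window.
        frames: list of N arrays of shape (13, 2); returns (N - 1) * 26 values."""
        normalized = [f / H for f in frames]               # normalized joint positions
        steps = [normalized[i + 1] - normalized[i]         # per-joint displacements
                 for i in range(len(normalized) - 1)]
        return np.concatenate([s.ravel() for s in steps])

    # Example: a window of N = 5 frames yields 4 * 13 * 2 = 104 dimensions.
    window = [np.random.rand(13, 2) for _ in range(5)]
    features = joint_displacement_features(window, H=1.0)
    assert features.shape == (104,)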
           3.2    Feature Vector Optimization
There are in total 10874 extracted feature samples with 314 dimensions. These features are derived mainly from the normalized joint positions X, the height, the displacement of the body, the displacements of the joints, and the velocities of the body and the joints. The Fisher vector and dimensionality reduction are applied to optimize the features and reduce them to 50 dimensions before classification. This is shown in Figure 3.
Figure 3 - Feature vector optimization (inputs: normalized joint positions X, V_body, V_joints, and the features of the previous 5 frames; 10874 samples of 314 dimensions are reduced by PCA to 50 dimensions and passed to the classifier)
3.2.1   Fisher Vector

The Fisher vector (FV) encodes the gradients of the log-likelihood of the features under a Gaussian mixture model (GMM) with respect to the GMM parameters [10]. The reasons for using the Fisher vector are that it is efficient to compute and that it provides excellent results even with efficient linear classifiers, while it can often be compressed with minimal loss of accuracy using product quantization. The FV encoding stores the mean and covariance deviation vectors for each component k of the GMM and each element of the local feature descriptors together.
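As an illustration, the following minimal sketch (an assumption based on the standard FV formulation in [10], using scikit-learn's GaussianMixture rather than the authors' implementation, and omitting the usual power and L2 normalization) computes the mean and covariance deviation vectors for a diagonal-covariance GMM:

    import numpy as np
    from sklearn.mixture import GaussianMixture

    def fisher_vector(X, gmm):
        """Fisher vector of local descriptors X (T x D) under a fitted
        diagonal-covariance GMM: gradients of the log-likelihood w.r.t.
        the means and standard deviations, one block per component k."""
        T = X.shape[0]
        gamma = gmm.predict_proba(X)              # posteriors gamma_t(k), T x K
        mu = gmm.means_                           # K x D
        sigma = np.sqrt(gmm.covariances_)         # K x D (diagonal model)
        w = gmm.weights_                          # K
        blocks = []
        for k in range(gmm.n_components):
            diff = (X - mu[k]) / sigma[k]         # standardized residuals, T x D
            g_mu = gamma[:, k] @ diff / (T * np.sqrt(w[k]))                    # mean deviations
            g_sigma = gamma[:, k] @ (diff ** 2 - 1) / (T * np.sqrt(2 * w[k]))  # covariance deviations
            blocks.extend([g_mu, g_sigma])
        return np.concatenate(blocks)             # dimension 2 * K * D

    # Example: an 8-component GMM over 26-dimensional skeleton descriptors.
    rng = np.random.default_rng(0)
    gmm = GaussianMixture(n_components=8, covariance_type='diag').fit(rng.random((500, 26)))
    fv = fisher_vector(rng.random((50, 26)), gmm)  # 2 * 8 * 26 = 416 dimensions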
3.2.2   PCA Dimensionality Reduction

The input to the PCA [11] algorithm is the final concatenated feature vector from the previous step, and the output is a vector of reduced dimensionality. During the evaluations, a range of values for the PCA parameters is examined and the best value is stored for testing. PCA is used to preserve the essential components that carry more of the variation in the data and to remove the non-essential components with less variation. The resultant feature vector has a dimension of 50. The number of samples is 10874, which is smaller than the original 11202 frames due to the sliding-window setting and missing data. These features are ready for classification.
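A minimal sketch of this reduction step (scikit-learn's PCA is an illustrative choice; the paper does not name the implementation):

    import numpy as np
    from sklearn.decomposition import PCA

    X = np.random.rand(10874, 314)     # placeholder for the 10874 x 314 feature matrix
    pca = PCA(n_components=50)         # keep the 50 directions of largest variance
    X_reduced = pca.fit_transform(X)   # shape (10874, 50), ready for classification
    print(pca.explained_variance_ratio_.sum())   # fraction of variance preserved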
3.3    Skeleton Generation and Processing

The human skeleton is generated by using a stacked hourglass network. This neural network detects the human skeleton (the joint positions) in each video frame; the skeleton is then used as raw data from which features are extracted and classified by machine-learning algorithms [12].

For each frame in the input video stream, the human subject is detected and the skeleton for that frame is generated. Hence, for a video stream, a continuous sequence of skeleton information is generated. The skeleton vectors are processed by various techniques, such as transformations and the concatenation of a window of frame vectors, as detailed in Algorithm 1 and sketched below.
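The windowing step can be illustrated as follows (a minimal sketch under the assumption of a 5-frame sliding window, as in Section 3.2; the helper name is hypothetical):

    from collections import deque
    import numpy as np

    WINDOW = 5                               # frames per feature window (Section 3.2)
    window = deque(maxlen=WINDOW)            # sliding window of skeleton vectors

    def push_frame(skeleton_vec):
        """Append one frame's flattened skeleton vector; once the window is
        full, return the concatenation of the last WINDOW frames, else None."""
        window.append(np.asarray(skeleton_vec, dtype=float))
        if len(window) == WINDOW:
            return np.concatenate(window)    # one sample spanning WINDOW frames
        return None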
The classification of the action performed by the human subject is carried out on the skeleton information generated by the HG model. The skeleton vectors are concatenated and features are extracted to generate the feature vector using Algorithm 2. Before the feature vector is generated, the raw skeleton data is preprocessed. After vector generation, the dimensions are reduced to make the data more focused and to improve the performance of the classifier.

Algorithm 1: Skeleton_Generation

    import simplejson

    # Load the images
    images_loader = ReadValidImagesAndActionTypesByTxt()
    num_total_images = images_loader.num_images
    for ith_img in range(num_total_images):
        img, str_action_label, img_info = images_loader.read_image()
        humans = detect_human(img)
        img_disp = img.copy()
        draw_skeleton(img_disp, humans)
        img_displayer.display(img_disp, wait_key_ms=1)
        # Save skeleton data for training
        skeletons, scale_h = humans_to_skels_list(humans)
        dict_id_to_skeleton = track(skeletons)
        skels_to_save = [img_info + skeleton
                         for skeleton in dict_id_to_skeleton.values()]

    # Read all the skeleton files of the training data
    all_skeletons = []  # initialize as an empty structure
    for i in range(num_skeletons):
        skeletons = read_skeletons_from_ith_txt(i)
        if not skeletons:
            continue  # if empty, discard this image
        skeleton = skeletons[idx_person]
        label = skeleton[idx_action_label]
        all_skeletons.append(skeleton)

    # Create a dump file of the skeleton data
    simplejson.dump(all_skeletons, f)  # f: output file opened for writing
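Note that Algorithm 1 decouples the per-frame skeleton detection from the later feature-extraction stages: once the skeletons have been dumped to file, the windowing, Fisher vector, and PCA steps of Section 3.2 can be repeated without re-running the hourglass network over the video frames.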