Page 210 - Kaleidoscope Academic Conference Proceedings 2020

2020 ITU Kaleidoscope Academic Conference




The methods based on dense trajectories [6], which employ the Gaussian mixture model (GMM) for codebook generation and Fisher vector encoding for action recognition, have shown better performance. Although a motion trajectory that describes delicate motion represents both the dynamics and the appearance of an action in a scene, such a low-level descriptor alone is not enough for action recognition, because it lacks action semantics at the global level.

The proposed system utilizes deep machine-learning techniques to improve accuracy in action recognition (e.g., wave, punch, kick, jump) over existing approaches. The major contribution of this work is twofold: a skeleton generator that generates the skeleton joint points for the human subject, and an action detector that considers the sequence of feature vectors. Our models are designed to leverage deep-learning techniques while meeting the requirements listed in Recommendation ITU-T H.626.5, "Architecture for intelligent visual surveillance systems" [7], and ITU-T F.743, "Requirements and service description for video surveillance" [8]. In our system, target recognition and association are achieved with a combination of DNN and HGN to recognize the action performed by the human.

The remainder of the paper is organized as follows. Section 2 provides the architectural details of the proposed system, and Section 3 describes the theory behind system development. The implementation details for performance evaluation and the experimental results are presented in Section 4, while Section 5 concludes the paper.

2.  PROPOSED SYSTEM

The outline of the proposed action recognition system is shown in Figure 1. In this system model, successive frames of the video stream are used to generate the skeleton of a human subject. The estimated skeleton is transformed using the hip and theta transformations to remove the occlusion effect on the frames caused by differences in viewpoint and camera angle. From the skeleton sequence, the features of the joints are extracted, encoded using Fisher vectors and reduced using PCA. The resulting feature code sequence is used to train the classifier model, which is then used to identify the action classes.

Skeleton tracking is used to obtain the estimated skeleton of the human subject. It is performed by the pose estimation method, which provides a set of points representing the joint coordinates of the human skeleton. The method consists of a two-dimensional (2D) pose estimation module and a depth regression module, which predict the 2D joint locations and the depth values respectively, and it is implemented using an hourglass network architecture. The network output is a set of low-resolution heat maps. Each map represents a 2D probability distribution of one joint.
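As an illustration of how the heat maps produced by the pose estimation network can be turned into joint coordinates, the following minimal sketch takes the highest-scoring cell of each map as the 2D location of that joint. The function names, the nested-list representation of a heat map and the stride value (the assumed ratio between input-image and heat-map resolution) are hypothetical and are not taken from the paper; the depth regression module is not covered.

```python
from typing import List, Tuple

# One heat map: a rows x cols grid of scores for a single joint (assumed layout).
HeatMap = List[List[float]]


def decode_joint(heat_map: HeatMap) -> Tuple[int, int, float]:
    """Return (x, y, score) of the highest-scoring cell in one heat map."""
    best_x, best_y, best_score = 0, 0, float("-inf")
    for y, row in enumerate(heat_map):
        for x, score in enumerate(row):
            if score > best_score:
                best_x, best_y, best_score = x, y, score
    return best_x, best_y, best_score


def decode_skeleton(heat_maps: List[HeatMap], stride: int = 4) -> List[Tuple[int, int]]:
    """Map each joint's peak back to input-image pixels.

    stride is an assumed down-sampling factor between the input frame
    and the low-resolution heat maps; it is not specified in the paper.
    """
    joints = []
    for heat_map in heat_maps:
        x, y, _ = decode_joint(heat_map)
        joints.append((x * stride, y * stride))
    return joints
```

Taking the peak cell is the simplest decoding rule; a sub-pixel refinement around the peak could be used instead when higher precision is needed.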

[Figure 1: block diagram. Video stream -> Pose Estimation (Joint Prediction, Heat Map) -> Estimated skeleton -> Preprocessing (Hip Transformation, Theta Transformation) -> Feature Extraction (Joint Vector, Body height, Normalized JV, Body Disp, Joint Disp) -> Fisher Vector -> Dimensionality Reduction-PCA -> Classification -> Action Class -> Add new person label / Update the action of the person]

                                    Figure 1 – The architecture of the proposed model
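The preprocessing and normalization blocks of Figure 1 can be illustrated with a short sketch. This section does not define the hip and theta transformations, so the code below is one plausible reading rather than the authors' method: the hip transformation is assumed to translate the skeleton so that the hip joint is the origin, the theta transformation is assumed to rotate the skeleton about the vertical (y) axis so that the left-to-right hip line points along +x, and body height is assumed to be the vertical extent of the skeleton. All function names and joint indices are hypothetical.

```python
import math
from typing import List, Tuple

Joint = Tuple[float, float, float]  # (x, y, z), with y as the vertical axis (assumed)


def hip_transform(joints: List[Joint], hip: int) -> List[Joint]:
    """Translate all joints so that the hip joint becomes the origin."""
    hx, hy, hz = joints[hip]
    return [(x - hx, y - hy, z - hz) for x, y, z in joints]


def theta_transform(joints: List[Joint], left_hip: int, right_hip: int) -> List[Joint]:
    """Rotate about the y axis so the left->right hip vector points along +x."""
    lx, _, lz = joints[left_hip]
    rx, _, rz = joints[right_hip]
    theta = math.atan2(rz - lz, rx - lx)  # current yaw of the hip line
    c, s = math.cos(theta), math.sin(theta)
    # Rotation by -theta in the x-z plane: x' = c*x + s*z, z' = -s*x + c*z
    return [(c * x + s * z, y, -s * x + c * z) for x, y, z in joints]


def normalize_by_height(joints: List[Joint]) -> List[Joint]:
    """Scale joints by body height, taken here as the vertical extent (assumption)."""
    ys = [y for _, y, _ in joints]
    height = max(ys) - min(ys)
    if height == 0:
        raise ValueError("skeleton has zero vertical extent")
    return [(x / height, y / height, z / height) for x, y, z in joints]


def preprocess(joints: List[Joint], hip: int, left_hip: int, right_hip: int) -> List[Joint]:
    """Hip-centre, yaw-align and height-normalize one skeleton."""
    centred = hip_transform(joints, hip)
    aligned = theta_transform(centred, left_hip, right_hip)
    return normalize_by_height(aligned)
```

Under these assumptions, two recordings of the same pose taken from different horizontal camera angles map to approximately the same normalized joint vector, which is the viewpoint invariance the preprocessing stage aims for.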




