The action classification accuracy denotes how well the classifier maps the action performed by a subject to the correct action label, from the subject's entrance until its complete exit from the video. Accuracy is the fraction of predictions correctly predicted by the model:

\text{Accuracy} = \frac{\text{Number of correct predictions}}{\text{Total number of predictions}}   (6)

It is calculated from the true and false predictions of the action classes by the trained classifier:

\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}   (7)

where TP = True Positives, TN = True Negatives, FP = False Positives, and FN = False Negatives.
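As a concrete check of equations (6) and (7), the following minimal Python sketch (not from the paper; the label values are hypothetical) computes the overall accuracy and the one-vs-rest TP/TN/FP/FN tallies for a single action class:

```python
# Minimal sketch (illustrative, not the paper's code): accuracy per
# equations (6) and (7) from predicted and true action labels.

def accuracy(y_true, y_pred):
    """Equation (6): fraction of predictions that are correct."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)

def confusion_counts(y_true, y_pred, label):
    """One-vs-rest TP/TN/FP/FN tallies for a single action label."""
    tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
    tn = sum(t != label and p != label for t, p in zip(y_true, y_pred))
    fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
    fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
    return tp, tn, fp, fn

# Hypothetical labels, only to exercise the formulas.
y_true = ["walk", "sit", "walk", "wave"]
y_pred = ["walk", "walk", "walk", "wave"]
tp, tn, fp, fn = confusion_counts(y_true, y_pred, "walk")
print(accuracy(y_true, y_pred))                  # equation (6)
print((tp + tn) / (tp + tn + fp + fn))           # equation (7) for "walk"
```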
The performance of the system using the SVM and DNN classification models, in terms of accuracy, is shown in Figure 7 and Table 3.
[Figure 7: grouped bar chart; y-axis: accuracy (%), 89 to 96; x-axis: Training Accuracy, Test Accuracy; series: Skeleton Feature + SVM, HGN + DNN]

Figure 7 - Training and test accuracy of SVM and DNN
The SVM classifier has a training accuracy of 92.4% and a test accuracy of 91.2%; the DNN model has a training accuracy of 95.6% and a test accuracy of 93.8%. The training accuracy is obtained by running the model on the training data set, while the test accuracy is obtained by predicting on the held-out test data set.
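As a hedged sketch of how these training and test accuracies could be measured for the SVM branch: the use of scikit-learn's SVC and the randomly generated feature matrices below are illustrative assumptions, not the paper's actual pipeline or data.

```python
# Illustrative sketch: train/test accuracy for an SVM action classifier.
# X_* and y_* are placeholder stand-ins for extracted skeleton features.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X_train = rng.normal(size=(200, 60))      # placeholder feature vectors
y_train = rng.integers(0, 10, size=200)   # placeholder action labels
X_test = rng.normal(size=(50, 60))
y_test = rng.integers(0, 10, size=50)

clf = SVC(kernel="rbf").fit(X_train, y_train)
train_acc = clf.score(X_train, y_train)   # run the model on the training set
test_acc = clf.score(X_test, y_test)      # predict on the held-out test set
print(f"training accuracy: {train_acc:.3f}, test accuracy: {test_acc:.3f}")
```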

Table 3 - Comparison of methods based on accuracy

  S. No.   Method                    Accuracy %
  1        Skeleton Feature + SVM    92.4
  2        HGN + DNN                 95.6
The classification model is trained on two types of processed skeleton data. In the first type, the data from each frame of the video is processed separately, and the per-frame skeleton data is used to extract and generate the feature vector on which the classifiers are trained. In the second type, five frames are taken as a sliding window, and the skeleton data obtained from these frames is concatenated and used to extract the features and generate the vector; a sketch of this windowing is given below. When these concatenated frames are used, the accuracy of both the SVM and the DNN model improves, as shown in Figure 8.
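The following is a minimal sketch of the five-frame sliding window described above, assuming each frame's skeleton is already flattened into a single coordinate vector (the joint count and array layout are illustrative assumptions):

```python
# Illustrative sketch: concatenate skeleton data from five consecutive
# frames into one feature vector per window position.
import numpy as np

def sliding_window_features(frames, window=5):
    """Concatenate `window` consecutive per-frame skeleton vectors."""
    vectors = []
    for start in range(len(frames) - window + 1):
        vectors.append(np.concatenate(frames[start:start + window]))
    return np.array(vectors)

# e.g. 30 frames, 25 joints x 3 coordinates each -> 75 values per frame
frames = [np.random.rand(75) for _ in range(30)]
features = sliding_window_features(frames)
print(features.shape)  # (26, 375): each row spans five frames
```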





[Figure 8: bar chart; y-axis: accuracy (%), 75 to 100; x-axis: Feature Vector, 5-frame concatenated Feature Vector; series: SVM, DNN]

Figure 8 - Accuracy of the different types of feature vector used

5. CONCLUSION

The proposed system combines two models, HGN and DNN, to capture the action performed by a human subject and to recognize that action. The performance of the system was evaluated on two different data sets, MSR Action and NTU RGB. The HGN-DNN model's precision and accuracy indicate that combining multiple feature-based models helps achieve higher efficiency. The proposed system achieved an accuracy of 95.6% in action recognition. The use of processed and concatenated skeleton data has helped represent the time-series data effectively, and hence achieve the system's higher accuracy. The proposed system for action recognition meets the requirements of the service description for video surveillance specified in Recommendation ITU-T F.743, and it can be standardized as an extension of the intelligent visual surveillance system architecture specified in Recommendation ITU-T H.626.5.

REFERENCES

[1] D. Li, T. Yao, and L. Y. Duan, "Unified Spatio-Temporal Attention Networks for Action Recognition in Videos," IEEE Trans. on Multimedia, Vol. 21, No. 2, Feb. 2019.

[2] O. Oshin, A. Gilbert, J. Illingworth, and R. Bowden, "Action Recognition using Randomised Ferns," IEEE International Conference on Computer Vision (ICCV) Workshops, Nov. 2009.

[3] W. Xu, Z. Miao, and X. Ping, "Hierarchical Spatio-Temporal Model for Human Activity Recognition," IEEE Trans. on Multimedia, Vol. 10, No. 7, Feb. 2017.