Page 215 - Kaleidoscope Academic Conference Proceedings 2020


Industry-driven digital transformation

The action classification accuracy denotes how well the classifier maps the action performed by a subject to the correct action label, from the subject's entrance until the subject's complete exit from the video. Accuracy is the fraction of predictions that the model gets right.

\[ \text{Accuracy} = \frac{\text{Number of correct predictions}}{\text{Total number of predictions}} \tag{6} \]

It is calculated from the true and false predictions of the action classes made by the trained classifier.

\[ \text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \tag{7} \]

where TP = True Positives, TN = True Negatives, FP = False Positives, and FN = False Negatives.

The performance of the system with the SVM and DNN classification models, in terms of accuracy, is shown in Figure 7 and Table 3.

Figure 7 - Training and test accuracy of SVM and DNN (series: Training Accuracy, Test Accuracy; models: Skeleton Feature + SVM, HGN + DNN; accuracy in %)

The SVM classifier has a training accuracy of 92.4% and a test accuracy of 91.2%. The DNN model has a training accuracy of 95.6% and a test accuracy of 93.8%. The training accuracy is obtained by running the model against the training data set, while the test accuracy is obtained from predictions on the test data set.

Table 3 - Comparison of methods based on accuracy

S. No.   Method                    Accuracy (%)
1        Skeleton Feature + SVM    92.4
2        HGN + DNN                 95.6

The classification model is trained on two types of processed skeleton data. In the first type, the data from each frame of the video is processed separately, and the skeleton data is used to extract and generate the feature vector on which the classifiers are trained. In the second type, five frames are taken as a sliding window, and the skeleton data obtained from these frames is concatenated and used to extract the features and generate the feature vector. Using these concatenated frames improves the accuracy of both the SVM and DNN models, as shown in Figure 8.

Figure 8 - Accuracy of different types of feature vector used (feature vector types: Feature Vector, 5 frame concatenated Feature Vector; classifiers: SVM, DNN; accuracy in %)

5. CONCLUSION

The proposed system combines two models, HGN and DNN, to capture the action performed by the human subject and to recognize it. The performance of the system was evaluated on two different data sets, MSR Action and NTU RGB. The HGN-DNN model’s precision and accuracy indicate that models based on multiple features help in achieving higher efficiency. The proposed system achieved an accuracy of 95.6% on the training set and 93.8% on the test set in action recognition. The use of a processed and concatenated skeleton data model has helped in representing time-series data effectively and hence in achieving the higher accuracy of the system. The proposed system for action recognition meets the requirements of the service description for video surveillance specified in Recommendation ITU-T F.743. It can be standardized as an extension of the intelligent visual surveillance system architecture specified in Recommendation ITU-T H.626.5.
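
As a minimal sketch of two steps described in this paper, the code below concatenates per-frame feature vectors over a five-frame sliding window and computes accuracy as in equations (6) and (7). The function names, the stride of one frame between windows, the flat-list representation of a frame's features, and all toy values are assumptions made for illustration; none of them is taken from the authors' implementation.

```python
from typing import List, Sequence


def concat_sliding_windows(frames: Sequence[Sequence[float]],
                           window: int = 5) -> List[List[float]]:
    """Concatenate per-frame feature vectors over a sliding window.

    Assumption: consecutive windows are shifted by one frame (stride 1);
    the paper only states that five frames form a sliding window.
    """
    if window < 1:
        raise ValueError("window must be at least 1")
    vectors = []
    for start in range(len(frames) - window + 1):
        vec: List[float] = []
        for frame in frames[start:start + window]:
            vec.extend(frame)  # append this frame's features to the window vector
        vectors.append(vec)
    return vectors


def accuracy(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Equation (6): correct predictions divided by total predictions."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("label sequences must be non-empty and equal in length")
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def accuracy_from_counts(tp: int, tn: int, fp: int, fn: int) -> float:
    """Equation (7): (TP + TN) / (TP + TN + FP + FN)."""
    return (tp + tn) / (tp + tn + fp + fn)


# Hypothetical toy data: 7 frames with 2 features each.
frames = [[float(i), float(i) + 0.5] for i in range(7)]
windows = concat_sliding_windows(frames, window=5)
print(len(windows), len(windows[0]))  # 3 windows of 10 values each

# Hypothetical labels: 3 of 4 predictions are correct.
print(accuracy(["walk", "sit", "wave", "sit"],
               ["walk", "sit", "sit", "sit"]))  # 0.75

# Hypothetical confusion counts for one action class.
print(accuracy_from_counts(tp=40, tn=50, fp=6, fn=4))  # 0.9
```

With a window of five frames, a clip of N frames yields N - 4 concatenated vectors under the stride-1 assumption, each five times the length of a single-frame feature vector. These concatenated vectors would then be passed to the classifier in place of the single-frame vectors.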
