Page 135 - Kaleidoscope Academic Conference Proceedings 2021

Connecting physical and virtual worlds

[Figure 8: bar chart titled “Performance Analysis” showing training and testing accuracy (%, axis 0 to 100) for SCNN, DBiLSTM and SAF + BiLSTM]
Figure 8 – Training and test accuracies

Table 2 – Accuracy comparison

S. No.   Method            Accuracy (%)
1        Sequential CNN    93.5
2        DBiLSTM           95.7
3        SAF + Bi-LSTM     97.2

5. CONCLUSION

A hybrid action recognition system was built by combining SAF and Bi-LSTM. VAR is used for activity forecasting, and the feature extraction techniques were aimed at improving the recognition of actions that occur across the spatial and temporal regions of video sequences. Activity forecasting helped the system keep functioning on streamed video while coping with paused or missing data. The model was trained on different actions from the MSR Action data set [13]. The skeleton data of each video sequence was used to build the feature vector, which was reduced with LDA to improve classification efficiency. System performance was evaluated on two data sets, MSR Action [13] and IIT-B Corridor [15]. The accuracy and precision of the SAF + Bi-LSTM model suggest that models combining multiple features help achieve higher accuracy. The proposed system achieved 97.2% accuracy in action recognition. It can be standardized under Recommendation ITU-T H.627, “Signalling and protocols for a video surveillance system” [7].

REFERENCES

[1] Video Surveillance Market, https://www.marketsandmarkets.com/Market-Reports/video-surveillance-market-645.html

[2] Q. Ke, M. Fritz, and B. Schiele, “Time-conditioned action anticipation in one shot”, in CVPR, 2019, pp. 9925–9934.

[3] Y. Zhang, A. Girgensohn, and Y. Tjahjadi, “Activity Forecasting in Routine Tasks by Combining Local Motion Trajectories and High-Level Temporal Models”, 2019 IEEE International Conference on Computational Science and Engineering (CSE) and IEEE International Conference on Embedded and Ubiquitous Computing (EUC), 2019, pp. 447-452.

[4] Lvcai Chen, Chunyan Yu, Li Chen, “A Multi-Person Pose Estimation with LSTM for Video Stream”, IEEE International Conference on Electronic Information Technology and Computer Engineering, 2019.

[5] Yan Bin Ng, Basura Fernando, “Forecasting future action sequences with attention: a new approach to weakly supervised action forecasting”, IEEE Trans. on Image Processing, Vol. 29, Sep. 2020.

[6] Yan Fu, Tao Liu, Ou Ye, “Abnormal activity recognition based on deep learning in crowd”, IEEE International Conference on Intelligent Human-Machine Systems and Cybernetics, 2019.

[7] Recommendation ITU-T H.627, “Signalling and protocols for a video surveillance system”, August 2020.

[8] Hui Tang, Qing Wang, Hong Chen, “Research on 3D Human Pose Estimation Using RGBD Camera”, IEEE Conference on Electronics Information and Emergency Communication, 2019.

[9] Jie Ou and Hong Wu, “Efficient Human Pose Estimation with Depthwise Separable Convolution and Person Centroid Guided Joint Grouping”, arXiv:2012.03316v1, Dec. 2020.

[10] Linqin Cai, Sitong Zhou, Xun Yan, Rongdi Yuan, “A Stacked BiLSTM Neural Network Based on Coattention Mechanism for Question Answering”, Computational Intelligence and Neuroscience, vol. 2019, Article ID 9543490, 2019.

[11] Seymanur Akti, Gozde Ayse Tataroglu, Hazim Kemal Ekenel, “Vision-based Fight Detection from Surveillance Cameras”, arXiv:2002.04355v1, Feb. 2020.

[12] Wanru Xu, Zhenjiang Miao, and Xiao-Ping, “Hierarchical Spatio-Temporal Model for Human Activity Recognition”, IEEE Trans. on Multimedia, Feb. 2017.

[13] MSR Action Data Set: https://www.microsoft.com/en-us/download/details.aspx?id=52315

[14] MPII Human Pose Dataset: http://human-pose.mpi-inf.mpg.de/

[15] IIT-B Corridor Dataset: https://drive.google.com/file/d/1HZZjINXIgWnq1FYuVTTBsfJiWsXy1uU5/view?usp=sharing
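The conclusion states that the skeleton-based feature vector was reduced with LDA before classification. As a rough illustration only, the sketch below implements the two-class (Fisher) case of linear discriminant analysis in plain Python. It projects each feature vector onto the direction S_w^{-1}(m_1 - m_0) and thresholds at the midpoint of the projected class means. The 3-D feature vectors, the action labels "walk" and "wave", and all function names are hypothetical toy choices, not MSR Action or IIT-B Corridor features. This section does not describe the paper's actual LDA configuration (number of classes, output dimension).

```python
# Two-class (Fisher) LDA sketch in plain Python.
# Assumption: the 3-D "skeleton features" and the labels "walk"/"wave" are
# hypothetical toy data; the paper's real feature vector is not shown here.

def mean(rows):
    """Per-dimension mean of a list of equal-length feature vectors."""
    n = len(rows)
    return [sum(r[j] for r in rows) / n for j in range(len(rows[0]))]


def within_class_scatter(groups):
    """S_w = sum over classes c of sum over x in c of (x - m_c)(x - m_c)^T."""
    d = len(groups[0][0])
    sw = [[0.0] * d for _ in range(d)]
    for rows in groups:
        m = mean(rows)
        for r in rows:
            diff = [r[j] - m[j] for j in range(d)]
            for i in range(d):
                for j in range(d):
                    sw[i][j] += diff[i] * diff[j]
    return sw


def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def fisher_direction(class0, class1, ridge=1e-6):
    """Direction w = S_w^{-1} (m1 - m0); the small ridge keeps S_w invertible."""
    sw = within_class_scatter([class0, class1])
    for i in range(len(sw)):
        sw[i][i] += ridge
    m0, m1 = mean(class0), mean(class1)
    return solve(sw, [b - a for a, b in zip(m0, m1)])


def project(w, x):
    """Reduce a feature vector to a single LDA coordinate."""
    return sum(wi * xi for wi, xi in zip(w, x))


def classify(w, threshold, x):
    # class1 ("wave") projects higher because w . (m1 - m0) > 0
    return "wave" if project(w, x) > threshold else "walk"


walk = [[1.0, 0.2, 0.1], [1.2, 0.1, 0.0], [0.9, 0.3, 0.2], [1.1, 0.2, 0.1]]
wave = [[0.2, 1.0, 0.9], [0.1, 1.2, 1.1], [0.3, 0.9, 1.0], [0.2, 1.1, 0.8]]

w = fisher_direction(walk, wave)
# Decision threshold: midpoint of the projected class means
threshold = (project(w, mean(walk)) + project(w, mean(wave))) / 2
print(classify(w, threshold, [1.0, 0.25, 0.15]))  # walk
```

With more than two actions, multi-class LDA projects onto at most C - 1 directions (C being the number of classes). A library implementation would normally replace the hand-written solver used here.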
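The conclusion also credits activity forecasting with keeping the system running when the stream pauses or drops frames. This section does not expand the acronym VAR. Assuming it denotes a vector autoregressive model, a first-order VAR fitted by least squares can fill a short gap in a joint trajectory, as in the minimal sketch below. The functions fit_var1, step and forecast and the synthetic 2-D trajectory are hypothetical. A real system would model all skeleton joints and would typically include an intercept and noise handling.

```python
# VAR(1) gap-filling sketch in plain Python.
# Assumptions: "VAR" = vector autoregression; a single joint's (x, y)
# trajectory stands in for full skeleton data; the data are synthetic.

def fit_var1(series):
    """Least-squares fit of x_t = A x_{t-1} (no intercept) for 2-D points."""
    sxy = [[0.0, 0.0], [0.0, 0.0]]  # sum of x_t x_{t-1}^T
    syy = [[0.0, 0.0], [0.0, 0.0]]  # sum of x_{t-1} x_{t-1}^T
    for prev, cur in zip(series[:-1], series[1:]):
        for i in range(2):
            for j in range(2):
                sxy[i][j] += cur[i] * prev[j]
                syy[i][j] += prev[i] * prev[j]
    det = syy[0][0] * syy[1][1] - syy[0][1] * syy[1][0]
    if abs(det) < 1e-12:
        raise ValueError("trajectory too degenerate to fit VAR(1)")
    inv = [[syy[1][1] / det, -syy[0][1] / det],
           [-syy[1][0] / det, syy[0][0] / det]]
    # Normal equations: A S_yy = S_xy, so A = S_xy S_yy^{-1}
    return [[sum(sxy[i][k] * inv[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def step(a, x):
    """Apply a 2x2 transition matrix to one 2-D point."""
    return [a[0][0] * x[0] + a[0][1] * x[1], a[1][0] * x[0] + a[1][1] * x[1]]


def forecast(a, last, steps):
    """Roll the fitted model forward from the last observed frame."""
    out, x = [], list(last)
    for _ in range(steps):
        x = step(a, x)
        out.append(x)
    return out


# Synthetic trajectory from a damped rotation, then a 3-frame "pause".
true_a = [[0.9, -0.2], [0.2, 0.9]]
track = [[1.0, 0.5]]
for _ in range(19):
    track.append(step(true_a, track[-1]))
observed, missing = track[:17], track[17:]

a_hat = fit_var1(observed)
filled = forecast(a_hat, observed[-1], len(missing))
```

With noise-free synthetic data, the fitted matrix matches the generating one, and the filled frames match the dropped ones. On real, noisy skeleton streams, the forecast would only approximate the missing poses, and its error would grow with the length of the gap.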
