by the network and is trained using gradient descent. Algorithm 2 presents the training method for SAF+Bi-LSTM. Multiple action classes are classified by these neural networks.

The output function used by the Bi-LSTM classifier is a SoftMax. The implementation uses three hidden layers with 100 nodes in each layer. The input to the first layer of the network is the feature vector. The input is processed in the following layers, where each node in a layer is connected by a weight to every node in the next layer. After the data is processed, the network updates the associated weights.
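As a rough illustration, a minimal Keras sketch of such a classifier is given below, assuming the skeleton features have already been arranged into fixed-length sequences; the values of timesteps, num_features and num_classes are placeholders rather than figures from the paper, and the mean-square-error loss follows Algorithm 2.

    # Minimal Bi-LSTM classifier sketch (assumed input shapes, not the authors' exact code).
    from tensorflow import keras
    from tensorflow.keras import layers

    timesteps, num_features, num_classes = 30, 36, 10   # placeholder values

    model = keras.Sequential([
        keras.Input(shape=(timesteps, num_features)),    # sequence of skeleton feature vectors
        # Three hidden layers with 100 nodes each, as described above.
        layers.Bidirectional(layers.LSTM(100, return_sequences=True)),
        layers.Bidirectional(layers.LSTM(100, return_sequences=True)),
        layers.Bidirectional(layers.LSTM(100)),
        layers.Dense(num_classes, activation="softmax"), # SoftMax output over the action classes
    ])
    # Algorithm 2 computes the loss on the predicted scores with mean square error.
    model.compile(optimizer="adam", loss="mse", metrics=["accuracy"])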
 Algorithm 2: SAF+BiLSTM_Train
 for X, Y in the training dataset:
     n_components = min(D, X.shape[1])        # D: number of features from LDA
     lda = LDA(n_components=n_components, whiten=True)
     lda.fit(X)
     X_new = lda.transform(X)
     clf.fit(X_new, Y)
 Initialize train_data with X_new
 for each skeleton sequence in X_new:
     Append the pose label to the data
     if the video pauses due to delay:
         Compute the current time-dependent variables
 Create the SAF+BiLSTM model and initialize the classifier:
     clf = BiLSTMClassifier(batch_size, timestamp, features)
 Do the following until the model converges:
     for every pose_sequence in train_data:
         predicted_score = model(sequence_list)
         Compute the loss on predicted_score with the mean square error function
         Perform gradient descent through backpropagation
         Update the model weights and biases
 return model
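A hedged, runnable rendering of this training procedure is sketched below, using scikit-learn's LinearDiscriminantAnalysis for the feature-reduction step and the Keras model defined earlier. The function name train_saf_bilstm, the sequence length and the data layout are assumptions rather than the authors' implementation, and the LDA component count is capped at the number of classes minus one, as scikit-learn requires; the Bi-LSTM's input feature dimension is assumed to match the LDA output.

    # Illustrative rendering of Algorithm 2 (assumed data layout; not the authors' code).
    import numpy as np
    from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

    def train_saf_bilstm(X, Y, model, timesteps=30, epochs=50, batch_size=32):
        """X: (num_frames, num_raw_features) skeleton features; Y: integer action labels."""
        X, Y = np.asarray(X), np.asarray(Y)
        # Supervised dimensionality reduction of the skeleton features.
        n_components = min(len(np.unique(Y)) - 1, X.shape[1])
        lda = LinearDiscriminantAnalysis(n_components=n_components)
        X_new = lda.fit_transform(X, Y)

        # Group consecutive frames into fixed-length pose sequences for the Bi-LSTM.
        usable = (len(X_new) // timesteps) * timesteps
        sequences = X_new[:usable].reshape(-1, timesteps, n_components)
        labels = Y[:usable].reshape(-1, timesteps)[:, -1]     # label of each sequence's last frame
        one_hot = np.eye(model.output_shape[-1])[labels]

        # Gradient descent through backpropagation until the model converges, as in Algorithm 2.
        model.fit(sequences, one_hot, batch_size=batch_size, epochs=epochs)
        return lda, model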

                    4.  RESULTS AND DISCUSSIONS

The MSR Action Recognition Dataset [12], the MPII Human Pose Dataset [13] and the IIT-B Corridor Dataset [14] are used to train and evaluate the model. The Python programming language is used to develop the proposed system, including the web server that supports video streaming. In the experimental setup, the video stream is obtained from an IP-based CCTV camera, and the OpenCV library in Python is used to capture and process the stream. The SAF+Bi-LSTM is implemented with the Keras library on a TensorFlow backend, and the Pickle module in Python is used to serialize the trained learning model.
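A minimal sketch of this setup is given below; the RTSP address, file names and the processing call inside the loop are placeholders rather than details from the paper, and the serialized objects are assumed to be the lda and model from the earlier training sketch.

    # Illustrative capture-and-serialize setup (placeholder URL and file names).
    import pickle
    import cv2

    stream = cv2.VideoCapture("rtsp://camera-ip:554/stream")   # IP-based CCTV camera (assumed URL)
    while stream.isOpened():
        ok, frame = stream.read()
        if not ok:
            break
        # ... pose estimation and SAF+Bi-LSTM classification would run on `frame` here ...
    stream.release()

    # Serialize the trained learning model with the Pickle module, as described above
    # (assumes `lda` and `model` from the training sketch are in scope).
    with open("saf_bilstm.pkl", "wb") as f:
        pickle.dump({"lda": lda, "model": model}, f)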
4.1   Skeleton generation based on pose estimation

The human skeleton is generated using human pose estimation trained on the MPII data set. From each video frame, the pose estimator detects a human skeleton (joint positions), and the extracted skeleton features are then used as raw data for classification. Figure 4 shows a frame of a typical video captured for skeleton generation using the open pose estimation method.

a) Transmitting video      b) Receiving video

Figure 4 – Skeleton generation
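As an illustration of this step, the sketch below extracts per-frame joint positions with OpenCV's DNN module and a pretrained MPII-style pose network; the model file names, keypoint count and confidence threshold are assumptions, and the paper's own pose-estimation component may differ.

    # Illustrative per-frame skeleton extraction (file names and keypoint count are placeholders).
    import cv2
    import numpy as np

    net = cv2.dnn.readNetFromCaffe("pose_deploy.prototxt", "pose_mpii.caffemodel")
    NUM_KEYPOINTS = 15   # assumed MPII-style body joints

    def extract_skeleton(frame, conf_threshold=0.1):
        h, w = frame.shape[:2]
        blob = cv2.dnn.blobFromImage(frame, 1.0 / 255, (368, 368), (0, 0, 0),
                                     swapRB=False, crop=False)
        net.setInput(blob)
        heatmaps = net.forward()                       # shape: (1, channels, H, W)
        joints = []
        for i in range(NUM_KEYPOINTS):
            _, conf, _, point = cv2.minMaxLoc(heatmaps[0, i])
            x = int(w * point[0] / heatmaps.shape[3])  # rescale heatmap peak to frame size
            y = int(h * point[1] / heatmaps.shape[2])
            joints.append((x, y) if conf > conf_threshold else (0, 0))
        return np.array(joints, dtype=float).flatten() # raw feature vector for classification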
4.2   Activity forecasting

Vector Auto-Regression (VAR), a multivariate forecasting algorithm, is used to perform activity forecasting. When the video pauses for a few seconds due to delay, the future pose and motion of the human activity are predicted, as shown in Figure 5.

Figure 5 – Future pose prediction
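A rough sketch of this forecasting step with the VAR implementation in statsmodels is given below, treating each joint coordinate as one series of the multivariate model y_t = c + A_1 y_{t-1} + ... + A_p y_{t-p} + e_t; the lag order and forecast horizon are placeholders, not values from the paper.

    # Illustrative VAR-based pose forecasting (lag order and horizon are placeholders).
    import numpy as np
    from statsmodels.tsa.api import VAR

    def forecast_pose(joint_history, steps=15, lags=5):
        """joint_history: (num_frames, num_coordinates) array of recent joint positions."""
        joint_history = np.asarray(joint_history, dtype=float)
        fitted = VAR(joint_history).fit(maxlags=lags)
        # Predict joint positions for the next `steps` frames while the video is paused.
        return fitted.forecast(joint_history[-fitted.k_ar:], steps=steps)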
4.3   Activity classification

The SAF+Bi-LSTM model classifies the human activity from a streaming video. The model classifies an action as abnormal when it identifies a fight or a suspicious activity, such as a person leaving a bag unattended for a long time. Figure 6 shows the detection of such an activity from a streaming CCTV video.
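Tying the pieces together, the per-window classification step might look like the following sketch; the class names, the set of abnormal labels and the window handling are illustrative assumptions, not the authors' label set.

    # Illustrative classification of a sliding window of skeleton features
    # (class names and window length are placeholders).
    import numpy as np

    CLASS_NAMES = ["walk", "stand", "fight", "leave_bag"]        # assumed labels
    ABNORMAL = {"fight", "leave_bag"}

    def classify_window(frame_features, lda, model):
        """frame_features: the most recent raw skeleton feature vectors (one per frame)."""
        window = lda.transform(np.asarray(frame_features))        # reduced (SAF) features
        scores = model.predict(window[np.newaxis, ...], verbose=0)
        label = CLASS_NAMES[int(np.argmax(scores))]
        return label, label in ABNORMAL                            # flag abnormal activity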




