by the network and is trained using gradient descent. Algorithm 2 presents the training method for SAF+Bi-LSTM. Multiple action classes are classified by these neural networks.

The output function used by the Bi-LSTM classifier is a SoftMax. The implementation uses three hidden layers with 100 nodes in each layer. The input to the first layer of the network is the feature vector. The input is processed in the following layers, where each node in a layer is connected by a weight to every node in the next layer. After the data is processed, the network updates the associated weights.
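As a rough illustration, a minimal Keras sketch of such a classifier is given below, assuming the skeleton features have already been arranged into fixed-length sequences; the values of timesteps, num_features and num_classes are placeholders rather than figures from the paper, and the mean-square-error loss follows Algorithm 2.

    # Minimal Bi-LSTM classifier sketch (assumed input shapes, not the authors' exact code).
    from tensorflow import keras
    from tensorflow.keras import layers

    timesteps, num_features, num_classes = 30, 36, 10   # placeholder values

    model = keras.Sequential([
        keras.Input(shape=(timesteps, num_features)),    # sequence of skeleton feature vectors
        # Three hidden layers with 100 nodes each, as described above.
        layers.Bidirectional(layers.LSTM(100, return_sequences=True)),
        layers.Bidirectional(layers.LSTM(100, return_sequences=True)),
        layers.Bidirectional(layers.LSTM(100)),
        layers.Dense(num_classes, activation="softmax"), # SoftMax output over the action classes
    ])
    # Algorithm 2 computes the loss on the predicted scores with mean square error.
    model.compile(optimizer="adam", loss="mse", metrics=["accuracy"])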
 Algorithm 2: SAF+BiLSTM_Train
 for X, Y in the training dataset:
     n_components = min(D, X.shape[1])        # D: number of features from LDA
     lda = LDA(n_components=n_components, whiten=True)
     lda.fit(X)
     X_new = lda.transform(X)
     clf.fit(X_new, Y)
 Initialize train_data with X_new
 for each skeleton sequence in X_new:
     Append the pose label to the data
     if the video pauses due to delay:
         Compute the current time-dependent variables
 Create the SAF+BiLSTM model and initialize the classifier:
     clf = BiLSTMClassifier(batch_size, timestamp, features)
 Do the following until the model converges:
     for every pose_sequence in train_data:
         predicted_score = model(sequence_list)
         Compute the loss on predicted_score with the mean square error function
         Perform gradient descent through backpropagation
         Update the model weights and biases
 return model
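A hedged, runnable rendering of this training procedure is sketched below, using scikit-learn's LinearDiscriminantAnalysis for the feature-reduction step and the Keras model defined earlier. The function name train_saf_bilstm, the sequence length and the data layout are assumptions rather than the authors' implementation, and the LDA component count is capped at the number of classes minus one, as scikit-learn requires; the Bi-LSTM's input feature dimension is assumed to match the LDA output.

    # Illustrative rendering of Algorithm 2 (assumed data layout; not the authors' code).
    import numpy as np
    from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

    def train_saf_bilstm(X, Y, model, timesteps=30, epochs=50, batch_size=32):
        """X: (num_frames, num_raw_features) skeleton features; Y: integer action labels."""
        X, Y = np.asarray(X), np.asarray(Y)
        # Supervised dimensionality reduction of the skeleton features.
        n_components = min(len(np.unique(Y)) - 1, X.shape[1])
        lda = LinearDiscriminantAnalysis(n_components=n_components)
        X_new = lda.fit_transform(X, Y)

        # Group consecutive frames into fixed-length pose sequences for the Bi-LSTM.
        usable = (len(X_new) // timesteps) * timesteps
        sequences = X_new[:usable].reshape(-1, timesteps, n_components)
        labels = Y[:usable].reshape(-1, timesteps)[:, -1]     # label of each sequence's last frame
        one_hot = np.eye(model.output_shape[-1])[labels]

        # Gradient descent through backpropagation until the model converges, as in Algorithm 2.
        model.fit(sequences, one_hot, batch_size=batch_size, epochs=epochs)
        return lda, model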

                    4.  RESULTS AND DISCUSSIONS

The MSR Action Recognition Dataset [12], the MPII Human Pose Dataset [13] and the IIT-B Corridor Dataset [14] are used to train and evaluate the model. The Python programming language is used to develop the proposed system, including the web server that supports video streaming. In the experimental setup, the video stream is obtained from an IP-based CCTV camera, and the OpenCV library in Python is used to capture and process the stream. The SAF+Bi-LSTM is implemented with the Keras library on a TensorFlow backend, and the Pickle module in Python is used to serialize the trained learning model.
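A minimal sketch of this setup is given below; the RTSP address, file names and the processing call inside the loop are placeholders rather than details from the paper, and the serialized objects are assumed to be the lda and model from the earlier training sketch.

    # Illustrative capture-and-serialize setup (placeholder URL and file names).
    import pickle
    import cv2

    stream = cv2.VideoCapture("rtsp://camera-ip:554/stream")   # IP-based CCTV camera (assumed URL)
    while stream.isOpened():
        ok, frame = stream.read()
        if not ok:
            break
        # ... pose estimation and SAF+Bi-LSTM classification would run on `frame` here ...
    stream.release()

    # Serialize the trained learning model with the Pickle module, as described above
    # (assumes `lda` and `model` from the training sketch are in scope).
    with open("saf_bilstm.pkl", "wb") as f:
        pickle.dump({"lda": lda, "model": model}, f)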
4.1   Skeleton generation based on pose estimation

The human skeleton is generated using human pose estimation trained on the MPII data set. From each video frame, the pose estimator detects a human skeleton (joint positions), and the extracted skeleton features are then used as raw data for classification. Figure 4 shows a frame of a typical video captured for skeleton generation using the open pose estimation method.

a) Transmitting video      b) Receiving video

Figure 4 – Skeleton generation
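As an illustration of this step, the sketch below extracts per-frame joint positions with OpenCV's DNN module and a pretrained MPII-style pose network; the model file names, keypoint count and confidence threshold are assumptions, and the paper's own pose-estimation component may differ.

    # Illustrative per-frame skeleton extraction (file names and keypoint count are placeholders).
    import cv2
    import numpy as np

    net = cv2.dnn.readNetFromCaffe("pose_deploy.prototxt", "pose_mpii.caffemodel")
    NUM_KEYPOINTS = 15   # assumed MPII-style body joints

    def extract_skeleton(frame, conf_threshold=0.1):
        h, w = frame.shape[:2]
        blob = cv2.dnn.blobFromImage(frame, 1.0 / 255, (368, 368), (0, 0, 0),
                                     swapRB=False, crop=False)
        net.setInput(blob)
        heatmaps = net.forward()                       # shape: (1, channels, H, W)
        joints = []
        for i in range(NUM_KEYPOINTS):
            _, conf, _, point = cv2.minMaxLoc(heatmaps[0, i])
            x = int(w * point[0] / heatmaps.shape[3])  # rescale heatmap peak to frame size
            y = int(h * point[1] / heatmaps.shape[2])
            joints.append((x, y) if conf > conf_threshold else (0, 0))
        return np.array(joints, dtype=float).flatten() # raw feature vector for classification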
4.2   Activity forecasting

Vector Auto-Regression (VAR), a multivariate forecasting algorithm, is used to perform activity forecasting. When the video pauses for a few seconds due to delay, the future pose and motion of the human activity are predicted, as shown in Figure 5.

Figure 5 – Future pose prediction
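A rough sketch of this forecasting step with the VAR implementation in statsmodels is given below, treating each joint coordinate as one series of the multivariate model y_t = c + A_1 y_{t-1} + ... + A_p y_{t-p} + e_t; the lag order and forecast horizon are placeholders, not values from the paper.

    # Illustrative VAR-based pose forecasting (lag order and horizon are placeholders).
    import numpy as np
    from statsmodels.tsa.api import VAR

    def forecast_pose(joint_history, steps=15, lags=5):
        """joint_history: (num_frames, num_coordinates) array of recent joint positions."""
        joint_history = np.asarray(joint_history, dtype=float)
        fitted = VAR(joint_history).fit(maxlags=lags)
        # Predict joint positions for the next `steps` frames while the video is paused.
        return fitted.forecast(joint_history[-fitted.k_ar:], steps=steps)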
4.3   Activity classification

The SAF+Bi-LSTM model classifies the human activity from a streaming video. The model classifies an action as abnormal when it identifies a fight or a suspicious activity, such as a person leaving a bag unattended for a long time. Figure 6 shows the detection of such an activity from a streaming CCTV video.
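Tying the pieces together, the per-window classification step might look like the following sketch; the class names, the set of abnormal labels and the window handling are illustrative assumptions, not the authors' label set.

    # Illustrative classification of a sliding window of skeleton features
    # (class names and window length are placeholders).
    import numpy as np

    CLASS_NAMES = ["walk", "stand", "fight", "leave_bag"]        # assumed labels
    ABNORMAL = {"fight", "leave_bag"}

    def classify_window(frame_features, lda, model):
        """frame_features: the most recent raw skeleton feature vectors (one per frame)."""
        window = lda.transform(np.asarray(frame_features))        # reduced (SAF) features
        scores = model.predict(window[np.newaxis, ...], verbose=0)
        label = CLASS_NAMES[int(np.argmax(scores))]
        return label, label in ABNORMAL                            # flag abnormal activity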




