Figure 4 – System applied on the UR Fall dataset
           4.1    Siamese CNN Implementation
The Siamese CNN has been trained on a custom dataset generated from the OTB and MOT datasets. This custom dataset is created by extracting images of all the persons present in all the videos of the datasets, using the ground-truth information. Similar and non-similar pairs of images are generated from the custom dataset by pairing images of the same object and of different objects, respectively. The training parameters of this network are shown in Table 1. The network is trained such that similar input pairs produce a score close to 1 and dissimilar input pairs produce a score close to 0.

Table 1 – Siamese CNN parameters

S. No.   Parameter                          Value
1        Learning Rate                      0.001
2        Optimizer                          Adam
3        Total epochs                       5
4        Train image split                  70%
5        Test image split                   30%
6        No. of convolutional layers used   9
7        No. of pooling layers used         4
8        No. of dense layers used           2
9        Threshold for image similarity     0.5
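As a rough illustration of this pair-based training, the sketch below builds a small Siamese network in Keras. The layer counts follow Table 1 (9 convolutional, 4 pooling, and 2 dense layers, Adam with a 0.001 learning rate), but the arrangement of those layers, the filter sizes, the input resolution, and the absolute-difference similarity head are assumptions for illustration, not the authors' exact architecture.

```python
import tensorflow as tf
from tensorflow.keras import layers, Model

def build_encoder(input_shape=(128, 64, 3)):
    # Shared CNN branch: 9 conv and 4 pooling layers, matching the counts
    # in Table 1; the block layout and filter sizes are assumptions.
    inp = layers.Input(shape=input_shape)
    x = inp
    for n_convs, filters in ((2, 32), (2, 64), (2, 128), (3, 256)):
        for _ in range(n_convs):
            x = layers.Conv2D(filters, 3, padding="same", activation="relu")(x)
        x = layers.MaxPooling2D()(x)
    x = layers.Flatten()(x)
    emb = layers.Dense(128, activation="relu")(x)  # dense layer 1 of 2
    return Model(inp, emb, name="encoder")

def build_siamese(input_shape=(128, 64, 3)):
    encoder = build_encoder(input_shape)
    img_a = layers.Input(shape=input_shape)
    img_b = layers.Input(shape=input_shape)
    # Similarity head: |f(a) - f(b)| -> sigmoid score in [0, 1]; similar
    # pairs are trained toward 1 and dissimilar pairs toward 0.
    diff = layers.Lambda(lambda t: tf.abs(t[0] - t[1]))(
        [encoder(img_a), encoder(img_b)])
    score = layers.Dense(1, activation="sigmoid")(diff)  # dense layer 2 of 2
    model = Model([img_a, img_b], score)
    model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),
                  loss="binary_crossentropy", metrics=["accuracy"])
    return model
```

At inference time, a pair whose score exceeds the 0.5 image-similarity threshold (Table 1, row 9) is treated as two views of the same person.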

The trained model is then subjected to network pruning in order to increase its processing speed. The overall training and validation results of the model, shown in Table 2, indicate that the maximum accuracy is attained after 5 epochs.
Table 2 – Validation phase of Siamese CNN

Epoch    Loss      Accuracy (%)
1        0.3879    82.22
2        0.2716    86.39
3        0.2014    87.82
4        0.1862    91.78
5        0.1713    92.07
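The text does not specify which pruning method is applied. A common choice for Keras models is magnitude-based weight pruning via the tensorflow_model_optimization package, sketched below; the sparsity schedule and targets here are assumptions, not values from the paper.

```python
import tensorflow_model_optimization as tfmot

def prune_siamese(model, train_pairs, train_labels, epochs=2):
    # Magnitude-based pruning: gradually zero out the smallest weights,
    # ramping sparsity from 30% to 70% (illustrative targets).
    schedule = tfmot.sparsity.keras.PolynomialDecay(
        initial_sparsity=0.30, final_sparsity=0.70,
        begin_step=0, end_step=1000)
    pruned = tfmot.sparsity.keras.prune_low_magnitude(
        model, pruning_schedule=schedule)
    pruned.compile(optimizer="adam", loss="binary_crossentropy",
                   metrics=["accuracy"])
    # Fine-tune briefly so the surviving weights compensate for pruning.
    pruned.fit(train_pairs, train_labels, epochs=epochs,
               callbacks=[tfmot.sparsity.keras.UpdatePruningStep()])
    # Strip the pruning wrappers to obtain a plain, faster Keras model.
    return tfmot.sparsity.keras.strip_pruning(pruned)
```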
           4.2    LSTM Implementation

A custom dataset containing the center coordinates of objects from the OTB and MOT datasets has been used to train the motion-LSTM model. From this dataset, random sequences of length 12 are extracted as inputs to the model, and the center following each sequence is taken as the actual output. The model is then trained on this data sequence using the parameters shown in Table 3.

The training-phase results in Figure 5 show the model's convergence on the input data after 350 epochs. The slight increase in loss around the 150th epoch indicates that the model briefly falls into a local minimum rather than attaining the global minimum. The Adam optimizer helps to regularize the parameters during this stage and allows the overall loss of the network to decrease further.

Table 3 – Motion LSTM parameters

S. No.   Parameter                                             Value
1        Learning Rate                                         0.001
2        Optimizer                                             Adam
3        Total epochs                                          350
4        Train split                                           80%
5        Test split                                            20%
6        No. of LSTM units used                                25
7        Euclidean distance threshold for motion similarity    10
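A minimal sketch of such a motion model follows. The sequence length, unit count, optimizer, learning rate, and distance threshold come from the text and Table 3; the (x, y) input encoding and the mean-squared-error regression loss are assumptions, since the paper does not state them.

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers, Model

SEQ_LEN = 12  # length of each input sequence of object centers

def build_motion_lstm():
    inp = layers.Input(shape=(SEQ_LEN, 2))   # 12 past (x, y) centers
    x = layers.LSTM(25)(inp)                 # 25 LSTM units (Table 3)
    out = layers.Dense(2)(x)                 # predicted next (x, y) center
    model = Model(inp, out)
    model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),
                  loss="mse")                # assumed regression loss
    return model

def motion_similar(predicted_center, detected_center, threshold=10.0):
    # A detection is motion-consistent with a track when the Euclidean
    # distance between the predicted and detected centers is below the
    # threshold of 10 (Table 3, row 7).
    delta = (np.asarray(predicted_center, dtype=float)
             - np.asarray(detected_center, dtype=float))
    return float(np.linalg.norm(delta)) < threshold
```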




Figure 5 – Validation phase of LSTM (loss vs. epoch)

4.3    MFPT Results

The performance of MFPT has been analyzed using two metrics: precision and multiple object tracking accuracy (MOTA). Precision measures how accurately objects are detected with appropriate bounding boxes. It is calculated by finding the Manhattan distance between the predicted bounding-box center and the actual bounding-box center; if the distance is less than the threshold, the detection is counted as correct. The average precision of the tracker over all the person videos in the OTB-100 dataset is shown in Figure 6. The average precision at a threshold value of 20 is 94.67%.
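A compact sketch of this precision computation is given below; the array-based interface is an assumption for illustration.

```python
import numpy as np

def precision_at_threshold(pred_centers, gt_centers, threshold=20):
    """Fraction of frames whose predicted bounding-box center lies within
    `threshold` (Manhattan/L1 distance) of the ground-truth center."""
    pred = np.asarray(pred_centers, dtype=float)   # shape (N, 2)
    gt = np.asarray(gt_centers, dtype=float)       # shape (N, 2)
    l1 = np.abs(pred - gt).sum(axis=1)             # Manhattan distance per frame
    return float((l1 < threshold).mean())
```

Sweeping the threshold over a range of pixel values yields the precision curve of Figure 6; at a threshold of 20 the reported value is 94.67%.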