Page 131 - ITU KALEIDOSCOPE, ATLANTA 2019

ICT for Health: Networks, standards and innovation




           Figure 6 – Average precision in the OTB-100 dataset

           MOTA denotes how well the tracker is able to map a person to a
           unique ID from the object's entrance into the video until its
           exit. This metric is calculated with the help of four
           parameters: the number of correct detections, the number of
           misses, the number of wrong detections and the number of ID
           switches. Correct detections denote the assignment of correct
           IDs to the corresponding persons. Misses denote the count of
           persons that the tracker did not detect. Wrong detections
           signify false person detections made by the tracker, and ID
           switches denote the number of times an object's ID has been
           changed. The overall MOT accuracy is calculated using
           Equation 5.

                MOTA = 1 – (M + WD + ID_switch) / Obj_gt        (5)

           where M denotes person misses, WD denotes wrong person
           detections, ID_switch represents ID switches and Obj_gt
           denotes the total number of persons in the entire video.

           The accuracy for the MOT dataset, along with the four
           mentioned parameters, is listed in Table 4. This table also
           shows the performance comparison of three different
           sub-components. The results show that the combination of
           appearance and motion similarity yields higher accuracy.

                          Table 4 – MOTA results

           Method       Correct   Miss    Wrong     ID       MOTA
                        Detects           Detects   switch
           CNN + LSTM   78.23%    12.2%   3.3%      7.5%     76.6%
           CNN          77.1%     15.4%   7.01%     7.5%     70.1%
           LSTM         78.96%    14%     8.1%      7.1%     70.8%

           4.4    VPFD Results

           The UR Fall dataset has been utilized for the training and
           validation phases of the VPFD model. The fall dataset
           consists of 30 fall-event videos and 40 normal videos
           containing daily-life activities. The ground truth specifies
           whether a fall has occurred in each frame of the videos.
           Every frame in the dataset videos is resized to 640 x 480 in
           order to maintain uniformity in feature dimension. After
           applying HoG, the training sequences for the LSTM are
           generated by considering three consecutive frames and their
           feature vectors. The output for each sequence is the
           majority vote of the ground-truth values of its frames.
           These sequences are passed as training input to the LSTM
           model, initialized with the parameters shown in Table 5.

                      Table 5 – Fall LSTM parameters

           S. No.   Parameter                Value
           1        Learning rate            0.001
           2        Optimizer                Adam
           3        Total epochs             6
           4        Train split              80%
           5        Test split               20%
           6        No. of LSTM units used   64

           The validation phase of the fall detector, summarized in
           Table 6, indicates that the VPFD model has learnt to
           differentiate between fall and non-fall sequences with high
           accuracy.

                    Table 6 – Validation phase of VPFD

           Epoch    Loss      Accuracy %
           1        0.2937    87.42
           2        0.1401    93.45
           3        0.1051    96.52
           4        0.0874    97.68
           5        0.1211    95.20
           6        0.0553    98.01

           The accuracy comparison of various methods in Table 7 shows
           that better feature extraction and an effective time-series
           representation can improve the overall performance of the
           fall detector.

              Table 7 – Comparison of methods based on accuracy

           S. No.   Method                   Accuracy %
           1        Curvelets + HMM [7]      96.88
           2        Optical Flow + CNN [8]   95.00
           3        HoG + LSTM (Proposed)    98.01

           Although fall-detection methods based on curvelets and HMM
           [7] produce higher accuracy than the optical-flow technique
           with CNN [8], the proposed technique employing HoG features
           with an LSTM achieves significantly higher accuracy due to
           an enhanced learning technique.

                            5.  CONCLUSION

           The proposed system is based on the combination of two
           models, MFPT and VPFD, to monitor an elderly person's
           health-related activities and report any falls detected through
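As an illustrative aside, the MOTA computation of Equation 5 can be sketched in a few lines of Python; the function name and the sample counts below are hypothetical, not taken from the paper:

```python
def mota(misses, wrong_detections, id_switches, total_gt):
    # Equation 5: MOTA = 1 - (M + WD + ID_switch) / Obj_gt
    return 1.0 - (misses + wrong_detections + id_switches) / total_gt

# Hypothetical counts for a short clip: 500 ground-truth person
# instances, with 61 misses, 16 wrong detections and 38 ID switches.
print(round(mota(61, 16, 38, 500), 3))  # -> 0.77
```

A MOTA of 0.77 here means 23% of ground-truth instances were lost to misses, false detections or identity switches, matching the order of magnitude of the CNN + LSTM row in Table 4.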




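The sequence-construction step of Section 4.4 (three-frame windows of feature vectors, each labelled by a majority vote of the per-frame ground truth) can be sketched as follows; `make_sequences` and the toy feature vectors are illustrative, not from the paper:

```python
def make_sequences(features, labels, window=3):
    """Build LSTM training samples: each sample stacks the feature
    vectors of `window` consecutive frames; its target label is the
    majority vote of those frames' ground-truth labels."""
    seqs, targets = [], []
    for i in range(len(features) - window + 1):
        seqs.append(features[i:i + window])
        votes = labels[i:i + window]
        targets.append(max(set(votes), key=votes.count))  # majority vote
    return seqs, targets

# Toy example: 5 frames with stand-in 1-D "HoG" vectors and
# binary fall labels (1 = fall).
feats = [[0.1], [0.2], [0.3], [0.4], [0.5]]
labs = [0, 0, 1, 1, 1]
seqs, y = make_sequences(feats, labs)
print(y)  # -> [0, 1, 1]
```

With a window of three frames and binary labels a majority always exists, so the vote is never tied; each resulting sequence/label pair is one training sample for the LSTM.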