Page 34 - First special issue on The impact of Artificial Intelligence on communication networks and services

ITU JOURNAL: ICT Discoveries, Vol. [...], March [...]




































                             Figure 6. The processing flow of typical CNN-based detection methods.


R-CNN model to train the classifier and BB localizers. Unlike the earlier
algorithms, which achieved satisfactory mean Average Precision (mAP) only at
the cost of slow speed, Faster R-CNN can achieve real-time processing, since it
benefits from the RPN, and reaches 5 fps with one NVIDIA K40 GPU. Redmon et al.
designed YOLO [43], which takes the whole input image to train the model and
classifies each pixel in the output feature maps. This is equivalent to
dividing the input image into several cells and performing the classification
inside each cell, which avoids the expensive proposal-generation process. YOLO
can be around seven times faster than Faster R-CNN, making real-time detection
more feasible with an acceptable drop in accuracy.

Table 1. Top-ranked detection algorithms on KITTI.

                      Target object (Moderate level)
Algorithm             Car       Pedestrian   Cyclist
MS-CNN [46]           89.02%    73.70%       75.46%
SubCNN [47]           89.04%    71.33%       71.06%
SDP+RPN [48]          88.85%    70.16%       73.74%
3DOP [49]             88.64%    67.47%       68.94%
Mono3D [50]           88.66%    66.68%       66.36%
SDP+CRC [48]          83.53%    64.19%       61.31%
Faster R-CNN [42]     81.84%    65.90%       63.35%
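
The grid-cell view of YOLO described above can be sketched in a few lines. This is only a minimal illustration and not the implementation of [43]: the helper name `cell_of`, the 448 x 448 input size and the 7 x 7 grid are assumptions chosen for the example and are not stated in the text.

```python
def cell_of(cx, cy, img_w, img_h, grid=7):
    """Return (row, col) of the grid cell containing the point (cx, cy).

    Hypothetical helper: the image is split into grid x grid equal cells.
    """
    if not (0 <= cx < img_w and 0 <= cy < img_h):
        raise ValueError("point lies outside the image")
    col = int(cx * grid / img_w)  # horizontal cell index
    row = int(cy * grid / img_h)  # vertical cell index
    return row, col


# Assumed example: 448 x 448 input and a 7 x 7 grid, so each cell is 64 x 64 pixels.
centres = [(30, 40), (224, 224), (447, 0)]
cells = [cell_of(x, y, 448, 448) for x, y in centres]
print(cells)  # [(0, 0), (3, 3), (0, 6)]
```

In this view every cell makes its own prediction in a single pass over the image, which is why no separate proposal stage is required.
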
These detection algorithms have shown outstanding performance on the PASCAL
VOC dataset [44]. In autonomous vision scenes, however, detection is much
harder because objects appear in much worse quality, owing to the large
variance in object scale and to incompletely captured object shapes. We
therefore need to optimize the way proposals are obtained in the detection
algorithms. The representative benchmark for autonomous vision is KITTI [45],
and various algorithms have been proposed for this dataset. We have selected
some top-ranked detection algorithms and list them in Table 1. Most of these
algorithms use CONV layers to extract features, based on the classic CNN models
with small revisions, followed by application-dependent FC layers. We compare
the CONV layers of the classic CNN models in Table 2. As can be seen, on the
order of giga-MACs must be computed for each input frame. Together with the FC
layers, and considering the number of region proposals, real-time processing
requires the hardware to provide a throughput of over 100-1000 GOPS. With the
growing volume of image data collected from cameras, the future requirement for
computing speed could reach 10-100 TOPS. Building such a powerful processor
with programmability and a power consumption of less than 30 W is a challenging
task, and we discuss the contenders in the next section.

Table 2. Comparison of CONV layers in classic CNN models.

Model           AlexNet [29]   VGG-16 [35]   Inception v1 [31]   ResNet-50 [36]
Top-5 Error     19.8%          8.8%          10.7%               7.0%
# of Weights    2.3M           14.7M         6.0M                23.5M
# of MACs       666M           15.3G         1.43G               3.86G

3. PROCESSORS FOR REAL-TIME AUTONOMOUS VISION

3.1. Heterogeneous platforms for CNN acceleration

As CNN algorithms have developed rapidly in recent years, so have the related
hardware accelerator designs. The work of




© International Telecommunication Union