Page 34 - First special issue on The impact of Artificial Intelligence on communication networks and services


ITU JOURNAL: ICT Discoveries, Vol. …, March …

Figure 6. The processing flow of typical CNN-based detection methods.

Table 1. Top-ranked detection algorithms on KITTI.

                       Target object (Moderate level)
Algorithm              Car       Pedestrian   Cyclist
MS-CNN [46]            89.02%    73.70%       75.46%
SubCNN [47]            89.04%    71.33%       71.06%
SDP+RPN [48]           88.85%    70.16%       73.74%
3DOP [49]              88.64%    67.47%       68.94%
Mono3D [50]            88.66%    66.68%       66.36%
SDP+CRC [48]           83.53%    64.19%       61.31%
Faster R-CNN [42]      81.84%    65.90%       63.35%

R-CNN model to train the classifier and BB localizers. Unlike the earlier algorithms, which could only reach a satisfactory mean Average Precision (mAP) at the cost of slow speed, Faster R-CNN benefits from the RPN and can achieve real-time processing, reaching 5 fps with one NVIDIA K40 GPU. Redmon et al. designed YOLO [43], which takes the whole input image to train the model and classifies each pixel in the output feature maps. This is equivalent to dividing the input image into several cells and doing the classification inside each cell, which avoids the expensive proposal process and can be around seven times faster than Faster R-CNN, making real-time detection more feasible with an acceptable accuracy drop.
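The cell-based view of YOLO described above can be made concrete with a minimal Python sketch. It is not the implementation from [43]: the 7x7 grid, the helper names `cell_index` and `assign_objects_to_cells`, and the rule of assigning an object to the cell that contains its box center are assumptions chosen for illustration.

```python
def cell_index(cx, cy, img_w, img_h, grid=7):
    """Return (row, col) of the grid cell containing the point (cx, cy).

    grid=7 is an assumed grid size; points on the right or bottom image
    border are clamped into the last cell.
    """
    col = min(int(cx * grid / img_w), grid - 1)
    row = min(int(cy * grid / img_h), grid - 1)
    return row, col


def assign_objects_to_cells(boxes, img_w, img_h, grid=7):
    """Group object labels by the cell that holds each box center.

    boxes: iterable of (label, x_min, y_min, x_max, y_max) in pixels.
    Returns a dict mapping (row, col) to the list of labels in that cell.
    """
    cells = {}
    for label, x0, y0, x1, y1 in boxes:
        cx, cy = (x0 + x1) / 2.0, (y0 + y1) / 2.0
        key = cell_index(cx, cy, img_w, img_h, grid)
        cells.setdefault(key, []).append(label)
    return cells
```

For an assumed 448x448 input and a 7x7 grid, each cell covers 64x64 pixels, so a box centered at (100, 300) falls in row 4, column 1. Classification then happens once per cell rather than once per region proposal, which is where the speed advantage comes from.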
These detection algorithms have shown outstanding performance on the PASCAL VOC dataset [44]. However, detection in autonomous vision scenes is much tougher, since objects appear in much worse quality because of the large variance in object scale and incompletely captured object shapes. Therefore, we need to optimize the way proposals are obtained in our detection algorithms. The representative benchmark for autonomous vision is KITTI [45], and various algorithms have been proposed for this dataset. We have selected some top-ranked detection algorithms and listed them in Table 1. Most of these algorithms use CONV layers to extract features, based on the classic CNN models with small revisions, followed by application-dependent FC layers. We compare the CONV layers of classic CNN models in Table 2. As we can see, giga-scale MACs need to be computed for each input frame. Together with the FC layers, and considering the number of region proposals, the hardware needs to provide a throughput of over 100-1000 GOPS to realize real-time processing. With the growing amount of image data collected from cameras, the future requirement for computing speed could reach 10-100 TOPS. Building such a powerful processor with programmability and a power consumption of less than 30 W is a challenging task, and we will discuss the contenders in the next section.

Table 2. Comparison of CONV layers in classic CNN models.

Model           AlexNet [29]   VGG-16 [35]   Inception v1 [31]   ResNet-50 [36]
Top-5 Error     19.8%          8.8%          10.7%               7.0%
# of Weights    2.3M           14.7M         6.0M                23.5M
# of MACs       666M           15.3G         1.43G               3.86G

3. PROCESSORS FOR REAL-TIME AUTONOMOUS VISION

3.1. Heterogeneous platforms for CNN acceleration

As CNN algorithms have developed rapidly in recent years, so have the related hardware accelerator designs. The work of
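Returning to the throughput estimate at the end of Section 2, the order of magnitude can be checked with a short Python sketch built on the MAC counts in Table 2. The frame rate of 30 fps, the convention of counting one MAC as two operations, and the names `required_gops`, `tops_per_watt`, and `passes_per_frame` are assumptions made for illustration; they are not stated in the text.

```python
# CONV-layer MACs per input frame, taken from Table 2.
CONV_MACS = {
    "AlexNet": 666e6,
    "VGG-16": 15.3e9,
    "Inception v1": 1.43e9,
    "ResNet-50": 3.86e9,
}

OPS_PER_MAC = 2  # assumption: one multiply plus one accumulate


def required_gops(macs_per_frame, fps, passes_per_frame=1):
    """Sustained throughput in GOPS needed to process `fps` frames per second.

    passes_per_frame is a hypothetical knob for workloads that run the
    CONV layers more than once per frame (for example, per region proposal).
    """
    ops_per_second = macs_per_frame * OPS_PER_MAC * passes_per_frame * fps
    return ops_per_second / 1e9


def tops_per_watt(tops, watts):
    """Energy efficiency implied by a throughput target and a power budget."""
    return tops / watts


if __name__ == "__main__":
    for name, macs in CONV_MACS.items():
        print(f"{name}: {required_gops(macs, fps=30):.1f} GOPS at 30 fps")
    print(f"10 TOPS in 30 W: {tops_per_watt(10, 30):.2f} TOPS/W")
    print(f"100 TOPS in 30 W: {tops_per_watt(100, 30):.2f} TOPS/W")
```

Under these assumptions, the VGG-16 CONV layers alone need about 918 GOPS at 30 fps, which is consistent with the quoted range of 100-1000 GOPS, and a 10-100 TOPS target within a 30 W budget corresponds to roughly 0.33-3.3 TOPS/W.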

International Telecommunication Union
