Page 33 - First special issue on The impact of Artificial Intelligence on communication networks and services

ITU JOURNAL: ICT Discoveries, Vol. …, March …

Figure 3. Common functions in ADAS system.

Figure 4. Workflow of traditional detection algorithms.

sure adjustment and image rectification would be performed to preprocess
the collected images. ROI selection methods depend on the type of task:
vanishing point detection (VPD) [19] and the piecewise linear stretching
function (PLSF) [20] are used in LDW, while sliding window methods are used
in PED, VD and TSR. An exhaustive ROI search is time-consuming, so various
optimizations are also applied to ROI selection. Broggi et al. [21] use
morphological characteristics of objects and distance information. Uijlings
et al. [22] propose a selective search approach to generate ROIs
efficiently. For feature extraction, various manually designed features such
as Scale-Invariant-Feature-Transform (SIFT) [23],
Histogram-of-Oriented-Gradients (HOG) [24], Haar [25], etc. have been widely
used in detection tasks. For classification, combined simple classifiers
such as AdaBoost [26] and support vector machines (SVMs) [27] are popular
choices for working with traditional features. Some part-based methodologies
have also appeared to reduce the complexity of the overall task; for
example, Felzenszwalb et al. [28] propose a deformable part model (DPM) that
breaks objects down into simple parts.

2.3. The rise of convolutional neural network (CNN)

In recent years, the rise of CNN has set off a revolution in the area of
object detection. A typical CNN consists of a number of layers that run in
sequence, as shown in Figure 5. The convolutional layer (CONV layer) and the
fully-connected layer (FC layer) are the two essential types of layer in a
CNN, followed by optional layers such as pooling layers for down-sampling
and normalization layers. The first CONV layer takes an input image and
outputs a series of feature maps, and the following CONV layers extract
higher-level features layer by layer by convolving the input feature maps
with filters. After the CONV layers, the FC layers classify the extracted
features and output the probability of each category that the input image
might belong to.

Figure 5. A typical CNN architecture.

State-of-the-art CNN models have achieved outstanding performance in
computer vision. Taking image classification as an example, in 2012
Krizhevsky et al. announced AlexNet [29], an 8-layer CNN model that achieved
84.7% top-5 accuracy on ImageNet [30], far beyond the performance of
conventional algorithms. In the five years since, many organizations such as
Google [31][32][33][34], Oxford [35] and Microsoft [36] have focused on
novel CNN model designs with more complex computing patterns, and the
accuracies of the top models have already surpassed the human vision level
[37].

The excellent performance of CNN arises because the generic descriptors
extracted from a CNN trained on large-scale datasets are much richer than
traditional manually designed features, and can be used for various tasks
with some fine-tuning [38]. Hence, for object detection problems, CNN-based
algorithms can achieve much better performance than traditional ones.

The workflows of different detection algorithms are shown in Fig. 6. R-CNN
was proposed first [39]. It generates a set of region proposals with
selective search, warps/crops each region to a fixed size, then extracts the
feature maps with CONV layers, and finally completes the classification with
FC and SVM layers. Since R-CNN needs to run the CONV layers for every region
proposal, which is computationally very expensive, SPP-net was introduced
[40]. It needs to compute the CONV layers only once, and uses spatial
pyramid pooling to convert the feature maps into fixed-length vectors for
the FC layers. Based on SPP-net, Fast R-CNN was designed by Girshick et al.
[41]; it uses a multi-task loss to train the classifier and bounding-box
(BB) localizers jointly, with single-sized ROI pooling applied to the
feature maps of the last CONV layer, onto which the region proposals are
projected. Then Ren et al. [42] proposed Faster R-CNN, using the region
proposal network (RPN), which was actually a Fast R-CNN network, to generate
region proposals and to get rid of the large computations of traditional
region proposal methods, and reused the Fast

International Telecommunication Union
