Page 33 - First special issue on The impact of Artificial Intelligence on communication networks and services

ITU Journal: ICT Discoveries, Vol., March













Figure 3. Common functions in an ADAS system.

Figure 4. Workflow of traditional detection algorithms.

sure adjustment and image rectification would be performed to preprocess the collected images. ROI selection methods depend on the type of task: vanishing point detection (VPD) [19] and the piecewise linear stretching function (PLSF) [20] are used in LDW, while sliding-window methods are used in PED, VD, and TSR. An exhaustive ROI search would be time-consuming, so various optimizations have also been applied to ROI selection. Broggi et al. [21] use morphological characteristics of objects together with distance information, and Uijlings et al. [22] propose a selective search approach to generate ROIs efficiently. For feature extraction, various manually designed features such as the Scale-Invariant Feature Transform (SIFT) [23], Histogram of Oriented Gradients (HOG) [24], and Haar features [25] have been widely used in detection tasks. For classification, ensembles of simple classifiers such as AdaBoost [26] and support vector machines (SVMs) [27] are popular companions to these traditional features. Part-based methodologies have also appeared to reduce the complexity of the overall task; for example, Felzenszwalb et al. [28] propose the deformable part model (DPM), which breaks objects down into simple parts.

2.3. The rise of convolutional neural network (CNN)

Figure 5. A typical CNN architecture.

In recent years, the rise of CNN has set off a revolution in the area of object detection. A typical CNN consists of a number of layers that run in sequence, as shown in Figure 5. The convolutional layer (CONV layer) and the fully-connected layer (FC layer) are the two essential layer types in a CNN, complemented by optional layers such as pooling layers for down-sampling and normalization layers. The first CONV layer takes an input image and outputs a series of feature maps; the following CONV layers extract features at progressively higher levels, layer by layer, by convolving their input feature maps with filters. After the CONV layers, FC layers classify the extracted features and output the probability of each category to which the input image might belong.

State-of-the-art CNN models have achieved outstanding performance in computer vision. Take image classification as an example: in 2012, Krizhevsky et al. introduced AlexNet [29], an 8-layer CNN model that achieved 84.7% top-5 accuracy on ImageNet [30], far beyond the performance of conventional algorithms. In the five years since, many organizations, including Google [31][32][33][34], Oxford [35], and Microsoft [36], have focused on novel CNN model designs with more complex computing patterns, and the accuracy of the top models has already surpassed the level of human vision [37].

The excellent performance of CNNs stems from the fact that the generic descriptors extracted by a CNN trained on large-scale datasets are much richer than traditional manually designed features, and can be reused for various tasks with some fine-tuning [38]. Hence, for object detection problems, CNN-based algorithms achieve much better performance than traditional ones.

The workflows of the different detection algorithms are shown in Fig. 6. R-CNN was proposed first [39]. It generates a set of region proposals with selective search, warps or crops each region to a fixed size, extracts feature maps with CONV layers, and finally completes the classification with FC layers and SVMs. Since R-CNN must run the CONV layers for every region proposal, which is computationally very expensive, SPP-net was introduced [40]. It computes the CONV layers only once and uses spatial pyramid pooling to transform the feature maps into fixed-length vectors for the FC layers. Building on SPP-net, Fast R-CNN was designed by Girshick et al. [41]; it uses a multi-task loss to train the classifier and the bounding-box (BB) localizers jointly, applying single-sized ROI pooling to the feature maps of the last CONV layer, onto which the region proposals are projected. Then Ren et al. [42] proposed Faster R-CNN, which uses a region proposal network (RPN), itself effectively a Fast R-CNN network, to generate region proposals, removing the large computational cost of traditional region proposal methods, and reuses the Fast
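The traditional detection pipeline described above (exhaustive ROI search, manually designed features such as HOG, and a simple classifier such as an SVM) can be sketched in a few lines of NumPy. This is a deliberately simplified illustration, not the algorithms of [24] or [27]: `hog_like_features` collapses HOG to a single orientation histogram without cells or block normalization, and the random weight vector merely stands in for a trained linear SVM.

```python
import numpy as np

def hog_like_features(patch, n_bins=9):
    """Very simplified HOG-style descriptor: one unsigned-orientation
    histogram over the whole patch, weighted by gradient magnitude."""
    gy, gx = np.gradient(patch.astype(float))
    mag = np.hypot(gx, gy)
    ang = np.mod(np.arctan2(gy, gx), np.pi)            # unsigned gradients
    bins = np.minimum((ang / np.pi * n_bins).astype(int), n_bins - 1)
    hist = np.bincount(bins.ravel(), weights=mag.ravel(), minlength=n_bins)
    return hist / (np.linalg.norm(hist) + 1e-6)        # L2 normalization

def sliding_window_detect(image, w, b, win=32, stride=16, thresh=0.0):
    """Exhaustive ROI search: score every window with a linear
    classifier (w, b), as an SVM trained on HOG features would."""
    detections = []
    H, W = image.shape
    for y in range(0, H - win + 1, stride):
        for x in range(0, W - win + 1, stride):
            f = hog_like_features(image[y:y + win, x:x + win])
            score = float(w @ f + b)
            if score > thresh:
                detections.append((x, y, win, win, score))
    return detections

# Toy usage: random weights stand in for a trained SVM.
rng = np.random.default_rng(0)
image = rng.random((96, 96))
w, b = rng.normal(size=9), 0.0
dets = sliding_window_detect(image, w, b)
```

The nested loops make the quadratic cost of an exhaustive search visible, which is exactly what the ROI-selection optimizations of [21] and [22] try to avoid.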

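The CONV, pooling, and FC stages described above can be made concrete with a minimal NumPy forward pass. This is an illustrative sketch under stated assumptions (random untrained weights, a toy 12x12 single-channel input), not a real architecture such as AlexNet: one valid convolution produces feature maps, ReLU and 2x2 max-pooling down-sample them, and a fully-connected layer followed by softmax outputs per-category probabilities.

```python
import numpy as np

def conv2d(x, filters):
    """Valid convolution of a (H, W, C_in) input with a
    (k, k, C_in, C_out) filter bank -> (H-k+1, W-k+1, C_out) maps."""
    H, W, _ = x.shape
    k, _, _, Cout = filters.shape
    out = np.zeros((H - k + 1, W - k + 1, Cout))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            patch = x[i:i + k, j:j + k, :]
            out[i, j] = np.tensordot(patch, filters, axes=([0, 1, 2], [0, 1, 2]))
    return out

def max_pool(x, s=2):
    """Non-overlapping s x s max-pooling for down-sampling."""
    H, W, C = x.shape
    H2, W2 = H // s * s, W // s * s
    return x[:H2, :W2].reshape(H2 // s, s, W2 // s, s, C).max(axis=(1, 3))

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

rng = np.random.default_rng(0)
image = rng.random((12, 12, 1))                 # toy input image
conv_f = rng.normal(size=(3, 3, 1, 4)) * 0.1    # 4 random 3x3 filters
fc_w = rng.normal(size=(100, 3)) * 0.1          # FC weights for 3 classes
fc_b = np.zeros(3)

fmap = np.maximum(conv2d(image, conv_f), 0)     # CONV layer + ReLU
fmap = max_pool(fmap)                           # pooling: (10,10,4) -> (5,5,4)
probs = softmax(fmap.reshape(-1) @ fc_w + fc_b) # FC layer -> probabilities
```

Stacking more CONV/pooling stages before the FC layers is what lets deeper models extract progressively higher-level features.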

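Fast R-CNN's single-sized ROI pooling, mentioned above, can also be sketched directly: whatever the size of the projected region proposal, max-pooling over a fixed grid yields a vector of constant length for the FC layers. The function below is a simplified stand-in, not the paper's implementation; it assumes integer ROI coordinates on a channels-last feature map, with the region at least as large as the output grid.

```python
import numpy as np

def roi_pool(feature_map, roi, out_size=2):
    """Single-sized ROI pooling: split the region into an
    out_size x out_size grid and max-pool each cell per channel."""
    x0, y0, x1, y1 = roi                  # region projected onto the map
    region = feature_map[y0:y1, x0:x1, :]
    H, W, C = region.shape                # assumes H, W >= out_size
    ys = np.linspace(0, H, out_size + 1).astype(int)
    xs = np.linspace(0, W, out_size + 1).astype(int)
    out = np.zeros((out_size, out_size, C))
    for i in range(out_size):
        for j in range(out_size):
            cell = region[ys[i]:ys[i + 1], xs[j]:xs[j + 1], :]
            out[i, j] = cell.max(axis=(0, 1))
    return out

# Two differently sized ROIs pool to the same fixed-length vector.
rng = np.random.default_rng(0)
feat = rng.random((8, 8, 3))              # shared CONV feature map
v1 = roi_pool(feat, (0, 0, 4, 6)).ravel() # 4x6 proposal
v2 = roi_pool(feat, (2, 1, 8, 5)).ravel() # 6x4 proposal
```

This fixed-length output is what lets the CONV layers run once over the whole image while each proposal still feeds standard FC layers.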


© International Telecommunication Union