Page 39 - First special issue on The impact of Artificial Intelligence on communication networks and services

ITU Journal: ICT Discoveries, Vol. [...], March [...]





platform and a peer GPU. Densebox is an end-to-end fully convolutional network (FCN) which has been widely used in face detection applications, and face detection is an essential part of in-vehicle driver status recognition, such as drowsiness detection. We have pruned the model with the method mentioned in clause 4.2 from 28 GOPs to 1.2 GOPs, with the recall rate staying the same. Table 5 shows that, with the help of pruning, our ZU9-based platform reaches 300 fps, twice the 150 fps of the 1080TI GPU running the unpruned model. The GPU also reaches 330 fps with the pruned model, but it exploits the model sparsity poorly considering that the peak performance of the 1080TI is almost 10.6 TOPS. Its efficiency of 1.32 fps/W is about 16 times lower than the 21.43 fps/W of our ZU9 FPGA, reflecting the fit between our compression methods and our hardware system.

Table 5. Evaluation results of Densebox on GPU and FPGA platforms.

Platform              NVIDIA GTX 1080TI GPU              Xilinx ZU9 FPGA
Input size            640x360       640x360              640x360
Task                  Densebox      Densebox Pruned      Densebox Pruned
Operations (GOPs)     28            1.2                  1.2
fps                   150           330                  300
Power (W)             250           250                  14
Efficiency (fps/W)    0.60          1.32                 21.43
Recall                0.875         0.875                0.875

4.5. Tingtao: an ASIC-based reconfigurable accelerator

The development of our ASIC-based reconfigurable accelerator, Tingtao, is on schedule. The PS of Tingtao is an ARM Cortex-A5 processor, and the PL includes two deep-learning processing units (DPUs), each containing 2048 MAC PEs and working at 500 MHz. Some necessary interfaces for RTAV applications are also integrated. Tingtao uses a 28 nm CMOS technology and is projected to provide a peak performance of 4 TOPS at a power of 3 W, which is slightly better than the EyeQ4 product. With the compression method and compiling optimization introduced, the performance could get even better. As shown in Fig. 7, Tingtao fills the sparsely populated region of 1 to 10 W power and TOPS-level throughput. We are also planning a larger design for our next version, and we will put effort into the ongoing research on implementing emerging memory technology, based on our previous work [64], as the target of our development route.

5. CONCLUSION

This article has reviewed the algorithms for RTAV applications of ADAS, presented a comparative analysis of different types of platforms, and enumerated the opportunities and challenges for reconfigurable RTAV platforms. We have introduced the software-hardware co-design workflow for our reconfigurable RTAV system, with detailed hardware architecture design and implemented compression methods, which ensure efficient execution with programmability. Evaluation shows that our system achieves the best efficiency among peer processors with satisfactory real-time processing performance. An ASIC-based solution can improve efficiency further, delivering throughput similar to that of the FPGA-based Aristotle system at an energy cost one order of magnitude lower.

Other deep learning models are also used in RTAV applications. The recurrent neural network (RNN) is one of them, and the long short-term memory (LSTM) model [77] shows excellent performance in classifying, processing and predicting time series. This feature can be helpful for object tracking and action prediction functions in ADAS systems. We have not expanded on this topic in this article, but we have already released a similar design based on our Aristotle system framework [78], which has proved capable of processing various deep learning models.

Future RTAV processors need to offer a throughput of 10-100 TOPS at less than 30 W of power. To realize this, we could count on the rapid development of workload compression, such as extreme low-bitwidth CNNs [79][80][81][82] and novel pruning ideas [83][84]; hardware design, such as dataflow optimization [85][86] and sparsity-supporting architectures [87][88]; and the implementation of emerging memory technology [60][89]. We are confident that, with all of the above, reconfigurable products will thrive in the ADAS market.

REFERENCES

[1] E. D. Dickmanns and V. Graefe, “Dynamic monocular machine vision,” Machine Vision and Applications, vol. 1, no. 4, pp. 223–240, 1988.

[2] C. Thorpe, M. H. Hebert, T. Kanade, and S. A. Shafer, “Vision and navigation for the Carnegie-Mellon Navlab,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 10, no. 3, pp. 362–373, 1988.

[3] E. D. Dickmanns and B. D. Mysliwetz, “Recursive 3-D road and relative ego-state recognition,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 14, no. 2, pp. 199–213, 1992.

[4] J. Manyika, M. Chui, J. Bughin, R. Dobbs, P. Bisson, and A. Marrs, Disruptive technologies: Advances that will transform life, business, and the global economy. McKinsey Global Institute San Francisco, CA, 2013, vol. 180.

[5] J. Barbaresso, G. Cordahi, D. Garcia, C. Hill, A. Jendzejec, and K. Wright, “USDOT’s intelligent transportation systems (ITS) ITS strategic plan 2015-2019,” Tech. Rep., 2014.

[6] Cabinet of Japan, “Statement on ‘forging the world-leading IT nation’,” Tech. Rep., 2013.
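
The efficiency figures in Table 5 and the projected peak throughput of Tingtao in clause 4.5 follow from simple arithmetic. The minimal Python sketch below reproduces them from the published numbers. The helper names are hypothetical, and counting each MAC as two operations per cycle (one multiply, one accumulate) is an assumption, since the article does not state how the 4 TOPS figure is derived.

```python
# Figures are copied from Table 5 and clause 4.5; helper names are hypothetical.


def efficiency(fps: float, power_w: float) -> float:
    """Frames per second delivered per watt."""
    return fps / power_w


def peak_tops(units: int, macs_per_unit: int, clock_hz: float,
              ops_per_mac: int = 2) -> float:
    """Peak throughput in TOPS.

    ops_per_mac=2 is an assumption: one multiply plus one accumulate
    per MAC per clock cycle.
    """
    return units * macs_per_unit * clock_hz * ops_per_mac / 1e12


# Table 5: (fps, power in W) for each platform/model pair.
table5 = {
    "GPU, Densebox": (150, 250),
    "GPU, Densebox Pruned": (330, 250),
    "ZU9 FPGA, Densebox Pruned": (300, 14),
}
eff = {name: efficiency(fps, watts) for name, (fps, watts) in table5.items()}

for name, value in eff.items():
    print(f"{name}: {value:.2f} fps/W")

# Efficiency gap between FPGA and GPU when both run the pruned model.
fpga_vs_gpu = eff["ZU9 FPGA, Densebox Pruned"] / eff["GPU, Densebox Pruned"]
print(f"FPGA vs GPU efficiency on the pruned model: {fpga_vs_gpu:.1f}x")

# Clause 4.5: two DPUs, 2048 MAC PEs each, 500 MHz, projected power of 3 W.
tingtao_tops = peak_tops(units=2, macs_per_unit=2048, clock_hz=500e6)
print(f"Tingtao peak: {tingtao_tops:.3f} TOPS, {tingtao_tops / 3:.2f} TOPS/W")
```

Under that assumption, two DPUs of 2048 MACs at 500 MHz give 4.096 TOPS, consistent with the projected 4 TOPS, and the ratio of 21.43 fps/W to 1.32 fps/W matches the roughly 16-fold efficiency gap quoted in the text.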




© International Telecommunication Union