Page 39 - First special issue on The impact of Artificial Intelligence on communication networks and services
ITU JOURNAL: ICT Discoveries, Vol. [...], March [...]
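The efficiency row of Table 5 on this page is throughput (fps) divided by power (W). The short sketch below is not from the original article; it only recomputes that row and the roughly 16-fold efficiency gap between the FPGA and the GPU on the pruned model. The numbers are those reported in Table 5, while the dictionary layout and all names are illustrative assumptions.

```python
# Figures copied from Table 5; the data layout and names are hypothetical.
RESULTS = {
    "GTX 1080TI, Densebox": {"fps": 150, "power_w": 250},
    "GTX 1080TI, Densebox Pruned": {"fps": 330, "power_w": 250},
    "ZU9 FPGA, Densebox Pruned": {"fps": 300, "power_w": 14},
}


def efficiency(fps, power_w):
    """Frames per second delivered per watt of power."""
    return fps / power_w


# Efficiency (fps/W) for every platform/task pair in the table.
EFFICIENCY = {name: efficiency(**row) for name, row in RESULTS.items()}

# Efficiency advantage of the FPGA over the GPU on the same pruned model.
FPGA_VS_GPU = (
    EFFICIENCY["ZU9 FPGA, Densebox Pruned"]
    / EFFICIENCY["GTX 1080TI, Densebox Pruned"]
)

if __name__ == "__main__":
    for name, value in EFFICIENCY.items():
        print(f"{name}: {value:.2f} fps/W")
    print(f"FPGA vs GPU on the pruned model: {FPGA_VS_GPU:.1f}x")
```

Rounded to two decimals, the three efficiencies agree with the table, and the ratio is consistent with the "16 times" figure quoted in the text.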
Table 5. Evaluation results of Densebox on GPU and FPGA platforms.

Platform             NVIDIA GTX 1080TI GPU                Xilinx ZU9 FPGA
Input Size           640x360          640x360             640x360
Task                 Densebox         Densebox Pruned     Densebox Pruned
Operations (GOPs)    28               1.2                 1.2
fps                  150              330                 300
Power (W)            250              250                 14
Efficiency (fps/W)   0.60             1.32                21.43
Recall               0.875            0.875               0.875

platform and a peer GPU. Densebox is an end-to-end fully convolutional network (FCN) which has been widely used in face detection applications, and face detection is an essential part of in-vehicle driver status recognition, such as drowsiness detection. We have pruned the model with the method mentioned in clause 4.2 from 28 GOPs to 1.2 GOPs, with the recall rate staying the same. Table 5 shows that with the help of pruning, our ZU9-based platform can reach twice the speed of the 1080TI GPU running the unpruned model. The GPU can also reach 330 fps with the pruned model, but it exploits the model sparsity poorly considering that the peak performance of the 1080TI is almost 10.6 TOPS, which results in an efficiency 16 times worse than that of our ZU9 FPGA, reflecting the fit between our compression methods and our hardware system.

4.5. Tingtao: an ASIC-based reconfigurable accelerator

Our ASIC-based reconfigurable accelerator Tingtao is already on schedule. The PS of Tingtao is an ARM Cortex-A5 processor, and the PL includes two deep-learning processing units (DPUs), each containing 2048 MAC PEs and working at 500 MHz. Some necessary interfaces for RTAV applications are also integrated. Tingtao uses a 28 nm CMOS technology and is projected to provide a peak performance of 4 TOPS at a power of 3 W, which is slightly better than the EyeQ4 product. With the compression method and compiling optimization introduced, the performance could get even better. As shown in Fig. 7, Tingtao fills the sparse area of 1 to 10 W of power and TOPS-level throughput. We are also planning to try a larger design for our next version, and we will invest effort in the ongoing research on implementing emerging memory technology, based on our previous work [64], as the target of our development route.

5. CONCLUSION

This article has reviewed the algorithms for RTAV applications of ADAS, given a comparative analysis of different types of platforms, and enumerated the opportunities and challenges for reconfigurable RTAV platforms. We have introduced the software-hardware co-design workflow for our reconfigurable RTAV system, with detailed hardware architecture design and implemented compression methods, which ensure an efficient execution with programmability. Evaluation shows that our system can get the best efficiency among peer processors with a satisfying real-time processing performance. An ASIC-based solution can further exploit the efficiency, which means a throughput similar to that of the FPGA-based Aristotle system at an energy cost one order of magnitude lower.

There are some other deep learning models utilized in RTAV applications. The recurrent neural network (RNN) is one of them, and the long short-term memory (LSTM) model [77] shows excellent performance in classifying, processing and predicting time series. This feature can be helpful for object tracking and action prediction functions in ADAS systems. We have not expanded on this topic in this article, but we have already released a similar design based on our Aristotle system framework [78], which has proved the capability of processing various deep learning models.

Future RTAV processors need to offer a throughput of 10-100 TOPS with less than 30 W of power, and to realize this we could count on the rapid development of workload compression such as extremely low-bitwidth CNNs [79][80][81][82] and novel pruning ideas [83][84], hardware design such as dataflow optimization [85][86] and sparsity-supporting architectures [87][88], and emerging memory technology implementation [60][89]. We are confident that with all those mentioned above, the reconfigurable products will thrive in the ADAS market.

REFERENCES

[1] E. D. Dickmanns and V. Graefe, “Dynamic monocular machine vision,” Machine Vision and Applications, vol. 1, no. 4, pp. 223–240, 1988.

[2] C. Thorpe, M. H. Hebert, T. Kanade, and S. A. Shafer, “Vision and navigation for the Carnegie-Mellon Navlab,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 10, no. 3, pp. 362–373, 1988.

[3] E. D. Dickmanns and B. D. Mysliwetz, “Recursive 3-d road and relative ego-state recognition,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 14, no. 2, pp. 199–213, 1992.

[4] J. Manyika, M. Chui, J. Bughin, R. Dobbs, P. Bisson, and A. Marrs, Disruptive technologies: Advances that will transform life, business, and the global economy. McKinsey Global Institute San Francisco, CA, 2013, vol. 180.

[5] J. Barbaresso, G. Cordahi, D. Garcia, C. Hill, A. Jendzejec, and K. Wright, “USDOT’s intelligent transportation systems (ITS) ITS strategic plan 2015-2019,” Tech. Rep., 2014.

[6] Cabinet of Japan, “Statement on ‘forging the world-leading IT nation’,” Tech. Rep., 2013.
International Telecommunication Union