Page 36 - First special issue on The impact of Artificial Intelligence on communication networks and services
ITU JOURNAL: ICT Discoveries, Vol., March
[62]. It can realize a MAC operation through the summation of currents from different memristor branches. This avoids data movement and can save energy. Recent simulation works such as ISAAC [63] and PRIME [64] have evaluated the efficiency of memristors in CNN acceleration.

An ideal ADAS system should be able to offer a computing speed of over 200 GOPS with no more than 40 W, and hence we can mark the sweet zone for ADAS systems as the red painted area in Fig. 7. Inside this sweet zone, we can sketch a development route for reconfigurable processors for RTAV acceleration, shown as the dark red curve. Starting from the FPGA design, we can climb up through logic hardening to an efficiency of above 1 TOPS/W. With the help of next-generation memory technology, the bandwidth can be broadened and the memory access cost reduced, which can lead to an even higher efficiency of more than 10 TOPS/W. We use the yellow star to indicate our target in Fig. 7. With a larger die size, a throughput of over 100 TOPS could be expected, which can be a suitable choice for an ideal RTAV system.

3.2. Chances and challenges for reconfigurable processors

In the area of RTAV, chances and challenges coexist for a wide application of reconfigurable processors. The following features of reconfigurable processors will bring them opportunities:

1) Programmability. Reconfigurable processors can offer a pool of logic and memory resources on-chip. Considering the fast-evolving RTAV algorithms, it is not hard for users to update the on-chip functions after they have bought the device from the supplier.

2) Reliability. For example, industrial-grade FPGAs can work stably in a temperature range of −40 °C to 100 °C. This makes them able to satisfy the requirements of the AEC-Q100 and ISO 26262 standards.

3) Low-power. The power consumption of reconfigurable processors is no more than 30 W. Low power consumption is suitable for the in-car environment.

4) Low-latency. Since algorithms mapped onto reconfigurable processors provide deterministic timing, they can offer a latency of several nanoseconds, which is one order of magnitude faster than GPUs. A quick reaction of ADAS systems is essential for dealing with sudden changes on the road.

5) Interfaces. Unlike GPUs, which can only communicate through the PCI Express protocol, both ASIC and FPGA designs can provide huge interface flexibility, which can be very helpful for ADAS system integration.

6) Customizable logic. Recently there has been great progress in the area of model compression, including data quantization and sparsity exploration. For general-purpose processors like CPUs and GPUs, only fixed data types are supported and the memory access pattern is regular. Reconfigurable processors can offer fine-grained customizability, which can support data types as narrow as 1 bit, and specialized controllers could be introduced to deal with irregular sparsity inside the models.

7) Multi-thread processing. For ADAS systems, it would be best for different algorithms to be processed simultaneously; for example, LDW would work on grayscale images while PD would process RGB images. Reconfigurable processors can provide vast spatial parallelism for algorithms to work in individual channels.

However, challenges remain for the wide use of reconfigurable processors, such as:

1) Programming language gap: Most developers use high-level programming languages to build their projects, while for reconfigurable processors they need to start from the bottom-level hardware and describe the logic in a register-transfer level (RTL) hardware description language (HDL) such as Verilog or VHDL.

2) Limited on-chip resources: There is limited area for the on-chip arithmetic and memory resources onto which the tiled algorithm is spatially mapped. This might form a bottleneck for some large-scale algorithms.

3) Limited off-chip bandwidth: When reconfigurable processors communicate with off-chip memories like DDR, the bandwidth is often limited by the clock frequency of the controller and the width of the data wires.

3.3. Related reconfigurable processors

There have been many excellent reconfigurable processor designs for deep learning models. Initial designs are mostly based on FPGAs. Chakaradhar et al. [65] proposed a runtime reconfigurable architecture for CNN on FPGA with dedicated switches to deal with different CNN layers. Zhang et al. [66] used a nested loop model to describe CNN and designed the on-chip architecture based on high-level synthesis optimizations. Suda et al. [67] presented an OpenCL-based FPGA accelerator with fully-connected layers also implemented on-chip.

ASIC-based reconfigurable processors have appeared in recent years. The representative work is Diannao [55] and its subsequent series [68][69][70], which focused great efforts on memory system optimization. Eyeriss [56] focused on dataflow optimization and used smaller PEs to form a coarse-grained computing array. ENVISION [57] utilized a dynamic-voltage-accuracy-frequency-scaling (DVAFS) method to enhance its efficiency and reached 10 TOPS/W with low voltage supply. Google's TPU [58] has been the recent star, with large on-chip memories, and has reached a throughput similar to that of peer GPUs while drawing much less energy.

Most of these preceding reconfigurable processors have their own features with partial optimization of the entire flow, but few consider the entire flow of the neural network accelerator system. Therefore, the on-chip utilization rate of different CNN layers will eventually fluctuate [58], which may drag down the overall efficiency of the system, and there has been a large space left for improvement from the aspect of
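The fluctuation of on-chip utilization across CNN layers noted in the last paragraph can be illustrated with a toy model. The sketch below assumes a hypothetical processing array that handles T_N input channels and T_M output channels per pass. The tile sizes, the layer shapes and the function name are illustrative assumptions and are not taken from any of the cited designs.

```python
from math import ceil

def utilization(n_in, m_out, t_n, t_m):
    """Fraction of MAC units doing useful work for one layer (toy model).

    Assumes a hypothetical array that processes t_n input channels and
    t_m output channels per pass; partially filled passes waste units.
    """
    passes = ceil(n_in / t_n) * ceil(m_out / t_m)
    return (n_in * m_out) / (passes * t_n * t_m)

# Hypothetical layer shapes: (input channels, output channels).
LAYERS = {"conv_a": (3, 64), "conv_b": (64, 128), "conv_c": (100, 200)}

# Hypothetical array size, fixed once for the whole network.
T_N, T_M = 8, 64

report = {name: utilization(n, m, T_N, T_M) for name, (n, m) in LAYERS.items()}

for name, u in report.items():
    print(f"{name}: {u:.1%} of the array is busy")
```

In this toy model, a layer whose channel counts are multiples of the tile sizes keeps the whole array busy, while a layer with only 3 input channels leaves most of an 8-wide input tile idle. A single fixed array size therefore cannot suit every layer equally well.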
International Telecommunication Union