Page 36 - First special issue on The impact of Artificial Intelligence on communication networks and services

ITU JOURNAL: ICT Discoveries, Vol. …, March …





[62]. It can realize a MAC operation through the summation of currents from different memristor branches. This avoids data movement and can save energy. Recent simulation works such as ISAAC [63] and PRIME [64] have evaluated the efficiency of memristors in CNN acceleration.

An ideal ADAS system should be able to offer a computing speed of over 200 GOPS with no more than 40 W, and hence we can mark the sweet zone for ADAS systems as the red painted area in Fig. 7. Inside this sweet zone, we can sketch a development route for reconfigurable processors for RTAV acceleration, shown as the dark red curve. Starting from the FPGA design, we can climb up through logic hardening to an efficiency above 1 TOPS/W. With the help of next-generation memory technology, the bandwidth can be broadened and the memory access cost reduced, which can lead to an even higher efficiency of more than 10 TOPS/W. We use the yellow star to indicate our target in Fig. 7. With a larger die size, a throughput of over 100 TOPS could be expected, which can be a suitable choice for an ideal RTAV system.

3.2. Chances and challenges for reconfigurable processors

In the area of RTAV, chances and challenges coexist for a wide application of reconfigurable processors. The following features of reconfigurable processors will bring them opportunities:

1) Programmability. Reconfigurable processors can offer a pool of logic and memory resources on-chip. Considering the fast evolving RTAV algorithms, it is not hard for users to update the on-chip functions after buying the device from the supplier.

2) Reliability. For example, industrial grade FPGAs can work stably in a temperature range from −40 °C to 100 °C. This makes them able to satisfy the requirements of the standards AEC-Q100 and ISO 26262.

3) Low-power. The power consumption of reconfigurable processors is no more than 30 W. Low power consumption is suitable for the in-car environment.

4) Low-latency. Since algorithms mapped onto reconfigurable processors provide deterministic timing, they can offer a latency of several nanoseconds, which is one order of magnitude faster than GPUs. A quick reaction of ADAS systems is essential to dealing with sudden changes on the road.

5) Interfaces. Unlike GPUs, which can only communicate through the PCI Express protocol, both ASIC and FPGA designs can provide huge interface flexibility, which can be very helpful for ADAS system integration.

6) Customizable logic. Recently there has been great progress in the area of model compression, including data quantization and sparsity exploration. General purpose processors like CPUs and GPUs support only fixed data types and regular memory access patterns. Reconfigurable processors can offer fine-grained customizability that supports data types as narrow as 1 bit, and specialized controllers could be introduced to deal with irregular sparsity inside the models.

7) Multi-thread processing. For ADAS systems, it would be best for different algorithms to be processed simultaneously; for example, LDW would work on grayscale images while PD would process RGB images. Reconfigurable processors can provide vast spatial parallelism for algorithms to work in individual channels.

However, challenges remain for the wide use of reconfigurable processors, such as:

1) Programming language gap: Most developers use high-level programming languages to build their projects, while for reconfigurable processors they need to start from the bottom-level hardware and describe the logic with a register-transfer level (RTL) hardware description language (HDL) such as Verilog or VHDL.

2) Limited on-chip resource: There is limited area for the on-chip arithmetic and memory resources onto which the tiled algorithm is spatially mapped. This might form a bottleneck for some large-scale algorithms.

3) Limited off-chip bandwidth: When reconfigurable processors communicate with off-chip memories like DDR, the bandwidth is often limited by the clock frequency of the controller and the width of the data wires.

3.3. Related reconfigurable processors

There have been many excellent reconfigurable processor designs for deep learning models. Initial designs are mostly based on FPGAs. Chakradhar et al. [65] proposed a runtime reconfigurable architecture for CNN on FPGA with dedicated switches to deal with different CNN layers. Zhang et al. [66] used a nested loop model to describe CNN and designed the on-chip architecture based on high-level synthesis optimizations. Suda et al. [67] presented an OpenCL-based FPGA accelerator with fully-connected layers also implemented on-chip.

ASIC-based reconfigurable processors have appeared in recent years. Representative work includes DianNao [55] and its subsequent series [68][69][70], which focused great efforts on memory system optimization. Eyeriss [56] focused on dataflow optimization and used smaller PEs to form a coarse-grained computing array. ENVISION [57] utilized a dynamic-voltage-accuracy-frequency-scaling (DVAFS) method to enhance its efficiency and reached 10 TOPS/W with a low voltage supply. Google's TPU [58] has been the recent star with large on-chip memories and has reached a throughput similar to that of peer GPUs while drawing much less energy.

Most of these earlier reconfigurable processors have their own features and optimize part of the flow, but few consider the entire flow of the neural network accelerator system. Therefore, the on-chip utilization rate fluctuates across different CNN layers [58], which may drag down the overall efficiency of the system, and there has been a large space left for improvement from the aspect of
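
As an illustration of the nested loop view of a CNN layer mentioned in Section 3.3, and of the tiling that the limited on-chip resources make necessary, the following minimal Python sketch may help. It is not the design of [66]: the function names, the tile sizes tm and tn, and the stride-1, no-padding setting are assumptions chosen only for illustration.

```python
def _shapes(inp, w):
    # inp: [N_in][H][W], w: [N_out][N_in][K][K] (square kernels assumed)
    n_in, h, wd = len(inp), len(inp[0]), len(inp[0][0])
    n_out, k = len(w), len(w[0][0])
    return n_in, n_out, k, h - k + 1, wd - k + 1


def _accumulate(out, inp, w, m, n, k, rows, cols):
    # Inner four loops: one input channel n contributes to one output channel m.
    for y in range(rows):
        for x in range(cols):
            for i in range(k):
                for j in range(k):
                    out[m][y][x] += w[m][n][i][j] * inp[n][y + i][x + j]


def conv_layer(inp, w):
    """Direct convolution written as a six-deep loop nest (stride 1, no padding)."""
    n_in, n_out, k, rows, cols = _shapes(inp, w)
    out = [[[0] * cols for _ in range(rows)] for _ in range(n_out)]
    for m in range(n_out):
        for n in range(n_in):
            _accumulate(out, inp, w, m, n, k, rows, cols)
    return out


def conv_layer_tiled(inp, w, tm=2, tn=2):
    """Same arithmetic, with the channel loops split into tiles of size tm x tn.

    Each (m0, n0) tile stands for the block of work that would be mapped onto
    the on-chip arithmetic and buffers at one time (hypothetical tile sizes).
    """
    n_in, n_out, k, rows, cols = _shapes(inp, w)
    out = [[[0] * cols for _ in range(rows)] for _ in range(n_out)]
    for m0 in range(0, n_out, tm):            # tile of output channels
        for n0 in range(0, n_in, tn):         # tile of input channels
            for m in range(m0, min(m0 + tm, n_out)):
                for n in range(n0, min(n0 + tn, n_in)):
                    _accumulate(out, inp, w, m, n, k, rows, cols)
    return out


if __name__ == "__main__":
    # Small integer example so that both loop orders agree exactly.
    inp = [[[c + y + x for x in range(4)] for y in range(4)] for c in range(3)]
    w = [[[[m + n + i - j for j in range(2)] for i in range(2)]
          for n in range(3)] for m in range(5)]
    assert conv_layer_tiled(inp, w) == conv_layer(inp, w)
```

Tiling changes only the order in which the multiply-accumulate operations are visited, not the result; smaller tiles need less on-chip storage at a time but imply more traffic to off-chip memory.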




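The sweet-zone figures quoted in Section 3.1 (over 200 GOPS at no more than 40 W, and efficiency targets of 1 TOPS/W and 10 TOPS/W) can be checked with simple arithmetic. The short Python sketch below is only a worked example; the helper names are hypothetical and the thresholds are the ones stated in the text.

```python
def efficiency_gops_per_watt(throughput_gops, power_w):
    """Energy efficiency as throughput divided by power."""
    return throughput_gops / power_w


def in_sweet_zone(throughput_gops, power_w, min_gops=200.0, max_power_w=40.0):
    """True if a design offers over min_gops at no more than max_power_w."""
    return throughput_gops > min_gops and power_w <= max_power_w


def power_w_at(throughput_gops, efficiency_gops_w):
    """Power needed to sustain a throughput at a given efficiency."""
    return throughput_gops / efficiency_gops_w


# Corner of the sweet zone: 200 GOPS at 40 W corresponds to 5 GOPS/W.
corner_efficiency = efficiency_gops_per_watt(200.0, 40.0)

# 100 TOPS (100,000 GOPS) at the two efficiency levels named in the text.
power_at_1_tops_w = power_w_at(100_000.0, 1_000.0)    # 1 TOPS/W
power_at_10_tops_w = power_w_at(100_000.0, 10_000.0)  # 10 TOPS/W
```

Under these figures, a 100 TOPS design would need about 100 W at 1 TOPS/W but about 10 W at 10 TOPS/W, which is why the higher efficiency target matters for staying within the 40 W budget.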
© International Telecommunication Union