An Energy-Efficient, CMOS-based, and Scalable Mixed-Signal DNN System with Reconfigurable Crossbars
Wayne State University, Detroit MI
Investigators
Abstract
Deep neural networks have demonstrated high accuracy in a broad spectrum of applications, which are typically implemented on the cloud. However, it has become necessary to bring computation close to the data sources to address the data privacy, system latency, and energy consumption concerns, which are critical in applications such as healthcare, autonomous vehicles, and Internet-of-Things. Another arising issue relates to the typical use of digital systems to perform the necessary computations, but digital systems are fundamentally limited in handling big data efficiently. Although analog computation emerged as a promising alternative, which can outperform its digital counterpart by several orders of magnitudes in energy efficiency and computational speed, this alternative faces many challenges. First, emerging analog memories are unreliable, immature, and incompatible with standard CMOS technologies. Second, designing customizable and scalable analog deep neural networks that can support a broad range of deep neural networks is harder than the digital counterpart, leading to longer design cycles and higher costs. The broad goal of this project is to address these challenges by using innovative research directions in both analog and digital designs at the memory technology, circuit, architecture, and system levels, thereby enabling high scalability and great flexibility for various applications. This project will first develop a specialized electronic chip with a novel reconfigurable architecture and a new memory technology, utilizing a software-hardware co-design approach to achieve the necessary optimizations. Subsequently, this project will build a scalable system utilizing multiple such chips to adapt to the needs of even larger neural networks. This project will likely impact the design of future hardware accelerators because it improves energy efficiency, enhances computational speed, and enriches the functionality of deep neural networks while unlocking new capabilities of analog computations and allowing a massive deployment of edge devices. The educational aspects include filling the gap in the electrical and computer engineering curriculum related to deep neural network hardware implementation and providing various skills to participating students. The extensive outreach activities include developing and teaching two summer project-oriented courses in machine learning and circuit design for K-12 students in metro Detroit. This project proposes multiple innovations at the technology, circuit, architecture, and system levels. At the memory technology level, we propose a new, CMOS-based, analog multi-stable memory circuit that offers compatibility with crossbar systems, provides an analog solution to store the weights, and performs local computations in a highly parallel and energy-efficient manner. At the circuit level, we propose a local storage and processing unit, which employs the proposed analog memory to perform multiply-and-accumulate operations, utilizing only one memory unit to support both positive and negative weights. At the architecture level, we propose a novel reconfigurable architecture to construct a scalable and highly customizable inference engine. In this architecture, crossbar cells can be expanded horizontally and/or vertically using switch matrix circuits, thereby improving the utilization and reducing the power consumption compared to the fixed-size and separate crossbar arrays. Moreover, we propose the usage of a pool of reconfigurable digital-to-analog converters and analog-to-digital converters, which are well suited for the reconfigurable architecture, and will optimize their bit resolutions for each deep neural network layer. Furthermore, the proposed inference engine employs a low-power open-source processor (specifically RISC-V) to enable high flexibility and provide various functions, including the management of the data flow between layers, configuration, and calibration. To minimize the fetch/decode energy consumption, custom digital circuits will be developed to support non-linear and pooling functions. To account for process variations, we will exploit built-in-self-test and built-in-self-calibration techniques to test and calibrate all the analog circuits, utilizing on-chip circuits and configuration registers. Finally, at the system level, multiple chips will be interconnected through PCI Express to support larger models. This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
View original record on NSF Award Search →