SHF: Small: Domain-Specific FPGAs to Accelerate Unrolled DNNs with Fine-Grained Unstructured Sparsity and Mixed Precision
Cornell University, Ithaca NY
Abstract
Artificial intelligence (AI) has become an essential part of our daily lives, revolutionizing various industries and transforming the way we interact with technology. One of the key factors behind AI's remarkable progress is the effectiveness of deep neural networks (DNNs). These complex systems, loosely modeled on the human brain, excel at processing vast amounts of data, enabling them to learn and make informed decisions. Compared to traditional computer programs, DNNs have shown superior performance in tasks such as image recognition, natural language processing, and decision-making. Unfortunately, this improved performance requires substantially more energy and computational resources. This not only increases their running costs but also limits their deployment in resource-constrained environments like battery-powered devices, hindering the broader adoption of AI systems. Interestingly, DNN computations often involve many redundant operations, a property generally termed "sparsity." This project aims to develop specialized computer chips and software programs that exploit abundant fine-grained sparsity to enhance AI performance while reducing energy consumption and computational costs. Outcomes of this research award will be integrated into educational curricula and research mentorship plans at the graduate and undergraduate levels, to educate the next generation of computer engineers on the importance of hardware/software codesign for deep learning. In addition, an outreach activity is planned to increase the participation of women in hardware development for AI.
This project focuses on the hardware acceleration of DNNs with fine-grained unstructured sparsity and mixed precision, two forms of redundancy that have yet to be exploited efficiently by existing computer chips. The research team optimizes unrolled DNN circuits on programmable hardware, starting with the current general-purpose hardware fabric of field-programmable gate arrays (FPGAs) and progressing towards DNN-optimized fabrics. A systematic benchmark-driven approach is used to specialize FPGA components for the implementation of unrolled DNN circuits. Furthermore, the team investigates more significant changes to the FPGA fabric, such as time-multiplexing and in-memory computing, to increase logic capacity and enable the deployment of larger DNNs. To extract maximum efficiency, DNN sparsification algorithms, including pruning, quantization, and parameter sharing, are codesigned with the hardware. This award is expected to result in new bit-programmable hardware architectures, DNN sparsification algorithms, and a research framework to synergistically codesign sparse DNNs and hardware. This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
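As a rough illustration of why fine-grained unstructured sparsity and reduced precision matter for unrolled circuits, the Python sketch below prunes a small weight matrix by magnitude, quantizes the survivors to a chosen bit width, and counts the weights that would still need hardware. The function names, the magnitude-based pruning rule, and the uniform symmetric quantizer are assumptions made for this example; the abstract does not specify the project's actual algorithms or cost model.

```python
def magnitude_prune(weights, sparsity):
    """Zero the smallest-magnitude fraction of weights, wherever they fall
    in the matrix (unstructured: no row, column, or block constraint)."""
    rows, cols = len(weights), len(weights[0])
    # Rank every (row, col) position by the magnitude of its weight.
    order = sorted(
        ((r, c) for r in range(rows) for c in range(cols)),
        key=lambda rc: abs(weights[rc[0]][rc[1]]),
    )
    pruned = [row[:] for row in weights]
    for r, c in order[: int(rows * cols * sparsity)]:
        pruned[r][c] = 0.0
    return pruned


def quantize(weights, bits):
    """Uniform symmetric quantization to signed integers of `bits` bits.
    Returns the integer weights and the scale needed to recover real values."""
    qmax = 2 ** (bits - 1) - 1
    largest = max(abs(w) for row in weights for w in row)
    scale = largest / qmax if largest > 0 else 1.0
    return [[round(w / scale) for w in row] for row in weights], scale


def unrolled_multiplier_count(int_weights):
    """Hypothetical cost proxy: in a fully unrolled layer each weight is a
    fixed constant in the circuit, so a zero weight needs no multiplier."""
    return sum(1 for row in int_weights for w in row if w != 0)


if __name__ == "__main__":
    layer = [
        [0.9, -0.05, 0.3, 0.01],
        [-0.02, 0.7, -0.04, 0.2],
        [0.03, -0.6, 0.08, -0.5],
        [0.4, 0.06, -0.07, 0.1],
    ]
    sparse = magnitude_prune(layer, sparsity=0.75)
    # Mixed precision: the same layer could be mapped at different bit widths.
    for bits in (8, 4):
        q, scale = quantize(sparse, bits)
        print(bits, "bits:", unrolled_multiplier_count(q), "multipliers, scale", scale)
```

In this simplified model the hardware cost tracks the number of nonzero weights and their bit width directly, which is the kind of saving that a general-purpose processor operating on dense, fixed-width data cannot capture as easily.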