CAREER: Scalable and Adaptable Sparsity-driven Methods for more Efficient AI Systems

$550,306 | FY2023 | CSE | NSF

San Jose State University Foundation, San Jose CA

Abstract

Artificial Intelligence (AI) and, in particular, Deep Neural Networks (DNNs) have achieved better-than-human accuracy on many cognitive tasks involving images, natural language processing, and protein structure, among others. Unfortunately, due to high data processing demands, AI systems are typically run on power-hungry specialized computing hardware. Quantization, the approximation of values with lower-precision numerical representations, has been used to reduce computing requirements. However, fixed low-bit-width DNNs may suffer losses in accuracy due to quantization errors. Many existing software solutions for quantization are also fixed or limited in bit-width choices. To address this trade-off and leverage data sparsity, the research team will investigate state-of-the-art methods and develop novel data quantization, encoding, and compression algorithms to integrate with existing AI systems. The methods developed have the potential not only to improve performance but also to reduce power requirements and boost the energy efficiency of AI systems. They will enable AI applications such as DNN inference on small devices, thus reducing the load on cloud infrastructure, improving user experience, providing data privacy, and avoiding security risks. The work proposed in this project has the potential to push the boundaries in many AI applications that run on devices with constrained energy storage, such as smart sensing, wearable devices, and autonomous driving. The research and educational tools will facilitate and increase student and research community participation in advancing AI research.

The research goal of this project is to investigate quantization and compression methods that can leverage sparsity and improve efficiency in AI systems. The principal investigator (PI) plans to study adaptable quantization and compression methods that leverage sparsity in AI systems while minimizing both the overhead in non-sparse situations and the loss in accuracy. The trade-off between accuracy and performance with the proposed methods will be studied and defined for automated, tunable prioritization of accuracy, performance, or energy efficiency. The PI plans to develop a prototype with parallel execution of the proposed methods to make them effective for data centers and advanced hardware architectures. The proposed methods will be packaged into an AI vector primitives library that will be integrated with several popular Deep Learning frameworks as proof of concept, primarily targeting GPU and CPU systems. An integration API will be developed for frameworks like PyTorch or TensorFlow to allow easy integration with other vector primitives. Software libraries will be integrated with a web-based learning platform with automated feedback and a motivating environment to encourage student participation in solving AI challenges. This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
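
To make the bit-width trade-off mentioned in the abstract concrete, here is a minimal sketch of symmetric uniform quantization in plain Python. It is an illustration only and not the project's method: the function names (`quantize`, `dequantize`, `sparsity`, `mean_squared_error`), the per-tensor scaling choice, and the Gaussian test data are all assumptions made for this example. It shows that lowering the bit-width maps more values to zero (higher sparsity) while increasing the reconstruction error.

```python
import random


def quantize(values, bits):
    """Symmetric uniform quantization to signed `bits`-bit integer codes.

    Returns (codes, scale) such that value is approximately code * scale.
    Hypothetical helper for illustration, using one scale for the whole list.
    """
    if bits < 2:
        raise ValueError("bits must be at least 2 for a signed code")
    qmax = 2 ** (bits - 1) - 1  # e.g. 127 for 8 bits, 7 for 4 bits
    peak = max(abs(v) for v in values)
    scale = peak / qmax if peak > 0 else 1.0
    # Round to the nearest code and clamp to the representable range.
    codes = [max(-qmax, min(qmax, round(v / scale))) for v in values]
    return codes, scale


def dequantize(codes, scale):
    """Map integer codes back to approximate real values."""
    return [c * scale for c in codes]


def sparsity(codes):
    """Fraction of codes that are exactly zero."""
    return sum(1 for c in codes if c == 0) / len(codes)


def mean_squared_error(original, restored):
    """Average squared difference between original and restored values."""
    return sum((a - b) ** 2 for a, b in zip(original, restored)) / len(original)


if __name__ == "__main__":
    # Synthetic stand-in for a layer's weights (an assumption, not real data).
    rng = random.Random(0)
    weights = [rng.gauss(0.0, 1.0) for _ in range(10_000)]
    for bits in (8, 4, 2):
        codes, scale = quantize(weights, bits)
        err = mean_squared_error(weights, dequantize(codes, scale))
        print(f"{bits}-bit: sparsity={sparsity(codes):.3f}  mse={err:.5f}")
```

Running the script prints one line per bit-width. An adaptable scheme of the kind the abstract describes would choose the bit-width per tensor or per workload rather than fixing it in advance; this sketch only exposes the quantities such a choice would weigh.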
