CAREER: Rethink and Redesign of Analytics Databases for Machine Learning Model Serving

$547,584 | FY2022 | CSE | NSF

Arizona State University, Scottsdale AZ

Abstract

There is an urgent need to apply artificial intelligence to interactive applications, such as supply-chain prediction, credit card fraud detection, customer service chatbots, emergency response, and healthcare consulting. Databases manage a significant portion of the data for these applications. However, because existing databases lack support for deep neural network inference, artificial intelligence is usually provided by a separate machine learning process. As a result, data transfer between the decoupled systems significantly increases latency, making it challenging to meet the time constraints of interactive applications. Such decoupling also complicates application development and system management. This research will enable native deep neural network model inference inside databases to eliminate cross-system overheads. It will provide a unified representation to bridge the gap between data queries and deep neural network models. Ultimately, the project will deliver a novel database system to facilitate interactive intelligent applications. The research will support a Big Data Magic Week activity for K-12 underrepresented students and refugee youth in Arizona. It will serve as a platform to prepare selected undergraduate students at Arizona State University for international research competitions. It will also be integrated into a graduate-level course on data-intensive systems for machine learning at Arizona State University.

The research objective is to rethink and redesign analytics databases to unify data queries and machine learning model inference, particularly deep neural network model inference. The investigator divides this aim into three synergistic research thrusts. First, the research will develop methods to bridge machine learning inference and relational algebra processing through a unified two-level intermediate representation. This representation will support the progressive lowering of machine learning models of all scales into relational algebra expressions and flexible yet analyzable functions. Moreover, it will facilitate multi-objective co-optimization of data queries and model inference to balance latency, accuracy, and resource utilization. Second, the project will provide ahead-of-time code generation to reduce the latency of model inference. The generated code will allow runtime physical optimizations, such as materialization of intermediate data and batch size tuning, that adapt to dynamic query frequencies. Third, the research will provide accuracy-aware storage optimizations by indexing and clustering tensor blocks based on their magnitudes, similarity, and other model-specific properties. This research will establish new connections between query processing and model inference. If successful, the project will dramatically reduce end-to-end latency for a broad class of time-critical, data-intensive artificial intelligence applications.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
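As a rough illustration of what lowering a model into relational algebra can mean, the sketch below expresses one dense layer, y = Wx, as a join followed by a grouped sum in SQLite. It is not the project's intermediate representation. The table names (`w`, `x`), column names, and function names are all hypothetical, chosen only for this example.

```python
import sqlite3


def dense_layer_via_sql(weights, inputs):
    """Compute y = W x using a relational join and a grouped sum.

    weights: list of rows, each a list of floats (shape m x n)
    inputs:  list of n floats
    All table and column names here are illustrative assumptions.
    """
    conn = sqlite3.connect(":memory:")
    # Store the weight matrix as (row, column, value) triples.
    conn.execute("CREATE TABLE w (row_id INTEGER, col_id INTEGER, val REAL)")
    # Store the input vector as (column, value) pairs.
    conn.execute("CREATE TABLE x (col_id INTEGER, val REAL)")
    conn.executemany(
        "INSERT INTO w VALUES (?, ?, ?)",
        [(i, j, v) for i, row in enumerate(weights) for j, v in enumerate(row)],
    )
    conn.executemany("INSERT INTO x VALUES (?, ?)", list(enumerate(inputs)))
    # Join on the shared column index, multiply, then sum per output row.
    rows = conn.execute(
        "SELECT w.row_id, SUM(w.val * x.val) "
        "FROM w JOIN x ON w.col_id = x.col_id "
        "GROUP BY w.row_id ORDER BY w.row_id"
    ).fetchall()
    conn.close()
    return [total for _, total in rows]


def dense_layer_reference(weights, inputs):
    """Plain Python matrix-vector product, used to check the SQL version."""
    return [sum(w * x for w, x in zip(row, inputs)) for row in weights]


W = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
x = [10.0, 1.0]
result = dense_layer_via_sql(W, x)
print(result)
```

Once the layer is written as a join and an aggregation, a query optimizer can in principle treat it like any other relational operator, which is the kind of co-optimization the first thrust describes.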
