CRII: SHF: Assessing and Profiling Continuous Integration for Machine Learning Applications
Regents Of The University Of Michigan - Dearborn, Dearborn MI
Abstract
This award is funded in whole or in part under the American Rescue Plan Act of 2021 (Public Law 117-2).

Continuous integration (CI) is a widely adopted software development practice for integrating code changes quickly while maintaining software quality attributes. At the same time, machine learning (ML), including deep learning (DL), is rapidly gaining popularity for solving complex problems. Like conventional software, ML applications require many iterations to improve their quality. However, iterative ML application development faces greater difficulties in adopting CI in three respects. First, developers lack a systematic understanding of how to manage ML data, models, and code in the CI workflow; for ML-based systems, defining a CI workflow is currently far more experimental. Second, existing CI systems handle ML-centric challenges poorly, such as defining evaluation conditions for ML models, formulating complicated build steps, and coping with long build and integration times. For ML applications, the build process is much more complicated because of the complex dependencies among data, models, and code. Third, data scientists lack the technical support to adopt and maintain CI configurations, owing to the complex interconnections among data, models, and code and the changing nature of ML applications. For data scientists with little or no knowledge of CI, adopting CI for ML applications is increasingly difficult, and even once adopted, it demands excessive manual effort.

The project will build knowledge about the feasibility and effectiveness of current CI adoption for ML applications. This will help in understanding the CI workflow for ML applications and in identifying areas for improvement. Moreover, the project will develop a novel CI profiling framework that generates a dependency graph among the heterogeneous artifacts (e.g., data, models, and code) of ML applications.
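The abstract does not say how the profiling framework represents or uses its dependency graph. Purely as an illustration, the Python sketch below assumes a graph whose nodes are artifacts tagged by kind (data, code, model, metrics) and whose edges mean "is built from"; the class `ArtifactGraph`, its methods, and all file names are hypothetical and are not the project's design. It shows one way such a graph could support CI build optimization: given the artifacts changed in a commit, find everything downstream of them and order only those artifacts for rebuilding.

```python
from collections import defaultdict, deque
from graphlib import TopologicalSorter  # standard library, Python 3.9+


class ArtifactGraph:
    """Hypothetical dependency graph over heterogeneous ML artifacts."""

    def __init__(self):
        self.kinds = {}                    # artifact name -> kind ("data", "code", "model", ...)
        self.consumers = defaultdict(set)  # artifact name -> artifacts built from it

    def add_artifact(self, name, kind):
        self.kinds[name] = kind

    def add_dependency(self, upstream, downstream):
        """Record that `downstream` is produced (at least in part) from `upstream`."""
        for name in (upstream, downstream):
            if name not in self.kinds:
                raise KeyError(f"unknown artifact: {name}")
        self.consumers[upstream].add(downstream)

    def affected_by(self, changed):
        """Breadth-first search for everything transitively downstream of `changed`."""
        seen = set(changed)
        queue = deque(changed)
        while queue:
            node = queue.popleft()
            for nxt in self.consumers.get(node, ()):
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        return seen

    def rebuild_order(self, changed):
        """Topologically order only the affected artifacts, upstream first."""
        affected = self.affected_by(changed)
        preds = {name: set() for name in affected}  # graphlib expects node -> predecessors
        for up, downs in self.consumers.items():
            if up in affected:
                for down in downs:
                    preds[down].add(up)  # every consumer of an affected node is affected too
        return list(TopologicalSorter(preds).static_order())


def example_graph():
    """Toy pipeline: raw data + feature code -> features -> model -> evaluation metrics."""
    g = ArtifactGraph()
    for name, kind in [
        ("data/raw.csv", "data"), ("src/features.py", "code"),
        ("data/features.parquet", "data"), ("src/train.py", "code"),
        ("models/model.pkl", "model"), ("src/evaluate.py", "code"),
        ("reports/metrics.json", "metrics"),
    ]:
        g.add_artifact(name, kind)
    for up, down in [
        ("data/raw.csv", "data/features.parquet"),
        ("src/features.py", "data/features.parquet"),
        ("data/features.parquet", "models/model.pkl"),
        ("src/train.py", "models/model.pkl"),
        ("models/model.pkl", "reports/metrics.json"),
        ("src/evaluate.py", "reports/metrics.json"),
    ]:
        g.add_dependency(up, down)
    return g
```

For example, `example_graph().rebuild_order(["src/train.py"])` returns `["src/train.py", "models/model.pkl", "reports/metrics.json"]`: a change to the training script requires retraining and re-evaluation, but not recomputing features. Real ML pipelines would also need to capture configuration, hyperparameters, and environment dependencies, which this sketch ignores.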
The resulting knowledge and framework will serve as a basis for future research on automatically generating and maintaining ML CI configurations, mining software repositories, and build optimization and monitoring systems for ML CI. This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.