EAGER: A Python Program Analysis Infrastructure to Facilitate Better Data Processing
Purdue University, West Lafayette IN
Investigators
Abstract
Python is the third most popular programming language, after C and Java, and the most widely used language in Machine Learning and Data Science. Python applications are at least as prone to human error as those written in other languages, and possibly more so because of Python's dynamic nature. Tools to analyze, test, verify, and optimize Python applications are therefore urgently needed, yet such tools are lagging or non-existent. The root cause is the lack of infrastructure for building practical and effective tools. Such infrastructure must address Python's dynamic features, including dynamic typing, dynamic code loading and execution, and pervasive calls to external library functions implemented in other languages. This project explores the feasibility of building a Python program analysis infrastructure by developing two sample tools that rely on a common set of infrastructural capabilities: instrumentation, static analysis, and symbolic analysis. The two sample tools are a data provenance tracking tool for machine learning applications and a bug-finding tool that detects data format inconsistencies, the dominant type of bug in data processing. The provenance tool will demonstrate the importance of static analysis and program instrumentation, and the bug-finding tool will demonstrate the importance of symbolic analysis. Both tools will illustrate the benefits that advanced tools can bring to data scientists. They will also show that these capabilities cannot simply be ported from existing infrastructures for other languages such as C and Java. The infrastructure will meet the pressing need for comprehensive tool-building support for Python, and it will enable cutting-edge, synergistic research across the CISE research community to serve data application programmers, data scientists, and even end users.
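As a rough illustration of what instrumentation-based data provenance tracking can look like, the sketch below wraps each processing step so that its output carries the names of the steps that produced it. This is not the project's design: the names `Tracked`, `instrument`, `load_rows`, `clean`, and `mean` are hypothetical, and the in-memory rows stand in for real input data.

```python
import functools
from dataclasses import dataclass


@dataclass
class Tracked:
    """A value paired with the names of the steps that produced it."""
    value: object
    lineage: tuple = ()


def instrument(func):
    """Wrap func so that its result records its own name after the lineage of its inputs."""
    @functools.wraps(func)
    def wrapper(*args):
        # Unwrap tracked inputs so the original function sees plain values.
        raw = [a.value if isinstance(a, Tracked) else a for a in args]
        # Concatenate the lineage of every tracked input, in argument order.
        inherited = tuple(
            step for a in args if isinstance(a, Tracked) for step in a.lineage
        )
        return Tracked(func(*raw), inherited + (func.__name__,))
    return wrapper


@instrument
def load_rows():
    # Hypothetical stand-in for reading a data file.
    return [" 3", "4 ", "5"]


@instrument
def clean(rows):
    # Strip whitespace and convert each row to an integer.
    return [int(r.strip()) for r in rows]


@instrument
def mean(xs):
    return sum(xs) / len(xs)


result = mean(clean(load_rows()))
print(result.value, result.lineage)
```

A decorator like this only sees calls that are explicitly wrapped. Code loaded dynamically or executed inside external libraries written in other languages would bypass it, which is one reason the abstract argues for dedicated infrastructure.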
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.