SoD-HCER: Learning Based Programming
University of Illinois at Urbana-Champaign, Urbana, IL
Investigators
Abstract
A significant amount of the software written today interacts with naturally occurring (sensor) data such as text, speech, images and video, streams of financial data, and biological sequences. Such software needs to reason with respect to concepts that are complex and often difficult to define explicitly in terms of the raw data observed (e.g., determining the gender of a person in an image, determining the topic of an article, determining whether more than three people are currently meeting in someone's office, or scheduling a computation in a grid in a way that adapts to a multitude of properties of the resources and links). Applications that require such abilities are expected to grow rapidly in importance in future years. While conventional programming languages rely on a programmer to explicitly define all the concepts and relations involved, programming with naturally occurring data that is highly variable and ambiguous at the measurement level necessitates a programming model in which some of the variables, concepts and relations may not be known at programming time, may be defined only in a data-driven way, or may not be unambiguously defined without relying on other concepts acquired this way. Such a model must make it possible to reason with respect to variables that do not depend on tight assumptions about the environment in which the measurements are taken, and it must center on a semantic-level interaction model, made possible by components that are data-dependent and support abstractions over real-world observations. Today's programming paradigms, and the corresponding programming languages, are not conducive to that goal. Consequently, despite two decades of progress in machine learning, and a clear need for systems with significant trainable (data-dependent) components, few systems today incorporate significant machine learning components, and even fewer use more than a single classifier.
In this project on Learning Based Programming (LBP), the PI will explore a novel software engineering paradigm that lets a programmer seamlessly incorporate trainable variables into a program and, consequently, reason using high-level concepts without explicitly defining them in terms of all the variables they might depend on, or the functional dependencies among them. These definitions and dependencies may instead be determined in a data-driven way, via learning operators whose details are abstracted away from the programmer. In this work, the PI will flesh out the details of the LBP paradigm he envisages, implement an LBP language, and study it through the development of applications in two areas: ubiquitous computing and natural language processing.
Broader Impacts: This project will lead to cross-fertilization and mutual reinvigoration of the software engineering and machine learning fields. Enabling the development of computer systems that interact and cope with the variability of naturally occurring (sensor) data will require fundamental advances in compilation and software engineering. Conversely, availability of the LBP vehicle will motivate researchers in machine learning to explore the process of making inferences that rely on a large number of mutually dependent learners, as a means of providing programmers with better abstractions so that they can more effectively tackle a broad range of increasingly complex applications involving such data.
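As an illustration only, the notion of a trainable variable described in the project summary can be sketched in plain Python. This is not the LBP language, whose design the abstract does not specify. The names `LearnedConcept`, `topic_of`, and `route`, the bag-of-words features, and the toy training data are all hypothetical, and a simple multiclass perceptron stands in for whatever learning operator a real system might use.

```python
from collections import Counter, defaultdict


class LearnedConcept:
    """Hypothetical trainable variable: called like a function, defined by data."""

    def __init__(self, features):
        self.features = features              # maps an input to a list of feature names
        self.weights = defaultdict(Counter)   # one weight vector per label
        self.labels = []

    def _score(self, feats, label):
        return sum(self.weights[label][f] for f in feats)

    def __call__(self, x):
        feats = list(self.features(x))
        return max(self.labels, key=lambda lab: self._score(feats, lab))

    def train(self, examples, epochs=10):
        """Multiclass perceptron updates (an assumed stand-in learner)."""
        self.labels = sorted({y for _, y in examples})
        for _ in range(epochs):
            for x, y in examples:
                guess = self(x)
                if guess != y:
                    for f in self.features(x):
                        self.weights[y][f] += 1
                        self.weights[guess][f] -= 1
        return self


# The programmer names the concept and supplies data, not an explicit definition.
topic_of = LearnedConcept(features=lambda text: text.lower().split()).train([
    ("the team won the match", "sports"),
    ("the striker scored a goal", "sports"),
    ("the senate passed the bill", "politics"),
    ("the minister won the election", "politics"),
])


def route(article):
    # Ordinary control flow that reasons with the learned concept.
    if topic_of(article) == "sports":
        return "sports-desk"
    return "news-desk"
```

The point of the sketch is that `route` reads like conventional code even though `topic_of` has no hand-written definition; in the paradigm the abstract describes, the learning details would be hidden behind the language rather than written out as they are here.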