CAREER: Supervised Learning for Incomplete and Uncertain Data
University of Missouri-Columbia, Columbia, MO
Abstract
This CAREER project will advance the state of the art in supervised machine learning to allow for incomplete, uncertain, and unspecific label information. Supervised machine learning algorithms produce desired outputs for given input data by learning from example training data. These methods generally rely on completely and accurately labeled training data to drive the learning algorithm. However, many applications are plagued by labels that are incomplete, uncertain, and unspecific (lacking precision). Current techniques do not adequately handle such data. For example, analysis of satellite imagery to identify the content of each pixel is often conducted by coupling unsupervised learning methods (which do not rely on labeled training data) with manual exploration. This is time-consuming, error-prone, and expensive. Imagine, instead, easy-to-use tools that could understand the content of each pixel in satellite imagery. Extremely large amounts of road map data (for example, from Google Maps or OpenStreetMap) and social media information (for example, geo-tagged photographs, video clips, and social networking posts) are continually collected and stored. These data could be used as sparsely labeled training data (with varying degrees of specificity and uncertainty) to guide understanding of satellite imagery. Although these data are available, algorithms have yet to be developed to combine these data sources and identify the content of pixels in satellite images. This work will advance this and other potential applications of machine learning where incomplete, uncertain, and unspecific labels in training data challenge the development of effective machine learning algorithms.
This CAREER project will achieve these advances through the following research objectives: (1) Investigate and develop a mathematical framework and associated algorithms for Multiple Instance Function Learning that addresses linear and non-linear classification and regression problems with varying levels and types of sparsity, uncertainty, and specificity in training labels. (2) Study and apply the proposed framework and algorithms to the fusion of satellite imagery, road map data, and social media for global scene understanding. This research will be conducted in conjunction with integrated education and outreach activities. In particular, an interactive web application will be developed to introduce concepts from machine learning and remote sensing to the public. This web application will also be used, along with additional hands-on activities, to introduce high school students to machine learning and remote sensing concepts during an annual summer engineering camp held at the University of Missouri in Columbia, MO. The web application will be paired with a research website through which data, code, publications, and presentations will be shared with the research community. Furthermore, undergraduate and graduate research assistants will be trained in the areas of machine learning and remote sensing. Finally, relevant research topics will be introduced in the PI's undergraduate and graduate courses.
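As a rough illustration of the kind of unspecific labeling that objective (1) targets, the sketch below shows the standard multiple-instance assumption from the general machine learning literature: a label is attached to a group ("bag") of instances rather than to each instance, and a bag is positive if at least one of its instances is positive. This is not the project's Multiple Instance Function Learning framework, which the abstract does not specify. The function names, the linear scorer, the weights, and the toy pixel features are all hypothetical and chosen only for illustration.

```python
from typing import List, Sequence


def instance_score(x: Sequence[float], w: Sequence[float], b: float) -> float:
    """Linear score for a single instance (here, one pixel's feature vector)."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def bag_label(bag: List[Sequence[float]], w: Sequence[float], b: float) -> int:
    """Standard multiple-instance assumption: a bag is positive (1)
    if its highest-scoring instance has a score above zero, else negative (0)."""
    return int(max(instance_score(x, w, b) for x in bag) > 0)


# Hypothetical weights and bags; each bag stands for the pixels near one
# imprecise label (for example, a geo-tagged post that mentions a road).
w, b = [1.0, -1.0], -0.5
bag_with_road = [[0.1, 0.9], [2.0, 0.2], [0.0, 0.0]]
bag_without_road = [[0.1, 0.9], [0.0, 0.3]]

label_a = bag_label(bag_with_road, w, b)
label_b = bag_label(bag_without_road, w, b)
```

In this toy setting only the bag-level label is observed during training; which pixel inside a positive bag actually carries the label is unknown, which is one way the "unspecific" labels described above can arise.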