GGrantIndex
← Search

Collaborative Research: Mathematical Programming for Streaming Data

$150,473FY2011ENGNSF

University Of California-Berkeley, Berkeley CA

Investigators

Abstract

A large amount of data is now easily accessible in real-time in a streaming fashion: news, traffic, temperature or other physical measurements sent by sensors on cell phones. Applying statistical and machine learning methods to these streaming data sets represents tremendous opportunities for a better real-time understanding of complex physical, social or economic phenomena. These algorithms could be used, for example, to understand trends in how news media cover certain topics, and how these trends evolve over time, or to track incidents in transportation networks. Unfortunately, most algorithms for large-scale data analysis are not designed for streaming data; typically, adding data points (representing, say, today's batch of news articles from the Associated Press) requires re-solving the entire problem. In addition, many of these algorithms require the whole data set under consideration to be stored in one place. These constraints make classical methods impractical for modern, live data sets. This project's focus is on optimization algorithms designed to work in online mode, allowing for faster, possibly real-time, updating of solutions when new data or constraints are added to the problem. Efficient online algorithms are currently known for just a few special cases. Using homotopy methods and related ideas, this work will seek to allow online updating for a host of modern data analysis problems. A special emphasis will be put on problems involving sparsity or grouping constraints; such constraints are important for example to understand how a few key features in the data set that explain most of the changes in the data. These new online algorithms will be amenable to distributed implementations to allow for parts of the data to be stored on different servers. These methods will be applied to streaming news data coming from major US media, and also to the problem of online detection, which arises when tracking some important signal over, say, a communication network, in an online fashion.

View original record on NSF Award Search →