Large-Scale Structured Optimization: Convex Relaxations and Gradient Methods

$0FY2009MPSNSF

University Of Washington, Seattle WA

Investigators

Abstract

Optimization problems arise in signal/image denoising, compressed sensing, sensor network localization, multi-task learning, data mining/classification, are large-scale, nonlinear, but highly structured. An effective solution approach is to approximate the problems by convex relaxations that are computationally tractable and inherit the structures of the original problems. Specialized computer algorithms are then developed to exploit the structures and efficiently solve the convex relaxations in real time. The investigator and his colleagues/students study the accuracy of different convex relaxations (i.e., how well they approximate the original problems) when data may be noisy, and develop new computer algorithms that are more efficient in theory and in practice. This includes coordinatewise and incremental gradient algorithms, possibly accelerated through extrapolation. Complexity and convergence rate are studied, as well as computer implementation and testing. Thus the twin topics of approximation by convex relaxations and algorithms for convex relaxations are integrated in their development. Measured data, from satellite images to biological images to stock indices to internet queries/usage, are inherently noisy and large size. How to process such large noisy data and extract key features/patterns is a major challenge, with important applications to image restoration/denoising, economic forecasting, pattern recognition, data classification, etc. In the proposed research, parametrized mathematical models of the underlying data generating process are formulated and the parameters are tuned by specialized computer algorithms to optimize the model's predictive power. The latter is achieved by optimizing a combination of the model's goodness-of-fit to data and a measure of model's simplicity. This is motivated by Occam's Razor principle that the simplest is the best (in particular, for identifying underlying patterns and trends). For example, a 1 Mega-pixel noisy image might be modeled by a weighted sum of 1000 "elementary" images from a dictionary. The computer algorithm then finds weights that are sparse (i.e., few nonzeros) and fit the model closely to the noisy image (say, in a least squares sense of the pixel values). Predicting people's behavior based on past data is another example, such as the Netflix prize for predicting movie rating which has a training data set of over 100 million ratings from over 480 thousand people. Owing to the large data size (and, in some cases, a need to process the data fast and in real time) and possibly high-dimensional parameter space, specialized algorithms based on new techniques (e.g., work with small chunks of data or small number of parameters at each time) need to be developed. This is the aim of the proposed research.

View original record on NSF Award Search →