CAREER: Learning from Heterogeneous Populations in Small Data Regime with Applications to Preference and Metric Learning

$462,062 · FY2023 · CSE · NSF

University of Wisconsin-Madison, Madison, WI

Abstract

Understanding how humans represent concepts, perceive options, and make judgments based on them plays a vital role in cognitive and behavioral sciences, consumer recommendation systems, individualized education, crowdsourced democracy, and the quantification of survey data for social sciences and policy making. Preference and metric learning from human judgments have emerged as powerful tools for learning such representations. Most of these learning algorithms, however, are limited to models that are averaged over the population and do not capture the variation among the diverse people who make up that population. This project aims to close this gap by developing novel models, analyzing their fundamental limits, and designing algorithms with guarantees for learning at different scales of granularity. The results of this project have the potential to usher in a new paradigm in preference and metric learning. This project will also have significant educational and outreach impact through course modules for graduate and undergraduate students, research mentoring for undergraduate students, and public outreach programs.

From biological sciences to social sciences, many scientific studies involve societal-scale datasets collected over heterogeneous populations that differ, for example, in age and demographics. Such datasets also usually contain only a few observations per individual (the small data regime). In general, off-the-shelf machine learning algorithms are not built to handle the statistical challenges that arise from heterogeneity and small data. This project addresses these challenges in preference and metric learning by developing novel models, theoretical foundations in the form of fundamental limits, and practical algorithms with guarantees for learning from heterogeneous data at different levels of granularity. Specifically, the project aims to establish fundamental limits and to develop models and algorithms with theoretical guarantees for (1) learning the distribution of preferences over a population, (2) learning metrics at the subgroup level, and (3) learning individual variability by leveraging common structures and priors learned over the population.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
