CRII: RI: A Study of Rank-based Decomposable Losses for Machine Learning

$175,000 · FY2024 · CSE · NSF

Purdue University, West Lafayette IN

Investigators

Abstract

Machine learning is crucial for advancing artificial intelligence (AI) and big data analysis. At its core, it trains models by minimizing an objective whose key element is the loss. Many losses are rank-based and play a critical role in applications such as information retrieval, search engines, and recommender systems. They are also valued for training robust models that can cope with label noise, outliers, and imbalanced data. Decomposability is a key feature of these losses: they can be broken down into individual components, which supports efficient and distributed computation. As the bridge between the training data and the model, rank-based decomposable loss plays a vital role in machine learning and warrants thorough study. The project will explore rank-based decomposable loss from two perspectives: aggregate and individual losses. The aggregate loss is the loss over all training data and is constructed from the individual loss of the model on each data sample. The project will concentrate on several key questions: What are the general abstract formulations of rank-based decomposable aggregate and individual losses for machine learning? How can learning objectives built on them be optimized efficiently while ensuring convergence? How can these losses be customized or modified to suit different machine learning problems? What are the statistical behaviors of machine learning algorithms that use these losses? The outcomes of this project will yield new insights into established robust machine learning techniques and offer a new perspective on loss design that accounts for ranks and decomposability.

This project will be conducted in two interrelated thrusts. The first thrust explores a novel and general rank-based aggregate loss for supervised learning. The focus will include efficient algorithms that optimize this loss with guaranteed convergence, along with streamlined techniques for determining the relevant hyperparameters. The developed loss will be connected with distributionally robust optimization to gain insight into its sample-level robustness and to guide the development of new types of rank-based aggregate losses. Additionally, theoretical guarantees will be established for rank-based aggregate losses, including classification calibration, classification consistency, and generalization properties. The second thrust studies a general formulation of rank-based individual loss with theoretical analysis, with the aim of strengthening label-level robustness in multi-class and multi-label learning. Furthermore, the use of rank-based individual loss will be extended to fairness challenges in learning and to investigating how resilient models trained with this loss are against adversarial threats, including verification and defense mechanisms.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
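
The abstract describes an aggregate loss as built from the individual losses of the training samples, with ranking deciding how they are combined. The abstract does not give a formulation, so the following is only an illustrative sketch under an assumption: it uses the average of the k largest individual losses, one well-known example of a rank-based aggregate. The function name `average_top_k_loss` and the sample values are hypothetical.

```python
def average_top_k_loss(individual_losses, k):
    """Average of the k largest individual losses.

    Illustrative assumption only: this is one simple rank-based aggregate,
    not the formulation proposed in the project.
    """
    n = len(individual_losses)
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and the number of samples")
    # Rank the per-sample losses from largest to smallest.
    ranked = sorted(individual_losses, reverse=True)
    # The aggregate is a sum of k ranked per-sample terms, scaled by 1/k.
    return sum(ranked[:k]) / k


# Hypothetical per-sample losses; the value 2.0 plays the role of an outlier.
losses = [0.1, 2.0, 0.4, 0.3]

mean_loss = average_top_k_loss(losses, k=len(losses))  # k = n: ordinary average
max_loss = average_top_k_loss(losses, k=1)             # k = 1: maximum loss
top2_loss = average_top_k_loss(losses, k=2)            # in between
```

In this sketch, the choice of k controls how much weight the hardest samples receive: k equal to the number of samples recovers the plain average, while k = 1 keeps only the single largest loss.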
