Knockoff Feature Selection Techniques for Robust Inference in Supervised and Unsupervised Learning
Joan and Sanford I. Weill Medical College of Cornell University, New York, NY
Abstract
This project aims to develop new methodology for selecting, from a large pool of candidate variables, the key features that are predictive of an outcome. In the biomedical field, these methods will enable the discovery of determinants of patient health, thus improving the prevention, treatment, and management of diseases. In fields such as engineering, psychology, sociology, economics, and environmental sciences, they can improve manufacturing processes, social programs that focus on diversity and equity, the care and management of mental health, and the preservation of the environment and natural resources. The new methods will also help to generate high-quality synthetic data while maintaining the confidentiality of the original information, thereby spurring new scientific discoveries and providing a valuable educational tool. The project will offer a number of unique interdisciplinary training initiatives for future cohorts of data scientists at the interface of statistics, machine learning, and biomedical sciences.
The research agenda is based on the 'knockoff method' for identifying key features predictive of the outcomes while controlling the false discovery rate. The proposed methods incorporate the microbiome phylogenetic structure in feature selection, accommodate missing values, use multiple knockoffs to increase robustness, employ nonparametric Bayesian models for complex data structures, and introduce a new knockoff statistic based on a conditional prediction function. The proposed statistics can be paired with state-of-the-art machine learning models to detect nonlinear relationships while accounting for feature correlation. Furthermore, by applying knockoff filtering with unsupervised learning models, this research can identify determinants of the feature space and provide insights into unsupervised clustering and learning.
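To illustrate how a knockoff filter can select features while controlling false discoveries, the sketch below applies the standard "knockoff+" threshold from the published knockoff literature to a vector of feature statistics. It is only an illustration under stated assumptions: it does not implement the new statistics proposed in this project, the function names `knockoff_threshold` and `knockoff_select` are hypothetical, and the statistic values are made up. It assumes each statistic W_j has already been computed so that a large positive value favors the original feature over its knockoff copy.

```python
import numpy as np


def knockoff_threshold(W, q=0.1):
    """Return the knockoff+ threshold for statistics W at target FDR level q.

    Hypothetical helper for illustration. W[j] > 0 is taken to mean that
    feature j looks more important than its knockoff copy.
    """
    W = np.asarray(W, dtype=float)
    # Candidate thresholds are the nonzero magnitudes, smallest first.
    candidates = np.sort(np.abs(W[W != 0]))
    for t in candidates:
        # Negative statistics estimate how many false positives exceed t.
        est_fdp = (1 + np.sum(W <= -t)) / max(1, np.sum(W >= t))
        if est_fdp <= q:
            return t
    # No threshold meets the target level, so nothing is selected.
    return np.inf


def knockoff_select(W, q=0.1):
    """Return indices of features whose statistic reaches the threshold."""
    W = np.asarray(W, dtype=float)
    return np.flatnonzero(W >= knockoff_threshold(W, q))


# Made-up statistics for ten features (illustrative values only).
W_example = [5.0, 4.0, 3.5, 3.0, 2.5, 2.0, -0.5, 0.3, 0.2, 0.1]
selected = knockoff_select(W_example, q=0.2)
print(selected)  # features 0 through 5 are selected
```

With these example values, a target level of q = 0.2 selects the six features with the largest statistics, while the stricter level q = 0.1 selects none, because the "+1" in the numerator keeps the estimated false discovery proportion above 0.1 when so few features are available.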
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.