High-Dimensional Interaction Detection and Nonparametric Inference
University of Southern California, Los Angeles, CA
Investigators
Abstract
Understanding how variables interact with each other is fundamental to many scientific discoveries and contemporary applications, especially in areas such as social networks, marketing, medicine, genetics, and cancer studies. Identifying important interactions can also help improve model interpretability and prediction. Yet interaction detection with high-dimensional data is challenging because the number of pairwise interactions grows quadratically with the number of covariates, and the number of higher-order interactions grows even faster. Although there is a growing literature on interaction detection, comparatively little work addresses error rate control and inference. Building robust statistical foundations for interaction detection and nonparametric inference, and offering reproducible and scalable algorithms for selecting important interactions, can greatly facilitate the use of these much-needed tools in real applications. The common theme of this project is the development of statistical methodologies and theories for high-dimensional interaction detection and nonparametric inference with statistical guarantees and improved reproducibility and interpretability. The project has three interrelated aims. The first aim establishes the theoretical foundation of prediction and false sign rate control for interaction detection in ultra-high dimensional regression models. The second aim builds on the recent development of model-X knockoffs and proposes new methods for high-dimensional interaction detection with false discovery rate control and appealing power. The third aim further investigates the nonlinear interactions between a pair of high-dimensional random vectors and develops a new testing procedure for high-dimensional nonparametric inference through the lens of distance correlation.
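To make the notion of distance correlation in the third aim concrete, the following is a minimal sketch of the classical sample distance correlation in its standard V-statistic form. It is not the testing procedure proposed in this project, and the function names `_centered_distances` and `distance_correlation` are hypothetical, chosen only for illustration. The inputs may be vectors of observations or arrays of shape (n, p), so a pair of multivariate samples can be compared.

```python
import numpy as np


def _centered_distances(z):
    """Return the double-centered Euclidean distance matrix of a sample."""
    z = np.asarray(z, dtype=float)
    if z.ndim == 1:
        z = z[:, None]  # treat a 1-D input as n observations of one variable
    # Pairwise Euclidean distances between the n observations.
    d = np.linalg.norm(z[:, None, :] - z[None, :, :], axis=-1)
    # Subtract row and column means, then add back the grand mean.
    return (
        d
        - d.mean(axis=0, keepdims=True)
        - d.mean(axis=1, keepdims=True)
        + d.mean()
    )


def distance_correlation(x, y):
    """Sample distance correlation (V-statistic form) between x and y.

    Illustrative helper only; x and y must have the same number of rows.
    """
    a = _centered_distances(x)
    b = _centered_distances(y)
    dcov2 = (a * b).mean()    # squared sample distance covariance
    dvar_x = (a * a).mean()   # squared sample distance variance of x
    dvar_y = (b * b).mean()   # squared sample distance variance of y
    denom = np.sqrt(dvar_x * dvar_y)
    if denom == 0.0:
        return 0.0  # convention when either sample is constant
    # Guard against tiny negative values caused by floating-point error.
    return float(np.sqrt(max(dcov2, 0.0) / denom))


# A purely nonlinear relationship: y depends on x, yet Pearson correlation
# is essentially zero because x is symmetric about the origin.
x = np.linspace(-1.0, 1.0, 201)
y = x ** 2
print("Pearson correlation: ", np.corrcoef(x, y)[0, 1])
print("Distance correlation:", distance_correlation(x, y))
```

In this toy example the Pearson correlation is near zero while the distance correlation is clearly positive, which illustrates why distance correlation is a natural lens for nonlinear dependence. The sketch stores full n-by-n distance matrices, so it is meant for small samples only.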
The systematic research program developed in the three aims above will help build rigorous statistical foundations of theory and methodology for high-dimensional data analysis that can guide practitioners and researchers. The investigators also plan to systematically develop tractable and efficient computational algorithms that implement the proposed methods, release them as free software packages in environments such as R and Python, and publicize them in all relevant fields. This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.