III: Small: Deep Interactive Reinforcement Learning for Self-optimizing Feature Selection

$499,985 | FY2022 | CSE | NSF

The University Of Central Florida Board Of Trustees, Orlando FL

Investigators

Abstract

Feature selection is a classic yet fundamental machine learning task that aims to select a subset of relevant features (variables, predictors) for constructing a predictive model. It has a wide range of applications, such as biomarker discovery, system monitoring, fault diagnosis, image recognition, text mining, and financial fraud detection. However, a longstanding criticism of the state of the practice is that existing methods require empirical specification of hyperparameters, lack the ability to search for the best feature subset, and ignore set-level feature-feature interactions. To fill this research gap, this project will develop a deep and interactive reinforced feature selection learning framework (RFSL). The project's novelty is a self-optimizing feature selection concept that pursues two goals: 1) self-optimization and 2) global optimality. The project's impacts are to improve the automation and optimality of feature selection, enrich the availability and applicability of predictive modeling that needs feature selection, and advance representative biomarker discovery for biomedical applications.

To achieve these goals, we will address three technical challenges. The first is the framework challenge: how can we develop a machine learning framework that automates self-optimizing feature selection while providing an effectiveness guarantee? The second is the interaction challenge: which interaction mechanisms can help agents leverage external and prior knowledge to improve learning? The third is the feedback challenge: can downstream tasks feed their intermediate results back to improve feature selection?

We will answer these questions through the following thrusts:
(1) Learning Framework: a new learning framework (RFSL) will be developed to balance automation and effectiveness in self-optimizing feature selection.
(2) Interactive Learning Mechanisms: an interactive perspective will be proposed to strengthen RFSL's ability to learn from external and prior knowledge. Three novel mechanisms (at the action level, reward level, and environment level) will be developed to expand the interaction channels of RFSL.
(3) Algorithm-in-the-Loop Feedback: new algorithm-in-the-loop approaches will be developed that use feature tree structure feedback from a downstream predictive task to adapt feature selection policies and overcome distribution shifts and model drift.
(4) Embedding into Real Systems: the proposed framework will be integrated into recently developed next-generation sequencing platforms (e.g., mRNA sequencing, whole-genome sequencing) with high-dimensional, low-sample-size genomic data to identify robust molecular signatures and better understand the biological mechanisms behind different diseases.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
