Collaborative Research: III: Small: Advancing Data-Centric AI through Generative Approaches for Feature Space Reconstruction
Virginia Polytechnic Institute And State University, Blacksburg VA
Investigators
Abstract
The current development cycle of Artificial Intelligence (AI) methods and tools typically involves collecting data, using the data to construct a representation (known as a feature space), and applying machine learning models to categorize examples using this representation. Building an optimal and appropriate feature space is essential because it can be used to reconstruct distance measures, reshape discriminative patterns, and enhance the AI readiness of the data at the structural, predictive, interaction, and expression levels. Appropriate features are extremely important for real-world deployments across both scientific and industrial applications. This project seeks to create a more automated and generic framework, along with effective tools, to distill fundamental knowledge of feature spaces and build an AI-ready feature space. Artificial intelligence has the potential to deliver far better features than human engineers can. This project aims to transform the traditional way of constructing feature spaces by using deep generative learning instead of manual or classical discrete search methods. The educational component of this project includes developing a new curriculum on data-centric AI and providing students from under-represented groups with opportunities to participate in research. This project addresses an important problem: learning to construct feature spaces. Its unique perspective is to view feature space construction as a cross-sequence feature-generation task. The project proposes new techniques for feature learning, generalization, and robustness to data imperfections.
Specifically: 1) This project proposes a principled deep EOG (embedding-optimization-generation) framework to distill feature knowledge, convert discrete search in feature space into efficient continuous optimization in embedding space, and reduce feature space reconstruction to sequential generation; 2) This project develops generalization strategies to achieve task-agnostic, label-free learning, transferability, and distribution shift awareness in generative feature transformation; 3) This project develops graph topology-aware generation, reinforcement augmentation, variational smoothing, and adversarial robustness to handle complex attributed graphs and weak training data, ensuring data-efficient and robust learning. Finally, this project incorporates the proposed methods into systems for modeling material formula interactions and for composing and reconstructing polymer configuration indicators for screening polymer performance. This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
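As a rough illustration of the embedding-optimization-generation idea described above, the toy sketch below replaces a discrete search over feature-transformation sequences with gradient ascent in a continuous embedding space, then maps the optimized point back to a sequence. It is not the project's method: the candidate sequences, their 2-D embeddings, the quadratic surrogate score, and the nearest-neighbor decoder are all hypothetical stand-ins for what would presumably be a learned encoder, a learned evaluator, and a sequence generator.

```python
# Toy illustration only: every name, embedding, and score here is hypothetical.

# Hypothetical feature-transformation sequences (postfix token tuples), each
# paired with a made-up 2-D embedding. A real system would learn an encoder.
CANDIDATES = {
    ("f1",): (0.0, 0.0),
    ("f1", "f2", "+"): (1.0, 0.0),
    ("f1", "log"): (0.0, 1.0),
    ("f1", "f2", "*", "sqrt"): (1.0, 1.0),
}

# Assumed location of the best embedding under the surrogate (made up).
TARGET = (0.9, 0.8)


def surrogate_score(z):
    """Stand-in for a learned evaluator: predicted utility of embedding z."""
    return -((z[0] - TARGET[0]) ** 2 + (z[1] - TARGET[1]) ** 2)


def surrogate_grad(z):
    """Analytic gradient of surrogate_score with respect to z."""
    return (-2.0 * (z[0] - TARGET[0]), -2.0 * (z[1] - TARGET[1]))


def optimize_embedding(z, steps=50, lr=0.1):
    """Gradient ascent in embedding space instead of discrete search."""
    for _ in range(steps):
        g = surrogate_grad(z)
        z = (z[0] + lr * g[0], z[1] + lr * g[1])
    return z


def decode(z):
    """Stand-in for a generator: return the sequence with the nearest embedding."""
    def sq_dist(seq):
        e = CANDIDATES[seq]
        return (e[0] - z[0]) ** 2 + (e[1] - z[1]) ** 2
    return min(CANDIDATES, key=sq_dist)


start = CANDIDATES[("f1",)]        # embed an initial feature set
z_opt = optimize_embedding(start)  # optimize in continuous space
best = decode(z_opt)               # generate a sequence from the result
print(best)
```

In this sketch the optimization step never enumerates candidate sequences; only the final decoding touches the discrete space, which is the property the abstract attributes to moving the search into an embedding space.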