FRG: Collaborative Research: Generative Learning on Unstructured Data with Applications to Natural Language Processing and Hyperlink Prediction

$250,000 | FY2020 | MPS | NSF

Stanford University, Stanford CA

Abstract

This project addresses the pressing needs of analyzing “big” unstructured data and tackles some artificial intelligence questions from the statistical perspective, which requires the focused and synergistic efforts of a collaborative team. Specifically, the project develops generative models for statistical learning and leverages dependence relations modeled by graphical models in hyperlink prediction, which are applicable to topic sentence generation and protein structure identification. It will lead to a substantial improvement in the accuracy of generative learning based on numerical embeddings, particularly in topic sentence generation and hyperlink prediction. The integrated program of research and education will have significant impacts on machine learning and data science, social and political sciences, and biomedical and genomic research, among others. The project requires extensive algorithm and software development for natural language processing and multimedia data integration. The PIs, their postdocs, and students will develop innovative computational algorithms and software for the analysis of large-scale unstructured complex data. The advanced computational tools will be disseminated to facilitate technology transfer.

The project will address some fundamental issues in two important areas of unstructured data analysis in machine learning and intelligence. In particular, the proposed research will develop a statistical framework for generative learning, which is primarily motivated by applications for unstructured data, namely topic sentence generation and high-order hyperlink prediction. The research will develop powerful generative methods for generating instances or examples to describe and interpret the corresponding learning model. Moreover, it will develop network models for modeling high-order interactions and relations of units by identifying hidden structures in networks. It will proceed in two areas: (1) instance generation and topic sentence generation; (2) hyperlink prediction for multiway relations in hypergraphs. In the first area, instance generation, particularly sentence generation, will be performed collaboratively with numerical embeddings in categorization and regression. In the second area, hyperlinks will be predicted based on observed pairwise as well as unobserved high-order relations, characterized by graphical models with hidden structures. Special effort will be devoted to inverse learning, the integration of data from multiple sources, and extracting latent structures of networks. Finally, the research will develop computational tools and design practical methods that have desirable statistical properties.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
