III: Small: 3D Graph Neural Networks: Completeness, Efficiency, and Applications

$599,939 | FY2023 | CSE | NSF

Texas A&M Engineering Experiment Station, College Station TX

Abstract

Many real-world systems and phenomena can be described by entities and their relations. For example, social networks consist of people and their relationships, and molecules consist of atoms connected by chemical bonds. Graphs are commonly used to encode such systems and phenomena, with nodes corresponding to entities and edges corresponding to relations. Computational analysis of graphs has been an active area of research for many years, with a plethora of fruitful results and discoveries. However, many current graph analysis studies consider only the topology of graphs (i.e., two-dimensional representations of these relationships), while important geometric information is not considered. In many scientific domains, physical systems are most accurately described by geometric graphs, also known as 3D graphs, in which each node is associated with a coordinate in 3D physical space. Accurate encoding of such geometric information is critical in these domains. For example, atoms in a molecule occupy physical space, and their locations determine 3D molecular geometry. In drug discovery, the binding properties, and thus the effectiveness, of drugs critically depend on their 3D shapes, as molecular interactions act similarly to lock-and-key mechanisms. This project aims to advance the field of geometric graph analysis by developing algorithms that can capture 3D geometries of graphs accurately and efficiently. The project is committed to broadening participation in computing by engaging and inspiring K-12 and underrepresented students in artificial intelligence and molecular analysis research and education.

In this project, the first set of research tasks aims to develop principled 3D graph neural networks that can use the power of 3D geometric information of small molecules to generate informative and discriminative representations. A novel message passing scheme will be developed to incorporate 3D geometric information in a complete and efficient manner. Building on this development, a new 3D graph neural network architecture will be designed to facilitate representation learning on large-scale molecule data and boost the performance and efficiency for a plethora of real-world tasks. The second set of research tasks extends the proposed complete and efficient 3D graph neural networks to representation learning of proteins, which are complex macromolecules of fundamental importance. Existing studies either fail to consider the hierarchical relations present in proteins or suffer from severe efficiency issues. To overcome these limitations, a novel hierarchical protein graph network to learn protein representations at different levels will be developed in this project. The proposed method faithfully integrates important hierarchical relations, resulting in a more natural protein learning scheme. By employing the proposed complete and efficient 3D graph neural networks for small molecules as a base model, the new hierarchical protein graph network is expected to achieve provable completeness and efficiency at different levels. The proposed research will result in open-source software tools to be used by researchers and practitioners. This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
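The distinction the abstract draws between graph topology and 3D geometry can be illustrated with a small sketch. The toy three-node graph, its coordinates, and the helper names (`angle`, `geometric_features`, `rotate_z`) below are hypothetical and chosen only for illustration; this is not the project's message passing scheme. It shows two geometric graphs that share the same edges and the same edge lengths but differ in shape, so a feature set limited to edge distances cannot tell them apart, while adding the angle between the edges can.

```python
import math

def angle(a, b, c):
    """Angle in radians at vertex b, formed by the points a-b-c."""
    v1 = [x - y for x, y in zip(a, b)]
    v2 = [x - y for x, y in zip(c, b)]
    dot = sum(x * y for x, y in zip(v1, v2))
    cos = dot / (math.hypot(*v1) * math.hypot(*v2))
    # Clamp to guard against floating-point drift outside [-1, 1].
    return math.acos(max(-1.0, min(1.0, cos)))

def geometric_features(coords, edges):
    """Edge lengths plus the angle at node 1 of a 3-node path graph 0-1-2."""
    dists = [math.dist(coords[i], coords[j]) for i, j in edges]
    theta = angle(coords[0], coords[1], coords[2])
    return dists, theta

def rotate_z(p, theta):
    """Rotate point p about the z-axis by theta radians."""
    x, y, z = p
    c, s = math.cos(theta), math.sin(theta)
    return (c * x - s * y, s * x + c * y, z)

# Same topology (edges), two different 3D geometries (illustrative values).
edges = [(0, 1), (1, 2)]
bent = {0: (1.0, 0.0, 0.0), 1: (0.0, 0.0, 0.0), 2: (0.0, 1.0, 0.0)}
linear = {0: (1.0, 0.0, 0.0), 1: (0.0, 0.0, 0.0), 2: (-1.0, 0.0, 0.0)}

bent_dists, bent_angle = geometric_features(bent, edges)
linear_dists, linear_angle = geometric_features(linear, edges)

# Edge lengths agree, so distances along edges alone miss the difference.
print(bent_dists == linear_dists)
# The angle at the middle node separates the two shapes.
print(math.degrees(bent_angle), math.degrees(linear_angle))

# Distances and angles are unchanged by rotating and translating the graph.
moved = {k: tuple(v + 2.0 for v in rotate_z(p, 0.7)) for k, p in bent.items()}
moved_dists, moved_angle = geometric_features(moved, edges)
print(math.isclose(moved_angle, bent_angle))
```

Distances and angles are used here because they do not change when the whole structure is rotated or translated, which is a common requirement for features describing molecular geometry. Whether a given set of such features determines a 3D structure uniquely is the kind of question the abstract refers to as completeness.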
