CISE-MSI: DP: III: Training and Partnership in Data Science for Advancing Research in Biomolecular Detection

$584,970 · FY2022 · CSE · NSF

Delaware State University, Dover DE

Abstract

Detecting and identifying biomolecules such as proteins, viruses, carbohydrates, and DNA is critical to diverse disciplines such as medical diagnosis, forensic analysis, water quality monitoring, food safety, drug delivery, and materials design. Data measured with diverse spectroscopic techniques provide valuable insights into these biomolecules. Molecular compounds, whether designed or natural, are made of multiple and diverse biomolecules; they are challenging to investigate, and their convoluted data are challenging to analyze. Data science, which combines statistical and computational methods, can be used not only to analyze large, complex data sets but also to generate new insights from them. This project demonstrates the integration of data science and machine learning approaches with spectroscopic measurements on selected biomolecules. The overarching goal is to develop an automatic, data-driven model for efficient detection and accurate identification of biomolecules, which can be used to accelerate discovery and design novel technologies. The project opens a range of opportunities to train students from groups historically underrepresented in STEM fields in interdisciplinary research at the interface between data science, biochemistry, and physics. The project focuses initially on building spectral databases, starting with carbohydrate biomolecules, and on designing machine learning models to be trained and tested with multimodal and heterogeneous spectral data of these biomolecules. It has two objectives: 1) to demonstrate interpretable and trustworthy physics-informed machine learning models that can improve the efficiency and accuracy of biomolecular detection; and 2) to contribute to foundational questions in knowledge-based machine learning: handling mixed or sparse data, augmenting one-dimensional data in limited databases, and understanding the relationship between noise in the data and training and testing accuracy.
This project builds on a collaboration between two HBCUs (Delaware State University and the University of the Virgin Islands) and the University of Delaware, a research institution. Under a domain-guided data life science framework, the participating students are exposed to a full cycle of data science approaches, including collecting data, preprocessing and analyzing the data, designing machine learning models, training and testing the models, validating the models, and providing informed feedback about the data and the models. Once demonstrated, the approach will be extended to cover other biomolecules, such as proteins, and complex systems. This project is jointly funded by MSI and the Established Program to Stimulate Competitive Research (EPSCoR). This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.