
CAREER: Data Polymorphism: Enabling Fast and Adaptable Scientific Data Retrieval with Progressive Representations

$500,000 · FY2025 · CSE · NSF

University of Kentucky Research Foundation, Lexington, KY

Abstract

Scientific simulations and instruments produce an unprecedented amount of data that overwhelms network and storage systems. Because high-end parallel file systems have limited capacity, such data must be stored at remote sites or moved to secondary storage for archival purposes. This makes it difficult to fetch the data for post hoc analytics, as the bandwidth for moving data across wide area networks or from secondary storage is very limited. This project bridges this gap by developing scalable software to realize data polymorphism, a novel paradigm that allows for variable representations of the same data under different scenarios and use cases to enable on-demand data provision with reduced data movement cost. The success of this project is expected to significantly reduce the time needed to gain scientific insights from data for a wide range of applications, thus advancing scientific discoveries in domains including climatology, cosmology, fusion energy science, and ptychography. This contributes to resolving a wide range of important societal problems, including weather forecasting, galaxy surveys, electricity generation, and material design. Furthermore, an integrated education program is developed for workforce development and broadening participation in advanced cyberinfrastructure.

This project aims to leverage progressive representations to realize data polymorphism and enable fast and adaptable scientific data retrieval with tailored error control. The contributions are threefold. First, a generic framework is designed to abstract the generation of progressive representations for scientific data, allowing for the integration of novel algorithms and flexible tuning methods with improved performance and efficiency. Second, rigorous theories and tailored implementations are developed to enable error control on the outcomes of downstream analysis during data retrieval. This significantly improves the trustworthiness of the reconstructed data representation, as the correctness of such outcomes is of utmost importance in scientific analyses. Third, a data service library is optimized for high performance, portability, and scalability across the diverse architectures in advanced cyberinfrastructure. Integration with leading data management and visualization software is also planned to facilitate its use in real applications. To this end, end-to-end evaluations with the applications are anticipated to demonstrate the efficiency of the deliverables by significantly reducing the time needed for scientific discoveries.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
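
As an illustration of the progressive retrieval with error control described in the abstract, the following Python sketch stores an array as bitplanes and reads only as many planes as a requested tolerance requires. It is a minimal example under stated assumptions, not the project's framework: the bitplane scheme, the 16-bit precision, and the names `encode`, `error_bound`, and `retrieve` are all hypothetical choices made for this sketch, and it bounds the error on the raw data only, not on downstream analysis outcomes.

```python
from typing import List, Tuple

BITS = 16  # hypothetical fixed-point precision chosen for this sketch


def encode(values: List[float]) -> Tuple[float, float, List[List[int]]]:
    """Quantize to BITS-bit integers and split into bitplanes, most significant first."""
    lo, hi = min(values), max(values)
    scale = (hi - lo) / (2**BITS - 1) or 1.0  # fall back to 1.0 for constant data
    top = 2**BITS - 1
    quantized = [min(top, round((v - lo) / scale)) for v in values]
    planes = [[(q >> b) & 1 for q in quantized] for b in range(BITS - 1, -1, -1)]
    return lo, scale, planes


def error_bound(scale: float, planes_read: int) -> float:
    """Worst-case absolute error after reading the first `planes_read` bitplanes."""
    missing = 2 ** (BITS - planes_read) - 1  # largest value the unread low bits can hold
    return scale * (missing + 0.5)  # plus half a quantization step


def retrieve(
    lo: float, scale: float, planes: List[List[int]], tolerance: float
) -> Tuple[List[float], int, float]:
    """Read bitplanes one at a time until the guaranteed error meets the tolerance."""
    partial = [0] * len(planes[0])
    read = 0
    while read < BITS and error_bound(scale, read) > tolerance:
        shift = BITS - 1 - read
        partial = [p | (bit << shift) for p, bit in zip(partial, planes[read])]
        read += 1
    reconstructed = [lo + p * scale for p in partial]
    return reconstructed, read, error_bound(scale, read)
```

A caller with a loose tolerance fetches only the leading planes, while a caller needing more accuracy reads further into the same stored representation; this is the sense in which one dataset can serve several use cases at different data movement costs.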
