CSR: Small: Collaborative Research: System Research on Persistent High-Dimensional Data Access and Its Application to Semiclassical Molecular Dynamics Simulation
Texas Tech University, Lubbock TX
Abstract
High-dimensional data arise in many scientific studies. Frequent access to large, disk-resident high-dimensional datasets poses both I/O and algorithmic challenges to achieving high access efficiency. The investigators work with high-dimensional data from semiclassical molecular dynamics studies that must be accessed repeatedly and at very high frequency during the simulation; in some studies, the datasets are far larger than the available memory.

The project addresses this challenge by proposing a middleware system for fast access to storage-resident high-dimensional data, with easy-to-use APIs for data organization, indexing, search, and I/O latency hiding. To enable fast searches with a low memory footprint, various indexing structures are explored and a new locality preserving tree (LPT) is proposed. Furthermore, the project exploits specific data access patterns in dynamics studies to hide I/O latency by overlapping computation with I/O. This is enabled by a small set of easy-to-use programming constructs for non-blocking read/write operations, which are under development to further reduce data access time. The non-blocking APIs are compatible with both hard disk drives (HDDs) and hybrid storage that combines solid state drives (SSDs) with HDDs. In addition, an SSD-based I/O accelerator is developed and integrated with the non-blocking APIs.

This project tackles important and challenging problems in accessing large high-dimensional datasets, and its results have applications beyond semiclassical molecular dynamics simulations. The investigators will leverage the research to motivate and educate students. To broaden the impact beyond project participants, courses will be enhanced with knowledge, insights, and materials obtained from the research, carefully adapted as course materials and course projects.
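The abstract describes hiding I/O latency by overlapping computation with non-blocking reads when the access pattern is predictable. The following is a minimal Python sketch of that general idea (double buffering), not the project's actual middleware. It assumes the dataset is a flat file of native-endian float64 values split into fixed-size chunks that are visited in sequential order; `NonBlockingReader`, `iread`, `wait`, `CHUNK`, and `make_demo_file` are hypothetical names introduced only for illustration. While chunk i is being processed, the read of chunk i+1 is already in flight on a background thread.

```python
import array
import os
import tempfile
from concurrent.futures import Future, ThreadPoolExecutor

CHUNK = 1024  # hypothetical: number of float64 values per chunk


class NonBlockingReader:
    """Hypothetical non-blocking read API (an illustration, not the project's interface).

    iread() submits a chunk read to a background thread and returns at once;
    wait() blocks until that chunk's data is available.
    """

    def __init__(self, path: str, workers: int = 1) -> None:
        self._path = path
        self._pool = ThreadPoolExecutor(max_workers=workers)

    def _read_chunk(self, index: int) -> array.array:
        # Each task opens its own handle so concurrent reads never share a file offset.
        with open(self._path, "rb") as f:
            f.seek(index * CHUNK * 8)
            data = array.array("d")
            data.frombytes(f.read(CHUNK * 8))
            return data

    def iread(self, index: int) -> Future:
        return self._pool.submit(self._read_chunk, index)

    @staticmethod
    def wait(req: Future) -> array.array:
        return req.result()

    def close(self) -> None:
        self._pool.shutdown(wait=True)


def process(chunk: array.array) -> float:
    # Stand-in for the simulation's computation on one block of data.
    return sum(x * x for x in chunk)


def run(path: str, n_chunks: int) -> float:
    reader = NonBlockingReader(path)
    total = 0.0
    try:
        pending = reader.iread(0)  # prefetch the first chunk
        for i in range(n_chunks):
            current = reader.wait(pending)
            if i + 1 < n_chunks:
                # Issue the next read before computing, so I/O overlaps computation.
                pending = reader.iread(i + 1)
            total += process(current)
    finally:
        reader.close()
    return total


def make_demo_file(n_chunks: int) -> str:
    # Writes a small synthetic dataset and returns its path (caller removes it).
    values = array.array("d", (float(i % 7) for i in range(n_chunks * CHUNK)))
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(values.tobytes())
        return tmp.name


if __name__ == "__main__":
    demo_path = make_demo_file(4)
    try:
        print(run(demo_path, 4))
    finally:
        os.remove(demo_path)
```

A thread pool is used here only for simplicity; in CPython, blocking file reads release the GIL, so the background read can genuinely proceed while `process` runs. This kind of prefetching pays off only when the next chunk is known in advance, which is the type of predictable access pattern the abstract attributes to dynamics studies.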