III: Small: Increase the Throughput of Non-Relational Databases through Theoretical Modeling and Optimization
University of California-Riverside, Riverside, CA
Investigators
Abstract
The explosive growth of data is driving the rapid evolution of massive data-storage systems. These systems are widely used, not only in large-scale Internet services but also in scientific projects in areas as diverse as astronomy, geography, and genetics. This project will increase the efficiency of these data-storage systems, allowing more data to be processed at lower cost. Making science and engineering research more cost-effective has the potential for large societal impact. More specifically, the project will improve non-relational databases built on log-structured merge-tree (LSM-tree) storage architectures. One main focus is a key component of such systems: compaction policies. Compaction policies are crucial for system performance but are not yet well understood; to date, they have been designed by trial and error, guided mainly by empirical experience. The project will develop analytical models for compaction, validate and refine the models through empirical testing, design improved policies that are optimal according to the models, and deploy these policies in live systems. Further, the theoretical models will be leveraged to optimize how non-relational database systems handle high volumes of dynamic continuous queries, which arrive and expire rapidly.