EAGER: Exploratory Evaluation: Scalability and Effectiveness of Data-Intensive Table-based Computing Software Systems
Carnegie Mellon University, Pittsburgh PA
Abstract
Science and analysis today increasingly proceed by systematic exploration of high-resolution captured or simulated data. With a more expansive sample of raw data, more detailed models can be built and more precise questions about the meaning of the data can be asked. However, the massively parallel systems used to process massive data sets render traditional programming, storage, and fault-tolerance strategies ineffective. Table-based, or column-oriented, distributed data storage systems are being developed to support such large-scale data analysis, led by Google's BigTable and including open-source variants such as Apache HBase. These systems resemble the row-and-column organization of a database, but they have simpler semantics, weaker isolation, and non-SQL interfaces. The effectiveness of these new systems for applications other than internet search support is not well understood.
This exploratory project is developing an evaluation framework and exploring a set of these new table-based storage systems. The goal is to capture an understanding of the state of the art: how these systems perform and scale, and how reliable and usable they are. In addition to benchmarks focusing on key metrics, the project's evaluation framework includes real-world applications drawn from machine learning: approaches to understanding streams of events, such as internet blog publications, and approaches to understanding complex interrelationships, such as social networking graphs. These applications are used to extract insight into the requirements for enabling these emerging types of knowledge discovery applications.
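To make the contrast with relational databases concrete, the following is a minimal, hypothetical Python sketch of the data model described in the published BigTable design: a sparse map, sorted by row key, indexed by row key, column ("family:qualifier"), and timestamp. The class `ToyTable` and its `put`, `get`, and `scan` methods are illustrative assumptions, not BigTable or HBase APIs, and the sketch omits distribution, persistence, and fault tolerance entirely.

```python
import bisect


class ToyTable:
    """Hypothetical, in-memory sketch of a BigTable-style data model.

    Conceptually a sorted map: (row key, "family:qualifier", timestamp) -> value.
    Rows are kept in key order so range scans are cheap. There is no SQL,
    no joins, and no multi-row transactions.
    """

    def __init__(self):
        self._row_keys = []  # row keys, kept sorted for range scans
        self._rows = {}      # row key -> {column: [(timestamp, value), ...]}

    def put(self, row, column, value, ts):
        """Write one cell version; older versions are retained."""
        if row not in self._rows:
            bisect.insort(self._row_keys, row)
            self._rows[row] = {}
        versions = self._rows[row].setdefault(column, [])
        versions.append((ts, value))
        versions.sort(key=lambda v: v[0], reverse=True)  # newest first

    def get(self, row, column, ts=None):
        """Return the newest value at or before `ts` (newest overall if None)."""
        for version_ts, value in self._rows.get(row, {}).get(column, []):
            if ts is None or version_ts <= ts:
                return value
        return None

    def scan(self, start, stop):
        """Yield (row, {column: latest value}) for start <= row < stop, in key order."""
        lo = bisect.bisect_left(self._row_keys, start)
        hi = bisect.bisect_left(self._row_keys, stop)
        for row in self._row_keys[lo:hi]:
            yield row, {col: vs[0][1] for col, vs in self._rows[row].items()}


if __name__ == "__main__":
    t = ToyTable()
    # Reversed-domain row keys cluster pages from the same site together.
    t.put("com.example/blog/1", "content:title", "Hello", ts=1)
    t.put("com.example/blog/1", "content:title", "Hello, world", ts=2)
    t.put("com.example/blog/2", "content:title", "Second post", ts=1)
    t.put("org.other/page", "anchor:link", "example", ts=1)

    print(t.get("com.example/blog/1", "content:title"))        # Hello, world
    print(t.get("com.example/blog/1", "content:title", ts=1))  # Hello
    for row, cells in t.scan("com.example/", "com.example0"):
        print(row, cells)
```

The only access paths here are single-cell reads and writes plus ordered row-range scans; that narrow interface is what "simpler semantics" and "non-SQL interfaces" amount to in practice, and it is what makes such stores straightforward to partition by row range across many machines.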