
III: Small: Scalable Analytics for Data Bases and Data Streams--a Unified Approach

$499,011 · FY2012 · CSE · NSF

University Of California-Los Angeles, Los Angeles CA

Investigators

Abstract

The goal of this project is to support the development of big-data analytics and to facilitate the deployment of such applications over the different execution environments encountered in the life cycle of big-data software. The project achieves its goal by (i) designing and implementing a Scalable Analytics Language (SAL) that supports the definition of advanced analytics through user-defined aggregate functions; (ii) developing a compiler for SAL that optimizes parallel MapReduce-oriented executions over distributed systems containing many nodes; and (iii) developing an early accurate result library (EARL) that enhances (ii) with the ability to provide approximate results based on sampling of the data. Using EARL, the analyst can avoid the slow response and long setup time of MapReduce applications by simply specifying a target accuracy, with no modification of the original program required. The final deliverable of the project is a SAL compiler that optimizes parallel execution of continuous analytical queries on massive data streams by supporting the synoptic and load-shedding primitives needed to achieve quasi-real-time response in this environment. These research results are expected to have significant impact in several areas, including domain science, digital government, and e-commerce. The project supports Ph.D. students pursuing research on big-data analytics and their management. Publications, technical reports, software, and experimental data from this research will be disseminated via the project web site (http://yellowstone.cs.ucla.edu/nsf-projects/nsf1218471.html).
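The two ideas the abstract combines, a user-defined aggregate whose partial states can be merged across parallel workers and a sampling loop that stops once a target accuracy is met, can be illustrated with a minimal Python sketch. This is not SAL syntax and not EARL's API: the names `MeanState`, `accumulate`, `merge`, `finalize`, `relative_error`, and `approximate_mean` are hypothetical, and the stopping rule (estimated standard error of the mean divided by the mean) is a simple stand-in chosen for illustration, not the accuracy estimator the project uses.

```python
import math
import random
from dataclasses import dataclass


@dataclass
class MeanState:
    """Partial state of a mean aggregate: count, sum, and sum of squares."""
    n: int = 0
    total: float = 0.0
    total_sq: float = 0.0


def accumulate(state, x):
    """Fold one input value into a partial state (the per-record step)."""
    state.n += 1
    state.total += x
    state.total_sq += x * x
    return state


def merge(a, b):
    """Combine two partial states, as a reducer would combine mapper outputs."""
    return MeanState(a.n + b.n, a.total + b.total, a.total_sq + b.total_sq)


def finalize(state):
    """Turn a partial state into the aggregate's result."""
    return state.total / state.n


def relative_error(state):
    """Estimated standard error of the mean divided by |mean| (illustrative)."""
    if state.n < 2:
        return math.inf
    mean = finalize(state)
    if mean == 0:
        return math.inf
    variance = max(0.0, (state.total_sq - state.n * mean * mean) / (state.n - 1))
    return math.sqrt(variance / state.n) / abs(mean)


def approximate_mean(data, target_error, batch_size=1000, seed=0):
    """Process randomly ordered batches until the error estimate meets the target.

    Returns (estimate, number_of_records_used). If the target is never met,
    the whole data set is scanned and the exact mean is returned.
    """
    order = list(range(len(data)))
    random.Random(seed).shuffle(order)  # random order makes each prefix a sample
    state = MeanState()
    for start in range(0, len(order), batch_size):
        partial = MeanState()
        for i in order[start:start + batch_size]:
            accumulate(partial, data[i])
        state = merge(state, partial)  # same merge a parallel run would use
        if relative_error(state) <= target_error:
            break
    return finalize(state), state.n


if __name__ == "__main__":
    rng = random.Random(42)
    values = [rng.gauss(100.0, 15.0) for _ in range(100_000)]
    estimate, used = approximate_mean(values, target_error=0.01)
    print(f"estimate={estimate:.2f} using {used} of {len(values)} records")
```

The aggregate itself is unchanged between the exact and approximate runs; only the driver loop decides when to stop, which mirrors the abstract's point that the analyst specifies a target accuracy without modifying the original program.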
