
NeTS: Medium: Streaming Data Analytics over Programmable Datacenter Networks

$1,200,000 | FY2018 | CSE | NSF

William Marsh Rice University, Houston TX


Abstract

Today's datacenters play many important roles, not the least of which is to support big data analytics. Streaming data, in particular, is a prevalent workload that comes from web clickstreams and applications like financial transactions. Streaming data analytics presents unique challenges: the data arrives continuously at very high speeds, which makes it difficult to derive real-time insight from it. However, datacenter networks are becoming more programmable, meaning that more of the common components that a data stream might pass through now have the flexibility to do real-time processing. In addition, new technologies are enabling the datacenter network topology to be dynamically configured to connect the components and servers needed to process a certain stream more efficiently. This project seeks to build a framework and tools that exploit this emerging programmability to enable real-time data processing. It has the potential to drastically improve the scalability, performance, and energy efficiency of data analytics, and to deliver critical insight and response in real time.
This project will develop a new data analytics framework designed to transform how streaming data analytics is performed by leveraging datacenter programmability. The set of programmable elements has expanded to include not only servers, but also network interface components, field-programmable gate arrays, application-specific integrated circuits, and the network topology itself. The vision of the project is to jointly optimize all components to achieve a "sweet spot" in the design space for each application.
The researchers aim to develop the scientific foundations and practical techniques to realize this vision through the following activities: identifying key abstractions that programmable datacenter networks can provide to support application-level data analytics, designing a practical streaming data analytics framework to leverage these abstractions, and developing scheduling algorithms to multiplex the programmable resources across different applications. Finally, the investigators will explore efficient and reusable implementations of the framework, which will be applied to real-world workloads as case studies. The project includes collaboration with industry practitioners to enable research ideas and system prototypes to be smoothly transitioned into practice. This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
