AitF: FULL: Collaborative Research: Better Hashing for Applications: From Nuts & Bolts to Asymptotics

$250,000FY2015CSENSF

Harvard University, Cambridge MA

Investigators

Abstract

This project engages experts in systems and network algorithms from Carnegie Mellon University and Harvard University to improve hashing-based data structures for systems. Hashing is an approach that turns a variable length string into a small, fixed-length value. Hashing provides a short, consistent fingerprint used to identify larger pieces of data, for uses including storing and locating data items quickly and effectively. Hashing provides a key building block for sophisticated approaches to storing, measuring, and managing data. Hashing-based data structures have correspondingly become widely accepted, often key workhorses throughout systems and networking. This project will create synergies between theory and systems in the area of hashing, with various approaches for lasting broader impact. Prototype code will be released for new algorithms and data structures created in the course of the project. Curricular materials focused on project material will be developed and distributed. The project will offer a wide range of research opportunities at various levels of sophistication for graduate and undergraduate students at both universities. The team unites expertise with theoretical design and analysis with expertise in systems design and analysis, allowing ideas and insights to flow between the two sides. The work starts from the lowest level of what choice of what hash functions to use, through the design and analysis of general data structures, to the development of applications that utilize hashing-based data structures to provide top performance. Project goals include both improving existing structures such as Bloom filters and cuckoo hash tables in practical systems to developing new structures for related problems such as maintaining small structures for fast function evaluation on key sets and reconciliation of datasets.

View original record on NSF Award Search →