AF: Small: Data Synchronization : Theory, Algorithms, and Practice

$399,370FY2013CSENSF

Harvard University, Cambridge MA

Investigators

Abstract

This project addresses the growing need for improved synchronization methods for large-scale, multi-party distributed systems. As users migrate information to cloud storage, cloud providers may use multiple loosely consistent replicas because of the overhead of keeping replicas synchronized at all times. Further, users themselves may create loosely synchronized replicas on laptops, tablets, or other devices when they are disconnected from the cloud storage. Periodic synchronization, or reconciliation, becomes a requirement in such settings. Reconciliation has been most studied in the specific setting of two connected users, each containing a set of keys, and both desiring the union of the sets. The costs of these protocols is typically in terms of the size of the symmetric set difference between the two sets. The Principal Investigator (PI) will focus on generalizing reconciliation methods to settings where many parties communicate over a network represented by a graph. The new framework will aim to encompass standard problems such as rumor spreading and network coding, as well as generalize to other objects such as sequences with other measures such as edit distance. The primary theoretical and practical challenge the PI will pursue is to develop schemes where the amount of communication necessary for object synchronization depends only on the size of the difference among objects that need to be synchronized, and not the sizes of the objects themselves. For example, for large databases, the synchronization cost should depend on the delta between the databases, which will generally be much smaller than the databases themselves. A further goal is that the framework will have practical consequences for modern cloud-based deployments, especially large-scale big data systems. Because the goals of reconciliation include both algorithmic efficiency as well as communication efficiency, the PI will work to bring ideas from both the theoretical computer science and information theory communities together in this work. The new algorithms and data structures developed for these problems are expected to have significant additional uses for other problems as well.

View original record on NSF Award Search →