CIF: Medium: Collaborative Research: Foundations of Coding for Modern Distributed Computing
University of Southern California, Los Angeles, CA
Investigators
Abstract
Coding and information theory provide a very rich body of knowledge from theory to concepts to constructions for creating, leveraging, and removing "redundancy" in ways that have revolutionized the digital era. This project brings these concepts and techniques to bear in a new field: large-scale distributed computing. The modern paradigm for large-scale distributed computing systems is driven by "scaling out" of computations across clusters consisting of as many as tens or hundreds of thousands of machines. As such, there is an abundance of resource redundancy that can be exploited. This project develops a foundation for "coded computing", a new framework that combines coding with distributed computing to overcome several fundamental challenges limiting the performance of today's large-scale distributed computing platforms. The research outcomes of the project will be integrated into education and will be disseminated broadly.

This project takes a principled and foundational approach to providing a unified coding framework to tackle three key challenges in large-scale distributed computing: significant delays due to straggling nodes; large communication loads between computing nodes; and massive input data sets. In particular, three novel coding concepts are proposed for distributed computing: coding for injecting computation redundancy to mitigate straggler issues; coding to trade local computation with global communication; and coding for statistically principled data sketching. The unified role of codes in both doing fast sketching and in providing robustness to straggler node delays and communication bottlenecks is also studied.