CSR: Small: Replication in the Cloud Era
Regents Of The University Of Michigan - Ann Arbor, Ann Arbor MI
Abstract
Modern life depends on computers, but the 24/7 availability that people expect is hard to achieve. Computers are far from infallible and frequently crash. To prevent such failures from disrupting essential services, the research community has spent the last 40 years developing fault-tolerance techniques, which allow services to continue uninterrupted despite failures. These techniques have served us well and are the main reason behind the seamless services we enjoy today. For all their merits, however, our fault-tolerance techniques have a fundamental design flaw: they were designed for standalone services, that is, services that execute on a single machine and do not need to interact with other services. Such standalone services are becoming increasingly uncommon in today's computing, where large systems consist of multiple interacting components. In this brave new world, our existing fault-tolerance techniques no longer work. This research aims to improve fault tolerance for such interacting services and has three main goals. (1) Establish a framework to simplify the interactions between services. (2) Restore correctness by rethinking how we employ advanced techniques, such as speculative execution. (3) Optimize performance by investigating ways to implement such interactions efficiently. Achieving these three goals will enable practical implementations of fault tolerance in today's large-scale systems, which is the key to ensuring that we can continue to enjoy seamless services in the future. Now that our systems are too large to be implemented as a single service, fault-tolerance techniques must adapt to avoid becoming obsolete. This project aims to steer future academic efforts on fault tolerance away from standalone services and towards a more practical setting. In doing so, it aims to strengthen the ties between academia and industry by having them strive towards this new, common goal: replication in the cloud era.
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.