CRII: NeTS: Scaling Distributed Storage with Programmable Switches
Johns Hopkins University, Baltimore MD
Investigators
Abstract
Modern Internet services, such as search, social networking and e-commerce, depend critically on high-performance distributed storage. As these services scale to billions of users, system operators are increasingly relying on in-memory storage to meet the necessary throughput and latency demands. A major challenge in scaling distributed storage is coping with skewed, dynamic workloads, which can lead to severe load imbalance and performance degradation. While traditional flash-based and disk-based storage can be balanced using a fast in-memory caching layer, server-based caching does not work for in-memory storage, because there is little difference in performance between the caching and storage layers. This project investigates a new distributed storage architecture that leverages the power and flexibility of new-generation programmable switches to cache data in the network. The project's goal is not only to improve the performance and reliability of cloud systems in practice, but also to provide new architectural and theoretical insights on classical distributed systems topics. This project will foster collaboration between the networking and systems communities in the new area of co-designing networks and systems with programmable switches.
The project will develop a new architecture to provide high aggregate throughput and low latency even under highly skewed and rapidly changing workloads. There are two major technical thrusts. The first is to design the switch cache within the limited functionality and resources of the switch. The approach will be to exploit the match-action tables, register arrays, and multi-pipeline, multi-stage structure of modern switch designs to efficiently index and pack variable-length objects into limited switch table and memory resources. The second research thrust is to design the overall system to achieve scalability, consistency and fault tolerance. While these are classical distributed systems topics, they will be revisited in a new type of heterogeneous system that consists of fast switch caches and general-purpose storage servers. A prototype system will be constructed using both programmable software and hardware switches, and extensively evaluated with a wide range of realistic and synthetic workloads.
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
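The load imbalance that the abstract attributes to skewed workloads, and the effect of caching a small set of hot objects in front of the storage servers, can be illustrated with a short Python sketch. Everything in it is an assumption made for illustration and is not taken from the award: the Zipf-like popularity model, the skew value, the modulo placement that stands in for hash partitioning, and the function names `zipf_probabilities`, `server_loads`, and `imbalance`. It models only request distribution, not switch hardware, consistency, or fault tolerance.

```python
def zipf_probabilities(num_keys, skew):
    """Request probability for keys ranked 0..num_keys-1 (rank 0 is the hottest).

    Assumption: popularity follows a Zipf-like law, weight = 1 / (rank + 1) ** skew.
    """
    weights = [1.0 / (rank + 1) ** skew for rank in range(num_keys)]
    total = sum(weights)
    return [w / total for w in weights]


def server_loads(probs, num_servers, cache_size):
    """Fraction of all requests that reach each storage server.

    Requests for the cache_size hottest keys are assumed to be answered by the
    cache and never reach a server. Keys are placed with rank % num_servers,
    a simple stand-in for hash partitioning.
    """
    loads = [0.0] * num_servers
    for rank, p in enumerate(probs):
        if rank < cache_size:
            continue  # absorbed by the (hypothetical) cache
        loads[rank % num_servers] += p
    return loads


def imbalance(loads):
    """Busiest server's load divided by the mean load (1.0 means perfectly balanced)."""
    return max(loads) / (sum(loads) / len(loads))


if __name__ == "__main__":
    probs = zipf_probabilities(num_keys=10_000, skew=0.99)
    for cache_size in (0, 64):
        loads = server_loads(probs, num_servers=16, cache_size=cache_size)
        print(f"cache_size={cache_size:3d}  max/mean load = {imbalance(loads):.2f}")
```

Running the script prints the max-to-mean load ratio without a cache and with the 64 hottest keys cached; under these assumptions the ratio moves much closer to 1.0 once the hot keys are absorbed before they reach the servers.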