CSR: NeTS: Small: Theoretical Foundations for Cache Networks: Performance Models, Algorithms, and Applications
Ohio State University, The, Columbus OH
Investigators
Abstract
Caching systems are a core component of Internet data infrastructures. They enable low-cost access to a fast but limited cache space that stores a selective subset of popular data items drawn from a large collection of data records that are stored in slow, persistent media. Caching systems greatly improve the performance of various services in, for example, information retrieval, data analytics, social networks and e-commerce. Caching systems are already widely deployed but need to scale efficiently to support emerging big data applications. To better serve multiple flows of data requests on a caching system, a fundamental question is whether the cache space should be pooled together to serve these flows jointly or be divided to serve them separately. While system-based approaches have yielded good intuition and first-order solutions, it is important that new theories be developed to provide a quantitative characterization of cache networks that can lead to optimal or near-optimal solutions. This project will develop a much-needed theoretical foundation for cache systems in emerging data processing systems, with concrete plans to transition the theoretical results into practical implementations.
The research will be carried out across the following interrelated thrusts. (1) Characterizing miss ratios of competing flows on least-recently-used (LRU) caching: A unified theoretical framework will be developed to investigate critical factors that impact the cache miss ratios, including request rates, data popularities, item sizes, and overlapped data items across different request flows. The new insights will be used to directly improve the performance of real caching systems. (2) Optimizing data caching for server clusters: This thrust investigates whether servers should be pooled together or not, how to optimize the sizes of different server clusters, and where to route data requests for multiple caching clusters. A joint optimization of data caching and job scheduling will also be addressed. (3) Transition from theories into practice: This thrust will leverage the open-source projects Memcached, Redis, Hadoop, Spark and Tachyon to validate the theoretical results through real experiments, and to transition theories into working systems by adding new modules and modifying existing code.
This project may benefit society, industry and academia. The insights from the mathematical analysis and the new algorithms developed through this project are expected to play a key role in improving the performance of cache networks, for example, for in-memory key-value stores. They can contribute to both theoretical research and practical technologies. A specific focus will bridge the traditional separation between stochastic operations research and computer engineering education. The research results from this project will be integrated into a new graduate-level course. Other broader impacts include industry collaborations for practical use cases and technology transfer, undergraduate summer programs, strategies for engaging women and other under-represented groups, and the development of a strong research lab that also serves as a teaching lab.
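The pooled-versus-divided question raised in the abstract can be illustrated with a small simulation. The following Python sketch is not part of the project; it is a minimal illustration under stated assumptions: two request flows over disjoint item sets, unit-size items, Zipf-like popularities with hypothetical exponents 0.8 and 1.2, equal request rates, and an even split of the cache. The function names (`lru_miss_ratio`, `zipf_flow`, `compare_pooled_vs_split`) and all parameter values are invented for illustration.

```python
import random
from collections import OrderedDict
from itertools import accumulate


def lru_miss_ratio(requests, capacity):
    """Replay requests through an LRU cache of `capacity` unit-size items."""
    cache = OrderedDict()
    misses = 0
    for key in requests:
        if key in cache:
            cache.move_to_end(key)          # hit: mark as most recently used
        else:
            misses += 1
            cache[key] = True
            if len(cache) > capacity:
                cache.popitem(last=False)   # evict the least recently used item
    return misses / len(requests)


def zipf_flow(flow_id, n_items, alpha, n_requests, rng):
    """Draw requests whose popularity decays as 1 / rank**alpha (assumed model)."""
    weights = [1.0 / (rank ** alpha) for rank in range(1, n_items + 1)]
    cum_weights = list(accumulate(weights))
    ranks = rng.choices(range(n_items), cum_weights=cum_weights, k=n_requests)
    # Tag each item with its flow so the two flows share no data items.
    return [(flow_id, r) for r in ranks]


def compare_pooled_vs_split(capacity=200, n_items=1000, n_requests=20000, seed=0):
    """Return (pooled, split) overall miss ratios for two equal-rate flows."""
    rng = random.Random(seed)
    flow_a = zipf_flow("A", n_items, 0.8, n_requests, rng)
    flow_b = zipf_flow("B", n_items, 1.2, n_requests, rng)

    # Pooled: both flows are interleaved and share one cache.
    merged = [req for pair in zip(flow_a, flow_b) for req in pair]
    pooled = lru_miss_ratio(merged, capacity)

    # Split: each flow gets half the space; flows have equal request counts,
    # so the overall miss ratio is the plain average of the two.
    half = capacity // 2
    split = (lru_miss_ratio(flow_a, half) + lru_miss_ratio(flow_b, half)) / 2
    return pooled, split


if __name__ == "__main__":
    pooled, split = compare_pooled_vs_split()
    print(f"pooled miss ratio: {pooled:.3f}, split miss ratio: {split:.3f}")
```

Varying the exponents, the request rates, or the split point in this sketch gives a rough feel for how the factors listed under thrust (1) shift the comparison; it does not capture item sizes or overlapping data items across flows.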