CAREER: Learning-Based Hardware and Software Techniques for Quality-of-Service-Aware Cloud Microservices
Cornell University, Ithaca NY
Investigators
Abstract
Datacenters support a large and ever-increasing fraction of the world's computation, including search engines, social networks, and machine learning analytics. As modern cloud services grow in popularity, their design is shifting from complex monolithic applications to collections of specialized, loosely coupled microservices. Microservices change resource requirements: they need fast network processing and low-latency memory accesses to meet their quality-of-service (QoS) constraints. Dependencies among microservices also complicate cluster management and can cause cascading QoS violations, hurting availability and service reliability. Guaranteeing the responsiveness expected from cloud services while using datacenters efficiently requires a joint hardware-software approach rather than a hardware-only or software-only one.
This project takes a holistic view of designing a QoS-aware and resource-efficient system stack for interactive cloud microservices running on large-scale datacenters. By pursuing automated, learning-based techniques, it highlights the value of practical machine learning for navigating the increasing complexity of the cloud as more datacenter services switch to this new application model. At the hardware level, the project first quantifies the implications microservices have on server design and then explores their potential for hardware acceleration. At the software level, the project is developing a new cluster manager that accounts for the dependencies among microservices in an automated, user-transparent way and guarantees end-to-end performance. Finally, to eliminate the cascading effects of QoS violations between microservices, the project includes a data-driven, online performance forecasting system. This system leverages the massive amount of monitoring data collected by cloud systems to anticipate upcoming QoS violations and act on them before they degrade performance. By innovating in both hardware and software, this work aims to achieve performance and efficiency gains that neither hardware-only nor software-only approaches can provide.
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.