SPX: Collaborative Research: Rethinking Data Center Abstractions Utilizing Warehouse-Scale Shared Memory
Princeton University, Princeton NJ
Investigators
Abstract
Warehouse-scale computers (or data centers) are essential for the technology that people rely on every day, from making purchases, to hailing rides, to sharing experiences with friends. Programming warehouse-scale computers requires specialized software that coordinates the many individual computers that make up the data center, while ensuring that the system continues to operate if any machine fails. Unfortunately, this data center software differs fundamentally from software developed for single computers (the kind taught to most computer science students), resulting in long development times and poor performance. The proposed work will bridge the gap between the software systems used in current data centers and what is available to most programmers (and computing students). It will allow individual computers within a data center to communicate through "shared memory", the same mechanism used within small-scale computers from phones to laptops to individual servers. The project has the potential to make warehouse-scale computing much more accessible to everyone. In particular, it will ease the transition of software that runs on common machines (laptops, desktops) to the data center. Additionally, the project will create educational opportunities through enhanced classroom projects and research opportunities for undergraduates.
The project's distinguishing feature is a holistic design of new computing hardware and operating systems that allows this "shared memory" abstraction to provide both the scale and failure tolerance of specialized data center software. While prior hardware takes an approach of "share all" or "share nothing", the proposed hardware will allow subsets of data to be shared across subsets of hardware. The operating system will then be extended to automatically manage access to this shared memory, so that programmers do not need to be aware of the difference.
By coordinating software and hardware management, the project will overcome prior scalability and failure tolerance challenges of sharing memory. It will also allow easy sharing of data center resources, preventing fragmentation and reducing the cost of using data centers and cloud computing.
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.