CAREER: SHF: Rethinking the Control Plane for Chiplet-Based Heterogeneous Systems

$674,920 | FY2023 | CSE | NSF

University of Wisconsin-Madison, Madison, WI

Investigators

Abstract

Modern monolithic computing systems, ranging from smartphones to supercomputers, contain a heterogeneous mix of conventional CPUs and a variety of specialized accelerators, each tuned to run specific subsets of applications at high efficiency. Sadly, the underlying technology is changing, making continued scaling difficult. Thus, recent work has examined combining multiple smaller chips (chiplets) into a larger, aggregated system. Chiplet-based heterogeneous systems avoid the challenges of modern, large monolithic heterogeneous systems, enable continued performance and energy scaling, and allow closer integration of components than was previously possible. Unfortunately, chiplets also introduce new challenges: how to schedule computation across the computational resources and how to coordinate data movement between resources. The control plane in modern, monolithic heterogeneous systems uses a centralized, embedded, programmable core, called a command processor (CP), to perform these tasks. However, chiplet-based heterogeneous systems introduce an additional layer of hierarchy, causing indirection and non-uniformity that clash with the centralized CP. This project will fix these problems and advance science by creating a distributed, fully featured, programmable, programmer-transparent control plane that will significantly improve the efficiency of future chiplet-based heterogeneous systems. Moreover, this project will open new areas of research and enable chip designers to efficiently design and integrate future chiplet-based heterogeneous systems. To expand the community and reduce barriers to entry, the project will (1) release the developed software artifacts, hardware designs, and interfaces, (2) work with industry collaborators to influence subsequent products, and (3) develop new courses to teach the next generation of students how to use and accelerate chiplet-based heterogeneous systems.
To realize these opportunities, this project partitions the centralized CP into local, per-chiplet CPs which, in concert with a global CP, coordinate communication and computation across accelerators. The local CPs provide dynamic, microsecond-scale information about the current behavior within each chiplet, while the global CP possesses a global view across chiplets by synthesizing information from the local CPs. Given this partitioned design, the project’s key tasks include improving accelerator utilization by identifying and harnessing algorithmic parallelism in the global CP; creating novel local and global CP schedulers that transparently and efficiently divide work across chiplets to meet real-time deadlines and retain data locality and placement benefits; and designing an innovative coherence protocol that monitors and tracks dependence information in the local and global CPs and performs expensive inter-chiplet implicit synchronization operations only when necessary. This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
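The partitioned control plane described above can be illustrated with a small toy model. The sketch below is not the project's design. It only shows the general idea of local CPs reporting per-chiplet state to a global CP, which then places work with data locality in mind and counts an inter-chiplet synchronization only when data actually has to move. All names (`LocalCP`, `GlobalCP`, `MIGRATION_PENALTY`, `schedule`) and the scoring rule are assumptions made for illustration.

```python
from dataclasses import dataclass, field

# Hypothetical cost, in units of queued work, of moving data between chiplets.
MIGRATION_PENALTY = 3


@dataclass
class LocalCP:
    """Hypothetical per-chiplet command processor tracking local state."""
    chiplet_id: int
    queued_work: int = 0
    resident_data: set = field(default_factory=set)


class GlobalCP:
    """Hypothetical global command processor with a view across chiplets."""

    def __init__(self, local_cps):
        self.local_cps = list(local_cps)
        self.sync_count = 0  # inter-chiplet synchronizations performed

    def schedule(self, data_id, cost=1):
        """Place one task that touches `data_id`; return the chosen chiplet id."""
        def score(cp):
            # Lower is better: current load, plus a penalty if data is not local.
            penalty = 0 if data_id in cp.resident_data else MIGRATION_PENALTY
            return (cp.queued_work + penalty, cp.chiplet_id)

        target = min(self.local_cps, key=score)

        # Synchronize only when another chiplet currently holds the data.
        holders = [cp for cp in self.local_cps
                   if cp is not target and data_id in cp.resident_data]
        if holders:
            self.sync_count += 1
            for cp in holders:
                cp.resident_data.discard(data_id)

        target.resident_data.add(data_id)
        target.queued_work += cost
        return target.chiplet_id
```

In this toy model, repeated tasks on the same data stay on one chiplet until its queue grows past the assumed migration penalty. Only then does the work move, and only that move is counted as a synchronization.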
