CAREER: An Integrated Treatment of Voltage Noise and Process Variability in Many-core and GPU Systems with Microarchitectural Solutions
Ohio State University, The, Columbus OH
Investigators
Abstract
Advances in microprocessor technology have been the main engine of growth in the computing industry for decades. Microprocessor performance has improved at a remarkable rate for many years owing to technological and microarchitectural innovations. Unfortunately, technology scaling has reached an impasse in recent years. Significant challenges with the chip manufacturing process in low-nanometer dimensions are causing high variability in transistor behavior, leading to lower performance, higher power consumption, and higher susceptibility to errors. In general, high variability leads to larger design margins, making chips less energy efficient. These technological challenges are happening at a time when the need for energy-efficient computing is greatest. With the rapid proliferation of cloud-based computing and the explosion of smart mobile devices, energy efficiency is now crucial to the entire range of computing markets, from servers to smartphones. Achieving continued performance growth in these systems going forward requires dramatic improvements in the energy efficiency of computation. In this work, new microarchitectural and software solutions for lowering design margins in future chips are being developed to achieve substantial energy reduction while ensuring reliable operation. These solutions employ variation-aware design across multiple layers of the computing environment, including microarchitectural innovations, new firmware and operating system-based scheduling and power management solutions. New technology models that, for the first time, integrate multiple sources of variability are being developed for this purpose, enabling the design of variation-aware solutions. The work identifies new reliability challenges that affect supply voltage stability in chips with large numbers of compute units, such as many-core and graphics processors.
It develops novel process- and voltage-variation-aware scheduling and power management algorithms that reduce voltage instability, eliminating the need for large and inefficient design margins. At the microarchitectural and firmware levels, the work develops new mechanisms for dynamically reducing voltage margins by leveraging on-chip resiliency mechanisms to ensure reliable and efficient execution. These solutions are dramatically improving the energy efficiency of computation and are therefore expected to have a significant impact on the computing industry and society in general.