GrantIndex

SHF: Small: Collaborative Research: Resilient Computing Systems Using Deep Learning Techniques

$234,959 · FY2015 · CSE · NSF

SRI International, Menlo Park, CA

Abstract

Over the past decade, computer systems have become prone to a variety of hardware failures. Traditionally, such failures were circumvented by operating the system below peak computing efficiency, trading efficiency for reliability. This conservative approach is no longer viable because it leads to significant energy inefficiency. Since datacenters containing thousands of computers are among the largest and fastest-growing consumers of electricity, it is important to decouple hardware reliability from energy efficiency. The PIs' research will lay the groundwork for an intelligent computing system that operates at peak efficiency but manages its fault resiliency and reliability using deep learning techniques. In effect, the system learns to steer itself clear of danger whenever its deep neural nets anticipate a failure. The research will address several important issues involving the scalability, flexibility, and efficiency of deep learning techniques for various types of hardware failures. If successful, the research will minimize, if not eliminate, the penalties imposed on the system by the circuit and micro-architectural techniques commonly used to mitigate and overcome hardware failures.
