Collaborative Research: SHF: Medium: HERMES: On-Device Distributed Machine Learning via Model-Hardware Co-Design

$564,000 | FY2021 | CSE | NSF

University Of Texas At Austin, Austin TX

Investigators

Abstract

Machine Learning (ML) is poised to become the most disruptive technology in modern society, changing all aspects of how humans interact with each other and with the world around them. To be effective, ML models must use vast amounts of data and must be built and updated efficiently wherever and whenever new data, devices, or users become available. To satisfy consumer needs and stringent device or environmental constraints, ML systems must respond quickly and use minimal energy, especially in the context of widely deployed Internet-of-Things (IoT) devices. This project addresses this need by developing new approaches for distributed training that allow fast and energy-efficient training in the field, directly on IoT devices. The results of this project are poised to directly impact a wide array of applications, ranging from human mobility tracking and prediction to real-time speech and language processing. Furthermore, the project aims to change how engineers are trained, in a multidisciplinary fashion, to efficiently design distributed ML systems that respond in real time and at low energy cost to the availability of data, devices, or users. The project aims to develop a diverse body of research trainees while expanding outreach to high-school and middle-school student populations. Given the unified interdisciplinary aspects of this work, its workforce development plan, and its industrial impact, this project enables wide collaboration among emerging and established engineers and industrial partners.

Most ML models are trained centrally in the cloud, which fails to address user privacy concerns and response-time requirements and becomes impractical when fast model updates are needed. While efficient on-device inference has been an intense focus of recent research, on-device distributed training and inference have not been addressed from response-time and energy-efficiency perspectives; this is particularly important for IoT, where the network plays a major role in both training and inference efficiency. To address these challenges, this project (dubbed HERMES) provides a unified, multipronged approach for meeting real-time and energy constraints in an on-device distributed setting. HERMES co-designs ML methods and the underlying hardware, thereby addressing current challenges of private data sharing, communication overhead, and real-time, energy-efficient response in distributed ML. More specifically, HERMES includes: (i) a set of scalable approaches for hardware-aware, real-time, energy-efficient distributed training, based on federated learning and distributed optimization, that are robust to data and device variability; (ii) the co-design of ML models and hardware, comprising hyperparameter optimization that exploits hardware characteristics to identify constraint-satisfying ML models, and hardware design exploration that efficiently finds constraint-satisfying architectures; and (iii) an analysis and prototyping infrastructure for demonstrating the benefits of the resulting ML systems. This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
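The abstract names federated learning as one basis for on-device distributed training but does not specify an algorithm. As a minimal sketch, assuming the standard federated averaging (FedAvg) scheme rather than any method specific to this project, each device trains locally on its private data and sends only model parameters to a server, which averages them weighted by local sample count. The 1-D linear model, the three client datasets, and the names `local_sgd`, `fed_avg`, and `make_clients` are hypothetical illustrations.

```python
"""Minimal FedAvg sketch (hypothetical example, not the project's own method)."""


def local_sgd(weights, data, lr, epochs):
    """Hypothetical on-device training: plain SGD on squared error for y ~ w*x + b."""
    w, b = weights
    for _ in range(epochs):
        for x, y in data:
            err = (w * x + b) - y  # prediction error on one local sample
            w -= lr * err * x      # step along -d(0.5*err^2)/dw
            b -= lr * err          # step along -d(0.5*err^2)/db
    return w, b


def fed_avg(global_weights, client_datasets, rounds, lr, local_epochs):
    """Server loop: broadcast weights, let each client train, average by sample count."""
    total = sum(len(d) for d in client_datasets)
    for _ in range(rounds):
        new_w, new_b = 0.0, 0.0
        for data in client_datasets:
            # Only (w, b) leaves the device; the raw (x, y) pairs stay local.
            w, b = local_sgd(global_weights, data, lr, local_epochs)
            share = len(data) / total
            new_w += share * w
            new_b += share * b
        global_weights = (new_w, new_b)
    return global_weights


def make_clients():
    """Three hypothetical devices with non-overlapping x ranges (non-IID data), y = 2x + 1."""
    ranges = [range(-10, 0), range(0, 10), range(10, 20)]
    return [[(i / 10, 2 * (i / 10) + 1) for i in r] for r in ranges]


if __name__ == "__main__":
    w, b = fed_avg((0.0, 0.0), make_clients(), rounds=50, lr=0.05, local_epochs=5)
    print(f"w = {w:.3f}, b = {b:.3f}")
```

In each round only the parameter pair crosses the device boundary, which is the property that addresses the privacy and communication concerns described above; the number of local epochs trades on-device computation against the number of communication rounds.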
