SHF: Medium: DILSE: Codesigning Decentralized Incremental Learning System via Streaming Data Summarization on Edge

$614,950 | FY2022 | CSE | NSF

William Marsh Rice University, Houston TX

Investigators

Yingyan C Lin (contact), Anshumali Shrivastava, Cesar A Uribe

Abstract

Emerging fields such as connected autonomous vehicles and smart home analytics have ushered in an era of Artificial Intelligence (AI) on Internet of Things devices. However, modern edge devices are proliferating, and they are characterized by limited storage, heterogeneous capabilities, dynamic network connections, and growing data privacy concerns. Under these conditions it will become impractical to scale and update the current mainstream centralized machine learning (ML) models, leading to high latency, energy dissipation, and potentially outdated models with degraded performance. Hence, it has become paramount to efficiently process the inherently decentralized data streams on-device, i.e., closest to their sources, without sharing and accumulating the raw training samples on a centralized server. While federated learning (FL) has emerged to bring ML models to the edge, its global model, which is updated at the server by aggregating local models, can converge poorly and requires compromises between model accuracy and the resources available on heterogeneous, resource-constrained edge devices. Moreover, FL has not yet been designed to handle streaming data generated on the edge. This project aims to open up a new paradigm for developing powerful ML models by combining the best of centralized ML and decentralized FL, and thus to advance the promise of AI to transform human life. The outcomes of this project will lead to new course materials spanning several areas related to ML (e.g., decentralized optimization, computer architecture, and edge computing systems) and open-education resources that aim to attract diverse groups of students and eventually deliver a platform for inclusion and innovation.
This project bridges centralized ML and decentralized FL, taking into account the salient streaming and statistical characteristics of the data combined with widespread device and network heterogeneity. The key contributions are to develop rigorous foundations for the new decentralized ML training setup by: (1) creating new algorithms for on-device summarization of streaming data that reduce memory cost and processing latency while preserving privacy; (2) performing decentralized optimization and communication under device heterogeneity in a communication network; and (3) co-designing an energy-efficient hardware architecture and algorithm for accelerating streaming data summarization with real-time inference, and developing decentralized incremental learning via streaming data summarization on the edge on a network of real heterogeneous edge devices for system evaluation, validation, and demonstration while promoting green AI.
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
