RINGS: Enabling Data-Driven Innovation for Next-Generation Networks Via Synthetic Data

$1,000,000 | FY2022 | CSE | NSF

Carnegie Mellon University, Pittsburgh PA

Abstract

Next-generation networked systems are increasingly data-driven, meaning they are developed, tuned, and tested on real data. For instance, data-driven techniques can enable better quality of experience for content distribution over the Internet, better wireless communication techniques, and better attack detection techniques for emerging cybersecurity threats. Unfortunately, a pervasive lack of data limits the potential of data-driven research and development. Data holders are often reluctant to share datasets for fear of revealing business secrets or running afoul of regulations. These data access challenges are already, and will increasingly become, a fundamental stumbling block for innovation in next-generation networks. This project aims to tackle this impasse with synthetic data: data that exhibits the same statistical patterns as real data, without the need to explicitly share the original source data. Synthetic datasets can be safely released to enable cross-stakeholder collaboration. Synthetic data generation techniques, however, have classically suffered from poor data quality. This proposal explores how to leverage and extend recent advances in machine learning to use Generative Adversarial Networks (GANs) to generate synthetic models of networking datasets. Realizing the potential benefits of GAN-generated synthetic data for networking systems, however, is challenging on multiple fronts. First, network traffic datasets (e.g., packet captures) entail complex relationships that raise new fidelity and scalability implications for prior GAN models. Second, networking use cases pose new (and traditional) privacy requirements, and the resulting privacy-fidelity tradeoffs remain poorly understood. Finally, several networking use cases entail studying rare or extreme events (e.g., outages, flash crowds, attacks). Data for such extreme events is by definition scarce, and it is challenging for GANs (or any synthetic data model) to learn.
This project will tackle interdisciplinary challenges spanning networking, machine learning, and privacy to develop novel foundations for GAN-enabled workflows for supporting data-driven operations in next-generation network systems. This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
