I-Corps: Trustworthy Synthetic Data Generation

$50,000 | FY2023 | TIP | NSF

University Of California-Los Angeles, Los Angeles CA

Investigators

Abstract

The broader impact/commercial potential of this I-Corps project is to develop software products that support modern privacy-enhancing technology. The project's core technology offers data-rich applications a new and trustworthy way to maintain data utility while protecting data privacy. Using generative artificial intelligence (AI), the proposed technology unlocks greater data impact for small and medium-sized businesses and organizations while aligning with modern privacy law. In addition, the software products have the potential to reach marginalized communities in "digital rights deserts," where clients' and customers' digital rights are significantly limited by the capacity, data privacy awareness, and costs of businesses and organizations. With the proposed software products and accompanying auditing service, more individual businesses and companies may gain greater access to the benefits of their data and better protect their customers' digital rights, regardless of demographic and socio-economic background.

This I-Corps project is based on the development of deep learning technology for tabular data synthesis. The project leverages generative adversarial networks for trustworthy tabular data synthesis. The interdisciplinary academic-industrial collaboration experience provides a proven framework for integrating data synthesis technology into modern machine learning workflows that support and develop digital businesses and services. Pilot research integrated artificial intelligence algorithms to audit the quality and trustworthiness of synthesized tabular data, with the aim of unlocking safe, secure, cross-sector data sharing. In addition, research has been performed with industrial partners in the social media platform sector to assess and refine the technology so that it provides users with privacy-preserving metrics and supports evaluation through industrial-level machine learning pipelines. The proposed technology has the potential to positively change platform users' digital rights by significantly enhancing safety, security, and anonymity, advancing digital law enforcement, and increasing data benefits for both digital service providers and users.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
