CIF: Small: Adversarially Robust Reinforcement Learning: Attack, Defense, and Analysis
University of California-Davis, Davis, CA
Investigators
Abstract
To develop trustworthy machine-learning systems, it is essential to understand the potential vulnerabilities of existing learning algorithms and then develop corresponding mitigation strategies. Reinforcement learning (RL), a framework for control-theoretic problems in which an agent makes decisions over time within an uncertain environment, has applications in areas such as recommendation systems, autonomous driving, and finance and business management. In modern industry-scale applications of RL models, action decisions, reward- and state-signal collection, and policy iterations are normally implemented in distributed networks. When data packets containing reward signals and action decisions are transmitted through the network, an attacker can intercept and modify these packets to implement adversarial attacks. As RL models are increasingly deployed in safety-critical and security-related applications, there is a pressing need to understand the effects of potential adversarial attacks on these applications. In this project, the investigator aims to address the following questions: 1) Should decisions made by RL agents be trusted? 2) Can an adversary mislead RL agents? 3) How can RL algorithms be designed to be robust to adversarial attacks? While many existing works address adversarial attacks on supervised learning models, the understanding of the vulnerabilities of RL models and of the corresponding mitigation strategies is less complete, partly because of the significant differences between online RL and supervised learning. In particular, compared with the supervised-learning setting, the design and analysis of attack and defense mechanisms for RL models must handle challenges such as long-term rewards, the lack of access to future data, and unknown dynamics. The goal of this project is to overcome these challenges and make initial attempts to answer the questions raised above.
Specifically, this project aims to: 1) systematically investigate potential vulnerabilities of RL models and algorithms, 2) develop robust RL algorithms that can mitigate the impacts of adversarial attacks, and 3) analyze the benefits and costs of these mitigation strategies. This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.