Data-efficient Safe Control with Recovery-to-Optimality Guarantees

$400,000 | FY2023 | ENG | NSF

Michigan State University, East Lansing MI

Investigators

Bahare Kiumarsi (contact), Hamidreza Modares

Abstract

As rapid developments in learning-enabled systems advance the autonomy capabilities of systems, their safety certification becomes exceedingly important. While recent progress on safe reinforcement learning (RL) algorithms for autonomous control design has been promising, these algorithms are accountable only in stable environments and when comprehensive, high-quality data sets are available. However, many systems must operate in unpredictable environments in which a dangerous divergence might arise between safety and performance. In these environments, safety and performance specifications must be adapted to the context. In addition, the RL agent must learn under realistic data quantity and quality. Current RL practice assumes the availability of rich, high-quality data with full observability of the system’s states. These assumptions can be violated in many practical systems. This award supports research to create low-complexity, safe learning-enabled algorithms for partially observable systems that are equipped with highly efficient conflict management mechanisms to deliver as much performance as possible safely. Advances will have broad implications in applications of autonomous systems, robots, manufacturing, smart grids, and more.

This research project aims to develop low-complexity, safe learning-enabled algorithms for partially observable systems equipped with highly efficient conflict management mechanisms. The objectives of this project are two-fold:
1) Proposing direct data-driven learning approaches for backup safe control policies in partially observable nonlinear systems with uncertain dynamics. Concepts such as L-extra sample dynamics, probabilistic contractivity, and convex lifting will enable the learning of safe control policies for nonlinear systems with nonconvex safe sets using only measured noisy input-output data.
2) Introducing novel merging approaches that proactively manage conflicts by merging learned backup safe control policies with learning-enabled control policies. Instead of providing reactive quick fixes to conflicts as they arise, these approaches will enable proactive conflict management to avoid destructive future conflicts. To manage conflicts, the level sets of the RL agent will be adapted to the situation so that the agent aligns with the safety constraint. That is, safety-shaped value functions will be learned to resolve conflicts effectively by considering safety and optimality concerns across the relevant domains.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
