SLES: SPECSRL: Specification-guided Perception-enabled Conformal Safe Reinforcement Learning

$1,550,000FY2023CSENSF

University Of Pennsylvania, Philadelphia PA

Investigators

Rajeev Alurcontact Dinesh Jayaraman Eric Wong Osbert Bastani

Abstract

Machine learning techniques such as reinforcement learning (RL) promise to be the key enabling technology for an age of autonomous home robots, but to fulfill this promise, safety considerations must be integral to their design. In addition to avoiding immediate injury to humans or to the robot, household tasks carry additional hazards with varying levels of risk, such as spilling a hot drink, leaving a faucet on, or crushing a fruit. The SpecsRL project advocates the use of formal high-level logical specifications for design and deployment of safe RL algorithms. Such specifications can precisely and succinctly convey both the desired goals that a robot is required to accomplish as well as the harm it is expected to avoid. Learning algorithms for policy synthesis can then be designed with quantifiable mathematical guarantees for how well the synthesized policy meets the specification. To realize this agenda of specification-guided RL with precise mathematical and empirical safety guarantees, SpecsRL brings together researchers with expertise in reinforcement learning, formal methods, theory of machine learning, and robotics. The primary contribution of SpecsRL is a novel framework for RL with associated task specification language, learning algorithms, theoretical and empirical techniques for guaranteeing and evaluating safety, and case studies. SpecsRL research is organized along five thrusts. (A) Specifying safety: The project develops a highly flexible temporal-logic-based specification language suitable for robotic tasks with reachability goals and both hard and soft safety constraints. A key novelty is that logical state predicates (such as ``boiling water'') are grounded in visual perception. (B) Conformally safe policy synthesis: The project develops methods to propagate conformal prediction-based uncertainties associated with visual predicates and policy execution, enabling derivation of end-to-end formal safety guarantees for the overall system. (C) Online interventions for safe recovery: To deal with scenarios with high uncertainty during task execution, the project develops online monitoring and verification techniques that permit the robot to ``check its work'' through interactive behaviors, such as skewering a potato to check whether it is done, to permit robust decision-making and recovery. (D) Empirical safety testing: Since a robot is likely to encounter scenarios that are not considered during training, the project develops empirical stress testing techniques to discover potential failure modes, and robust ways to adjust to distribution shifts. (E) Experimental evaluation: The approach is evaluated for a kitchen robot in simulation as well as in experimental robot kitchen facility for complex tasks such as cooking pasta and setting a dinner table. The education and knowledge transfer activities are to build a community of researchers at the intersection of RL and safe AI, to facilitate tight integration of research results with education, and to promote diversity. This research is supported by a partnership between the National Science Foundation and Open Philanthropy. This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

View original record on NSF Award Search →