CIF: Small: Compression Schemes for Communication Constrained Bandit and Reinforcement Learning

$614,000 | FY2022 | CSE | NSF

University Of California-Los Angeles, Los Angeles CA

Investigators

Abstract

Active learning and online learning are machine-learning paradigms in which computers learn to make complex decisions while receiving feedback from an environment. For instance, a drone may learn to fly by itself, or a car may learn to drive by trial and error. Recently, these learning paradigms have been widely applied and have achieved human-level performance in tasks such as gameplay and robot control. As computing devices become smaller and consume less power, new distributed learning frameworks are emerging. These frameworks consist of low-capability learning agents (such as cell phones, unmanned vehicles, or drones) that are far apart but learn collectively by communicating with each other over (wireless) networks. However, existing communication approaches would become bottlenecks for learning, since they were designed for high-power computers and consume too much power and network bandwidth. This project aims to address this issue by providing novel techniques that efficiently compress the data to be communicated while preserving the learning ability. The techniques developed in this project will advance the state of the art in distributed online/active learning by improving communication efficiency.

The overarching goal of this project is to establish efficient compression schemes that support effective active/online learning, such as bandit and reinforcement learning, over communication-constrained networks. In these learning environments, a learner aims to make good decisions for the next steps based on experience. This project will explore fundamental bounds and efficient algorithms that support this goal while minimizing the number of bits communicated, by compressing in a way that retains only the information necessary for decision making. In other words, this project aims to explore the fundamental trade-off between compression and learnability in active/online environments. Building on promising preliminary work, the investigators will study problems ranging from the most basic multi-armed bandit setting to more complex reinforcement learning settings, and will consider both centralized and decentralized network topologies. More specifically, the investigators propose compression schemes and fundamental theoretical bounds for (1) rewards in multi-armed bandit problems, (2) context vectors for contextual bandit problems, and (3) state-action features and models for Markov decision problems.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
