CIF: Small: Distributed Statistical Inference with Compressed Data

$449,996 | FY2017 | CSE | NSF

University Of California-Davis, Davis CA

Investigators

Abstract

Due to the rapid growth in the size and scale of datasets and the desire to harness the parallel processing capabilities of multiple machines, distributed statistical inference and machine learning, in which the available data are stored on multiple machines that can communicate with each other only under limited communication budgets, have attracted significant research interest. There are two basic scenarios for the distributed setting: sample partition and feature partition. Although there has been much recent work on the design of inference algorithms for the sample partition scenario, there has been limited work on the feature partition scenario. The focus of this project is to characterize the fundamental limits of the feature partition scenario and to develop distributed statistical algorithms for it from an information-theoretic perspective. Compared with the sample partition scenario, the feature partition scenario is significantly more challenging. This research addresses these challenges through two research thrusts. Thrust 1 focuses on designing interactive encoding schemes for inference. The main idea is that, by interacting with each other, the terminals can coordinate their compression so that the decision maker obtains more information about the parameter while using the same communication resources, which leads to better inference performance. Thrust 2 designs function computing schemes for inference, in which the machines compute a function of the observations without first recovering them and then perform inference from this function. The main motivation for this idea is that recovering the observations, or a compressed version of them, is not necessary in the distributed inference setup, since the final goal of distributed inference is to infer the value of the unknown parameter.
