CIF: Small: Effective bounds for distributed storage and data access
University Of Illinois At Urbana-Champaign, Urbana IL
Investigators
Abstract
Today's data centers rely on advanced methods to efficiently store increasing volumes of data. To protect against loss of data, data is stored redundantly on multiple disks. At the large scale of clusters of thousands of disks, simple replication of data is inefficient and not an option. To address the challenge of efficient, reliable and secure storage of data at a large scale, the project uses various combinations of algebraic and combinatorial methods. New constructions are given for the efficient recovery of data in case of disk failure. New methods are introduced to optimize bounds for storage capacity. New secure schemes are developed to ensure that information can only be obtained from the combined data of multiple disks. The research uses a novel algebraic approach to fundamental aspects of data storage and data access. Undergraduate and graduate students will be involved, working on projects with both a theoretical and a computational component. Algorithms for data storage encode and divide data over several disks. The encoding challenge is to optimize the allocation of storage space between primary data and repair data. The optimization is analyzed for the general case in the setting of entropy inequalities and for the linear case in the setting of rank inequalities for matroids. The main focus is on three aspects. 1) Outer bounds: Optimize the use of available storage space under various combinations of constraints. 2) Multiple-access: Use coordinated encoding of data from different sources to add error-correction without sacrificing storage capacity. 3) Small alphabets: Using a novel approach, analyze nontraditional coset schemes that are defined over smaller fields.
View original record on NSF Award Search →