ITR: Efficient Algorithms with Implicit Input Data

$150,000FY2003CSENSF

New Jersey Institute Of Technology, Newark NJ

Investigators

Abstract

The main objective of this research is to establish methods for theoretical study of algorithms for which the access to the input data is provided in an implicit way. Such models of computations arise naturally in the context of algorithms working with large data sets. The implicit input may correspond either to a partial data in situations when the entire data cannot be directly accessed (for example, because it is prohibitively expensive), or it may correspond to compressed or coded data. It happens when only such data is stored in the system and no direct access to the real data is provided or is available. Traditionally, complexity theory deals with the classical definition of the explicitly given input data: the input is given explicitly and the generally accepted standard is to search for algorithms having the complexity linear, or at least polynomial in terms of the input size. Such algorithms are viewed usually as the most efficient. However, this significantly changes if we deal with massive data. It is sometimes necessary to find good algorithms which deal with implicitly given data and access only small part of the data (e.g., a random sample), or small compressed or encoded representation of the data. The information which we have about the data is now of size which would be substantially much smaller than the size of the whole original input. In the extreme situations it can be even constant (small sample sufficiently well approximating, with high probability, the exact output). The main theme of this research is to understand the relation between the complexity of the problems in the classical model (where the access to data is given explicitly) and the problems in the model in which the access to the input is implicit. The main focus is on study algorithms for one- and two-dimensional compressed texts, algorithms for property testing, and sampling-based approximation algorithms. In case of texts the implicit input is the compressed version of the input and in the case of property testing the reduced input is usually a small sample of the input. In the latter case, the we also deal with the problems on preprocessed input: an oracle provides fast answers to some basic structures underlying the input data, e.g., (in geometric setting) orthogonal range queries. In this case the implicit access to the input is constrained by range of allowable queries and by the input data representation. The goals for this research are to advance the state-of-the art in algorithms with restricted access to the input data in general, in particular algorithms for compressed texts and sampling-based property testing and approximation algorithms. The intellectual merit of this proposal is wide and it involves advances in computational complexity of new classes of problems: pattern-matching in compressed or fully compressed one and two-dimensional texts, property testing with access only to a small portion of the input data and abstract combinatorial algorithms dealing with implicit input data. It is a belief of the investigators that new aspects of the algorithmic efficiency explored in this research may initiate an important and novel area of computer science. This would be of great importance in time and space effective processing of massive data, as well as of great importance in pure theory. The end-result of this research effort are an understanding of the effects of special access to the input data on the efficiency of related algorithms.

View original record on NSF Award Search →