Collaborative Research: CDI-Type II: Discovery of Succinct Dynamical Relationships in Large-Scale Biological Data Sets
New York University, New York NY
Abstract
Collaborative Research: 0836656 (Peter Doerschuk, Cornell University), 0836649 (Bud Mishra, NYU), 0836720 (Sanjoy Mitter and Emery Brown, MIT)
Title: Discovery of Succinct Dynamical Relationships in Large-Scale Biological Data Sets
ABSTRACT: Many types of information in neuroscience and molecular biology can be described as a set of measurements taken repeatedly as some index changes its value. In some situations, such as transcriptomic data measuring gene activity, the index is time. In others, such as a genetic association study, the index is position along a genomic DNA sequence. In either case, the complete collection of data is referred to as a time series. Inference is the process of taking such time series, possibly corrupted by errors, and computing answers to the following sorts of questions: (1) What system generated the time series? For instance, if the system is known to be a differential equation of a specific type, what are the parameter values in that differential equation? (2) Given a completely specified system and a time series, did that system generate that time series? For instance, if a biologist has hypothesized a system that describes gene expression for a particular set of genes and then measures expression data, are the data compatible with the system, or equivalently, with the hypothesis? (3) Given two time series, were they generated by the same system? For instance, if the pattern of nerve firings in a neural system is recorded in two different experimental situations, is the pattern the same or different?
The four Principal Investigators are focused on three biological application domains at three different biological scales: (1) the phenotyping of animal and human ethanol-consumption behavior (whole-organism scale), (2) the pattern of action potentials measured on ensembles of neurons (cell-population scale), and (3) the time course of gene expression as governed by the regulatory circuits of the cell (cellular scale). These applications share several challenging characteristics: the information is distributed over long periods of time rather than concentrated in time; the systems include delays and feedback paths; and the systems are highly nonlinear, including switching behavior. The major methodologies that will be developed and combined to solve inference problems in these application areas are: (a) information theory and stochastic control, (b) multi-scale approaches to learning the geometry of the data, and (c) computer algebra and symbolic computation. For example, dealing with delay and feedback in neuroscience systems, especially where information and stochastic control interact, requires a fundamental rethinking of classical information theory as it is employed in technology-based communication systems. As the cost of computing decreases, computing becomes increasingly pervasive. A major purpose of pervasive computing is the real-time collection of high-dimensional time series of very diverse types of data, including biological, medical, and financial data as well as the status of communication systems and power systems. The project will provide computational algorithms and software to analyze these data in more sophisticated ways and thereby extract more informative results.
Actions based on this information, e.g., personalized medicine based on individual genomic information, or more accurate and flexible control of power systems that avoids blackouts, will have important human and economic benefits for society. The project also has an important educational component: three graduate students working on the project will receive tuition and stipends, and an unrestricted number of undergraduates will participate in a variety of ways, e.g., through project courses. By attracting talented students to science and technology and providing challenging research experiences, the project will have important workforce benefits for society.