CAREER: A Novel Framework for Knowledge Discovery from Time Series Data in Biology and Climate Science

$510,385 | FY2013 | CSE | NSF

University of Southern California, Los Angeles, CA

Abstract

Recent advances in sensors and high-throughput data acquisition technologies have made it possible to collect massive amounts of data, especially time series data, in a number of domains (e.g., climate sciences, biological sciences). While a wide range of techniques have been developed for clustering and mining such data, there has been limited progress on scalable algorithms for extracting causal relationships from time series data. This project aims to develop novel machine learning models based on Granger causality to uncover the complex dependence structures in high-dimensional time series. The project aims to address three fundamental challenges in the analysis of time series data: (1) developing the theoretical foundations of causality analysis from time series data to quantify the gap between Granger causality and true causality, (2) developing a unified framework to incorporate different types of domain knowledge in data analysis, and (3) examining effective solutions to important but usually overlooked practical issues, including the irregular nature of time series and scalability. The resulting algorithms will be evaluated on two real-world applications, gene regulatory network discovery in immune systems and climate change attribution, in collaboration with researchers in biology and climate science. The proposed research could impact multiple application domains where discovery of causal relationships from high-dimensional time series data is of interest. The project is expected to advance the theoretical foundations of data-analytic techniques for time series data and provide a unified framework that can easily integrate domain knowledge.
The results of this project can be expected to significantly advance the current state of the art in eliciting insights regarding causal relationships from time series data. In addition to the core research advances, this project contributes easy-to-use software based on workflows for teaching machine learning to students, researchers and practitioners with a broad range of backgrounds. Educational and outreach activities include new interdisciplinary courses, workshops, tutorials, and high-school visits. Software and data resulting from this work will be freely disseminated to the broader research and educational community. Additional information about the project can be found at: http://www-bcf.usc.edu/~liu32/uscTimeSeries.htm.
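
For readers unfamiliar with the term, the basic idea behind Granger causality can be illustrated with a small sketch. This is the textbook bivariate linear test, not the models developed in this project: one series is said to Granger-cause another if its past values improve the prediction of the other series beyond what that series' own past already provides. All names below (`solve_linear`, `rss`, `granger_f`, `simulate`) and the simulated data are illustrative assumptions, and the sketch uses only the Python standard library.

```python
import random

def solve_linear(A, b):
    """Solve A w = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    w = [0.0] * n
    for i in reversed(range(n)):
        s = M[i][n] - sum(M[i][j] * w[j] for j in range(i + 1, n))
        w[i] = s / M[i][i]
    return w

def rss(X, y):
    """Residual sum of squares of the ordinary least-squares fit y ~ X."""
    k = len(X[0])
    XtX = [[sum(r[i] * r[j] for r in X) for j in range(k)] for i in range(k)]
    Xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(k)]
    w = solve_linear(XtX, Xty)
    return sum((yi - sum(wi * xi for wi, xi in zip(w, r))) ** 2
               for r, yi in zip(X, y))

def granger_f(cause, effect, lag=1):
    """F statistic for the hypothesis that `cause` Granger-causes `effect`."""
    target = effect[lag:]
    restricted, full = [], []
    for t in range(lag, len(effect)):
        own = [effect[t - k] for k in range(1, lag + 1)]
        other = [cause[t - k] for k in range(1, lag + 1)]
        restricted.append([1.0] + own)       # intercept + own past only
        full.append([1.0] + own + other)     # ... plus past of the other series
    rss_r, rss_f = rss(restricted, target), rss(full, target)
    dof = len(target) - (2 * lag + 1)        # observations minus fitted parameters
    return ((rss_r - rss_f) / lag) / (rss_f / dof)

def simulate(n=300, seed=0):
    """Toy data (an assumption for illustration): x drives y with a one-step delay."""
    rng = random.Random(seed)
    x, y = [0.0], [0.0]
    for _ in range(n - 1):
        x_next = 0.5 * x[-1] + rng.gauss(0, 1)
        y_next = 0.8 * x[-1] + rng.gauss(0, 1)
        x.append(x_next)
        y.append(y_next)
    return x, y

x, y = simulate()
f_xy = granger_f(x, y)  # does x help predict y?
f_yx = granger_f(y, x)  # does y help predict x?
print(f"F(x -> y) = {f_xy:.2f}, F(y -> x) = {f_yx:.2f}")
```

On data simulated this way, the statistic for x to y should be far larger than the one for y to x. A test of this kind detects predictive precedence only, so it can mislead when there are hidden confounders or irregular sampling; that is the gap between Granger causality and true causality named in challenge (1) above.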
