GrantIndex

BIGDATA: Small: DA: Classification Platform for Novel Scientific Insight on Time-Series Data

$733,536 | FY2013 | CSE | NSF

University Of California-Berkeley, Berkeley CA

Investigators

Abstract

The deepest insights into the nature of complex physical systems arise from measuring how the observables of those systems change with time. Such dynamism, witnessed on scales ranging from atomic to universal, reveals the underlying forces that govern the interaction of the constituents of those systems. The temporal sampling of data from sensors and from simulations, then, may be seen as a primary vector towards the deepest scientific insight. In this respect, mechanisms to quickly and robustly extract and mine knowledge from diverse time-series data can be a fundamental tool of modern data-driven science.

This project will build a webservice portal for scientific teams to train state-of-the-art machine-learning algorithms on existing data and receive autonomously generated classification statements on new data, whatever the scale. Massive data storage and the scaling and parallelism of computational algorithms (using commodity cloud services) will be abstracted from the end users. The envisioned framework will act both to simplify the algorithm selection and application processes and to educate the broad user base in modern machine-learning approaches. This project will lead to the implementation of novel and efficient feature extraction algorithms on irregularly sampled time-series data, and will make them available in a robust and scalable platform integrated with classification and cross-validation, which will lead to informed use of the algorithms for reliable scientific insight.

This learning and prediction platform will accelerate data-intensive decision-making, and will be a new data analytics tool for the autonomous discovery of knowledge across a diverse range of scientific disciplines. Geoscientists may use it to find new robust earthquake trigger algorithms, enabling on-the-fly decision-making to improve emergency response times. Astronomers may rapidly detect anomalies, identifying a class of new variable stars buried within data from a time-domain imaging survey. Neuroscientists could incorporate improved real-time feedback and prediction into prosthetics control systems. As an intelligent agent, the platform could be used as an automated annotator for streaming biomedical data.

This work will deliver a new open-source toolkit and web platform that can serve as a fundamental tool for time-domain science. By design, it will grow organically as user-contributed code is integrated into the platform. With burgeoning adoption among some data-driven science disciplines, the webservice will emerge as an educational platform in the use of learning algorithms for time-series data and as a societal service that can be used by anyone (even outside of traditional scientific disciplines) to test hypotheses on large scales with minimal effort. The website will also act as a public repository for large, well-described datasets useful for validating new time-series classification and prediction algorithms. A series of short and semester-long courses will be developed (and broadly disseminated) to teach a new generation of scientists how to use the platform (and other widely available resources) as central 21st-century research instruments.
