CIF21 DIBBs: EI: mProv: Provenance-Based Data Analytics Cyberinfrastructure for High-frequency Mobile Sensor Data
University of Memphis, Memphis, TN
Investigators
Abstract
This project addresses a rapidly growing opportunity: the ability of the research community to use high-frequency mobile sensor data. Mobile sensors (embedded in phones, vehicles, wearables, and the environment) continuously capture data in great detail, and have the potential to address problems in a range of scientific and engineering domains. This effort focuses on a specific case, health data, and builds upon several capabilities developed in National Institutes of Health (NIH) sponsored projects for assembling and analyzing health data collected through mobile sensors and apps. Improvements to the usefulness of extremely noisy, distributed data can serve many communities, and the components are extensible outside the human health domain. Mobile sensors present a distinct set of data challenges: the data quantity and quality fluctuate, and uncertainty can be high. Establishing provenance on such noisy data is a challenge, and there are limitations on access to data from human subjects. This project addresses several of these challenges. Variability is addressed by providing detailed annotation with metadata (such as provenance and quality), and by providing facilities for context-specific reasoning about the metadata. The system captures provenance metadata along with data in a stream, and propagates this information alongside derived data from one stage to the next. This creates cyberinfrastructure that makes it possible to 'replay' mobile device data with different configurations, to comparatively benchmark two algorithms, or to diagnose erroneous output. The project builds upon the capabilities and success of the NIH-funded Center of Excellence in Mobile Sensor Data to Knowledge (MD2K), which provides an open-source cyberinfrastructure enabling the collection, curation, analysis, visualization, and interpretation of high-frequency mobile sensor data.
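The idea of carrying provenance alongside derived data, and of replaying recorded data under different configurations, can be illustrated with a minimal Python sketch. Everything here is hypothetical and chosen for illustration only: the `Sample` type, the `make_stage` and `run_pipeline` helpers, the stage names, and the sensor label are assumptions, not part of the mProv or MD2K software.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple


@dataclass(frozen=True)
class Sample:
    """One data point plus the trail of steps that produced it (hypothetical type)."""
    value: float
    provenance: Tuple[str, ...] = ()


Stage = Callable[[Sample], Sample]


def make_stage(name: str, fn: Callable[[float], float]) -> Stage:
    """Wrap a plain function so that its output extends the provenance trail."""
    def stage(sample: Sample) -> Sample:
        return Sample(fn(sample.value), sample.provenance + (name,))
    return stage


def run_pipeline(raw: Sequence[float], stages: Sequence[Stage], source: str) -> List[Sample]:
    """Feed recorded raw values through each stage, tagging every sample with its source."""
    results = []
    for value in raw:
        sample = Sample(value, (source,))
        for stage in stages:
            sample = stage(sample)
        results.append(sample)
    return results


# The same recorded readings are "replayed" under two configurations
# that differ only in the threshold used by the final stage.
recorded = [0.2, 0.9, 0.4]
config_a = [make_stage("scale(x10)", lambda x: x * 10),
            make_stage("threshold(>5)", lambda x: float(x > 5))]
config_b = [make_stage("scale(x10)", lambda x: x * 10),
            make_stage("threshold(>3)", lambda x: float(x > 3))]

result_a = run_pipeline(recorded, config_a, "sensor:wrist_accel")
result_b = run_pipeline(recorded, config_b, "sensor:wrist_accel")

# Samples where the two configurations disagree, with both provenance trails,
# which is the information needed to diagnose a differing output.
disagreements = [(a, b) for a, b in zip(result_a, result_b) if a.value != b.value]
```

Because each derived sample keeps its full trail, a disagreement between two runs can be traced to the specific stage and setting that differ, without re-collecting data from the device.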
Conducting research with mobile sensor data collected by others continues to be challenging; this project develops a companion open-source provenance cyberinfrastructure, facilitating the sharing of the mobile sensor data itself. Results include metadata standards, interfaces, and runtime support for annotating data streams with the source (sensor, location, sampling rate, continuous or episodic), semantics of output (number, probability, class), provenance (features, rules for decision), and validation (specificity, sensitivity, benchmark used). The infrastructure accommodates a wide variety of data types and enables data discovery, analytics, visualization, integration, and validation by third-party researchers. The project improves the ability of the wider scientific and engineering community to use mobile sensing systems and metadata, and it also has immediate, tangible societal benefits in health and wellness. This award by the Advanced Cyberinfrastructure Division is jointly supported by the NSF Directorate for Computer & Information Science & Engineering (Division of Computer and Network Systems, and Division of Information and Intelligent Systems).
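The four annotation categories named above (source, semantics of output, provenance, and validation) could be represented as a structured record attached to each stream. The following Python sketch is one possible shape under stated assumptions: the class names, field names, and JSON layout are hypothetical and do not reflect the project's actual metadata standard, and the example values are placeholders rather than measured results.

```python
import json
from dataclasses import asdict, dataclass
from typing import List

ALLOWED_SEMANTICS = {"number", "probability", "class"}
ALLOWED_MODES = {"continuous", "episodic"}


@dataclass
class Source:
    sensor: str
    location: str
    sampling_rate_hz: float
    mode: str  # "continuous" or "episodic"


@dataclass
class Provenance:
    features: List[str]        # features the output was computed from
    decision_rules: List[str]  # rules used to reach the output


@dataclass
class Validation:
    specificity: float
    sensitivity: float
    benchmark: str  # benchmark used for validation


@dataclass
class StreamAnnotation:
    """Hypothetical metadata record for one data stream."""
    name: str
    source: Source
    semantics: str  # "number", "probability", or "class"
    provenance: Provenance
    validation: Validation

    def __post_init__(self) -> None:
        # Reject values outside the categories listed in the text.
        if self.semantics not in ALLOWED_SEMANTICS:
            raise ValueError(f"unknown semantics: {self.semantics!r}")
        if self.source.mode not in ALLOWED_MODES:
            raise ValueError(f"unknown source mode: {self.source.mode!r}")

    def to_json(self) -> str:
        """Serialize the annotation so it can travel with the stream."""
        return json.dumps(asdict(self), sort_keys=True)


# Placeholder values for illustration only; not results from any study.
annotation = StreamAnnotation(
    name="example_stream",
    source=Source("accelerometer", "wrist", 16.0, "continuous"),
    semantics="probability",
    provenance=Provenance(["mean_magnitude"], ["magnitude above threshold"]),
    validation=Validation(specificity=0.9, sensitivity=0.8, benchmark="example_benchmark"),
)
encoded = annotation.to_json()
```

Keeping the record serializable means a third-party researcher could read the annotation without the original software, which is the kind of discovery and validation use the abstract describes.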