DataONE (Data Observation Network for Earth)
University of New Mexico, Albuquerque, NM
Investigators
Abstract
DataONE will create new cyberinfrastructure (CI) that will resolve many of the key challenges that hinder the realization of more global, open, and reproducible science. We will do so through four interrelated CI activities that are supported by the DataONE team of developers and the CI Working Group. First, we will significantly expand the volume and diversity of data available to researchers through the DataONE Federation of repositories (i.e., Member Nodes) for large-scale scientific innovation and discovery. DataONE will create lightweight and easily deployed "Slender Node" software and develop DataONE compatibility for common repository software systems (e.g., DSpace) that are already deployed in hundreds of high-value repositories worldwide.
Second, we will incorporate innovative and high-value features into the DataONE CI. These new features include: 1) measurement search, which leverages semantic technologies to enable highly precise discovery and recall of the data researchers need; 2) provenance tracking, which follows data through creation, all transformations, and analyses, storing and indexing provenance trace information that can be used both to reproduce scientific data processing and analysis steps and to discover specific data sources by examining the documented workflows, thereby enabling more reproducible science; and 3) data extraction, subsetting, and processing services that enable researchers at any location to more easily participate in "big data" initiatives (e.g., working with data from large environmental observatories and participating in broad-scale synthesis and modeling endeavors). These three new sets of features will dramatically improve data discovery; further support reproducible and open science; and enable scientists from any institution, independent of networking capacity, to extract subsets of large data sets held in DataONE-affiliated repositories for processing and interpretation.
Third, we will maintain and improve core CI software and services (e.g., Coordinating and Member Node software stacks and key components of the Investigator Toolkit) so that the user experience continues to improve, new services can be easily added over time, and the CI can be readily upgraded as operating systems and other supporting software continue to evolve. Fourth, we will increase the number of Member Nodes (size of the Federation) while maintaining cybersecurity and trust. Both of these activities respond to the need for DataONE network continuity and reliability, which are critical to maintaining community trust and enabling researchers to achieve their science objectives.
Four working groups, each composed of a small number of experts from computer and information sciences, domain sciences, and cyber-enabled learning, will guide and contribute to DataONE CI development and usability, sustainability, and education and outreach. The CI Working Group will coordinate core CI research and development, including the addition of new services such as provenance tracking and semantically enabled measurement search. The Usability and Assessment Working Group will help DataONE understand community needs and expectations, and continually improve the CI via feedback from usability analysis. The Community Engagement and Outreach Working Group will ensure that community needs are met and that education activities and materials achieve optimal impact. The Sustainability and Governance Working Group will empower the community to drive the organization's governance structure and sustainability strategies, ensuring that DataONE can sustain services and evolve to meet the needs of researchers, libraries, sponsors, and other stakeholders for decades to come.
In addition to developing robust and powerful infrastructure, DataONE aims to change the scientific culture by promoting good data stewardship practices. Our specific goals are to: 1) build a community of stakeholders through active engagement with data repositories and the broad community of scientists; and 2) educate scientists about good data life cycle practices through effective education, outreach, and training activities and experiences. Community engagement in the biweekly Member Node Forum and the annual meeting of the DataONE Users Group will support expansion of the data content and services provided to and needed by the research community. A new DataONE webinar series and education resources (e.g., best practices, software tools, and learning modules) will enable researchers to better steward their data and take advantage of the myriad services and tools available through DataONE. The DataONE Summer Internship Program will actively involve students in CI development and related DataONE activities such as creating and providing web-based educational resources.