Plant Genomics Data Computational Interface

$792,047 FY2010 BIO NSF

Carnegie Institution Of Washington, Washington DC

Investigators

Abstract

The Carnegie Institution for Science is awarded a grant to build an application programming interface (API) that will provide high-performance, web-based computational access to plant genomics data. The project will include a new, open technology for generating computational data web services; an open, portable, optimized data warehouse that supports very fast queries of plant biology data; and a plant biology query language, query builder, and query optimizer that will provide a simple way to limit query results to only the required data. The project will provide a range of data access methods to serve the needs of computational biologists and bench biologists for large and custom datasets. Implementation of this software at TAIR will provide REST and SOAP web services for computational use of TAIR data, RSS feeds for TAIR objects, and an implementation of a new query builder for TAIR. While strong advances have been made in data generation methods, including new genome sequencing methods, high-throughput phenotyping, protein localization, and others, computational access to the resulting data still requires large amounts of both human and machine resources. This project addresses the issue through architecture, combining advances in software engineering and applying them to the specific domain of plant genomics. In particular, developing a minimal but effective plant genomics schema using modern data modeling; leveraging model-driven architecture to enable generation of high-performance web services from platform-independent models; and developing a basic, well-formed query language for the plant genomics domain are intellectually challenging tasks that will have a significant impact on the technology required for computational access.
Wide adoption of a standard set of web interfaces for computational access to plant genomics resources will greatly simplify the effort required to access and integrate plant genomic data, thereby facilitating computational analyses of the data. By providing an easy, robust, and consistent route to computational data access, standard web interfaces will also facilitate development of new resources that could transform existing datasets and present them in new ways, analogous to mashups using Google Maps data along with real estate listings, weather data, Wikipedia entries, etc. By providing computational APIs and the technological infrastructure to create them as open source tools, this project makes available a key set of technologies to computational biologists beyond TAIR. As the technology proves itself, it can move beyond plant biology into the more general biological realm. Further information about this project may be found at the TAIR website: http://arabidopsis.org.
