UniProt: A Centralized Protein Sequence and Function Resource

$76,164 | U41 | FY2011 | HG | NIH

European Molecular Biology Laboratory, Heidelberg

Abstract

DESCRIPTION (provided by applicant): The specific aim of the UniProt Consortium is to provide a centralized protein sequence and function resource by enhancing the UniProt Knowledgebase (UniProtKB) and by using a range of dissemination strategies to ensure that the diverse information in UniProt is of use to a broad scientific user community.

The UniProtKB will include a variety of data types including, but not limited to, protein sequences, nomenclature, family classifications, and alternatively spliced and modified forms. Relevant information on protein function will be included, along with potential protein interactions, expression patterns, pathways, and controlled-vocabulary Gene Ontology (GO) terms. Annotation methods applied in the UniProtKB will include extraction of information from the literature and computational analyses, as well as integrating and mining large-scale data sets. The types of evidence and methods of annotation for both experimental and computational data will be recorded, along with attribution of the source. The UniProtKB will rely on high interoperability with other databases, while exploiting novel approaches to encourage community curation.

To facilitate the use of UniProt, the UniProt Consortium will enhance its existing user-friendly interfaces and tools to allow for simple and complex queries and for retrieval of large datasets. Database records will be downloadable in a defined, parsable format. An efficient and responsive user support service will be provided. Finally, the UniProt Consortium will maintain the flexibility and adaptability needed to respond to the changing needs of the scientific community.

The broad, long-term objectives of this project are: to provide the scientific community with the Universal Protein Resource (UniProt) as a comprehensive, high-quality, and freely accessible resource of protein sequence and functional information; to enable scientists to identify and analyze products of protein-coding genes by making text- and sequence-based queries in the UniProt databases; and to provide efficient and unencumbered access to the databases produced by the UniProt Consortium.

RELEVANCE: The databases produced by the UniProt Consortium will provide researchers with integrated access to protein sequence and function information by gathering and enriching data from genomics and proteomics projects as well as the results published by individual researchers. This is a crucial step in making genomics and proteomics research results easily accessible to support biomedical research in academia and industry, and hence in facilitating the development of preventive and curative strategies for human health.
