ITR: Enhancing Access to the Bibliome for Genomics
Oregon Health & Science University, Portland OR
Abstract
Project Summary
The overall goal of this project is to enhance access to the "bibliome" (i.e., published scientific communications) by improving information retrieval (IR) systems in the genomics domain. This will be done by:
1. Developing a publicly available resource of content and tools that allows system developers and other researchers to evaluate retrieval capabilities in the genomics domain
2. Organizing the development and use of this resource in the Text Retrieval Conference (TREC) Genomics Track
3. Carrying out a research program that analyzes the aggregated results of the TREC Genomics Track experiments to synergistically enhance our understanding of their findings
We seek five years of Information Technology Research (ITR) funding to develop this resource and carry out research related to it. The project will operate under the philosophy that has guided genomics work to many successes: all relevant data and tools will be shared in an open and free public repository. Success in this project will be measured by its level of contribution to operational IR systems for real-world searchers in genomics, as well as by advances in IR that occur as a result of the resources that are developed.
Information technology (IT) has revolutionized biological research. Biologists used to conduct their research in a relatively limited information space. Many IT advances, however, have changed the nature of their work. New data sources, such as genome sequences and DNA microarray experiments, have increased the amount of raw data, while the growth of research findings and subsequent knowledge has taxed the ability of biologists, like other scientists, to keep up with progress in their areas of interest. The tremendous growth in experimental data is accompanied by an overwhelming rate of publications describing new findings. Keeping up with the published literature is becoming a daunting task for researchers.
Finding all and only the required information for any specific research task within the immense body of literature is almost impossible, and advanced technology to assist in sorting through it is an immediate need.
The intellectual merit of this activity is that it will provide synergistic benefit for both the biology and IR communities. By creating standardized yet realistic information challenges, it will enable IR researchers to develop better tools for biologists than they could working alone. As with most TREC endeavors, IR research itself will improve through interaction among researchers working on common tasks created in conjunction with real-world end users.
This project will also have broader impacts. The project will advance discovery and understanding while promoting teaching and learning. Discovery and understanding will ensue from the more robust experiments that can be used to evaluate systems emanating from this work, while teaching and learning will improve since many of the participating research groups are likely to work in academic settings where graduate and other students can take part. The infrastructure for research and education will be enhanced by the shared and public availability of the text retrieval resource produced by this project, allowing IR systems developers and biologists to better understand each other's work. It will also contribute tools to the growing "cyberinfrastructure" deemed necessary for optimal usage of IT in the new century. The increased collaboration across the disciplines should also lead to new ideas and proposals for research. Coupling this project to the TREC Genomics Track will allow broad dissemination of the work and lead to improved understanding of the scientific and technological aspects of IR in the genomics domain.
Ultimately, society will benefit from improved systems that biological researchers and others can use to advance genomics research and public understanding of it, leading to better diagnosis and treatment of disease.