COMPUTATIONAL TOOLS FOR BIOINFORMATICS AND GENOME ANALYSIS
Computer Research And Technology
Abstract
A crucial component of the recent major advances in genomic research has been the union of advances in biology with those in computing, informatics, and networking. As sequencing throughput has increased, the technological burden has shifted increasingly to analysis and informatics. This project was established to ensure that the necessary computational tools and resources are available to the NIH community. Software tools have been developed to integrate automated sequence analysis procedures with cDNA sequence data stored in a SYBASE relational database system. These include tools for prescreening cDNA sequences against a local database, for automated searching against the NCBI network BLAST server, and for displaying the results so that users can interactively select the information to be inserted into the database. These tools were applied to over 65,000 EST sequences from 8 normal and malignant human B lymphocyte cDNA libraries to identify more than 10,000 potentially lymphoid-specific clones. These clones have been incorporated onto the Lymphochip cDNA microarray.
Results from the first study utilizing Lymphochip arrays, "Distinct Types of Diffuse Large B-Cell Lymphoma Identified by Gene Expression Profiling," Nature 403: 503-511 (2000), led to the initiation of the Lymphoma/Leukemia Molecular Profiling Project (LLMPP), an international consortium whose mission is to use Lymphochip cDNA microarrays to define the gene expression profiles of all types of human lymphoid malignancies. One primary goal of the LLMPP is to redefine the classification of human lymphoid malignancies in molecular terms. A second major goal is to define molecular correlates of clinical parameters that can be used in prognosis and in the selection of appropriate therapy for these patients. Work continues to develop and support the bioinformatics tools required for the LLMPP.
An integrated system for the storage, management, analysis, and viewing of cDNA microarray data is being developed to support the NCI Advanced Technology Center microarray facility. A first-generation system has been implemented and placed into routine service. This system stores expression data in a relational database, integrates the data with knowledge from external biological data sources, and provides a basic Web-based toolset for analysis and viewing of results. Currently, the system supports more than two hundred registered users and holds over 3,500 sets of microarray data representing over 15 million expression points. Work continues on a second-generation system with more advanced analysis tools. Arrangements have been made with several other ICDs and consortia at NIH to provide the resources and support necessary for their use of the system.
Computational genetic linkage analysis software packages are widely used at NIH for the precise mapping of potential disease genes. This software is extremely demanding of computing resources and complex to use and maintain. We have assisted NIH laboratories performing linkage analysis by providing the needed software on shared, high-performance computing platforms and by simplifying the procedures for using it. An innovative approach to applying MLINK to a large mapping project allowed the project to be run on a cluster of workstations. This work was instrumental in the analysis leading to the first published map of the feline genome and has subsequently been used to supplement and enhance this genomic map.
Genetic research into inherited diseases has been advanced by the establishment, in collaboration with the National Center for Biotechnology Information, of a large, comprehensive database of people of Amish descent, drawn from multiple sources. Analyses are underway to investigate mortality and other outcomes based on inbreeding coefficients.
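For readers unfamiliar with inbreeding coefficients, the following is a minimal sketch of how one can be computed from pedigree records. It uses the standard recursive kinship definition, in which an individual's inbreeding coefficient equals the kinship coefficient of its parents. The record does not say which method or software the project uses, so the function name, the dictionary layout, and the example pedigree are all hypothetical and serve only as illustration.

```python
from functools import lru_cache


def inbreeding_coefficients(pedigree):
    """Return {person: F} for a pedigree given as {person: (father, mother)}.

    Assumptions (illustrative only): founders map to (None, None), founders
    are unrelated and non-inbred, and every named parent is a key in the dict.
    """

    @lru_cache(maxsize=None)
    def depth(person):
        # Generations below the founders; an ancestor always has smaller depth.
        parents = [p for p in pedigree[person] if p is not None]
        return 0 if not parents else 1 + max(depth(p) for p in parents)

    @lru_cache(maxsize=None)
    def kinship(a, b):
        if a is None or b is None:
            return 0.0  # unknown parent: treated as unrelated
        if a == b:
            father, mother = pedigree[a]
            return 0.5 * (1.0 + kinship(father, mother))
        # Expand the deeper individual, which cannot be an ancestor of the other.
        if depth(a) < depth(b):
            a, b = b, a
        father, mother = pedigree[a]
        return 0.5 * (kinship(father, b) + kinship(mother, b))

    # F of a person is the kinship coefficient of that person's parents.
    return {person: kinship(*pedigree[person]) for person in pedigree}


# Hypothetical pedigree: E is the child of full siblings C and D.
sib_mating = {
    "A": (None, None),
    "B": (None, None),
    "C": ("A", "B"),
    "D": ("A", "B"),
    "E": ("C", "D"),
}
coefficients = inbreeding_coefficients(sib_mating)
print(coefficients["E"])  # 0.25
```

The recursion is memoized, which suits small illustrative pedigrees. A database covering many generations would more likely need an iterative or tabular method to avoid deep recursion.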