Error Correction in Multiple Sequence Alignments

$562,500 | R01 | FY2010 | LM | NIH

University Of Houston, Houston TX

Linked publications & trials

Paper 27677569 Paper 27638547 Paper 22417914 Paper 21859807 Paper 21398626 Paper 21347285 Paper 21303550 Paper 21285032 Paper 21270172 Paper 21214904 Paper 21191662 Paper 20820768 Paper 20571085 Paper 20497997 Paper 20207713 Paper 19952117 Paper 19761605

Abstract

DESCRIPTION (provided by applicant): Global multiple sequence alignment (MSA) is the most basic step in the comparative study of molecular sequences. It is also the foundation of numerous subsequent biological analyses, such as phylogenetic reconstruction, gene annotation, and three-dimensional structure prediction. The question of multiple sequence alignment quality has received much attention from developers of alignment methods. Less forthcoming, however, are practical measures for addressing alignment-quality issues in real-life settings. We have recently devised two simple methodologies to identify and quantify the uncertainties in multiple sequence alignments and their effects on subsequent analyses. With these methods, reliable (anchor) and unreliable (error) segments in alignments can be identified. We also found that most errors in alignment are simple errors, i.e., the misplacement of one or a few indels. Existing MSA reconstruction methods take the purist approach to alignment: define the most appropriate objective function, heuristically find an MSA that approximately maximizes it, and iteratively refine it using the same scoring scheme. We propose to improve upon existing alignment methods by augmenting them with a utilitarian step: identify possible alignment reconstruction errors, and correct the simple errors, which we found in preliminary studies to be the most numerous. This grant application proposes four specific aims: (1) to design a method for increasing the reliability of the alignment in error segments, thereby improving the reliability of the entire alignment; (2) to evaluate the new method in comparison to and in conjunction with existing methods; (3) to implement the new method as a public domain software package; and (4) to revisit cases in which conclusions were based on erroneous alignments and to study the effects of improved alignments on downstream analyses.
We estimate that our work will substantially increase the reliability of alignments and downstream procedures that use alignment as input.
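
The abstract does not say how reliable (anchor) and unreliable (error) segments are identified. As a purely illustrative sketch, and not the applicants' method, one way to make the idea concrete is to assume that two alternative alignments of the same sequences are available and to call a column reliable when both alignments pair exactly the same residues in it. The function names column_signatures and label_segments, and the toy sequences, are hypothetical.

```python
from itertools import groupby


def column_signatures(alignment, gap="-"):
    """For each column, return a tuple holding the residue index of each
    sequence in that column (None where the sequence has a gap)."""
    counters = [0] * len(alignment)  # next residue index per sequence
    signatures = []
    for column in zip(*alignment):
        signature = []
        for row, char in enumerate(column):
            if char == gap:
                signature.append(None)
            else:
                signature.append(counters[row])
                counters[row] += 1
        signatures.append(tuple(signature))
    return signatures


def label_segments(aln_a, aln_b):
    """Label runs of columns in aln_a as 'anchor' (the identical column also
    occurs in aln_b) or 'error' (it does not). This is an assumed criterion
    for illustration only. Returns (label, first_col, last_col) tuples with
    0-based, inclusive column indices into aln_a."""
    columns_in_b = set(column_signatures(aln_b))
    labels = [
        "anchor" if signature in columns_in_b else "error"
        for signature in column_signatures(aln_a)
    ]
    segments = []
    start = 0
    for label, run in groupby(labels):
        length = len(list(run))
        segments.append((label, start, start + length - 1))
        start += length
    return segments


# Toy example: the first alignment contains a misplaced pair of indels.
suspect = ["ACG-T", "AC-GT"]
reference = ["ACGT", "ACGT"]
print(label_segments(suspect, reference))
```

In this toy case the two gapped columns in the middle of the first alignment come out as a short error segment flanked by anchor segments, which mirrors the "simple error" the abstract describes: the misplacement of one or a few indels.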
