Global Alignment of Protein Sequences with Position-Spec
National Library of Medicine
Abstract
The CDD (Conserved Domain Database) at NCBI currently uses a local alignment tool (rps-BLAST) to perform its updates. Each update consists of taking a new putative protein sequence and matching it to a PSSM (position-specific scoring matrix) in the CDD. Chimeric sequences sometimes corrupt the database because they match PSSMs in the CDD well locally without having a full-length match. A global alignment method should be able to detect such chimeras: the non-matching portion of a chimera lowers its global score, even though it does not lower its local score. Currently, much human effort goes into curating the database and culling out chimeras. The lack of a global p-value was the main obstacle to using global alignment in the CDD update, but I have in fact known of a method for calculating it for some time. Sergey Sheetlin is programming the method for testing by Maricel Kann in the NCBI structure group. Preliminary tests show that global alignment is much more sensitive to the details of a sequence and appears much better able than local alignment to place a sequence in the correct subfamily.
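The chimera argument can be illustrated with a toy example. The sketch below scores a query against a PSSM both locally (Smith-Waterman style) and globally (Needleman-Wunsch style) using a linear gap penalty. It is not the CDD update pipeline or rps-BLAST: the dict-per-column PSSM, the motif, the match/mismatch scores, the gap penalty GAP = -4, and the function names toy_pssm and align_score are all hypothetical choices made only for illustration.

```python
from typing import Dict, List

# ASSUMPTION: a PSSM is modeled as one dict per column mapping residue -> score.
# Real CDD PSSMs are log-odds matrices; this toy form is for illustration only.
PSSM = List[Dict[str, int]]

GAP = -4  # hypothetical linear gap penalty, charged per gapped position
ALPHABET = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids


def toy_pssm(motif: str, match: int = 5, mismatch: int = -2) -> PSSM:
    """Build a hypothetical PSSM that rewards each motif residue at its own column."""
    return [{aa: (match if aa == m else mismatch) for aa in ALPHABET} for m in motif]


def align_score(query: str, pssm: PSSM, local: bool) -> int:
    """Best alignment score of a query sequence against a PSSM.

    local=False: global (Needleman-Wunsch style). Every query residue and every
    PSSM column is either aligned or gapped, so an unrelated tail costs points.
    local=True: local (Smith-Waterman style). Scores are floored at 0 and the
    best cell anywhere in the matrix is reported, so an unrelated tail is ignored.
    """
    n, m = len(query), len(pssm)
    # dp[i][j] = best score for aligning query[:i] with pssm[:j]
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    if not local:
        # Global alignment charges leading gaps; local alignment starts free.
        for i in range(1, n + 1):
            dp[i][0] = i * GAP
        for j in range(1, m + 1):
            dp[0][j] = j * GAP
    best = 0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cell = max(
                dp[i - 1][j - 1] + pssm[j - 1][query[i - 1]],  # residue vs column
                dp[i - 1][j] + GAP,  # query residue left unaligned
                dp[i][j - 1] + GAP,  # PSSM column left unaligned
            )
            if local:
                cell = max(cell, 0)
                best = max(best, cell)
            dp[i][j] = cell
    return best if local else dp[n][m]


pssm = toy_pssm("ACDEFGHIK")
full = "ACDEFGHIK"
chimera = full + "W" * 10  # the motif fused to an unrelated tail
for name, q in [("full", full), ("chimera", chimera)]:
    print(name, align_score(q, pssm, local=True), align_score(q, pssm, local=False))
# prints:
# full 45 45
# chimera 45 5
```

With these toy values, the intact sequence and the chimera receive the same local score (45), while the chimera's global score falls from 45 to 5 because its ten unrelated residues must be gapped. A raw score like this still needs a significance estimate before it can drive a database update.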
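The abstract does not describe its method for calculating a global p-value, so the sketch below makes no attempt to reproduce it. As a plainly different, brute-force stand-in, it estimates an empirical p-value with a permutation test: shuffle the query, realign it globally, and count how often a shuffled sequence scores at least as well as the real one. The function names global_score and empirical_p_value, the dict-per-column PSSM, and the gap penalty are hypothetical. A Monte Carlo estimate like this costs one full alignment per trial and cannot report a p-value smaller than 1/(trials + 1), which illustrates why an efficient method for global p-values is valuable.

```python
import random
from typing import Dict, List

# ASSUMPTION: one dict per PSSM column mapping residue -> score (toy representation).
PSSM = List[Dict[str, int]]

GAP = -4  # hypothetical linear gap penalty


def global_score(query: str, pssm: PSSM) -> int:
    """Needleman-Wunsch score of the whole query against every PSSM column.

    Keeps only two rows of the dynamic-programming matrix at a time.
    """
    n, m = len(query), len(pssm)
    prev = [j * GAP for j in range(m + 1)]  # row 0: only PSSM columns gapped
    for i in range(1, n + 1):
        cur = [i * GAP] + [0] * m  # column 0: only query residues gapped
        for j in range(1, m + 1):
            cur[j] = max(
                prev[j - 1] + pssm[j - 1][query[i - 1]],  # residue vs column
                prev[j] + GAP,  # query residue left unaligned
                cur[j - 1] + GAP,  # PSSM column left unaligned
            )
        prev = cur
    return prev[m]


def empirical_p_value(query: str, pssm: PSSM, trials: int = 999, seed: int = 0) -> float:
    """Permutation-test estimate of P(shuffled global score >= observed score).

    Shuffling preserves the query's length and residue composition. The +1
    terms count the observed query itself, so the estimate is never zero.
    """
    rng = random.Random(seed)  # seeded for reproducibility
    observed = global_score(query, pssm)
    residues = list(query)
    hits = 0
    for _ in range(trials):
        rng.shuffle(residues)
        if global_score("".join(residues), pssm) >= observed:
            hits += 1
    return (hits + 1) / (trials + 1)
```

For example, `empirical_p_value("W" * 9, pssm)` returns 1.0 for any PSSM, since every shuffle of a homopolymer is the same string and ties count as hits.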