ITR: Language, Learning, and Modeling Biological Sequences

$3,625,503 | FY2002 | CSE | NSF

University of Pennsylvania, Philadelphia, PA

Investigators

Abstract

EIA-0205456
Joshi, Aravind K.

Recent significant advances in natural language processing, such as the integration of grammatical and probabilistic machine-learning techniques, have not been exploited for modeling biological sequences. These new techniques are highly relevant to the biological domain because they support the integration of sequence features at several scales: from dependencies between successive items, through dependencies involving complex structures, to overall sequence statistics. The major goals to be pursued are therefore:

(1) Development of new techniques for integrating grammatical and probabilistic information, in particular the integration and evaluation of grammatical, probabilistic, and approximate counting methods for fold prediction in the secondary and tertiary structures of biomolecules.

(2) Development and evaluation of probabilistic exponential models for gene finding, in particular genes for apicoplast-targeted proteins in eukaryotic human pathogens of the phylum Apicomplexa.

This research is highly interdisciplinary, involving computer science, biology, and linguistics. It will have a significant impact on the modeling of biological sequences. It will also provide a wonderful opportunity to train new researchers in this interdisciplinary work, thus contributing to science and mathematics education and to human resource development. The proposed research arose out of many discussions at a landmark workshop on "Language Modeling of Biological Data" held at the University of Pennsylvania in February 2001.

View original record on NSF Award Search →