
CAREER: Mining Salient Localized Patterns in Complex Data

$437,000 | FY2005 | CSE | NSF

University of North Carolina at Chapel Hill, Chapel Hill, NC

Abstract

One of the greatest challenges in modern data analysis is to find significant and non-obvious patterns within immense and complex data sets. The detection of such salient patterns is an indispensable tool for comprehending the trends and meaning of data. Such tools are required by scientists, economists, marketing analysts, and all other data analysts. This project is developing new methods and tools for identifying the salient patterns within complex data sets and has the following objectives: design robust and scalable algorithms for mining the most salient patterns; evaluate the significance of mined patterns in the context of complex and noisy data; and integrate and correlate heterogeneous data sets based on corresponding patterns. The project is focused on problems related to bioinformatics, with four driving applications: Integrative Genetics of Cancer Susceptibility, HIV Salivary Gland Disease (SGD) Pathogenesis, Discovering Family Specific Residue Packing Patterns of Proteins, and Integrative Functional Annotation of Proteins. All of these applications produce massive quantities of data, thereby providing an excellent testbed for the salient pattern mining algorithms being developed in this project. The intellectual merits include a new class of data analysis tools for analyzing the huge data sets generated by modern quantitative genetics technologies. These tools will assist biologists in their study of functional proteomics, aid in their understanding of disease progression, and assist in the search for effective treatments. To be useful, the data mining techniques must also be accurate and computationally efficient, and must operate autonomously. If successful, this project will make significant contributions to bioinformatics and computational biology. Results from this research will be disseminated through publications, and the software will be made publicly available through a web portal.
The broader impacts of this research include interdisciplinary collaboration and training, immediate applications to fields other than life sciences, a multitude of educational impacts, and outreach to underrepresented groups in the sciences. The pattern mining methods will be applied to analyze the administrative paperwork of child welfare cases from the North Carolina Department of Health and Human Services (NC-DHHS) in an effort to improve services and achieve better outcomes for children in the welfare system. Long-term interdisciplinary collaborations with scientists have been established and will be strengthened during the course of this project. Educational impacts include new curriculum developments for computer science and bioinformatics, support of multidisciplinary educational experiences, and services to the research community.
