
NSF Conference in the Mathematical Sciences on Data Mining and Bioinformatics; January 8-10, 2004; Gainesville, FL

$17,500 · FY2003 · MPS · NSF

University Of Florida, Gainesville FL

Investigators

Abstract

DMS-0337163. PI: George Casella.
There is a large demand for statistical tools to help us analyze and understand massive amounts of data. Traditional statistical approaches often fail to cope with the underlying complexity of such datasets. Some of the potential statistical issues are model selection, including algorithms to search through model spaces; robustness; data quality and sampling; multiplicity issues; inference in high-dimensional, small-sample ("large p, small n") problems; appropriate scaling of data; and inference based on complex datasets from medical images, microarrays, or environmental monitoring. Biological questions inherent in such data include determining the three-dimensional structure of proteins based on DNA sequences and determining the differential expression levels of thousands of genes from data collected on microarrays. Data mining (DM) is the generic term that encompasses such methods for massive datasets. Data mining on biological and genomic data is often called Bioinformatics (Bio).
The topic of Data Mining and Bioinformatics is ideal for an NSF regional conference. It appeals to a wide spectrum of researchers with diverse statistical interests, from Internet traffic to fraud detection to microarrays to protein structure. The challenge of drawing inferences based on these massive datasets will appeal to those interested in theoretical and methodological statistics. Finally, the challenge of analyzing unorthodox data for which existing statistical methodology is not satisfactory will undoubtedly fascinate researchers concerned with applications of statistics.
It is hoped that this conference will provide an assessment of the current state of the art in the workings and use of DM/Bio, bring up open problems, and foster collaboration among research workers in academia, industry, and government in an effort to provide solutions to these problems and answer questions of great importance to both science and society. We expect the conference to generate interest in this topic among researchers nationwide (particularly young researchers) and among faculty and graduate students at the University of Florida and neighboring universities, and to promote interactions between junior and senior researchers.
