CAREER: Statistical and Computational Tools for the Analysis of High Dimensional Genetic Data
University of California-Los Angeles, Los Angeles CA
Abstract
Proposal ID: DMS-0239427
PI: Chiara Sabatti
Title: CAREER: Statistical and computational tools for the analysis of high dimensional genetic data

This project will enable the creation of novel statistical models and computational tools for analyzing data in high-dimensional spaces, such as those generated in genetics. In particular, the investigator and her colleagues will (a) develop models for genomic sequences that aim to establish the total number of binding sites, their locations, and their interactions with one another; (b) pursue de-noising of gene array data, modeling of the dependence among the expression levels of different genes, and identification of the number of distinct chemical signals that cause changes in expression; and (c) model the notion of "haplotype blocks", define procedures to identify them for the purpose of gene mapping, and develop appropriate procedures for multiple-comparison correction in the same context. The project illustrates the relations among model selection, multiple comparison, and high-dimensional function estimation, and leads to a deeper understanding of the connections among Bayesian models, the minimum description length principle, and false discovery rates. The proposed research will also develop a new set of computational tools based on Markov chain Monte Carlo sampling and on representing the target distribution at a variety of scales. The outlined research helps to tackle fundamental questions about the role and expression of genes, thus leading to improvements in general welfare through the discovery of disease-related genes, the development of genetic therapies, and the engineering of industrial-scale overproduction of proteins of interest.
By making the algorithms for genome and gene expression analysis publicly available, and by upgrading the computing infrastructure, the project broadens the participation of underserved communities in scientific investigation and enhances the general infrastructure for research. The proposed interdisciplinary workshops, research activities, and courses will ensure broad dissemination of the results to enhance scientific understanding. Seminars for high-school and college instructors on teaching statistics in interdisciplinary settings will help integrate research and education, promoting teaching, training, and learning.