Collaborative Research: CDS&E: Applied Algebraic Statistics through R

$0 | FY2016 | MPS | NSF

University of Kentucky Research Foundation, Lexington, KY

Abstract

The interface of applied algebraic geometry and statistics, known as algebraic statistics, abounds with fresh insight into old and new problems in practical data analysis. The fundamental connection stems from the realization that many statistical models are, or can be identified with, geometric structures amenable to algebraic investigation, enabling statisticians to draw on the great wealth of algebraic tools when solving statistical problems. Since this recognition, algebraic tools have found applications throughout statistics, especially in contexts involving cross-classified data. Despite these advances, the use of algebraic methods in traditionally statistical areas of data analysis is still not mainstream, mostly because the methods involve kinds of mathematical computation previously unnecessary for data analysis and, consequently, not available in standard software. This work confronts this problem head-on by 1) fortifying connections between a free statistical computing environment popular among data analysts (R) and various software packages in the mathematics community through add-on packages created by the PIs and 2) implementing user-friendly interfaces to cutting-edge algebraic statistical methods enabled by the external software.

The R package algstat and supporting packages will be further developed, strengthening connections to software used in algebraic statistics and providing functions and data structures for algebraic statistical methods that leverage that software. In year one of the project, the PIs and their teams will work on LattE and 4ti2, and Markov bases techniques for exact inference in loglinear, logistic, and Poisson regression models will be created and improved. In year two, the PIs and their teams will work on Bertini. Functions and data structures related to the numerical solution of systems of polynomial equations will be improved and expanded, and applications to phylogenetics will be considered. In year three, the PIs and their teams will work on Macaulay2, fortifying its connection to R and using it to enhance the mpoly package and adaptively inform the MCMC routines for exact inference in exponential family models enabled by the LattE and 4ti2 connections.
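
To make the Markov bases idea mentioned in the abstract concrete, the following is a minimal illustrative sketch, not the algstat implementation. It assumes the simplest case, the independence model for a two-way contingency table, where the well-known basic moves (add one count to two diagonal cells of a 2x2 subtable and remove one from the other two) connect all tables sharing the observed margins. A Metropolis walk over these moves targets the conditional (hypergeometric) distribution, and the fraction of visited tables at least as extreme as the observed one estimates an exact-test p-value. All function names and the choice of the chi-square statistic are hypothetical and chosen only for illustration.

```python
import random


def margins(table):
    """Return (row sums, column sums) of a two-way table."""
    return [sum(r) for r in table], [sum(c) for c in zip(*table)]


def chi_square(table):
    """Pearson chi-square statistic against the independence fit."""
    rows, cols = margins(table)
    n = sum(rows)
    stat = 0.0
    for i, row in enumerate(table):
        for j, x in enumerate(row):
            expected = rows[i] * cols[j] / n
            if expected > 0:
                stat += (x - expected) ** 2 / expected
    return stat


def markov_step(table, rng):
    """Propose one basic move in place and accept it by Metropolis.

    The move adds 1 to cells (i1, j1) and (i2, j2) and subtracts 1 from
    cells (i1, j2) and (i2, j1), so every row and column sum is unchanged.
    Because the index pairs are ordered, a move and its negative are
    proposed with equal probability (a symmetric proposal).
    """
    i1, i2 = rng.sample(range(len(table)), 2)
    j1, j2 = rng.sample(range(len(table[0])), 2)
    a, b = table[i1][j1], table[i2][j2]  # cells that gain a count
    c, d = table[i1][j2], table[i2][j1]  # cells that lose a count
    if c == 0 or d == 0:
        return  # move would create a negative cell: stay put
    # Target is proportional to 1 / prod(n_ij!), so the ratio of the
    # proposed table to the current one simplifies to:
    ratio = (c * d) / ((a + 1) * (b + 1))
    if rng.random() < ratio:
        table[i1][j1] += 1
        table[i2][j2] += 1
        table[i1][j2] -= 1
        table[i2][j1] -= 1


def exact_test_pvalue(observed, steps=20000, seed=0):
    """Estimate the conditional p-value; also return the final table."""
    rng = random.Random(seed)
    table = [row[:] for row in observed]
    observed_stat = chi_square(observed)
    hits = 0
    for _ in range(steps):
        markov_step(table, rng)
        # Small tolerance guards against floating-point ties.
        if chi_square(table) >= observed_stat - 1e-9:
            hits += 1
    return hits / steps, table


if __name__ == "__main__":
    p_value, _ = exact_test_pvalue([[3, 1], [1, 3]])
    print(f"estimated exact p-value: {p_value:.3f}")
```

For richer models, such as the loglinear, logistic, and Poisson regression models named in the abstract, the set of moves is generally not known in closed form and would instead be computed by external software such as 4ti2, which is the kind of connection the project describes.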

View original record on NSF Award Search →