Collaborative Research: Capturing Salient Features in Point Process Models via Stochastic Process Discrepancies

$135,361FY2020MPSNSF

Cornell University, Ithaca NY

Investigators

Abstract

In the 21st century, mathematical modeling has emerged as a primary tool for scientific investigation. Scientists construct mathematical and computer models intended to capture the important features of a complex physical phenomenon, and learn more about the phenomenon by exploring what values of any uncertain model parameters lead to model predictions that mimic closely what is observed in nature. Still, at best the models typically cannot capture all of nature's complexity – or, as George Box famously put it, "All models are wrong, but some are useful." This is really a good thing for discovery – often scientists can get new insights and develop deeper understanding by studying precisely why their models fail to match reality. In this research the PIs will develop a new approach to quantifying this "discrepancy" between mathematical models and observations, intended specifically for problems in which the data are numerical counts of events or objects. Examples arise in nearly every scientific field – counts of volcanoes or earthquakes or disease cases; of galaxies or stars or exoplanets; of photons or gamma ray burst pulses or neutrinos. This project will specifically address two classes of problems in astronomy. One class concerns how astronomers can convert raw data measuring light from astrophysical objects (such as stars and galaxies) into estimates of properties of the sources (such as brightness and color) with accurately quantified uncertainties, even when the precise shapes of the objects are not known, and their images overlap. A second class concerns using astronomical survey catalogs to learn the dominant demographic properties of stars, galaxies, or minor planets (such as asteroids), such as the distribution of their luminosities or masses. The NSF-funded Vera Rubin Observatory will produce data for both types of problems. The research will include developing fast, open-source computational algorithms implementing the new approaches. The project is motivated by application areas in which salient feature discovery is threatened by model misspecification. In applications with real-valued magnitude data with additive Gaussian errors, statisticians have addressed misspecification by introducing additive discrepancy processes into models, often using Gaussian processes. The two problem areas addressed here require analysis of discrete count or point process data: photon counting data comprising images and time series from cosmic sources, or demographic data in astronomical survey catalogs. Both areas rely on Poisson point process models, with an intensity function describing, say, the photon arrival rate (per unit area) as a function of direction and time, or the density of galaxies as a function of spatial location and luminosity. Additive discrepancy models are not applicable to such discrete-data settings. The team will develop new semiparametric methods that supplement parametric salient feature models with nonparametric discrepancy processes that flexibly model the departure of salient models from the true data generating process. An approach serving as a starting point represents the true underlying intensity function as the product of the salient feature model and a stochastic multiplicative discrepancy process. The salient feature model will be a parametricintensity function (e.g., with location, amplitude, scale, and shape parameters), sometimes in a superposition of multiple components (e.g., stars in an image, pulses in a transient burst, or population components). To model discrepancy from the salient model, a natural choice with appealing theoretical properties is a multiplicative gamma discrepancy process; composition with the Poisson point process leads to an overall negative binomial point process for the observations. The team will implement this approach, and generalize it in several directions. For demographic models, the discrepancy models will be embedded in a hierarchical model that accounts for measurement error and selection effects. This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

View original record on NSF Award Search →