Low-Rank Functional Data Analysis for Time-Resolved Spectroscopy and the Search for Earth-Like Exoplanets
Cornell University, Ithaca NY
Investigators
Abstract
Spectroscopy is essential in many scientific fields, including astronomy, where comparisons of lab-based spectra to stellar spectra reveal the chemical compositions of stars. Time-resolved spectroscopy—measuring spectra changing over time—has emerged as a powerful tool for probing the dynamics of many systems. It is used to study the variability of stars, the molecular dynamics of complex compounds, and time-dependent chemical processes in biological systems. This project aims to develop a framework for analyzing time-resolved spectroscopic data in settings where incomplete measurements are made across related objects, or repeatedly for one system, with random variations seen across the replications. Careful pooling and analysis of data across the replications can identify signatures of the processes producing spectral variability. The main application will be modeling time-resolved spectroscopy of stars as candidate hosts of extrasolar planetary systems, particularly data from searches for Earth-like planets orbiting Sun-like stars (exo-Earths). Small extrasolar planets are not directly visible, but their presence can be discerned: the tug of a planet on its star produces a small wobbling motion, which can be detected by measuring extremely small, time-varying Doppler shifts of spectral lines. Currently, the main obstacle to observing small planets is the spectral activity of the host star—the comings and goings of dark sunspots, bright plages, and flares can mask or mimic a planet's time-dependent signal. The project will develop new algorithms to disentangle stellar activity signals from planet signals in time-resolved spectral data. The project will also support training of a diverse population of astronomy students and postdoctoral researchers in advanced statistics, many of whom will go on to pursue non-academic STEM careers involving data science. 
A time-resolved (dynamic) spectrum can be described with a bivariate function of wavelength and time representing the (relative) intensity of light measured by a spectrograph versus wavelength and time. The goal of this project is to develop a framework to model data measuring a single dynamic spectrum, or many related dynamic spectra, with incomplete sampling and noise, for example, from observations of many candidate exoplanet systems with similar host stars. The framework will integrate techniques from approximation theory, and from functional data analysis (FDA), the branch of statistics concerned with analyzing data comprising measurements of ensembles of functions. A core component of the framework will be use of separable expansions, writing the dynamic spectrum as a sum of products of paired univariate functions of wavelength ("speclets") and time ("modulators"). When the bivariate function of wavelength and time is given, approximation theory identifies optimal speclets and modulators, using the asymmetric Hilbert-Schmidt decomposition, a procedure resembling singular value decomposition (SVD). Real data do not provide precise, dense sampling of the bivariate function of wavelength and time. The project will build on FDA approaches, including functional principal components analysis (FPCA) and hierarchical Bayesian stochastic process models, to enable speclet and modulator basis discovery that accounts for noise and incomplete, irregular sampling. The framework will be applied to simulated spectra of populations of stars to build a model for stochastic stellar variability that will enable discovery of exo-Earths in Doppler radial velocity searches for exoplanets. This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
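As a rough illustration of the separable expansion described above, the sketch below applies an ordinary truncated SVD to a dynamic spectrum sampled on a dense, noise-free wavelength-by-time grid. This is the idealized case, not the incomplete, irregularly sampled setting the project targets. The function name `separable_expansion`, the toy Gaussian line, and all numerical values are assumptions made for illustration; none of them come from the award record.

```python
import numpy as np

def separable_expansion(F, rank):
    """Truncated SVD of a gridded dynamic spectrum F[i, j] = f(wavelength_i, time_j).

    Returns (speclets, weights, modulators) such that
    F is approximately speclets @ np.diag(weights) @ modulators.
    """
    U, s, Vt = np.linalg.svd(F, full_matrices=False)
    return U[:, :rank], s[:rank], Vt[:rank, :]

# Hypothetical toy data (arbitrary units): a single spectral line whose
# strength and position both vary periodically in time.
wavelength = np.linspace(-1.0, 1.0, 200)
time = np.linspace(0.0, 1.0, 50)
sigma = 0.1
line = np.exp(-0.5 * (wavelength / sigma) ** 2)   # Gaussian line profile
slope = -(wavelength / sigma**2) * line           # its derivative in wavelength

# A small shift of the line is, to first order, a multiple of its derivative,
# so this toy spectrum is exactly a sum of two wavelength-time products.
F = (np.outer(line, 1.0 + 0.1 * np.sin(2 * np.pi * time))
     + 0.05 * np.outer(slope, np.cos(2 * np.pi * time)))

speclets, weights, modulators = separable_expansion(F, rank=2)
F_rank2 = speclets @ np.diag(weights) @ modulators
rel_err = np.linalg.norm(F - F_rank2) / np.linalg.norm(F)
print(f"relative error of rank-2 expansion: {rel_err:.2e}")
```

Each column of `speclets` is a function of wavelength and each row of `modulators` is a function of time. On a complete grid the truncated SVD gives the best rank-limited approximation in the least-squares (Frobenius) sense; handling noise and missing samples is where the FDA methods named in the abstract would come in.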