AXS - Enabling Analysis of Petascale Astronomical Datasets

$579,295 · FY2020 · MPS · NSF

University Of Washington, Seattle WA

Abstract

Astronomy is being rapidly transformed by the advent of large, automated, digital sky surveys into a field where petabyte tabular data sets are becoming commonplace. Unfortunately, this increase has not been followed by commensurate improvements in the tools and frameworks: we are now limited not by the richness of our datasets, but by an inability to mine them for knowledge. When the most challenging questions of the day demand repeated, complex processing of large information-rich tabular datasets, scalable and stable tools that are easy to use by scientists are crucial. This is a project to develop, package, and deploy the Astronomical eXtensions for Spark (AXS), a scalable open-source astronomical data analysis framework built on Apache Spark. AXS will make it possible for astronomers, including and perhaps especially those who are not data management experts, to devise and execute astronomical big data analyses using industry-standard tools. This will be a transformative increase in the community's ability to extract knowledge from datasets collected at great expense, thus unlocking their value across all areas of astronomy. There will be opportunities for knowledge transfers and partnerships between industry and academia. The techniques used and created will be taught within the astronomy curriculum, and those curriculum materials will be made public. This will improve the competitiveness of astronomy students in careers beyond astronomy, and it will help to develop a globally competitive STEM workforce. AXS will enable astronomers to scale their analysis from a personal laptop to thousands of nodes on either cloud or NSF-supported cyberinfrastructure (CI). This system has already been prototyped, and leverages Spark, a state-of-the-art industry-standard engine for big data processing, to make it possible to query and analyze almost arbitrarily large astronomical catalogs while supporting complex workflows with astronomy-specific operations. 
The tool will be accompanied by a hosted demonstration service, documentation, and support for deployment on NSF CI resources and public cloud platforms. For long-term sustainability, AXS will be developed in a tight loop with major stakeholders, built on open source tools and processes, and strongly integrated with AstroPy and the PyData stack, which are widely used in astronomy. AXS will also robustly scale to large computational clusters, making both NSF-supported and public CI more accessible to astronomers. Developments by this project will enable other industrial and academic applications, especially those dealing with large, tabular, spatio-temporal datasets indexed on a sphere, such as geospatial analysis. This award by the Division of Astronomical Sciences within the NSF Directorate of Mathematical and Physical Sciences is jointly supported by the NSF Office of Advanced Cyberinfrastructure. This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.