SHF: Small: S3: Statistical and Structural Analysis for Spreadsheets
University of Massachusetts Amherst, Amherst MA
Investigators
Abstract
Spreadsheets are the most commonly used programming environment in the world; there are more than 750 million users of Microsoft Excel alone. Spreadsheets are widely used in government, scientific, and financial settings; over 95% of US firms use them for financial reporting and 85% use them for budgeting and forecasting. Unfortunately, errors are endemic to spreadsheets; a recent study found an error rate of over 95%. Spreadsheet errors have had catastrophic consequences, leading to losses of billions of dollars. This project uses automatic analysis techniques designed specifically for spreadsheets to (a) detect and help prevent errors in spreadsheets, dramatically increasing the reliability of their calculations, (b) reduce the risk of serious mistakes, and (c) potentially save the economy millions if not billions of dollars.

This project develops statistical and structural analyses for spreadsheets (S3). Spreadsheets have unique features that set them apart from standard programming languages, and thus demand new program analyses that exploit their characteristics. S3 employs statistical analyses over the spatial and deep structure of spreadsheet formulas to identify cells that are highly anomalous and thus likely to be wrong. S3 reduces the problem of finding data and formula errors to that of finding anomalous structures, via a novel vector representation that combines spatial and structural information (including patterns of dependencies). Applying statistical analyses across these vectors can then identify formulas that are highly unusual in any dimension, and thus likely to be wrong. S3 operates at the level of an individual spreadsheet and also incorporates learned models of spreadsheet usage, drawn from large bodies of existing spreadsheets, to condition the analysis and further reduce false-positive rates.
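The idea of mapping formulas to vectors and flagging those that are unusual in any dimension can be illustrated with a minimal sketch. This is not the S3 representation or analysis, which the abstract does not specify. The feature set (reference count and mean row/column offset of references), the frequency-based rarity score, the 0.5 threshold, and the names `featurize` and `anomaly_scores` are all assumptions chosen for illustration. The regular expression handles only simple A1-style references, treats a range as its two endpoints, and would misread function names that end in digits.

```python
import re
from collections import Counter

# Matches simple A1-style references such as B2 or $C$4 (illustrative only).
CELL_REF = re.compile(r"\$?([A-Z]+)\$?(\d+)")


def col_to_index(col):
    """Convert a column label such as 'A' or 'AB' to a 1-based index."""
    n = 0
    for ch in col:
        n = n * 26 + (ord(ch) - ord("A") + 1)
    return n


def featurize(address, formula):
    """Hypothetical vector: (number of references, mean row offset, mean column offset)."""
    m = CELL_REF.fullmatch(address)
    col, row = col_to_index(m.group(1)), int(m.group(2))
    refs = [(col_to_index(c), int(r)) for c, r in CELL_REF.findall(formula)]
    if not refs:
        return (0, 0.0, 0.0)
    d_row = sum(r - row for _, r in refs) / len(refs)
    d_col = sum(c - col for c, _ in refs) / len(refs)
    return (len(refs), d_row, d_col)


def anomaly_scores(vectors):
    """Score each cell by how rare its rarest feature value is (0 = shared by all cells)."""
    n = len(vectors)
    dims = len(next(iter(vectors.values())))
    # Per-dimension frequency of each observed feature value.
    counts = [Counter(v[d] for v in vectors.values()) for d in range(dims)]
    return {
        addr: max(1 - counts[d][v[d]] / n for d in range(dims))
        for addr, v in vectors.items()
    }


# A column of formulas that should all follow the pattern =B<row>*C<row>.
formulas = {
    "D2": "=B2*C2",
    "D3": "=B3*C3",
    "D4": "=B4*C4",
    "D5": "=B5*C4",  # off-by-one row reference
    "D6": "=B6*C6",
    "D7": "=B7*C7",
}

vectors = {addr: featurize(addr, f) for addr, f in formulas.items()}
scores = anomaly_scores(vectors)
suspects = [addr for addr, s in scores.items() if s > 0.5]  # assumed threshold
print(suspects)  # ['D5']
```

Here D5 stands out because its mean row offset differs from that of every other cell in the column, even though its reference count and column offset match its neighbors. Taking the maximum over dimensions mirrors the notion of a formula being unusual in any dimension.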