
Abstract Parsing: Static analysis of dynamically generated string output

$299,327 | FY2009 | CSE | NSF

Kansas State University, Manhattan KS

Investigators

Abstract

This project provides an automated, formal-methods-based methodology and tool that analyzes and validates, in advance of execution, PHP, Perl, or JavaScript programs that dynamically generate HTML, XML, and SQL documents. Such document-generator programs are common on the World Wide Web and are notorious for generating ill-structured, faulty, and dangerous documents that cause subsequent server errors or security breaches. The methodology integrates techniques from LR(k) parsing, data-flow analysis, and program security to construct the program analysis. Given the program that generates documents (e.g., a PHP script) and the context-free reference grammar for the document language (e.g., a grammar for HTML), the analysis tool generates an LR(k) parser for the reference grammar and applies a data-flow analysis to the program to predict the context-free grammatical structure of the documents it will generate. The tool computes abstract parse stacks, a novel structure that encodes a generated document's context-free structure. Next, the tool applies formal-semantics techniques to compute, from an abstract parse stack, its context-sensitive semantics, that is, the meaning of the dynamically generated document. Dynamically generated documents are often assembled with user-supplied input, which can be erroneous or malicious. The analysis annotates the abstract parse stacks to identify where user input might appear, and the semantic analysis tracks the influence of the user input upon the document's meaning.
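The idea of propagating sets of parse stacks through a generator program can be illustrated with a small sketch. This is not the project's tool: it assumes a loop-free generator program modeled by the hypothetical node types `Lit`, `UserInput`, `Seq`, and `Choice`, and it substitutes a simple open/close tag matcher for a generated LR(k) parser. All names below are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import Tuple, Union

Stack = Tuple[str, ...]  # open tags, innermost last (stand-in for an LR parse stack)

@dataclass(frozen=True)
class Lit:               # fixed output tokens, e.g. ("<b>", "hi", "</b>")
    tokens: Tuple[str, ...]

@dataclass(frozen=True)
class UserInput:         # a point where user-supplied text is emitted
    name: str

@dataclass(frozen=True)
class Seq:               # statements executed one after another
    parts: Tuple["Node", ...]

@dataclass(frozen=True)
class Choice:            # an if/else whose condition is unknown statically
    branches: Tuple["Node", ...]

Node = Union[Lit, UserInput, Seq, Choice]

def step(stack, token, errors):
    """Feed one token to the tag matcher; return the new stack or None on error."""
    if token.startswith("</"):
        tag = token[2:-1]
        if not stack or stack[-1] != tag:
            errors.append(f"unexpected {token} with open tags {list(stack)}")
            return None
        return stack[:-1]
    if token.startswith("<"):
        return stack + (token[1:-1],)
    return stack  # plain text leaves the stack unchanged

def analyze(node, stacks, errors, taint):
    """Map the set of possible stacks before `node` to the set after it."""
    if isinstance(node, Lit):
        for token in node.tokens:
            results = (step(s, token, errors) for s in stacks)
            stacks = frozenset(r for r in results if r is not None)
        return stacks
    if isinstance(node, UserInput):
        # Annotate every parse context in which user input may appear.
        taint.extend((node.name, s) for s in stacks)
        return stacks
    if isinstance(node, Seq):
        for part in node.parts:
            stacks = analyze(part, stacks, errors, taint)
        return stacks
    if isinstance(node, Choice):
        # Join over branches: either branch may run, so keep all outcomes.
        out = set()
        for branch in node.branches:
            out |= analyze(branch, stacks, errors, taint)
        return frozenset(out)
    raise TypeError(f"unknown node: {node!r}")

# Models: echo "<p>"; if (...) echo "<b>"; echo $name; echo "</p>";
program = Seq((
    Lit(("<p>",)),
    Choice((Lit(("<b>",)), Lit(()))),
    UserInput("name"),
    Lit(("</p>",)),
))

errors, taint = [], []
final = analyze(program, frozenset({()}), errors, taint)
```

In this sketch, the branch that opens `<b>` without closing it produces one reported error when `</p>` is reached, and `taint` records that the user input `name` may appear either directly inside `<p>` or inside `<p><b>`. A real analysis of the kind described above would also need a fixed-point computation to handle loops and recursion, which this example omits.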
