PPoSS: LARGE: Intel: Combining Learning and Formal Verification for Scalable Machine Programming (ScaMP)

$2,100,000FY2022CSENSF

Massachusetts Institute Of Technology, Cambridge MA

Investigators

Saman P Amarasinghecontact Adam J Chlipala Armando Solar-Lezama Jonathan M Ragan-Kelley Michael Carbin

Abstract

Modem applications combine the need for extreme scalability with enormous complexity to provide rich functionality to millions of simultaneous users. In this context, programmer productivity is achieved by building on deep stacks of pre-existing components and systems software, allowing programmers to focus on core application logic. However, for the applications with the highest performance needs, the standard approach is not enough; instead, painstaking performance engineering effort is needed for the code to take advantage of all available accelerators and exploit all the opportunities for optimization. The cost of this effort can make applications difficult to adapt to changes in requirements or to the newly available hardware. The ScaMP project is developing a novel programming system that offers a new approach for building modern applications with strong performance and scalability requirements. ScaMP stands for Scalable Machine Programming, and the project’s novelty is the way in which it leverages advances in machine learning and programming-language technology to capture users’ intent at the high level, translate that intent into a working implementation, make the generated code perform efficiently on a variety of platforms, and support its maintenance and evolution. ScaMP provides an iterative development model that combines extremely high-level specification with fine control over low-level implementation decisions and a high degree of performance portability. The impact of the ScaMP project will be to lower the cost of developing high-performance applications. ScaMP decomposes into four main layers. First, incremental multimodal specification starts from natural language and informal diagrams and refines them into precise component specifications written in safe stackable smart domain-specific languages. These DSLs make up the second layer of the system and can generate architecture-independent distributed code through Coq-proved algebraic rewrite rules. The next layer is correct-by-construction code-generator generation, which produces compiler backends for multiple heterogeneous architectures, supporting generation of highly optimized assembly code, guaranteeing correctness using Coq-proved translation validation. Both of these layers use learning, to infer both models of hardware platforms and strategies for optimizing for those platforms effectively; as well as formal methods, to create proof that programs were optimized correctly. Finally, the last layer supports lifetime monitoring, learning, and adaptation to manage the more "data-science" side of developing and evolving a heterogeneous software system, using measurement to drive regeneration and scaling out of higher-performance code. This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

View original record on NSF Award Search →