FMitF: A Novel Framework for Learning Formal Abstractions and Causal Relations from Temporal Behaviors

$1,000,000 | FY2018 | CSE | NSF

University Of Southern California, Los Angeles CA

Investigators

Jyotirmoy V Deshmukh (contact), Paul B Bogdan, Yan Liu

Abstract

Formal methods consist of a collection of techniques that help developers rigorously reason about the behaviors of software and hardware systems with the help of mathematical logic. While formal logic has been used for articulating system specifications for the purpose of verification or software synthesis, this project introduces a logic-based framework to address machine-learning problems such as classification (which category should a new datum be put into?), clustering (how should a collection of data points be grouped together into categories?) and discovery of causal relations (when should an earlier data observation be deemed to cause the appearance of a later data observation?) for time-series data, in which repeated observations are made over time. The use of formal logic opens new avenues such as enhancing the interpretability of machine-learning models, improving the explainability of learning results, and articulating formal guarantees on the behavior of learning algorithms. The societal impact of this work lies in discovering latent information in time-series data in diverse domains such as healthcare, autonomous systems, and security. The research impacts education by providing cross-disciplinary training of undergraduate and graduate students in data science, machine learning, and formal methods, and by introducing students to methods from statistical physics on a number of real-world systems.

This project explores the intersection of logical inference based on real-time temporal logics and the statistical inference prevalent in machine learning. The algorithms developed in this project allow users to express domain knowledge in the form of signal predicates or chance constraints, and output the results of classification, clustering or causal discovery as formulas in specific real-time temporal logics. This allows the results of the machine-learning algorithms to be human-interpretable, and also improves the explainability of learning algorithms by answering the question of why a particular time-series datum is classified or clustered in a specific fashion. These techniques are able to model uncertainty in time-series data by creating a new class of non-parametric learning methods that combine concepts from statistical physics, information theory, and statistical inference. The use of a logic-based framework makes it possible to provide formal guarantees on the learning process itself by applying ideas such as probably-approximately-correct learning (from computational learning theory) to the inference of real-time temporal logic formulas from data.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
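To make the idea of a temporal-logic formula acting as an interpretable classifier more concrete, the following is a minimal sketch and not the project's method. It assumes a discretely sampled signal and a single bounded "always" formula of the form G_[start,end](x < threshold), written in the style of Signal Temporal Logic; the abstract does not name which real-time temporal logics the project uses. The function names, the class labels, and the sample data are all hypothetical.

```python
from typing import Sequence


def robustness_always_below(
    signal: Sequence[float], start: int, end: int, threshold: float
) -> float:
    """Robustness of G_[start,end](x < threshold) on a sampled signal.

    Assumption: time is discrete and `start`/`end` are sample indices.
    A positive value means the formula holds with that margin;
    a negative value means it is violated by that margin.
    """
    window = signal[start:end + 1]
    if not window:
        raise ValueError("time window is empty for this signal")
    # The signal predicate x < threshold has margin (threshold - x);
    # "always" takes the worst case (minimum) over the window.
    return min(threshold - value for value in window)


def classify(
    signal: Sequence[float], start: int, end: int, threshold: float
) -> str:
    """Label a trace by the sign of its robustness (hypothetical labels)."""
    margin = robustness_always_below(signal, start, end, threshold)
    return "nominal" if margin > 0 else "anomalous"


# Hypothetical traces: one stays below the threshold, one spikes above it.
calm = [0.2, 0.4, 0.3, 0.5, 0.4]
spiky = [0.2, 0.4, 1.7, 0.5, 0.4]

if __name__ == "__main__":
    for name, trace in (("calm", calm), ("spiky", spiky)):
        print(name, classify(trace, 0, 4, 1.0))
```

The formula itself serves as the explanation: a trace is labeled "anomalous" because its value reached or exceeded the threshold somewhere in the stated time window, and the robustness value says by how much.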
