EAGER: Similarity Measures Based on Refinement Operators and Metric Embedding Applied to the Analysis of Immune Repertoires

$139,849 | FY2015 | CSE | NSF

Drexel University, Philadelphia PA

Investigators

Santiago Ontanon (contact), Ali Shokoufandeh, Uri Hershberg

Abstract

The notion of similarity plays a key role in modern machine learning and artificial intelligence (AI) in general, since it serves as an organizing principle by which algorithms classify objects, form concepts, and make generalizations. While similarity assessment has been widely studied, the important special case of assessing similarity between structured data has not received sufficient attention. Structured representations nevertheless play a key role in many domains, such as biomedicine, where the data of interest is naturally structured. The research performed in this project aims to fill this gap by creating a novel generalized framework for structural similarity assessment. To build this framework, the PIs will focus on a specific biomedical application: immune cell populations and their dynamics during development and in response to disease. By focusing on this domain, the research will evaluate the new approach in a real-world setting while contributing significantly to the understanding of immune dynamics. The key concepts that will be developed in this research project are refinement operators and metric embedding. The key insight of the proposed work is that refinement operators can be used to define similarity measures and to abstract away from the underlying representation formalism. This will lead to a new framework for similarity assessment that is applicable to a broad range of representation formalisms. Moreover, the PIs propose to use metric embedding techniques to provide computationally efficient numerical approximations to the resulting similarity measures. The definition of general and tractable similarity measures, applicable to a range of structured representations, will be a significant contribution to structured machine learning and AI.
The research team will use data collected from high-throughput sequencing experiments to evaluate the generality and performance of the proposed similarity measures, applying them to analyze how repertoires of immune cell populations can be described and compared by their clonotypes (sets of cells with the same progenitor cell). The results will help build a comprehensive view of how clonotype and whole-repertoire information affects our understanding of the dynamics of immune responses in general.
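
To make the idea of a refinement-based similarity concrete, the following is a minimal illustrative sketch, not the project's actual method. It assumes a deliberately simple representation in which a term is a flat set of features, a refinement step adds one feature, and similarity is the share of refinement steps two terms have in common. All function names and the feature labels are hypothetical.

```python
from typing import FrozenSet, Iterator, List

Term = FrozenSet[str]  # assumption: a term is just a flat set of features


def refine(term: Term, vocabulary: List[str]) -> Iterator[Term]:
    """Downward refinement operator: yield terms one feature more specific."""
    for feature in vocabulary:
        if feature not in term:
            yield term | {feature}


def subsumes(general: Term, specific: Term) -> bool:
    """A term subsumes another if all of its features appear in the other."""
    return general <= specific


def anti_unify(a: Term, b: Term, vocabulary: List[str]) -> Term:
    """Refine the empty term for as long as it still subsumes both a and b."""
    current: Term = frozenset()
    while True:
        candidates = [t for t in refine(current, vocabulary)
                      if subsumes(t, a) and subsumes(t, b)]
        if not candidates:
            return current  # most specific common generalization
        current = candidates[0]


def refinement_steps(start: Term, goal: Term, vocabulary: List[str]) -> int:
    """Count refinement steps from start to goal (start must subsume goal)."""
    steps, current = 0, start
    while current != goal:
        current = next(t for t in refine(current, vocabulary)
                       if subsumes(t, goal))
        steps += 1
    return steps


def similarity(a: Term, b: Term) -> float:
    """Shared refinement steps divided by all steps needed to reach a and b."""
    a, b = frozenset(a), frozenset(b)
    vocabulary = sorted(a | b)
    common = anti_unify(a, b, vocabulary)
    shared = refinement_steps(frozenset(), common, vocabulary)
    only_a = refinement_steps(common, a, vocabulary)
    only_b = refinement_steps(common, b, vocabulary)
    total = shared + only_a + only_b
    return 1.0 if total == 0 else shared / total


# Hypothetical toy descriptions of two clonotypes (labels are made up).
clone_1 = frozenset({"v_gene=A", "j_gene=B", "cdr3=X"})
clone_2 = frozenset({"v_gene=A", "j_gene=B", "cdr3=Y"})
score = similarity(clone_1, clone_2)
```

In this flat-set setting the measure reduces to the Jaccard index, so the sketch only shows the shape of the computation. Because the similarity code calls nothing but `refine` and `subsumes`, a richer formalism could in principle be swapped in by replacing those two functions.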
