CRII: III: Managing Preference Data

$190,888 | FY2015 | CSE | NSF

Drexel University, Philadelphia PA

Investigators

Abstract

Preferences are orders among a collection of items attributed to a population of judges. Preference data comes in a variety of forms, such as ranked lists and pairwise comparisons, and is ubiquitous in applications across many domains. Over the past decade, there has been a sharp increase in the volume of preference data, in the diversity of applications that use it, and in the richness of preference data analysis methods. Examples of applications include rank aggregation in genomic data analysis, management of votes in elections, and recommendation systems in e-commerce. The goal of this project is to streamline the management and analysis of preference data. Towards this goal, the PI and her team will develop a framework called DB4Pref, providing support to computational and data scientists who work with preference data. Models, algorithms, data and software products developed as part of this project will be made publicly available. The work will have an impact on the scientific community, in particular on the analysis of functional genomics data, which is central to many areas of bioinformatics, and on social applications, where it will enable efficient and effective analysis of user preferences. The PI will involve graduate and undergraduate students in her research, and will continue to work with women and under-represented minorities. Both the process and the outcome of this research will be integrated into data management and data science courses taught by the PI.

The work will adopt the relational database model, and will enrich it with extensions that are specialized for handling preference data. Specifically, the PI and her team will introduce a special type of relation that is designed for preference data, and will propose composable operators on preference relations that can be embedded in SQL statements, for convenient reuse across applications. Scalable implementations of preference relations will be developed, along with analytics such as preference clustering and rank aggregation. Treating preferences, and preference analytics, as first-class citizens in a relational database, and making these available through (augmented) SQL queries, will bring two important advantages. The first is usability: since SQL queries are specified declaratively, users need not worry about data formats and implementation details of data manipulation methods. The second advantage is efficiency: the system is free to choose an appropriate query execution plan, and an efficient implementation of a specific analysis method, exploiting a trade-off between processing time and answer quality. To evaluate results of this work, and to invite contributions to this area by other members of the community, the PI and her team will develop a set of performance benchmarks based on real and synthetic datasets. For further information, see the project website at https://www.cs.drexel.edu/dbgroup/db4pref
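
To make the idea of preference data in a relational database concrete, the following is a minimal sketch and not DB4Pref's actual design or syntax. It assumes a hypothetical table `ranking(judge, item, position)` holding ranked lists, and it uses the Borda count, one well-known rank aggregation method, expressed in ordinary SQL through Python's standard `sqlite3` module. The table name, column names, and sample judges are illustrative assumptions.

```python
import sqlite3

# Hypothetical schema (an assumption for illustration): one row per
# (judge, item, position), where position 1 is the most preferred item.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ranking (judge TEXT, item TEXT, position INTEGER)")
rows = [
    ("ann", "a", 1), ("ann", "b", 2), ("ann", "c", 3),
    ("bob", "b", 1), ("bob", "a", 2), ("bob", "c", 3),
    ("cat", "a", 1), ("cat", "c", 2), ("cat", "b", 3),
]
conn.executemany("INSERT INTO ranking VALUES (?, ?, ?)", rows)

# Borda count: an item at position p in a list of n items earns n - p points;
# the consensus ranking orders items by total points across all judges.
BORDA_SQL = """
SELECT r.item, SUM(n.cnt - r.position) AS score
FROM ranking AS r
JOIN (SELECT judge, COUNT(*) AS cnt FROM ranking GROUP BY judge) AS n
  ON n.judge = r.judge
GROUP BY r.item
ORDER BY score DESC, r.item ASC
"""


def borda(connection):
    """Return (item, score) pairs ordered from most to least preferred."""
    return connection.execute(BORDA_SQL).fetchall()


consensus = borda(conn)
print(consensus)  # [('a', 5), ('b', 3), ('c', 1)]
```

Because the aggregation is stated declaratively, the database engine rather than the user decides how to execute it, which is the usability and efficiency argument made above. The augmented SQL operators proposed in the project would presumably replace a hand-written query like this one.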
