EAGER: Adaptive Methods for Scalable Dissemination and Retrieval of Scientific Information

$299,501 | FY2011 | CSE | NSF

Rutgers University New Brunswick, New Brunswick NJ

Investigators

Paul B Kantor (contact), David Blei, Paul H Ginsparg, Peter I Frazier, Thorsten Joachims

Abstract

This project seeks dramatically improved access to, and dissemination of, scientific information. Working with cooperating scientific users, it exploits synergies among three important innovations. These are: (1) Adaptive and domain-specific automatic derivation of topical representations. These topics describe both the documents in the collection and the interests of the users during particular searches. The topics support mechanisms for collaborative recommendation, and for exploring the precise contours of each user's need. (2) Recognition that a combination or set of several items, together, is worth much more (or perhaps much less) than the sum of the values of the items individually. The arXiv experimental system (arXiv_XS) uses topics and user feedback to model the complexity of the user's need and interests. (3) Based on these innovations, the system can probe the user's interests, selecting items where the user's feedback greatly improves the system's model of that user and his or her search. This "exploration" is designed to improve the system's performance, with minimal degradation of the current search. All these innovations are studied together with complex experimental design and statistical analysis; users may also volunteer to be interviewed by the researchers, to provide richer information about their experiences with the system. Researchers from Rutgers, Cornell and Princeton lead the project.
This exploratory project focuses on the following tasks: (1) develop a richly instrumented, voluntary alternative interface to the arXiv, with suitable IRB consent materials, supporting active user feedback in the research process as users search; (2) implement three specific innovative technologies (topics, sets, probes); (3) study their impact on system effectiveness, using experimental design and well-defined performance measures; (4) collect rich user assessments, by telephone and online interviews; (5) assess scalability with respect to the size of the collection, and the size of the "communities of interest" that define the topical user models; (6) seek relations with other domain-specific archives, for potential future studies. If successful, this research will refute a perception that improvement in access and dissemination of scientific literature requires massive techniques adapted from the commercial models for recommender systems and crowd-sourcing. This research will also add to knowledge on experimental design, user modeling, and the study of active learning and exploratory system designs. This research will accelerate the production and sharing of scientific information, initially at the arXiv, and subsequently wherever these innovations are implemented. The research aims to enable researchers who never meet each other to form an "invisible college" by enriching the arXiv system's understanding of all of its users. The project entails some risks, as users may be unwilling to share information about their research interests. While malevolent persons might seek to spam the system, falsely marking information as useful, it is anticipated that scientific communities will generate far less spam than does the world at large. Results of the research will be made available to other researchers, and incorporated in courses at all three universities. The Web site (http://arxiv_xs.rutgers.edu) is used to disseminate information and results from this project.
