III: Small: Efficiency Optimization for Neural Document Ranking with Compact Representations
University Of California-Santa Barbara, Santa Barbara CA
Abstract
Over the last few years, the resurgence of neural models has greatly advanced the field of information retrieval, enabling retrieval engines to effectively match and rank search results in response to a user query. For example, this new technology has made it possible to determine the most relevant documents in response to a query even when some query keywords do not appear in these documents. The main drawback of using deep neural models for ranking is that retrieval is extremely time consuming. As a result, such models cannot be deployed in many practical search applications. This project focuses on studying efficient solutions for neural ranking computation, and the developed techniques will be evaluated on public datasets to assess their effectiveness. The project integrates the research with an educational plan that includes undergraduate and graduate student involvement, instructional material development, and outreach activities.
This project carries out a two-thrust research agenda for efficient neural ranking. The first thrust investigates a fast re-ranking scheme for a dual-encoding architecture that leverages precomputed embeddings to compose a query representation by approximation and combines deep contextual token interactions with traditional lexical matching features. The second thrust investigates a compact representation of document embeddings that strikes a balance between relevance and space efficiency, which affects online inference latency. The project exploits the composite nature of ranking inference for answering a query to approximate query embeddings, and decouples the ranking contribution of document embeddings in deriving a compact representation.
This research will advance our fundamental understanding of relevance and efficiency tradeoffs in neural information retrieval, and significantly reduce the computing and space cost of online inference while retaining the essential benefits of deep learning for effective ranking on affordable computing platforms. This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.