CSR: Medium: Collaborative Research: Soup: Flexible Storage and Processing for On-Line Applications
Massachusetts Institute Of Technology, Cambridge MA
Investigators
Abstract
The project aims to build a new kind of storage system for use in busy web sites, combining high performance with ease of programming. The project's key idea is to ask a web site's developers to declare in advance all the ways in which the web site will need to retrieve and process data. This allows the database to prepare all the required outputs in advance, and keep these outputs up to date as new data is inserted into the database. The result is that the web site can read data (and thus generate web pages) efficiently. The project prototype, called Soup, uses a data-flow graph to keep materialized views up to date as database writes arrive; these views hold the results for the web site software's pre-declared queries. However, as the web site software evolves, it will change the set of queries it needs. Soup uses several novel techniques to handle these changes efficiently: re-use of state across successive versions of the data-flow graph, and partial materialization of views and internal data-flow state. Soup supports transactions by combining optimistic concurrency control with data-flow, and allows scale-up of throughput by spreading data and computation over multiple servers. Web sites are an important part of modern life, and an enormous effort is invested in building and maintaining them. This effort could be significantly reduced if storage systems were better matched to the needs of web sites. Soup will provide this better match, by combining the ease of use of relational databases with much-increased speed and efficiency. The project's main results will be a prototype implementation, along with sample applications, documentation, and research papers. The code (Soup and sample applications) will be maintained on GitHub, where anyone can examine and fetch the most recent versions. Documentation will also be maintained on GitHub, and papers will be available on the project web site. We intend to maintain the project repository for at least five years beyond the end of the project. All of these resources will be available from the project web page: https://pdos.csail.mit.edu/soup
View original record on NSF Award Search →