III: Small: Multi-Version Concurrency Control on Modern Hardware
University of Maryland, College Park, College Park, MD
Investigators
Abstract
Database systems generally fall into two categories with respect to how they update data: either they overwrite the old value with the new value, or they store (and make accessible) both the old and new versions of the data. Most early database systems fall into the first category (overwriting data) because storage was expensive and limited. However, as compliance, auditing, reporting, back-up, and disaster recovery use-cases increasingly rely on reading database snapshots and data that may have been later updated, the ability of a database system to maintain recent versions of record values in near-line access modes has become increasingly important. Furthermore, maintaining multiple versions of record values potentially enables the database to improve its transaction throughput: reads and writes of the same record can potentially occur concurrently, instead of transactions being delayed to avoid inconsistent reads. Therefore, the trend in database system architecture is towards the second category: keeping multiple versions of data values.
Existing multi-versioned database systems have done an adequate job of handling the compliance, auditing, reporting, back-up, and disaster recovery use-cases that exist today. However, in many cases they provide no protection against snapshot isolation anomalies that violate the correctness of the data in the database and can result in bugs in application code written on top of the database system. The few existing systems that avoid these anomalies do so only at significant cost to throughput and database performance. Furthermore, most existing multi-versioned database systems are designed for disk storage and struggle to perform well under main-memory workloads.
We are building a completely redesigned multi-versioned database system, designed for main-memory deployments. It is architected to achieve extremely high throughput while avoiding the fundamental write skew anomalies that have existed in previous systems and have affected both application developers and database users. We are integrating novel techniques for database recoverability and transaction chopping into the multi-versioned database system architecture in order to improve transaction throughput by at least an order of magnitude. We are also investigating techniques for continuous snapshot replication and for serving consistent read queries from geo-distributed replicas with bounded staleness. Furthermore, we are performing extensive experimental studies that motivate both the design of our system and the use of multi-versioned database systems in general over single-versioned systems.
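As a rough illustration of the write skew anomaly mentioned above, the following Python sketch models a toy multi-version store with snapshot reads and a first-committer-wins check on written keys. It is not the system described in this award: the names `MVStore`, `Txn`, and `demo`, the logical clock, and the on-call example are all hypothetical, chosen only to show how two transactions with disjoint write sets can both commit and jointly break an invariant, while an older reader still sees its original snapshot.

```python
class MVStore:
    """Toy multi-version store (hypothetical): key -> list of (commit_ts, value)."""

    def __init__(self):
        self.versions = {}  # each list is kept in ascending commit_ts order
        self.clock = 0      # logical commit timestamp

    def load(self, key, value):
        # Install an initial version at the current timestamp.
        self.versions.setdefault(key, []).append((self.clock, value))

    def read(self, key, snapshot_ts):
        # Return the newest value committed at or before snapshot_ts.
        for ts, value in reversed(self.versions[key]):
            if ts <= snapshot_ts:
                return value
        raise KeyError(key)

    def begin(self):
        # A transaction reads from the snapshot as of its start time.
        return Txn(self, self.clock)


class Txn:
    def __init__(self, store, snapshot_ts):
        self.store = store
        self.snapshot_ts = snapshot_ts
        self.writes = {}  # buffered writes, installed at commit

    def read(self, key):
        if key in self.writes:
            return self.writes[key]
        return self.store.read(key, self.snapshot_ts)

    def write(self, key, value):
        self.writes[key] = value

    def commit(self):
        # First-committer-wins: abort only if a key we WROTE has a newer version.
        # Keys that were merely read are not checked, which permits write skew.
        for key in self.writes:
            history = self.store.versions.get(key, [])
            if history and history[-1][0] > self.snapshot_ts:
                return False
        self.store.clock += 1
        for key, value in self.writes.items():
            self.store.versions.setdefault(key, []).append((self.store.clock, value))
        return True


def demo():
    # Assumed invariant: at least one of alice/bob stays on call.
    store = MVStore()
    store.load("alice_on_call", True)
    store.load("bob_on_call", True)

    reader = store.begin()  # older snapshot reader, never blocked by writers
    t1 = store.begin()
    t2 = store.begin()

    # Each transaction checks the invariant against its own snapshot.
    if t1.read("alice_on_call") and t1.read("bob_on_call"):
        t1.write("alice_on_call", False)
    if t2.read("alice_on_call") and t2.read("bob_on_call"):
        t2.write("bob_on_call", False)

    ok1 = t1.commit()
    ok2 = t2.commit()

    latest = store.begin()
    return {
        "both_committed": ok1 and ok2,
        "reader_sees_alice": reader.read("alice_on_call"),
        "alice": latest.read("alice_on_call"),
        "bob": latest.read("bob_on_call"),
    }


if __name__ == "__main__":
    print(demo())
```

In this sketch both transactions commit because their write sets do not overlap, so the final state has nobody on call even though each transaction saw a valid state. A serializable scheme would need to detect the read-write dependency between the two transactions, which is the kind of check the abstract notes has been costly in existing systems.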