Serializable Isolation for Snapshot Databases

Serializable Isolation for Snapshot Databases Michael Cahill1, Uwe Röhm and Alan Fekete School of IT, University of Sydney {mjc, roehm, fekete}@it.usyd.edu.au 1. also

Outline • Snapshot isolation ≠ serializable • Why you should care • Previous work: applications deal with it • Our approach: fix the database • Implementation and evaluation

Snapshot isolation ≠ serializable • Snapshot isolation: • Transactions read a consistent snapshot of data • DBMS maintains multiple versions of data items to avoid locking for reads • Transactions don’t see concurrent writes BUT: • Not equivalent to a serial execution • In a serial execution, one transaction would see the other
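The gap between snapshot isolation and a serial execution can be seen in a small write-skew example. The sketch below is a toy in-memory model, not a real DBMS: the accounts `x` and `y`, the invariant `x + y >= 0`, and the helper names `withdraw`, `run_under_si` and `run_serially` are all assumptions made for illustration.

```python
# Toy model of snapshot isolation (an illustrative assumption, not a real DBMS).
# Invariant the application wants to preserve: x + y >= 0.

def withdraw(snapshot, account, amount):
    """Return the write set for a withdrawal, checked against the snapshot only."""
    if snapshot["x"] + snapshot["y"] - amount >= 0:
        return {account: snapshot[account] - amount}
    return {}  # the invariant would be violated, so write nothing


def run_under_si(db):
    # Both transactions start concurrently and read the same snapshot.
    snap1, snap2 = dict(db), dict(db)
    writes1 = withdraw(snap1, "x", 100)
    writes2 = withdraw(snap2, "y", 100)
    # The write sets are disjoint, so there is no write-write conflict
    # and both transactions are allowed to commit.
    assert not (writes1.keys() & writes2.keys())
    db.update(writes1)
    db.update(writes2)
    return db


def run_serially(db):
    # T1 runs to completion before T2 starts, so T2 sees T1's write.
    db.update(withdraw(dict(db), "x", 100))
    db.update(withdraw(dict(db), "y", 100))
    return db


si_result = run_under_si({"x": 50, "y": 50})
serial_result = run_serially({"x": 50, "y": 50})
print("SI:    ", si_result, "sum =", sum(si_result.values()))
print("serial:", serial_result, "sum =", sum(serial_result.values()))
```

Under the toy SI run both withdrawals commit and the invariant is broken; in the serial run the second withdrawal sees the first one's write and is refused.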

Why you should care (figure: transactions T1 and T2)

Vendor advice • Oracle: “Database inconsistencies can result unless such application-level consistency checks are coded with this in mind, even when using serializable transactions.” • “PostgreSQL's Serializable mode does not guarantee serializable execution...”

Previous work • H. Berenson, P. Bernstein, J. Gray, J. Melton, E. O'Neil, P. O'Neil in SIGMOD 1995: “A Critique of ANSI SQL Isolation Levels” • A. Bernstein, P. Lewis and S. Lu in ICDE 2000: “Semantic Conditions for Correctness at Different Isolation Levels” • A. Fekete, D. Liarokapis, E. O'Neil, P. O’Neil, D. Shasha in TODS 2005: “Making Snapshot Isolation Serializable” • Analyze the graph of transaction conflicts • Conditions on the graph for application to be serializable at SI • If a dangerous structure is found, modify the application • S. Jorwekar, A. Fekete, K. Ramamritham, S. Sudarshan in VLDB 2007: “Automating the Detection of Snapshot Isolation Anomalies” • M. Alomari, M. Cahill, A. Fekete, U. Röhm in ICDE 2008: “The Cost of Serializability on Platforms That Use Snapshot Isolation”

Static analysis of SI anomalies • Build static dependency graph, check for dangerous structures (figure labels: cycle, pivot, outgoing conflict, incoming conflict)
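A check of this kind could be sketched as follows. This is a simplified illustration, not the analysis from the cited papers: it assumes the static dependency graph is already given as labelled edges, treats every rw edge as a candidate conflict, and the function name `find_pivots` and the edge labels `"rw"`, `"wr"`, `"ww"` are hypothetical.

```python
from collections import defaultdict


def find_pivots(edges):
    """edges: iterable of (src, dst, kind) with kind in {"rw", "wr", "ww"}.

    Returns every program P that has rw edges A -> P -> B where B can
    reach A (B == A counts), i.e. the two consecutive rw edges lie on a cycle.
    """
    edges = list(edges)
    succ = defaultdict(set)
    rw_in = defaultdict(set)
    rw_out = defaultdict(set)
    for src, dst, kind in edges:
        succ[src].add(dst)
        if kind == "rw":
            rw_out[src].add(dst)
            rw_in[dst].add(src)

    def reaches(start, goal):
        # Iterative depth-first search over all edge kinds.
        seen, stack = set(), [start]
        while stack:
            node = stack.pop()
            if node == goal:
                return True
            if node not in seen:
                seen.add(node)
                stack.extend(succ[node])
        return False

    pivots = set()
    for p in set(rw_in) & set(rw_out):
        if any(reaches(b, a) for a in rw_in[p] for b in rw_out[p]):
            pivots.add(p)
    return pivots


# Hypothetical graph: A -rw-> P -rw-> B, closed into a cycle by B -wr-> A.
cyclic = [("A", "P", "rw"), ("P", "B", "rw"), ("B", "A", "wr")]
# Same two rw edges, but with no path back from B to A.
acyclic = [("A", "P", "rw"), ("P", "B", "rw")]
print(find_pivots(cyclic), find_pivots(acyclic))
```

In the first graph `P` is reported as a pivot; in the second nothing is reported, because the consecutive rw edges are not part of a cycle.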

Limitations of previous work • Determining the conflict graph is non-trivial • Repeat for every change to the application • Ad hoc queries not supported • Difficult to automate: reasoning required to avoid false positives

Our approach • New algorithm for serializable isolation • Online, dynamic • Modifications to standard Snapshot Isolation • Core Idea: • Detect read-write conflicts at runtime • Abort transactions with consecutive rw-edges • Don’t do full cycle detection

Challenges • During runtime, rw-conflicts can interleave arbitrarily • Have to consider begin and commit timestamps: • which snapshot is a transaction reading? • can conflict with committed transactions • Want to use existing engines as much as possible • Low runtime overhead • But minimize unnecessary aborts

SI anomalies: a simple case (figure label: pivot commits last)

The algorithm in a nutshell • Add two flags to each transaction (in & out) • Set T0.out if rw-conflict T0 → T1 • Set T0.in if rw-conflict TN → T0 • Abort T0 (the pivot) if both T0.in and T0.out are set • If T0 has already committed, abort the conflicting transaction • In the following, we illustrate the main cases; for full details, see the paper
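The flag rules above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the Berkeley DB implementation: the class `Txn`, the function `mark_rw_conflict`, and the choice to check for a pivot immediately when a conflict is recorded are all hypothetical simplifications.

```python
class Txn:
    def __init__(self, name):
        self.name = name
        self.in_conflict = False   # some transaction has an rw-conflict into this one
        self.out_conflict = False  # this transaction has an rw-conflict out to another
        self.committed = False
        self.aborted = False


def mark_rw_conflict(reader, writer):
    """Record an rw-conflict reader -> writer; return the aborted transaction, if any."""
    # A committed transaction can no longer be aborted. If recording this
    # conflict would turn a committed transaction into a pivot, abort the
    # other (still active) transaction instead.
    if writer.committed and writer.out_conflict:
        reader.aborted = True
        return reader
    if reader.committed and reader.in_conflict:
        writer.aborted = True
        return writer

    reader.out_conflict = True
    writer.in_conflict = True

    # Abort an active transaction that now has both flags set (the pivot).
    for txn in (reader, writer):
        if txn.in_conflict and txn.out_conflict and not txn.committed:
            txn.aborted = True
            return txn
    return None


# Case 1: TN -> T0 -> T1 while T0 is still active, so T0 is aborted.
tn, t0, t1 = Txn("TN"), Txn("T0"), Txn("T1")
first = mark_rw_conflict(t0, t1)
second = mark_rw_conflict(tn, t0)

# Case 2: the would-be pivot has already committed, so TN is aborted instead.
un, u0, u1 = Txn("TN"), Txn("T0"), Txn("T1")
mark_rw_conflict(u0, u1)
u0.committed = True
third = mark_rw_conflict(un, u0)
print(first, second.name, third.name)
```

No cycle is ever traced here: two flags per transaction are enough to spot consecutive rw-edges, which is what keeps the check cheap and also why it can abort a transaction that was not actually part of a cycle.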

Detection: write before read (figure labels: read old y; T1.in = true; T0.out = true)

Detection: read before write • How can we detect this? (figure labels: lock x, SIREAD; write lock x; TN.out = true; T0.in = true)
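One way to picture the read-before-write case is a lock table in which reads leave a non-blocking SIREAD entry that later writers can find. The sketch below is an assumption-laden toy: `LockTable`, `Txn` and their methods are hypothetical names, all transactions are assumed to be concurrent, and real lock modes, blocking and lock release are left out.

```python
from collections import defaultdict


class Txn:
    def __init__(self, name):
        self.name = name
        self.in_conflict = False
        self.out_conflict = False


class LockTable:
    def __init__(self):
        # item -> set of transactions holding an SIREAD entry on it
        self.siread = defaultdict(set)

    def read(self, txn, item):
        # SIREAD never blocks anyone; it only records that txn read the item.
        self.siread[item].add(txn)

    def write(self, txn, item):
        # Finding another transaction's SIREAD entry while taking the write
        # lock reveals an rw-conflict from that reader to this writer.
        for reader in self.siread[item]:
            if reader is not txn:
                reader.out_conflict = True
                txn.in_conflict = True


table = LockTable()
tn, t0 = Txn("TN"), Txn("T0")
table.read(tn, "x")    # TN reads x and leaves an SIREAD entry
table.write(t0, "x")   # T0 later writes x and finds TN's entry
print(tn.out_conflict, t0.in_conflict)
```

After the write, `TN.out` and `T0.in` are set, matching the labels on the slide.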

Main Disadvantage: False positives (figure labels: no cycle; unnecessary abort)

Prototype: Berkeley DB • Implemented in Oracle Berkeley DB • Open source: extensible • Already includes SI and 2-phase locking (S2PL) • Page-level locking: avoids phantoms • Modified 692 lines of code out of 200K • Most changes related to locking: increased locking code by 10%

Experimental setup • Question: what are the costs and benefits of Serializable SI? • Comparing • standard SI • serializable SI (SSI) • serializable isolation with two-phase locking (S2PL) • SmallBank benchmark [ICDE2008] • Familiar banking-style transactions (balance, deposit, transfer, etc.) • Includes a write skew by design • Update-heavy • Benchmark run on a commodity PC running Linux 2.6

Experimental scenarios • Scenario 1: short transactions • medium/high contention (1% probability of collisions) • CPU bound (no waits for I/O) • Scenario 2: long transactions • medium/high contention • I/O bound (flushing the log) • Scenario 3: low contention • low probability of collisions (0.1%) • I/O bound • Graphs show avg of 5 runs & 95% confidence intervals

Scenario 1 (short txns): Throughput • But SI is NOT serializable!

Scenario 1: abort rates at MPL 20

Scenario 2 (long txns): Throughput

Scenario 2: abort rates at MPL 20

Scenario 3 (low cont.): Throughput

Conclusions • New algorithm for serializable isolation • Online, dynamic, and general solution • Modification to standard Snapshot Isolation • Keeps the features that make SI attractive: readers don’t block writers, much better scalability than S2PL • Feasible to add to a Snapshot Isolation DBMS with minor changes

Ongoing work • Further reduce the runtime overhead • Fewer false positives • Applying the algorithm to other engines • Row-level versioning, dealing with phantoms
