Consistency Guarantees and Snapshot Isolation
Marcos Aguilera, Mahesh Balakrishnan, Rama Kotla, Vijayan Prabhakaran, Doug Terry
MSR Silicon Valley

Goals
Develop a cloud storage system featuring:
• Multiple consistency levels
  • One API to learn, one system to administer
  • Handles diverse requirements within and across applications
• Read-write transactions
  • With snapshot isolation
  • On replicated and partitioned data
• Consistency-based SLAs

Geo-Replication
[Figure: a primary in one datacenter and secondaries in remote datacenters, with Read and Write paths]

Client API
• Get(key)
• Put(key, object)
• BeginTx(consistency)
• EndTx()
• BeginSession(consistency)
• EndSession()
[Figure: Puts/Gets, Transaction, and Session scopes]

Transaction Properties
• Conventional transaction model
  • BeginTx … EndTx
• Atomic updates to multiple objects
• Multi-object reads from snapshots
  • Even across partitions

Partitioned Data for Scalability
• Data partitioned by key range
• Each partition has its own primary and secondary servers
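To make key-range partitioning concrete, here is a minimal sketch of a client-side partition map. The `Partition` and `PartitionMap` names, the server names, and the idea that each range is identified by its starting key are assumptions for illustration, not part of the system described in the slides.

```python
import bisect
from dataclasses import dataclass


@dataclass
class Partition:
    # Hypothetical descriptor: covers keys from low_key up to the next partition's low_key.
    low_key: str
    primary: str
    secondaries: list[str]


class PartitionMap:
    """Maps a key to the partition whose key range contains it (illustrative only)."""

    def __init__(self, partitions: list[Partition]) -> None:
        # Sort by range start so the owning partition can be found by binary search.
        self.partitions = sorted(partitions, key=lambda p: p.low_key)
        self.starts = [p.low_key for p in self.partitions]

    def lookup(self, key: str) -> Partition:
        # The owner is the last partition whose range starts at or before `key`.
        i = bisect.bisect_right(self.starts, key) - 1
        if i < 0:
            raise KeyError(f"no partition covers {key!r}")
        return self.partitions[i]


# Hypothetical layout: two ranges, each with its primary in a different datacenter.
pmap = PartitionMap([
    Partition("", "dc1-a", ["dc2-a", "dc3-a"]),
    Partition("m", "dc2-b", ["dc1-b", "dc3-b"]),
])
print(pmap.lookup("apple").primary)  # dc1-a
print(pmap.lookup("zebra").primary)  # dc2-b
```

Because each range carries its own primary, different objects can have primaries in different datacenters, as the next slide notes.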

Write Operations
• Writes performed at primary server(s)
  • May have different primaries for different objects
• Propagate to secondary servers eventually
  • Any gossip or anti-entropy protocol will do
• Have a commit timestamp, i.e., a global order
  • And deterministic outcomes
• No write conflicts => all replicas converge toward a mutually consistent state
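The convergence argument can be checked on a toy example. This sketch assumes single-key writes with unique commit timestamps and uses "highest commit timestamp wins" as the deterministic rule; both are simplifying assumptions, since the slides only require a global order and deterministic outcomes.

```python
import random

# Each write is (commit_timestamp, key, value); timestamps are assumed unique.
writes = [(1, "x", "a"), (2, "y", "b"), (3, "x", "c"), (4, "y", "d")]


def apply_all(order):
    """Apply writes in the given arrival order and return the visible value per key."""
    state = {}  # key -> (commit_timestamp, value)
    for ts, key, value in order:
        current = state.get(key)
        # Deterministic rule (assumed): a write replaces the stored one only if newer.
        if current is None or ts > current[0]:
            state[key] = (ts, value)
    return {k: v for k, (_, v) in state.items()}


# Two replicas receive the same writes via gossip, in different orders.
replica_1 = apply_all(writes)
shuffled = writes[:]
random.Random(42).shuffle(shuffled)
replica_2 = apply_all(shuffled)
print(replica_1 == replica_2)  # True
```

Whatever order the anti-entropy protocol delivers the writes in, both replicas end in the same state because the outcome depends only on the commit timestamps.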

Versioned Data Store
• Store version history for each object
• Can perform writes as soon as commit timestamp is known
  • Need not perform writes in commit order
• Can eventually prune old versions
[Figure: version histories over time, Object A: V1, V2, V3, V4; Object B: V1, V2]
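A versioned store of this kind can be sketched as a sorted list of commit timestamps per object. The class and method names below are hypothetical; the sketch shows versions being inserted out of commit order, reads "as of" a timestamp, and pruning that keeps reads at or after a chosen horizon unaffected (the horizon parameter is our assumption).

```python
import bisect
from collections import defaultdict


class VersionedStore:
    """Per-object version history keyed by commit timestamp (illustrative sketch)."""

    def __init__(self):
        # key -> sorted commit timestamps, plus a parallel list of values
        self._ts = defaultdict(list)
        self._vals = defaultdict(list)

    def put(self, key, value, commit_ts):
        # Versions may arrive out of commit order; insert at the sorted position.
        i = bisect.bisect_left(self._ts[key], commit_ts)
        self._ts[key].insert(i, commit_ts)
        self._vals[key].insert(i, value)

    def get(self, key, read_ts):
        # Latest version with commit_ts <= read_ts, or None if there is none.
        i = bisect.bisect_right(self._ts[key], read_ts) - 1
        return self._vals[key][i] if i >= 0 else None

    def prune(self, key, horizon_ts):
        # Keep the version visible at horizon_ts and everything newer, so reads
        # at timestamps >= horizon_ts return the same results as before pruning.
        i = bisect.bisect_right(self._ts[key], horizon_ts) - 1
        if i > 0:
            del self._ts[key][:i]
            del self._vals[key][:i]


store = VersionedStore()
store.put("A", "V1", 10)
store.put("A", "V3", 30)  # arrives before V2
store.put("A", "V2", 20)
print(store.get("A", 25))  # V2
```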

Per-Replica State
• Datastore = set of <key, value, timestamp>
• High-time = timestamp of latest received write transaction
  • Assumes transactions are received in order
  • May receive periodic null transactions, which advance high-time when there are no new writes
• Low-time = timestamp of most recent discarded object version
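The per-replica bookkeeping can be sketched as a small class. The field and method names are our own; the sketch assumes in-order delivery (as the slide does) and models a null transaction as a commit timestamp with no writes, which still advances high-time.

```python
from dataclasses import dataclass, field


@dataclass
class ReplicaState:
    """Per-replica state from the slide; names are hypothetical."""
    data: dict = field(default_factory=dict)  # (key, timestamp) -> value
    high_time: int = 0  # timestamp of the latest write transaction received
    low_time: int = 0   # timestamp of the most recently discarded version

    def receive(self, commit_ts, writes):
        # In-order delivery is assumed, so commit timestamps must strictly increase.
        if commit_ts <= self.high_time:
            raise ValueError("transactions must arrive in commit-timestamp order")
        for key, value in writes.items():
            self.data[(key, commit_ts)] = value
        # A null transaction (empty `writes`) still advances high_time, signalling
        # that this replica has seen everything committed up to commit_ts.
        self.high_time = commit_ts

    def discard(self, key, ts):
        # Drop one old version and record the most recent discarded timestamp.
        self.data.pop((key, ts), None)
        self.low_time = max(self.low_time, ts)


r = ReplicaState()
r.receive(5, {"x": 1})
r.receive(9, {})  # null transaction
print(r.high_time)  # 9
```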

Read Operations
• Single-key Gets go to one server
• Multi-partition transactions may read from multiple servers
• Server(s) selected based on desired consistency
  • E.g., read from a nearby server when possible
• Alternative: broadcast operation to all servers
  • Take first response that is consistent enough
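Server selection can be sketched as "nearest server that is fresh enough". This assumes the client knows each server's latency and high-time, and that the consistency guarantee has already been turned into a minimum acceptable read timestamp; `ServerInfo` and `choose_server` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class ServerInfo:
    name: str
    latency_ms: float  # round-trip estimate, assumed known to the client
    high_time: int     # latest commit timestamp the server has received


def choose_server(servers, min_read_ts):
    """Return the nearest server whose high-time reaches min_read_ts, else None.

    min_read_ts is the lower bound implied by the requested consistency
    (0 for eventual consistency).
    """
    for s in sorted(servers, key=lambda s: s.latency_ms):
        if s.high_time >= min_read_ts:
            return s
    return None  # no server is fresh enough right now


servers = [
    ServerInfo("primary", 80.0, 120),
    ServerInfo("local-secondary", 2.0, 95),
    ServerInfo("remote-secondary", 40.0, 110),
]
print(choose_server(servers, 0).name)    # local-secondary
print(choose_server(servers, 100).name)  # remote-secondary
```

The broadcast alternative on the slide trades extra messages for not needing up-to-date knowledge of each server's high-time.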

Read-Only Transactions
• Transaction assigned a read timestamp
• Read from snapshot at that time
  • See all write transactions committed before this time, and only those writes
• Consistency guarantee places constraints on read timestamp

Reads on Versioned Data Store
• Allows reads at any timestamp
  • Without placing constraints on write propagation
  • Provided no future transaction can be assigned a commit timestamp earlier than the read timestamp
[Figure: version histories over time (Object A: V1 to V4; Object B: V1, V2) with a read timestamp cutting across both]
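A multi-object snapshot read at one replica might look like the sketch below. The data layout and the admission check (the read timestamp must lie within the replica's [low-time, high-time] interval, as suggested by the per-node intervals on the slides) are assumptions; with in-order delivery, high-time is what guarantees that no not-yet-received transaction can commit before the read timestamp.

```python
def snapshot_read(versions, keys, read_ts, low_time, high_time):
    """Read several objects as of read_ts from one replica (illustrative sketch).

    versions maps key -> list of (commit_ts, value) pairs in any order.
    """
    # Assumed admission rule: the replica has received everything up to
    # high_time and still holds the versions needed at or after low_time.
    if not (low_time <= read_ts <= high_time):
        raise ValueError("read timestamp outside this replica's servable range")
    result = {}
    for key in keys:
        visible = [(ts, v) for ts, v in versions.get(key, []) if ts <= read_ts]
        # Latest version committed at or before read_ts, or None if none exists.
        result[key] = max(visible, key=lambda p: p[0])[1] if visible else None
    return result


versions = {
    "A": [(10, "A1"), (20, "A2"), (30, "A3"), (40, "A4")],
    "B": [(15, "B1"), (35, "B2")],
}
print(snapshot_read(versions, ["A", "B"], 32, 0, 40))  # {'A': 'A3', 'B': 'B1'}
```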

Selecting Read Timestamp (assuming in-order delivery of writes)
[Figure: Acceptable Read Timestamps on a time axis from 0 to BeginTx, for strong, read-my-writes, monotonic, bounded, causal, and eventual consistency]
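One way to read the figure is that each guarantee sets a lower bound on the read timestamp, with BeginTx as the upper end. The bounds below follow definitions commonly used in the consistency-SLA literature and are our assumption, not taken from these slides; the `Session` bookkeeping is likewise hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Session:
    # Hypothetical per-session state, not part of the slides' client API.
    last_write_ts: int = 0  # commit timestamp of the session's latest Put
    last_read_ts: int = 0   # largest read timestamp the session has used


def min_read_timestamp(guarantee, session, now, bound=0):
    """Smallest acceptable read timestamp for a transaction beginning at `now`.

    Under these assumed definitions, any timestamp between the returned
    value and `now` is acceptable.
    """
    if guarantee == "strong":
        return now                     # must see every earlier commit
    if guarantee == "read-my-writes":
        return session.last_write_ts   # must include the session's own writes
    if guarantee == "monotonic":
        return session.last_read_ts    # must not move backwards in time
    if guarantee == "bounded":
        return max(0, now - bound)     # at most `bound` time units stale
    if guarantee == "causal":
        return max(session.last_write_ts, session.last_read_ts)
    if guarantee == "eventual":
        return 0                       # any snapshot will do
    raise ValueError(f"unknown guarantee: {guarantee}")


s = Session(last_write_ts=40, last_read_ts=55)
print(min_read_timestamp("causal", s, now=100))  # 55
```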

Selecting Read Timestamp
[Figure: the interval from low-time to high-time for nodes A, B, and C on a time axis, with the chosen read timestamp marked]
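When a transaction reads from several nodes, a single read timestamp must be servable by all of them. This sketch intersects the nodes' [low-time, high-time] intervals with the range allowed by the consistency guarantee and picks the latest timestamp that fits; choosing the latest is our design assumption.

```python
def select_read_timestamp(intervals, min_acceptable, now):
    """Pick one read timestamp usable at every node, or None if none exists.

    intervals: list of (low_time, high_time) pairs, one per node read from.
    min_acceptable: lower bound implied by the consistency guarantee.
    now: the BeginTx time, the upper end of the acceptable range.
    """
    lo = max([min_acceptable] + [low for low, _ in intervals])
    hi = min([now] + [high for _, high in intervals])
    # Latest common timestamp, giving the freshest snapshot all nodes can serve.
    return hi if lo <= hi else None


nodes = [(10, 80), (20, 95), (5, 70)]  # hypothetical nodes A, B, C
print(select_read_timestamp(nodes, 50, 100))  # 70
print(select_read_timestamp(nodes, 75, 100))  # None
```

Picking the latest feasible timestamp matters for read-write transactions, since older read timestamps raise the chance of abort.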

Read-Write Transactions
• Transaction assigned a read timestamp and a commit timestamp
• Use optimistic concurrency control
  • Old read timestamps increase the chance of abort
• Read from snapshot at read timestamp
  • With selected consistency guarantee
• Batch writes until commit
  • No undo needed
• Validate transaction at commit timestamp

Transaction Lifetime
[Figure: a transaction within a session, over time: Get(x) … Put(x, value)]
• Get(x): select read timestamp and perform Get
• Put(x, value): buffer Put
• Commit: get commit timestamp, validate, and perform Puts
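The lifetime above can be sketched on a toy single-node store: reads come from the snapshot at the read timestamp, Puts are buffered, and at commit the transaction takes a commit timestamp, validates, and only then installs its writes. The store, the counter used as a timestamp source, and the snapshot-isolation check are simplifying assumptions; a real system would make validation and installation atomic across replicas.

```python
import itertools


class Store:
    """Toy single-node versioned store standing in for the replicated system."""

    def __init__(self):
        self.versions = {}                # key -> list of (commit_ts, value)
        self._clock = itertools.count(1)  # hypothetical commit-timestamp source
        self.latest_ts = 0

    def next_timestamp(self):
        self.latest_ts = next(self._clock)
        return self.latest_ts

    def read(self, key, ts):
        visible = [(t, v) for t, v in self.versions.get(key, []) if t <= ts]
        return max(visible, key=lambda p: p[0])[1] if visible else None


class Transaction:
    def __init__(self, store, read_ts):
        self.store = store
        self.read_ts = read_ts  # assumed chosen per the consistency guarantee
        self.writes = {}        # buffered Puts, so an abort needs no undo

    def get(self, key):
        # Own buffered write first, otherwise the snapshot at read_ts.
        if key in self.writes:
            return self.writes[key]
        return self.store.read(key, self.read_ts)

    def put(self, key, value):
        self.writes[key] = value

    def commit(self):
        commit_ts = self.store.next_timestamp()
        # Snapshot-isolation check: abort if a written key gained a version
        # strictly between read_ts and commit_ts.
        for key in self.writes:
            if any(self.read_ts < t < commit_ts for t, _ in self.store.versions.get(key, [])):
                return None  # abort
        for key, value in self.writes.items():
            self.store.versions.setdefault(key, []).append((commit_ts, value))
        return commit_ts


store = Store()
init = Transaction(store, read_ts=0)
init.put("x", 1)
init.commit()

t1 = Transaction(store, read_ts=store.latest_ts)
t2 = Transaction(store, read_ts=store.latest_ts)
t1.put("x", t1.get("x") + 1)
t2.put("x", t2.get("x") + 10)
r1 = t1.commit()  # 2: commits
r2 = t2.commit()  # None: aborts, x changed after t2's snapshot
```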

Committing Write Transactions
• Snapshot isolation => check that no object being written has a version between the transaction's read timestamp and commit timestamp
• Serializability => check that no object being read or written has a version between the transaction's read timestamp and commit timestamp
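The two commit rules differ only in which objects are checked. This sketch assumes a history that maps each key to its commit timestamps, and treats "between" as strictly between. The example is the classic write-skew case: snapshot isolation lets it commit, while the serializability check aborts it.

```python
def has_version_between(history, key, read_ts, commit_ts):
    """True if `key` has a version committed strictly between the two timestamps."""
    return any(read_ts < t < commit_ts for t in history.get(key, []))


def validate(history, read_set, write_set, read_ts, commit_ts, level):
    """Commit-time check; history maps key -> list of commit timestamps."""
    if level == "snapshot-isolation":
        keys = write_set               # only written objects are checked
    elif level == "serializable":
        keys = read_set | write_set    # read objects are checked as well
    else:
        raise ValueError(f"unknown level: {level}")
    return not any(has_version_between(history, k, read_ts, commit_ts) for k in keys)


# Write skew: T1 and T2 both read x and y at timestamp 10. T1 wrote x and
# committed at 11; now T2, which writes only y, tries to commit at 12.
history = {"x": [5, 11], "y": [5]}
print(validate(history, {"x", "y"}, {"y"}, 10, 12, "snapshot-isolation"))  # True
print(validate(history, {"x", "y"}, {"y"}, 10, 12, "serializable"))        # False
```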