Sebastiano Peluso, Pedro Ruivo, Paolo Romano, Francesco Quaglia and Luís Rodrigues
A Multiversion Update-Serializable Protocol for Genuine Partial Data Replication
Euro-TM Workshop on Transactional Memory (WTM 2012), Bern, Switzerland
Distributed STMs
• STMs are being employed in new scenarios:
  - Database caches in three-tier web apps (FénixEDU)
  - HPC programming language (X10)
  - In-memory cloud data grids (Coherence, Infinispan)
• New challenges:
  - Scalability
  - Fault-tolerance
→ REPLICATION
Partial Replication
• Each site stores a partial copy of the data.
• Genuine partial replication schemes maximize scalability by ensuring that:
  - Only the sites that replicate data items read or written by a transaction T exchange messages to execute/commit T.
• Existing 1-Copy Serializable implementations enforce distributed validation of read-only transactions [SRDS10]:
  - considerable overheads in typical workloads
Issues with Partial Replication
• Extending existing local multiversion (MV) STMs is not enough.
• Local MV STMs rely on a single global counter to track version advancement.
• Problem:
  - Committing a transaction would have to involve ALL NODES
→ NO GENUINENESS = POOR SCALABILITY
GMU: Genuine Multiversion Update-Serializable Replication [ICDCS12]
• In the execution/commit phase of a transaction T, ONLY nodes that store data items accessed by T are involved.
• It uses multiple versions for each data item.
• It builds visible snapshots = freshest consistent snapshots, taking into account:
  - causal dependencies on transactions that had already committed when the transaction began,
  - previous reads executed by the same transaction.
• Vector clocks are used to establish visible snapshots.
High Level Overview (i)
• Transactions commit using a vector clock.
• Each node stores a log of committed vector clocks.
• Initial view of the visible snapshot
  - When a transaction T begins on node N, it acquires the most recent vector clock in N's commit log.
• View extension of the visible snapshot
  - When T reads on a node N:
    - T's vector clock can be modified according to N's commit log.
    - Three reading rules are applied using T's vector clock.
High Level Overview (ii)
• Write operation
  - When a transaction T writes V to data item O, it inserts <O,V> into T's write-set.
• Commit operation
  - Read-only transactions always commit.
  - Update transactions run a genuine 2-Phase Commit:
    - On receiving a prepare message (participant side):
      - acquire read/write locks and validate the read-set,
      - send back a tentative commit vector clock.
    - If all replies are positive (coordinator side):
      - multicast the write-set and the final commit vector clock.
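The commit flow above can be sketched in a few lines of Python. This is a minimal, single-process sketch under stated assumptions, not the GMU implementation: `Participant`, `prepare`, `finish` and `commit` are hypothetical names, a scalar version number per key stands in for the vector clocks, a lock conflict yields an immediate negative vote instead of waiting, and validation is assumed to mean "every key that was read is still at the version that was read".

```python
from dataclasses import dataclass, field


@dataclass
class Participant:
    """Hypothetical node: latest committed version number of each key it stores."""
    latest: dict
    values: dict = field(default_factory=dict)
    locked: set = field(default_factory=set)

    def prepare(self, read_set, write_set):
        """Lock the local keys touched by the transaction and validate its reads."""
        mine = {k for k in (*read_set, *write_set) if k in self.latest}
        if mine & self.locked:
            return False                      # lock conflict: negative vote
        if any(self.latest[k] != read_set[k] for k in mine if k in read_set):
            return False                      # a read was overwritten: negative vote
        self.locked |= mine
        return True

    def finish(self, read_set, write_set, do_commit):
        """Release the locks taken in prepare; apply local writes on commit."""
        for k in {*read_set, *write_set}:
            if k in self.latest:
                self.locked.discard(k)
                if do_commit and k in write_set:
                    self.values[k] = write_set[k]
                    self.latest[k] += 1       # assumption: scalar version bump


def commit(read_set, write_set, participants):
    """read_set: key -> version read; write_set: key -> buffered value."""
    if not write_set:
        return True                           # read-only transactions always commit
    # Genuineness: contact only nodes that store a key read or written by T.
    involved = [p for p in participants
                if any(k in p.latest for k in (*read_set, *write_set))]
    voted_yes = []
    for p in involved:
        if not p.prepare(read_set, write_set):
            break
        voted_yes.append(p)
    ok = len(voted_yes) == len(involved)
    for p in voted_yes:                       # only these nodes hold locks for T
        p.finish(read_set, write_set, ok)
    return ok
```

For example, with three nodes storing Z, X and Y respectively, a transaction that read X at version 1 and writes Y commits and touches only the last two nodes; a later transaction that still carries the old version of Y in its read-set is aborted at validation.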
Rule 1: Reading Lower Bound
[Timeline diagram: Node 0, Node 1 (it stores X), Node 2 (it stores Y)]
• Initially, the most recent VC in VCLog is (1,1,1) on every node.
• T0: W(X,v), W(Y,w), Commit.
• Node 1 commits X(2); its most recent VC becomes (1,2,2).
• T1: R(X) returns X(2); T1.VC = (1,2,2).
• Node 2 commits Y(2); its most recent VC becomes (1,2,2).
• T1: R(Y) returns Y(2); T1.VC = (1,2,2).
Rule 2: Reading Upper Bound
[Timeline diagram: Node 0, Node 1 (it stores X), Node 2 (it stores Y); the relative order of concurrent events is not fully recoverable from the slide text]
• Initially, Node 1 holds X(1) and Node 2 holds Y(1); the most recent VC in VCLog is (1,1,1) on every node.
• T1: R(X) returns X(1); T1.VC = (1,1,1).
• Node 2 commits Y(2) with VC (1,1,2).
• T0: W(X,v), W(Y,w), Commit; Nodes 1 and 2 commit X(3) and Y(3) with VC (1,3,3).
• T1: R(Y) returns Y(2); T1.VC = (1,1,2).
• T1: Commit.
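The numbers in this example can be replayed under one plausible reading of the upper-bound rule: on its first read at a node, T adopts the most recent vector clock in that node's commit log whose entries do not exceed T's own clock on any node T has already read from, and merges it entry-wise into T.VC. This reading is an assumption inferred from the diagram, not a statement of the protocol as specified in [ICDCS12]; the function name `extend_on_first_read` and the 0-based node indexing are hypothetical.

```python
def extend_on_first_read(t_vc, has_read, commit_log):
    """Return T's vector clock after its first read on a node.

    t_vc:       tuple, T's current vector clock
    has_read:   set of node indices T has already read from
    commit_log: vector clocks committed on the node being read, oldest first
    """
    for vc in reversed(commit_log):           # try the most recent clock first
        # Upper bound: the clock must not be newer than T's snapshot
        # on any node T has already observed.
        if all(vc[j] <= t_vc[j] for j in has_read):
            return tuple(max(a, b) for a, b in zip(t_vc, vc))
    return t_vc                               # nothing admissible: keep T's clock


# Node 2's commit log in the example; T1 has already read X on Node 1.
node2_log = [(1, 1, 1), (1, 1, 2), (1, 3, 3)]
t1_vc = extend_on_first_read((1, 1, 1), {1}, node2_log)
print(t1_vc)  # (1, 1, 2)
```

The clock (1,3,3) is rejected because its entry for Node 1 is 3, which is newer than the snapshot T1 already observed there, so T1 ends up with (1,1,2) and reads Y(2) rather than Y(3).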
Rule 3: Selection of Data Versions
• Informally: observe the most recent consistent version of data item id on node i based on T's history (previous reads).
• Formally: iterate over the versions of id and return the most recent one s.t. id.version.VN <= T.VC[i]
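The formal statement of Rule 3 translates almost directly into code. The sketch below assumes that each data item keeps its versions as a list of (VN, value) pairs ordered from oldest to newest; the function name `select_version` and the sample values are hypothetical.

```python
def select_version(versions, t_vc, i):
    """Return the most recent (vn, value) of an item with vn <= t_vc[i].

    versions: list of (vn, value) pairs on node i, oldest first
    t_vc:     T's vector clock
    i:        index of the node that stores the item
    """
    for vn, value in reversed(versions):      # newest version first
        if vn <= t_vc[i]:
            return vn, value
    return None                               # no version is visible to T


# Versions of Y on Node 2 (index 2), read by a transaction with VC (1,1,2).
y_versions = [(1, "y1"), (2, "y2"), (3, "y3")]
print(select_version(y_versions, (1, 1, 2), 2))  # (2, 'y2')
```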
Building the commit Vector Clock
• Based on a variant of Skeen's total order multicast algorithm [SKEEN85].
• Intuition:
  - Serialize all-and-only conflicting transactions, tracking:
    - direct and transitive conflict dependencies,
    - causal relationships.
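For readers unfamiliar with Skeen's algorithm, its timestamping step can be sketched as follows: every destination proposes a timestamp from its logical clock, the sender picks the maximum proposal as the final timestamp, and every destination advances its clock to at least that value. This is a simplified, sequential sketch of the classic scalar algorithm, not of the GMU variant, which works on vector clocks; it omits the pending queue, tie-breaking and failure handling, and the names `Node` and `multicast` are hypothetical.

```python
class Node:
    """Hypothetical destination holding a scalar logical clock."""

    def __init__(self):
        self.clock = 0

    def propose(self):
        # Propose a timestamp larger than anything seen so far on this node.
        self.clock += 1
        return self.clock

    def finalize(self, final_ts):
        # Never fall behind a final timestamp this node took part in.
        self.clock = max(self.clock, final_ts)


def multicast(destinations):
    """Return the final timestamp agreed for one message."""
    proposals = [n.propose() for n in destinations]
    final_ts = max(proposals)                 # the sender picks the maximum
    for n in destinations:
        n.finalize(final_ts)
    return final_ts


n1, n2, n3 = Node(), Node(), Node()
t_a = multicast([n1, n2])   # involves n1 and n2
t_b = multicast([n2, n3])   # shares n2 with the first message
t_c = multicast([n3])       # shares n3 with the second message
print(t_a, t_b, t_c)        # 1 2 3
```

Only messages that share a destination are forced into an order (here through n2 and then n3), and n1 never hears about the later two, which matches the "all-and-only conflicting transactions" intuition on the slide.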
Consistency Criterion
• GMU ensures Extended Update Serializability:
  - Update Serializability (US) [ICDT86] ensures:
    - 1-Copy Serializability (1CS) on the history restricted to committed update transactions;
    - 1CS on the history restricted to committed update transactions and any single read-only transaction.
  - But it can admit non-1CS histories containing at least 2 read-only transactions.
  - Extended Update Serializability [Adya99]:
    - ensures the US property also for executing transactions;
    - analogous to opacity in STMs.
Experiments on private cluster
• 8-core physical nodes
• TPC-C
  - 90% read-only transactions
  - 10% update transactions
  - 4 threads per node
  - moderate contention (15% abort rate at 20 nodes)
Thanks for the attention
References
[Adya99] A. Adya. "Weak consistency: A generalized theory and optimistic implementations for distributed transactions". Tech. rep., PhD thesis, Massachusetts Institute of Technology, 1999.
[ICDCS12] Sebastiano Peluso, Pedro Ruivo, Paolo Romano, Francesco Quaglia, Luís Rodrigues. "When Scalability Meets Consistency: Genuine Multiversion Update-Serializable Partial Replication". The IEEE 32nd International Conference on Distributed Computing Systems, June 2012.
[ICDT86] R. C. Hansdah and L. M. Patnaik. "Update serializability in locking". International Conference on Database Theory, vol. 243 of Lecture Notes in Computer Science, pp. 171–185, Springer Berlin / Heidelberg, 1986.
[SKEEN85] D. Skeen. "Unpublished communication", 1985. Referenced in K. Birman, T. Joseph. "Reliable Communication in the Presence of Failures". ACM Trans. on Computer Systems, pp. 47–76, 1987.
[SRDS10] Nicolas Schiper, Pierre Sutra, Fernando Pedone. "P-Store: Genuine Partial Replication in Wide Area Networks". Proc. of the 29th Symposium on Reliable Distributed Systems, 2010.