Sebastiano Peluso, Pedro Ruivo, Paolo Romano, Francesco Quaglia and Luís Rodrigues When Scalability Meets Consistency: Genuine Multiversion Update-Serializable Partial Data Replication 1st Euro-TM Workshop on Distributed Transactional Memory (WDTM 2012), Lisbon, Portugal
Talk Structure • Motivation and related work • The GMU protocol • Experimental results
Motivation and related work
Distributed STMs • STMs are being employed in new scenarios: • Database caches in three-tier web apps (FénixEDU) • HPC programming languages (X10) • In-memory cloud data grids (Coherence, Infinispan) • New challenges: • Scalability • Fault-tolerance ⇒ REPLICATION
Full Replication • All sites store the whole set of data • Full replication in transactional systems is a well-studied problem: • Several solutions in the DBMS world: • Update anywhere-anytime-anyway solutions [SIGMOD96] • Deferred-update replication techniques [JDPD03, VLDB00] • Lazy techniques that relax consistency properties [SOSP07] • Specific solutions for DSTMs: • Efficient coding of the read-set [PRDC09] • Communication/computation overlapping [NCA10] • Lease-based commits [Middleware10]
Partial Replication • It is a way to increase scalability. • Each site stores a partial copy of the data. • Genuine partial replication schemes maximize scalability by ensuring that: • Only the sites that replicate data items read or written by a transaction T exchange messages for executing/committing T. • Existing 1-Copy Serializable implementations enforce distributed validation of read-only transactions [SRDS10]: • considerable overheads in typical workloads
Objectives • Objectives • Partially replicated DSTM • Scalability and performance as first class targets • Find a sweet spot in the consistency/performance tradeoff • Requirements • Read-only transactions never abort or block • Genuine certification mechanism
Issues with Partial Replication • Extending existing local multiversion (MV) STMs is not enough • Local MV STMs rely on a single global counter to track version advancement • Problem: • Commit of transactions would involve ALL NODES • NO GENUINENESS = POOR SCALABILITY
GMU: Genuine Multiversion Update-serializable replication [ICDCS12]
Key concepts of GMU • In the execution/commit phase of a transaction T, ONLY the nodes that store data items accessed by T are involved • It uses multiple versions for each data item • It builds visible snapshots, i.e. the freshest consistent snapshots, taking into account: • causal dependencies on transactions committed before the transaction began, • previous reads executed by the same transaction • Vector clocks are used to establish visible snapshots
Main data structures (i) • For each node N: • VCLog: sequence of vector clocks of “recently” committed transactions on N • PrepareVC: vector clock greater than or equal to the most recent vector clock in VCLog
Main data structures (ii) • For each transaction T: • VC: a vector clock that is • initialized with the most recent vector clock in the local VCLog, • updated • upon reads during execution >> to ensure that T observes the most recent serializable snapshot, • at commit time >> to assign the final vector clock to the transaction (and to its write-set).
Main data structures (iii) • A chain of versions per data item id [Figure: the version chain of a data item; when a transaction T commits on node i, the new version is tagged with entry i of T’s vector clock (entries 0, 1, …, i, …, n-2, n-1)]
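The per-item version chain can be sketched as a simple linked structure; the class and field names below are illustrative, not taken from the paper’s pseudocode:

```python
# Illustrative sketch of GMU's per-item version chain; names are
# hypothetical, not from the paper.
class Version:
    def __init__(self, value, vn, prev=None):
        self.value = value  # the written value
        self.vn = vn        # committing transaction's VC entry for this node
        self.prev = prev    # link to the next older version


class Item:
    """A data item keeping its versions in a chain, newest first."""

    def __init__(self):
        self.head = None

    def add_version(self, value, vn):
        # Prepend: the head is always the most recent committed version.
        self.head = Version(value, vn, self.head)
```

Reads walk the chain from the head towards older versions, which is what makes the visibility rules below cheap to evaluate.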
T reads id on node i: Rule 1 • Informally: it avoids reading remotely “too old” versions • Formally: if it is the first read of T on i, • wait until VCLog.mostRecVC[i] >= T.VC[i] • this ensures that causal dependencies are enforced
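The wait condition of Rule 1 can be sketched as a single predicate, assuming the node exposes the most recent vector clock in its VCLog (the function name is hypothetical):

```python
# Sketch of Rule 1 (names hypothetical): the first read of T on node i
# may proceed only once node i's log has advanced at least to T's view
# of node i, i.e. all of T's causal dependencies are applied locally.
def rule1_can_read(most_recent_vc, t_vc, i):
    return most_recent_vc[i] >= t_vc[i]
```

In the actual protocol the read blocks until this predicate becomes true, rather than failing.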
Rule 1 in action [Timeline diagram, three nodes: Node 0; Node 1, storing X; Node 2, storing Y; all start with (1,1,1) as the most recent VC in their VCLogs. T0 writes X and Y and commits, creating versions X(2) and Y(2) with commit VC (1,2,2). T1 reads X(2) on Node 1 (T1.VC = (1,2,2)); its first read on Node 2 then waits until Node 2’s VCLog reaches (1,2,2), so T1 observes Y(2).]
T reads id on node i: Rule 2 • Informally: it maximizes freshness by moving T’s VC ahead in time “as much as possible” in the commit log • Formally: • if it is the first read of T on i, select the most recent VC in i’s commit log s.t. VC[j] <= T.VC[j] for each node j on which T has already read, then set T.VC = MAX{VC, T.VC} • Note: this updates only the entries of T.VC for the nodes from which T has not yet read
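Rule 2 can be sketched as follows, under a simplified model in which the commit log is just a list of vector clocks, oldest first (all names are hypothetical):

```python
# Sketch of Rule 2 (simplified; names hypothetical). commit_log is
# node i's list of committed vector clocks, oldest first; already_read
# is the set of node indices T has already read from.
def rule2_update_vc(commit_log, t_vc, already_read):
    for vc in reversed(commit_log):
        # Most recent logged VC compatible with T's previous reads.
        if all(vc[j] <= t_vc[j] for j in already_read):
            # Advance T.VC entry-wise; only entries for nodes T has
            # not yet read from can actually grow.
            return [max(a, b) for a, b in zip(t_vc, vc)]
    return t_vc
```

For example, if T has read on node 1 and holds T.VC = [1, 20, 1], and the contacted node's log is [[1,1,1], [1,1,11], [1,21,21]], then [1,21,21] is skipped (21 > 20 at entry 1) and T.VC advances to [1, 20, 11].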
Rule 2 in action [Timeline diagram, three nodes: Node 0; Node 1, storing X with version X(20); Node 2, storing Y with versions Y(1) and, later, Y(11) committed with VC (1,1,11). T0 reads X(20) on Node 1 (T0.VC = (1,20,1)). Concurrently T1 writes X and Y and commits, creating X(21) and Y(21) with commit VC (1,21,21). When T0 then reads Y on Node 2, Rule 2 selects the most recent logged VC compatible with T0’s previous read on Node 1, so T0 observes Y(11) (T0.VC = (1,20,11)) rather than Y(21), and then commits.]
T reads id on node i: Rule 3 • Informally: observe the most recent consistent version of id based on T’s history (previous reads) • Formally: iterate over the versions of id and return the most recent one s.t. id.version.VN <= T.VC[i]
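Rule 3 is a straightforward scan of the version chain; a minimal sketch, representing the chain as a list of (value, version number) pairs, newest first (names hypothetical):

```python
# Sketch of Rule 3 (names hypothetical): return the most recent
# version of the item whose version number does not exceed T.VC[i].
def rule3_read(versions, t_vc, i):
    for value, vn in versions:  # newest first
        if vn <= t_vc[i]:
            return value
    raise LookupError("no version visible in T's snapshot")
```

Continuing the Rule 2 example: with T0.VC = [1, 20, 11] on node 2, the chain [("w", 21), ("v", 11)] yields "v", i.e. Y(11) rather than Y(21).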
Committing read-only transactions • Read-only transactions commit locally: • No additional validations • No possibility of aborts • … and are never blocked, as in typical multiversion schemes.
Committing update transactions • Run 2PC: • Upon prepare message reception (participant-side, node i): • Acquire read & write locks • Validate the read-set • Increment PrepareVC[i] and send PrepareVC back • If all replies are positive (coordinator-side): • Build a commit vector clock • Broadcast back the commit message
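The two sides of the commit can be sketched as below. This is a deliberate simplification: lock acquisition and read-set validation are abstracted into booleans, and the coordinator step models the agreement as an entry-wise maximum (all names are hypothetical):

```python
# Simplified sketch of GMU's 2PC commit (names hypothetical).
def on_prepare(prepare_vc, i, locks_acquired, readset_valid):
    """Participant on node i: vote abort (None) or propose a prepare VC."""
    if not (locks_acquired and readset_valid):
        return None  # vote abort
    proposal = list(prepare_vc)
    proposal[i] += 1  # advance this node's entry of PrepareVC
    return proposal


def build_commit_vc(proposals):
    """Coordinator: combine positive votes entry-wise (a simplification
    of the Skeen-style agreement used to build the commit VC)."""
    return [max(entries) for entries in zip(*proposals)]
```

The commit VC is then broadcast back, and each participant applies the write-set versions tagged with its own entry of that clock.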
Building the commit Vector Clock • A variant of Skeen’s algorithm is implemented [SKEEN85]. • This keeps track of the causal dependencies developed by: • a transaction T during its execution, • the most recent committed transactions at the nodes contacted by T
Consistency criterion • GMU ensures Extended Update Serializability: • Update Serializability (US) ensures: • 1-Copy Serializability (1CS) on the history restricted to committed update transactions • 1CS on the history restricted to committed update transactions and any single read-only transaction: • but it can admit non-1CS histories containing at least 2 read-only transactions • Extended Update Serializability: • ensures the US property also for executing transactions • analogous to opacity in STMs
Experimental Results
Experiments on private cluster • 8-core physical nodes • TPC-C • - 90% read-only xacts • - 10% update xacts • - 4 threads per node • - moderate contention (15% abort rate at 20 nodes)
FutureGrid Experiments • All nodes are 2-core VMs deployed in the same site • TPC-C • - 90% read-only xacts • - 10% update xacts • - 1 thread per node • - low/moderate contention, also at 40 nodes
Thanks for your attention
References [ICDCS12] Sebastiano Peluso, Pedro Ruivo, Paolo Romano, Francesco Quaglia, Luís Rodrigues. “When Scalability Meets Consistency: Genuine Multiversion Update-Serializable Partial Replication”. The IEEE 32nd International Conference on Distributed Computing Systems, June 2012. [JDPD03] Fernando Pedone, Rachid Guerraoui, André Schiper. “The Database State Machine Approach”. Journal of Distributed and Parallel Databases, vol. 14, issue 1, 71-98, July 2003. [Middleware10] Nuno Carvalho, Paolo Romano, Luís Rodrigues. “Asynchronous lease-based replication of software transactional memory”. Proc. of the 11th ACM/IFIP/USENIX International Conference on Middleware, 376-396, 2010. [NCA10] Roberto Palmieri, Francesco Quaglia, Paolo Romano. “AGGRO: Boosting STM Replication via Aggressively Optimistic Transaction Processing”. Proc. of the 9th IEEE International Symposium on Networking Computing and Applications, 20-27, 2010. [PRDC09] Maria Couceiro, Paolo Romano, Nuno Carvalho, Luís Rodrigues. “D2STM: Dependable Distributed Software Transactional Memory”. Proc. of the 15th IEEE Pacific Rim International Symposium on Dependable Computing, 307-313, 2009. [SIGMOD96] Jim Gray, Pat Helland, Patrick O’Neil, Dennis Shasha. “The dangers of replication and solutions”. Proc. of the 1996 ACM SIGMOD International Conference on Management of Data, vol. 25, issue 2, 173-182, June 1996. [SKEEN85] D. Skeen. “Unpublished communication”, 1985. Referenced in K. Birman, T. Joseph. “Reliable Communication in the Presence of Failures”, ACM Trans. on Computer Systems, 47-76, 1987. [SOSP07] G. DeCandia et al. “Dynamo: Amazon’s Highly Available Key-value Store”. Proc. of the 21st ACM SIGOPS Symposium on Operating Systems Principles, 2007. [SRDS10] Nicolas Schiper, Pierre Sutra, Fernando Pedone. “P-Store: Genuine Partial Replication in Wide Area Networks”. Proc. of the 29th Symposium on Reliable Distributed Systems, 2010. [VLDB00] Bettina Kemme, Gustavo Alonso. “Don’t Be Lazy, Be Consistent: Postgres-R, A New Way to Implement Database Replication”. Proc. of the 26th International Conference on Very Large Data Bases, 134-143, 2000.