
When Scalability Meets Consistency: Genuine Multiversion Update-Serializable Partial Data Replication

Sebastiano Peluso, Pedro Ruivo, Paolo Romano, Francesco Quaglia and Luís Rodrigues


Presentation Transcript


  1. Sebastiano Peluso, Pedro Ruivo, Paolo Romano, Francesco Quaglia and Luís Rodrigues When Scalability Meets Consistency: Genuine Multiversion Update-Serializable Partial Data Replication 1st Euro-TM Workshop on Distributed Transactional Memory (WDTM 2012), Lisbon, Portugal

  2. Talk Structure • Motivation and related work • The GMU protocol • Experimental results

  3. Motivation and related work

  4. Distributed STMs • STMs are being employed in new scenarios: • Database caches in three-tier web apps (FénixEDU) • HPC programming language (X10) • In-memory cloud data grids (Coherence, Infinispan) • New challenges: • Scalability • Fault-tolerance ⇒ REPLICATION

  5. Full Replication • All sites store the whole set of data • Full replication in transactional systems is a well-investigated problem: • Several solutions in the DBMS world: • Update anywhere-anytime-anyway solutions [SIGMOD96] • Deferred-update replication techniques [JDPD03, VLDB00] • Lazy techniques that relax consistency properties [SOSP07] • Specific solutions for DSTMs: • Efficient coding of the read-set [PRDC09] • Communication/computation overlapping [NCA10] • Lease-based commits [Middleware10]

  6. Partial Replication • A way to increase scalability: each site stores a partial copy of the data. • Genuine partial replication schemes maximize scalability by ensuring that: • only the sites that replicate data items read or written by a transaction T exchange messages for executing/committing T. • Existing 1-Copy Serializable implementations enforce distributed validation of read-only transactions [SRDS10]: • considerable overheads in typical workloads

  7. Objectives • Partially replicated DSTM • Scalability and performance as first-class targets • Find a sweet spot in the consistency/performance tradeoff • Requirements: • Read-only transactions never abort or block • Genuine certification mechanism

  8. Issues with Partial Replication • Extending existing local multiversion (MV) STMs is not enough • Local MV STMs rely on a single global counter to track version advancement • Problem: • Commit of transactions would have to involve ALL NODES ⇒ NO GENUINENESS = POOR SCALABILITY

  9. GMU: Genuine Multiversion Update-Serializable replication [ICDCS12]

  10. Key concepts of GMU • In the execution/commit phase of a transaction T, ONLY the nodes that store data items accessed by T are involved. • It keeps multiple versions of each data item. • It builds visible snapshots, i.e. the freshest consistent snapshots, taking into account: • causal dependencies on transactions committed before the transaction began, • previous reads executed by the same transaction. • Vector clocks are used to establish visible snapshots.

  11. Main data structures (i) • For each node N: • VCLog: sequence of vector clocks of “recently” committed transactions on N • PrepareVC: vector clock greater than or equal to the most recent vector clock in VCLog
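The following Java sketch illustrates one plausible encoding of this per-node state; the names VCLog and PrepareVC come from the slide, while the concrete layout (a deque of int[] vector clocks, newest first) and the NodeState class itself are assumptions made for illustration.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical per-node state, following slide 11's names.
class NodeState {
    // VCLog: vector clocks of "recently" committed transactions, newest first.
    final Deque<int[]> vcLog = new ArrayDeque<>();
    // PrepareVC: kept entry-wise >= the most recent vector clock in vcLog.
    int[] prepareVC;

    NodeState(int numNodes) {
        prepareVC = new int[numNodes];
        vcLog.addFirst(new int[numNodes]); // genesis all-zero vector clock
    }

    int[] mostRecentVC() { return vcLog.peekFirst(); }
}
```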

  12. Main data structures (ii) • For each transaction T: • VC: a vector clock that is • initialized with the most recent vector clock in the local VCLog, • updated • upon reads during execution >> to ensure that T observes the most recent serializable snapshot, • at commit time >> to assign the final vector clock to the transaction (and to its write-set).
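Continuing the sketch above, the per-transaction state might look as follows; the hasReadFrom bookkeeping is an assumption, added because Rule 2 (slide 16) needs to know on which nodes T has already read.

```java
// Hypothetical per-transaction state: T.VC plus read bookkeeping.
class TxContext {
    final int[] vc;              // T.VC: updated on reads and at commit
    final boolean[] hasReadFrom; // hasReadFrom[j]: T already read on node j

    TxContext(NodeState localNode, int numNodes) {
        // Initialized with the most recent vector clock in the local VCLog.
        this.vc = localNode.mostRecentVC().clone();
        this.hasReadFrom = new boolean[numNodes];
    }
}
```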

  13. Main data structures (iii) • A chain of versions per data item id [diagram: when a transaction T commits on node i, the new version of id is tagged with T's vector clock, whose entries span nodes 0 … n-1]
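A version chain can be sketched as a singly linked list; the field names are hypothetical, and VN is the version number that Rule 3 (slide 18) compares against T.VC[i].

```java
// Hypothetical per-item version chain, newest version at the head.
class Version {
    final Object value;  // the value written by the committing transaction
    final int vn;        // id.version.VN, taken from the creator's commit VC
    final Version prev;  // next-older version; null at the tail of the chain

    Version(Object value, int vn, Version prev) {
        this.value = value;
        this.vn = vn;
        this.prev = prev;
    }
}
```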

  14. T reads id on node i: Rule 1 • Informally: it avoids reading remotely “too old” versions • Formally: if it is the first read of T on i • wait until node i's VCLog.mostRecVC[i] >= T.VC[i] • this ensures that causal dependencies are enforced
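Rule 1 can be sketched as a simple wait on node i's commit log; the monitor-based blocking is an assumption (committers would notifyAll() after appending to VCLog).

```java
// Rule 1 sketch: on T's first read at node i, wait until node i's log
// has advanced to T.VC[i], so causal dependencies are enforced.
static void rule1Wait(NodeState nodeI, int i, TxContext tx)
        throws InterruptedException {
    synchronized (nodeI) {
        while (nodeI.mostRecentVC()[i] < tx.vc[i]) {
            nodeI.wait(); // woken when a fresher VC is appended on node i
        }
    }
}
```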

  15. Rule 1 in action [diagram: three nodes whose VCLogs all start at (1,1,1); Node 1 stores X, Node 2 stores Y. T0 writes X and Y and commits, creating versions X(2) and Y(2) and advancing the logs to (1,2,2). T1 reads X(2) on Node 1, setting T1.VC = (1,2,2); its subsequent read on Node 2 then returns Y(2), never a stale version, because Node 2's log has reached (1,2,2).]

  16. T reads id on node i: Rule 2 • Informally: it maximizes freshness by moving T's VC ahead in time “as much as possible” in the commit log • Formally: if it is the first read of T on i, select the most recent VC in i's commit log s.t. VC[j] <= T.VC[j] for each node j on which T has already read, then set T.VC = MAX{VC, T.VC} • Note: this updates only the entries of T.VC for the nodes from which T had not read yet
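A sketch of Rule 2, using the structures above: scan the log from newest to oldest, take the first vector clock compatible with the nodes T has already read from, and merge it into T.VC; the entry-wise max leaves already-read entries unchanged, matching the slide's note.

```java
// Rule 2 sketch: advance T.VC "as much as possible" on T's first read at node i.
static void rule2Advance(NodeState nodeI, TxContext tx) {
    for (int[] logVC : nodeI.vcLog) { // iterates newest first
        boolean compatible = true;
        for (int j = 0; j < tx.vc.length && compatible; j++) {
            // VC[j] <= T.VC[j] must hold for every node j already read by T.
            if (tx.hasReadFrom[j] && logVC[j] > tx.vc[j]) compatible = false;
        }
        if (compatible) {
            for (int j = 0; j < tx.vc.length; j++) {
                tx.vc[j] = Math.max(tx.vc[j], logVC[j]); // T.VC = MAX{VC, T.VC}
            }
            return;
        }
    }
}
```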

  17. Rule 2 in action [diagram: Node 1 stores X (latest version X(20), log at (1,20,1)); Node 2 stores Y (version Y(1), log at (1,1,1)). T0 first reads X(20) on Node 1, setting T0.VC = (1,20,1). Concurrently Y(11) is committed on Node 2 (log (1,1,11)), and T1 writes X and Y, committing X(21) and Y(21) with vector clock (1,21,21). On T0's first read on Node 2, Rule 2 selects the most recent logged VC compatible with T0's earlier read, i.e. (1,1,11) rather than (1,21,21); T0.VC becomes (1,20,11), T0 reads Y(11) and commits.]

  18. T reads id on node i: Rule 3 • Informally: observe the most recent consistent version of id based on T's history (previous reads) • Formally: iterate over the versions of id and return the most recent one s.t. id.version.VN <= T.VC[i]
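Rule 3 then reduces to a walk down the version chain sketched at slide 13:

```java
// Rule 3 sketch: return the most recent version visible under T.VC[i].
static Version rule3Read(Version newest, int i, TxContext tx) {
    for (Version v = newest; v != null; v = v.prev) {
        if (v.vn <= tx.vc[i]) return v; // id.version.VN <= T.VC[i]
    }
    return null; // unreachable if every item has a genesis version with VN 0
}
```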

  19. Committing read-only transactions • Read-only transactions commit locally: • No additional validations • No possibility of aborts • … and they are never blocked, as in typical multiversion schemes.

  20. Committing update transactions • Run 2PC: • Upon prepare message reception (participant side, node i): • Acquire read & write locks • Validate read-set • Increase PrepareVC[i] and send PrepareVC back • If all replies are positive (coordinator side): • Build a commit vector clock • Broadcast back the commit message
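The participant side of the prepare phase might be sketched as below; lock acquisition and read-set validation (the slide's first two steps) are elided, and the method shape is an assumption.

```java
// Prepare handler sketch for participant node i.
static int[] onPrepare(NodeState nodeI, int i /*, read/write sets */) {
    synchronized (nodeI) {
        // 1. acquire read & write locks  (omitted)
        // 2. validate the read-set       (omitted; vote abort on failure)
        nodeI.prepareVC[i]++;           // propose the next commit number for i
        return nodeI.prepareVC.clone(); // PrepareVC sent back to the coordinator
    }
}
```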

  21. Building the commit Vector Clock • A variant of Skeen's algorithm is used [SKEEN85]. • This keeps track of the causal dependencies developed by: • a transaction T during its execution, • the most recently committed transactions at the nodes contacted by T
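One plausible reading of this step, as a sketch: the coordinator combines the participants' PrepareVC proposals entry-wise by maximum, so the commit vector clock dominates both T's observed causal dependencies and the recent commits at the contacted nodes.

```java
import java.util.List;

// Commit-VC construction sketch (Skeen-style: final timestamp = max of proposals).
static int[] buildCommitVC(List<int[]> prepareReplies) {
    int[] commitVC = prepareReplies.get(0).clone();
    for (int[] reply : prepareReplies) {
        for (int j = 0; j < commitVC.length; j++) {
            commitVC[j] = Math.max(commitVC[j], reply[j]);
        }
    }
    return commitVC;
}
```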

  22. Consistency criterion • GMU ensures Extended Update Serializability: • Update Serializability (US) ensures: • 1-Copy Serializability (1CS) on the history restricted to committed update transactions • 1CS on the history restricted to committed update transactions and any single read-only transaction: • but it can admit non-1CS histories containing at least 2 read-only transactions • Extended Update Serializability: • extends the US guarantees to executing transactions as well • analogous to opacity in STMs

  23. Experimental Results

  24.–25. Experiments on private cluster • 8-core physical nodes • TPC-C • 90% read-only xacts • 10% update xacts • 4 threads per node • moderate contention (15% abort rate at 20 nodes) [the performance plots shown on these two slides are not reproduced in the transcript]

  26. FutureGrid Experiments • All nodes are 2-core VMs deployed in the same site • TPC-C • 90% read-only xacts • 10% update xacts • 1 thread per node • low/moderate contention, also at 40 nodes

  27. Thanks for your attention

  28. References
[ICDCS12] Sebastiano Peluso, Pedro Ruivo, Paolo Romano, Francesco Quaglia, Luís Rodrigues. “When Scalability Meets Consistency: Genuine Multiversion Update-Serializable Partial Replication”. Proc. of the 32nd IEEE International Conference on Distributed Computing Systems, June 2012.
[JDPD03] Fernando Pedone, Rachid Guerraoui, André Schiper. “The Database State Machine Approach”. Journal of Distributed and Parallel Databases, vol. 14, issue 1, 71-98, July 2003.
[Middleware10] Nuno Carvalho, Paolo Romano, Luís Rodrigues. “Asynchronous lease-based replication of software transactional memory”. Proc. of the 11th ACM/IFIP/USENIX International Conference on Middleware, 376-396, 2010.
[NCA10] Roberto Palmieri, Francesco Quaglia, Paolo Romano. “AGGRO: Boosting STM Replication via Aggressively Optimistic Transaction Processing”. Proc. of the 9th IEEE International Symposium on Network Computing and Applications, 20-27, 2010.
[PRDC09] Maria Couceiro, Paolo Romano, Nuno Carvalho, Luís Rodrigues. “D2STM: Dependable Distributed Software Transactional Memory”. Proc. of the 15th IEEE Pacific Rim International Symposium on Dependable Computing, 307-313, 2009.
[SIGMOD96] Jim Gray, Pat Helland, Patrick O’Neil, Dennis Shasha. “The Dangers of Replication and a Solution”. Proc. of the 1996 ACM SIGMOD International Conference on Management of Data, vol. 25, issue 2, 173-182, June 1996.
[SKEEN85] D. Skeen. Unpublished communication, 1985. Referenced in K. Birman, T. Joseph. “Reliable Communication in the Presence of Failures”. ACM Transactions on Computer Systems, 47-76, 1987.
[SOSP07] G. DeCandia et al. “Dynamo: Amazon’s Highly Available Key-value Store”. Proc. of the 21st ACM SIGOPS Symposium on Operating Systems Principles, 2007.
[SRDS10] Nicolas Schiper, Pierre Sutra, Fernando Pedone. “P-Store: Genuine Partial Replication in Wide Area Networks”. Proc. of the 29th IEEE Symposium on Reliable Distributed Systems, 2010.
[VLDB00] Bettina Kemme, Gustavo Alonso. “Don’t Be Lazy, Be Consistent: Postgres-R, A New Way to Implement Database Replication”. Proc. of the 26th International Conference on Very Large Data Bases, 134-143, 2000.
