Telex/IceCube: a distributed memory with tunable consistency
Marc Shapiro, Pierre Sutra, Pierpaolo Cincilla (INRIA & LIP6, Regal group)
Nuno Preguiça (Universidade Nova de Lisboa)
Telex/IceCube
• It is:
  • A distributed shared memory
  • Object-based (high-level operations)
  • Transactional (ACID… for some definition of ID)
  • Persistent
• Top level design goal: availability
  • ∞-optimistic ⇒ (reconcile ⇒ cascading aborts)
  • Minimize aborts ⇒ non-sequential schedules
  • (Only) consistent enough to enforce invariants
  • Partial replication
Distributed TMs — 22-Feb-2012
Scheduling (∞-optimistic execution)
• Sequential execution
  • In (semantic) dependence order
  • Dynamic checks for preconditions: if OK, application approves schedule
• Conflict ⇒ fork
  • Latest checkpoint + replay
  • Independent actions not rolled back
• Semantics:
  • Conflict = non-commuting, antagonistic
  • Independent
Telex lifecycle: Application operations
• User requests
  > application: actions, dependence
  > Telex: add to ACG, transmit
[Diagram labels: appt; Shared Calendar Application; +action(appt); Telex]
Telex lifecycle: Schedule
• ACG
  > Telex: valid path = sound schedule
  > application: tentative execute
  > application: if OK, approve
[Diagram labels: Shared Calendar Application; sched; approve; Telex]
Telex lifecycle: Conflict ⇒ multiple schedules!
• ACG
  > Telex: valid path = sound schedule
  > application: tentative execute
  > application: if OK, approve
[Diagram labels: Shared Calendar Application; sched1; sched2; approve2; Telex]
Constraints (Semantics-based conflict detection)
• Action: reified operation
• Constraint: scheduling invariant
• Binary relations:
  • NotAfter
  • Enables (implication)
  • NonCommuting
• Combinations:
  • Antagonistic
  • Atomic
  • Causal dependence
• Action-constraint graph (ACG)
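A minimal Python sketch of the action-constraint graph described on this slide. It is not the Telex API. The class and method names, the action names, and the exact encoding of the three combinations (Antagonistic as NotAfter in both directions, Atomic as Enables in both directions, Causal dependence as NotAfter plus Enables) are assumptions made for illustration. NonCommuting is only recorded here, since the slides resolve it with dynamic checks rather than with a static rule.

```python
from dataclasses import dataclass, field


@dataclass
class ACG:
    """Toy action-constraint graph: actions are nodes, constraints are edges."""
    actions: set = field(default_factory=set)
    not_after: set = field(default_factory=set)      # (a, b): a never runs after b
    enables: set = field(default_factory=set)        # (a, b): b is scheduled only if a is
    non_commuting: set = field(default_factory=set)  # (a, b): order changes the result

    def antagonistic(self, a, b):
        # Assumed encoding: NotAfter both ways, so a and b exclude each other.
        self.actions |= {a, b}
        self.not_after |= {(a, b), (b, a)}

    def atomic(self, a, b):
        # Assumed encoding: Enables both ways, so a and b run together or not at all.
        self.actions |= {a, b}
        self.enables |= {(a, b), (b, a)}

    def causal(self, a, b):
        # Assumed encoding: b depends on a, so a runs first and b requires a.
        self.actions |= {a, b}
        self.not_after.add((a, b))
        self.enables.add((a, b))

    def is_sound(self, schedule):
        """Check a sequence of actions against NotAfter and Enables."""
        pos = {a: i for i, a in enumerate(schedule)}
        ordered = all(pos[a] < pos[b] for a, b in self.not_after
                      if a in pos and b in pos)
        implied = all(a in pos for a, b in self.enables if b in pos)
        return ordered and implied


# Hypothetical calendar actions, for illustration only.
g = ACG()
g.causal("create_meeting", "invite_lamia")
g.antagonistic("book_tuesday", "book_friday")
```

With this encoding, a schedule that contains both `book_tuesday` and `book_friday` is never sound, which is what forces the fork described on the scheduling slides.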
Single site scheduling
• Sound schedule:
  • Path in the ACG that satisfies constraints
• Fork:
  • Antagonism
  • NonCommuting + Dynamic checks
• Several possible schedules
  • Penalise lost work
  • Optimal schedule: NP-hard
  • IceCube heuristics
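To make "penalise lost work" concrete, the sketch below searches exhaustively for the sound schedules that drop the fewest actions. This is a brute-force illustration of the problem, not the IceCube heuristics: it is exponential in the number of actions, which is why a heuristic is needed in practice. The function names and the reading of NotAfter and Enables are assumptions.

```python
from itertools import permutations


def is_sound(schedule, not_after, enables):
    """NotAfter (a, b): a before b if both run. Enables (a, b): b requires a."""
    pos = {a: i for i, a in enumerate(schedule)}
    ordered = all(pos[a] < pos[b] for a, b in not_after if a in pos and b in pos)
    implied = all(a in pos for a, b in enables if b in pos)
    return ordered and implied


def best_schedules(actions, not_after, enables):
    """Return every sound schedule of maximal length (least lost work)."""
    for size in range(len(actions), -1, -1):
        found = [list(p) for p in permutations(sorted(actions), size)
                 if is_sound(p, not_after, enables)]
        if found:
            # The empty schedule is always sound, so the loop always returns.
            return found


# a and b are antagonistic; c causally depends on a.
result = best_schedules(
    {"a", "b", "c"},
    not_after={("a", "b"), ("b", "a"), ("a", "c")},
    enables={("a", "c")},
)
```

In this example the antagonism forces one of `a` or `b` to be dropped. Dropping `a` would also lose `c`, so the search keeps `a` and `c`.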
Minimizing aborts (IceCube heuristics)
• Iteratively pick a sound schedule
• Application executes, checks, approves
  • Check invariants
  • Reject if violation
  • Request alternative schedule
  • Restart from previous
• Approved schedule
  • Preferred for future
  • Proposed to agreement
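The propose, execute, check, approve loop on this slide can be sketched as follows. The function names and the bank-account example are hypothetical. Candidate schedules are assumed to arrive already ordered by preference, and each tentative execution is assumed to restart from the same checkpointed state.

```python
def approve_schedule(candidates, execute, invariant_holds):
    """Try sound schedules in turn; return the first one the application approves."""
    for schedule in candidates:
        state = execute(schedule)        # tentative execution from the checkpoint
        if invariant_holds(state):
            return schedule, state       # approved: proposed to agreement
        # Rejected: loop on to request an alternative schedule.
    return None, None


def run_account(schedule, balance=100):
    """Hypothetical application: apply signed amounts, never overdraw."""
    for amount in schedule:
        balance += amount
        if balance < 0:
            return None                  # dynamic precondition check failed
    return balance


candidates = [[-70, -50, 30], [-70, 30, -50]]
approved, final_state = approve_schedule(
    candidates, run_account, lambda state: state is not None)
```

The first candidate overdraws the account after its second action and is rejected. The second candidate contains the same actions in a different order and is approved.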
Telex lifecycle: Receive remote operations
• User requests
  > application: actions, dependence
  > Telex: add to ACG, transmit, merge
[Diagram labels: Shared Calendar Application; getCstrt; +cstrt(antg); Telex; ACG]
Eventual consistency
• Common stable prefix
• Diverge beyond
• Combine approved schedules
  • ⇒ Consensus on next extension of prefix
• Equivalence, not equality
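A small sketch of the two notions on this slide, under stated assumptions: the common stable prefix is taken to be the longest prefix shared by all sites' schedules, and two schedules are taken to be equivalent when they contain the same actions and order every non-commuting pair the same way. Both definitions and all names are illustrative, not taken from the Telex code.

```python
def common_prefix(schedules):
    """Longest prefix on which all sites' schedules agree."""
    prefix = []
    for step in zip(*schedules):
        if any(action != step[0] for action in step):
            break
        prefix.append(step[0])
    return prefix


def equivalent(s1, s2, non_commuting):
    """Same actions, and the same relative order for each non-commuting pair."""
    if len(s1) != len(s2) or set(s1) != set(s2):
        return False
    p1 = {a: i for i, a in enumerate(s1)}
    p2 = {a: i for i, a in enumerate(s2)}
    return all((p1[a] < p1[b]) == (p2[a] < p2[b])
               for a, b in non_commuting if a in p1 and b in p1)


site_a = ["a", "b", "c", "d"]
site_b = ["a", "b", "d", "c"]
stable = common_prefix([site_a, site_b])
```

Here the two sites agree on the prefix `a, b` and diverge beyond it. If `c` and `d` commute, the two full schedules are still equivalent, so the sites need not agree on a single order for them.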
Telex lifecycle: Convergence protocol
• approved schedules
  > Telex: exchange, agree
  > commit/abort, serialise
[Diagram labels: Shared Calendar Application; Telex; agreement]
FGGC
• Special-case commutative commands ⇒ collision recovery
• Improvements higher on WAN
[Chart comparing Paxos, Fast Paxos, Generalised Paxos and FGGC; annotation: WAN typical]
FGGC: Varying commutativity
• Each command reads or writes a randomly-chosen register; WAN
[Chart: Paxos vs FGGC, with 1 register and with 16384 registers]
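The link between the number of registers and commutativity can be sketched as follows. This models only the workload, not FGGC itself. The commutativity rule (two commands commute unless they touch the same register and at least one of them writes) and the equal read/write probability are assumptions.

```python
import random


def commute(c1, c2):
    """Commands are (op, register) pairs; op is "read" or "write"."""
    (op1, reg1), (op2, reg2) = c1, c2
    return reg1 != reg2 or (op1 == "read" and op2 == "read")


def conflict_rate(n_registers, n_pairs=10_000, seed=0):
    """Estimate the fraction of random command pairs that do not commute."""
    rng = random.Random(seed)

    def command():
        return (rng.choice(["read", "write"]), rng.randrange(n_registers))

    conflicts = sum(not commute(command(), command()) for _ in range(n_pairs))
    return conflicts / n_pairs


few = conflict_rate(1)        # every pair touches the same register
many = conflict_rate(16384)   # pairs rarely touch the same register
```

Under these assumptions a pair conflicts with probability 3/4 when there is a single register, and with probability 3/(4n) for n registers, so the workload becomes almost fully commutative as the register count grows.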
Example: Sakura Shared Calendar over Telex
• Private calendar + common meetings
• Example proposals:
  • M1: Marc & Lamia & JeanMi, Monday | Tuesday | Friday
  • M2: Lamia & Marc & Pierre, Tuesday | Wednesday | Friday
  • M3: JeanMi & Pierre & Marc, Monday | Tuesday | Thursday
  • Change M3 to Friday
• Telex:
  • (Local) Explore solutions, approve
  • (Global) Combine, Commit
[Diagram labels: Marc; Lamia; Tues.; Fri.; M1; M2; Run-time check]
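The local "explore solutions" step for these three proposals can be sketched as a small exhaustive search. The invariant used here (no participant attends two meetings on the same day) is an assumption, since the slide does not state it, and the function name is hypothetical.

```python
from itertools import product

# Participants and candidate days, taken from the proposals on the slide.
PROPOSALS = {
    "M1": ({"Marc", "Lamia", "JeanMi"}, ["Mon", "Tue", "Fri"]),
    "M2": ({"Lamia", "Marc", "Pierre"}, ["Tue", "Wed", "Fri"]),
    "M3": ({"JeanMi", "Pierre", "Marc"}, ["Mon", "Tue", "Thu"]),
}


def solutions(proposals):
    """All day assignments in which no participant is double-booked."""
    names = sorted(proposals)
    result = []
    for days in product(*(proposals[m][1] for m in names)):
        plan = dict(zip(names, days))
        clash = any(
            plan[a] == plan[b] and proposals[a][0] & proposals[b][0]
            for i, a in enumerate(names) for b in names[i + 1:]
        )
        if not clash:
            result.append(plan)
    return result


before = solutions(PROPOSALS)

# "Change M3 to Friday": M3 now has a single candidate day.
changed = dict(PROPOSALS, M3=(PROPOSALS["M3"][0], ["Fri"]))
after = solutions(changed)
```

Because Marc attends all three meetings, they must fall on three different days. Moving M3 to Friday therefore rules out Friday for M1 and M2 and shrinks the set of acceptable plans.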
Example ACG: calendar application
• Schedule: sound cut in the graph
Status
• Open source
  • 35,000 lines of (well-commented) Java code
  • Available at gforge.inria.fr, BSD license
• Applications:
  • Sakura co-operative calendar (INRIA)
  • Decentralised Collaborative Environment (UPC)
  • Database, STMBench7 (UOC)
  • Co-operative text editor (UNL, INRIA, UOC)
  • Co-operative ontology editor (U. Aegean)
  • Co-operative UML editor (LIP6)
Lessons learned
• Separation of concerns:
  • Application: business logic
  • Semantics: constraints, run-time checks
  • System: persistence, replication, consistency
• Optimistic: availability, WAN operation
• Adapts to application requirements
  • Constraints + run-time machine learning
• Modular ⇒ efficiency is an issue
• High latency, availability ⇒ reduce messages
• Commutative operations ⇒ CRDTs
• Partial replication is hard!
-----
Application design
• MVC-like execution loop + rollback
• Designing for replication is an effort:
  • Commute (no constraints) >> Antagonistic >> Non-Commuting
  • Weaken invariants
  • Make invariants explicit