Dangers of Replication Materials taken from “J. Gray, P. Helland, P. O’Neil, and D. Shasha. The Dangers of Replication and a Solution. SIGMOD, 1996.” http://research.microsoft.com/~gray/replicas.ps
What’s the danger? • Replication of transactional data results in unstable system performance • For consistent replication • Waits and deadlocks • For update-anywhere-anytime replication • Reconciliations • Both grow polynomially (with meaningful exponents) in the number of clients • Based on simple lower bounds derived from mean-value analysis
What’s the point? • This theme is predicated on the knowledge that globally consistent replication does not scale
Replication Policies • Eager replication: • All copies are updated as part of the original transaction. • Lazy replication: • One replica is updated by the original transaction. The other copies are updated asynchronously. • Update policy: • Group: any node can update its replica. • Master: only the master updates its replica. The remaining replicas are read-only.
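The difference between the eager and lazy policies can be illustrated with a toy in-memory model. This is only a sketch: the class names `Replica`, `EagerGroup`, and `LazyMaster` and the `propagate` method are hypothetical, and real systems would add locking, logging, and failure handling that are omitted here.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    """One node's copy of the data (hypothetical toy model)."""
    name: str
    data: dict = field(default_factory=dict)


class EagerGroup:
    """Eager, group ownership: every copy is updated inside the write itself."""

    def __init__(self, replicas):
        self.replicas = replicas

    def write(self, key, value):
        # The originating "transaction" touches all replicas before returning.
        for replica in self.replicas:
            replica.data[key] = value


class LazyMaster:
    """Lazy, master ownership: only the master is updated by the write."""

    def __init__(self, master, others):
        self.master = master
        self.others = others
        self.pending = []  # updates not yet shipped to the read-only copies

    def write(self, key, value):
        self.master.data[key] = value
        self.pending.append((key, value))

    def propagate(self):
        # Asynchronous step, modelled here as an explicit call.
        for key, value in self.pending:
            for replica in self.others:
                replica.data[key] = value
        self.pending.clear()
```

With `LazyMaster`, a read at a non-master replica between `write` and `propagate` sees the old value, which is the staleness the lazy schemes accept in exchange for shorter transactions.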
The Scale-up Pitfall • Replication works well on small, prototype systems • But, at deployment, replication is unstable • At larger scales • Message propagation delay increases • Transaction rates are higher • For eager replication • More transactions, with each transaction taking longer • For lazy replication • Delays in reconciliation lead to system delusion
Analysis of Eager Group Replication • Scaling laws: the deadlock rate grows as the • Third power of the number of nodes • Fifth power of the # of operations per transaction • Problems with eager replication • Cannot be used by disconnected nodes • Probability of deadlocks (failed transactions) increases with system size
Analysis of Lazy Group Replication • Scaling laws: the reconciliation rate grows as the • Third power of the number of nodes • Third power of the # of operations per transaction • Better than eager, but still not good
Analysis of Lazy Master Replication • Scaling laws: the deadlock rate grows as the • Second power of the number of nodes • Fifth power of the # of operations per transaction
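To get a feel for what these exponents mean, the three scaling laws can be compared side by side. The sketch below uses only the exponents stated on the slides and reports relative growth, not absolute rates; the names `SCALING_EXPONENTS` and `growth` are made up for illustration, and the constant factors from the paper's model are deliberately left out.

```python
# (exponent on number of nodes, exponent on operations per transaction),
# taken from the scaling-law slides above.
SCALING_EXPONENTS = {
    "eager group": (3, 5),
    "lazy group": (3, 3),
    "lazy master": (2, 5),
}


def growth(scheme: str, node_factor: float, ops_factor: float) -> float:
    """Factor by which the failure rate (deadlocks or reconciliations)
    grows when nodes and transaction size are multiplied by the given factors."""
    node_exp, ops_exp = SCALING_EXPONENTS[scheme]
    return node_factor ** node_exp * ops_factor ** ops_exp


if __name__ == "__main__":
    for scheme in SCALING_EXPONENTS:
        print(
            f"{scheme:12s} 10x nodes -> {growth(scheme, 10, 1):,.0f}x   "
            f"2x ops -> {growth(scheme, 1, 2):,.0f}x"
        )
```

Under these exponents, ten times as many nodes multiplies the rate by 1,000 for both group schemes and by 100 for lazy master, while doubling the transaction size multiplies it by 32 for the two fifth-power schemes and by 8 for lazy group.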
Status of Replication • Negative scaling results • The analysis does not account for message delays (so reality is worse) • Can’t escape these by choosing lazy over eager • No reason for group replication • Master is the same (eager) or better (lazy) • So, what do we do? • Avoid scale, keep systems small
Two-Tier Replication • Two node types: • Base nodes: always connected, store a replica, master most objects • Mobile nodes: often disconnected, store a replica, issue tentative transactions • Two version types: • Master version: • Exists at the object owner; other nodes may have older versions • Tentative version: • Local version, updated by tentative transactions
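A minimal sketch of the two-tier idea, assuming that a tentative transaction carries an acceptance criterion and is re-executed against the master version when the mobile node reconnects. The names `TentativeTxn`, `BaseNode`, `MobileNode`, `reexecute`, and `reconnect` are hypothetical, and a single base node stands in for the whole base tier.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class TentativeTxn:
    key: str
    update: Callable[[int], int]   # computes the new value from the current one
    accept: Callable[[int], bool]  # acceptance criterion on the new value


class BaseNode:
    """Always-connected node holding the master versions."""

    def __init__(self, data):
        self.master = dict(data)

    def reexecute(self, txn: TentativeTxn) -> bool:
        # Re-run the transaction against the master version, not the
        # (possibly stale) tentative version the mobile node saw.
        new_value = txn.update(self.master[txn.key])
        if not txn.accept(new_value):
            return False  # rejected: master version is left unchanged
        self.master[txn.key] = new_value
        return True


class MobileNode:
    """Often-disconnected node that works on tentative versions."""

    def __init__(self, base: BaseNode):
        self.tentative = dict(base.master)  # snapshot taken while connected
        self.log = []

    def run(self, txn: TentativeTxn):
        # Applied locally right away; the result is only tentative.
        self.tentative[txn.key] = txn.update(self.tentative[txn.key])
        self.log.append(txn)

    def reconnect(self, base: BaseNode):
        results = [base.reexecute(txn) for txn in self.log]
        self.log.clear()
        self.tentative = dict(base.master)  # discard tentative versions
        return results
```

For example, if two mobile nodes each tentatively debit 80 from an account holding 100, with the criterion that the balance stays non-negative, the first to reconnect is accepted and the second is rejected, and both end up seeing the master balance of 20.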
System Principles • Hierarchies to reduce scale • Nodes (Master & Mobile-disconnected) • Transactions (Tentative and Eager/Consistent) • Techniques • Convergence (Bayou-like eventual consistency) • Idempotence: encode writes in non-conflicting ways • Does it fix any of Bayou’s semantic problems?
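One way to read "encode writes in non-conflicting ways" is to ship operations that commute (such as increments) and to tag each with an identifier so that redelivery is harmless. The sketch below assumes that reading; `CounterReplica` and its `apply` method are hypothetical names, not something taken from the paper or from Bayou.

```python
class CounterReplica:
    """Replica whose only write is 'add delta', tagged with a unique op id."""

    def __init__(self):
        self.balance = 0
        self.seen = set()  # ids of operations already applied

    def apply(self, op_id: str, delta: int) -> None:
        if op_id in self.seen:
            return  # duplicate delivery: applying again must change nothing
        self.seen.add(op_id)
        self.balance += delta  # additions commute, so arrival order is irrelevant


if __name__ == "__main__":
    ops = [("a", 5), ("b", -2), ("c", 10)]
    r1, r2 = CounterReplica(), CounterReplica()
    for op in ops + [ops[0]]:   # in order, with one redelivery
        r1.apply(*op)
    for op in reversed(ops):    # opposite order
        r2.apply(*op)
    print(r1.balance == r2.balance)
```

Both replicas converge on the same balance regardless of delivery order or duplicates; an overwrite such as "set balance to 13" would not have this property.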
Conclusions • Eager: waits and deadlocks • Lazy: converts waits and deadlocks into reconciliations • Neither scales • Two-tier replication: • Supports mobile nodes • Combines eager master replication with local updates