Asynchronous rebalancing of a replicated tree. Marek Zawirski (INRIA & UPMC, France), Marc Shapiro (INRIA & LIP6, France), Nuno Preguiça (UNL, Portugal). CFSE, May 2011, Saint-Malo. Summary. Overview of Treedoc: abstractly, an always-responsive replicated sequence
Asynchronous rebalancing of a replicated tree
Marek Zawirski, INRIA & UPMC, France
Marc Shapiro, INRIA & LIP6, France
Nuno Preguiça, UNL, Portugal
CFSE, May 2011, Saint-Malo
Summary
• Overview of Treedoc:
  • Abstractly, an always-responsive replicated sequence
  • Built as a replicated ordering tree
• Problem faced:
  • Tree unbalance
• Solution for asynchronous tree rebalance:
  • Algorithm requirements statement
  • Novel F-translate algorithm
Zawirski, Shapiro, Preguiça - Asynchronous rebalancing of a replicated tree
Treedoc – a replicated sequence
• Sequence of atoms:
  • Ops: read, addAt, removeAt
• Replicated representation:
  • Grow-only binary tree
  • Stable, unique position ids
  • Total order "<": infix traversal
[Figure: replica1, replica2 and replica3 hold the same tree: root I, left child L (edge 0), right child P (edge 1). The position id of P is 1 and the sequence reads L I P.]
[Shapiro, Preguiça et al., 2007, 2009]
Treedoc – a replicated sequence
[Figure, built up over four slides: one replica executes addAt to insert S between I and P. Because I < S < P must hold, S becomes the left child of P with position id 10, and the sequence reads L I S P. The replica then executes removeAt on P, after which the sequence reads L I S.]
[Shapiro, Preguiça et al., 2007, 2009]
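The infix order on position ids can be sketched as follows. This is a minimal illustration, assuming ids are written as bit strings (the empty string for the root, 0 for a left edge, 1 for a right edge) and ignoring disambiguators; the name infix_key is hypothetical and not part of Treedoc.

```python
def infix_key(path: str) -> tuple:
    """Sort key realising the infix order "<" on bit-string position ids.

    Assumption: a left edge maps to -1, a right edge to +1, and the node
    itself to a terminating 0, so that left subtree < node < right subtree.
    """
    return tuple(-1 if bit == "0" else 1 for bit in path) + (0,)


# Tree from the slides: I at the root, L at 0, P at 1, S added at 10.
tree = {"": "I", "0": "L", "1": "P", "10": "S"}
sequence = [tree[p] for p in sorted(tree, key=infix_key)]
print(sequence)  # ['L', 'I', 'S', 'P']
```

Sorting the ids by this key is equivalent to an infix traversal, so the sequence can be read without materialising tree nodes.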
Treedoc – a replicated sequence
• Operation-based replication:
  • Immediate local execution
  • Propagate (cbcast) & replay
[Figure, built up over two slides: the addAt of S and then the removeAt of P are propagated to the other two replicas and replayed there; all three replicas end up with the sequence L I S.]
[Shapiro, Preguiça et al., 2007, 2009]
Treedoc – a replicated sequence
• Operation-based replication:
  • Immediate local execution
  • Propagate (cbcast) & replay
  • Concurrent operations commute
  • Eventually consistent
[Figure, built up over three slides: two replicas concurrently execute addAt for A and for E at the same position. Each operation is propagated to the other replicas and replayed. The conflict is resolved by a predefined order on replica colors (red < green < blue…), so all replicas converge to the same tree.]
[Shapiro, Preguiça et al., 2007, 2009]
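Why concurrent inserts at the same position commute can be sketched as below. This is a simplified model under stated assumptions, not the Treedoc implementation: an id is a (bit-string path, replica color) pair, ties on the path are broken by the predefined color order, removals are kept in a separate set, and children of same-position nodes are ignored. The class and method names are hypothetical.

```python
COLOR_RANK = {"red": 0, "green": 1, "blue": 2}  # predefined order from the slide


def infix_key(path):
    # Left edge -> -1, right edge -> +1, node itself -> 0 (infix order).
    return tuple(-1 if bit == "0" else 1 for bit in path) + (0,)


class Replica:
    def __init__(self):
        self.atoms = {}        # (path, color) -> atom; grow-only
        self.removed = set()   # ids whose atom has been removed

    def add_at(self, path, color, atom):
        self.atoms[(path, color)] = atom

    def remove_at(self, path, color):
        self.removed.add((path, color))

    def read(self):
        live = [k for k in self.atoms if k not in self.removed]
        # Infix order on the path first, then the color as tie-breaker.
        live.sort(key=lambda k: (infix_key(k[0]), COLOR_RANK[k[1]]))
        return [self.atoms[k] for k in live]


ops = [
    ("add_at", "", "red", "I"),
    ("add_at", "0", "red", "L"),
    ("add_at", "00", "red", "A"),    # concurrent with the next operation
    ("add_at", "00", "green", "E"),
]
r1, r2 = Replica(), Replica()
for name, *args in ops:
    getattr(r1, name)(*args)
for name, *args in ops[:2] + [ops[3], ops[2]]:  # A and E replayed in the other order
    getattr(r2, name)(*args)
print(r1.read(), r2.read())  # ['A', 'E', 'L', 'I'] ['A', 'E', 'L', 'I']
```

Because the state is a set of uniquely identified atoms and the read order depends only on the ids, the replay order of concurrent operations does not affect the result.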
The tree rebalance problem
• Over time the tree degrades:
  • Unbalanced, empty nodes, many colors…
  • Various negative impacts
• Tree rebalance:
  • Create a minimal tree from the non-empty nodes
  • Keep the order "<"
  • Use a single color (white)
  • New ids (rectangles), incompatible with the old ones
• Challenge:
  • How to avoid system-wide consensus?
[Figure: an unbalanced tree with empty nodes and several colors is rebalanced into a minimal, single-color tree over the same atoms, with new ids.]
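The rebalance step itself can be illustrated with a short sketch. It assumes bit-string ids, marks empty nodes with None, and picks the middle atom as the root of each subtree; the function names and the example tree are hypothetical, and the authors' rebalance may choose a different minimal tree.

```python
def infix_key(path):
    # Left edge -> -1, right edge -> +1, node itself -> 0 (infix order).
    return tuple(-1 if bit == "0" else 1 for bit in path) + (0,)


def rebalance(atoms):
    """Assign fresh bit-string ids to `atoms`, which are already in sequence order."""
    new_tree = {}

    def build(lo, hi, path):
        if lo >= hi:
            return
        mid = (lo + hi) // 2
        new_tree[path] = atoms[mid]          # middle atom becomes the subtree root
        build(lo, mid, path + "0")
        build(mid + 1, hi, path + "1")

    build(0, len(atoms), "")
    return new_tree


# Hypothetical old tree: deep, with one empty node (None) at id 1.
old_tree = {"": "I", "0": "L", "1": None, "10": "S", "100": "E", "1000": "A"}
atoms = [old_tree[p] for p in sorted(old_tree, key=infix_key)
         if old_tree[p] is not None]
new_tree = rebalance(atoms)
print(atoms)                                  # ['L', 'I', 'A', 'E', 'S']
print(max(len(p) for p in new_tree))          # 2
```

The new ids preserve the sequence order but bear no relation to the old ids, which is why operations issued against the old tree need translation.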
The core-nebula architecture
Idea: limit consensus to a smaller number of replicas
Divide replicas into two disjoint sets: [Leția et al., 2009]
• CORE
  • a stable group
  • executes tree operations & agrees on rebalance
  • easier agreement
• NEBULA
  • sites join & leave, dynamic
  • generates tree operations
  • learns about rebalance
  • performs a catch-up protocol to integrate concurrent changes
  • never blocked
Rebalance in core, catch-up from nebula
• Any pair of replicas can exchange operations in the same epoch
[Figure, built up over three slides: core1, nebula1, nebula2 and nebula3 are in the same epoch and hold the same tree. An addAt of S and then a removeAt of P are propagated to all four replicas; a removeAt of I is then issued.]
Rebalance in core, catch-up from nebula
• Any pair of replicas can exchange operations in the same epoch
• rebalance@core initiates a new epoch
• rebalance@core and operations@core are inherently concurrent to ops@nebula!
[Figure: core1 rebalances its final state into a new tree and then executes an addAt of T in the new epoch, while nebula replicas still execute and exchange addAt operations for P and U in the old epoch.]
Rebalance in core, catch-up from nebula
• Pairwise catch-up moves a nebula replica to the next epoch:
  • replay core ops until the final state
  • replay rebalance on the final nodes
  • translate the nodes of nebula operations concurrent with the rebalance (ops || rebalance) into the new tree
[Figure, built up over three slides: a nebula replica catches up with core1. It replays the core's removeAt of I to reach the final state, applies the rebalance, and must then find a position in the new tree for its own node P.]
Naive translation algorithm(s)
• Naive translation algorithm: create a new position respecting the old order observed at the nebula replica ~[Leția et al., 2009]
[Figure, built up over five slides: one nebula replica catches up and translates P to a new position chosen so that L < P < S. Another nebula replica catches up independently and translates U so that L < U < S. The translated addAt operations for P and U are then exchanged.]
• Result: the order observed at nebula2 is broken!
Towards correct translate: requirements
• Order-preserving
  • For every pair of positions X, Y, the order is preserved between epochs: X < Y => translate(X) < translate(Y)
• Deterministic
  • For every position X and replicas nebula_i, nebula_j, X is translated identically: translate(X)@nebula_i = translate(X)@nebula_j
• Non-disruptive
  • For every position X created by addAt and every position Y created by translate: X != Y
Solution: designate all cases in advance using the final state only!
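The three requirements can be phrased as executable checks. The sketch below is only an illustration under assumptions: ids are bit strings without disambiguators, a translation is a dict from old ids to new ids, and all function names are hypothetical.

```python
def infix_key(path):
    # Left edge -> -1, right edge -> +1, node itself -> 0 (infix order).
    return tuple(-1 if bit == "0" else 1 for bit in path) + (0,)


def order_preserving(translation):
    """X < Y in the old epoch implies translate(X) < translate(Y) in the new one."""
    olds = sorted(translation, key=infix_key)
    news = [infix_key(translation[p]) for p in olds]
    # Checking neighbours suffices because "<" is a total order.
    return all(a < b for a, b in zip(news, news[1:]))


def deterministic(per_replica):
    """Every replica that translates a position obtains the same new id."""
    merged = {}
    for translation in per_replica:
        for old, new in translation.items():
            if merged.setdefault(old, new) != new:
                return False
    return True


def non_disruptive(translated_ids, add_at_ids):
    """No id produced by translate coincides with an id produced by addAt."""
    return set(translated_ids).isdisjoint(add_at_ids)


good = {"0": "00", "": "0", "10": ""}   # old order 0 < root < 10 is kept
bad = {"0": "0", "": "00", "10": ""}    # swaps the first two positions
print(order_preserving(good), order_preserving(bad))              # True False
print(deterministic([{"11": "01"}, {"11": "001"}]))               # False
print(non_disruptive({"01"}, {"00"}), non_disruptive({"01"}, {"01"}))  # True False
```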
F-Translate algorithm & sentinels
• Sentinel position Xi:
  • Designated position for every potential translate
  • X < X1 < X2 < … < Xim < Y
  • Materialized on translate
• F-Translate:
  • Right child of a final node
  • Child of an empty final node
  • Etc.
[Figure, built up over six slides: in the rebalanced tree, sentinel positions S1 and L1 to L4 are reserved next to the final nodes S and L. When nebula1, nebula2 and nebula3 catch up, each translates P and U to the same sentinel positions; the core's addAt of T and removeAt of I are replayed, and all four replicas converge to the same tree.]
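How sentinels might be designated from the final state alone is sketched below. This is an illustrative reading of the slides, not the authors' F-Translate: it assumes bit-string ids without disambiguators, a final state given as a dict in which None marks an empty node, and one sentinel per free child slot of a final node, numbered after the closest preceding non-empty final node. All names are hypothetical.

```python
def infix_key(path):
    # Left edge -> -1, right edge -> +1, node itself -> 0 (infix order).
    return tuple(-1 if bit == "0" else 1 for bit in path) + (0,)


def designate_sentinels(final):
    """Map each free child slot of the final state to a sentinel (X, i).

    X is the old id of the closest preceding non-empty final node
    (None if the slot precedes every atom); i counts sentinels after X.
    """
    slots = [f + b for f in final for b in "01" if f + b not in final]
    table, last_atom, i = {}, None, 0
    for path in sorted(list(final) + slots, key=infix_key):
        if path in final:
            if final[path] is not None:   # a surviving atom starts a new run
                last_atom, i = path, 0
        else:
            i += 1
            table[path] = (last_atom, i)
    return table


def f_translate(pos, table):
    """Translate a nebula-only old id into (X, i, rest below the sentinel)."""
    for cut in range(1, len(pos) + 1):
        if pos[:cut] in table:
            x, i = table[pos[:cut]]
            return x, i, pos[cut:]
    raise ValueError("position already belongs to the final state")


# Hypothetical final state: I at the root, L at 0, empty node at 1, S at 10.
final = {"": "I", "0": "L", "1": None, "10": "S"}
table = designate_sentinels(final)
print(table["101"], table["11"])    # ('10', 1) ('10', 2)
print(f_translate("110", table))    # ('10', 2, '0')
```

In this example the right child of the final node S becomes the first sentinel after S, and the right child of the empty final node becomes the second. Since the table depends only on the final state, every nebula replica computes the same result.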
F-Translate: sentinel implementation
• How to implement sentinelAfter?
  • TODO: replace with figure (show if time allows & the audience is not lost)
• Empty nodes are necessary!
  • Are more empty nodes discarded than introduced? Yes, but…
• How to minimize empty nodes in sentinelAfter:
  • Use a balanced binary tree!
  • Define addAt such that it avoids empty nodes and long paths
  • Two-round empty node discard
• O(1) number of empty nodes / path + O(log imax) / path
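One possible way to lay out sentinels in a balanced binary tree is sketched below. It is an assumption-laden illustration rather than the authors' sentinelAfter: it places the m sentinels that follow a node at the leaves of a fixed-depth subtree (so the internal nodes are empty), uses bit-string ids without disambiguators, and therefore costs O(log m) empty nodes per path instead of the O(1) stated on the slide. The signature is hypothetical.

```python
def infix_key(path):
    # Left edge -> -1, right edge -> +1, node itself -> 0 (infix order).
    return tuple(-1 if bit == "0" else 1 for bit in path) + (0,)


def sentinel_after(x_new, i, m, new_paths):
    """Id of the i-th of m sentinels (1-based) that follow node x_new.

    x_new is an id in the rebalanced tree (None means "before every node");
    new_paths is the set of ids used by the rebalanced tree.
    """
    # First unused position immediately after x_new in infix order.
    anchor = "" if x_new is None else x_new + "1"
    while anchor in new_paths:
        anchor += "0"
    # Fixed-width binary numbering: same depth for all sentinels, so none is
    # an ancestor of another and their subtrees cannot overlap.
    width = (m - 1).bit_length()
    suffix = format(i - 1, "0{}b".format(width)) if width else ""
    return anchor + suffix


new_paths = {"", "0", "1"}                      # rebalanced tree with three atoms
sentinels = [sentinel_after("", i, 3, new_paths) for i in (1, 2, 3)]
print(sentinels)                                # ['1000', '1001', '1010']
```

All three sentinels sort after the root and before its right child, and nodes created below one sentinel cannot collide with another sentinel.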
Summary
• Problem faced:
  • Tree unbalance
• Solution for asynchronous tree rebalance:
  • Algorithm requirements statement
  • Novel F-translate algorithm:
    • Identify and utilize the final state prior to rebalance
    • Use sentinel positions
  • Prototype catch-up implementation
• Future work:
  • Evaluation of the sentinelAfter implementation
  • Formal order-preservation proof
Appendix: the unbalance problem
• Use a sparse tree and a heuristic to assign PosIDs [Weiss et al., '09]
  • Tested on Wikipedia traces; works at the cost of a possible anomaly
• Use a list instead of a tree [Roh et al., '10]
  • Different costs and convergence characteristics?
• Rebalance the tree [Shapiro, Preguiça et al., '09]
  • System-wide consensus; inherent limitations
  • The core-nebula idea [Leția et al., '09]; incorrect translation
• This work brings:
  • More formalization of the core-nebula for asynchronous systems
  • Flaws revealed in naive algorithms
  • Translation requirements statement
  • Novel F-translate algorithm
  • First prototype implementation in Java (subject to further studies)