
Zero-cost Reliability for Tree-based Overlay Networks

Dorian C. Arnold, University of Wisconsin. Paradyn/Dyninst Week, March 21-22, 2006, College Park, MD.


Presentation Transcript


  1. Zero-cost Reliability for Tree-based Overlay Networks
     Dorian C. Arnold, University of Wisconsin
     Paradyn/Dyninst Week, March 21-22, 2006, College Park, MD

  2. Overview
     • Tree-based Overlay Networks (TBŌNs)
       • Definition, examples, applications
       • Prototype: www.paradyn.org/mrnet
     • Zero-cost reliability
       • No overhead during normal operation
       • Applicable to many TBŌN computations
     Zero-cost TBŌN Reliability

  3. HPC Trends
     [Figure: November ’05 processor count distribution, showing growth in 1024-processor systems]

  4. Hierarchical Distributed Systems
     • Hierarchical topologies
       • Application control
       • Data collection
       • Data reduction/analysis
     • As scale increases, the front-end becomes a bottleneck
     [Figure: a front-end (FE) connected directly to the back-ends (BE)]

  5. TBŌNs for Scalable Systems
     TBŌNs for scalability:
     • Scalable multicast
     • Scalable gather
     • Scalable data aggregation
     [Figure: tree with the front-end (FE) at the root, communication processes (CP) in the interior, and back-ends (BE) at the leaves]

  6. TBŌN Model
     [Figure: application front-end (FE) at the root, a tree of communication processes (CP) in the interior, and application back-ends (BE) at the leaves]

  7. TBŌN Model
     Reliable FIFO channels:
     • Non-lossy
     • Duplicate suppressing
     • Non-corrupting
     [Figure: FE/CP/BE tree connected by channels]

  8. TBŌN Model
     [Figure: FE/CP/BE tree annotated with application-level packets, packet filters, filter state, and channel state]

  9. TBŌN Model
     Filter function:
     • Inputs a packet from each child
     • Outputs a single packet
     • Updates filter state
     {output, new_state} ≡ f(inputs, cur_state)
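
A minimal Python sketch of the filter signature on slide 9, assuming a set-union filter. The name `union_filter` and the use of frozensets as packets are illustrative assumptions, not part of MRNet or of the original slides.

```python
from typing import FrozenSet, Iterable, Tuple

State = FrozenSet[int]


def union_filter(inputs: Iterable[FrozenSet[int]], cur_state: State) -> Tuple[State, State]:
    """Hypothetical filter: {output, new_state} = f(inputs, cur_state)."""
    # Fold one packet per child into the persistent filter state.
    new_state = cur_state.union(*inputs)
    # The single output packet is a copy of the updated state.
    output = new_state
    return output, new_state


# One packet from each of two children, starting from an empty state.
output, state = union_filter([frozenset({3}), frozenset({1})], frozenset())
print(sorted(output))
```

In this sketch the output and the new state are the same value, which is the case the later composition slides rely on.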

  10. TBŌNs at Work
      • Multicast
        • RMX [Chawathe, McCanne and Brewer ’00]
        • End System Multicast [Chu, Rao, Seshan and Zhang ’02]
        • Overcast [Jannotti, Gifford, Johnson, Kaashoek and O’Toole ’00]
        • ALMI [Pendarakis, Shi, Verma and Waldvogel ’01]
      • Multicast/gather (reduction)
        • Gathercast [Badrinath and Sudame ’00]
        • Ygdrasil [Balle, Brett, Chen, LaFrance-Linden ’02]
        • Lilith [Evensky, Gentile, Camp, and Armstrong ’97]
        • MRNet [Roth, Arnold and Miller ’03]
        • Bistro (no reduction) [Bhattacharjee et al. ’00]
      • Distributed monitoring/sensing
        • TAG (reduction) [Madden, Franklin, Hellerstein and Hong ’02]
        • Ganglia [Sacerdoti, Katz, Massie, Culler ’03]
        • Supermon (reduction) [Sottile and Minnich ’02]

  11. Example TBŌN Reductions
      • Simple
        • Min, max, sum, count, average
        • Concatenate
        • Found in the general-purpose infrastructures
      • Complex
        • Clock synchronization [Roth, Arnold, Miller ’03]
        • Time-aligned data aggregation [Roth, Arnold, Miller ’03]
        • Graph merging [Roth, Miller ’05]
        • Mean-shift image segmentation [Arnold, Pack, Miller ’05]
        • Equivalence relations
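
A small sketch of how one of the simple reductions, the average, could be computed level by level in a tree. It assumes each process forwards a (sum, count) pair rather than an average, since averages of averages are not correct in general. The helper names `leaf` and `combine` are hypothetical.

```python
from typing import Iterable, Tuple

Partial = Tuple[float, int]  # (sum, count)


def leaf(value: float) -> Partial:
    """Wrap a single back-end value as a partial aggregate."""
    return (value, 1)


def combine(parts: Iterable[Partial]) -> Partial:
    """Merge partial aggregates from children into one partial aggregate."""
    total, count = 0.0, 0
    for s, c in parts:
        total += s
        count += c
    return (total, count)


# Two communication processes each reduce two back-end values,
# then the front-end combines the two partial results.
left = combine([leaf(2.0), leaf(4.0)])
right = combine([leaf(6.0), leaf(12.0)])
root = combine([left, right])
average = root[0] / root[1]
print(root, average)  # (24.0, 4) 6.0
```

Sum and count fall out of the same pair; min and max can be reduced the same way with a one-element partial.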

  12. Potential TBŌN Operations
      Many aggregations reduce to equivalence class computations.
      System inputs: trace files, live data streams, graph structures, …
      [Figure: FE/CP/BE tree]

  13. Potential TBŌN Operations
      Many aggregations reduce to equivalence class computations.
      Data reduction: equivalence relation for classification, anomaly detection, graph merging
      [Figure: FE/CP/BE tree]

  14. Potential TBŌN Operations
      Many aggregations reduce to equivalence class computations.
      Reduction output: equivalence classes, anomaly data, merged graphs
      [Figure: FE/CP/BE tree]

  15. TBŌN Reliability
      MTTF ∝ 1 / (System Size)
      Given the emergence of TBŌNs for scalable computing, low-cost reliability for TBŌN environments becomes critical!
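
A worked illustration of the inverse relationship between MTTF and system size. It assumes independent, exponentially distributed node failures and treats the system as failed when any one node fails; under those assumptions the system MTTF is the per-node MTTF divided by the node count. The per-node figure of 50,000 hours is a made-up value for illustration only.

```python
def system_mttf(node_mttf_hours: float, num_nodes: int) -> float:
    """System MTTF assuming independent, exponentially distributed node failures
    and a system that fails as soon as any single node fails."""
    return node_mttf_hours / num_nodes


# Hypothetical per-node MTTF; not a measured value.
NODE_MTTF_HOURS = 50_000.0

for n in (1, 1024, 65536):
    print(n, system_mttf(NODE_MTTF_HOURS, n))
```

With these assumed numbers, a 1024-node system would see a failure roughly every two days, and a 65536-node system in under an hour.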

  16. TBŌN Reliability
      • Goal
        • Tolerate process failures
        • Avoid checkpoint overhead
      • Concept: leverage TBŌN properties
        • Natural information redundancies
        • Computational semantics
        • Lost state may be replaced by non-identical state
        • Computational consistency: relaxed models
      • Zero-cost: no computation, storage, or network overhead during normal operation
        • Define operations that compensate for lost state
        • Maintain computational consistency

  17. TBŌN Information Redundancies
      Fundamental to the TBŌN model:
      • Input streams propagate toward the root
      • Persistent state summarizes input history
      • Therefore, the summary is replicated naturally as input propagates upstream

  18. Recovery Strategy
      if failure is detected then
        Reconstruct tree
        Regenerate compensatory state
        Reintegrate state into tree
        Resume normal operation
      end if
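
The recovery steps above can be sketched in Python as follows. This is only an illustration under stated assumptions: failure detection is left out, the tree is reconstructed by letting a fresh process take over the failed node's position (the slides leave the reconstruction policy open), and set union stands in for the filter function. The `Node` class and the `compose` and `recover` helpers are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import FrozenSet, Iterable, List, Optional


@dataclass(eq=False)  # compare nodes by identity, not by field values
class Node:
    name: str
    state: FrozenSet[int] = frozenset()
    children: List["Node"] = field(default_factory=list)
    parent: Optional["Node"] = None


def compose(states: Iterable[FrozenSet[int]]) -> FrozenSet[int]:
    """Set-union composition; stands in for the filter function f."""
    return frozenset().union(*states)


def recover(failed: Node) -> Node:
    # 1. Reconstruct tree: a fresh process adopts the failed node's children
    #    and takes its place under the failed node's parent (assumed policy).
    replacement = Node(failed.name + "'", parent=failed.parent, children=failed.children)
    for child in replacement.children:
        child.parent = replacement
    if failed.parent is not None:
        siblings = failed.parent.children
        siblings[siblings.index(failed)] = replacement
    # 2. Regenerate compensatory state from the children's states, and
    # 3. reintegrate it by installing it on the replacement.
    replacement.state = compose(c.state for c in replacement.children)
    # 4. Resume: the caller continues normal operation with the returned node.
    return replacement


cp1 = Node("CP1", frozenset({1, 3, 4, 5}))
cp2 = Node("CP2", frozenset({1, 5, 8}))
cp0 = Node("CP0", frozenset({1, 3}), children=[cp1, cp2])
cp1.parent = cp2.parent = cp0

new_root = recover(cp0)
print(sorted(new_root.state))  # [1, 3, 4, 5, 8]
```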

  19. State Regeneration: Composition
      State at parent is a composition of states at children.
      [Figure: parent CPi with state fs(CPi) above children CPj and CPk with states fs(CPj) and fs(CPk)]

  20. State Regeneration: Composition
      fs(CPi) ≡ fs(CPj) ⊕ fs(CPk)
      (parent's state ≡ child's state ⊕ child's state, where ⊕ is the composition operator)
      State composition:
      • Input: filter state from children
      • Output: computationally consistent filter state for parent

  21. State Regeneration: Composition
      Where does this mysterious composition operation come from?
      Recall the filter definition: {output, new_state} ≡ f(inputs, cur_state)
      When the filter's new_state is a copy of its output, f becomes the composition operator.

  22. State Regeneration: Composition
      Proof outline:
      • State is the history of processed inputs
      • Children's output becomes parent's input
      • Updated state is a copy of output
        • Can be used as input to the filter function
      • Filter execution on children's state will produce computationally consistent state for the parent
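
The proof outline can be checked on a small case. The sketch below assumes a max filter whose new state is a copy of its output; the name `max_filter` and the input values are illustrative. Running the same filter on the children's states reproduces the state the parent built from the children's outputs.

```python
from typing import Iterable, Tuple


def max_filter(inputs: Iterable[float], cur_state: float) -> Tuple[float, float]:
    """Hypothetical filter: state is the largest value seen so far,
    and the output is a copy of that state."""
    new_state = max([cur_state, *inputs])
    return new_state, new_state


NEG_INF = float("-inf")  # initial (empty) state

# Each child filters its own inputs.
out_j, s_j = max_filter([3, 7], NEG_INF)
out_k, s_k = max_filter([5, 2], NEG_INF)

# The parent filters the children's outputs.
_, s_i = max_filter([out_j, out_k], NEG_INF)

# Composition: the same filter applied to the children's states.
_, regenerated = max_filter([s_j, s_k], NEG_INF)

print(s_i, regenerated)
```

If the parent had not yet consumed the children's latest outputs, the regenerated state would be ahead of the parent's own state rather than equal to it, which is why the slides speak of computationally consistent rather than identical state.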

  23. State Regeneration: Composition
      Composition can also work when output is not a copy of the state!
      • Requires a mapping operation from filter state to output form
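
A sketch of the mapping case, assuming a filter that keeps its state as a Python set but emits packets as sorted lists. The function names `to_output`, `f`, and `regenerate` are hypothetical. Regeneration first maps each child's state into output form and then feeds the results to the ordinary filter function.

```python
from typing import List, Set, Tuple


def to_output(state: Set[int]) -> List[int]:
    """Mapping from filter-state representation to output (packet) form."""
    return sorted(state)


def f(inputs: List[List[int]], cur_state: Set[int]) -> Tuple[List[int], Set[int]]:
    """Hypothetical filter: merges list packets into a set-valued state."""
    new_state = set(cur_state)
    for packet in inputs:
        new_state.update(packet)
    return to_output(new_state), new_state


def regenerate(child_states: List[Set[int]]) -> Set[int]:
    """Rebuild a parent's state by mapping children's states to output form
    and running the filter on them from an empty state."""
    _, state = f([to_output(s) for s in child_states], set())
    return state


print(to_output(regenerate([{1, 3, 4, 5}, {1, 5, 8}])))  # [1, 3, 4, 5, 8]
```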

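  24. State Composition Example
      [Figure: tree with root CP0, interior nodes CP1 and CP2, and leaves CP3 to CP6. The filter states at CP0, CP1, and CP2 are all { }. Queued inputs, one value per leaf in each wave: (3, 1, 1, 1), (4, 5, 5, 8), (3, 3, 1, 9), (1, 4, 1, 5).]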

  25. State Composition Example
      [Figure: same tree. The first wave (3, 1, 1, 1) moves up toward CP1 and CP2; all filter states are still { }. Remaining waves: (4, 5, 5, 8), (3, 3, 1, 9), (1, 4, 1, 5).]

  26. State Composition Example
      [Figure: CP1 holds {1,3} and CP2 holds {1}; both forward these sets toward CP0, whose state is still { }. The second wave (4, 5, 5, 8) moves up. Remaining waves: (3, 3, 1, 9), (1, 4, 1, 5).]

  27. State Composition Example
      [Figure: CP0 holds {1,3} and has output {1,3}. CP1 holds {1,3,4,5} and CP2 holds {1,5,8}; both forward these sets toward CP0. The third wave (3, 3, 1, 9) moves up. Remaining wave: (1, 4, 1, 5).]

  28. State Composition Example
      [Figure: CP0 holds {1,3,4,5,8}; outputs so far: {1,3}, {1,3,4,5,8}. CP1 holds {1,3,4,5} and CP2 holds {1,5,8,9}; both forward these sets toward CP0. The last wave (1, 4, 1, 5) moves up.]

  29. State Composition Example
      [Figure: CP0 holds {1,3,4,5,8,9}; outputs so far: {1,3}, {1,3,4,5,8}, {1,3,4,5,8,9}. CP1 holds {1,3,4,5} and CP2 holds {1,5,8,9}; both forward these sets toward CP0. No inputs remain.]

  30. State Composition Example
      [Figure: final configuration. CP0 holds {1,3,4,5,8,9}; outputs: {1,3}, {1,3,4,5,8}, {1,3,4,5,8,9}, {1,3,4,5,8,9}. CP1 holds {1,3,4,5} and CP2 holds {1,5,8,9}.]
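
The failure-free run in slides 24 to 30 can be replayed with a short simulation. This is a sketch under assumptions: the filter is taken to be set union with output equal to state, CP3 and CP4 are assumed to feed CP1 while CP5 and CP6 feed CP2 (which matches the states shown on the slides), and each wave is pushed all the way to the root in one step instead of modelling the one-step pipeline lag of the animation.

```python
from typing import FrozenSet, Iterable, List, Tuple

State = FrozenSet[int]


def f(inputs: Iterable[Iterable[int]], cur_state: State) -> Tuple[State, State]:
    """Set-union filter whose output is a copy of its new state."""
    new_state = cur_state.union(*inputs)
    return new_state, new_state


# One tuple per wave; columns are the values sent by CP3, CP4, CP5, CP6.
WAVES = [(3, 1, 1, 1), (4, 5, 5, 8), (3, 3, 1, 9), (1, 4, 1, 5)]


def run(waves: List[Tuple[int, int, int, int]]) -> List[State]:
    """Return the sequence of outputs produced by the root CP0."""
    s0: State = frozenset()
    s1: State = frozenset()
    s2: State = frozenset()
    outputs = []
    for a, b, c, d in waves:
        out1, s1 = f([{a}, {b}], s1)     # CP1 filters inputs from CP3 and CP4
        out2, s2 = f([{c}, {d}], s2)     # CP2 filters inputs from CP5 and CP6
        out0, s0 = f([out1, out2], s0)   # CP0 filters the two outputs
        outputs.append(out0)
    return outputs


for out in run(WAVES):
    print(sorted(out))
```

The four printed sets match the root outputs listed on slide 30.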

  31. State Composition Example
      CP0 crashes!
      [Figure: the configuration of slide 27 at the moment of the crash. CP0 held {1,3} and had output {1,3}; CP1 holds {1,3,4,5} and CP2 holds {1,5,8}. Inputs not yet consumed: (3, 3, 1, 9), (1, 4, 1, 5).]

  32. State Composition Example
      Use f on the children's state to regenerate a computationally consistent version of the lost state:
      fs(CP0) ≡ fs(CP1) ⊕ fs(CP2), with ⊕ the composition operator
      [Figure: same configuration as slide 31, with CP1 holding {1,3,4,5} and CP2 holding {1,5,8}]

  33. State Composition Example
      Non-identical, but computationally consistent!
      fs(CP0) ≡ fs(CP1) ⊕ fs(CP2), with ⊕ the composition operator
      [Figure: two trees side by side. In one, CP0 holds its original state {1,3}; in the other, CP0 holds the regenerated state {1,3,4,5,8}. In both, CP1 holds {1,3,4,5}, CP2 holds {1,5,8}, the output so far is {1,3}, and the inputs (3, 3, 1, 9) and (1, 4, 1, 5) remain.]

  34. State Composition Example
      [Figure: the two trees one step later. In both, CP0 holds {1,3,4,5,8}, CP1 holds {1,3,4,5}, and CP2 holds {1,5,8,9}. The last wave (1, 4, 1, 5) moves up.]

  35. State Composition Example
      [Figure: the two trees one step later. In both, CP0 holds {1,3,4,5,8,9}, CP1 holds {1,3,4,5}, and CP2 holds {1,5,8,9}. No inputs remain.]

  36. State Composition Example
      [Figure: final configuration of the two trees. Both end with {1,3,4,5,8,9} as the state at CP0 and as the final output; CP1 holds {1,3,4,5} and CP2 holds {1,5,8,9}.]
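
The crash-and-recover run in slides 31 to 36 can be compared against the failure-free run with a small simulation. The assumptions are the same as for the slides' example as read here: a set-union filter whose output equals its state, CP3 and CP4 feeding CP1, and CP5 and CP6 feeding CP2. The crash is modelled by discarding CP0's state and the packets in flight to it, then rebuilding the state from the children's states. The `run` helper and its `crash_at` parameter are hypothetical.

```python
from typing import Dict, FrozenSet, Iterable, List, Optional, Tuple

State = FrozenSet[int]
EMPTY: State = frozenset()


def f(inputs: Iterable[Iterable[int]], cur_state: State) -> Tuple[State, State]:
    """Set-union filter whose output is a copy of its new state."""
    new_state = cur_state.union(*inputs)
    return new_state, new_state


# One tuple per wave; columns are the values sent by CP3, CP4, CP5, CP6.
WAVES = [(3, 1, 1, 1), (4, 5, 5, 8), (3, 3, 1, 9), (1, 4, 1, 5)]


def run(waves: List[Tuple[int, int, int, int]],
        crash_at: Optional[int] = None) -> Dict[str, State]:
    """Replay the waves; if crash_at is set, CP0 crashes before it can
    consume the outputs its children produced for that wave."""
    s0, s1, s2 = EMPTY, EMPTY, EMPTY
    log: Dict[str, State] = {}
    for i, (a, b, c, d) in enumerate(waves, start=1):
        out1, s1 = f([{a}, {b}], s1)
        out2, s2 = f([{c}, {d}], s2)
        if i == crash_at:
            log["lost"] = s0                 # state that died with CP0
            _, s0 = f([s1, s2], EMPTY)       # regenerate from children's states
            log["regenerated"] = s0
        else:
            _, s0 = f([out1, out2], s0)
    log["final"] = s0
    return log


normal = run(WAVES)
recovered = run(WAVES, crash_at=2)
print(sorted(recovered["lost"]), sorted(recovered["regenerated"]))
print(sorted(normal["final"]), sorted(recovered["final"]))
```

The regenerated state differs from the lost one, yet both executions end in the same final state, which is the sense of "non-identical, but computationally consistent" on slide 33.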

  37. Summary
      • Zero-cost TBŌN reliability constraints
        • Filter state and output have the same representation, or
        • Known mapping from filter state representation to output form
      • Filter function used for regeneration
      • Many computations meet the requirements

  38. Other Research Issues
      • Compensating for lost messages
        • Use computational state to compensate
        • Idempotent/non-idempotent computations
      • Other state regeneration mechanisms
        • Decomposition
      • Failure detection
      • Tree reconstruction
      • Evaluation of the recovery process
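
To illustrate why the idempotent/non-idempotent distinction matters when compensating for lost messages, the sketch below applies the same packet twice to two hypothetical filters. It is an illustration of the distinction only, not a mechanism taken from the slides: re-applying a packet is harmless for set union but double-counts for a running sum.

```python
from typing import FrozenSet, Iterable, Tuple


def union_f(inputs: Iterable[FrozenSet[int]], cur_state: FrozenSet[int]):
    """Idempotent filter: set union."""
    new_state = cur_state.union(*inputs)
    return new_state, new_state


def sum_f(inputs: Iterable[int], cur_state: int) -> Tuple[int, int]:
    """Non-idempotent filter: running sum."""
    new_state = cur_state + sum(inputs)
    return new_state, new_state


packet = frozenset({1, 5})
_, once = union_f([packet], frozenset())
_, twice = union_f([packet], once)      # the same packet delivered again
print(once == twice)                    # union absorbs the repeat

_, once_sum = sum_f([6], 0)
_, twice_sum = sum_f([6], once_sum)     # the same value delivered again
print(once_sum, twice_sum)              # the repeat is counted twice
```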
