
Paradigms for Building Distributed Systems: Performance Measurements and Conditional Analysis



  1. Paradigms for Building Distributed Systems: Performance Measurements and Conditional Analysis Idit Keidar, MIT Lab for Computer Science, Theory of Distributed Systems Group

  2. Outline • Motivation: application domain • Paradigms for building distributed applications • Typical performance measurements and studies • Conditional performance study • Examples • Group membership • QoS-preserving totally ordered multicast • Dynamic voting

  3. Modern Distributed Applications (in WANs) • Highly available servers • Video-on-demand • Collaborative computing • Shared white-board, shared editor, etc. • Military command and control • On-line strategy games

  4. Important Issues in Building Distributed Applications • Consistency of view • Same picture of game, same shared file • Fault tolerance, high availability • Performance • Conflicts with consistency? • Scalability • Topology - WAN, long unpredictable delays • Number of participants

  5. Generic Primitives - Middleware, “Building Blocks” • E.g., total order, group communication • Abstract away difficulties, e.g., • Total order - a basis for replication • Mask failures • Important issues: • Well specified semantics - complete • Performance
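The "total order - a basis for replication" bullet can be made concrete with a minimal sketch (illustrative only, not from the talk): if deterministic replicas apply the same operations in the same total order, they end in the same state.

```python
# Minimal sketch of state-machine replication over a totally ordered
# multicast. Names (Replica, totally_ordered_log) are illustrative.

class Replica:
    def __init__(self):
        self.state = 0

    def apply(self, op, arg):
        # Operations must be deterministic for replication to work.
        if op == "add":
            self.state += arg
        elif op == "mul":
            self.state *= arg

# The total-order primitive delivers the same sequence at every replica.
totally_ordered_log = [("add", 2), ("mul", 3), ("add", 1)]

r1, r2 = Replica(), Replica()
for op, arg in totally_ordered_log:
    r1.apply(op, arg)
    r2.apply(op, arg)

assert r1.state == r2.state == 7  # (0+2)*3+1
```

If the replicas received the operations in different orders (e.g., applying `mul` before `add`), their states would diverge, which is exactly what the total-order primitive rules out.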

  6. Typical Performance Measurements • Measure • “Average” latency • Throughput • Run on idle machines, idle network, ... • to get meaningful, consistent results • to get meaningful comparison among different algorithms

  7. Other Interesting Questions • When should we expect the system to behave as measured? • How much does it degrade at other times? • How fast does it converge to good behavior after a bad period? • Complement the answers we get from measurements

  8. Typical Performance Study • “Expected” latency, throughput • Bundle up all cases? • Assume some distribution (e.g., exponential) • Q: How sensitive is the analysis to this assumption? • Q: How does this compose?

  9. Conditional Analysis: Supplement to Measurements • Guaranteed behavior under certain conditions on the environment • Compare with measurements at ideal times • Understand interesting issues from measurements • Conditions are parameters • Understand how performance degrades • Study how fast performance converges to good behavior after bad times • Wait before using probability • Composable! • Allows studying sensitivity to probability

  10. Example 1: A Scalable Group Membership Algorithm for WANs Idit Keidar, Jeremy Sussman Keith Marzullo, Danny Dolev ICDCS 2000

  11. Membership in WAN: the Challenge • Message latency is large and unpredictable • Time-out failure detection is inaccurate • We use a notification service (NS) for WANs • Number of communication rounds matters • Algorithms may change views frequently • View changes require communication for state transfer, which is costly in WAN

  12. Algorithm Novel Concepts • Designed for WANs from the ground up • Avoids delivery of “obsolete” views • Views that are known to be changing • Not always terminating (but NS is) • How could measurements / analysis capture this benefit? • Runs in a single round “typically”(in-sync) • Three rounds in worst case (out-of-sync)

  13. Measurements:End-to-end Latency: Scalable! • Member scalability: 4 servers (constant) • Server and member scalability: 4-14 servers

  14. Interesting Questions (Future) [figure: end-to-end latency timeline showing NS and membership message times] • How typical is the “typical case”? • Depends on NS • Understanding costs over NS costs • Measurements show: when NS takes more time at some process, the membership algorithm works in “pipeline” to save time

  15. The QoS Challenge • Some distributed applications require QoS • Guaranteed available bandwidth • Bounded delay, bounded jitter • Membership algorithm terminates in one round under certain circumstances • Can we leverage that to guarantee QoS under certain assumptions? • Can other primitives guarantee QoS?

  16. “The requirements of resilience and scalability dictate that total consistency of view is not possible unless mechanisms requiring unacceptable delays are employed” Jon Crowcroft, Internetworking Multimedia, 1999

  17. QoS Preserving Totally Ordered Multicast Ziv Bar-Joseph, Idit Keidar, Tal Anker, Nancy Lynch 2000

  18. QoS Preserving Totally Ordered Multicast - Motivation • Total order - building block for replication • Applications: • On-line strategy games, shared text editing, etc. • Need predictable delays but also consistency • Fault tolerance • Not always too costly!

  19. The Model (VBR) • Allows for some bursty traffic • Slot size Δ, per application • Tunable • Message loss handled by FEC • Analysis due to [Bartal, Byers, Luby, Raz] • Processes can fail, recover • Clocks synchronized within ε

  20. Algorithm Overview: Fault-Free Case • Deliver messages in each slot according to process identifier order and reported number of messages per slot • Example: • Δ is 100 milliseconds • a sends 5 in the slot • b sends 2 in the slot The order inside the slot is: a a a a a b b • Send dummy in empty slots • E.g., deliver: (dummy-from-a) b b
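The slide's slot-delivery rule can be sketched as follows (a hedged illustration; the function name and message representation are assumptions, not from the paper): within each slot, deliver messages in process-identifier order, and have a process with an empty slot contribute one dummy so that receivers can distinguish silence from loss or delay.

```python
# Sketch of per-slot total-order delivery by process-identifier order,
# with dummies for empty slots. slot_contents maps a process id to the
# list of messages it sent in this slot.

def slot_delivery_order(slot_contents):
    order = []
    for pid in sorted(slot_contents):   # process-identifier order
        msgs = slot_contents[pid]
        if msgs:
            order.extend((pid, m) for m in msgs)
        else:
            order.append((pid, "dummy"))  # mark the empty slot explicitly
    return order

# Slide's example: a sends 5, b sends 2 -> a a a a a b b
slot = {"a": ["m1", "m2", "m3", "m4", "m5"], "b": ["m6", "m7"]}
print([pid for pid, _ in slot_delivery_order(slot)])
# -> ['a', 'a', 'a', 'a', 'a', 'b', 'b']

# Empty slot for a: deliver (dummy-from-a) b b
slot2 = {"a": [], "b": ["m8", "m9"]}
print(slot_delivery_order(slot2))
# -> [('a', 'dummy'), ('b', 'm8'), ('b', 'm9')]
```

Because every receiver sorts the same slot contents by the same identifiers, all receivers deliver the slot's messages in the same order without any extra coordination.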

  21. Algorithm QoS Guarantees: Fault-Free Case • Maximum latency: Δ + δ + ε (slot size + delay bound + clock skew) • Average rate: increased by at most 1/Δ • At most 1 dummy per slot • Only if sending rate drops below the reserved rate • Max burst: same as reserved by application • No dummy messages in full slots

  22. Lower Bound on Maximum Latency with Process Faults • Reduce to Consensus (well-known) • Consensus lower bound: • f + 1 rounds for tolerating f stopping failures • Lower bound on latency: (f + 1)·δ • Linear in f!

  23. Process Failures and Joins: Summary of Results • Total order with gaps • Gaps correspond to faulty processes • Latency increases to: Δ + 2δ + ε - constant, even when processes join or fail! • Reliable total order (work-in-progress) • Reason about QoS guarantees under certain assumptions on failure patterns (“clean” rounds)

  24. Conclusions • Totally ordered multicast and QoS can co-exist in certain network models • Important to understand the model, failure patterns, ... • Next step: implementation • Applications: shared text editor, on-line game, ... • See if analyzed cases are the “right” ones • A framework for analyzing QoS guarantees • Other examples will follow, e.g., other QoS parameters, other primitives

  25. Availability Study of Dynamic Voting Algorithms Kyle Ingols and Idit Keidar 2000

  26. Dynamic Voting - Defines Quorums Adaptively • Each “primary” is a majority of the previous one, but not necessarily of the whole universe of processes • Example: {1, 2, 3, 4, 5, 6, 7, 8, 9} → {1, 2, 3, 4, 5} → {2, 3, 4} → {3, 4, 6, 7, 10, 11} • Availability studied by stochastic analysis, simulations, empirical measurements, ...
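The adaptive-quorum rule on this slide can be checked with a small sketch (illustrative only, not the paper's algorithm): each candidate primary is validated against the previous primary, not the whole universe, which is how the example chain ends at a minority of the original nine processes.

```python
# Dynamic-voting rule: a new primary must contain a majority of the
# *previous* primary. Process sets are modeled as Python sets.

def can_form_primary(prev_primary, candidate):
    overlap = prev_primary & candidate
    return len(overlap) > len(prev_primary) / 2

# The slide's example chain of primaries:
p0 = {1, 2, 3, 4, 5, 6, 7, 8, 9}
p1 = {1, 2, 3, 4, 5}           # 5 of 9: majority of p0
p2 = {2, 3, 4}                 # 3 of 5: majority of p1
p3 = {3, 4, 6, 7, 10, 11}      # contains {3, 4}, 2 of 3: majority of p2,
                               # though only a minority of the universe p0

assert can_form_primary(p0, p1)
assert can_form_primary(p1, p2)
assert can_form_primary(p2, p3)
assert not can_form_primary(p0, p3)  # would fail a static-majority rule
```

The last assertion highlights the point of the slide: under static majority voting, p3 could never be a primary, while dynamic voting admits it because quorums adapt to the previous primary.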

  27. Previous Studies Ignored… • The change from one “primary” to the next cannot be atomic in a distributed system • What happens if a failure occurs while the change is not complete? • Some suggested algorithms were wrong • Correct algorithms differ in handling this: • How fast they recover • How many processes need to reconnect to recover • Can attempts to change the primary be pipelined?

  28. Our Study • Simulations • Multiple frequent connectivity changes • Then, stable period - see if primary exists • Observations: • Algorithms differ greatly in availability • especially in their degradation • Conclusion: analysis of any kind may fail to consider important cases...
