470 likes | 606 Views
Reliability and State Machines in an Advanced Network Testbed. Mac Newbold School of Computing University of Utah MS Thesis Defense April 5, 2004 Advisor: Prof. Jay Lepreau. Distributed Systems. Distributed Systems are complex Many components Distributed across multiple systems
E N D
Reliability and State Machines in an Advanced Network Testbed Mac Newbold School of Computing University of Utah MS Thesis Defense April 5, 2004 Advisor: Prof. Jay Lepreau
Distributed Systems • Distributed Systems are complex • Many components • Distributed across multiple systems • Component failures are relatively common • But should not cause system breakdown • “A distributed system is one in which the failure of a computer you didn’t even know existed can render your own computer unusable.” – Leslie Lamport, quoted in CACM, June 1992 Mac Newbold - MS Thesis Defense
Our Context: Emulab • Emulab is an advanced network testbed • Complex time- and space-shared system • System dynamically reconfigures nodes and network links to create “experiments” • Key architectural feature: Central Database • System uses DB for storage, communication • Complex system with many different scripts and programs on clients and servers Mac Newbold - MS Thesis Defense
Emulab Background • First prototype in April 2000 (10 nodes) • In production since Oct. 2000 (40 nodes) • Early versions weren’t perfect • Reliability problems • Experiments of limited size • Inefficient use of resources • Problem is becoming harder • 200 nodes, 400 remote, 2000 virtual nodes Mac Newbold - MS Thesis Defense
Four Key Challenges Emulab requirements: • Reliability • Scalability • Performance and Efficiency • Generality and Portability Mac Newbold - MS Thesis Defense
1. Reliability • Complex systems are hard to make reliable • Many sources of unreliability: • Hardware – commodity PCs as nodes • Software – misconfiguration, bugs, etc. • Humans – can interrupt at any time • More complexity and more parts mean higher chance that something is broken at any given time Mac Newbold - MS Thesis Defense
2. Scalability • Almost everything a testbed provides is harder to provide at a larger scale • Larger scale requires more resources • If throughput doesn’t increase, things slow down at larger scale • Increased load adversely affects reliability • Practical scalability limited by reliability, performance and efficiency Mac Newbold - MS Thesis Defense
3. Performance and Efficiency • One direct requirement on performance: • Emulab is used in an interactive style • System tasks must complete in “a few minutes” • Indirect requirements: • Scalability requirement places high demands on many system components • Maximize efficient resource utilization • As many users/experiments as possible in the shortest time with the fewest resources Mac Newbold - MS Thesis Defense
4. Generality and Portability • Workload Generality • Wide variety of research, teaching, development, and testing activities • Good support for minimally and non-instrumented client OS’s and devices • Model generality • Evolving system • New types of network devices • Portable and non-intrusive client software Mac Newbold - MS Thesis Defense
Summary of Challenges • Three challenges are closely related • Fourth is a constraint more than a challenge • Reliability is key • Failure rates directly impact scalability, performance, and efficiency • Generality and portability requirements constrain any solution used to address the challenges Mac Newbold - MS Thesis Defense
Thesis Statement Enhancing Emulab with a flexible framework based on state machines provides better monitoring and control, and improves the reliability, scalability, performance, efficiency, and generality of the system. Mac Newbold - MS Thesis Defense
Outline • Introduction, Background, and Challenges • Thesis Statement • State Machines in Emulab • Interactions, Control Models, stated • Node Boot State Machines • Results • Related and Future Work • Conclusion Mac Newbold - MS Thesis Defense
State Machines • Also called Finite State Machines (FSMs), Finite State Automata (FSAs), or Statecharts in UML • Well-known model • Simple • Explicit model • Rich and flexible • Easy to understand and visualize Mac Newbold - MS Thesis Defense
Example State Machine • Three main parts: • States • Transitions • Directional • Events • Associated with transitions – labels • Stored in database • diagrams generated automatically from DB Mac Newbold - MS Thesis Defense
State Machines in Emulab • Each state machine has a “type” • Currently three: node boot, node allocation, and experiment status • Multiple machines allowed within a type • Only in one state in one machine of a type • States can have “timeouts” with actions • Timer starts when state is entered • State “triggers” – Entry Actions Mac Newbold - MS Thesis Defense
Direct Interaction • Within a type, can take a transition from a state in one machine (or “mode”) to a state in another machine of that type • Known as “mode transition” in Emulab • Similar to hierarchical state machines • Highlights similarities/symmetries • Most machines are variations of another • Improved code reuse Mac Newbold - MS Thesis Defense
Direct Interaction Example Mac Newbold - MS Thesis Defense
Models of State Machine Control • Centralized monitoring & control: stated • State changes submitted, checked for correctness, applicable actions performed • Daemon tracks timeouts • Used for Node Boot state machines • Distributed management of state machine • No central service enforcing correctness • No dependency on central service • Timeouts harder to implement Mac Newbold - MS Thesis Defense
The stated State Daemon • Listens for events continuously • State transitions cause database updates • Invalid transitions cause notifications • Timeouts, timeout actions, triggers configurable in DB for each state • Caching – only writer of node boot states • Modular design – dispatch events to proper action handlers Mac Newbold - MS Thesis Defense
Node Boot State Machines • Nodes in Emulab self-configure • Monitored via state machines – stated • “Normal” node boot machine • Variations – “Minimal” • Reloading node disks Mac Newbold - MS Thesis Defense
Node Self-Configuration • Nodes send state events during booting to allow progress to be monitored • “Global knowledge” inside state daemon • Better decisions about recovery steps • Finer granularity gives more information for recovery, allows for shorter timeouts • Each OS image is associated with a “mode” (state machine) that describes its behavior Mac Newbold - MS Thesis Defense
“Normal” Node Boot Machine • Start in SHUTDOWN • DHCP, start OS booting • When Emulab-specific configuration begins, enter TBSETUP • ISUP when finished • In case of failure, can retry from SHUTDOWN Mac Newbold - MS Thesis Defense
Variations of Node Boot • Example: MINIMAL • For OS images with little or no special Emulab support • ISUP generated by stated if necessary • Immediate or ping • SilentReboot allowed in this mode Mac Newbold - MS Thesis Defense
Reloading Node Disks • Mode transition into RELOAD / SHUTDOWN • RELOADDONE transitions into mode for newly-installed OS image Mac Newbold - MS Thesis Defense
Reloading and Mode Transitions Mac Newbold - MS Thesis Defense
Experiment Status State Machine • Uses distributed model • Stored in database, but not strictly enforced • Documents life-cycle • Restricts user interruption • Reduces a source of errors • Can queue, activate, modify, restart, swap, or terminate an expt. Mac Newbold - MS Thesis Defense
Node Allocation State Machine • Distributed control model • Diagram documents the way the states are used by the program, but not currently enforced • Either reloads nodes with a custom image, or reboots them as members of the experiment Mac Newbold - MS Thesis Defense
Results: Context • Emulab in production 1 year before state machines were added • In production 3 years since first stated • 650 users, 150 projects, 75 institutions • 19 papers, top venues • Over 155,000 nodes allocated in nearly 10,000 experiment instances • 13 classes at 10 universities • Emulab SW on 6 more testbeds, 4 planned Mac Newbold - MS Thesis Defense
Results • Anecdotal: (others in thesis) • Reliability/Performance: Preventing race conditions • Generality: Graceful handling of custom OS images • Generality: New node types • Experiment: • Reliability/Scalability: Improved timeout mechanisms Mac Newbold - MS Thesis Defense
Reliability/Performance: Preventing Race Conditions • Expt. ends, nodes move to holding expt., get reloaded, then freed while they boot • Problem: Getting allocated while booting • Node appears unresponsive, gets forcefully power cycled, corrupts FS on disk • Solution: don’t free immediately • Add trigger on next ISUP for a node that finishes booting, that frees it when booted Mac Newbold - MS Thesis Defense
Generality: Graceful Handlingof Custom OS Images • Users create custom OS images • Emulab client software is optional • Problem: Nodes don’t send state events • Solution: “Minimal” state machine • SHUTDOWN: maybe on server, optional • BOOTING: server side, trigger checks ISUP • ISUP: either node sends, or generated when pingable, or generated immediately Mac Newbold - MS Thesis Defense
Generality: New Node Types • Emulab is always growing and changing • State machine model and our framework are flexible to provide graceful evolution • We’ve added 5 new node types • IXPs, wide-area, PlanetLab, vnodes, sim-nodes • Mostly used existing machines • 2 new machines, slight variations • 1 change to stated to add a new trigger Mac Newbold - MS Thesis Defense
Reliability/Scalability: Improved Timeout Mechanisms • Before: reboot node, wait for it to boot • Static, 7 minute timeout • Pragmatic – minimizes false positives/negatives • Avg. 4 min., but max. error-free boot is 15 min. • 11 minute delay is too long • Improved: state machine monitoring • Fine-grained, context-sensitive timeouts • Faster error detection • Better monitoring and control Mac Newbold - MS Thesis Defense
Reliability/Scalability:Improved Timeout Mechanisms • Experiment: Measure expt. swap-in time, with and without the improvements • Synthetic but plausible scenario • One node, loads an RPM (8 min. install) • Node reboots, timeout during RPM install • Reboots again, timeout again, mark node dead • Try twice per swap-in, 3 swap-in attempts • Total failure in 45 min., 3 nodes “dead” Mac Newbold - MS Thesis Defense
Reliability/Scalability:Improved Timeout Mechanisms • With state machines: • Timeouts: SHUTDOWN 2 min, BOOTING 3 min, TBSETUP 10 min • Node reboots, enters BOOTING • 1 minute: Enters TBSETUP • 9 minutes: Enters ISUP, expt. ready • Succeeds, with no dead nodes or retries • Cut time from 45 min. to 9 min. (80%) Mac Newbold - MS Thesis Defense
Limitations and Issues • stated is critical infrastructure • Another single point of failure • More system complexity, new bugs, complicated debugging • Potential for scaling problems (none seen yet) • Simple heuristics for error detection • Send mail for invalid transitions Mac Newbold - MS Thesis Defense
Summary of Results • Explicit model requires careful thought • Improves design and implementation • Visualization makes it easier to understand • Faster and more accurate error detection • Better reliability helps scalability/efficiency • Bigger expts. possible, less overhead per expt. • Flexibility for evolution, workload generality Mac Newbold - MS Thesis Defense
Related Work • “Standard” Finite State Automata – basics • Timed Automata – have global clock • Message Sequence Charts (MSCs) • “Scenarios” – hierarchy, like modes/machines • UML Statecharts • States have entry actions – “triggers” • Hierarchical states – similar to modes • Can model Emulab’s timeouts Mac Newbold - MS Thesis Defense
Future Work • Further developing distributed control • Add monitoring, timeouts, triggers • Better heuristics for error detection • Only flag clustered or related errors • Implement more ideas from other systems • UML’s exit actions, guarded transitions, etc. • Move code into database – i.e. triggers • Easier to modify, framework code vs. machine Mac Newbold - MS Thesis Defense
Conclusion Enhancing Emulab with a flexible framework based on state machines provides better monitoring and control, and improves the reliability, scalability, performance, efficiency, and generality of the system. Mac Newbold - MS Thesis Defense
Bonus Slides Mac Newbold - MS Thesis Defense
Demonstrating Improvement • Currently: programs have their own retry and timeout mechanisms for node reboots • No knowledge of progress, just completion • Can cause failures by forcing a reboot, which can damage file systems on node disk • “New Way”: stated handles timeout and retry during rebooting • Implemented, not installed • Knows if progress is being made • Programs simply wait for ISUP or failure event Mac Newbold - MS Thesis Defense
Demonstrating Improvement (cont’d) • These failures directly hurt reliability • Node failure can cause experiment setup to fail • Significant impact on scaling, performance, efficiency • Maximum experiment size is limited by node failure rate • Failures make things take longer • A slower system means less efficient use of resources Mac Newbold - MS Thesis Defense
Demonstrating Improvement (cont’d) • Compare current vs. new: failure rate, time to completion, etc. • Test data; one of: • Historical experiments • Artificially high load • Fault injection, e.g. reboots • Why new way should help: • Better knowledge for intelligent recovery • Know when to wait longer and when to retry • Shorter timeouts allow for early error detection Mac Newbold - MS Thesis Defense
Future Work:Modeling Indirect Interaction • Occur between machines of different types • Due to external relationships between the entities tracked by each type of machine • Examples: • Same entity may be tracked in two different types of machine • Nodes are in Boot and Allocation machines • Other relationship between entities • Nodes may be “owned” or allocated to an experiment – links Expt. Status and Node machines Mac Newbold - MS Thesis Defense
Question and Answer Mac Newbold - MS Thesis Defense
Conclusion Enhancing Emulab with a flexible framework based on state machines provides better monitoring and control, and improves the reliability, scalability, performance, efficiency, and generality of the system. Mac Newbold - MS Thesis Defense