
Emergent (Mis)behavior vs. Complex Software Systems

Jeff Mogul, HP Labs, Palo Alto. April 2006.



  1. Emergent (Mis)behavior vs. Complex Software Systems Jeff Mogul HP Labs – Palo Alto April 2006

  2. Emergent behavior? • Ants are dumb • Anthills are “smart” • The global behavior of the anthill emerges from the local behaviors of the ants • The individual ants don’t know what the global behavior is supposed to be

  3. Opening day on the Millennium Footbridge • Opening day (10 June 2000): • “unexpected lateral vibrations occurred” • “a significant number of pedestrians [had] difficulty walking” • The bridge was closed; the engineers got back to work • They had already done very careful modelling of a novel design • What went wrong? • People on a swaying surface tend to synchronize their footsteps to the swaying, even if the initial amplitude is small • The bridge’s natural frequency was close to that of normal footsteps • This effect was unknown in the engineering literature • Novel bridge design + unusual pedestrian-only load • Once the problem was understood, modelling and retrofit were fairly straightforward

  4. Why is that bridge interesting to us? • People have been designing bridges for millennia • Civil engineering is a well-regulated profession • Lots of experience with unexpected dynamic failures • Lots of computer modelling expertise • But the engineers still got it wrong: why? • Answer: emergent misbehavior • The system’s behavior emerged – it wasn’t easy to predict • Particularly, not from understanding of individual “parts” • And the result was unexpected and bad • If these engineers got it wrong, what about us? • Computer systems are worse than bridges!

  5. The importance of emergent misbehavior in computer systems Much past focus has been on: • Fault-tolerant systems • Correctness-by-construction Both are valuable, but … • System-wide failures are not always caused by “faults” • Modern systems are too complex to understand • Performance matters! All three issues can result from emergent misbehavior Goals of this talk: • Illustrate the scope and nature of the problem • Propose a research agenda

  6. What this talk is NOT about • Dealing with malicious behavior • Game theory and incentives for people • Telling anyone that their approach is wrong • We still need fault tolerance, program verification, correct-by-construction techniques, etc.! • Improving peak (best-case) system performance This talk is 100% uncontaminated by: • Implementation or architecture • Experiments or results

  7. Outline • Examples • What is/is not “emergent misbehavior”? • A research agenda • Thoughts about visions of the future • Related work

  8. Examples of emergent misbehavior Examples can be found in: • Non-computer technology • Millennium Footbridge (London); Traffic jams • Computer hardware • Vibrations in large disk arrays • Networking • Ethernet capture effect; Router synchronization; BGP Route flap damping; TCP’s Nagle algorithm • Distributed systems and operating systems • Misconfigured load balancer; Herd behavior; Priority inversion in the Mars Pathfinder

  9. Examples of emergent misbehavior Examples described in this talk: • Non-computer technology • Millennium Footbridge (London); Traffic jams • Computer hardware • Vibrations in large disk arrays • Networking • Ethernet capture effect; Router synchronization; BGP Route flap damping; TCP’s Nagle algorithm • Distributed systems and operating systems • Misconfigured load balancer; Herd behavior; Priority inversion in the Mars Pathfinder

  10. Ethernet Capture Effect: an example scenario • Assume both hosts have full transmit queues • Round 1: Host A and Host B both decide to transmit • Host A, count = 1, flips “backoff coin” = 0 • Host B, count = 1, flips “backoff coin” = 1 • Host A wins, transmits; Host B is idle • Round 2: Host A and Host B both decide to transmit • Host A, count = 1, flips “backoff coin” = 0 • Host B, count = 2, flips “backoff coin” = 01 • Host A wins, transmits; Host B is idle • … ad infinitum • B’s disadvantage doubles on each round

  11. Ethernet Capture Effect (II) • No component here has failed • Problem didn’t show up until chips met the spec • Older chips were too slow to send back-to-back packets • The extra delay left B a chance to sneak in • Apparently was not caught in original modelling • Problem doesn’t require large scale to show up • In fact, adding more hosts tends to blur the picture • Solution involved adding extra delay • “Don’t send back-to-back if you just won a collision” • [Ramakrishnan and Yang, 1994]
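The two-host scenario on slides 10 and 11 can be reproduced with a small simulation. The sketch below is a deliberately simplified model, not the analysis from the cited paper: it assumes slotted time, two hosts that always have a frame queued, the usual Ethernet backoff limits (range capped at 2^10 slots, frame discarded after 16 attempts), and a `defer_after_win` flag that is only a crude reading of the slide's one-line summary of the fix. All function and parameter names are illustrative.

```python
import random

MAX_EXPONENT = 10   # backoff range stops growing after 10 collisions
MAX_ATTEMPTS = 16   # a frame is discarded after 16 failed attempts


def simulate(rounds, seed=0, defer_after_win=False):
    """Return the sender ('A' or 'B') of each successfully transmitted frame."""
    rng = random.Random(seed)
    count = {"A": 0, "B": 0}  # collisions suffered by each host's current frame
    winners = []
    while len(winners) < rounds:
        # Both queues are full, so both hosts transmit at once and collide.
        backoff = {}
        for host in count:
            count[host] += 1
            if count[host] > MAX_ATTEMPTS:
                count[host] = 1  # old frame discarded; first collision of the next one
            slots = 2 ** min(count[host], MAX_EXPONENT)
            backoff[host] = rng.randrange(slots)
        if backoff["A"] == backoff["B"]:
            continue  # same slot chosen: the hosts collide again
        winner = min(backoff, key=backoff.get)
        loser = "B" if winner == "A" else "A"
        winners.append(winner)
        count[winner] = 0  # success resets the winner's collision counter
        if defer_after_win:
            # Assumed fix: the winner stays quiet, so the loser sends unopposed.
            winners.append(loser)
            count[loser] = 0
    return winners[:rounds]


def repeat_fraction(winners):
    """Fraction of frames sent by the same host that sent the previous frame."""
    repeats = sum(1 for prev, cur in zip(winners, winners[1:]) if prev == cur)
    return repeats / (len(winners) - 1)


if __name__ == "__main__":
    print("plain backoff:  ", repeat_fraction(simulate(5000, seed=1)))
    print("winner deferring:", repeat_fraction(simulate(5000, seed=1, defer_after_win=True)))
```

A repeat fraction near 1 means one host is holding the channel for long runs. In this model the winner always restarts from a two-slot backoff range while the loser's range keeps doubling, which is the asymmetry the slide describes; no component in the model is faulty.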

  12. Herd behavior in a distributed system • Planetary-Scale Event Prop & Routing System • (a.k.a. PsEPR) [Brett et al., WORLDS 2005] • Runs on PlanetLab • Aims for very large scale • Requires clients to be distributed evenly among servers • Clients keep ordered preference lists of servers • Prefer “nearby” servers (based on all-pairs-ping) • On server failure: • Demote failed server • Try to connect to top server on list

  13. PsEPR system structures [Figure: two client/server structures, one labeled “Desirable” and one labeled “Undesirable”]

  14. Herd behavior in a distributed system: what went wrong with PsEPR • Initially, clients were generally balanced among servers • As servers/links failed: • The same servers tended to look bad to most clients • So, client preference lists tended to converge • So, clients tended to connect to a small subset of servers • Clients mostly converged on a few servers: • These servers became overloaded • Server-local response-time monitors caused restarts • Causing further convergence of client preference lists • Clients all moved to the next server on their list • At a rate governed by server restart times • Fix: adjust ordering by success count + random #
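The fix on slide 14 is stated only as “success count + random #”. The sketch below shows one way such an ordering could break up a herd. It assumes every client has ended up with the same view of the servers (the converged case described above) and scores each server as its success count plus a uniform random term. The scoring formula, the `jitter` range, and all names are assumptions made for illustration, not PsEPR's actual code.

```python
import random
from collections import Counter


def pick_server(success_count, rng, jitter):
    """Return the server with the highest score = success count + random term."""
    scores = {
        server: count + (rng.uniform(0.0, jitter) if jitter > 0 else 0.0)
        for server, count in sorted(success_count.items())
    }
    # Ties go to the first server in sorted order, as a fixed preference list would.
    return max(scores, key=scores.get)


def assign_clients(success_count, clients, jitter, seed=0):
    """Count how many clients end up connected to each server."""
    rng = random.Random(seed)
    return Counter(pick_server(success_count, rng, jitter) for _ in range(clients))


# Hypothetical converged view: every client has recorded the same success counts.
shared_view = {"s0": 5, "s1": 5, "s2": 5, "s3": 4, "s4": 1}

herd = assign_clients(shared_view, clients=300, jitter=0.0)
spread = assign_clients(shared_view, clients=300, jitter=2.0)

if __name__ == "__main__":
    print("no random term:  ", dict(herd))
    print("with random term:", dict(spread))
```

Without the random term every client makes the same locally sensible choice and lands on one server. With it, servers of comparable quality share the load, while a clearly worse server (`s4` here) is still avoided because the random term is too small to make up the difference.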

  15. Outline • Examples • What is/is not “emergent misbehavior”? • A research agenda • Thoughts about visions of the future • Related work

  16. One definition of emergent behavior • “Emergent behavior is that which cannot be predicted through analysis at any level simpler than that of the system as a whole.” (George Dyson, 1998) • Emergent misbehavior is just emergent behavior that we don’t want

  17. Distinguishing between emergent and “normal” misbehavior • Misbehavior that is not emergent: • Single-component bugs that break the whole system • Inherently inefficient algorithms • Insufficient resources • Much work on computer systems reliability • Focuses on handling faults • Aims for “correct by construction” • Emergent misbehavior tends to be: • Global misbehavior arising from “correct” local behaviors • Related to the composition of independent parts • Related to delays and to decentralized control • It might not ever be possible to be definitive

  18. Outline • Examples • What is/is not “emergent misbehavior”? • A research agenda • Thoughts about visions of the future • Related work

  19. Outline of a proposed research agenda • Create a taxonomy of emergent misbehaviors • To guide the rest of the agenda • Create a taxonomy of frequent causes • Generalize when possible; tie back to taxonomy #1 • Develop detection and diagnosis techniques • Look for distinctive signatures from taxonomies • Develop prediction techniques • For better prediction of performance and failures • Develop amelioration techniques • System design tricks to avoid emergent misbehavior • Develop testing techniques • Strategies for smoking out emergent misbehavior during testing

  20. Taxonomy #1: kinds of emergent misbehavior • Thrashing • Unwanted synchronization • Unwanted oscillation or periodicity • Deadlock • Livelock • Phase change • Chaotic behavior • etc.

  21. Taxonomy #2: frequent causes of emergent misbehavior • Unexpected resource sharing • Massive scale • Decentralized control • Lack of composability • Misconfiguration • Unexpected inputs or loads • Communication delay • etc.

  22. There’s a lot more work to do! • A little more discussion in the paper … • Hopefully, a few dissertations, from people with more energy than I have.

  23. Outline • Examples • What is/is not “emergent misbehavior”? • A research agenda • Thoughts about visions of the future • Related work

  24. Visions of the future (large-scale and enterprise systems) • Automatic control of data centers and services • Beyond “lights out” to “minimal human involvement” • Feedback control of almost everything • Service-oriented computing • Construction by composition of “services” • Correctness by construction • Loose coupling via networks • Declarative approaches • “Models” for components and their composition

  25. Visions of the future: ignoring emergent misbehavior? • Automatic control of data centers and services • Feedback loops can lead to surprises • Especially when several loops are working at cross purposes • Service-oriented computing • Composition of dynamic behaviors could yield surprises • Loose coupling via networks: adds latency • Declarative approaches • Rule-based systems are hard to debug • Less explicit control over dynamics than procedural style?
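One way to see how a feedback loop combined with added latency can surprise its designer is a toy proportional controller that acts on stale measurements. This is an illustrative sketch only and is not taken from the talk: the update rule, the gain of 0.8, and the delay of 3 steps are all assumed values chosen to make the effect visible.

```python
def run_controller(gain, delay, steps=60, start=1.0, target=0.0):
    """Drive a value toward `target`, correcting on a measurement `delay` steps old."""
    history = [start] * (delay + 1)  # oldest sample first
    for _ in range(steps):
        observed = history[-1 - delay]  # the controller only sees a stale sample
        history.append(history[-1] - gain * (observed - target))
    return history[delay:]  # trajectory from the first controlled step onward


if __name__ == "__main__":
    fresh = run_controller(gain=0.8, delay=0)
    stale = run_controller(gain=0.8, delay=3)
    print("final error, no delay:      ", abs(fresh[-1]))
    print("largest recent error, delay:", max(abs(x) for x in stale[-15:]))
```

With no delay the error shrinks by a constant factor each step. With the same gain and a three-step delay, the controller keeps correcting for a state that no longer exists, and the value swings past the target with growing amplitude. The controller logic is identical in both runs; only the latency differs.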

  26. Outline • Examples • What is/is not “emergent misbehavior”? • A research agenda • Thoughts about visions of the future • Related work

  27. Related work • Lots of related work on good side of emergence • E.g.: Dyson, Darwin Among the Machines (1998) • Non-computer work on misbehavior: • Parunak & VanderBok (1997) • “Managing emergent behavior in distributed control systems” • Computer systems work on emergent misbehavior: • Term first(?) used by Ed Nisley (Dr. Dobb’s J., 2004) • Steven Gribble (HotOS, 2001) • Making systems more robust in the face of the unexpected • National Research Council report: A Research Agenda for Networked Systems of Embedded Computers (2001)

  28. Summary • We’ve already seen lots of emergent misbehavior • Trends could make things worse in the future • CS research on reliability has focussed on faults • We need to understand emergent misbehavior • We need ways to cope with it • A lot more detail in the paper

  29. Advice for OSDI Authors • There will be no extensions to the deadline • Papers that violate the format requirements will be rejected.
