
Bristlecone: A Language for Robust Software Systems

Bristlecone is a language designed to address current software instability issues by decoupling operations, specifying task dependencies, and using transactions for error recovery. It enhances robustness and enables automated recovery from critical failures. Learn about the web server example, object tagging, and task specifications in Bristlecone language.


Presentation Transcript


  1. Bristlecone: A Language for Robust Software Systems • Brian Demsky, Alokika Dash • University of California, Irvine

  2. Current Software is All or Nothing • Most current software either executes perfectly or fails completely • Small errors cause catastrophic failures • Violate fundamental developer assumptions • Violated assumptions prevent continued execution • No clean way to recover from errors • Unclear what parts of the program are affected • Failure may leave key data structures partially updated

  3. Degraded Service Can Be Desirable • Consider a bug that affects an embedded application in a single web page • Current browsers often close all browser windows and exit • Users find this behavior frustrating • Better option is to isolate the failure and halt only the embedded application component

  4. Motivating Example: Web Server • Request r=receiverequest(); logRequest(r); processRequest(r);

  5. Motivating Example: Web Server • Request r=receiverequest(); logRequest(r); processRequest(r); • Failure in log operation (marked CRASH on the slide) • Prevents serving this request • If the logging failure is independent of the request, it can potentially cause the system to fail to serve any requests
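
The coupling on this slide can be sketched in plain Python (not Bristlecone). The function names mirror the slide; the simulated "log disk full" error and the `served` list are assumptions added for illustration.

```python
served = []

def receive_request():
    return "GET /index.html"

def log_request(request):
    # Simulated logging failure that has nothing to do with the request itself.
    raise IOError("log disk full")

def process_request(request):
    served.append(request)

def handle_one_request():
    r = receive_request()
    log_request(r)       # raises, so the next line never runs
    process_request(r)

try:
    handle_one_request()
except IOError as exc:
    failure = str(exc)
```

Because the three calls are written as one sequence, the request is never served even though serving it does not need the log.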

  6. Real World Example • First Flight of the Ariane 5 Rocket • Uncaught integer overflow in the computation of horizontal bias • Overflow shut down the inertial reference system • Inertial reference system sent debug information to the guidance system • Guidance system used these invalid values to set incorrect nozzle deflections • $120 million rocket crashed • Horizontal bias value is not even used! • Lesson: Critical system operations were coupled to non-critical operations

  7. Observations about Recovery • Challenging to recover from failure with traditional program structures • Unclear what code was doing • Are data structures consistent? • What depends on the failed code? What is still safe to do? • Code structure introduces artificial dependences • In the absence of precise dependence information, we must assume the worst case • Failures can propagate through artificial dependences, so small errors can cause catastrophic failures

  8. Where do we lose information? • Specifications describe functionality requirements • Architecture/implementation phases map requirements into sequence of operations • Mapping process loses information: • Boundaries of operations (What is A and what is B?) • Temporal dependences (Does B require A?) • Data dependences (Does B use data produced by A?) • Lost information introduces artificial dependences

  9. Designing for Robustness • Underlying assumption: All code contains bugs • Goal: Mitigate the consequences • Approach: • Decompose application into many small tasks • Specify dependences between these tasks • Data dependences • Control dependences • Use transactions to prevent failures from exposing partially updated data structures • Use dependence information to continue past failures

  10. Bristlecone Language • Program is specified as a set of tasks • Task specifications describe task dependences • Tasks have transactional semantics • Runtime system reasons about dependences to execute past failures (automated recovery)

  11. Web Server Example • Decoupled operations • Log Request and Send Page tasks are independent • Failure of one does not affect the other • Diagram of tasks: Accept Connection, Read Request, Log Request, Send Page

  12. Specifying Object States • Different object states support different functionality (Type State) • Use flag construct to label conceptual object states • Use these flags to determine when to perform operations • Can differentiate between operations that have true data dependences and operations that just operate on same objects • Example: class WebRequest { Flag initialized; Flag send_page; Flag write_log; … }
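
A minimal Python sketch of flag-labelled object states, assuming flags behave like named booleans. The flag names come from the slide; the `in_state` helper and the choice to start a new object in `initialized` are hypothetical.

```python
class WebRequest:
    """Plain-Python stand-in for the slide's WebRequest class and its three flags."""
    def __init__(self):
        # Assumption: a freshly created request starts in the initialized state.
        self.flags = {"initialized": True, "send_page": False, "write_log": False}

def in_state(obj, **required):
    """Return True when every named flag on obj has the required value."""
    return all(obj.flags[name] == value for name, value in required.items())

w = WebRequest()
ready_to_read = in_state(w, initialized=True)   # a read task may run
ready_to_send = in_state(w, send_page=True)     # a send task may not run yet
```

Two operations that test different flags on the same object have no true data dependence on each other, which is the distinction the slide draws.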

  13. Tagging Objects • Motivation: Consider the web server example • Each connection has: • A Socket object that provides communication • A WebRequest object that stores application specific state • Need to pair the correct Socket and WebRequest objects together • Solution: Tag the group of objects • Diagram: Connection Tag, Socket Object, WebRequest Object

  14. Tagging Objects • Tags group object instances • Tags provide mechanism • Tags have types • Can create many instances of a tag type • Each instance defines a group • Can bind tag instances to objects • Tags can specify that task parameters must be in the same group
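
The grouping role of tags can be sketched in Python as follows. This is an illustration under assumptions, not Bristlecone syntax: the `Tag.bind` method and the `same_group` helper are hypothetical names.

```python
class Tag:
    """One tag instance; each instance defines its own group of objects."""
    def __init__(self, tag_type):
        self.tag_type = tag_type
        self.members = []

    def bind(self, obj):
        self.members.append(obj)
        obj.tags.append(self)

class Socket:
    def __init__(self):
        self.tags = []

class WebRequest:
    def __init__(self):
        self.tags = []

def same_group(a, b, tag_type):
    """True when a and b are bound to the same tag instance of the given type."""
    return any(t.tag_type == tag_type and t in b.tags for t in a.tags)

# Two connections, so two instances of the same tag type.
c1, c2 = Tag("connection"), Tag("connection")
s1, w1, w2 = Socket(), WebRequest(), WebRequest()
c1.bind(s1)
c1.bind(w1)
c2.bind(w2)
```

Here `s1` pairs with `w1` but not with `w2`, even though both WebRequest objects carry a tag of type `connection`.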

  15. Task Specifications • Describe data dependences of tasks • Describe effect of tasks on objects • Example: /* This task reads a request from a client. */ task readRequest(WebRequest w in initialized with connection t, Socket s in IO_Pending with connection t) { ... taskexit(w: initialized:=false, send_page:=true, write_log:=true); }
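
The readRequest specification above has two parts: a guard on the parameters and a taskexit that updates flags. A rough Python model, assuming flags are booleans and a tag is any shared object; `Obj` and `guard_ok` are hypothetical helpers, and the task body stays elided as on the slide.

```python
class Obj:
    def __init__(self, kind, tag, **flags):
        self.kind, self.tag, self.flags = kind, tag, dict(flags)

def guard_ok(w, s):
    """Guard from the slide: w in initialized, s in IO_Pending, same connection tag."""
    return (w.flags.get("initialized", False)
            and s.flags.get("IO_Pending", False)
            and w.tag is s.tag)

def read_request(w, s):
    # ... body elided on the slide ...
    # taskexit(w: initialized:=false, send_page:=true, write_log:=true)
    w.flags.update(initialized=False, send_page=True, write_log=True)

conn = object()  # stands in for one connection tag instance
w = Obj("WebRequest", conn, initialized=True)
s = Obj("Socket", conn, IO_Pending=True)
if guard_ok(w, s):
    read_request(w, s)
```

After the task exits, `w` no longer satisfies this guard, and its new flags make it eligible for whichever tasks are guarded on `send_page` and `write_log`.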

  16. Bristlecone Task Semantics • Runtime invokes tasks • Tasks can be invoked when objects are available in the heap that satisfy the task’s parameter guards • Tasks have transactional memory semantics • All operations are executed or none • Task execution appears to occur in a single instant • Failures cause transactions to abort and restore consistency
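
The all-or-nothing behaviour can be imitated in Python with a snapshot and restore. The slides do not say how the Bristlecone runtime implements transactions, so the deep-copy approach below is a swapped-in technique chosen for brevity, and `run_transactionally` is a hypothetical name.

```python
import copy

def run_transactionally(task, heap):
    """Run task(heap); on any exception restore heap to its pre-task contents."""
    snapshot = copy.deepcopy(heap)
    try:
        task(heap)
        return True
    except Exception:
        heap.clear()
        heap.update(snapshot)   # abort: discard the partial update
        return False

def buggy_update(heap):
    heap["inventory"]["widgets"] -= 1                      # partial update
    raise ZeroDivisionError("simulated arithmetic failure")

heap = {"inventory": {"widgets": 5}}
committed = run_transactionally(buggy_update, heap)
```

The failed task leaves no trace in `heap`, so other tasks never observe the half-finished decrement.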

  17. Failure-free Execution • Diagram of tasks: Accept Connection, Read Request, Log Request, Send Page

  21. Error Detection • Catching operating system signals • Arithmetic exceptions • Null pointer exceptions • Library signals • Socket errors • … • Runtime language checks • Array out of bounds exceptions • Assertions • Imperative consistency checks • Declarative data structure specifications
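
As a loose analogy for the runtime language checks listed above, the Python sketch below classifies failures by exception type. The mapping from Python exception classes to the slide's categories is an assumption, and `detect` and `check_sorted` are hypothetical helpers.

```python
def check_sorted(xs):
    # Imperative consistency check on a data structure.
    if xs != sorted(xs):
        raise AssertionError("list is not sorted")

def detect(task):
    """Run task and classify any failure using the slide's categories."""
    try:
        task()
        return "ok"
    except ArithmeticError:
        return "arithmetic exception"
    except IndexError:
        return "array out of bounds"
    except AssertionError:
        return "assertion"
    except OSError:
        return "socket or library error"

results = [
    detect(lambda: 1 // 0),
    detect(lambda: [1, 2, 3][5]),
    detect(lambda: check_sorted([3, 1])),
    detect(lambda: None),
]
```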

  22. Failure Recovery • Transactions restore data structures to previous consistent state • Problem: Re-executing the same task will likely result in the same failure • Solution: Use task specifications to determine what other tasks can be safely executed
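
A small Python sketch of this recovery policy, assuming each task is guarded by a single flag. The flag names follow the earlier slides; the `run` loop, the `failed` set, and the dictionary used as an object are illustrative assumptions rather than the actual runtime design.

```python
import copy

def log_request(req):
    raise IOError("simulated logging failure")

def send_page(req):
    req["sent"] = True
    req["send_page"] = False   # the task's exit clears its own guard flag

# (task, flag that must be set before the task may run)
TASKS = [(log_request, "write_log"), (send_page, "send_page")]

def run(req):
    failed = set()             # tasks that already failed on this object
    progress = True
    while progress:
        progress = False
        for task, flag in TASKS:
            if not req.get(flag) or task.__name__ in failed:
                continue
            before = copy.deepcopy(req)
            try:
                task(req)
                progress = True
            except Exception:
                req.clear()
                req.update(before)          # abort: restore consistent state
                failed.add(task.__name__)   # do not re-execute the same task
    return failed

req = {"write_log": True, "send_page": True, "sent": False}
failed = run(req)
```

The logging task is aborted once and not retried, while the page is still sent because nothing in its guard depends on the log.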

  23. Automatic Recovery • Diagram of tasks: Accept Connection, Read Request, logRequest (CRASH), Send Page

  24. Automatic Recovery • Diagram of tasks: Accept Connection, Read Request, Log Request, Send Page

  25. Automatic Recovery Summary • Diagram of tasks: Accept Connection, Read Request, Log Request, Send Page

  26. Language Benefits • Use specifications to understand failure in a meaningful way • Use task specifications to reason how to recover from failures • Task specifications eliminate artificial dependences

  27. Task Dispatch • Goal: Determine which parameter objects satisfy task guards • Problem: Brute force search can be expensive • Our Approach maintains: • Parameter set of objects that satisfy an individual parameter’s guard • Active task queue of sets of parameter objects that collectively satisfy all of task’s guards

  28. Task Dispatch • Precisely maintain parameter sets • If an object is in a parameter set • It satisfies the flag component of the guard • Is bound to the correct types of tags • All objects that satisfy parameter’s guard are in parameter set • Active task queue is conservative • If a set of objects could potentially satisfy all of task’s guards, it is in the task queue • Must check that set of objects in a task queue invocation satisfies guards before invoking task
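
The split between a precise parameter set and a conservative task queue can be sketched in Python for a single-parameter task. This is a simplified model under assumptions: `flags_changed` and `dispatch` are hypothetical names, and tag checks are left out.

```python
class Obj:
    def __init__(self, name, **flags):
        self.name, self.flags = name, dict(flags)

def guard(obj):
    """Guard for a one-parameter task: the object must be in send_page."""
    return obj.flags.get("send_page", False)

parameter_set = set()   # kept precise: exactly the objects that satisfy guard
task_queue = []         # conservative: may hold stale entries

def flags_changed(obj):
    """Call whenever an object's flags change."""
    if guard(obj):
        if obj not in parameter_set:
            parameter_set.add(obj)
            task_queue.append(obj)
    else:
        parameter_set.discard(obj)   # the queue entry is left behind on purpose

def dispatch(task):
    invoked = []
    while task_queue:
        obj = task_queue.pop(0)
        if guard(obj):               # recheck the guard before invoking
            task(obj)
            invoked.append(obj.name)
    return invoked

a, b = Obj("a", send_page=True), Obj("b", send_page=True)
flags_changed(a)
flags_changed(b)
b.flags["send_page"] = False
flags_changed(b)                     # b leaves the set but stays queued
invoked = dispatch(lambda o: None)
```

Leaving stale entries in the queue keeps updates cheap; the recheck at dispatch time is what makes that safe.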

  29. Task Dispatch • When a new object is added to parameter set, create corresponding task queue invocations • Search for objects that satisfy tag guards • Idea: Use tags to prune search • When we add an object with a tag guard to the set, use tags to prune search of other parameter objects that must be bound to the same tag

  30. Task Binding Iteration • Structure computation as a list of iterators over tags and objects • Multiple types of iterators: • Over tags bound to object • Over objects bound to tag • Over objects in parameter set • Want to prune search early – ordering is important • Statically generate iterator orderings for each parameter set of each task
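
Why iterator ordering matters can be seen in a small Python sketch that composes the three iterator kinds in two different orders and counts the candidates each one visits. The class and function names are hypothetical, and the 100-connection workload is made up for illustration.

```python
class Tag:
    def __init__(self):
        self.members = []

class Obj:
    def __init__(self, tag):
        self.tags = [tag]
        tag.members.append(self)

def tags_bound_to(obj):
    yield from obj.tags

def objects_bound_to(tag):
    yield from tag.members

def objects_in(parameter_set):
    yield from parameter_set

def bind_set_first(w, socket_set):
    """Ordering A: every socket in the parameter set, then each socket's tags."""
    visited, matches = 0, []
    for s in objects_in(socket_set):
        for t in tags_bound_to(s):
            visited += 1
            if t in w.tags:
                matches.append(s)
    return matches, visited

def bind_tag_first(w, socket_set):
    """Ordering B: w's tags first, then only the objects bound to each tag."""
    visited, matches = 0, []
    for t in tags_bound_to(w):
        for o in objects_bound_to(t):
            visited += 1
            if o in socket_set:
                matches.append(o)
    return matches, visited

tags = [Tag() for _ in range(100)]
socket_set = {Obj(t) for t in tags}   # one socket per connection tag
w = Obj(tags[7])                      # a new request on connection 7
matches_a, visited_a = bind_set_first(w, socket_set)
matches_b, visited_b = bind_tag_first(w, socket_set)
```

Both orderings find the same socket, but starting from the tag visits only the two objects bound to that connection instead of all 100 sockets.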

  31. Initial Experiences • Implemented Bristlecone compiler and runtime • Have evaluated system on several benchmarks including: • Web Server • Web Spider • Chat Server • Developed Bristlecone and Java versions of each • Java versions were designed to use threads to provide resilience to failures • Randomly injected failures into executions

  32. Web Spider • Workload is a set of 100 web pages • Java version implemented using a thread pool architecture • 100 trials on each version • Randomly injected 3 halting failures into each execution • With injected failures • Java version fetched average of 6 pages • Bristlecone version fetched average of 91 pages

  33. Web Server • Web Server with support for e-commerce transactions • Java version spawns a thread for each connection • 200 trials on each version • Randomly injected 50 halting failures into each execution • With injected failures • Java failed to serve inventory requests in 4.5% of trials, Bristlecone failed in 1.5% • Java had correct inventory responses in 68.6%, Bristlecone in 100%

  34. Chat Server • Chat server allows multiple users to chat • Java version spawns a thread for each connection • 100 trials on each version • Workload sent 800 messages • Randomly injected 10 failures into each execution • With injected failures • Java version failed to serve 39.9% of messages • Bristlecone version failed to serve 19.3% of messages

  35. Related Work • Traditional fault tolerance • N-version programming • Recovery blocks • Exception handlers • Languages • Linda / Tuple spaces • Orc • Actors • Argus • Oz • Erlang • Software and Hardware Transactional Memory

  36. Conclusions • Bristlecone is an exciting approach to improve application reliability • Initial experiences are promising
