Fault-Tolerance in the RGE Model Andrew Grimshaw Anh Nguyen-Tuong University of Virginia This work partially supported by DARPA(Navy) contract # N66001-96-C-8527, DOE grant DE-FD02-96ER25290, DOE contract Sandia LD-9391, Northrup-Grumman (for the DoD HPCMOD/PET program), DOE D459000-16-3C and DARPA (GA) SC H607305A
Quick Overview • Background on Legion • RGE model • Exceptions & fault-tolerance examples • Summary
The Opportunity • Resources abound • people, computers • data, devices, ... • Complex environment • disjoint file systems • disjoint name spaces • security/fault-tolerance • …... • Primitive tools • Unrealized potential
Our Vision - One Transparent System • High-performance • Distributed • Secure • Fault-tolerant • Transparent
The Problem • To construct an extensible system that supports a wide variety of tools, languages, and computation models (including parallel languages), and allows diverse security, fault-tolerance, replication, resource management, and scheduling policies.
Technical Requirements • Extensible • Site autonomy • Secure • Fault-tolerant • High performance parallel processing • Multi-language • Scalable • Simple • Single namespace • Resource management • We cannot replace host operating systems • We cannot legislate changes to the internet • We cannot require that Legion run as “root”
Legion philosophy • Provide flexible mechanism • Allow users/application designers to choose a point in the space [Figure: design space with axes “kind of service”, “cost”, “level of service”]
Legion is object-based • Everything is an object • Legion’s core “system” objects • Hosts • Vaults • Binding Agents • All system objects can be replaced • Classes are objects [Figure: Legion Objects]
The object model • Legion objects • belong to classes • are logically independent, address-space-disjoint • communicate via non-blocking method calls • are Active or Inert • export object-mandatory member functions • are named using LOIDs [Figure: Host 1, Host 2, Host 3; Class of A, Class of B; objects A and B; call B.f() returning rval]
Legion Object Identifiers (LOID) • LOIDs name objects and are location-independent strings of bits • LOIDs have many fields (arbitrary) • Basic LOIDs have • type • “domain” • class id (integer) • instance id (integer) • public key (RSA)
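The basic LOID fields listed above could be modelled roughly as follows. This is a minimal sketch, not Legion's actual representation: the field types for `type` and `domain`, the `same_class` helper, and the `bindings` table are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LOID:
    """Hypothetical model of a basic LOID; it names an object, not a location."""
    type: int          # assumed to be an integer tag
    domain: str        # assumed to be a string
    class_id: int      # integer, per the slide
    instance_id: int   # integer, per the slide
    public_key: bytes  # RSA public key material, kept opaque here

    def same_class(self, other: "LOID") -> bool:
        # Illustrative helper: two LOIDs share a class if domain and class id match.
        return self.domain == other.domain and self.class_id == other.class_id


# Location independence: the name stays fixed while the binding changes.
a = LOID(type=1, domain="uva", class_id=7, instance_id=42, public_key=b"rsa-key")
bindings = {a: "host1:9000"}   # hypothetical binding table (LOID -> address)
bindings[a] = "host2:9000"     # the object migrated; its LOID did not change
```

Making the dataclass frozen keeps a LOID hashable, so it can serve directly as a lookup key in whatever plays the role of a binding agent.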
Reflective Graph Event Model (RGE Model)
Goals • Support users • flexibility & extensibility • Target system developers • familiar environment to end users (PVM, MPI, C++, Fortran) • Transparency • Separation of concerns • application developer, fault-tolerance, security, resource management • component-based • Component reuse • across environments: PVM, MPI, C++, Fortran • over time: component repository • Leverage existing technologies: reflection and event-based architectures
Reflection • Introspection • expose self-representation of system to developers • Causal connection • modify self-representation to change behavior • Used in many domains • programming languages (open-C++, CLOS), databases (BeeHive), operating systems
Self-Representation • Events • specify component interaction inside objects • allow graphs as event handlers • remote method invocations • late binding of policies • Graphs • specify interactions between objects • method invocations, control flow & data dependencies • graph annotations for meta-level information
Graphs • Macro-data flow • A graph is a function • First class graphs • remote execution, passed as arguments, transformed • Decentralized control • Reflective • manipulating graph = manipulating behavior • Example: @Main: x = A.foo(a); y = B.bar(b); z = C.add(x,y); print z [Figure: graph in which inputs a and b feed A.foo and B.bar, whose results feed C.add, producing z]
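The @Main example above can be sketched as a small data-flow graph that is itself a value: it can be passed around, executed, and transformed. This is an illustration only; `run_graph`, the dictionary encoding, and the local stand-ins for the remote methods `A.foo`, `B.bar`, and `C.add` are assumptions, not Legion's graph API.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical encoding: result name -> (function, names of the values it consumes).
Graph = Dict[str, Tuple[Callable[..., object], List[str]]]


def run_graph(graph: Graph, inputs: Dict[str, object], result: str) -> object:
    """Fire every node whose inputs are available until `result` is produced."""
    values = dict(inputs)
    pending = dict(graph)
    while result not in values:
        ready = [name for name, (_, deps) in pending.items()
                 if all(d in values for d in deps)]
        if not ready:
            raise ValueError("graph has unsatisfied or cyclic dependencies")
        for name in ready:
            fn, deps = pending.pop(name)
            values[name] = fn(*(values[d] for d in deps))
    return values[result]


def foo(a): return a * 2      # local stand-in for the remote call A.foo
def bar(b): return b + 1      # local stand-in for the remote call B.bar
def add(x, y): return x + y   # local stand-in for the remote call C.add


main_graph: Graph = {"x": (foo, ["a"]), "y": (bar, ["b"]), "z": (add, ["x", "y"])}
z = run_graph(main_graph, {"a": 3, "b": 4}, "z")
print(z)  # 11

# Manipulating the graph manipulates behavior: swap one node, keep the rest.
patched = dict(main_graph)
patched["y"] = (lambda b: 0, ["b"])
z_patched = run_graph(patched, {"a": 3, "b": 4}, "z")
```

Nodes `x` and `y` have no dependency on each other, so a distributed executor would be free to run them in parallel; this sketch runs them sequentially.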
Implicit Parameters • Annotations <tag, type, value> • carry meta-level information • Automatic propagation via call chain • Part of method execution environment • Examples: “root” = 1.07.01.effeabcde, “certificate” = xyz-1998-12 [Figure: graph in which a and b feed A.foo and B.bar, then C.add, producing z]
Graph annotations • Function of • three arguments • two outputs • Annotate the graph with resource information [Figure: graph annotated with “needs C90, 100 CPU minutes”, “20 bytes”, “100 Megabytes”]
Events • Decouples components spatially and temporally • flexibility & extensibility • Unified & familiar interface • not ad-hoc • easy to understand • Used for variety of purposes • Java Distributed Events, CORBA Event Notification Service, X-Windows, SPIN Kernel
Events • Events • typed, carry arbitrary data • Event Handlers • can preempt other handlers from being invoked • create and announce new events • can make remote method invocations • Event Manager • dispatches handlers (immediate or deferred)
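The pieces listed above (typed events, handlers that may preempt or announce further events, and a manager that dispatches immediately or later) can be sketched as follows. All class and function names here are hypothetical, and the `PREEMPT` return value is just one possible convention for preemption.

```python
from collections import defaultdict, deque

PREEMPT = object()  # a handler returns this to stop later handlers (assumed convention)


class Event:
    def __init__(self, kind, data=None):
        self.kind = kind    # the event's type
        self.data = data    # arbitrary payload


class EventManager:
    def __init__(self):
        self._handlers = defaultdict(list)
        self._deferred = deque()

    def register(self, kind, handler):
        self._handlers[kind].append(handler)

    def announce(self, event, deferred=False):
        if deferred:
            self._deferred.append(event)   # dispatched later by run_deferred()
        else:
            self._dispatch(event)

    def run_deferred(self):
        while self._deferred:
            self._dispatch(self._deferred.popleft())

    def _dispatch(self, event):
        for handler in list(self._handlers[event.kind]):
            if handler(event, self) is PREEMPT:
                break


log = []


def access_control(event, mgr):
    if event.data["caller"] != "alice":
        log.append("denied")
        return PREEMPT   # later handlers never see this event
    mgr.announce(Event("MethodReady", event.data), deferred=True)


def trace(event, mgr):
    log.append("received " + event.data["method"])


def invoke(event, mgr):
    log.append("invoke " + event.data["method"])


mgr = EventManager()
mgr.register("MethodReceive", access_control)
mgr.register("MethodReceive", trace)
mgr.register("MethodReady", invoke)
mgr.announce(Event("MethodReceive", {"caller": "mallory", "method": "f"}))
mgr.announce(Event("MethodReceive", {"caller": "alice", "method": "g"}))
mgr.run_deferred()
print(log)  # ['denied', 'received g', 'invoke g']
```

Because handlers are looked up by event type at dispatch time, registering or removing a handler changes behavior without touching the component that announces the event.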
Protocol Stack • Sender: Method Invocation, Security, Method Assembly, Network • Receiver: Method Invocation, Security, Method Assembly, Network
Events & Handlers [Figure: receiver protocol stack (Method Invocation, Security, Method Assembly, Network) annotated with the labels Message Receive, Method Assembly, Method Receive, Access Control, Method Ready, Ready Invocations]
Adding new component [Figure: sender and receiver protocol stacks (Method Invocation, Security, Method Assembly, Network); on the receiver, a new handler Security:Decrypt is registered for the Message Receive event, which Method Assembly also handles]
Events Recap • Reflective • Introspective • structural implementation of objects • denote state transition inside of objects • Causally connected • adding/removing events or event handlers has direct effect on behavior • Now add graphs!
Graphs & Events • Use generic graph inside of generic components • generic remote method invocations • example: message logging, “logger” = 1.08.01... [Figure: objects Root, A, B, C, D; Message Receive event with Method Assembly and Message Logging handlers; Message Logger; Geographical Display]
Graphs as Event Handlers • Associate graphs with events dynamically • delayed policy, set at run-time • per method call using annotation • across methods • object designers need not anticipate all uses • Reflection • introspection: events “reflect” state transitions • causal connection: graphs represent actions
Remote Notification • Graphs as event handlers • Register graph • run-time binding • multiple graphs • multiple policies [Figure: Object A “fires” an event. What to do? The registered notification graph calls Some_object.Some_method()]
Special case: Exception Propagation • Same object, multiple policies • Notify root • Notify immediate caller • Notify monitor • Designer of Server does not have to worry about where or how to propagate exceptions! [Figure: App X, App Y, C, Server.work, Event “Exception!”, Monitor]
Exception management • Exception <ExceptionType, exceptionData> • ExceptionType <majorTypeID, minorTypeID> • Exception Interest <exceptionType, Graph> • Exception Interest Set (EIS): a set of exception interests, EIS = <e_1, e_2, ..., e_n> • EIS carried in implicit parameters [Figure: graph C, S.service, C.return, annotated with the EIS] We’ve done three policies
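The tuples above map naturally onto small records, with the EIS travelling in the implicit parameters and each interest's graph run when a matching exception is raised. The sketch below is an assumption-laden illustration: graphs are plain callables, `raise_exception` is a made-up name, and treating a minor ID of `None` as a wildcard is not stated on the slide.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple

# (majorTypeID, minorTypeID); None as minor means "any minor type" (assumption).
ExceptionType = Tuple[int, Optional[int]]


@dataclass
class RGEException:
    exception_type: ExceptionType
    exception_data: object


@dataclass
class ExceptionInterest:
    exception_type: ExceptionType
    graph: Callable[[RGEException], None]   # stand-in for a notification graph


def raise_exception(exc: RGEException, implicit_parameters: dict) -> int:
    """Run the graph of every interest in the EIS that matches `exc`."""
    notified = 0
    for interest in implicit_parameters.get("EIS", []):
        major, minor = interest.exception_type
        if major == exc.exception_type[0] and minor in (None, exc.exception_type[1]):
            interest.graph(exc)
            notified += 1
    return notified


notes = []
eis = [
    ExceptionInterest((1, None), lambda e: notes.append(("caller", e.exception_data))),
    ExceptionInterest((1, 3), lambda e: notes.append(("root", e.exception_data))),
    ExceptionInterest((2, None), lambda e: notes.append(("monitor", e.exception_data))),
]
count = raise_exception(RGEException((1, 3), "disk full"), {"EIS": eis})
print(count, notes)
```

The server that raises the exception only walks the EIS it was handed; which parties get notified is decided entirely by whoever populated the set.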
App Monitoring & Shutdown • Application Manager • Object Set: A, B, C, D, E, F • Now the application manager can easily shut down the application • Note generic nature of Application Manager [Figure: objects A through F report “Object A created”, “Object B,C,D created”, “Object E created”, “Object F created” to the Application Manager]
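A generic application manager of this kind only needs to collect creation notifications and later walk the resulting set. The following sketch assumes hypothetical `AppObject` and `ApplicationManager` classes and an arbitrary creation order; in the RGE model the notifications would arrive as events rather than direct method calls.

```python
class ApplicationManager:
    """Collects 'object created' notifications; knows nothing about the application."""

    def __init__(self):
        self.object_set = {}

    def object_created(self, name, obj):
        self.object_set[name] = obj

    def shutdown(self):
        stopped = []
        for name, obj in self.object_set.items():
            obj.deactivate()
            stopped.append(name)
        self.object_set.clear()
        return stopped


class AppObject:
    def __init__(self, name, manager):
        self.name = name
        self.manager = manager
        self.active = True
        manager.object_created(name, self)   # stands in for a creation event

    def create(self, *names):
        # Children inherit the manager, much as an annotation would propagate.
        return [AppObject(n, self.manager) for n in names]

    def deactivate(self):
        self.active = False


manager = ApplicationManager()
a = AppObject("A", manager)
b, c, d = a.create("B", "C", "D")
e = c.create("E")[0]    # which object creates E and F is arbitrary here
f = e.create("F")[0]
tracked = sorted(manager.object_set)
stopped = manager.shutdown()
```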
Failure suspectors at various levels • simple/complex pings • hardware support • heartbeat • Network Weather Service (Wolski) • Legion host/class detection • Use events to notify & take actions • register graph [Figure: Failure Detector, Failure Suspector X, event “Failure Suspected”, Notification Graph calling Some_object.Some_method()]
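As one concrete instance of the heartbeat option above, a suspector can record the last heartbeat per object and announce a suspicion once a timeout passes. This is a sketch under stated assumptions: `HeartbeatSuspector` is a made-up class, time is passed in explicitly to keep the example deterministic, and the `on_suspect` callback stands in for the registered notification graph.

```python
class HeartbeatSuspector:
    def __init__(self, timeout, on_suspect):
        self.timeout = timeout
        self.on_suspect = on_suspect   # stand-in for the registered notification graph
        self.last_seen = {}
        self.suspected = set()

    def heartbeat(self, obj, now):
        self.last_seen[obj] = now
        self.suspected.discard(obj)    # a late heartbeat clears the suspicion

    def check(self, now):
        for obj, seen in list(self.last_seen.items()):
            if now - seen > self.timeout and obj not in self.suspected:
                self.suspected.add(obj)
                self.on_suspect(obj)   # announce "Failure Suspected" exactly once


alerts = []
suspector = HeartbeatSuspector(timeout=5, on_suspect=alerts.append)
suspector.heartbeat("X", now=0)
suspector.heartbeat("Y", now=0)
suspector.heartbeat("Y", now=4)
suspector.check(now=6)    # X has been silent for 6 > 5; Y was seen 2 ago
suspector.check(now=7)    # X is already suspected, so no duplicate alert
print(alerts)  # ['X']
```

The suspector only suspects; what to do about it (restart, reissue, notify a monitor) is left to whatever graph was registered, which is the separation the slide describes.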
Replication for Stateless Objects • Stateless objects • idempotent • Proxy objects • farm out requests • keep track of outstanding requests • reissue requests on failure or timeout • Possible configurations • proxy stores requests on stable storage • send requests to k of n workers [Figure: Client calls S.op() on the Proxy, which forwards to S1.op, S2.op, S3.op; event “MethodDone”]
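The proxy behavior described above can be sketched as follows. Because the replicas are stateless and the operation idempotent, reissuing a request to another replica is safe. The names `Proxy` and `WorkerFailed` are hypothetical, replicas are modelled as plain callables, and timeouts and stable storage are left out.

```python
class WorkerFailed(Exception):
    """Raised by a replica stand-in to simulate a failure."""


class Proxy:
    def __init__(self, workers):
        self.workers = list(workers)   # callables standing in for S1.op, S2.op, ...
        self.outstanding = {}          # request id -> arguments not yet answered
        self._next_id = 0

    def op(self, *args):
        request_id = self._next_id
        self._next_id += 1
        self.outstanding[request_id] = args
        for worker in self.workers:
            try:
                result = worker(*args)
            except WorkerFailed:
                continue               # reissue the same request to the next replica
            del self.outstanding[request_id]   # corresponds to "MethodDone"
            return result
        raise WorkerFailed("all replicas failed; request remains outstanding")


def s1_op(x):
    raise WorkerFailed("S1 is down")


def s2_op(x):
    return x * x


proxy = Proxy([s1_op, s2_op])
answer = proxy.op(7)
print(answer)  # 49
```

The client calls `proxy.op` exactly as it would call `S.op()`, so the replication policy stays invisible to it.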
2 Phase Commit • Simple algorithm (Koo & Toueg ‘87) • Example usage • distributed snapshots • coordinated checkpointing & restart • transactions • RGE • provides building blocks for system developers • experts write 2PC component • handles coordination of objects
2 Phase Commit • Coordinator (C) • Phase 1: ask participants to take checkpoint; wait for votes • Phase 2: if all say “Yes” decide YES, otherwise NO; commit decision; tell participants of decision • Participants (Ps) • Phase 1: vote “Yes” or “No”; if “Yes” take tentative checkpoint; notify C • Phase 2: await decision from C; if “Yes” commit checkpoint
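The coordinator and participant roles above can be sketched in a single process as follows. This is a simplified illustration, not the Koo and Toueg algorithm in full: there is no messaging, failure handling, or orphan-message control, and the `Participant` class, its `state` field, and `two_phase_commit` are assumed names.

```python
class Participant:
    def __init__(self, name, can_checkpoint=True):
        self.name = name
        self.can_checkpoint = can_checkpoint
        self.state = 0           # hypothetical application state to checkpoint
        self.tentative = None
        self.committed = None

    def take_tentative_checkpoint(self):
        """Phase 1: vote, taking a tentative checkpoint when voting Yes."""
        if not self.can_checkpoint:
            return False         # vote "No"
        self.tentative = self.state
        return True              # vote "Yes"

    def decide(self, commit):
        """Phase 2: commit the tentative checkpoint or discard it."""
        if commit and self.tentative is not None:
            self.committed = self.tentative
        self.tentative = None


def two_phase_commit(participants):
    # Phase 1: ask every participant to checkpoint and collect the votes.
    votes = [p.take_tentative_checkpoint() for p in participants]
    decision = all(votes)        # YES only if every vote was "Yes"
    # Phase 2: tell every participant the decision.
    for p in participants:
        p.decide(decision)
    return decision


good = [Participant("P1"), Participant("P2")]
good[0].state, good[1].state = 10, 20
committed = two_phase_commit(good)

mixed = [Participant("P3"), Participant("P4", can_checkpoint=False)]
mixed[0].state = 30
aborted = two_phase_commit(mixed)
```

In the second run P3 voted Yes and took a tentative checkpoint, but because P4 voted No the decision is NO and P3's tentative checkpoint is discarded.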
2 Phase Commit: Participant • TakeTentativeCKPT() { if (decide “Y”) { take tentative checkpoint; say “Y” } else { say “N” } } • Graphs registered as event handlers • “Yes”: Coordinator.Vote_YES() • “No”: Coordinator.Vote_NO() • “Error”: Coordinator.NotifyErr()
2 Phase Commit • (1) Add methods to participants: TakeTentativeChkpt, CommitChkpt, ... • (2) Orphan messages: turn off communication at end of Phase 1, turn back on after Phase 2 • (3) Missing messages: counting algorithm • (4) Optimizations: reduce # of objects taking checkpoint, exploit object semantics • RGE • (1) Event handlers to register & export methods; object annotations • (2 & 3) Hook up with message send and method send events • (4) Use graph annotations: dynamically available semantic information (WORM, functional, idempotent) • (voting) Use event notification model for exceptions & voting
2 Phase Commit • Rollback similar • System developer / Fault-Tolerance expert • implements 2PC algorithm • RGE provides building blocks • algorithm manipulates reflective data structures (graphs & events) • algorithm manipulates the user computation • reusable across object-based environments • Mentat, CORBA • User need not know about 2PC and details of implementation
m components: Security, Resource management, Failure Detector, Event notification service, Active methods, Bag of task scheduling, ... • n environments: PVM, MPI, Mentat, Fortran, CORBA (in progress) • z applications: Other applications
Related Work • Events • Java Distributed Event Specification • CORBA’s Event Notification Service • Ensemble • SPIN Kernel • Graphs • LDF • HeNCE
Conclusion • Reflective model advantageous • flexible & extensible at multiple levels • functionality/policies expressed as components • generic & reusable components • written by experts • leverage existing algorithms • enables experimental research • quick prototyping of algorithms • Achieves reuse across programming environments • PVM, MPI, C++, Fortran • Enables composition of functionality • provided high level policies are compatible • Deployed today
Three simple policies • Notify caller - “flow backward” • nested exceptions
“Flow-forward” • Rather than passing an error back, a “bottom” token is propagated forward
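The flow-forward policy can be illustrated with a wrapper that turns a failure into a “bottom” value and lets it travel along the data dependencies instead of back to the caller. The `Bottom` class and the `flow_forward` decorator are hypothetical names, and the functions are local stand-ins for remote methods.

```python
class Bottom:
    """Token that replaces a result when the producing method failed."""

    def __init__(self, exception):
        self.exception = exception


def flow_forward(fn):
    def wrapper(*args):
        for arg in args:
            if isinstance(arg, Bottom):
                return arg            # pass the token on without running fn
        try:
            return fn(*args)
        except Exception as exc:
            return Bottom(exc)        # the failure becomes data and flows forward
    return wrapper


@flow_forward
def foo(a):
    return 10 // a                    # fails when a == 0


@flow_forward
def bar(b):
    return b + 1


@flow_forward
def add(x, y):
    return x + y


x = foo(0)        # Bottom(ZeroDivisionError)
y = bar(4)        # 5
z = add(x, y)     # the bottom token from foo reaches the end of the graph
ok = add(foo(5), bar(4))
```

Only the consumer of the final value needs to check for the token, so intermediate methods such as `add` never see the failure.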