BEACON
Edited: Dec 11th
Summary
• Principle
• Scenarios
• Existing Technologies
Logical Representation of BEACON and Beacon end-points (Beeps)
[Diagram labels: Enclave (×2), Application, Runtime systems, Node (×4), CPU, OS, BEACON and Exposé backplanes, Job and Resource manager, RAS system; arrows labeled Notifications and Commands.]
Logical Representation of BEACON
[Diagram labels: System, Enclave, Application, Runtime systems, Node (×4), CPU, Local BEACON (×4), Global BEACON (×4), Job and Resource manager, RAS channel; arrows labeled Notifications and Commands.]
BEACON Principle
• Two daemons help with failure containment, fault isolation, and security
• The global Beacon is created when the node boots; during startup it connects to the global Beacons on other active nodes
• The local Beacon is launched with the job in an enclave; it connects to the global Beacon on the same node
[Diagram labels: Enclave (×2), Node (×4), Local BEACON (×4), Global BEACON (×4).]
BEACON Services
[Layered diagram, listed top to bottom]
• OS, Runtime, Applications, RMS, RAS, Enclave services, EXPOSE
• Beacon-related services: Logger, Translators
• BEACON API
• Response management; Query management?
• BEACON Transport: reliable channel, unreliable channel
• IP multicast, TCP/IP, PAMI (BG/Q), IBM machine?, uGNI (XK6), Aries (XC30)?
BEACON Events
• Beacon will support two types of events
  • Internal events (subscriptions, Beacon maintenance, announcements, etc.)
  • External events (notifications, commands)
• Internal events can be produced and consumed by Beacon and its services
• External events are produced and consumed by all Beeps
• Open questions: Do we need discrete and stream events? Stream throttling? Scenarios?
BEACON Event Format
• Priority:
  • reliable or not
  • discrete or stream (if needed)
• Payload:
  • generated and interpreted by Beeps
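
As a rough illustration of the format above, an event could be modelled as a small record whose delivery attributes are read by BEACON while the payload stays opaque. This is only a sketch: the `Event` class, the `route_key` helper, and the field names `topic`, `reliable`, `stream`, and `payload` are assumptions, not part of the BEACON design.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    """Hypothetical in-memory view of a BEACON event (all names are assumptions)."""
    topic: str             # events carry a topic, e.g. "auth-requested"
    reliable: bool         # reliable or unreliable delivery
    stream: bool = False   # discrete (False) or stream (True), if streams are needed
    payload: bytes = b""   # opaque to BEACON; generated and interpreted by Beeps


def route_key(event: Event) -> tuple:
    """What a transport could inspect without ever decoding the payload."""
    return (event.topic, "reliable" if event.reliable else "unreliable")


fan_event = Event(topic="fan-failure", reliable=True,
                  payload=b"fan 17245 failed at 00:00:00")
print(route_key(fan_event))
```

Keeping the payload as raw bytes mirrors the slide's point that only Beeps generate and interpret it.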
BEACON Start-up
• Discovery and Topology
  • The Discovery and Topology daemon will reside on a permanent node (similar to a service node in BG)
  • It will help establish the topology of the global Beacon daemons; global daemons will contact it for parent discovery
  • Scalable, resilient (replication)
• Topology options are still being researched:
  • Small degree
  • Small diameter
  • High resilience
  • Multiple paths
  • Chord and other P2P topologies are candidates
BEACON Transport
• BEACON transport can deliver events reliably or unreliably
  • Unreliable delivery: no delivery guarantees
  • Reliable delivery: reliability will need to be end-to-end across a distributed chain of agents (a higher-level protocol than TCP)
• Event buffering
  • Required because every event message carries a Time-To-Live (TTL)
  • TTL is set by the publisher (from 0 for immediate delivery to a few minutes?)
  • A producer publishes an event, but the subscriber disappears before the event reaches it: the event is dropped after its TTL expires
  • A producer publishes an event, but the subscription has not yet propagated through the system: the event will be sent to the subscriber (by the logger) if its TTL is still valid
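
The two buffering cases above can be sketched with a small in-memory buffer. This is a minimal sketch under stated assumptions, not the BEACON transport: `BufferedEvent`, `EventBuffer`, and their methods are hypothetical names, and the clock is passed in explicitly (`now`, in seconds) so the behaviour is deterministic.

```python
from dataclasses import dataclass


@dataclass
class BufferedEvent:
    topic: str
    payload: str
    published_at: float  # seconds
    ttl: float           # seconds, set by the publisher; 0 means "immediate only"

    def expired(self, now: float) -> bool:
        return now - self.published_at >= self.ttl


class EventBuffer:
    """Hypothetical buffer that replays unexpired events to late subscribers."""

    def __init__(self):
        self._events = []
        self._subscribers = {}  # topic -> list of callbacks

    def publish(self, event: BufferedEvent) -> None:
        for callback in self._subscribers.get(event.topic, []):
            callback(event)
        self._events.append(event)  # kept only until its TTL runs out

    def subscribe(self, topic: str, callback, now: float) -> None:
        self._subscribers.setdefault(topic, []).append(callback)
        # Drop expired events, then replay what is still valid.
        self._events = [e for e in self._events if not e.expired(now)]
        for event in self._events:
            if event.topic == topic:
                callback(event)


buf = EventBuffer()
buf.publish(BufferedEvent("node-failure", "node 175 will fail", 0.0, 60.0))
buf.publish(BufferedEvent("node-failure", "short-lived notice", 0.0, 5.0))

late = []
buf.subscribe("node-failure", late.append, now=30.0)  # subscription arrives late
print([e.payload for e in late])
```

Only the event with the 60-second TTL reaches the late subscriber; the 5-second one has already been dropped.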
BEACON Services
• Services use the Beacon API (no other point-to-point messaging)
• Translators: translate events so that they can be understood semantically between Beeps
• Response Management: manages responses and coordinates different entities following recovery plans
• Logger: logs external events and re-publishes events whose TTL has not expired for new (or restarting) subscribers; duplicate events (re-published by the logger) will not be re-delivered to subscribers
• ? Query Management: manages queries within the BEACON framework ?
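
The logger's replay rule implies that something in front of each subscriber must recognise events it has already seen. A minimal sketch, assuming every event carries a unique `event_id`; that identifier and the `DedupFilter` class are hypothetical and do not appear in the slides.

```python
class DedupFilter:
    """Drops events that were already delivered once (hypothetical helper)."""

    def __init__(self):
        self._seen = set()
        self.delivered = []

    def deliver(self, event_id, payload):
        """Deliver the payload unless this event_id was seen before."""
        if event_id in self._seen:
            return False  # duplicate re-published by the logger
        self._seen.add(event_id)
        self.delivered.append(payload)
        return True


sub = DedupFilter()
sub.deliver(1, "node 175 will fail at 00:00:05")  # original delivery
sub.deliver(1, "node 175 will fail at 00:00:05")  # logger replay, ignored
sub.deliver(2, "node 176 will fail at 00:00:05")
print(len(sub.delivered))
```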
Translators
• Translators do not perform actions; they just read an event and publish a new event, using state information to translate the payload
• Subscribers would have to subscribe to events coming from the translators
• For any system that does a mapping and/or allocation, we need a translator that can reverse the mapping
• For ARGO, we will build a specific translator only when no other software in the process stack performs that translation (e.g., if MPI can tell that rank Y is failing when 0x1234 fails, then we do not need a translator for that)
Example scenario
• A fan has failed. This will cause several nodes and switches to fail within 5 seconds. The failure will affect several jobs and the network. Some of the jobs can take preventive measures to handle node failures; others cannot.
• The fan controller issues the event “fan 17245 failed at 00:00:00”
• “Translator process” A subscribes to “fan failures in the system”, picks up this message, and issues several messages of the form “node 175 will fail at 00:00:05”
• “Translator process” B subscribes to “node failures in the system”, picks up this message, and issues the message “node 73 of enclave foo will fail at 00:00:05”
• The enclave manager C subscribes to “node failures in enclave foo”, picks up this message, and issues messages of the form “process with rank 25 in MPI_COMM_WORLD will fail at 00:00:05”
Example scenario
Ideally speaking,
• Translator A uses information on the physical system topology; it could also use information on the current system health
• Translator B uses information on the nodes allocated to each enclave (by the global resource manager)
• Translator C uses information on the mapping of MPI processes to the nodes (by the partition manager)
Practically speaking,
• Creation of translators might be scenario based
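
The fan-failure scenario can be sketched as a chain of translators, each of which only reads an event and publishes new ones. This is an illustration under stated assumptions: the three lookup tables, the topic names, the dictionary event layout, and the fixed 5-second lead time are all hypothetical; only the identifiers (fan 17245, node 175, node 73 of enclave foo, rank 25) come from the scenario.

```python
# Hypothetical state each translator would obtain from its own source.
FAN_TO_NODES = {17245: [175]}                 # physical system topology
NODE_TO_ENCLAVE = {175: ("foo", 73)}          # allocation by the resource manager
ENCLAVE_NODE_TO_RANKS = {("foo", 73): [25]}   # MPI process mapping


def translator_a(event):
    """Fan failure -> predicted node failures, 5 seconds later."""
    if event["topic"] != "fan-failure":
        return []
    return [{"topic": "node-failure", "node": n, "at": event["at"] + 5}
            for n in FAN_TO_NODES.get(event["fan"], [])]


def translator_b(event):
    """System node failure -> node failure inside an enclave."""
    if event["topic"] != "node-failure" or event["node"] not in NODE_TO_ENCLAVE:
        return []
    enclave, local_node = NODE_TO_ENCLAVE[event["node"]]
    return [{"topic": "enclave-node-failure", "enclave": enclave,
             "node": local_node, "at": event["at"]}]


def translator_c(event):
    """Enclave node failure -> failing MPI ranks."""
    if event["topic"] != "enclave-node-failure":
        return []
    ranks = ENCLAVE_NODE_TO_RANKS.get((event["enclave"], event["node"]), [])
    return [{"topic": "rank-failure", "rank": r, "at": event["at"]} for r in ranks]


def run_chain(first_event):
    """Feed every published event to all translators until nothing new appears."""
    pending, published = [first_event], []
    while pending:
        event = pending.pop(0)
        published.append(event)
        for translate in (translator_a, translator_b, translator_c):
            pending.extend(translate(event))
    return published


events = run_chain({"topic": "fan-failure", "fan": 17245, "at": 0})
for e in events:
    print(e)
```

No translator acts on the failure itself; the final `rank-failure` event is what a fault-tolerant application or library would subscribe to.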
Beacon Scenarios
Double bit error: detected/uncorrectable
• Application and library can both handle it; the Response Manager decides which one does the correction
• Example application: bag of tasks, each task calling linear algebra functions or FFTs (ABFT version)
Double bit error: detected/uncorrectable. In App: App handles
[Diagram: the “classic” way and the Beacon way, side by side. App progress is stopped while the error is handled.]
“Classic” way
• App registers a handler (Register @Handler)
• Mem access or scrubbing: the memory controller (Mem Cont) raises a hardware interrupt to the OS
• OS invokes the signal handler
• Handler: fix or not
• Handler returns to the OS
• OS returns control to the App
Beacon way
• Mem access: the memory controller raises a hardware interrupt to the OS
• OS uses the API to ask for a response (the OS needs to accept multiple handlers: App-level handler and Lib-level handler)
• Response Manager (via Beacon) decides the App should fix
• Invocation of the signal handler; App-level handler: fix or not
• App handler returns to the OS
• OS returns control to the App
Double bit error: detected/uncorrectable. In Lib: Lib handles
[Diagram: Beacon way. App progress is stopped while the error is handled.]
• Mem access in the Lib: the memory controller raises a hardware interrupt to the OS
• OS uses the API to ask for a response (App-level and Lib-level handlers are available)
• Response Manager (via Beacon) decides the Lib should fix
• Invocation of the signal handler; Lib-level handler: fix or not
• Lib handler returns to the OS
• OS returns control to the Lib
Double bit error: detected/uncorrectable. In Lib: App handles
[Diagram: Beacon way. App progress is stopped while the error is handled.]
• Mem access in the Lib: the memory controller raises a hardware interrupt to the OS
• OS uses the API to ask for a response (App-level and Lib-level handlers are available)
• Response Manager (via Beacon) decides the App should fix
• Invocation of the signal handler; App-level handler: fix or not
• App handler returns to the OS
• OS returns control to the App
Note that the correction may be attempted in the Lib first; if the Lib does not succeed, the application handler could then be called. The corresponding diagram could be built from this one and the previous one.
Response Management (RM)
• Entities that subscribe to and receive events will want to respond with actions
• A response management framework will need to manage response/recovery authorizations in a systematic manner without compromising system stability
• Phases of BEACON software: each BEACON-enabled software component will have the following phases:
  • Announcement of capabilities: entities have to announce their response capabilities for various events. Responses are declared on a per-event basis by every component
  • Exchange of events: publish and subscribe to events; receive events
  • Responding to events: the RM will implement a response plan, decide who should take action, and publish the corresponding events. The response/recovery sequence is listed in an admin-provided data file
Response Management
Response manager
• Tracks when components connect and exit
• One exists per enclave. We might add a global response manager, if needed
• Will subscribe to events of topic = “auth-requested”
• Will publish events of topic = “auth-response”, which indicate whether a software component has permission to start recovery
• “auth-response” events are also called commands
Response Manager Protocol in case of multiple recovery options
[Diagram: a Fault-Tolerant Application, a Migration Manager (MM), and the Response Manager communicate through BEACON. Response plan: try app first, then migration.]
1. The Fault-Tolerant Application and the Migration Manager each receive event foo
2. Each publishes “Want Auth for foo”
3. The Response Manager publishes “Auth granted” to (1) App; (2) MM
4. App publishes “Recovery Started”
5. App publishes “Recovery Failed”
6. MM publishes “Recovery Started”
7. MM publishes “Recovery Completed”
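
The protocol above can be sketched as a response manager that walks an ordered plan and authorizes one component at a time. This is a minimal sketch, not the BEACON implementation: the `ResponseManager` class, its method names, and the component names are hypothetical, and it assumes that "Auth granted" goes to the Migration Manager only after the application reports a failed recovery.

```python
class ResponseManager:
    """Grants recovery authorization one component at a time (sketch)."""

    def __init__(self, plan):
        self.plan = plan       # event name -> ordered list of components
        self.requested = {}    # event name -> components that asked for auth
        self.granted = {}      # event name -> components already authorized
        self.published = []    # (topic, event, component) tuples sent out

    def on_auth_requested(self, event, component):
        """Handle an "auth-requested" event from a component."""
        self.requested.setdefault(event, set()).add(component)

    def grant_next(self, event):
        """Authorize the next component in the plan that asked and was not tried."""
        tried = self.granted.setdefault(event, [])
        asked = self.requested.get(event, set())
        for component in self.plan.get(event, []):
            if component in asked and component not in tried:
                tried.append(component)
                self.published.append(("auth-response", event, component))
                return component
        return None  # plan exhausted

    def on_recovery_failed(self, event, component):
        """A failed recovery moves authorization to the next option."""
        return self.grant_next(event)


# Response plan from the slide: try the application first, then migration.
rm = ResponseManager({"foo": ["app", "migration-manager"]})
rm.on_auth_requested("foo", "migration-manager")  # step 2 (MM)
rm.on_auth_requested("foo", "app")                # step 2 (App)
first = rm.grant_next("foo")                      # step 3, first grant
second = rm.on_recovery_failed("foo", first)      # step 5 triggers second grant
print(first, second)
```

The order of the grants follows the plan, not the order in which the requests arrived.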
Query management
• Currently, no scenarios seem to require this feature
• Wait-and-see approach; reliable BEACON provides a foundation to build this on anyway
Existing Technologies
• Characterization of the system architecture to be used in the ARGO project
• Looked at existing technologies (Astrolabe, Google Dapper, IBM Elastic subscribe)
  • Nothing can be picked up and used as is, since most are designed for the Internet: they use gossip protocols and do not offer reliable delivery
• Other potential technologies under investigation
  • CIFTS, AMQP, EVPATH