1 / 34

Osamah A. Rawashdeh and James E. Lumpp, Jr. Department of Electrical and Computer Engineering

Explore the dynamic behavior of Ardea, a cutting-edge distributed embedded system architecture, focusing on fault tolerance, real-time operations, and graceful degradation techniques. Discover the innovative approach shaping the future of distributed control systems.

spearsc
Download Presentation

Osamah A. Rawashdeh and James E. Lumpp, Jr. Department of Electrical and Computer Engineering

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Run-Time Behavior of Ardea: A Dynamically Reconfigurable Distributed Embedded Control Architecture Osamah A. Rawashdeh and James E. Lumpp, Jr. Department of Electrical and Computer Engineering University of Kentucky Lexington, KY

  2. Outline • Motivation/Background • Objective and Contributions • Ardea Framework Overview • Ardea Hardware Architecture • Software Module Dependency Graphs • Ardea Fault Tolerance • Runtime Behavior • Summary and Conclusion

  3. Embedded Control • Distributed embedded control system • Mission critical tasks • Non-mission critical tasks • Array of sensors and actuators • Set of computing resources • Interconnection network

  4. Distributed State • Distributed Systems as HPC • Long Running Computations • Nodes and Links Fail • Rollback/Roll-forward • Save process state • Save in-transit messages • Limit interaction with the outside world • Redo the part that had the problem • Real-Time deadlines

  5. Motivation • No building block can be error free so we have to tolerate faults in embedded/real-time HW/SW • Traditional Techniques: • TMR • N-version redundancy • (ad-hoc) failover approaches • Disadvantages: • Wasted resources (cost, size, weight, power) • Software complexity • Quantifying failures

  6. 17,000 ft 58,000 ft 63,000 ft 86,000 ft 89,000 ft Wing Deployment Aircraft Separation Vehicle Launch Parachute Landing BIG BLUE • BIG BLUE: Baseline Inflatable-wing Glider, Balloon- Launched Unmanned Experiment. • Ongoing project at UK to developing a test bed for Mars airplane technology. • ~ 40 undergraduate students involved per year.

  7. UAV Research • BIG BLUE is funded by NASA Workforce Development Program. • Dependable UAVs for Homeland Security • BIG BLUE III, with inflatable only wings, and a UAV for entry in the AUVSI 3rd Annual Student UAV Competition. • “READY” UAV

  8. BIG BLUE Fault Tolerance • BIG BLUE I • Single processor design • Static sensor redundancies • Ad hoc fault tolerance • BIG BLUE II • Distributed 3 processor design • I2C communication bus • Shared data memory for communication • Static redundancies • BIG BLUE III • Single processor design • Real-time multi-tasking OS used • Task interdependencies limited by using a “mailman” May 2003 May 2004 May 2005

  9. Reconfiguration Based FT • Run-time reconfiguration FT feasible in distribute embedded systems: • Cost, size, power constraints • Availability of non-critical resources • Graceful degradation: a loss of or reduction in the quality of services a system provides in response to faults • Graceful degradation for distributed embedded systems is a new research area

  10. The Challenge • How to specify a dynamically reconfiguring system that included static and dynamic redundancies • How to manage the redundancies • What infrastructure is needed to run these dynamic applications

  11. Approach • Graceful Degradation • Hardware/Software faults degrade performance instead of causing system failure. • Resources dedicated to non-critical functions serve as backup resources for critical functions. • No need to consider every failure combination at design time. Objective: To develop a framework for specifying gracefully degrading distributed embedded systems.

  12. The Ardea Framework • Ardea – Automatically Reconfigurable Distributed Embedded Architectures • Ardea herodias – The Great Blue Heron, a wading bird of the heron family Ardeidae, common all over North and Central America. This is the largest North American heron.

  13. Ardea Overview • Software is developed in a modular fashion • Mobilesoftware modules can have several implementations with different resource requirements and output qualities • Dependencies among modules are graphically captured in software module dependency graphs(DGs) specifying application operating modes and execution parameters • A set of networked processors for running application software

  14. Ardea Overview – cont. • A global system manager tracks status of hardware and software resources • System manager computes new system configurations (a mapping of software modules onto processing elements) • Local management tasks are responsible for OS scheduling and data routing • Target applications: real-time distributed embedded control/periodic applications

  15. HW Architecture Overview • Processing Elements (PEs) - Homogeneous set of processors - Real-time OS. - Local management tasks (scheduler, network interface, loader) • I/O Devices - Sensors and actuators - Hosted by PEs • Communication Network - Broadcast Network - Bandwidth and Latency • System Manager - Fault tolerant by other means - Tracks status of resources - Finds and deploys configurations

  16. Application Software Specification • Dependency graphs show the periodic flow of information from sensors to actuators (i.e., data pipelines) • Graph nodes: software modules, data variables, I/O devices, and dependency gates • Software modules: • Executable code schedulable on a processing element • Suspended while input(s) unavailable • Produce and consume data variables • Attributes: worst case execution time and rate factor

  17. Data Exchange Data variables: • Application data between software modules • State data variables arelocal to a software module • Management data variables contain data consumed by system manager. • Attributes: • Size • Quality value or function Figure 5 - Page 19

  18. Specifying Dependencies • Dependency gates: • “k-out-of-n OR” gates: n > 0, 0 ≤ k ≤ n • “AND”: all input required • “XOR”: only one input required • “DEMUX”: for fanning out • OR gates can be specified to distribute inputs

  19. DG with Node Attributes ID = yaw_cntrl1 Exec_T = 900 cycl. Rate_factor = 1:5 ID = out1, out2 Criticality: critical Priority = 1, Rate = 10 Hz State = Enabled ID = rud_Angle1 Size = 2 bytes Quality = 1 ID = mag_drv1 , mag_drv2 Exec_T = 300 cycl. Rate_factor = n/a ID = yaw_history Size = 8 bytes Quality = n/a ID = servo1_drv, servo2_drv Exec_T = 200 cycles Rate_factor = 1:1 ID = yaw1, yaw2 Size = 2 bytes Quality = 1, 2 ID = yaw_cntrl2 Exec_T = 400 cycl. Rate_factor = 1:2 ID = rud_Angle2 Size = 2 bytes Quality = 2

  20. Ardea Fault Detection & Handling • Failure detection of sensors, actuators and software modules is the responsibility of application software • Ardea built-in fault detection: • PE crash failures by heartbeat messages • Network link failures detected and handled as PE failures • Software module crashes detected locally by a module execution monitors • Critical output modules detect missed deadlines • Fault Handling: masking, reconfiguration, or fail-stop

  21. Sensor Fault Detection

  22. Actuator Fault Detection

  23. Software Fault Detection

  24. Triple Modular Redundancy

  25. Ardea Runtime Behavior • Supporting mobile software modules (moving object code, scheduling/unscheduling, and data re-routing) • Tracking resource availability • Finding Configurations • Deploying Resources • Manage state data variables

  26. The System Manager

  27. Memory Loader: copies code into program memory Scheduler: starts and stops execution of modules Network Interface: handlespublic data variables (data routing) Processing Elements (PEs)

  28. Mobility and Data Routing • Module I/O data passed through mailboxes • Data routing transparent to modules • Starting, stopping of modules Figure 26 - Page 61

  29. Starting, stopping, and restarting modules Restarting requires: State Preservation Unprocessed data preservation Scheduling and Unscheduling

  30. Two configuration finding algorithms: High-fidelity is (NP-hard) to find high-utility configurations Low-fidelity (fast) to insure running of critical services Response based on criticality of detected/reported fault Deploying configurations starting from sensor side of a DG Reconfiguration Policies

  31. Reconfiguration Policies

  32. Ardea Benefits • More flexible fault tolerance at reduced cost • Ability to analyze reconfigurable architectures using DGs • Simplified debugging and maintenance • Runtime system testing • Graceful upgrade and repair • Reduction of design errors • Software reusability

  33. Current Work • Applying techniques to a UAV for AUVSI student UAV competition. • Avionics system for BIG BLUE IV. • “READY” UAV Project • Expand bus via wireless link to the ground: • Rapid prototyping • Minimize risk to hardware • Flexible Reconfiguration

  34. Conclusion • Graceful degradation in distributed embedded system is a new research area currently focusing on either abstract modeling or on non-real-time/non-critical systems • Ardea provides a structured framework for the design and implementation of real-time systems • Dependency graphs were presented to capture fault tolerant, dynamically reconfiguring, software architectures • An infrastructure supporting reconfigurable distributed reconfigurable applications was presented

More Related