An overview of the run-time behavior of Ardea, a dynamically reconfigurable distributed embedded control architecture, with a focus on fault tolerance, real-time operation, and graceful degradation.
Run-Time Behavior of Ardea: A Dynamically Reconfigurable Distributed Embedded Control Architecture Osamah A. Rawashdeh and James E. Lumpp, Jr. Department of Electrical and Computer Engineering University of Kentucky Lexington, KY
Outline • Motivation/Background • Objective and Contributions • Ardea Framework Overview • Ardea Hardware Architecture • Software Module Dependency Graphs • Ardea Fault Tolerance • Runtime Behavior • Summary and Conclusion
Embedded Control • Distributed embedded control system • Mission critical tasks • Non-mission critical tasks • Array of sensors and actuators • Set of computing resources • Interconnection network
Distributed State • Distributed Systems as HPC • Long Running Computations • Nodes and Links Fail • Rollback/Roll-forward • Save process state • Save in-transit messages • Limit interaction with the outside world • Redo the part that had the problem • Real-Time deadlines
Motivation • No building block can be error free, so faults must be tolerated in embedded/real-time HW/SW • Traditional Techniques: • TMR (triple modular redundancy) • N-version redundancy • (ad-hoc) failover approaches • Disadvantages: • Wasted resources (cost, size, weight, power) • Software complexity • Quantifying failures
BIG BLUE • BIG BLUE: Baseline Inflatable-wing Glider, Balloon-Launched Unmanned Experiment. • Ongoing project at UK to develop a test bed for Mars airplane technology. • ~40 undergraduate students involved per year. [Figure: mission profile. Altitude labels: 17,000 ft, 58,000 ft, 63,000 ft, 86,000 ft, 89,000 ft. Event labels: Wing Deployment, Aircraft Separation, Vehicle Launch, Parachute Landing.]
UAV Research • BIG BLUE is funded by the NASA Workforce Development Program. • Dependable UAVs for Homeland Security • BIG BLUE III, with inflatable-only wings, and a UAV for entry in the AUVSI 3rd Annual Student UAV Competition. • “READY” UAV
BIG BLUE Fault Tolerance • BIG BLUE I (May 2003) • Single-processor design • Static sensor redundancies • Ad hoc fault tolerance • BIG BLUE II (May 2004) • Distributed three-processor design • I2C communication bus • Shared data memory for communication • Static redundancies • BIG BLUE III (May 2005) • Single-processor design • Real-time multi-tasking OS used • Task interdependencies limited by using a “mailman”
Reconfiguration-Based FT • Run-time reconfiguration-based FT is feasible in distributed embedded systems: • Cost, size, power constraints • Availability of non-critical resources • Graceful degradation: a loss of or reduction in the quality of services a system provides in response to faults • Graceful degradation for distributed embedded systems is a new research area
The Challenge • How to specify a dynamically reconfiguring system that includes static and dynamic redundancies • How to manage the redundancies • What infrastructure is needed to run these dynamic applications
Approach • Graceful Degradation • Hardware/Software faults degrade performance instead of causing system failure. • Resources dedicated to non-critical functions serve as backup resources for critical functions. • No need to consider every failure combination at design time. Objective: To develop a framework for specifying gracefully degrading distributed embedded systems.
The Ardea Framework • Ardea – Automatically Reconfigurable Distributed Embedded Architectures • Ardea herodias – The Great Blue Heron, a wading bird of the heron family Ardeidae, common all over North and Central America. This is the largest North American heron.
Ardea Overview • Software is developed in a modular fashion • Mobile software modules can have several implementations with different resource requirements and output qualities • Dependencies among modules are captured graphically in software module dependency graphs (DGs), which specify application operating modes and execution parameters • A set of networked processors runs the application software
Ardea Overview – cont. • A global system manager tracks status of hardware and software resources • System manager computes new system configurations (a mapping of software modules onto processing elements) • Local management tasks are responsible for OS scheduling and data routing • Target applications: real-time distributed embedded control/periodic applications
HW Architecture Overview • Processing Elements (PEs) - Homogeneous set of processors - Real-time OS - Local management tasks (scheduler, network interface, loader) • I/O Devices - Sensors and actuators - Hosted by PEs • Communication Network - Broadcast network - Bandwidth and latency • System Manager - Fault tolerant by other means - Tracks status of resources - Finds and deploys configurations
Application Software Specification • Dependency graphs show the periodic flow of information from sensors to actuators (i.e., data pipelines) • Graph nodes: software modules, data variables, I/O devices, and dependency gates • Software modules: • Executable code schedulable on a processing element • Suspended while input(s) unavailable • Produce and consume data variables • Attributes: worst case execution time and rate factor
Data Exchange • Data variables: • Application data variables carry data between software modules • State data variables are local to a software module • Management data variables contain data consumed by the system manager • Attributes: • Size • Quality value or function Figure 5 - Page 19
Specifying Dependencies • Dependency gates: • “k-out-of-n OR” gates: n > 0, 0 ≤ k ≤ n • “AND”: all inputs required • “XOR”: only one input required • “DEMUX”: for fanning out • OR gates can be specified to distribute inputs
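The gate semantics listed above can be sketched in a few lines of Python. This is an illustration only, not Ardea's implementation: the function name `gate_satisfied` and the string labels are hypothetical, and XOR is assumed to be satisfied as soon as any one input is available (the slide says only that one input is required). DEMUX is omitted because it fans data out rather than imposing a requirement.

```python
def gate_satisfied(kind, inputs, k=1):
    """Return True if a dependency gate's input requirement is met.

    inputs: list of booleans, True where that input is currently available.
    kind:   "AND", "XOR", or "KOFN" (k-out-of-n OR). Labels are hypothetical.
    """
    available = sum(1 for ok in inputs if ok)
    if kind == "AND":
        # All inputs are required.
        return available == len(inputs)
    if kind == "XOR":
        # Assumption: one input is consumed, so any available input suffices.
        return available >= 1
    if kind == "KOFN":
        if not 0 <= k <= len(inputs):
            raise ValueError("k must satisfy 0 <= k <= n")
        return available >= k
    raise ValueError(f"unknown gate kind: {kind}")
```

For example, a 2-out-of-3 OR gate with one failed input, `gate_satisfied("KOFN", [True, False, True], k=2)`, is still satisfied, whereas an AND gate over the same inputs is not.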
DG with Node Attributes • ID = yaw_cntrl1: Exec_T = 900 cycles, Rate_factor = 1:5 • ID = yaw_cntrl2: Exec_T = 400 cycles, Rate_factor = 1:2 • ID = mag_drv1, mag_drv2: Exec_T = 300 cycles, Rate_factor = n/a • ID = servo1_drv, servo2_drv: Exec_T = 200 cycles, Rate_factor = 1:1 • ID = yaw1, yaw2: Size = 2 bytes, Quality = 1, 2 • ID = yaw_history: Size = 8 bytes, Quality = n/a • ID = rud_Angle1: Size = 2 bytes, Quality = 1 • ID = rud_Angle2: Size = 2 bytes, Quality = 2 • ID = out1, out2: Criticality = critical, Priority = 1, Rate = 10 Hz, State = Enabled
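As a rough illustration of how such node attributes might be held in software, the sketch below stores the module attributes from this slide in a small table and sums the execution cost of one sensor-to-actuator pipeline. The `Module` class, the `pipeline_cycles` helper, and the choice of which modules form a pipeline are assumptions made for the example; the slide's figure, which shows the actual graph edges, is not reproduced here, and rate factors are ignored in the sum.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Module:
    """A software-module node; field names are hypothetical."""
    ident: str
    exec_cycles: int             # Exec_T from the slide, in cycles
    rate_factor: Optional[str]   # e.g. "1:5"; None where the slide says n/a


MODULES = {
    m.ident: m
    for m in [
        Module("mag_drv1", 300, None),
        Module("mag_drv2", 300, None),
        Module("yaw_cntrl1", 900, "1:5"),
        Module("yaw_cntrl2", 400, "1:2"),
        Module("servo1_drv", 200, "1:1"),
        Module("servo2_drv", 200, "1:1"),
    ]
}


def pipeline_cycles(module_ids):
    """Sum the execution times of the named modules (rate factors ignored)."""
    return sum(MODULES[i].exec_cycles for i in module_ids)


# Assumed pipelines: magnetometer driver -> yaw controller -> servo driver.
full = pipeline_cycles(["mag_drv1", "yaw_cntrl1", "servo1_drv"])
cheap = pipeline_cycles(["mag_drv1", "yaw_cntrl2", "servo1_drv"])
print(full, cheap)
```

Under these assumptions, swapping yaw_cntrl1 for the cheaper yaw_cntrl2 implementation frees 500 cycles per pipeline execution, which is the kind of trade-off between resource use and output quality that alternative implementations are meant to expose.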
Ardea Fault Detection & Handling • Failure detection of sensors, actuators, and software modules is the responsibility of application software • Ardea built-in fault detection: • PE crash failures detected by heartbeat messages • Network link failures detected and handled as PE failures • Software module crashes detected locally by module execution monitors • Critical output modules detect missed deadlines • Fault handling: masking, reconfiguration, or fail-stop
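Heartbeat-based crash detection, as named in the list above, can be sketched as follows. This is a generic timeout-based detector, not Ardea's code: the class name `HeartbeatMonitor`, its methods, and the timeout value are hypothetical, and time is passed in explicitly so the example stays deterministic.

```python
class HeartbeatMonitor:
    """Flags a PE as failed when no heartbeat arrives within `timeout`."""

    def __init__(self, timeout):
        self.timeout = timeout
        self.last_seen = {}  # PE id -> time of the most recent heartbeat

    def beat(self, pe_id, now):
        """Record a heartbeat message from a PE at time `now`."""
        self.last_seen[pe_id] = now

    def failed(self, now):
        """Return the sorted ids of PEs whose last heartbeat is too old."""
        return sorted(
            pe for pe, seen in self.last_seen.items()
            if now - seen > self.timeout
        )


monitor = HeartbeatMonitor(timeout=3)
monitor.beat("pe1", now=0)
monitor.beat("pe2", now=0)
monitor.beat("pe1", now=2)       # pe2 goes silent after time 0
print(monitor.failed(now=4))     # pe2 exceeded the timeout; pe1 has not
```

Because a silent PE and a PE behind a broken link look the same to such a detector, this sketch is consistent with the slide's statement that link failures are handled as PE failures.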
Ardea Runtime Behavior • Supporting mobile software modules (moving object code, scheduling/unscheduling, and data re-routing) • Tracking resource availability • Finding configurations • Deploying resources • Managing state data variables
Processing Elements (PEs) • Memory Loader: copies code into program memory • Scheduler: starts and stops execution of modules • Network Interface: handles public data variables (data routing)
Mobility and Data Routing • Module I/O data passed through mailboxes • Data routing transparent to modules • Starting, stopping of modules Figure 26 - Page 61
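One way to picture mailbox-based routing that stays transparent to modules is a publish/subscribe table keyed by data variable name. The sketch below is an assumption-laden illustration rather than the Ardea network interface: the `Router` class and its methods are hypothetical, and mailboxes are modeled as in-memory queues.

```python
from collections import defaultdict, deque


class Router:
    """Delivers published data variables to every subscribed mailbox."""

    def __init__(self):
        self.subscribers = defaultdict(list)  # variable name -> mailboxes

    def subscribe(self, variable, mailbox):
        self.subscribers[variable].append(mailbox)

    def unsubscribe(self, variable, mailbox):
        self.subscribers[variable].remove(mailbox)

    def publish(self, variable, value):
        # The producer names only the variable, never the consumer.
        for mailbox in self.subscribers[variable]:
            mailbox.append(value)


router = Router()
old_mailbox, new_mailbox = deque(), deque()
router.subscribe("yaw1", old_mailbox)
router.publish("yaw1", 87)

# The consuming module moves to another PE: only the routing table changes.
router.unsubscribe("yaw1", old_mailbox)
router.subscribe("yaw1", new_mailbox)
router.publish("yaw1", 88)
```

The producer's `publish` call is identical before and after the move, which is the sense in which routing is transparent: relocating a module only edits the table.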
Scheduling and Unscheduling • Starting, stopping, and restarting modules • Restarting requires: • State preservation • Unprocessed data preservation
Reconfiguration Policies • Two configuration-finding algorithms: • High-fidelity: searches for high-utility configurations (an NP-hard problem) • Low-fidelity: fast, to ensure that critical services keep running • Response based on the criticality of the detected/reported fault • Configurations are deployed starting from the sensor side of a DG
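To make the idea of a fast, low-fidelity search concrete, the sketch below places critical modules first using a first-fit heuristic and drops non-critical modules that no longer fit. It is a stand-in written for illustration, not the algorithm used by Ardea: the function name, the tuple layout, and the single abstract "load" number per module are all assumptions.

```python
def find_configuration(modules, pe_capacity):
    """Greedy first-fit mapping of modules onto processing elements.

    modules:     list of (name, load, critical) tuples (hypothetical layout).
    pe_capacity: dict mapping PE id -> available capacity, same unit as load.
    Returns a dict module name -> PE id, or None if a critical module
    cannot be placed. Non-critical modules that do not fit are dropped.
    """
    free = dict(pe_capacity)
    mapping = {}
    # Stable sort: critical modules first, original order otherwise.
    ordered = sorted(modules, key=lambda m: not m[2])
    for name, load, critical in ordered:
        for pe in free:
            if free[pe] >= load:
                mapping[name] = pe
                free[pe] -= load
                break
        else:
            if critical:
                return None  # no feasible configuration for a critical service
    return mapping


modules = [("logger", 40, False), ("yaw_cntrl", 60, True), ("camera", 50, False)]
healthy = find_configuration(modules, {"pe1": 100, "pe2": 50})
degraded = find_configuration(modules, {"pe1": 100})  # pe2 has failed
print(healthy)
print(degraded)
```

With both PEs available every module is placed; after the assumed loss of pe2 the critical yaw controller keeps running and the camera module is shed, a small instance of graceful degradation. A high-fidelity search would instead compare many candidate mappings by utility, which is where the NP-hardness noted above comes in.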
Ardea Benefits • More flexible fault tolerance at reduced cost • Ability to analyze reconfigurable architectures using DGs • Simplified debugging and maintenance • Runtime system testing • Graceful upgrade and repair • Reduction of design errors • Software reusability
Current Work • Applying techniques to a UAV for AUVSI student UAV competition. • Avionics system for BIG BLUE IV. • “READY” UAV Project • Expand bus via wireless link to the ground: • Rapid prototyping • Minimize risk to hardware • Flexible Reconfiguration
Conclusion • Graceful degradation in distributed embedded systems is a new research area currently focusing on either abstract modeling or on non-real-time/non-critical systems • Ardea provides a structured framework for the design and implementation of real-time systems • Dependency graphs were presented to capture fault-tolerant, dynamically reconfiguring software architectures • An infrastructure supporting reconfigurable distributed applications was presented