Osamah A. Rawashdeh and James E. Lumpp, Jr. Department of Electrical and Computer Engineering

Run-Time Behavior of Ardea: A Dynamically Reconfigurable Distributed Embedded Control Architecture Osamah A. Rawashdeh and James E. Lumpp, Jr. Department of Electrical and Computer Engineering University of Kentucky Lexington, KY

Outline • Motivation/Background • Objective and Contributions • Ardea Framework Overview • Ardea Hardware Architecture • Software Module Dependency Graphs • Ardea Fault Tolerance • Runtime Behavior • Summary and Conclusion

Embedded Control • Distributed embedded control system • Mission critical tasks • Non-mission critical tasks • Array of sensors and actuators • Set of computing resources • Interconnection network

Distributed State • Distributed Systems as HPC • Long Running Computations • Nodes and Links Fail • Rollback/Roll-forward • Save process state • Save in-transit messages • Limit interaction with the outside world • Redo the part that had the problem • Real-Time deadlines

Motivation • No building block can be error free so we have to tolerate faults in embedded/real-time HW/SW • Traditional Techniques: • TMR • N-version redundancy • (ad-hoc) failover approaches • Disadvantages: • Wasted resources (cost, size, weight, power) • Software complexity • Quantifying failures

17,000 ft 58,000 ft 63,000 ft 86,000 ft 89,000 ft Wing Deployment Aircraft Separation Vehicle Launch Parachute Landing BIG BLUE • BIG BLUE: Baseline Inflatable-wing Glider, Balloon- Launched Unmanned Experiment. • Ongoing project at UK to developing a test bed for Mars airplane technology. • ~ 40 undergraduate students involved per year.

UAV Research • BIG BLUE is funded by NASA Workforce Development Program. • Dependable UAVs for Homeland Security • BIG BLUE III, with inflatable only wings, and a UAV for entry in the AUVSI 3rd Annual Student UAV Competition. • “READY” UAV

BIG BLUE Fault Tolerance • BIG BLUE I • Single processor design • Static sensor redundancies • Ad hoc fault tolerance • BIG BLUE II • Distributed 3 processor design • I2C communication bus • Shared data memory for communication • Static redundancies • BIG BLUE III • Single processor design • Real-time multi-tasking OS used • Task interdependencies limited by using a “mailman” May 2003 May 2004 May 2005

Reconfiguration Based FT • Run-time reconfiguration FT feasible in distribute embedded systems: • Cost, size, power constraints • Availability of non-critical resources • Graceful degradation: a loss of or reduction in the quality of services a system provides in response to faults • Graceful degradation for distributed embedded systems is a new research area

The Challenge • How to specify a dynamically reconfiguring system that included static and dynamic redundancies • How to manage the redundancies • What infrastructure is needed to run these dynamic applications

Approach • Graceful Degradation • Hardware/Software faults degrade performance instead of causing system failure. • Resources dedicated to non-critical functions serve as backup resources for critical functions. • No need to consider every failure combination at design time. Objective: To develop a framework for specifying gracefully degrading distributed embedded systems.

The Ardea Framework • Ardea – Automatically Reconfigurable Distributed Embedded Architectures • Ardea herodias – The Great Blue Heron, a wading bird of the heron family Ardeidae, common all over North and Central America. This is the largest North American heron.

Ardea Overview • Software is developed in a modular fashion • Mobilesoftware modules can have several implementations with different resource requirements and output qualities • Dependencies among modules are graphically captured in software module dependency graphs(DGs) specifying application operating modes and execution parameters • A set of networked processors for running application software

Ardea Overview – cont. • A global system manager tracks status of hardware and software resources • System manager computes new system configurations (a mapping of software modules onto processing elements) • Local management tasks are responsible for OS scheduling and data routing • Target applications: real-time distributed embedded control/periodic applications

HW Architecture Overview • Processing Elements (PEs) - Homogeneous set of processors - Real-time OS. - Local management tasks (scheduler, network interface, loader) • I/O Devices - Sensors and actuators - Hosted by PEs • Communication Network - Broadcast Network - Bandwidth and Latency • System Manager - Fault tolerant by other means - Tracks status of resources - Finds and deploys configurations

Application Software Specification • Dependency graphs show the periodic flow of information from sensors to actuators (i.e., data pipelines) • Graph nodes: software modules, data variables, I/O devices, and dependency gates • Software modules: • Executable code schedulable on a processing element • Suspended while input(s) unavailable • Produce and consume data variables • Attributes: worst case execution time and rate factor

Data Exchange Data variables: • Application data between software modules • State data variables arelocal to a software module • Management data variables contain data consumed by system manager. • Attributes: • Size • Quality value or function Figure 5 - Page 19

Specifying Dependencies • Dependency gates: • “k-out-of-n OR” gates: n > 0, 0 ≤ k ≤ n • “AND”: all input required • “XOR”: only one input required • “DEMUX”: for fanning out • OR gates can be specified to distribute inputs

DG with Node Attributes ID = yaw_cntrl1 Exec_T = 900 cycl. Rate_factor = 1:5 ID = out1, out2 Criticality: critical Priority = 1, Rate = 10 Hz State = Enabled ID = rud_Angle1 Size = 2 bytes Quality = 1 ID = mag_drv1 , mag_drv2 Exec_T = 300 cycl. Rate_factor = n/a ID = yaw_history Size = 8 bytes Quality = n/a ID = servo1_drv, servo2_drv Exec_T = 200 cycles Rate_factor = 1:1 ID = yaw1, yaw2 Size = 2 bytes Quality = 1, 2 ID = yaw_cntrl2 Exec_T = 400 cycl. Rate_factor = 1:2 ID = rud_Angle2 Size = 2 bytes Quality = 2

Ardea Fault Detection & Handling • Failure detection of sensors, actuators and software modules is the responsibility of application software • Ardea built-in fault detection: • PE crash failures by heartbeat messages • Network link failures detected and handled as PE failures • Software module crashes detected locally by a module execution monitors • Critical output modules detect missed deadlines • Fault Handling: masking, reconfiguration, or fail-stop

Sensor Fault Detection

Actuator Fault Detection

Software Fault Detection

Triple Modular Redundancy

Ardea Runtime Behavior • Supporting mobile software modules (moving object code, scheduling/unscheduling, and data re-routing) • Tracking resource availability • Finding Configurations • Deploying Resources • Manage state data variables

The System Manager

Memory Loader: copies code into program memory Scheduler: starts and stops execution of modules Network Interface: handlespublic data variables (data routing) Processing Elements (PEs)

Mobility and Data Routing • Module I/O data passed through mailboxes • Data routing transparent to modules • Starting, stopping of modules Figure 26 - Page 61

Starting, stopping, and restarting modules Restarting requires: State Preservation Unprocessed data preservation Scheduling and Unscheduling

Two configuration finding algorithms: High-fidelity is (NP-hard) to find high-utility configurations Low-fidelity (fast) to insure running of critical services Response based on criticality of detected/reported fault Deploying configurations starting from sensor side of a DG Reconfiguration Policies

Reconfiguration Policies

Ardea Benefits • More flexible fault tolerance at reduced cost • Ability to analyze reconfigurable architectures using DGs • Simplified debugging and maintenance • Runtime system testing • Graceful upgrade and repair • Reduction of design errors • Software reusability

Current Work • Applying techniques to a UAV for AUVSI student UAV competition. • Avionics system for BIG BLUE IV. • “READY” UAV Project • Expand bus via wireless link to the ground: • Rapid prototyping • Minimize risk to hardware • Flexible Reconfiguration

Conclusion • Graceful degradation in distributed embedded system is a new research area currently focusing on either abstract modeling or on non-real-time/non-critical systems • Ardea provides a structured framework for the design and implementation of real-time systems • Dependency graphs were presented to capture fault tolerant, dynamically reconfiguring, software architectures • An infrastructure supporting reconfigurable distributed reconfigurable applications was presented

Osamah A. Rawashdeh and James E. Lumpp, Jr. Department of Electrical and Computer Engineering

Osamah A. Rawashdeh and James E. Lumpp, Jr. Department of Electrical and Computer Engineering

Presentation Transcript

Department of Electrical and Computer Engineering

THE DEPARTMENT OF ELECTRICAL AND COMPUTER ENGINEERING

THE DEPARTMENT OF ELECTRICAL AND COMPUTER ENGINEERING

THE DEPARTMENT OF ELECTRICAL AND COMPUTER ENGINEERING

Department of Electrical and Computer Engineering

Department of Electrical and Computer Engineering

Department of Electrical and Computer Engineering

Department of Electrical and Computer Engineering

THE DEPARTMENT OF ELECTRICAL AND COMPUTER ENGINEERING

THE DEPARTMENT OF ELECTRICAL AND COMPUTER ENGINEERING

Department of electrical and computer engineering

Department of Electrical and Computer Engineering

Department of Computer and Electrical Engineering

Department of Computer Science and Electrical Engineering

Electrical and Computer Engineering Department

THE DEPARTMENT OF ELECTRICAL AND COMPUTER ENGINEERING

The Department of Electrical and Computer Engineering

Department of Computer and Electrical Engineering

THE DEPARTMENT OF ELECTRICAL AND COMPUTER ENGINEERING

Osamah A. Rawashdeh and James E. Lumpp, Jr. Department of Electrical and Computer Engineering