Explore the run-time behavior of the Ardea framework, a distributed embedded control system emphasizing fault tolerance and real-time deadlines. Learn about the Ardea hardware architecture, software module dependency graphs, fault tolerance mechanisms, and more.
Run-Time Behavior of Ardea: A Dynamically Reconfigurable Distributed Embedded Control Architecture Osamah A. Rawashdeh and James E. Lumpp, Jr. Department of Electrical and Computer Engineering University of Kentucky Lexington, KY
Outline • Motivation/Background • Objective and Contributions • Ardea Framework Overview • Ardea Hardware Architecture • Software Module Dependency Graphs • Ardea Fault Tolerance • Runtime Behavior • Summary and Conclusion
Embedded Control • Distributed embedded control system • Mission critical tasks • Non-mission critical tasks • Array of sensors and actuators • Set of computing resources • Interconnection network
Distributed State • Distributed Systems as HPC • Long Running Computations • Nodes and Links Fail • Rollback/Roll-forward • Save process state • Save in-transit messages • Limit interaction with the outside world • Redo the part that had the problem • Real-Time deadlines
Motivation • No building block can be error free so we have to tolerate faults in embedded/real-time HW/SW • Traditional Techniques: • TMR • N-version redundancy • (ad-hoc) failover approaches • Disadvantages: • Wasted resources (cost, size, weight, power) • Software complexity • Quantifying failures
BIG BLUE • [Figure: mission profile with phase labels (Wing Deployment, Aircraft Separation, Vehicle Launch, Parachute Landing) and altitude marks at 17,000 ft, 58,000 ft, 63,000 ft, 86,000 ft, and 89,000 ft] • BIG BLUE: Baseline Inflatable-wing Glider, Balloon-Launched Unmanned Experiment. • Ongoing project at UK to develop a test bed for Mars airplane technology. • ~40 undergraduate students involved per year.
UAV Research • BIG BLUE is funded by the NASA Workforce Development Program. • Dependable UAVs for Homeland Security • BIG BLUE III, with inflatable-only wings, and a UAV for entry in the AUVSI 3rd Annual Student UAV Competition. • “READY” UAV
BIG BLUE II Architecture • Mission Controller • Auto-Sequencing • Data Acquisition • Ground Communication • Flight Controller • Control Glider • Chute Control • Monitor System Status • Deploy Recovery Chute • Camera Driver • Capture Images • Store to NVRAM • Shared Memory Space • Mailbox-Based Messaging
Dependable Systems • Dependability: trustworthiness of a system, allowing reliance to be justifiably placed on its services • Failures: deviation of the service provided from compliance with specifications • Faults: the cause of failures • Failure semantics: omission, timing, response, and crash • Hardware versus software faults • Fault Tolerance: ability to continue operation despite failures Figure 1 - Page 6
Traditional Fault Tolerance • Fault tolerance entails fault detection and subsequent handling • Fault tolerance requires redundancy: • Static redundancy (spatial redundancy) • Modular redundancy • Design Diversity • Dynamic redundancy (temporal redundancy) • Recovery blocks • Failover programming
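As a rough illustration of static (modular) redundancy, the sketch below shows a majority voter of the kind used in TMR. It is not taken from Ardea; the function name `majority_vote` and the use of plain Python values as replica outputs are assumptions made for this example.

```python
from collections import Counter

def majority_vote(replica_outputs):
    """Return the value produced by a strict majority of replicas, or None.

    Hypothetical helper: with three replicas (TMR), one faulty output is
    masked; if no strict majority exists, the fault is only detected.
    """
    if not replica_outputs:
        return None
    value, count = Counter(replica_outputs).most_common(1)[0]
    return value if count > len(replica_outputs) / 2 else None

# One faulty replica out of three is outvoted.
masked = majority_vote([42, 42, 17])
# Three different answers: no majority, so the voter reports None.
undecided = majority_vote([1, 2, 3])
print(masked, undecided)
```

The cost noted under Motivation is visible here: three replicas run to deliver one result, which is the resource overhead that reconfiguration-based approaches try to avoid.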
Reconfiguration-Based FT • Run-time reconfiguration-based FT is feasible in distributed embedded systems: • Cost, size, power constraints • Availability of non-critical resources • Graceful degradation: a loss of or reduction in the quality of services a system provides in response to faults • Graceful degradation for distributed embedded systems is a new research area
Approach • Graceful Degradation • Hardware/Software faults degrade performance instead of causing system failure. • Resources dedicated to non-critical functions serve as backup resources for critical functions. • No need to consider every failure combination at design time. Objective: To develop a framework for specifying gracefully degrading distributed embedded systems.
The Challenges • How to specify a dynamically reconfiguring system that includes static and dynamic redundancies as well as graceful degradation abilities • How to manage the redundancies • What infrastructure is needed to run these dynamic applications
The Challenges • Software module location independence • Moving object code • Routing module I/O data • Fault Recognition • User/application code reported faults • Ardea built-in fault detection • Tracking status/availability of HW and SW resources • Configuration Management • Tracking resource availability (HW and SW) • Finding new configurations (mapping of modules to PEs) • Deploying new configurations (starting, stopping, and restarting modules) • Managing state data variables • Reconfiguration time and critical deadlines: multiple system reconfiguration policies support reconfiguration before deadlines are missed; if a deadline is missed, the system fail-stops
The Ardea Framework • Ardea – Automatically Reconfigurable Distributed Embedded Architectures • Ardea herodias – The Great Blue Heron, a wading bird of the heron family Ardeidae, common all over North and Central America. This is the largest North American heron.
HW Architecture Overview • Processing Elements (PEs) - Homogeneous set of processors - Real-time OS. - Local management tasks (scheduler, network interface, loader) • I/O Devices - Sensors and actuators - Hosted by PEs • Communication Network - Broadcast Network - Bandwidth and Latency • System Manager - Fault tolerant by other means - Tracks status of resources - Finds and deploys configurations
Micro C/OS-II • Portable, ROMable, scalable, preemptive, real-time, multi-tasking, priority-based kernel. • Source available, ANSI C, and free for academic use. • Ported to 40+ architectures (8 to 64 bit) since 1992. • Meets RTCA DO-178B Level A • Uses 4% CPU and 3 KB - 30 KB RAM
CAN Aerospace • Stock Flight Systems • NASA Langley AGATE/SATS • NASA Ames SOFIA
Silicon Labs C8051F040X • 25 MIPS pipelined 8051 • Integrated CAN 2.0B controller • 64 kB of Flash, 4 kB of SRAM, external memory interface • Mixed Signal • Dual UARTs, SMBus and SPI serial interfaces • MicroC/OS and Ardea CAN layer were ported to this MCU
Ardea Overview • Software is developed in a modular fashion • Mobile software modules can have several implementations with different resource requirements and output qualities • Dependencies among modules are graphically captured in software module dependency graphs (DGs) specifying application operating modes and execution parameters • A set of networked processors for running application software
Ardea Overview – cont. • A global system manager tracks status of hardware and software resources • System manager computes new system configurations (a mapping of software modules onto processing elements) • Local management tasks are responsible for OS scheduling and data routing • Target applications: real-time distributed embedded control/periodic applications
Application Software Specification • Dependency graphs show the periodic flow of information from sensors to actuators (i.e., data pipelines) • Graph nodes: software modules, data variables, I/O devices, and dependency gates • Software modules: • Executable code schedulable on a processing element • Suspended while input(s) unavailable • Produce and consume data variables • Attributes: worst case execution time and rate factor
Data Exchange • Data variables: • Application data between software modules • State data variables are local to a software module • Management data variables contain data consumed by the system manager • Attributes: • Size • Quality value or function Figure 5 - Page 19
Specifying Dependencies • Dependency gates: • “k-out-of-n OR” gates: n > 0, 0 ≤ k ≤ n • “AND”: all inputs required • “XOR”: only one input required • “DEMUX”: for fanning out • OR gates can be specified to distribute inputs
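The gate semantics above can be sketched as small predicates over input availability. This is a minimal sketch, not Ardea's implementation: the function names are hypothetical, inputs are modeled as booleans meaning "this input is currently available", and the XOR reading (forward exactly one of the available inputs, here the first) is an assumption about what "only one input required" means.

```python
def k_of_n_or(inputs, k):
    """k-out-of-n OR: satisfied when at least k of the n inputs are available."""
    n = len(inputs)
    if n == 0 or not 0 <= k <= n:
        raise ValueError("requires n > 0 and 0 <= k <= n")
    return sum(1 for available in inputs if available) >= k

def and_gate(inputs):
    """AND: every input must be available (same as k-out-of-n OR with k = n)."""
    return all(inputs)

def xor_select(inputs):
    """XOR (assumed reading): pick one available input, or None if there is none.

    Returns the index of the first available input.
    """
    for index, available in enumerate(inputs):
        if available:
            return index
    return None

def demux(value, fan_out):
    """DEMUX: fan one input value out to several consumers."""
    return [value] * fan_out

print(k_of_n_or([True, False, True], k=2))  # two of three sensors are enough
print(xor_select([False, True, True]))      # second source is selected
```

Modeling AND as the k = n case of the OR gate keeps the number of primitive gate types small, which may simplify a configuration search over the graph.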
DG with Node Attributes • Software modules: • ID = mag_drv1, mag_drv2: Exec_T = 300 cycles, Rate_factor = n/a • ID = yaw_cntrl1: Exec_T = 900 cycles, Rate_factor = 1:5 • ID = yaw_cntrl2: Exec_T = 400 cycles, Rate_factor = 1:2 • ID = servo1_drv, servo2_drv: Exec_T = 200 cycles, Rate_factor = 1:1 • Data variables: • ID = yaw1, yaw2: Size = 2 bytes, Quality = 1, 2 • ID = yaw_history: Size = 8 bytes, Quality = n/a • ID = rud_Angle1: Size = 2 bytes, Quality = 1 • ID = rud_Angle2: Size = 2 bytes, Quality = 2 • Outputs: • ID = out1, out2: Criticality = critical, Priority = 1, Rate = 10 Hz, State = Enabled
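One possible in-memory encoding of these node attributes is sketched below. The class and field names are assumptions for illustration; only the IDs and attribute values come from the slide. The sketch also assumes that yaw_cntrl1 and yaw_cntrl2 are alternative implementations of the same function, which the slide suggests but does not state.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class SoftwareModule:
    module_id: str
    exec_cycles: int             # worst-case execution time, in cycles
    rate_factor: Optional[str]   # kept as written on the slide, None for "n/a"

@dataclass(frozen=True)
class DataVariable:
    var_id: str
    size_bytes: int
    quality: Optional[int]       # None for "n/a"

modules = [
    SoftwareModule("mag_drv1", 300, None),
    SoftwareModule("mag_drv2", 300, None),
    SoftwareModule("yaw_cntrl1", 900, "1:5"),
    SoftwareModule("yaw_cntrl2", 400, "1:2"),
    SoftwareModule("servo1_drv", 200, "1:1"),
    SoftwareModule("servo2_drv", 200, "1:1"),
]
variables = [
    DataVariable("yaw1", 2, 1),
    DataVariable("yaw2", 2, 2),
    DataVariable("yaw_history", 8, None),
    DataVariable("rud_Angle1", 2, 1),
    DataVariable("rud_Angle2", 2, 2),
]

# Total payload if every data variable is exchanged once.
total_bytes = sum(v.size_bytes for v in variables)

# Assumed alternatives: pick the yaw controller with the smaller worst-case cost.
controllers = [m for m in modules if m.module_id.startswith("yaw_cntrl")]
cheapest = min(controllers, key=lambda m: m.exec_cycles)
print(total_bytes, cheapest.module_id)
```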
Ardea Fault Tolerance • Specifying static redundancy • Modular redundancy (e.g., TMR) • N-version programming • Specifying dynamic redundancy • Rollback • Roll forward • Checkpointing • Specifying graceful degradation • Multi-version software modules • Shedding non-critical services • Reducing update/output rate of services
Ardea Fault Detection & Handling • Failure detection of sensors, actuators, and software modules is the responsibility of application software • Ardea built-in fault detection: • PE crash failures detected by heartbeat messages • Network link failures detected and handled as PE failures • Software module crashes detected locally by module execution monitors • Critical output modules detect missed deadlines • Fault Handling: masking, reconfiguration, or fail-stop
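Heartbeat-based detection of PE crash failures can be sketched as a table of last-seen timestamps checked against a timeout. The class name, the millisecond time base, and the 300 ms timeout below are illustrative assumptions, not values from Ardea; timestamps are passed in explicitly so the example is deterministic.

```python
class HeartbeatMonitor:
    """Hypothetical system-manager helper that flags silent PEs as failed."""

    def __init__(self, pe_ids, timeout_ms):
        self.timeout_ms = timeout_ms
        # Every PE is treated as last seen at time 0 until it reports.
        self.last_seen_ms = {pe: 0 for pe in pe_ids}

    def heartbeat(self, pe_id, now_ms):
        """Record a heartbeat message received from a PE."""
        self.last_seen_ms[pe_id] = now_ms

    def failed_pes(self, now_ms):
        """Return the PEs whose last heartbeat is older than the timeout."""
        return sorted(
            pe for pe, seen in self.last_seen_ms.items()
            if now_ms - seen > self.timeout_ms
        )

monitor = HeartbeatMonitor(["pe0", "pe1", "pe2"], timeout_ms=300)
monitor.heartbeat("pe0", now_ms=400)
monitor.heartbeat("pe1", now_ms=100)   # pe1 goes silent after this
monitor.heartbeat("pe2", now_ms=300)
suspects = monitor.failed_pes(now_ms=500)
print(suspects)
```

Because a failed network link also silences a PE's heartbeats, this kind of monitor cannot tell the two cases apart, which is consistent with link failures being handled as PE failures.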
Ardea Runtime Behavior • Supporting mobile software modules (moving object code, scheduling/unscheduling, and data re-routing) • Tracking resource availability • Finding configurations • Deploying configurations • Managing state data variables
Processing Elements (PEs) • Memory Loader: copies code into program memory • Scheduler: starts and stops execution of modules • Network Interface: handles public data variables (data routing)
Mobility and Data Routing • Module I/O data passed through mailboxes • Data routing transparent to modules • Starting, stopping of modules Figure 26 - Page 61
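The idea of routing that stays transparent to modules can be sketched with per-PE mailboxes fed from a broadcast bus. Everything here is a simplified assumption: the class `NetworkInterface`, its methods, and the Python list standing in for the broadcast network are hypothetical and do not reflect Ardea's CAN layer.

```python
from collections import deque

class NetworkInterface:
    """Hypothetical per-PE network interface with one mailbox per variable."""

    def __init__(self, pe_id, bus):
        self.pe_id = pe_id
        self.bus = bus        # shared list standing in for the broadcast network
        self.mailboxes = {}   # variable name -> queue of pending values
        bus.append(self)

    def subscribe(self, var_name):
        """Called when a consumer of var_name is started on this PE."""
        self.mailboxes.setdefault(var_name, deque())

    def unsubscribe(self, var_name):
        """Called when the consumer is stopped or moved away."""
        self.mailboxes.pop(var_name, None)

    def publish(self, var_name, value):
        """Broadcast a value; every PE with a matching mailbox keeps a copy."""
        for interface in self.bus:
            if var_name in interface.mailboxes:
                interface.mailboxes[var_name].append(value)

    def receive(self, var_name):
        """Return the oldest pending value, or None if the mailbox is empty."""
        box = self.mailboxes.get(var_name)
        return box.popleft() if box else None

bus = []
pe0, pe1 = NetworkInterface("pe0", bus), NetworkInterface("pe1", bus)

pe1.subscribe("yaw1")            # consumer initially hosted on pe1
pe0.publish("yaw1", 123)
first = pe1.receive("yaw1")

pe1.unsubscribe("yaw1")          # consumer is moved to pe0
pe0.subscribe("yaw1")
pe0.publish("yaw1", 124)         # producer code is unchanged
second = pe0.receive("yaw1")
print(first, second)
```

The producer calls `publish` the same way before and after the move; only the subscription tables change, which is the sense in which routing is invisible to module code in this sketch.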
Starting, stopping, and restarting modules • Restarting requires: • State preservation • Unprocessed data preservation • Scheduling and unscheduling
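A restart that preserves both state and unprocessed data can be sketched as a stop/snapshot/restore cycle. The module below (`RunningSum`) is a made-up example whose state is a running total; how Ardea actually stores and transfers state data variables is not shown in the slides.

```python
from collections import deque

class RunningSum:
    """Hypothetical software module: accumulates its inputs into one state value."""

    def __init__(self, state=0, pending=()):
        self.state = state             # state data variable
        self.inbox = deque(pending)    # inputs received but not yet processed

    def step(self):
        """Process one pending input, if any."""
        if self.inbox:
            self.state += self.inbox.popleft()

    def stop(self):
        """Unschedule the module and capture everything needed to resume it."""
        return {"state": self.state, "pending": list(self.inbox)}

def restart(snapshot):
    """Start a fresh instance (possibly on another PE) from a snapshot."""
    return RunningSum(snapshot["state"], snapshot["pending"])

original = RunningSum(pending=[1, 2, 3])
original.step()                 # processes 1, leaving 2 and 3 unprocessed
snapshot = original.stop()

moved = restart(snapshot)
moved.step()
moved.step()
print(moved.state)              # same total as an uninterrupted run
```

If the snapshot carried only `state`, the inputs 2 and 3 would be lost on restart, which is why unprocessed data has to be preserved alongside the state.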
Reconfiguration Policies • Two configuration-finding algorithms: • High-fidelity: searches for high-utility configurations (an NP-hard problem) • Low-fidelity (fast): ensures that critical services keep running • Response based on criticality of the detected/reported fault • Configurations are deployed starting from the sensor side of a DG
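To make the low-fidelity idea concrete, the sketch below uses a greedy first-fit placement that handles critical modules first and sheds non-critical ones that do not fit. This is a stand-in technique chosen for illustration, not the algorithm used by Ardea; the function name, the abstract "load" units, and the module names are all assumptions.

```python
def find_configuration(modules, pe_capacity):
    """Greedy first-fit mapping of modules onto PEs.

    modules: list of (name, load, is_critical) tuples (hypothetical format).
    pe_capacity: dict of PE id -> available capacity in the same load units.
    Returns a dict module name -> PE id, or None if a critical module cannot
    be placed (the caller would then fail-stop).
    """
    free = dict(pe_capacity)
    mapping = {}
    # Stable sort: critical modules (key False) come before non-critical ones.
    for name, load, is_critical in sorted(modules, key=lambda m: not m[2]):
        host = next((pe for pe in free if free[pe] >= load), None)
        if host is None:
            if is_critical:
                return None
            continue  # shed the non-critical module
        free[host] -= load
        mapping[name] = host
    return mapping

modules = [
    ("camera", 50, False),
    ("yaw_cntrl", 40, True),
    ("servo_drv", 30, True),
]
full = find_configuration(modules, {"pe0": 80, "pe1": 80})
degraded = find_configuration(modules, {"pe0": 80})   # pe1 has failed
print(full)
print(degraded)
```

After the simulated loss of pe1, the camera module is dropped while both critical modules keep running, which is the graceful degradation behavior the framework aims for. A high-fidelity search would instead compare many candidate mappings by utility, at a much higher computational cost.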
Ardea Benefits • More flexible fault tolerance at reduced cost • Ability to analyze reconfigurable architectures using DGs • Simplified debugging and maintenance • Runtime system testing • Graceful upgrade and repair • Reduction of design errors • Software reusability
Current Work • Applying techniques to a UAV for AUVSI student UAV competition. • Avionics system for BIG BLUE IV. • “READY” UAV Project • Expand bus via wireless link to the ground: • Rapid prototyping • Minimize risk to hardware • Flexible Reconfiguration
Conclusion • Graceful degradation in distributed embedded systems is a new research area, currently focusing on either abstract modeling or on non-real-time/non-critical systems • Ardea provides a structured framework for the design and implementation of real-time systems • Dependency graphs for application software specification • A software layer supporting relocatable software modules, fault recognition, and handling