Fault-Detection Capability Analysis of a Hardware-Scheduler IP-Core in Electromagnetic Interference Environment
J. Tarrillo (1), L. Bolzani (1), F. Vargas (1), E. Gatti (2), F. Hernandez (3), L. Fraigi (2)
(1) Electrical Engineering Dept., Catholic University – PUCRS. Porto Alegre, Brazil.
(2) Inst. Nacional de Tecnologia Industrial (INTI). Buenos Aires, Argentina.
(3) Universidad ORT. Montevideo, Uruguay.
vargas@computer.org
Motivation
Safety-critical embedded systems now support real-time (RT) applications that must respect strict timing constraints. They have to provide logically and temporally correct results!
The high complexity of these systems requires the adoption of Real-Time Operating Systems (RTOS) that manage the task-switching process, concurrency between tasks, memory, time, and interrupts.
Understanding the Problem …
The increasing hostility of the electromagnetic environment, caused by the widespread adoption of electronics and in particular of wireless technologies, represents a huge challenge for the reliability of RT embedded systems.
Electromagnetic interference (EMI) may induce Power Supply Disturbances (PSD) that can generate transient faults.
These faults can affect not only the applications running on embedded systems but also the RTOS executing the application code, causing scheduling dysfunctions that could lead to incorrect system behavior.
Understanding the Problem …
Several solutions have been proposed. However, they provide fault tolerance only at the application level and do NOT consider faults affecting the RTOS that propagate to application tasks.
• e.g.: about 34% of the faults injected in the processor’s registers led to scheduling dysfunctions:
  - 44% of these dysfunctions led to system crashes,
  - 34% caused RT problems, and
  - 22% generated incorrect outputs (propagated to system outputs).
If not detected at the RTOS level, these faults escape detection by conventional (app-level) techniques as well!
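The percentages above are nested: the 44/34/22% split applies to the scheduling dysfunctions, not to all injected faults. A minimal Python sketch of the conversion follows; the variable and function names are illustrative assumptions, and only the four percentages come from the slide.

```python
# Shares quoted on the slide, written as fractions.
SCHED_DYSFUNCTION_SHARE = 0.34  # share of all injected faults
OUTCOME_SHARES = {              # shares of the scheduling dysfunctions
    "system crash": 0.44,
    "real-time problem": 0.34,
    "incorrect output": 0.22,
}


def share_of_all_faults(outcome_shares, dysfunction_share):
    """Convert shares of dysfunctions into shares of all injected faults."""
    return {name: dysfunction_share * share
            for name, share in outcome_shares.items()}


overall = share_of_all_faults(OUTCOME_SHARES, SCHED_DYSFUNCTION_SHARE)
for name, value in overall.items():
    print(f"{name}: {value:.1%} of all injected faults")
```

Under this reading, roughly 15% of all injected faults ended in a system crash, about 11.6% in an RT problem, and about 7.5% in an incorrect output.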
Goal
In this context …
We propose a Hardware-based Scheduler (Hw-S) IP core to improve the robustness of embedded systems based on RTOS.
The Hw-S targets faults that are NOT detected by the native structures present in the RTOS kernel.
Summary
1. The Proposed Approach
2. Practical Experiments
3. Discussion: The Benefits
4. Conclusions
1. The Proposed Approach
[Figure: Block diagram of the target embedded system]
Inputs observed by the Hw-S:
• Events: tick, interrupts, ... (reference for switching task context).
• Memory addresses accessed by the processor.
The Hw-S identifies the current task under execution and correlates it with the information stored in an Address Table generated during the compilation process.
1. The Proposed Approach
[Figure: Block diagram of the Hw-S]
• One block is in charge of identifying the task under execution, based on the addresses accessed by the CPU and on the information stored in an Address Table generated during the compilation process.
• One block, based on the tick and on any other event (interrupts), is in charge of defining the Time Limit (tl) for the processor to execute each task, as well as detecting the events that can possibly interrupt the task in execution.
• One block implements the scheduling algorithm based on the RTOS kernel and provides fault detection according to:
  - the task in execution,
  - the analysis of the tl, and
  - the events (interrupts) that can influence the RT system.
• Output: Error Indication to System Level.
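The task-identification step can be pictured as a range lookup: each task owns a range of code addresses, and any address seen on the bus maps back to a task. The sketch below is only a software illustration of that idea; the table contents, the address ranges, and the name `identify_task` are hypothetical, and the slides do not describe the actual table layout or the hardware implementation.

```python
from bisect import bisect_right

# Hypothetical Address Table, as a compiler/linker step might emit it:
# (start_address, end_address_exclusive, task_name), sorted by start address.
ADDRESS_TABLE = [
    (0x1000, 0x1400, "T1"),
    (0x1400, 0x1900, "T2"),
    (0x1900, 0x2000, "T3"),
]
_STARTS = [entry[0] for entry in ADDRESS_TABLE]


def identify_task(address):
    """Return the task whose address range contains `address`, or None."""
    index = bisect_right(_STARTS, address) - 1
    if index < 0:
        return None  # address lies below every known range
    start, end, task = ADDRESS_TABLE[index]
    return task if start <= address < end else None
```

An address outside every range returns `None`, which a monitor could treat as "not an application task" (for example, kernel code).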
1. The Proposed Approach
[Figure: Context Switch and Time Limit]
Labels on the timeline:
• External Event
• Current task retirement into the execution queue
• Next task recovery from the execution queue
• Time for Context Switching (Δ time, proportional to the number and complexity of resources used by the RTOS)
• Time Limit for Switching Context
1. The Proposed Approach
Regarding fault-detection capability, the Hw-S targets two types of faults:
• Sequence error (E_seq): detected at the end of the Time Limit, tl, when the current task is not the one expected according to the tasks’ execution flow.
• Time error (E_time): detected when a task switch takes place between two consecutive context-switching events (e.g., two consecutive ticks), thus violating the time constraints associated with the real-time system.
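The two checks can be illustrated with a small software model. This is a sketch under stated assumptions, not the Hw-S logic itself: it assumes a fixed round-robin task order, and the class `SchedulerMonitor` and its method names are hypothetical. Only the meaning of E_seq and E_time is taken from the slide.

```python
class SchedulerMonitor:
    """Toy model of the two checks, assuming a fixed round-robin task order."""

    def __init__(self, expected_order):
        self.expected_order = list(expected_order)
        self.slot = 0                          # index of the current time slot
        self.running = self.expected_order[0]  # task seen at the last switching event
        self.errors = []                       # list of (error_name, observed_task)

    def on_address_observed(self, task):
        """Task identified from bus addresses between two switching events."""
        if task != self.running:
            # A switch happened although no tick or interrupt allowed it.
            self.errors.append(("E_time", task))
            self.running = task  # avoid reporting the same switch repeatedly

    def on_time_limit(self, task):
        """Task identified when tl expires, i.e. after a legitimate switch."""
        self.slot = (self.slot + 1) % len(self.expected_order)
        if task != self.expected_order[self.slot]:
            # The task now running is not the next one in the expected flow.
            self.errors.append(("E_seq", task))
        self.running = task


monitor = SchedulerMonitor(["T1", "T2", "T3"])
monitor.on_address_observed("T1")  # expected task, no error
monitor.on_time_limit("T2")        # legitimate switch to the expected task
monitor.on_address_observed("T3")  # switch between ticks: E_time
monitor.on_time_limit("T1")        # T3 was expected in this slot: E_seq
print(monitor.errors)
```

The model keeps its own notion of the expected slot, independent of what the CPU actually runs, which mirrors the idea of a checker that does not trust the RTOS state it is monitoring.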
2. Practical Experiments
Case study:
• Von Neumann 32-bit RISC Plasma microprocessor running an RTOS (opencores.org).
• Plasma’s instruction set is compatible with the MIPS architecture.
• We developed and validated three benchmarks that exploit different services offered by the Plasma RTOS:
  - Tasks T1, T2 and T3 access and update the values of three different global variables.
  - Tasks T1 and T2 communicate through a message queue: T1 sends a value to the queue and T2 reads this value. Task T3 writes a value into a global variable.
  - Tasks T1, T2 and T3 access a global variable that is protected by a mutual-exclusion semaphore (MUTEX).
2. Practical Experiments
[Figure: Test board designed for IEC 62132-2 and IEC 61000-4-29 electromagnetic susceptibility analysis]
Block diagram components: power supplies, temperature sensor, FPGA, Flash, SRAM, 8051.
Board sides: Test Side and Glue Logic Side.
2. Practical Experiments
[Figure: Fault injection environment. Elements: GTEM cell, test host computer, RF noise generator and amplifier, power-supply noise generator board, test board and shielding box]
Test conditions:
• Frequency range: 150 kHz – 3 GHz
• Field range: 10 – 200 V/m
• Signal modulation: AM 80%
• Total exposure time: 27 hours
[Figure: Injected noise at the FPGA power bus (conducted EMI). Voltage levels: 1.2 V and 1.15 V; voltage dips of 4.2%]
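The 4.2% figure is consistent with the two voltage levels shown, if 1.2 V is read as the nominal FPGA supply and 1.15 V as the level during a dip. That reading is an assumption, since the slide only lists the numbers. A short Python check, with an illustrative function name:

```python
def dip_percent(nominal_volts, dipped_volts):
    """Relative depth of a supply-voltage dip, in percent of the nominal value."""
    return (nominal_volts - dipped_volts) / nominal_volts * 100.0


# Assumed reading of the slide: 1.2 V nominal, 1.15 V during the dip.
depth = dip_percent(1.2, 1.15)
print(f"dip depth: {depth:.1f}%")
```

The exact value is 0.05 / 1.2, about 4.17%, which rounds to the 4.2% quoted.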
2. Practical Experiments
[Table: Summary of the obtained results]
After 27 hours, number of erroneous outputs observed per benchmark: 65.
Highlighted in the table: minimum fault latency; highest fault detection; coverage of faults that propagated to outputs.
2. Practical Experiments
[Figure: Percentage of assert() sent by the RTOS, and percentage of E_seq and E_time detected by the Hw-S]
Legend:
• Time_Errors: the CPU switched to another task between two consecutive ticks.
• Sequence_Errors: the CPU executed an unexpected task from the Task Execution Queue.
After inspection …
• The RTOS lost information associated with the “next thread”, preventing the CPU from switching to the next task in the execution queue.
• The RTOS lost “semaphore information”, preventing the CPU from continuing the proper execution of the tasks.
Migrate to HW the weakest reliability points of the RTOS.
4. Final Conclusions
• We presented a Hardware-based Scheduler (Hw-S) IP core to improve the robustness of embedded systems based on RTOS.
• The Hw-S targets scheduling dysfunctions that could lead to incorrect system behavior.
• These faults are NOT detected by the native structures present in the RTOS kernel.
• The IP core is attached to the processor bus to monitor the tasks’ execution flow.
• Practical experiments indicate that the technique is effective in increasing the fault-detection coverage provided by the RTOS-native structures.