Fault Detection in a HW/SW CoDesign Environment Prepared by A. Gaye Soykök
Outline • Introduction • System Specification • Fault model • Some terminology • Methodology Analysis • Reliable communication • HW/SW Partitioning
Introduction • System reliability aspects are generally considered only toward the end of the design process, at low abstraction levels • Working at low abstraction levels introduces more overhead • Not all systems can be handled at low levels • It is better to handle fault detection at higher abstraction levels • It is better to assess, with system performance in mind, whether fault detection should be done in HW or SW
Introduction • At system level, several parameters are considered and one design is chosen among several alternatives: • Time constraints • Power consumption • Testability • Area
Introduction • Fault detection facilities are introduced at system level • HW/SW binding of components is affected • System specification: which parts are critical and need fault detection • Design methodologies: how these detection facilities are applied, either in HW or SW • HW/SW partitioning: which parts are in SW and which are in HW, guided by the methodologies
System Specification • Language must support .. The user should be able to specify which sections require reliability aspects. Example languages: SystemC or OCCAM • Architecture: CPU (DSP or general purpose), coprocessors (ASIC or FPGA)
Fault Model • Single functional failure • Any number of physical faults causes one functional module to perform incorrectly • HW is faulty; software is affected by the hardware • The CPU, the communication channels, one of the coprocessors, or the memory may fail • A module failure is detected before any other module fails • Temporal, architectural and informational redundancy are adopted
Some Terminology • Nominal: the original system function elements • Checking: redundant elements for fault detection • Checker: the element that compares the checking and nominal elements • Each of these elements can be independently implemented in either HW or SW
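The three roles above can be illustrated with a minimal all-software sketch. The function names, the `MismatchError` exception, and the sum-of-squares workload are hypothetical choices made for illustration only; the slides do not prescribe any particular implementation.

```python
class MismatchError(Exception):
    """Raised by the checker when nominal and checking results disagree."""


def nominal(x: int) -> int:
    """Nominal element: the original system function (here, a sum of squares)."""
    return sum(i * i for i in range(x + 1))


def checking(x: int) -> int:
    """Checking element: a redundant computation of the same result (closed form)."""
    return x * (x + 1) * (2 * x + 1) // 6


def checker(x: int, nominal_fn=nominal, checking_fn=checking) -> int:
    """Checker element: compares the two results and raises an error on mismatch."""
    n = nominal_fn(x)
    c = checking_fn(x)
    if n != c:
        raise MismatchError(f"nominal={n} checking={c}")
    return n
```

In this sketch all three elements run in SW on the same processor; moving `checking` or `checker` to a dedicated processor or to HW changes the binding, not the roles.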
HW or SW • Nominal SW, Checker SW, Checking SW: checking and checker are executed either by the system processor or by a dedicated processor. Ex: self-checking SW, assertions, dual processor and VLIW
HW or SW (Cont’d) • Nominal SW, Checker HW, Checking SW. Ex: interface for functional redundancy check, VLIW with hardware, DMA checker • Nominal SW, Checker HW, Checking HW: CED solutions are implemented totally in HW. Ex: dynamically configurable checker
HW or SW (Cont’d) • Nominal HW, Checker HW, Checking HW: classical approach. Ex: duplication, TSC devices
Methodologies Analysis - Concepts • Number and type of processing elements • Whether a special architecture is necessary • Synchronization issues between processing elements • Allocation of checker memory space • Checker structure and complexity • Selection of a checker methodology to raise errors in case of mismatches
Methodologies Analysis - Metrics • Detection latency: the time between the instant an error occurs and the instant it is detected • Coverage: how many of the existing faults can be detected • Performance degradation: overhead caused by the fault detection facilities compared to the nominal functions
Methodologies Analysis – Metrics (Cont’d) • Material cost: cost of physical components • Design cost: effort needed to design the system
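The three quantitative metrics can be written down as simple formulas. The sketch below is one plausible reading of the definitions on the slides; in particular, expressing coverage as a fraction and performance degradation as relative execution-time overhead are assumptions, and all function names are hypothetical.

```python
def detection_latency(t_error: float, t_detected: float) -> float:
    """Time between the instant an error occurs and the instant it is detected."""
    if t_detected < t_error:
        raise ValueError("an error cannot be detected before it occurs")
    return t_detected - t_error


def coverage(detected_faults: int, existing_faults: int) -> float:
    """Fraction of the existing faults that can be detected (assumed to be a ratio)."""
    if existing_faults <= 0:
        raise ValueError("existing_faults must be positive")
    return detected_faults / existing_faults


def performance_degradation(t_nominal: float, t_with_detection: float) -> float:
    """Relative overhead of the detection facilities versus the nominal functions.

    Assumption: overhead is measured on execution time.
    """
    if t_nominal <= 0:
        raise ValueError("t_nominal must be positive")
    return (t_with_detection - t_nominal) / t_nominal
```

For instance, under these assumptions a design that runs in 250 time units with detection and 200 without has a performance degradation of 0.25.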
Reliable Communication • Apart from data processing, communication also needs to be reliable • Hardware redundancy: line duplication • Information redundancy: data encoding • Most effective when data encoding is used where SW is involved and the hardware sections employ dedicated lines (duplicated, encoded)
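As an illustration of information redundancy, the sketch below appends a single even-parity bit to each transmitted word, and models line duplication as a comparison of two received copies. Parity is only one possible encoding, chosen here for brevity; the slides do not name a specific code, and all function names are hypothetical.

```python
def parity_bit(word: int) -> int:
    """Even-parity bit: 1 if the word has an odd number of set bits, else 0."""
    return bin(word).count("1") % 2


def encode(word: int) -> int:
    """Information redundancy: append the parity bit as the least significant bit."""
    return (word << 1) | parity_bit(word)


def decode(frame: int) -> int:
    """Return the data word, or raise ValueError if the parity check fails."""
    word, received_parity = frame >> 1, frame & 1
    if parity_bit(word) != received_parity:
        raise ValueError("parity mismatch: communication fault detected")
    return word


def duplicated_lines_agree(line_a: int, line_b: int) -> bool:
    """Hardware redundancy modelled in SW: both copies of a line must match."""
    return line_a == line_b
```

A single parity bit detects any single-bit error in the frame but misses double-bit errors; a stronger code trades more redundant bits for higher coverage.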
HW/SW Partitioning • After the system has been specified, the methodologies have been assessed, and different alternatives have been produced with cost functions, the partitioning step takes place • Evaluate the cost functions and the user's constraints • Reliability aspects make partitioning more complex, so it is carried out in two stages
HW/SW Partitioning (Cont’d) • First level: classical aspects and functions are taken into account • Second level: given the first-level solution, reliability aspects are introduced, and the solution in the solution set that has the best trade-off and still satisfies the first-level constraints is chosen • If no reliability constraints are given, the second level is not carried out
HW/SW Partitioning (Cont’d) • If a specific architecture is required for reliability (for example, dual processor), the first level benefits from earlier partitioning solutions • A solution may not exist after the reliability constraints are introduced, and the first level may need to be repeated
HW/SW Partitioning (Cont’d) • Reliability constraints, which drive the second stage, may be: • Hard, ex: 100% fault coverage • Soft, ex: any fault coverage • Parameters considered: • Fault coverage • Performance degradation • Detection latency • Area overhead
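The two-stage flow described on the partitioning slides can be sketched roughly as follows. The `Candidate` fields, the classical cost (area plus execution time), and the trade-off weights are all hypothetical; the slides name the parameters considered but not how they are combined.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Candidate:
    """One HW/SW partitioning alternative (all fields are illustrative)."""
    name: str
    area: float
    exec_time: float
    fault_coverage: float      # fraction in [0, 1]
    perf_degradation: float    # relative overhead
    detection_latency: float


def first_level(cands: List[Candidate], max_area: float, max_time: float) -> List[Candidate]:
    """First level: keep the alternatives that satisfy the classical constraints."""
    return [c for c in cands if c.area <= max_area and c.exec_time <= max_time]


def trade_off(c: Candidate) -> float:
    """Lower is better. The weights are an arbitrary assumption for illustration."""
    return c.perf_degradation + 0.01 * c.detection_latency - c.fault_coverage


def partition(cands: List[Candidate], max_area: float, max_time: float,
              min_coverage: Optional[float] = None) -> Optional[Candidate]:
    """Return the chosen alternative, or None if the first level must be repeated.

    min_coverage=None means no reliability constraint (second level skipped);
    min_coverage=1.0 models a hard 100% constraint; 0.0 models a soft one.
    """
    feasible = first_level(cands, max_area, max_time)
    if not feasible:
        return None
    if min_coverage is None:
        return min(feasible, key=lambda c: c.area + c.exec_time)
    reliable = [c for c in feasible if c.fault_coverage >= min_coverage]
    return min(reliable, key=trade_off) if reliable else None
```

A `None` result corresponds to the case where no solution exists once reliability constraints are introduced, so the first level has to be repeated with different alternatives or relaxed constraints.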
Conclusion • Design for reliability has been merged into the HW/SW codesign process, resulting in a final design that has on-line fault detection properties • Future work is to introduce fault tolerance into the HW/SW codesign process