Hardware Testing on the Level of Tasks. IMCSIT 2009, Mrągowo, Poland. Thomas Kägi, Igor Schagaev, Jürg Gutknecht.
Generalised algorithm of fault tolerance (GAFT) D-INFK/Native Systems Group/Thomas Kaegi-Trachsel
Checking on different implementation levels
• Typical timing constraints
  • microseconds at the instruction level
  • milliseconds at the procedure level
  • hundreds of milliseconds at the module level
  • seconds to tens of seconds at the task level
  • tens of seconds to minutes at the system level
• A "good" fault-tolerant system tolerates the majority of all possible malfunctions at the instruction level (transparent to software)
• Permanent faults and some malfunctions might be detected at a higher level
Checking process on the level of tasks (1/2)
• Basic version: T P T
• Short version: P T
• T = test, P = task
Checking process on the level of tasks
• To keep the system responsive, do not check the whole hardware at once; perform the tests per processor
• Perform only one test at a time
• Selective testing according to the principle of the growing core, taking dependencies into account
• During a diagnostic cycle, every hardware component is tested once
• Asynchronous testing
  • Preferable for time-critical tasks
  • CPU utilisation is not optimal
• Synchronous testing
  • Preferable for long-running tasks
  • Higher overhead
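One way to read "growing core, taking dependencies into account" is that a component is tested only after everything it depends on has been tested. The following is a minimal sketch under that assumption, using a topological sort; the component names and the `diagnostic_order` helper are hypothetical and do not come from the slides.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def diagnostic_order(depends_on):
    """Return a test order in which each component follows its dependencies.

    `depends_on` maps a component to the components that must already have
    been tested (the "core" grown so far). This mapping is an assumption
    made for illustration.
    """
    return list(TopologicalSorter(depends_on).static_order())


# Hypothetical hardware components of one processor.
components = {
    "cpu_core": [],
    "ram": ["cpu_core"],
    "bus": ["cpu_core"],
    "io": ["bus", "ram"],
}

order = diagnostic_order(components)
# One test at a time, each component exactly once per diagnostic cycle.
for step, component in enumerate(order, start=1):
    print(step, component)
```

The relative order of `ram` and `bus` is not fixed by the dependencies, so only the constraints themselves should be relied on.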
Checking process
• Synchronous diagnosis
  1. Unloading of the currently running task's state from some hardware parts, e.g. RAM (Tu)
  2. Loading and initialization of the diagnostic routines
  3. Execution of the diagnostic process
  4. Unloading of the diagnostic routines and performing further actions in case of faults. If high-priority interrupts occur during testing, some temporary data might have to be kept so that testing can continue after the interrupt has been processed
  5. Reloading of the user task and continuation of processing (Tr)
• Asynchronous diagnosis
  • Steps 2 to 4 of synchronous diagnosis (Ta)
• Goal: an optimal mix of synchronous and asynchronous mode
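The step lists imply a simple cost comparison: an asynchronous check costs Ta, while a synchronous check additionally pays for unloading and reloading the user task, giving Tu + Ta + Tr. A minimal sketch, assuming all times are expressed in the same unit; the function names and the sample numbers are illustrative and not taken from the slides.

```python
def asynchronous_cost(ta):
    """Time for an asynchronous check: load, run and unload the diagnostics (Ta)."""
    return ta


def synchronous_cost(tu, ta, tr):
    """Time for a synchronous check: unload task (Tu), diagnostics (Ta), reload task (Tr)."""
    return tu + ta + tr


def synchronous_overhead(tu, tr):
    """Extra time a synchronous check costs compared with an asynchronous one."""
    return tu + tr


# Illustrative values only (e.g. milliseconds).
tu, ta, tr = 2, 10, 3
print(asynchronous_cost(ta))         # 10
print(synchronous_cost(tu, ta, tr))  # 15
print(synchronous_overhead(tu, tr))  # 5
```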
System model
• System
  • Multiprocessor system with a set of U identical processors
• Tasks
  • No task preemption
  • No task dependencies (time, information, control)
  • All tasks are ready at boot-up time
  • Tad is constant
• Use cases
  • Tc and all task completion times are known
  • Tc is known but the task completion times are not
  • Tc is unlimited and the task completion times are not known
Analysis of the checking process
• Tasks are sorted by increasing completion time
• For tasks with t_i > Tc, testing is performed in synchronous mode
• Cases for consecutive completion times t_i and t_{i+1}
  • t_{i+1} - t_i >= Tad: Asynch
  • t_i + Tad - (Tu + Tr) < t_{i+1} < t_i + Tad: Asynch / Synch
  • t_i < t_{i+1} + (Tu + Tr) < t_i + Tad: New Task
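The three conditions on consecutive completion times can be turned into a small classifier. This is a sketch under stated assumptions: the labels are the ones that survive from the slide's figure, the slide does not spell out what the scheduler does in each case, and the boundary value t_{i+1} = t_i + Tad - (Tu + Tr), which neither strict inequality covers, is assigned to the last case here. The function name `classify_gaps` is hypothetical.

```python
def classify_gaps(completion_times, tad, tu, tr):
    """Label each gap between consecutive completion times.

    Times are sorted first, as the slide requires. Returns one label per
    pair (t_i, t_{i+1}).
    """
    times = sorted(completion_times)
    labels = []
    for t_i, t_next in zip(times, times[1:]):
        if t_next - t_i >= tad:
            # The gap is long enough for a full asynchronous check.
            labels.append("asynch")
        elif t_next > t_i + tad - (tu + tr):
            # t_i + Tad - (Tu + Tr) < t_{i+1} < t_i + Tad
            labels.append("asynch/synch")
        else:
            # t_i < t_{i+1} + (Tu + Tr) < t_i + Tad
            # (the boundary case is placed here by assumption)
            labels.append("new task")
    return labels


# Illustrative values only: Tad = 5, Tu = Tr = 1.
print(classify_gaps([0, 10, 14, 15], tad=5, tu=1, tr=1))
# ['asynch', 'asynch/synch', 'new task']
```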
Leveraged system model
• System
  • Multiprocessor system with a set of U identical processors
• Tasks
  • No task preemption
  • Tasks can have dependencies (time, information and control)
  • Task i has ready time tr_i and processing time tp_i
  • Variable task check time tad_i
  • Tasks are sorted according to their completion time t_i = tr_i + tp_i
Modified checking algorithm
• Tasks are sorted by increasing completion time
• For tasks with t_i > Tc, testing is performed in synchronous mode
[Figure: timeline of tasks t_1, t_2, t_3 with check times tad_1, tad_2, tad_3, annotated "Use free slot for asynch check"; continuously running task T0 with Tu, tsd0, Tr]
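The first two rules of the modified algorithm can be sketched as follows: compute each task's completion time as ready time plus processing time, sort by it, and mark tasks that complete after Tc for synchronous testing. The `Task` class, the `plan_checks` helper and the "asynch candidate" label are assumptions made for illustration; the slides do not say how the remaining tasks are placed into free slots.

```python
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    ready: float       # ready time of the task
    processing: float  # processing time of the task
    check: float       # variable check time of the task

    @property
    def completion(self) -> float:
        """Completion time: ready time plus processing time."""
        return self.ready + self.processing


def plan_checks(tasks, tc):
    """Sort tasks by completion time and pick a check mode for each."""
    ordered = sorted(tasks, key=lambda t: t.completion)
    plan = []
    for task in ordered:
        # Rule from the slide: completion time above Tc means synchronous mode.
        # Everything else is only a candidate for an asynchronous check here.
        mode = "synchronous" if task.completion > tc else "asynch candidate"
        plan.append((task.name, task.completion, mode))
    return plan


# Illustrative values only.
tasks = [Task("A", 0, 5, 1), Task("B", 2, 1, 1), Task("C", 0, 100, 2)]
for entry in plan_checks(tasks, tc=50):
    print(entry)
# ('B', 3, 'asynch candidate')
# ('A', 5, 'asynch candidate')
# ('C', 100, 'synchronous')
```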