On the Threat of Metastability in an Asynchronous Fault-Tolerant Clock Generation Scheme
Gottfried Fuchs, Matthias Függer and Andreas Steininger
Vienna University of Technology, Embedded Computing Systems Group
{fuchs, fuegger, steininger}@ecs.tuwien.ac.at
Outline
• Asynchronous fault-tolerant algorithm
• Investigate its susceptibility to metastability
• In this context: study Sutherland’s micropipeline
Clocking in SoCs
• Synchronous SoC: (-) single point of failure (Seifert et al.); (+) common time across chip (< 1 tick)
• GALS: (+) no single point of failure; (-) no common time across chip
• DARTS: (+) no single point of failure; (+) common time across chip (< small # of ticks)
SoC with Common Time
[Figure: tick sequences tick(2) … tick(5) of nodes p and q, each in its local clock domain; in the example shown, π(t) = 2 and #ticks(Δ) = 3]
• Precision: at any t, π(t) is bounded
• Accuracy: l(Δ) < #ticks in any Δ < u(Δ)
Common time eases solving other problems (replica determinism, …).
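The two properties can be made concrete with a small sketch. It assumes that π(t) is the largest difference in tick counts between any two nodes at time t, and that #ticks(Δ) counts one node's ticks inside a window of length Δ. The tick timestamps and the helper names (`ticks_until`, `precision`, `ticks_in_window`) are hypothetical and chosen only to reproduce the example values π(t) = 2 and #ticks(Δ) = 3.

```python
from bisect import bisect_right

def ticks_until(tick_times, t):
    """Number of ticks a node has generated up to and including time t."""
    return bisect_right(tick_times, t)

def precision(traces, t):
    """pi(t): largest difference in tick counts between any two nodes at time t."""
    counts = [ticks_until(trace, t) for trace in traces]
    return max(counts) - min(counts)

def ticks_in_window(tick_times, start, delta):
    """#ticks(delta): ticks one node generates in the window (start, start + delta]."""
    return ticks_until(tick_times, start + delta) - ticks_until(tick_times, start)

# Hypothetical, sorted tick timestamps (arbitrary time units) for nodes p and q.
p = [1.0, 2.0, 3.0, 4.0, 5.0]
q = [2.5, 3.5, 4.5, 5.5, 6.5]

pi_t = precision([p, q], t=3.2)                        # p has ticked 3 times, q once
n_ticks = ticks_in_window(p, start=1.5, delta=3.0)     # p's ticks at 2.0, 3.0, 4.0
```

Precision bounds `pi_t` for every t, while accuracy bounds `n_ticks` from below and above for every window of length Δ.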
DARTS Hardware Implementation
Common time property proved in [EDCC06, PODC09].
• Initially:
   • send tick(0) to all; clock := 0;
• If received tick(m) from at least f+1 remote nodes and m > clock:
   • send tick(clock+1), …, tick(m) to all; clock := m;
• If received tick(m) from at least 2f+1 remote nodes and m >= clock:
   • send tick(m+1) to all; clock := m+1;
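The three rules can be exercised in a purely digital, fault-free software sketch. This is not the DARTS hardware: the class `Node`, the function `simulate`, the FIFO message queue and the choice n = 5, f = 1 are all assumptions made for illustration. The sketch also assumes that a node which has sent tick(m) counts as having sent every lower tick, so each node only tracks the highest tick seen per remote node.

```python
from collections import deque

class Node:
    def __init__(self, node_id, n, f):
        self.id = node_id
        self.f = f
        self.clock = 0
        # Highest tick seen from each remote node (-1 = nothing received yet).
        # Requires at least 2f+1 remote nodes.
        self.seen = {q: -1 for q in range(n) if q != node_id}

    def receive(self, sender, m):
        """Process tick(m) from sender; return the list of ticks to broadcast."""
        self.seen[sender] = max(self.seen[sender], m)
        ranked = sorted(self.seen.values(), reverse=True)
        relay_m = ranked[self.f]       # largest m received from >= f+1 remote nodes
        incr_m = ranked[2 * self.f]    # largest m received from >= 2f+1 remote nodes
        sent = []
        while True:
            if relay_m > self.clock:            # relay rule: catch up to m
                sent.extend(range(self.clock + 1, relay_m + 1))
                self.clock = relay_m
            elif incr_m >= self.clock:          # increment rule: advance to m+1
                self.clock = incr_m + 1
                sent.append(self.clock)
            else:
                return sent

def simulate(n=5, f=1, max_tick=10):
    """Run until some node reaches max_tick; messages are delivered in FIFO order."""
    nodes = [Node(i, n, f) for i in range(n)]
    queue = deque()

    def broadcast(sender, ticks):
        for m in ticks:
            for receiver in range(n):
                if receiver != sender:
                    queue.append((sender, receiver, m))

    for node in nodes:                          # initially: send tick(0) to all
        broadcast(node.id, [0])

    max_skew = 0
    while queue and max(node.clock for node in nodes) < max_tick:
        sender, receiver, m = queue.popleft()
        broadcast(receiver, nodes[receiver].receive(sender, m))
        clocks = [node.clock for node in nodes]
        max_skew = max(max_skew, max(clocks) - min(clocks))
    return [node.clock for node in nodes], max_skew

clocks, max_skew = simulate()
```

Because the model is purely digital, it says nothing about metastability; it only shows how the f+1 rule lets slow nodes catch up and the 2f+1 rule advances the clock.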
DARTS Hardware Implementation
But: the proofs cover digital behavior only. What about metastability (during non-normal operation)?
Potential for metastability (1)
• TG-Alg has
   (a) a stable state
   (b) under faults, a non-closed (unrestricted) environment (no stability condition as in QDI)
• Hence there exists a malicious input pulse.
Make sure metastability does not propagate across the ECR boundary.
Existence of metastability barrier? (Sutherland)
Does a micropipeline “synchronize”?
[Figure: waveforms in(t), out(t) and malicious out(t), with input pulse edges at tE1 and tE2]
Critical pulse window size (2 stages) = tE2 - tE1
Does a micropipeline “synchronize”?
[Figure: waveforms in(t), out(t) and malicious out(t)]
Critical pulse window size (4 stages)
Metastability decay in a C-Element (1)
• Latch: model, decay towards LO/HI, MTBU formula
• C-Element: model. Do equivalent formulas exist?
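For reference, the latch MTBU estimate commonly found in textbooks is MTBU = exp(t_res / τ) / (T0 · f_clk · f_data). The sketch below evaluates it. The slide's own formula was not preserved, so this is the standard form and not necessarily the authors' notation, and all parameter values are hypothetical.

```python
import math

def mtbu(t_res, tau, t0, f_clk, f_data):
    """Textbook latch estimate: mean time between upsets, in seconds.

    t_res  : time allowed for metastability to resolve (s)
    tau    : resolution time constant of the latch (s)
    t0     : width of the metastability window (s)
    f_clk  : clock frequency (Hz)
    f_data : rate of asynchronous data transitions (Hz)
    """
    return math.exp(t_res / tau) / (t0 * f_clk * f_data)

# Hypothetical values, for illustration only.
TAU, T0, F_CLK, F_DATA = 50e-12, 100e-12, 100e6, 10e6

no_wait = mtbu(0.0, TAU, T0, F_CLK, F_DATA)        # no resolution time at all
with_wait = mtbu(5e-9, TAU, T0, F_CLK, F_DATA)     # 5 ns of resolution time
```

Each extra τ of resolution time multiplies the MTBU by e, which is the exponential behavior the question about equivalent C-Element formulas refers to.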
Metastability decay in a C-Element (2)
• a, b: inputs (b = armed); z: output; x: feedback
• [Figure: a(t) and f(a,b,x)(t), with x0 marked at tE]
• For t > tE: consider homogeneous solution f(a,b,x)(t) = x(t)
Metastability decay in a C-Element (2)
• Near the metastability point: [equation] with the assumption x0 = “midway” yields [equation]
• Remember the latch: strong indication for synchronizing behavior
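The latch analogy can be illustrated with the generic first-order model of a bistable element near its metastable point. The deviation from the metastable point is assumed to grow as dev(t) = dev0 · exp(t / τ), so the time to resolve is τ · ln(dev_resolved / dev0). This is the standard linearized latch model and not the C-Element equations from the slides, which were not preserved. The names `resolution_time` and `dev_resolved` and the value of τ are assumptions.

```python
import math

def resolution_time(dev0, tau, dev_resolved=0.5):
    """Time for an initial deviation dev0 from the metastable point to grow
    exponentially (time constant tau) to dev_resolved (normalized voltage)."""
    return tau * math.log(dev_resolved / abs(dev0))

TAU = 50e-12  # hypothetical time constant, in seconds

# The closer the start is to the metastable point, the longer the decay takes,
# but only logarithmically: each factor of 10 costs tau * ln(10).
times = {dev0: resolution_time(dev0, TAU) for dev0 in (1e-2, 1e-4, 1e-6)}
```

Under this model, an arbitrarily long resolution requires an exponentially narrow set of initial conditions, which is what "synchronizing behavior" means here.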
Simulation Setup
• 4-stage pipeline, MATLAB's stiff ODE solver
• Parameters: CMOS 180 nm, but G = 1.66 (numeric resolution)
• [Figure: malicious out(t)]
• Choose Tmaxcorr = 3 Tnom
Simulation Results (1): Dependence on RC constants
[Figure: critical window size vs. RC constants]
• Approx. linear dependence only
Simulation Results (2): Dependence on #stages
[Figure: critical window size vs. number of stages]
• Critical window size: ~10^-1 / stage
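Reading this result as the critical window shrinking by a factor of about 10^-1 with each added stage (an interpretation of the slide, since the plot itself is not preserved), the scaling can be sketched as follows. The one-stage window of 1 ps is a hypothetical placeholder and not a value from the simulation.

```python
def critical_window(stages, window_one_stage, shrink_per_stage=0.1):
    """Extrapolated critical pulse window size for a pipeline with `stages`
    stages, assuming a constant shrink factor per added stage."""
    return window_one_stage * shrink_per_stage ** (stages - 1)

WINDOW_ONE_STAGE = 1e-12  # hypothetical: 1 ps for a single stage

# Window size for 1 to 4 stages under the assumed factor of 0.1 per stage.
windows = [critical_window(n, WINDOW_ONE_STAGE) for n in range(1, 5)]
```

With these placeholder numbers, three additional stages narrow the window by three orders of magnitude.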
Simulation Results (3): Dependence on G
• Critical window size: ~10^-7 / 1
In case of DARTS: simulation indicates that the critical pulse window size is < 1 fs.
Conclusions
• Example for a fault-tolerant asynchronous algorithm: DARTS.
• Identified micropipeline as metastability barrier.
• Characterized its synchronizing behavior.
Open research:
• Refined C-Element models (yield results for larger G).
• Extend analysis to incorporate masking effects and calculate metastability upset probability.