Algorithm Level RE-computing with Shifted Operands - A Register Transfer Level Concurrent Error Detection Technique
Kaijie Wu and Ramesh Karri
CAD Lab, Department of Electrical Engineering, Polytechnic University
(kwu03@utopia.poly.edu, ramesh@india.poly.edu)
Outline
• Review time redundancy based CED techniques
• Describe ARESO
  • operation
  • checking ratio
  • benefits / drawbacks
• Integrate pipelining with ARESO
• Summary of ARESO overhead
• Examples and Experimental Results
• Conclusion
RE-Computing with Shifted Operands (RESO)
1. Perform basic computation: X3..X0 + Y3..Y0 gives result 1 (Z4..Z0)
2. Repeat computation with 1-bit shifted operands: gives result 2
3. Compare results: a comparator (C) checks result 1 against result 2 and signals Error on a mismatch
[Figure: a 5-slice adder (+4..+0) shown twice, once with unshifted operands and once with operands shifted left by one bit]
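The three RESO steps can be sketched in a few lines of Python. This is a minimal illustration under a simplified fault model chosen for this example, not the authors' model: one sum output bit of the adder is stuck at 0 and the carry chain is assumed fault-free. The names `add_with_fault` and `reso_add` are hypothetical.

```python
def add_with_fault(x, y, width, stuck_bit=None):
    """Add two operands on a `width`-bit adder model.

    If stuck_bit is given, that sum output bit is forced to 0
    (assumed stuck-at-0 fault; the carry chain is treated as fault-free).
    """
    total = (x + y) & ((1 << width) - 1)
    if stuck_bit is not None:
        total &= ~(1 << stuck_bit)
    return total


def reso_add(x, y, n_bits, k=1, stuck_bit=None):
    """Return (result_1, error_flag) for n_bits-wide operands and a k-bit shift."""
    # Extra bits leave room for the carry-out and for the shifted operands.
    width = n_bits + 1 + k
    # Step 1: basic computation.
    result_1 = add_with_fault(x, y, width, stuck_bit)
    # Step 2: repeat with operands shifted left by k bits, then shift back.
    shifted = add_with_fault(x << k, y << k, width, stuck_bit)
    result_2 = shifted >> k
    # Step 3: compare; a mismatch raises the error flag.
    return result_1, result_1 != result_2
```

For example, `reso_add(11, 6, 4, stuck_bit=0)` returns `(16, True)`: the faulty slice corrupts the unshifted sum, while the shifted computation routes the same data through a different slice, so the two results disagree.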
Fault detection capability of RESO
• With k-bit shift, RESO can detect errors in
  • all bit-wise logical operations when failures are confined to k adjacent bit-slices.
  • arithmetic operations in a ripple-carry adder and carry-lookahead adder when failures are confined to k-1 adjacent bit-slices, k>1.
  • arithmetic operations in a group carry-lookahead adder when failures are confined to a group. Each group i consists of a (k-1)-bit adder and circuits for group-carry generate Gi, group-carry propagate Pi, and group carry-in Ci.
• Up to k errors in a bit-slice of an array multiplier can be detected by shifting at most log2(2k+1) bits in one of the operands.
Comparison
• No CED
  • (a) Example CDFG
• Logic Level CED
  • (b) Duplication
  • (c) RESO, RERO, REDWC etc.
• Algorithm Level CED
  • (d) Algorithm level time redundancy
[Figure: four CDFGs (a)-(d) built from + and * operations, with comparison nodes marked C]
Algorithm Level Re-Computing with Shifted Operands (ARESO)
• Does not use fault tolerant logic operators
• Performs checking operations at the Register Transfer Level
• Supports hardware overhead vs. performance penalty vs. error detection latency trade-offs
RTL Data Path Operation of ARESO
[Figure: RTL data path; labels: Indicator, input, input, three shift registers, C, Output, Error]
ARESO - Checking Ratio (R)
• L = # of clock cycles per iteration
• R = 1: check all results
[Figure: timelines of input samples. Input 1, Input 2, ..., Input R each take L cycles and are followed by a shifted re-computation of Input R (Sh Input R)]
ARESO features
• Good points
  • The number of comparisons is reduced
  • Increasing the checking ratio reduces the time overhead
  • Compared to straightforward duplication, area overhead is reduced
• Bad points
  • Large detection latency: (R+1)L
Integrating ARESO with Pipelining
• Reduces error detection latency
• If L = 18, R = 2
  • Detection latency = (R+1)L = 54 cycles for basic ARESO
  • Detection latency = R·I_ARESO + L = 30 cycles for pipelined ARESO with initiation interval I_ARESO = 6
[Figure: two timelines. Basic ARESO: Input 1, Input 2, Shifted Input 2 run back to back, L cycles each (0, 18, 36, 54). Pipelined ARESO: Input 1, Input 2, Shifted Input 2 start every I_ARESO cycles (0, 6, 12) and the last one finishes at cycle 30]
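The two latency expressions on this slide are easy to check numerically. The sketch below simply evaluates (R+1)L and R·I_ARESO + L; the function names are illustrative and not part of the original work.

```python
def basic_detection_latency(R, L):
    """Detection latency in cycles for basic ARESO: (R + 1) * L."""
    return (R + 1) * L


def pipelined_detection_latency(R, I_areso, L):
    """Detection latency in cycles for pipelined ARESO: R * I_ARESO + L."""
    return R * I_areso + L


if __name__ == "__main__":
    # Slide example: L = 18, R = 2, I_ARESO = 6.
    print(basic_detection_latency(2, 18))
    print(pipelined_detection_latency(2, 6, 18))
```

Multiplying the pipelined value by the clock period gives the latencies quoted in the FIR tables, for instance (2×8+24) cycles at 50 ns is 2000 ns.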
ARESO - Throughput
• Throughput: # of results that come from non-shifted inputs per clock cycle (= 1/I for a pipelined design with initiation interval I)
• To maintain this throughput, the initiation interval of the pipelined ARESO design should be I_ARESO = R·I / (R+1)
[Figure: pipeline design without ARESO (Input 1, Input 2, ... started every I cycles) next to pipeline design with ARESO, R = 1 (Input 1, Shifted Input 1, Input 2, Shifted Input 2, ... started every I_ARESO cycles)]
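The throughput condition can be worked through with exact fractions. The sketch assumes that one shifted re-computation follows every R normal inputs, so only R out of every R+1 initiations produce useful results; equating that rate to 1/I gives I_ARESO = R·I/(R+1). This relation is inferred from the throughput definition rather than quoted from the slides, and the function names are illustrative.

```python
from fractions import Fraction


def areso_initiation_interval(I, R):
    """Initiation interval that keeps the useful throughput at 1/I (assumed relation)."""
    return Fraction(R * I, R + 1)


def useful_throughput(I_areso, R):
    """Non-shifted results per clock cycle: R useful results per (R + 1) initiations."""
    return Fraction(R, R + 1) / I_areso


if __name__ == "__main__":
    I = 12  # initiation interval of the FIR design without ARESO
    for R in (1, 2, 3):
        I_areso = areso_initiation_interval(I, R)
        print(R, I_areso, useful_throughput(I_areso, R))
```

With I = 12 the relation yields 6, 8 and 9 cycles for R = 1, 2 and 3, which matches the I_ARESO values listed for the ARESO-1, ARESO-2 and ARESO-3 FIR designs.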
Error detection capability of ARESO
• All RESO detectable permanent faults
• The transient fault detection capability varies with R (the checking ratio) and D (the # of data outputs that will be affected)
  • when 1 ≤ R ≤ D: 100% of RESO detectable faults
  • when D < R: 100 × (D / R) % of RESO detectable faults
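The two transient-fault cases can be folded into one small function. This is only a restatement of the slide's rule; the function name and the input validation are additions made for the example.

```python
def transient_coverage_pct(R, D):
    """Percentage of RESO-detectable transient faults that ARESO detects.

    R: checking ratio, D: number of data outputs affected by the fault.
    """
    if R < 1 or D < 1:
        raise ValueError("R and D must be positive integers")
    if R <= D:
        return 100.0
    return 100.0 * D / R


if __name__ == "__main__":
    for R, D in [(1, 1), (2, 4), (4, 2)]:
        print(R, D, transient_coverage_pct(R, D))
```

A fault that corrupts at least R consecutive outputs is always caught, while a shorter disturbance is caught only if it overlaps a checked result.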
FIR Filter Example - overhead (50 ns clock)
Design | FIR (I=12, L=23) | ARESO-1 FIR (I_ARESO=6, R=1, L=24) | ARESO-2 FIR (I_ARESO=8, R=2, L=24)
Multipliers | (8×8→14), (9×8→17) | (10×10→16), (10×10→17), (10×10→19) | (10×10→16), (10×10→19)
Adders | 2 (19×19→19) | 3 (21×21→21) | 2 (21×21→21)
Register bits | 419 | 963 | 750
Combinational area (unit cells) | 4051 | 6960 (71.8%) | 5483 (35.3%)
Sequential area (unit cells) | 4983 | 11506 (130.9%) | 8635 (73.3%)
Total area (unit cells) | 9034 | 18466 (104.4%) | 14118 (56.3%)
Detection latency (ns) | - | (6+24)×50 = 1500 | (2×8+24)×50 = 2000
30.8% reduction in area at the expense of 33.3% increase in error detection latency.
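The percentages in this table can be reproduced from the raw unit-cell counts. The sketch below uses only numbers from the table; the helper name is illustrative. One observation from the arithmetic: the quoted 30.8% figure comes out when the area difference between ARESO-1 and ARESO-2 is taken relative to the ARESO-2 area.

```python
def pct_change(reference, value):
    """Change of `value` relative to `reference`, in percent."""
    return 100.0 * (value - reference) / reference


# Total areas (unit cells) and detection latencies (ns) from the 50 ns table.
FIR_AREA = 9034
ARESO1_AREA, ARESO1_LATENCY = 18466, 1500
ARESO2_AREA, ARESO2_LATENCY = 14118, 2000

if __name__ == "__main__":
    print(round(pct_change(FIR_AREA, ARESO1_AREA), 1))        # ARESO-1 area overhead
    print(round(pct_change(FIR_AREA, ARESO2_AREA), 1))        # ARESO-2 area overhead
    print(round(pct_change(ARESO2_AREA, ARESO1_AREA), 1))     # area gap relative to ARESO-2
    print(round(pct_change(ARESO1_LATENCY, ARESO2_LATENCY), 1))  # latency increase
```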
FIR Filter Example - Schedule
• 17 multiplications, 16 additions
• ARESO with
  • checking ratio = 2
  • I_ARESO = 8 clock cycles
  • L = 24 clock cycles
  • 50 ns clock cycle
• ARESO constraints were incorporated into Synopsys BC synthesis scripts
• Two 21×21→21 adders
• One 10×10→16 and one 10×10→19 multiplier
• Detection latency of 2000 ns
[Figure: schedule of multiplications *1..*17 and additions +1..+16, with "read inputs" and "test checking ratio counter" steps]
FIR using multi-cycle operations (30 ns clock)
Design | FIR (I=12, L=36) | ARESO-1 FIR (I_ARESO=6, R=1, L=36) | ARESO-3 FIR (I_ARESO=9, R=3, L=36)
Combinational area (unit cells) | 5318 | 8666 (63.0%) | 6868 (29.1%)
Sequential area (unit cells) | 7898 | 14410 (82.5%) | 10637 (34.7%)
Total area (unit cells) | 13216 | 23076 (74.6%) | 17505 (32.5%)
Detection latency (ns) | - | (6+36)×30 = 1260 | (3×9+36)×30 = 1890
31.8% reduction in area at the expense of 50% increase in error detection latency.
FIR using chained operations (100 ns clock)
Design | FIR (I=12, L=24) | ARESO-1 FIR (I_ARESO=6, R=1, L=24) | ARESO-2 FIR (I_ARESO=8, R=2, L=24)
Combinational area (unit cells) | 4186 | 6912 (65.1%) | 5491 (31.2%)
Sequential area (unit cells) | 4983 | 11044 (121.6%) | 8910 (78.8%)
Total area (unit cells) | 9169 | 17956 (95.8%) | 14401 (57.1%)
Detection latency (ns) | - | (6+24)×100 = 3000 | (2×8+24)×100 = 4000
24.7% reduction in area at the expense of 33.3% increase in error detection latency.
Conclusions
• Compared to straightforward duplication, the area overhead of ARESO-R designs is in the range 30%-100%.
• The detection latency of ARESO-R increases with the checking ratio R.
• For a given throughput, the area overhead decreases as the checking ratio R increases.
• ARESO constraints were incorporated into Synopsys BC.