Placement and Timing for FPGAs Considering Variations
Yan Lin (1), Mike Hutton (2) and Lei He (1)
(1) EE Department, UCLA
(2) Altera Corporation, San Jose
Outline
• Preliminaries and Motivation
• Timing with Guard-banding/Speed-binning
• Stochastic Placement
• Experimental Results
• Conclusions and Discussions
Background
• Process variations
  • More and more significant in nanometer technology
  • Affect timing and power in both ASICs and FPGAs
• Delay with variations
  • Variation sources: threshold voltage (Vth) and effective channel length (Leff)
  • Independent Gaussians for global and local variations
  • First-order canonical form
• Related work
  • FPGA device and architecture evaluation with process variations [Wong et al, ICCAD’05]
  • SSTA [Chang et al, ICCAD’03] [Visweswariah et al, DAC’04]
  • Statistical criticality analysis [Visweswariah et al, DAC’04] [Li et al, ICCAD’05] [Xiong et al, TAU’06]
  • Statistical gate sizing for ASICs [Guthaus et al, ICCAD’05] [Sinha et al, ICCAD’05]
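The first-order canonical form listed above can be illustrated with a small sketch. This is not the authors' code: the class name `CanonicalDelay`, its field names, and the numbers are assumptions made for illustration. A delay is modeled as a nominal value plus sensitivities to shared global Gaussians and to one independent local Gaussian, all taken to be standard normal.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class CanonicalDelay:
    """d = nominal + sum_i global_sens[i] * dX_i + local_sens * dR.

    dX_i (shared global sources, e.g. Vth and Leff) and dR (this element's
    own local source) are assumed to be independent standard Gaussians.
    """
    nominal: float
    global_sens: tuple
    local_sens: float

    def __add__(self, other):
        # Global sources are shared between elements, so sensitivities add linearly.
        g = tuple(a + b for a, b in zip(self.global_sens, other.global_sens))
        # Local sources are independent, so they combine as a root-sum-square.
        l = math.hypot(self.local_sens, other.local_sens)
        return CanonicalDelay(self.nominal + other.nominal, g, l)

    @property
    def sigma(self):
        """Standard deviation of the delay."""
        return math.sqrt(sum(a * a for a in self.global_sens) + self.local_sens ** 2)


# Two path segments in series (illustrative numbers, in ns).
seg1 = CanonicalDelay(1.0, (0.03, 0.04), 0.03)
seg2 = CanonicalDelay(2.0, (0.06, 0.08), 0.04)
path = seg1 + seg2
print(path.nominal, path.global_sens, path.local_sens, path.sigma)
```

Keeping the global sensitivities separate is what lets correlation between paths survive the sum; only the local term is collapsed into a single number.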
Motivation
• STA is inaccurate under variation
  • Slack ignores near-criticality
  • Near-critical paths may be statistically timing-critical
• Deterministic timing-driven placer (e.g. T-VPlace in VPR)
  • Based on STA
  • Optimizes for the static critical path
  • May not optimize timing under variation
• A stochastic placer is needed under variation
  • Same placement for one application across chips
Pre-routing Interconnect Uncertainty vs. Process Variation in Placement
• Existing timing-driven placer
  • Leverages timing slack in STA
  • With interconnect delay estimated
  • May incur uncertainty along with process variation
• Clearly, process variation leads to a more significant delay variance at the placement stage
• Therefore, only process variation is considered for placement
Uniqueness for Timing in FPGAs
• FPGAs vs. ASICs
  • Similarity
    • Susceptible to process variations
  • Disadvantages
    • Critical paths unknown at test time
    • Same timing model to be applied to unknown applications at unknown clock frequency and varied conditions
    • Guard-banded timing model can be arbitrarily conservative or aggressive
  • Advantages
    • Long switching paths dampen (average out) local variation
    • Binned for speed grades to isolate global variation
    • Can be programmed repeatedly and differently during chip timing test
Timing with Guard-banding
• A guard-band is applied to each individual node to model uncertainty in STA
• A constant guard-banded delay is µ + cσ
  • µ and σ are the nominal delay and standard deviation, respectively
  • c is constant for all circuit elements
• Guard-band cost: (Tgrd / Tnorm) − 1
  • Tgrd: critical path delay in STA with guard-banding
  • Tnorm: critical path delay in STA with the nominal timing model
• Pessimistic/optimistic for designs with longer/shorter critical paths
• Actual timing yield is analyzed by SSTA
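A short sketch may help show why a per-node guard-band becomes pessimistic on long paths. It assumes a path of n identical elements whose delays are independent Gaussians (purely local variation, which is a simplification); the function names and numbers are illustrative, not taken from the slides.

```python
import math


def guard_banded_path(n, mu, sigma, c):
    """STA path delay when every element uses the constant delay mu + c*sigma."""
    return n * (mu + c * sigma)


def statistical_path(n, mu, sigma, c):
    """Mean + c * std of the same path, assuming independent element delays.

    Independent variances add, so the path sigma grows only with sqrt(n).
    """
    return n * mu + c * sigma * math.sqrt(n)


def guard_band_cost(t_grd, t_norm):
    """Guard-band cost as defined on the slide: (Tgrd / Tnorm) - 1."""
    return t_grd / t_norm - 1.0


mu, sigma, c = 1.0, 0.1, 3.0  # illustrative values
for n in (1, 4, 16):
    t_norm = n * mu
    cost = guard_band_cost(guard_banded_path(n, mu, sigma, c), t_norm)
    needed = statistical_path(n, mu, sigma, c) / t_norm - 1.0
    print(f"n={n:2d}  guard-band cost={cost:.3f}  statistical margin={needed:.3f}")
```

Under these assumptions the guard-band cost stays at c·σ/µ regardless of path length, while the margin actually needed for the same confidence shrinks as the path gets longer.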
Timing with Speed-binning
• Test and eliminate local variation by testing multiple similar paths across the test chip
• Model the global variation Gaussians ΔXi as a single ΔGa
• Speed-binning = categorizing ΔGa
• All chips that fall into the same bin share the same guard-banded timing model
  • e.g., µ − σg / µ + σg / µ + 3σg for the fast/medium/slow bin
• STA gives the circuit delay Tbin for each bin
Yield Analysis with Speed-binning
• Yield loss due to unknown critical paths
• Yield loss due to ignored local variation
• Timing yield analysis for a bin
  • Circuit delay: Tµ + σTg·ΔGa + σTl·ΔRa
  • Bin k: ΔGa in [Glow(k), Gup(k)]
  • Cut-off delay: γTbin(k)
  • Timing yield for bin k is [formula not preserved in this transcript]
• The overall timing yield is [formula not preserved in this transcript]
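The slide's yield formulas did not survive extraction, so the following is a reconstruction under stated assumptions rather than the authors' expressions. It takes ΔGa and ΔRa as independent standard Gaussians, defines the yield of a bin as the probability that the circuit delay stays within the cut-off given that ΔGa lies in the bin, and weights each bin by its probability mass for the overall yield. The cut-off delay is passed in directly because the transcript does not make clear how γ scales Tbin. All function names are hypothetical.

```python
from statistics import NormalDist

STD = NormalDist()  # standard normal, used for both dGa and dRa


def bin_yield(t_mu, s_g, s_l, g_low, g_up, cutoff, steps=2000):
    """P(T <= cutoff | g_low <= dGa <= g_up) for T = t_mu + s_g*dGa + s_l*dRa.

    Midpoint-rule integration over dGa; requires s_l > 0 and finite bounds.
    """
    h = (g_up - g_low) / steps
    acc = 0.0
    for i in range(steps):
        g = g_low + (i + 0.5) * h
        # Probability that local variation keeps the delay under the cut-off.
        acc += STD.pdf(g) * STD.cdf((cutoff - t_mu - s_g * g) / s_l) * h
    return acc / (STD.cdf(g_up) - STD.cdf(g_low))


def overall_yield(t_mu, s_g, s_l, bins):
    """bins: list of (g_low, g_up, cutoff). Chips outside every bin count as loss."""
    total = 0.0
    for g_low, g_up, cutoff in bins:
        mass = STD.cdf(g_up) - STD.cdf(g_low)
        total += mass * bin_yield(t_mu, s_g, s_l, g_low, g_up, cutoff)
    return total


# Illustrative medium bin: the bin's timing model is set at its upper boundary.
t_mu, s_g, s_l = 10.0, 0.3, 0.1
g_low, g_up = -0.25, 0.52
t_bin = t_mu + s_g * g_up
y_tight = bin_yield(t_mu, s_g, s_l, g_low, g_up, t_bin)
y_relaxed = bin_yield(t_mu, s_g, s_l, g_low, g_up, 1.05 * t_bin)
print(y_tight, y_relaxed)
```

In this toy setting the bin loses yield at the unrelaxed cut-off purely because of the ignored local variation, and relaxing the cut-off recovers most of it.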
Timing-Driven Placement T-VPlace [Marquardt et al, FPGA 2000]
• Simulated annealing based placement
• Both wiring and timing are considered in the cost function (the slide's formulas are not preserved in this transcript)
  • Wiring cost
  • Timing cost
    • for a connection
    • for a placement solution
  • Overall cost
• STA is performed at each annealing temperature to update critical path delay and slack
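Since the cost formulas themselves are missing here, the sketch below follows the commonly cited T-VPlace formulation: criticality = 1 − slack/Dmax, timing cost of a connection = delay × criticality^exponent, and an overall cost change that normalizes the timing and wiring terms and mixes them with a trade-off weight. Treat it as an assumption rather than a transcription of the slide; the function names, the trade-off weight `lam`, and the numbers are illustrative.

```python
def criticality(slack, d_max):
    """Static criticality of a connection: 1 - slack / Dmax."""
    return 1.0 - slack / d_max


def timing_cost(delay, crit, exponent):
    """Timing cost of one connection: delay * criticality ** exponent."""
    return delay * crit ** exponent


def placement_timing_cost(connections, d_max, exponent):
    """Timing cost of a placement: sum over all (delay, slack) connections."""
    return sum(timing_cost(d, criticality(s, d_max), exponent) for d, s in connections)


def delta_cost(d_timing, prev_timing, d_wiring, prev_wiring, lam=0.5):
    """Normalized change in overall cost for a proposed annealing move."""
    return lam * d_timing / prev_timing + (1.0 - lam) * d_wiring / prev_wiring


# Two connections as (delay, slack) pairs; the first lies on the critical path.
conns = [(2.0, 0.0), (1.0, 4.0)]
total_timing = placement_timing_cost(conns, d_max=8.0, exponent=2.0)
move = delta_cost(d_timing=-0.225, prev_timing=total_timing,
                  d_wiring=1.0, prev_wiring=10.0)
print(total_timing, move)
```

Normalizing each term by its previous value keeps the weight meaningful even though timing and wiring costs are in different units.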
Stochastic Placement ST-VPlace
• Main differences between ST-VPlace and T-VPlace
  • Estimate the delay matrix in canonical form instead of just the nominal delay matrix
    • Used in SSTA for statistical timing cost during placement
  • Perform SSTA instead of STA at each temperature in the simulated annealing framework
  • Use statistical criticality instead of static criticality in the cost function
    • Statistical criticality of an edge/node is the probability that the edge/node is statistically timing-critical in SSTA
    • Statistical criticality exponent θ
    • Static criticality is based on slack and the longest path delay in STA
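To make the contrast between the two criticality notions concrete, here is a toy case with two parallel paths whose delays are independent Gaussians. It is only a sketch: real SSTA propagates canonical forms through the whole timing graph, and the cost form `delay * criticality ** theta` is assumed by analogy with the deterministic placer. Function names and numbers are illustrative.

```python
import math
from statistics import NormalDist


def two_path_statistical_criticality(mu_a, sd_a, mu_b, sd_b):
    """Probability that each of two independent Gaussian path delays is the longer one."""
    diff = NormalDist(mu_a - mu_b, math.hypot(sd_a, sd_b))  # distribution of A - B
    p_b = diff.cdf(0.0)  # P(A - B <= 0): path B is the critical one
    return 1.0 - p_b, p_b


def statistical_timing_cost(delay, stat_crit, theta):
    """Assumed cost form: delay weighted by statistical criticality ** theta."""
    return delay * stat_crit ** theta


# Path A is the static critical path; path B is 3% shorter at nominal.
p_a, p_b = two_path_statistical_criticality(10.0, 0.3, 9.7, 0.3)
static_b = 1.0 - (10.0 - 9.7) / 10.0  # static criticality of B: 1 - slack / Dmax
cost_b = statistical_timing_cost(9.7, p_b, theta=0.3)
print(p_a, p_b, static_b, cost_b)
```

STA reports A as the only critical path, yet under these assumed variations B ends up longer on a sizeable fraction of chips, which is the near-criticality that slack alone does not capture.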
Experimental Settings
• Variation and device setting
  • 10% as 3 sigma for global and local variation in Vth and Leff at the ITRS 65nm technology node
  • Min-ED device setting: Vdd = 0.9 V, Vth = 0.3 V [Wong et al, ICCAD’05]
• Architecture similar to Altera’s Stratix™
  • Island-style FPGA architecture
  • Cluster size 10 and LUT size 4
  • 60% length-4 and 40% length-8 wires in interconnects
  • 1.2X the routing channel width obtained by T-VPlace
• Yield loss measured in failed parts per 10K parts (pp10K)
• Evaluated using MCNC and QUIP designs
Cost Function Tuning
• Perform ST-VPlace and SSTA to obtain the mean delay and standard deviation over all designs for each statistical criticality exponent θ
• θ = 0.3 leads to the smallest mean and deviation, and hence the highest timing yield
T-VPlace vs. ST-VPlace
• Some correlation between mean delay and deviation
• ST-VPlace achieves
  • a smaller mean delay for all designs
  • a smaller variance for most designs
  • a higher timing yield
Statistical Criticality vs. Static Criticality
• Statistical criticality vs. static criticality
  • Statistical criticality may vary significantly for similar static criticality
  • Statistical criticality does not increase monotonically with static criticality
• ST-VPlace considers statistical criticality explicitly
  • Optimizes near-critical paths under variations
  • Leads to a higher timing yield
Impact on Path-length Distribution
• Path-length distribution in ST-VPlace is almost on top of that in T-VPlace
• ST-VPlace reduces the top 10% near-critical paths from 1.3% to 0.8%
  • Although it has a larger nominal delay
  • It has a smaller mean and variance, and hence a higher timing yield
Effect of Guard-banding
[Figure: three panels, for variation (3 sigma) of global/local 5%/5%, 10%/10% and 20%/20%. Each plots guard-band cost and yield loss (pp10K, log scale) of T-VPlace and ST-VPlace against guard-band factor (0 to 4).]
• ST-VPlace obtains a higher timing yield under varied variations and guard-band factors
• Yield loss reduced by 3.4X with 3 sigma guard-banding under 10%/10% variations
• Larger gain with smaller variation
• Similar gain with varied local variation when no global variation is considered
Effect of Speed-binning
• Fast/Medium/Slow = 40%/30%/29.999%
  • Discard the slowest 0.001% (0.1 pp10K) of chips
• Tbin may be relaxed by γ for a higher timing yield
• Yield loss is due to local variation and unknown critical paths
• ST-VPlace consistently achieves a higher timing yield
  • Yield loss is reduced by 25X with γ = 5%
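The bin split above can be turned into boundaries on the global variation. The sketch assumes ΔGa is a standard Gaussian, so each boundary is the inverse normal CDF of the cumulative population fraction, expressed in units of σg; the function name is hypothetical.

```python
from statistics import NormalDist


def bin_boundaries(fractions):
    """Upper boundary (in units of sigma_g) of each bin, fastest bin first.

    Assumes the single global variation dGa is a standard Gaussian.
    """
    std = NormalDist()
    bounds, cumulative = [], 0.0
    for fraction in fractions:
        cumulative += fraction
        bounds.append(std.inv_cdf(cumulative))
    return bounds


# Fast / medium / slow populations from the slide; the remaining 0.001% is discarded.
fast_up, medium_up, slow_up = bin_boundaries([0.40, 0.30, 0.29999])
discarded_pp10k = (1.0 - (0.40 + 0.30 + 0.29999)) * 10_000
print(fast_up, medium_up, slow_up, discarded_pp10k)
```

Under this assumption, discarding only 0.001% of chips puts the slow bin's upper boundary beyond 4σg, so the slow bin's timing model has to cover a long tail.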
Conclusions and Discussions
• Conclusions
  • Quantified the effects of guard-banding and speed-binning with variations
  • Developed a novel stochastic placer
  • Evaluated with MCNC and QUIP designs, reduced yield loss by
    • 3.4X with guard-banding
    • 25X with speed-binning
• Ongoing and future work
  • Extend timing models with spatially correlated variations
  • Develop stochastic physical synthesis algorithms, e.g., clustering, routing, re-timing