Nostra-XTalk: A Predictive Framework for Accurate Static Timing Analysis in UDSM VLSI Circuits. Debasish Das, Ahmed Shebaita, Yehea Ismail, Hai Zhou (EECS, Northwestern); Kip Killpack (Strategic CAD Lab, Intel).
Outline • Motivation and Previous Research • Directed Search Mechanism • Static Timer Algorithm • Experimental Setup • Conclusions and future work
Coupling Dominates • Coupling capacitance dominates interconnect parasitics • Graph shows the ratio of coupling capacitance to ground capacitance of nets • Parasitics extracted from a 65 nm logic block • Industrial microprocessor design from Intel
Previous Research (Coupling Model) • Accurate computation of the Miller coupling factor (MCF) is needed to model the effects of crosstalk • Analytical models proposed for MCF computation • Step transitions: (0, 2), Sapatnekar et al., ICCAD 2000 • Ramp models: (-1, 3), Kahng et al., DAC 2000; Chen et al., ICCAD 2000 • Exponential models: (-1.885, 3.885), Ghoneima et al., ISCAS 2005 • Accurate models are applied to timing analysis • Extending the ramp model to timing analysis: Das et al., ICCD 2006
Previous Research (Timing Analysis) • Timing analysis with coupling is iterative • Iterative analysis with continuous models: Chen et al., ICCAD 2000 • Iterative analysis with discrete models: Sapatnekar et al., TCAD 2000; Chen et al., ICCAD 2000; Arunachalam et al., DAC 2000 • Circuit and coupling structure explored to speed up iterative analysis: Das et al., ICCD 2006
Issues in Previous Research • Analytical models derive the MCF from output delay windows and output slew windows on victim and aggressor nets • Correlation between input timing windows and MCF is ignored • Correlation between input slew windows and MCF is ignored • Such assumptions may lead to pessimistic MCF computation
Motivational Example • Rise delay window at I1: [2, 4] • Fall delay window at I2: [3.5, 4.5] • Static timing assumption on slews: max victim input slew 0.6, min aggressor input slew 0.8 • Consider 2 timing events (arrival time, slew) from [2, 4]: T1 = (3.9, 0.6) and T2 = (4.0, 0.6) [Figure: gate G1 with input I1, window [2, 4], and output O1; gate G2 with input I2, window [3.5, 4.5], and output O2]
Motivational Example • MCF due to T1 = 2.5 • MCF due to T2 = 2.4 • Previous approaches take the MCF as 2.5 and update the whole window [2, 4] with it • Ideally: compute the delay push-out on T1 due to MCF 2.5 (T1po) and the delay push-out on T2 due to MCF 2.4 (T2po) • The maximum bound of the output window is then max(T1po, T2po) [Figure: gate G1 with input I1, window [2, 4], and output O1; gate G2 with input I2, window [3.5, 4.5], and output O2]
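The gap between the two bounds can be illustrated with a small numeric sketch. Only the event arrival times and MCF values come from the example; the linear delay model and the constants D0, R, CG and CC are assumptions chosen purely for illustration.

```python
# Hypothetical linear gate-delay model: delay grows with the effective load.
# D0, R, CG, CC are illustrative constants, not values from the slides.
D0, R, CG, CC = 0.2, 1.0, 0.5, 0.4

def gate_delay(mcf):
    """Delay for an effective load Cg + MCF * Cc (assumed linear model)."""
    return D0 + R * (CG + mcf * CC)

# (arrival time, MCF) for the two input events in the example.
events = {"T1": (3.9, 2.5), "T2": (4.0, 2.4)}

# Window-based bound: latest arrival in [2, 4] combined with the largest MCF.
latest_arrival = 4.0
worst_mcf = max(mcf for _, mcf in events.values())
window_bound = latest_arrival + gate_delay(worst_mcf)

# Event-based bound: push out each event with its own MCF, then take the max.
pushed_out = {name: t + gate_delay(mcf) for name, (t, mcf) in events.items()}
event_bound = max(pushed_out.values())
```

Under these assumed constants the event-based bound is tighter than the window-based one, because the latest event does not see the largest MCF.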
Nostra-XTalk • Salient features • Directed Search mechanism: searches for the input timing event that results in the worst/best delay push-out, using the input timing windows on victim and aggressor • Accurate gate delay model employed to take into account the non-linearity of devices • Iterative static timer using Directed Search on a victim cluster (the collection of all aggressor nets connected to a victim net)
Circuit Model • Rise/fall delay window: (Dil, Dih) • Rise/fall slew window: (sil, sih) • Nodes associated with the coupling edge: N1 and N2 [Figure: NAND-gate circuit with nodes N1, N2, N3 and inputs I1, I2, showing rise arcs, fall arcs, and coupling edges (CC)]
Circuit Model for Directed Search • Enumerate arcs on the drivers of both victim and aggressor nets • Victim: input delay IDv = [Divl, Divh], input slew ISv = [sivl, sivh], output delay ODv: Dov = [Dovl, Dovh], output slew OSv: tvs = [sovl, sovh] • Aggressor: input delay IDa = [Dial, Diah], input slew ISa = [sial, siah], output delay ODa: Doa = [Doal, Doah], output slew OSa: tas = [soal, soah] [Figure: victim driver N1 with arcs arc1, arc2 and aggressor driver N2 with arcs arc3, arc4, coupled through cc; each arc is annotated with its input and output delay and slew windows]
Circuit Model for Directed Search (contd.) • Enumerate arcs on the drivers of both victim and aggressor nets • Apply Directed Search on the 4 possibilities • Choose the one that results in the worst delay push-out [Figure: victim driver N1 with arcs arc1, arc2 and aggressor driver N2 with arcs arc3, arc4, coupled through cc]
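A minimal sketch of this enumeration step, assuming the 4 possibilities are the pairings of the two victim arcs with the two aggressor arcs. The `directed_search` function and its push-out values are placeholders invented for illustration; the actual search is described on the later slides.

```python
from itertools import product

VICTIM_ARCS = ["arc1", "arc2"]
AGGRESSOR_ARCS = ["arc3", "arc4"]

# Made-up push-out values standing in for the result of a real search.
HYPOTHETICAL_PUSHOUT = {
    ("arc1", "arc3"): 0.12,
    ("arc1", "arc4"): 0.05,
    ("arc2", "arc3"): 0.20,
    ("arc2", "arc4"): 0.08,
}

def directed_search(victim_arc, aggressor_arc):
    """Placeholder: return the delay push-out found for one arc pairing."""
    return HYPOTHETICAL_PUSHOUT[(victim_arc, aggressor_arc)]

def worst_pairing(victim_arcs, aggressor_arcs, search):
    """Run the search on every victim/aggressor arc pairing and keep the worst."""
    return max(product(victim_arcs, aggressor_arcs), key=lambda pair: search(*pair))

worst = worst_pairing(VICTIM_ARCS, AGGRESSOR_ARCS, directed_search)
```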
Detailed Circuit Model for Directed Search • Coupled circuit: victim arc1 and aggressor arc3 connected through cc • Equivalent circuit: Cv = Cg + (Victim MCF) Cc on the victim, Ca = Cg + (Aggressor MCF) Cc on the aggressor • Victim: input delay IDv = [Divl, Divh], input slew ISv = [sivl, sivh], output delay ODv: Dov = [Dovl, Dovh], output slew OSv: tvs = [sovl, sovh] • Aggressor: input delay IDa = [Dial, Diah], input slew ISa = [sial, siah], output delay ODa: Doa = [Doal, Doah], output slew OSa: tas = [soal, soah]
Gate Delay Model • We use logic gates from Faraday's 90 nm cell libraries • Figure (a) shows tvs = f1(Cv, siv) • Figure (b) shows tas = f2(Ca, sia) • Figure (c) shows tvd = f3(Cv, siv) • Figure (d) shows tad = f4(Ca, sia) • Assuming linear dependence is acceptable • Output waveform on victim: Wv = G(Div + tvd, tvs) • Output waveform on aggressor: Wa = G(Dia + tad, tas) [Figure: panels (a) to (d) plot tvs, tas, tvd and tad against load capacitance (cv or ca) for input slews between sivl and sivh (victim) or sial and siah (aggressor)]
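One way to evaluate functions such as tvd = f3(Cv, siv) under a locally linear assumption is bilinear interpolation in a two-dimensional table indexed by load and input slew. The sketch below is an illustration only: the table layout, the axis values and the function names are assumptions, and the waveform is represented simply as an (arrival time, slew) pair.

```python
from bisect import bisect_right

def interp_index(axis, x):
    """Return (i, t) so that x sits a fraction t of the way from axis[i] to axis[i+1].

    Points outside the axis range are extrapolated from the nearest segment.
    """
    i = min(max(bisect_right(axis, x) - 1, 0), len(axis) - 2)
    t = (x - axis[i]) / (axis[i + 1] - axis[i])
    return i, t

def table_lookup(loads, slews, table, load, slew):
    """Bilinear interpolation in table[load_index][slew_index]."""
    i, u = interp_index(loads, load)
    j, v = interp_index(slews, slew)
    low = table[i][j] * (1 - v) + table[i][j + 1] * v
    high = table[i + 1][j] * (1 - v) + table[i + 1][j + 1] * v
    return low * (1 - u) + high * u

def output_waveform(arrival_in, slew_in, load, loads, slews, delay_table, slew_table):
    """Return (output arrival, output slew), mirroring W = G(D_i + t_d, t_s)."""
    t_d = table_lookup(loads, slews, delay_table, load, slew_in)
    t_s = table_lookup(loads, slews, slew_table, load, slew_in)
    return arrival_in + t_d, t_s

# Illustrative tables (not library data): delay = 0.1 + 0.4*load + 0.2*slew.
LOADS = [0.0, 1.0, 2.0]
SLEWS = [0.0, 1.0]
DELAY_TABLE = [[0.1, 0.3], [0.5, 0.7], [0.9, 1.1]]
SLEW_TABLE = [[0.2, 0.4], [0.4, 0.6], [0.6, 0.8]]

tvd = table_lookup(LOADS, SLEWS, DELAY_TABLE, 1.5, 0.5)
wv = output_waveform(3.9, 0.5, 1.5, LOADS, SLEWS, DELAY_TABLE, SLEW_TABLE)
```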
Coupling Model (Das et al., ICCD 2006) • Overlap ratio (k) computation • The overlap ratio is defined as the fraction of the aggressor output waveform that overlaps with the victim threshold voltage • Use waveforms Wv and Wa to compute k
Coupling Model (Das et al., ICCD 2006) • Use waveforms Wv and Wa to compute k [Figure: overlap cases (a) to (d); each panel gives Victim MCF = 1 ± 2k and an Aggressor MCF expression]
Worst Case Delay Computation • Linearly span the domain of k • Victim capacitance Cv = Cg + (1 + 2k) Cc • Compute tvd and tvs [Figure: tvs and tvd versus cv, evaluated at cv = cg + (1 + 2k) cc for input slews between sivl and sivh] • Aggressor capacitance and slew are given by
Worst Case Delay Computation (contd.) • Using tas we obtain the aggressor capacitance range [c1, c2] • tad is calculated using c1 and c2 [Figure: tas and tad versus ca between c1 and c2 for input slews between sial and siah] • Worst case delay computation produces • Construct the feasible set F • Choose the element of F that has the worst tvd
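The sweep over k can be sketched as follows. This is only an outline under stated assumptions: k is taken to range over [0, 1], the victim delay function is a hypothetical linear model, and the feasibility test is a caller-supplied stand-in, since the slides do not spell out how the feasible set F is constructed.

```python
def sweep_worst_delay(cg, cc, victim_delay, is_feasible, steps=11):
    """Scan k uniformly over [0, 1] and return (k, Cv, tvd) with the largest tvd
    among the points accepted by is_feasible, or None if no point is accepted."""
    best = None
    for n in range(steps):
        k = n / (steps - 1)
        cv = cg + (1 + 2 * k) * cc      # victim capacitance for this overlap ratio
        tvd = victim_delay(cv)          # assumed delay model, supplied by the caller
        if is_feasible(k) and (best is None or tvd > best[2]):
            best = (k, cv, tvd)
    return best

# Illustrative inputs: a linear delay model and a stand-in feasibility test.
result = sweep_worst_delay(
    cg=1.0,
    cc=0.5,
    victim_delay=lambda cv: 0.1 + 0.4 * cv,
    is_feasible=lambda k: k <= 0.65,
)
```

Because the assumed delay model increases with Cv, the sweep returns the largest feasible k; a non-monotone table-based model would be handled the same way, since every sampled point is compared.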
Practical Application of Directed Search • A victim net is coupled with more than one aggressor • We model the circuit as a directed graph G = (V, E) • V: gates in the combinational circuit • E = F ∪ C, where F: fan-out edges and C: coupling edges • We define a victim cluster, in which V is the victim node and Ai are its aggressors
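A possible in-memory representation of this graph is sketched below. The class and method names are hypothetical, and storing coupling edges in both directions is a design assumption (each net can act as victim or as aggressor), not something stated on the slide.

```python
from dataclasses import dataclass, field

@dataclass
class CoupledTimingGraph:
    """Gates as nodes, with separate fan-out (F) and coupling (C) edge sets."""
    fanout: dict = field(default_factory=dict)    # gate -> set of fan-out gates
    coupling: dict = field(default_factory=dict)  # gate -> set of coupled gates

    def add_fanout(self, src, dst):
        self.fanout.setdefault(src, set()).add(dst)

    def add_coupling(self, a, b):
        # Stored symmetrically so either end can be queried as the victim.
        self.coupling.setdefault(a, set()).add(b)
        self.coupling.setdefault(b, set()).add(a)

    def cluster(self, victim):
        """Return the victim together with the sorted list of its aggressors."""
        return victim, sorted(self.coupling.get(victim, set()))

graph = CoupledTimingGraph()
graph.add_fanout("g1", "g3")
graph.add_coupling("g1", "g2")
graph.add_coupling("g1", "g4")
victim, aggressors = graph.cluster("g1")
```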
Worst Case Coupling Capacitance as Fix-Points • We define a switching point as an ordered pair of delay and slew • The following set gives all possible switching points in a k-cluster • The set formed by pairs of switching points (si, sj) is a totally ordered set • Directed Search between a victim and an aggressor is an order-preserving transformation • Fix-point iteration is used to obtain the worst-case capacitances
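The general shape of a fix-point iteration over an order-preserving update can be sketched as below. The update rule here is a toy stand-in (two coupled values that push each other up until they saturate at 3.0), chosen only because it is monotone and bounded; it is not the Directed Search transformation itself.

```python
def fix_point(update, state, max_iters=100):
    """Apply update repeatedly until the state stops changing."""
    for _ in range(max_iters):
        new_state = update(state)
        if new_state == state:
            return state
        state = new_state
    raise RuntimeError("no fix-point reached within max_iters")

def toy_update(state):
    """Toy monotone update: each value can only grow, and is capped at 3.0."""
    return {
        "victim": max(state["victim"], min(3.0, state["aggressor"] + 0.5)),
        "aggressor": max(state["aggressor"], min(3.0, state["victim"] + 0.5)),
    }

converged = fix_point(toy_update, {"victim": 1.0, "aggressor": 1.0})
```

Monotonicity together with a bounded range is what guarantees termination in this toy case; the iteration limit is only a safeguard.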
Circuit Modeling • Experiments done on ISCAS85 benchmarks • Circuit modeled as a DAG (timing graph) • Nodes in the timing graph are gates • Edges represent interconnect • Nodes are mapped to ASIC logic gates • Faraday 90 nm experimental technology library used • Delay tables are used: f(output load, input slew) • Coupling graph generation • Extracted coupling capacitance values are used • Coupling graph is superimposed on the timing graph • Each net is assumed to couple with 6 aggressors
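For reference, propagating latest arrival times through a timing graph modeled as a DAG can be sketched as follows. This is plain static timing with fixed, made-up gate delays and no coupling; in a coupling-aware timer the per-gate delay would instead depend on the load and MCF. The function name and the fan-in dictionary layout are assumptions.

```python
from graphlib import TopologicalSorter

def latest_arrivals(fanin, gate_delay, primary_arrival):
    """Latest arrival time at each gate output.

    fanin maps a gate to the gates driving it; gates with no drivers start
    from their entry in primary_arrival (0.0 if absent).
    """
    arrival = {}
    for gate in TopologicalSorter(fanin).static_order():
        drivers = fanin.get(gate, ())
        start = max(
            (arrival[d] for d in drivers),
            default=primary_arrival.get(gate, 0.0),
        )
        arrival[gate] = start + gate_delay[gate]
    return arrival

# Illustrative four-gate circuit: g1 and g2 drive g3, which drives g4.
FANIN = {"g3": ["g1", "g2"], "g4": ["g3"]}
GATE_DELAY = {"g1": 1.0, "g2": 2.0, "g3": 0.5, "g4": 0.25}
PRIMARY_ARRIVAL = {"g1": 0.0, "g2": 0.5}

arrivals = latest_arrivals(FANIN, GATE_DELAY, PRIMARY_ARRIVAL)
```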
Accuracy Enhancement Results • IST-(0,1,2): iterative static timer with MCFs 0, 1, 2 (Sapatnekar et al.) • IST-DS: proposed iterative static timer • RT: runtime • TA: cell table accesses • GA: accuracy gain • GT: gain in cell table accesses
Accuracy Enhancement Results • Hold time given by IST-(0,1,2) can be non-conservative • Accuracy gain by the proposed algorithm: average 25.59%, highest gain 45.5% (c880) • Decrease in cell delay table lookups by the proposed algorithm: average 40.1%, maximum decrease 64.8% (c6288) • The decrease in cell delay table lookups is not reflected in runtime because the search has high complexity • Search should be used judiciously
Conclusions and future work • We present Nostra-XTalk • Directed Search approach for accurate timing analysis • Iterative static timer using Directed Search • Directed Search is time consuming and should be selectively applied • Can be used in a coupling-partitioning-based timer, as proposed by Das et al., ICCD 2006 • Directed Search can be applied on local clusters • Future directions • Devise algorithms that selectively apply Directed Search for accurate as well as efficient analysis