EDA Court: Hierarchical Construction and Timing Sign-off of SoCs. TAU 2013 Panel.
The good side of hierarchy
Hierarchy levels, each block fanning out to sub-blocks 1 … k:
• Chip (h=0)
• Chiplet (h=1)
• Core (h=2)
• Unit (h=3)
• Macro (h=4)
Impact of pruning
Sweet spot:
• 50B objects
• 2M per macro
• Fraction at top = 4e-5
• 4 levels of hierarchy
• Pruning = 93%
[Plot axes: fraction of chip at top, unpruned fraction; curves for h = 0, 1, 2, 3, 4, 5]
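The sweet-spot figures above are mutually consistent, which a short calculation makes visible. The sketch below is only illustrative: it assumes a uniform fan-out k at every level (the slide labels sub-blocks 1 … k but does not state k), and all variable names are hypothetical.

```python
# Worked arithmetic for the "sweet spot" figures quoted on the slide.
total_objects = 50e9      # 50B objects on the chip (from the slide)
objects_per_macro = 2e6   # 2M objects per macro (from the slide)
levels = 4                # 4 levels of hierarchy (from the slide)

# One macro's worth of objects as a fraction of the whole chip.
fraction = objects_per_macro / total_objects

# Number of macro-sized pieces needed to cover the chip.
macro_count = total_objects / objects_per_macro

# ASSUMPTION: uniform fan-out k per level, so k**levels == macro_count.
k = macro_count ** (1.0 / levels)

print(f"fraction of chip = {fraction:.0e}")
print(f"macro count = {macro_count:,.0f}")
print(f"implied fan-out k = {k:.1f}")
```

The computed fraction agrees numerically with the slide's "fraction at top = 4e-5". The implied fan-out of roughly a dozen sub-blocks per level holds only under the uniform fan-out assumption.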
The bad side of hierarchy
• Accuracy? Pessimism?
• Coupling noise? Functional noise?
• Multiple interacting clocks?
• Parasitics on boundary nets?
• Is “context” required? If so, we cannot “shelve and re-use” macros
• Construction flow?
• Draconian methodology restrictions?
Chandu Visweswariah, Distinguished Engineer, IBM, East Fishkill, NY, chandu@us.ibm.com
Igor Keller, Senior Architect, Cadence, San Jose, CA, ikeller@cadence.com
Oleg Levitsky, Solutions Architect, Cadence, San Jose, CA, oleg@cadence.com
Alex Rubin, Senior Engineer, IBM, San Jose, CA, rubin1@us.ibm.com
Amit Shaligram, Principal Engineer, STMicroelectronics, Scottsdale, AZ, amit.shaligram@st.com
Alexander Skourikhin, EDA Engineer, Intel, Haifa, Israel, alexander.skourikhin@intel.com
Guntram Wolski, Principal Engineer, Cisco, San Jose, CA, gwolski@cisco.com
Qiuyang Wu, Senior Staff Engineer, Synopsys, Hillsboro, OR, qwu@synopsys.com
Larry Brown, Design Center Engineer, IBM, San Jose, CA, lmbrown@us.ibm.com
Charge 1: Hierarchical implementation and hence hierarchical timing sign-off don’t have a future
Plaintiff: Oleg Levitsky, Cadence
Defendant: Qiuyang Wu, Synopsys
Evolution of design flow
Flow: Prototype → Implement → Sign Off
Evolution of design flow
Flow: Prototype → Implement (Blk1, Blk2, …, Blkn) → Sign Off
Evolution of design flow
Flow: Prototype → Implement (Blk1, Blk2, …, Blkn) → Sign Off (Blk1, Blk2, …, Blkn)
Quiz: Why hierarchical flow?
• Create more work for managers
• Contribute to real estate bubble
• Control time to market schedule?
Hierarchical design flow
Flow: Prototype → Implement (Blk1, Blk2, …, Blkn) → Sign Off (Blk1, Blk2, …, Blkn)
Diagram labels: Complexity; Hierarchical scalability
Hierarchical design flow
Flow: Prototype → Implement (Blk1, Blk2, …, Blkn) → Sign Off (Blk1, Blk2, …, Blkn)
Steps: Step 1, Step 2, …, Step n
Flow convergence is key to tapeout
Hierarchical design flow
Flow: Prototype → Implement (Blk1, Blk2, …, Blkn) → Sign Off (Blk1, Blk2, …, Blkn)
Diagram label: Convergence
• Human factor:
  • Level of expertise
  • Human error
  • Lack of sleep
• Technical challenges:
  • SI
  • Over the block routing
  • Useful skew distribution
  • CPPR modeling
  • Power budgeting
  • Channelless designs
  • …
Hierarchical design flow
Flow: Prototype → Implement (Blk1, Blk2, …, Blkn) → Sign Off (Blk1, Blk2, …, Blkn)
Diagram labels: Complexity; Convergence; Hierarchical scalability; Failed to control TTM
Charge 1: Hierarchical implementation and hence hierarchical timing sign-off don’t have a future
Plaintiff: Oleg Levitsky, Cadence
Defendant: Qiuyang Wu, Synopsys
Hierarchical Design and Timing Closure is the Only Way to Have a Future
Qiuyang Wu, Sr. Staff Engineer, Synopsys Inc.
March 2013
Hierarchical Implementation is Proven
[Diagram labels: +100M Gates, +1M Gates]
• Way back when in the last century
  • Designs grew beyond the reach of flat implementation
  • Established hierarchical methodologies, tried and true
• The success will continue because
  • naturally an iterative and gradual refinement process
  • relatively larger error margins and tolerances for tradeoff
  • more about reuse and integration, less about from scratch
  • …
But, “Classic” Hierarchical Timing is Inadequate for Signoff
[Diagram labels: TOP; Inst; Block; chip netlist; chip parasitics; full chip golden constraints; Flat STA (golden); Hier STA; ILM, ETM, glass-box, black-box, …; block netlist; block parasitics; block constraints (ad-hoc)]
Gap #1 - Burden is on the users: “Garbage in, garbage out”
• Block designers do not have quality constraints
  • Can’t close block timing with confidence: pessimism, optimism
  • Can’t create quality models: pessimism, optimism
Gap #2 - Language limitations: critical details can’t be elaborated
• Chip level designers do not have means to express design intention
  • Can’t describe I/O timing context accurately and completely
  • Can’t cover different reuse scenarios
The rescue: flat signoff. However, hierarchical signoff is the only way to stay on top of the technology curve.
And Here is How to do Hierarchical Signoff
• The Recipe on Top of Signoff Quality Engine
  • Provide hierarchical constraint management
  • Check and highlight inconsistencies
  • Provide context feedback and allow refinement
  • Produce accurate and elaborate timing environment
  • Provide Ease-of-Use through data / flow automation
  • Minimize/prevent user errors by construction
• The Benefits Go Beyond Signoff
  • Design faster: throughput and interoperation with implementation
  • Design better: accuracy enables further optimization for power, leakage, robustness, area, etc.
Charge 2: EDA tools/flows are inadequate for a construction flow: budgeting, IP models, hierarchical constraint development are lacking
Plaintiff: Amit Shaligram, STMicroelectronics
Defendant: Alex Rubin, IBM
Hierarchical Constraints & Budgeting
Amit Shaligram, Principal Engineer, STMicroelectronics
Models – Accuracy, speed and compatibility
• Which model to use?
  • ETM or .lib – Reasonable for use before clock tree.
  • ILM – Required after clock tree insertion
• Model accuracy
  • Different modes at block and top level, block/top constraint mismatches
  • Handling of high fanout and static nets
• Model compatibility
  • Models between different vendors/tools are not compatible.
  • Some tools create “physical ILMs”, others only “timing ILMs”
• It takes time…
  • For a ~2M instance block: 1 scenario (1 mode/1 corner), it takes ~6-8 hours
  • Quickly becomes impractical with 25 blocks, ~5 modes and ~16 corners
  • Can someone create models on the fly? Just use the DEF!
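To see why model generation "quickly becomes impractical", the slide's own numbers can be multiplied out. This is a back-of-envelope sketch. It assumes every block is comparable to the ~2M instance block quoted above and that no run is shared between scenarios. Neither assumption is stated on the slide.

```python
# Back-of-envelope cost of regenerating timing models for every scenario.
blocks = 25              # from the slide
modes = 5                # "~5 modes" on the slide
corners = 16             # "~16 corners" on the slide
hours_per_run = (6, 8)   # "~6-8 hours" per block per scenario

# A scenario is one (mode, corner) pair; each block needs a model per scenario.
runs = blocks * modes * corners

low_hours = runs * hours_per_run[0]
high_hours = runs * hours_per_run[1]

print(f"model-generation runs: {runs}")
print(f"total machine-hours: {low_hours} to {high_hours}")
```

Under these assumptions a full refresh costs thousands of model-generation runs and well over ten thousand machine-hours, which is the motivation behind the request for on-the-fly models.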
Budgeting
• Floorplan and constraints – a chicken and egg problem!
  • Estimation of feedthru delays can be challenging.
  • Consider crosstalk effect!
• Best practices not easy to follow all the time (FF at the boundary)
  • Critical path from a macro, legacy design, cannot tolerate extra latency
• Managing hold violations with FF at the boundary
  • Uncommon clock path creates hold violations due to OCV impact.
• SDC format limitations after clock tree insertion
  • Input/Output delay is specified with respect to virtual clock
  • Latency of virtual clock changes with every step of the flow (postCTS, postRoute, postRouteSI)
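The OCV point above can be made concrete with a simplified hold check. The sketch below is not taken from the presentation. It assumes full CPPR credit on the common clock segment, flat early/late derates on the uncommon segments, and illustrative delays in nanoseconds. The function and parameter names are hypothetical.

```python
def hold_slack(common, launch_uncommon, capture_uncommon,
               data_min, t_hold, early_derate, late_derate):
    """Hold slack in ns for a launch/capture flop pair (simplified model).

    ASSUMPTION: the common clock segment gets no derate (full CPPR credit);
    only the uncommon segments are derated.
    """
    # Earliest the data can change at the capture flop.
    launch_early = common + launch_uncommon * early_derate
    data_arrival = launch_early + data_min
    # Latest the capturing clock edge can arrive.
    capture_late = common + capture_uncommon * late_derate
    return data_arrival - (capture_late + t_hold)


# Same nominal clock latency (1.1 ns), short vs. long uncommon segments.
short = hold_slack(common=1.0, launch_uncommon=0.1, capture_uncommon=0.1,
                   data_min=0.08, t_hold=0.03, early_derate=0.9, late_derate=1.1)
long_ = hold_slack(common=0.1, launch_uncommon=1.0, capture_uncommon=1.0,
                   data_min=0.08, t_hold=0.03, early_derate=0.9, late_derate=1.1)
print(f"short uncommon path: {short:+.3f} ns")
print(f"long uncommon path:  {long_:+.3f} ns")
```

With identical nominal latencies, the path whose clocks diverge early fails hold while the other passes. This is the situation a boundary flop faces when its launching clock branch splits off at the top level.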
Hierarchical Constraints
• Top down or bottom-up constraints development flow?
  • How to ensure that block and top constraints are aligned?
• Constraint modifications required when using .lib or ILMs in top level
  • Generated clock definitions inside blocks create “new internal” clocks/pins
  • Handling large constraint files created within ILM generation flow(s)
• Boundary conditions for hold?
  • How to estimate set_min_delay accurately?
• Crosstalk effects of top level clock tree
  • How much margin is too much margin inside the blocks?
  • Using infinite timing windows inside the blocks is overkill
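One narrow slice of the alignment question, whether block and top agree on clock definitions, can be sketched as a simple comparison. This is a toy illustration, not a real SDC parser. The dictionaries, clock names, and the `find_mismatches` helper are all hypothetical.

```python
def find_mismatches(top_clocks, block_clocks, tol=1e-9):
    """Compare clock periods (ns) seen by the top level and by a block.

    Both arguments are hypothetical dicts mapping clock name -> period.
    Returns mismatch descriptions in clock-name order.
    """
    issues = []
    for name in sorted(set(top_clocks) | set(block_clocks)):
        if name not in block_clocks:
            issues.append(f"{name}: defined at top only")
        elif name not in top_clocks:
            issues.append(f"{name}: defined in block only")
        elif abs(top_clocks[name] - block_clocks[name]) > tol:
            issues.append(f"{name}: period {top_clocks[name]} (top) vs "
                          f"{block_clocks[name]} (block)")
    return issues


top = {"clk_core": 1.0, "clk_io": 2.5}
block = {"clk_core": 1.0, "clk_io": 2.0, "clk_gen": 2.0}
for issue in find_mismatches(top, block):
    print(issue)
```

In this example the block-only clock stands in for a generated clock defined inside the block, one of the "new internal" clocks the slide mentions. A production check would also need to cover I/O delays, exceptions, and modes.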
Charge 2: EDA tools/flows are inadequate for a construction flow: budgeting, IP models, hierarchical constraint development are lacking
Plaintiff: Amit Shaligram, STMicroelectronics
Defendant: Alex Rubin, IBM
Living in a flat world? March 27, 2013
Long list of charges that simply don’t stick…
Many teams have used hierarchy successfully to tape out designs!
• Large problems require the use of “divide and conquer”.
Vast amount of design experience, understanding and overcoming practical challenges.
Tools help establish hand-shake across hierarchical levels.
• Verification of boundary conditions and assumptions.
• Automatic constraint generation and management.
• Enforcement of best design practices.
Significant body of “do’s and don’ts” to help provide guidance, improve efficiency and reduce pessimism.
Follow best hierarchical design practices
[Diagram: Macro A containing Flop 1 and Macro B containing Flop 2, each flop with D, Q and CLK pins]
Isolate output loading from internal paths!
Flop bound the design!
Use single macro clock input!
Avoid critical paths crossing boundaries!
Simple rules can make hierarchy easy(er)!
Hierarchy is a “must have”!
[Chart labels: object count per unit, “44M Objects!”; run time (hours) for Deterministic Timing and Statistical Timing, “5X Speedup”, “10+ days”]
Parallelizes timing and optimization of independent paths to improve overall efficiency.
Better supports timing closure when different macros / top level are at different “stages” of completeness.
Fosters uninterrupted design fix-up loop.
More resilient to failure.
Charge 3: You can never really close out-of-context
+ Misdemeanor charge: too much additional complexity and software
Plaintiff: Guntram Wolski, Cisco
Defendant: Alexander Skourikhin, Intel
Hierarchical Timing: Felonies or Misdemeanors?
Guntram Wolski – Cisco Systems
Principal Engineer, Enterprise Networking Group
Felony Charge: You can never really close out of context
• You can come close, but that only counts in …..
• Or if you start worst casing things, you’ve overdesigned…
• You can set goals/targets for blocks, but then reality sets in.
  • You end up opening the block as it is the “right thing to do” in order to close.
• Multiple instances of same core
  • How do you wire over/through the cores?
  • Wiring bays – what if you don’t have enough in some areas?
  • Wire over the top == create new extraction/unique timing problems.
• Noise issues
  • Not every instance has the same IR drop/noise profile
Misdemeanor Charge
• Requires strict PD requirements to be effective
  • Very strict methodology to be effective
  • Need flopped boundaries
  • Long distance routes/fly overs need extra handling or must be pushed down
  • Legacy designs/IP integration cause immediate loss of benefit
• Integration/adoption complexity seems higher than with other tools
• Logic designers have very little interest in helping PD
  • It’s good enough, live with it.
  • I’m not paid to improve your problems, I just meet timing.
  • I have to work on something else, you have to fix it.
• Are we leaving performance on the table?
  • Subchips need to be designed to guardbanded conditions on I/Os and IR drop
Are we solving the problem the wrong way?
Why are we not looking at taking advantage of parallelism?
Are these not many individual paths?
If DRC can run on 120 CPUs and benefit, why can’t timing?
Break up the problem and distribute to my farm….
Charge 3: You can never really close out-of-context
+ Misdemeanor charge: too much additional complexity and software
Plaintiff: Guntram Wolski, Cisco
Defendant: Alexander Skourikhin, Intel
Defense
• Timing closure is an iterative process
  • Controllability is the key for success
  • Start from initial spec
  • As the design matures, gradually refine environmental requirements and increase model accuracy
  • Finally, you see the “real” timing requirements, avoiding overdesign
• Non-overdesigned multi-instantiated blocks are reality
  • Must see all the requirements (timing, parasitics) w/o worst casing
  • Clock handling is the real challenge
  • Noise is never an issue (at most – make worst case between instances)
• Reusable IPs are feasible
  • Have to use accurate block models (adjustable to a new env.)
  • Have to apply design restrictions on interfaces
Defense (cont.)
• Have to apply methodological restrictions to block interfaces
  • Driver size, wire length, ports, etc.
  • All of them are manageable and ease integration on top level
  • Doesn’t necessarily lead to overdesign, due to accurate block models
  • Applicable to both flop and latch based designs
• Timing analysis is highly parallelizable
  • Individual block analysis is naturally done in parallel
  • Top level analysis might
    • leverage multi-threading technologies in STA algorithms
    • be divided into clusters, with every cluster analyzed in parallel
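The cluster-level parallelism described above can be sketched in a few lines. This is a toy stand-in, not an STA engine. Each "path" is just a hypothetical (required, arrival) pair in nanoseconds, and the cluster names and numbers are illustrative.

```python
from concurrent.futures import ThreadPoolExecutor


def worst_slack(cluster):
    """Return (cluster name, worst slack in ns) for one independent cluster.

    Stand-in for a real STA run: each path is (required, arrival) in ns.
    """
    name, paths = cluster
    return name, min(required - arrival for required, arrival in paths)


# Hypothetical clusters of independent paths; numbers are illustrative only.
clusters = [
    ("blockA", [(1.0, 0.8), (1.0, 0.95)]),
    ("blockB", [(1.0, 1.1), (1.0, 0.5)]),
    ("top",    [(2.0, 1.7)]),
]

# Clusters share no paths, so they can be analyzed concurrently and merged.
with ThreadPoolExecutor(max_workers=3) as pool:
    results = dict(pool.map(worst_slack, clusters))

chip_worst = min(results.values())
print(results)
print(f"chip worst slack: {chip_worst:+.2f} ns")
```

Threads keep the sketch short. Because CPython threads do not run CPU-bound work in parallel, a real flow would dispatch clusters to separate processes or to a compute farm. The merge step (taking the minimum over per-cluster results) stays the same.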
Summary
• Efficient and Reliable Hierarchical Flow requires two essential factors:
  • A robust project methodology, which
    • Enforces design restrictions
    • Takes advantage of IP Reuse
    • Provides continuous timing picture throughout all project phases
    • Allows productive ECO work
  • Advanced EDA tools, which
    • Are flexible and allow controllability between accuracy and simplicity
    • Can efficiently handle Multi-X environments (X=system, corner, clocks, etc.)
    • Utilize parallel computing techniques
    • Support batch and ECO modes
Charge 4: Hierarchical timing cannot handle multiple interacting synchronous clocks
Plaintiff: Larry Brown, IBM
Defendant: Igor Keller, Cadence
Hierarchical timing cannot handle multiple interacting synchronous clocks • Define the problem:
Definition continued
• If clk1X is later than clk2X, we reduce our setup margin.
• If clk1X is earlier than clk2X, we reduce our hold margin.
• We don’t know the real relationship between the two clocks until we have our top level established.
• This makes it difficult to close timing on the logic macro and “put it on the shelf.”
• The problem is magnified if the logic macro is re-used.
  • In that case, the setup and hold margins of the logic macro must span all existing clk1X-clk2X relationships.
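The setup and hold statements above follow from the usual slack equations. The problem figure did not survive in this transcript, so the sketch below assumes that clk1X launches the data and clk2X captures it. The numbers and function names are illustrative, not from the presentation.

```python
def margins(skew, period, d_max, d_min, t_setup, t_hold):
    """Setup and hold margins (ns) for a clk1X -> clk2X path.

    ASSUMPTION: clk1X launches and clk2X captures; skew = t(clk1X) - t(clk2X),
    so positive skew means clk1X arrives later than clk2X.
    """
    setup = period - skew - d_max - t_setup
    hold = skew + d_min - t_hold
    return setup, hold


def reuse_margins(skews, **path):
    """Worst margins over every clk1X-clk2X relationship the macro must span."""
    results = [margins(s, **path) for s in skews]
    return min(r[0] for r in results), min(r[1] for r in results)


path = dict(period=1.0, d_max=0.7, d_min=0.1, t_setup=0.05, t_hold=0.05)
print(margins(0.0, **path))                       # aligned clocks
print(reuse_margins([-0.1, 0.0, 0.15], **path))   # three reuse contexts
```

Positive skew eats into setup and negative skew into hold. A re-used macro has to survive the largest positive and the largest negative skew among its instances at the same time.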
Fixes from timing methodology
• Option 1: Assert an uncertainty between clk1X and clk2X in macro timing, and validate this uncertainty when running top level timing.
  • Problem with this:
    • Leave performance/area on the table by lowering cycle time and/or over-padding hold fails.
    • If top level can’t meet this requirement, we must open up logic macro for further work.
• Option 2: ???
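The validation step in Option 1 amounts to comparing each instance's actual top-level skew against the uncertainty asserted during macro timing. A minimal sketch, with hypothetical instance names and skew values in nanoseconds:

```python
def instances_to_reopen(asserted_uncertainty, top_level_skews):
    """List macro instances whose real clk1X-clk2X skew exceeds the assertion.

    asserted_uncertainty: +/- bound (ns) assumed when the macro was timed.
    top_level_skews: hypothetical dict, instance name -> measured skew (ns).
    """
    return sorted(name for name, skew in top_level_skews.items()
                  if abs(skew) > asserted_uncertainty)


skews = {"u_macro0": 0.02, "u_macro1": -0.04, "u_macro2": 0.07}
print(instances_to_reopen(0.05, skews))
```

Widening the asserted uncertainty would clear the failing instance, but every instance then pays for the extra margin. That is the performance and area cost listed under Option 1.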
The best solution: Fix the design
Update the design so we do not have multiple synchronous clock inputs in the first place.
Conclusion
Perhaps it’s more accurate to say that hierarchical timing can handle multiple synchronous clock inputs, but cannot do this without leaving performance and/or area on the table. In other words, it does not lead to the most efficient design.
Charge 4: Hierarchical timing cannot handle multiple interacting synchronous clocks
Plaintiff: Larry Brown, IBM
Defendant: Igor Keller, Cadence
Defense:
• First and foremost, defendant pleads not guilty
• The charge from plaintiff only means that there is no free lunch
• For Hierarchical Timing to work designers must follow certain rules
  • They are well described in Alex Rubin’s defense
  • Specifically, one should have a single clock pin in a block to avoid extra pessimism in hold/setup timing
• In the case of multiple clock pins plaintiff himself exonerated the defendant by proposing a solution:
  • it is possible to remove some of the pessimism by describing the relationship between the two clocks
Defense (cont.)
• Advanced SI analysis today reduces pessimism if victim and aggressor share the same clock
• SI analysis also becomes more problematic with multiple clock pins
• With multiple clock pins one assumes the clocks are different, leading to
  • Pessimism if uncertainty is assigned to both pins
  • Optimism if no uncertainty is assigned
• As is often true, the best way to resolve a problem is to avoid creating it: stick to the rules of a hierarchy-friendly design methodology
Ways to Remove the Limitation
[Diagram label: CLK]
• There are ways to define the relationship between two internal clocks:
  • Through parent external clock
  • Explicitly define ranges of skews
• Parameterization of timing models with skew on two clocks is possible
• These enhancements are feasible but need to be driven by real commercial interest