150 likes | 163 Views
Addressing chip-level assembly challenges, achieving optimal performance, power, and area through a novel approach integrating retiming and architectural floorplanning. Goals, challenges, conventional flows, and the proposed new design flow emphasizing retime-based strategies are discussed in detail.
E N D
Integration of Retiming with Architectural Floorplanning:A New Design Methodology for DSM Abdallah and Bassam Tabbara Profs: R.K.Brayton, A.R.Newton, and K.Keutzer The NexSIS Project
Issues in DSM • timing at the module level not an issue • timing at the chip level is an issue • bigger raw capacity that can be used by: • replication • reuse • NTRS Projections: • 1997: • .25u 4M tr/cm2 600 pins 6 layers 3cm2 • 2006: • .1u 40M tr/cm2 1500 pins 8 layers 10cm2
Problem Description • one-level hierarchy of design • minimum number of levels to support reuse • mid way between flat and two level • placement and wireplanning of: • 200-2000 modules, average size: 50k gates • dynamic range of modules sizes: 1-500k gates • types of modules: hard, firm, soft • hard: layout • firm: gates + aspect ratio • soft: RTL • large number of nets: 40k-100k • pins per module: 10-100
Problem Statement We address chip level assembly of predesigned IP blocks, each under 100k gates in size, either as hard or soft macros, optimizing for performance, power and area (emphasis in that order).
Goals • develop a tool that has an impact in DSM by supporting IP reuse • handle IP blocks that have constraints and should be combined to result in a certain functionality. User design constraints include: • delay • power dissipation • area • generate a final layout within 12-24 hours (overnight) • complexity of algorithms within O(m2) -O(m3) m = design complexity • final result should: • be within 5-10% of human design (may not be able to compare) • meet the user constraints if possible or make design suggestions
Challenges • size issues: • bigger block sizes, aspect ratios and relative sizes • number of pins, nets much bigger than blocks • placement issues: • special design for memories? • partitioning hard, clustering easy • routing issues: • no channels, point to point • busses • many metal layers to be assigned • timing at the chip level
Conventional Flows • integration of various steps and tools: • Logic Synthesis - Physical Design • Global - Detailed • separation of concerns: • front end - back end • no contract • separation entails hundreds of iterations: • number of iterations can be proportional to complexity of design
New Design Flow • minimize design iteration: • planning at the early stages of the flow • support incremental changes • need for a proof of convergence • introduce retiming into the architectural floorplanning stage • better handle on timing issues • path-based vs. net-based
Functional Decomposition • provides an entry point for reused IPs • RTL may already be well characterized • area-delay trade-off as an important performance characteristic • result is: • a set of blocks • some area-delay trade-off estimates
Retiming • takes in lower bound constraints • creates upper bound constraints • reduces area of modules whenever possible • can be made refinable and incremental • depends on granularity of the representation • path-based
Placement / Routing • initial placement/routing step • can be a min-cut or any constructive approach • has to be fast • gives lower bounds on delays between modules • placement/routing: • takes in upper bounds from retiming as flexibility on placement • replaces modules resulting in better lower bound constraints • objective is to reduce total chip area • delay is reduced indirectly
Logic Synthesis • assumption: • problems can be solved at the module level • predictable for given size modules • can be run in parallel for the different modules • provides better estimates of area-delay trade-offs for subsequent iterations
Iterations • loop between placement and retiming • until no further improvements are possible • may iterate many times • very similar to: • initial min-cut partitioning • low temperature simulated annealing • have to prove some convergence criteria • loop between floorplanning/wireplanning and layout • only a few iterations • each iteration information is retained through area-delay trade-offs • also proof of convergence