Ph.D. Preliminary Exam
Mozammel Hossain
Colorado State University, Department of Electrical and Computer Engineering
Nest Circuit Lead, IBM, Austin, TX
Advisor: Prof. Tom W. Chen
Committee Members: Prof. Yashwant Malaiya, Dr. Sudeep Pasricha, Dr. Ali Pezeshki
Research Area
• Synthesis-Based Design and Implementation Methodology of
  • High-Speed, High-Performance Unit (LBS)
  • Sync-Async Interface Timing
  • Arrays with Clock Gating
    • To convert to synthesizable macro
Outline
• Introduction
• Overview of Present Synthesis Methodology
• Future: Research and Innovation in Synthesis Methodology
  • Problem definitions
    • Large Block Synthesis (LBS): L2 Cache Unit
    • Sync-Async Interface timing
    • Clock Gating support for Array Design
• Approaches
• Preliminary results
• Conclusion and Future Work
• Acknowledgement
Introduction
• The technology market demands faster turnaround of IC designs, and designers struggle to meet performance requirements.
• Costs for design, validation, and time to market are increasing.
• Past generations of microprocessors relied more on custom circuit design to meet tight cycle-time targets.
• The industry is moving toward a common synthesizable design methodology, in most cases sacrificing the desired chip speed in favor of new functionality and time to market.
Introduction: Design Methodology
Introduction: Macro Design Spectrum
[Figure: macro design styles ranked from 0 to 5; axes labeled "Design Customization" and "Design Effort"]
0) “Vanilla” synthesis
1) VHDL structuring, parm customization, e.g. ATTRIBUTE BLOCK_DATA of add64 : label is "LOGIC_STYLE=/xxxx/";
2) Preplace LCB/latches
3) Embed custom components
4) Custom prerouting
5) Custom design (conventional)
Introduction: Trend of Design Methodology over the Last 16 Years
Overview of Present Synthesis Methodology
• Synthesis
  • VHDL: compile VHDL
  • PDSRTL: front-end synthesis
  • PDSEMPAD: early mode padding
  • MAR: routing
  • RAPIDS: post-routing optimization
  • PROMOTE: promote routed design
• Run all backend tools (PDV, extraction, timing)
Overview of Present Synthesis Methodology
[Figure: flow diagram with blocks labeled MAR/Rapids, Cadence Space, RLMB, PDV, and Backend tools]
Overview of Present Synthesis Methodology
Slack sharing example: broken path
• Look at timing across multiple latches
• Consider sharing positive slack
[Figure labels: "Broken path", "Has margin to share"]
Overview of Present Synthesis Methodology
Slack sharing example: balanced slack
• Delayed 1st clock by 17 ps
• Balanced slack of +3 ps across 2 latches
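The arithmetic behind the balanced-slack example can be sketched as follows. This is a minimal illustration, not the tool's algorithm: it assumes that delaying a latch clock by d gives the path arriving at that latch d more time and takes d away from the path leaving it. The starting slacks of -14 ps and +20 ps are assumed values, chosen only because they reproduce the 17 ps delay and +3 ps balanced slack quoted on the slide.

```python
def balance_slack(slack_in_ps, slack_out_ps):
    """Return (clock_delay_ps, balanced_slack_ps) for one latch.

    slack_in_ps:  slack of the path arriving at the latch
    slack_out_ps: slack of the path leaving the latch
    Delaying the latch clock by d adds d to the incoming slack and
    subtracts d from the outgoing slack, so the two are equal when
    d is half of their difference.
    """
    delay = (slack_out_ps - slack_in_ps) / 2
    return delay, slack_in_ps + delay


# Assumed starting slacks (not taken from the slides).
delay, slack = balance_slack(-14, 20)
print(delay, slack)  # 17.0 3.0
```

The broken path gains exactly what the path with margin gives up, so the total slack across the two paths is unchanged.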
Overview of Present Synthesis Methodology
• Works very well on
  • Traditional control macros with 2.5M-5M transistors or about 20K-40K latches
  • Timing non-critical macros
  • Non-embedded IP macros
  • Macros without parent's blockages
    • Unit buffer, latch, and clock blockages
  • Slack sharing within a synchronous clock domain
  • Designs without clock gating after the Local Clock Buffer (LCB)
Future: Research and Innovation in Synthesis Methodology
1. Problem definition: Large Block Synthesis (LBS)
• The current methodology does not work well for much larger designs such as the L2 Cache Unit (20M transistors)
• Need techniques such as IP pre-placement, dataflow structuring, and hierarchical embedded synthesis
• Need techniques for wire traits, soft hierarchy, and interior pins
• Congestion analysis in timing-critical and wiring-critical areas
• Develop a synthesis methodology to support
  • Significantly shorter design cycle
  • Significant reduction in physical design resources
  • Potential area reduction
Future: Research and Innovation in Synthesis Methodology
• LBS test case to develop the methodology:
  • Why the L2 Cache Unit?
    • Area-challenged unit
    • Has both 1:1 and 2:1 clocking methodologies
      • 1:1 clocking runs at the same clock speed as the core clock
      • Paths on 1:1 clocking are highly timing challenged
    • Requires dual-voltage routing and clock gating
    • Combination of dataflow and control macros
    • Large enough to challenge tool flow run time and data management
Why L2 unit as test case?
[Figure: chip floorplan showing cores C0 through C11, with the Core, L2 Unit, and L3 Unit regions labeled]
Future: Research and Innovation in Synthesis Methodology
LBS: Why L2 unit as test case?
• Total cache size: 512 KByte
• >4 GHz with core interface, control and dataflow interface
• >2 GHz with cache, directory, address, L3 and Fabric interface
• Unit size: >4.0 sq mm in 22 nm; total black boxes: 82
• Number of transistors including cache: 44M
• Number of synthesizable transistors: 19M
Future: Research and Innovation in Synthesis Methodology
LBS: Physical Design Resource Comparison with Proposed Methodology
Future: Research and Innovation in Synthesis Methodology
2. Problem definition: Synthesis timing methodology for the Sync-Async interface
• Slack sharing cannot be done at the Sync-Async interface because it can result in a metastable condition. A methodology needs to be developed
  • To handle slack sharing in the synthesis and timing environment
    • Identify the latches involved and turn off slack sharing for them
  • For design automation
Future: Research and Innovation in Synthesis Methodology
Slack sharing cannot be done at the Sync-Async interface
Future: Research and Innovation in Synthesis Methodology
Slack sharing at the Sync-Async interface can result in a metastability condition
[Figure label: "Meta-stability at latch point"]
Future: Research and Innovation in Synthesis Methodology
3. Problem definition: Clock gating support for array design in the synthesis methodology
• The compilable array offers a fixed menu with limited read and write ports
  • It does not support clock gating
• To prevent electrical rule violations, the current methodology does not allow any gates between the LCB (Local Clock Buffer) and the latch
• Wiring, gate placement, and timing constraints need to be developed
• Minimum custom design: only the array column
• Potential benefits:
  • Around 20% reduction in physical design resources
  • Significantly shorter design cycle
  • Learning can be applied to other array designs for more savings
  • Potential area saving in the synthesis flow
Future: Research and Innovation in Synthesis Methodology
Proposed Array Design in Synthesis Methodology
• LCB: Local Clock Buffer
  • Generates CLK for the MS latch
Approaches:
• Pre-placing hard IP in LBS
  • Pseudo algorithm:
      begin_place
      place <inst_name> xloc <> yloc <> <rot> movetype=fixed
      end_place
• Wire trait example in LBS
  • Pseudo parms file:
      <Flow>: <wire_code> <time gain> <routing layers>
      synthesis_layer_traits : W20S10L15 3 3 M2 X3
      fine_opt_layer_traits : W20S10L15 3 3 M2 X3
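The pre-placement pseudo syntax above could be generated from a floorplan table with a small script. The sketch below only mirrors the begin_place / place / end_place form shown on the slide; the instance names, coordinates, and the "R0" rotation token are hypothetical, and the real tool's accepted syntax may differ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HardIpPlacement:
    inst_name: str
    xloc: float
    yloc: float
    rot: str = "R0"  # assumed rotation token, not taken from the slides


def placement_block(placements):
    """Emit one fixed-placement directive per hard IP, wrapped in begin/end."""
    lines = ["begin_place"]
    for p in placements:
        lines.append(
            f"place {p.inst_name} xloc {p.xloc:g} yloc {p.yloc:g} "
            f"{p.rot} movetype=fixed"
        )
    lines.append("end_place")
    return "\n".join(lines)


# Hypothetical hard IP instances and coordinates.
print(placement_block([
    HardIpPlacement("l2_dir_array", 120.0, 45.5),
    HardIpPlacement("l2_cache_array", 300.0, 45.5, rot="MX"),
]))
```

Keeping the placements in one table makes it easy to regenerate the directives whenever the unit floorplan changes.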
Approaches: Soft Hierarchy in LBS
Algorithm:
  inst_name=rlctl prefix=l2rlctl xlow=< > ylow=< > width= height=
where
  inst_name: user-specified name to recognize the gates
  prefix: name of the logic gates used in VHDL
  xlow, ylow: lower-left coordinate
  width, height: width and height of the macro in microns
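One way to read the soft-hierarchy specification is as a rectangular region that gates sharing a name prefix are expected to stay inside. The sketch below checks that reading against a set of placed gates. It is an interpretation for illustration only: the region coordinates, gate names, and locations are hypothetical, and the actual tool may treat the region differently.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SoftRegion:
    inst_name: str
    prefix: str
    xlow: float
    ylow: float
    width: float
    height: float

    def owns(self, gate_name):
        """A gate belongs to the region if its name starts with the prefix."""
        return gate_name.startswith(self.prefix)

    def contains(self, x, y):
        """True if (x, y) lies inside the region rectangle (microns)."""
        return (self.xlow <= x <= self.xlow + self.width
                and self.ylow <= y <= self.ylow + self.height)


def stray_gates(region, gate_locations):
    """Names of gates owned by the region but placed outside its rectangle."""
    return sorted(name for name, (x, y) in gate_locations.items()
                  if region.owns(name) and not region.contains(x, y))


# Hypothetical region and gate placements.
region = SoftRegion("rlctl", "l2rlctl", xlow=100, ylow=200, width=50, height=40)
gates = {
    "l2rlctl_and1": (110, 210),   # owned, inside
    "l2rlctl_inv7": (400, 10),    # owned, outside
    "l2dir_nand2": (0, 0),        # not owned by this region
}
print(stray_gates(region, gates))  # ['l2rlctl_inv7']
```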
Approaches: Synthesis Parms in LBS
• VT upgrade
    *user_native_vt: 1
    *user_alternate_vt: 2 3
• Interior PIN
    *pds_assign_interior_pins: true
    *pds_pin_spec: "<metal layer> <width> <height>"
    *pds_horizontal_pin_spacing: "<metal layer> <Spacing>"
    *pds_vertical_pin_spacing: "<metal layer> <Spacing>"
• Rapids
Approaches: Congestion Analysis
• Routing resource allocation at the top level
• Negotiate routing resources with the macro (IP)
• Negotiate pin placement with the macro (IP)
Approaches: Synthesis Methodology at Sync-Async Interface
Application of the Sync-Async latch:
[Figure: three latches in sequence (Latch, Sync-Async Latch, Latch) with logic between them, from data_in to data_out; clock labels NCLK, NASYNC, NASYNC]
Approaches: Synthesis Methodology at Sync-Async Interface
Pseudo algorithm to exclude the Sync-Async latch from slack borrowing:
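The slide's own pseudo algorithm did not survive extraction, so the following is only a hedged sketch of the idea stated earlier: identify the latches involved in a Sync-Async crossing and turn off slack sharing for them. It assumes a crossing latch is one that captures data launched by a latch on a different clock net, and that only the capturing latch is excluded (a stricter policy might exclude the launching latch as well). The latch names are hypothetical; the clock names follow the NCLK and NASYNC labels used in the deck.

```python
def crossing_latches(latch_clock, paths):
    """Return latches that capture data launched from another clock domain.

    latch_clock: dict mapping latch name -> clock net name
    paths: iterable of (launch_latch, capture_latch) pairs
    """
    return {cap for src, cap in paths if latch_clock[src] != latch_clock[cap]}


def slack_sharing_candidates(latch_clock, paths):
    """Latches still eligible for slack sharing once crossings are excluded."""
    excluded = crossing_latches(latch_clock, paths)
    return sorted(name for name in latch_clock if name not in excluded)


# Hypothetical latch names; clocks mirror the NCLK / NASYNC labels.
latch_clock = {
    "lat_in": "NCLK",
    "lat_sync_async": "NASYNC",
    "lat_out": "NASYNC",
}
paths = [("lat_in", "lat_sync_async"), ("lat_sync_async", "lat_out")]

print(sorted(crossing_latches(latch_clock, paths)))  # ['lat_sync_async']
print(slack_sharing_candidates(latch_clock, paths))  # ['lat_in', 'lat_out']
```

Driving the exclusion from the clock nets rather than a hand-kept list is what would make the step automatable in a synthesis flow.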
Preliminary Results: Placed and Timed Gates of L2
• Number of transistors including cache: 44M
• Number of synthesizable transistors: 19M
Preliminary Results: Slack Take Down of L2
Preliminary Results: Clock Gating at Array Interface
• Clock gating is not working
  • Red shape/line: current routing and placement
    • Violates timing at the array cell and the electrical check
  • Blue shape/line: desired routing and placement
[Figure labels: L (latch) and LCB instances]
Preliminary Results: Clock Gating at Array Interface
• Clock gating is not working
Conclusion and Future Work
• With robust tool sets, the newly proposed synthesis methodology, and design guidelines, the L2 cache unit can be designed with roughly 50% fewer resources, even without dedicated unit timing and integration resources.
  • Preliminary data is very promising.
  • Further experiments with 10% less unit area will follow once the design is closed.
• A methodology for timing at the Sync-Async interface in the synthesis flow is being developed with user-controlled parms.
• Clock-gating work is in progress in collaboration with the tool development team at IBM.
  • Saves 20% of design effort in the present application, RF design
  • Could lead to further physical design effort savings in all types of array design, e.g. SRAM, CAM, DRAM
Acknowledgement
Advisor: Prof. Tom W. Chen
Committee Members: Prof. Yashwant Malaiya, Dr. Sudeep Pasricha, Dr. Ali Pezeshki
IBM: Joshua Friedrich, Dr. Vikas Agarwal, Chirag Desai, John Badar
Back-ups
Personal Background
• Educational
  • BS in Electrical Engineering, BUET, Dhaka, Bangladesh
  • ME in Electrical Engineering, CUNY, New York
• Professional
  • Product Development Engineer, Advanced Micro Devices (AMD), TX: 1994 – 1997
    • Circuit design, critical timing path analysis, layout for the K5 development team
  • Hardware Development Engineer, Mentor Graphics Corporation, NJ: 1997 – 1999
    • Test chip, data path design, Verilog model for ROM/RAM
  • Member of Technical Staff, Hewlett Packard (HP), CO: 1999 – 2002
    • Circuit design for FPU, high-speed IO driver, place and route, timing analysis
  • Senior Engineer, International Business Machines (IBM), TX: 2003 – Present
    • Fabric Unit interim/co-Circuit Lead: P6
    • GX, TP, CLIB, PC Unit Circuit Lead: P6 DD1
    • L2, L3, NCU Circuit Lead: P6 DD2
    • L2, NCU Circuit Lead: P7 DD1, DD2
    • Nest Circuit Lead for P8, P9