
Ph.D. Preliminary Exam

Mozammel Hossain, Colorado State University, Department of Electrical and Computer Engineering. Nest Circuit Lead, IBM, Austin, TX. Advisor: Prof. Tom W. Chen. Committee Members: Prof. Yashwant Malaiya, Dr. Sudeep Pasricha, Dr. Ali Pezeshki.


Presentation Transcript


  1. Ph.D. Preliminary Exam
     • Mozammel Hossain, Colorado State University, Department of Electrical and Computer Engineering
     • Nest Circuit Lead, IBM, Austin, TX
     • Advisor: Prof. Tom W. Chen
     • Committee Members: Prof. Yashwant Malaiya, Dr. Sudeep Pasricha, Dr. Ali Pezeshki

  2. Research Area
     • Synthesis-based design and implementation methodology for:
       • High-speed, high-performance units (LBS)
       • Sync-async interface timing
       • Arrays with clock gating
         • To convert to a synthesizable macro

  3. Outline
     • Introduction
     • Overview of Present Synthesis Methodology
     • Future: Research and Innovation in Synthesis Methodology
     • Problem definitions
       • Large Block Synthesis (LBS): L2 Cache Unit
       • Sync-async interface timing
       • Clock gating support for array design
     • Approaches
     • Preliminary results
     • Conclusion and Future Work
     • Acknowledgement

  4. Introduction
     • The technology market demands faster turnaround of IC designs, and designers struggle to meet performance requirements.
     • Costs for design, validation, and time to market are increasing.
     • Past generations of microprocessors relied more on custom circuit design to meet tight cycle-time targets.
     • Designs are moving toward a common synthesizable design methodology, in most cases sacrificing the desired chip speed in favor of new functionality and time to market.

  5. Introduction: Design Methodology

  6. Introduction: Macro Design Spectrum (figure axes: Design Customization vs. Design Effort)
     5) Custom design (conventional)
     4) Custom prerouting
     3) Embed custom components
     2) Preplace LCB/latches
     1) VHDL structuring, parm customization. Example: ATTRIBUTE BLOCK_DATA of add64 : label is "LOGIC_STYLE=/xxxx/";
     0) “Vanilla” synthesis

  7. Introduction: Trend of Design Methodology for the Last 16 Years

  8. Overview of Present Synthesis Methodology
     • Synthesis
       • VHDL – compile VHDL
       • PDSRTL – front-end synthesis
       • PDSEMPAD – early mode padding
       • MAR – routing
       • RAPIDS – post-routing optimization
       • PROMOTE – promote routed design
     • Run all backend tools (PDV, extraction, timing)

  9. Overview of Present Synthesis Methodology (flow diagram; labels: MAR/Rapids, Cadence Space, RLMB, PDV, Backend tools)

  10. Overview of Present Synthesis Methodology. Slack sharing example: broken path
     • Figure labels: Broken path; Has margin to share
     • Look at timing across multiple latches
     • Consider sharing positive slack

  11. Overview of Present Synthesis Methodology. Slack sharing example: balanced slack
     • Delayed the 1st clock by 17 ps
     • Balanced slack of +3 ps across 2 latches
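The balancing step in the slack sharing example can be illustrated with a few lines of arithmetic. This is a minimal sketch, not the tool's algorithm: it assumes a single latch whose clock is delayed, considers setup (late-mode) slack only, and uses hypothetical starting slacks of -14 ps and +20 ps chosen so that the result matches the 17 ps delay and +3 ps balanced slack quoted on the slide. The function name `balance_slack` is an illustration, not part of any IBM flow.

```python
def balance_slack(slack_in_ps, slack_out_ps):
    """Return (clock_delay_ps, balanced_slack_ps) for one latch.

    slack_in_ps:  setup slack of the path arriving at the latch
    slack_out_ps: setup slack of the path launched from the latch
    Delaying the latch clock by d adds d to the incoming slack and
    removes d from the outgoing slack, so both meet at the average.
    """
    delay = (slack_out_ps - slack_in_ps) / 2
    return delay, slack_in_ps + delay


# Hypothetical pre-balance slacks (assumption, not from the slides).
delay, balanced = balance_slack(-14, 20)
print(delay, balanced)  # 17.0 3.0
```

A real flow would also have to check early-mode (hold) timing before moving a clock edge, which this sketch ignores.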

  12. Overview of Present Synthesis Methodology
     • Works very well on:
       • Traditional control macros with 2.5-5M transistors or about 20K-40K latches
       • Timing non-critical macros
       • Macros without embedded IP
       • Macros without the parent’s blockages
         • Unit buffer, latch, and clock blockages
       • Slack sharing within a synchronous clock domain
       • Designs without clock gating after the Local Clock Buffer (LCB)

  13. Future: Research and Innovation in Synthesis Methodology
     Problem definition 1: Large Block Synthesis (LBS)
     • The current methodology does not work well for much bigger designs such as the L2 Cache Unit (20M transistors).
     • Need techniques such as IP pre-placement, dataflow structuring, and hierarchical embedded synthesis.
     • Need techniques for wire traits, soft hierarchy, and interior pins.
     • Congestion analysis at critical timing and wiring areas.
     • Develop a synthesis methodology to support:
       • Significantly shorter design cycle
       • Significant reduction in physical design resources
       • Potential area reduction

  14. Future: Research and Innovation in Synthesis Methodology
     • LBS test case to develop the methodology:
     • Why the L2 Cache Unit?
       • Area-challenged unit
       • Has both 1:1 and 2:1 clocking methodologies
         • 1:1 clocking runs at the same clock speed as the core clock
         • Paths on 1:1 clocking are highly timing challenged
       • Requires dual-voltage routing and clock gating
       • Combination of data flow and control macros
       • Big unit that challenges tool flow run time and data management

  15. Why the L2 unit as test case? (floorplan figure; labels: cores C0 through C11, Core, L2 Unit, L3 Unit)

  16. Future: Research and Innovation in Synthesis Methodology. LBS: Why the L2 unit as test case?
     • Total cache size: 512 KByte
     • >4 GHz with core interface, control, and data flow interface
     • >2 GHz with cache, dir, address, L3, and fabric interface
     • Unit size: >4.0 sq mm in 22 nm; total black boxes: 82
     • Number of transistors including cache: 44M; number of synthesizable transistors: 19M

  17. Future: Research and Innovation in Synthesis Methodology. LBS: Physical Design Resource Comparison with Proposed Methodology

  18. Future: Research and Innovation in Synthesis Methodology
     Problem definition 2: Synthesis timing methodology for the sync-async interface
     • Slack sharing cannot be done at a sync-async interface because it can result in a metastable condition. A methodology needs to be developed:
       • To handle slack sharing in the synthesis and timing environment
       • To identify the latches involved and turn off slack sharing for them
       • For design automation

  19. Future: Research and Innovation in Synthesis Methodology. Slack sharing cannot be done at the sync-async interface

  20. Future: Research and Innovation in Synthesis Methodology. Slack sharing at the sync-async interface can result in a metastability condition (figure label: metastability at latch point)

  21. Future: Research and Innovation in Synthesis Methodology
     Problem definition 3: Clock gating support for array design in the synthesis methodology
     • A compilable array offers a fixed menu with limited read/write ports.
       • It does not support clock gating.
     • The current methodology does not allow any gates between the LCB (Local Clock Buffer) and the latch, to prevent electrical rule violations.
     • Wiring, gate placement, and timing constraints need to be developed.
     • Minimum custom design: only the array column
     • Potential benefits:
       • Around 20% reduction in physical design resources
       • Significantly shorter design cycle
       • Apply the learning to other array designs for more savings
       • Potential area saving in the synthesis flow
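The rule that no gate may sit between an LCB and a latch can be expressed as a simple traversal of the clock fanout. The sketch below is an assumption-laden illustration, not the checker used in the actual flow: the netlist is modeled as two plain dictionaries, and the instance names (`lcb0`, `and0`, `lat_a`, `lat_b`) and the function `gated_latches` are hypothetical.

```python
def gated_latches(cell_type, fanout):
    """Return latches whose clock passes through a gate after an LCB.

    cell_type: dict mapping instance name -> "lcb", "gate", or "latch"
    fanout:    dict mapping instance name -> list of clock sinks it drives
    """
    flagged = set()
    for name, kind in cell_type.items():
        if kind != "lcb":
            continue
        # Walk the clock fanout, remembering whether a gate was crossed.
        stack = [(sink, False) for sink in fanout.get(name, [])]
        seen = set()
        while stack:
            inst, crossed_gate = stack.pop()
            if (inst, crossed_gate) in seen:
                continue
            seen.add((inst, crossed_gate))
            if cell_type[inst] == "latch":
                if crossed_gate:
                    flagged.add(inst)
            elif cell_type[inst] == "gate":
                stack.extend((s, True) for s in fanout.get(inst, []))
    return sorted(flagged)


# Hypothetical clock network: lat_b is clocked through an AND gate.
cell_type = {"lcb0": "lcb", "and0": "gate", "lat_a": "latch", "lat_b": "latch"}
fanout = {"lcb0": ["lat_a", "and0"], "and0": ["lat_b"]}
print(gated_latches(cell_type, fanout))  # ['lat_b']
```

Under the current methodology every latch returned by such a check would be a violation; the proposed array flow would instead need wiring, placement, and timing constraints that make these gated paths legal.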

  22. Future: Research and Innovation in Synthesis Methodology. Proposed Array Design in Synthesis Methodology
     • LCB: Local Clock Buffer
     • Generates CLK for the MS latch

  23. Approaches:
     • Pre-placing hard IP in LBS
       • Pseudo algorithm:
           begin_place
           place <inst_name> xloc <> yloc <> <rot> movetype=fixed
           end_place
     • Wire trait example in LBS
       • Pseudo parms file:
           <Flow>: <wire_code> <time gain> <routing layers>
           synthesis_layer_traits : W20S10L15 3 3 M2 X3
           fine_opt_layer_traits : W20S10L15 3 3 M2 X3
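For a unit with dozens of hard IP blocks, the pre-placement directives would typically be generated rather than typed by hand. The following sketch emits text in the shape of the pseudo algorithm on this slide. The one-directive-per-line layout, the instance name, the coordinates, and the rotation code `R0` are all assumptions made for illustration; the real file syntax accepted by the tool may differ.

```python
def placement_block(instances):
    """Format fixed placements in the shape shown on the slide.

    instances: iterable of (inst_name, xloc, yloc, rot) tuples.
    """
    lines = ["begin_place"]
    for inst_name, xloc, yloc, rot in instances:
        lines.append(
            f"place {inst_name} xloc {xloc} yloc {yloc} {rot} movetype=fixed"
        )
    lines.append("end_place")
    return "\n".join(lines)


# Hypothetical hard IP instance and location.
print(placement_block([("l2_dir_array0", 120.0, 48.5, "R0")]))
```

Keeping the placements in a table and regenerating the block makes it easy to shift all IP when the floorplan changes.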

  24. Approaches: Soft Hierarchy in LBS
     • Algorithm:
         inst_name=rlctl prefix=l2rlctl xlow=< > ylow=< > width= height=
     • where
       • <inst_name>: user-specified name used to recognize the gates
       • prefix: the name of the logic gates used in VHDL
       • xlow, ylow: lower-left coordinate
       • width, height: width and height of the macro in microns
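The soft hierarchy specification ties a group of gates, identified by name prefix, to a rectangular region. A minimal sketch of how such a line could be parsed and used is shown below. The key=value format follows the slide, but the numeric coordinates, the gate names, and both helper functions are hypothetical, and the actual tool may interpret the region differently (for example as a soft rather than hard boundary).

```python
def parse_soft_hierarchy(line):
    """Parse 'key=value' tokens and convert the geometry fields to floats."""
    fields = dict(token.split("=", 1) for token in line.split())
    for key in ("xlow", "ylow", "width", "height"):
        fields[key] = float(fields[key])
    return fields


def member_inside_region(region, gate_name, x, y):
    """True if the gate matches the prefix and sits inside the region box."""
    if not gate_name.startswith(region["prefix"]):
        return False
    return (
        region["xlow"] <= x <= region["xlow"] + region["width"]
        and region["ylow"] <= y <= region["ylow"] + region["height"]
    )


# Hypothetical coordinates in microns.
spec = "inst_name=rlctl prefix=l2rlctl xlow=100 ylow=200 width=300 height=150"
region = parse_soft_hierarchy(spec)
print(member_inside_region(region, "l2rlctl_nand_17", 250.0, 260.0))  # True
print(member_inside_region(region, "l2rlctl_nand_18", 450.0, 260.0))  # False
```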

  25. Approaches: Synthesis Parms in LBS
     • VT upgrade
         *user_native_vt: 1
         *user_alternate_vt: 2 3
     • Interior pin
         *pds_assign_interior_pins: true
         *pds_pin_spec: "<metal layer> <width> <height>"
         *pds_horizontal_pin_spacing: "<metal layer> <Spacing>"
         *pds_vertical_pin_spacing: "<metal layer> <Spacing>"
     • Rapids

  26. Approaches: Congestion Analysis
     • Routing resource allocation at top level
     • Negotiate routing resources with macro (IP)
     • Negotiate pin placement with macro (IP)

  27. Approaches: Synthesis Methodology at Sync-Async Interface. Application of the sync-async latch (figure labels: data_in, Latch, Logic, Sync-Async Latch, Logic, Latch, data_out; clocks: NCLK, NASYNC, NASYNC)

  28. Approaches: Synthesis Methodology at Sync-Async Interface. Pseudo algorithm to exclude the sync-async latch from slack borrowing:
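The pseudo algorithm itself did not survive in this transcript. As a stand-in, the sketch below shows one plausible way to identify the latches involved in a clock-domain crossing so that slack sharing can be turned off for them. It is not the author's algorithm: the data model, the function name, the latch names, and the assignment of NCLK and NASYNC to the three latches are assumptions based only on the labels of the preceding figure.

```python
def latches_without_slack_sharing(clock_of, drivers_of):
    """Return latches on either end of a path that crosses clock domains.

    clock_of:   dict latch name -> clock name (e.g. "NCLK", "NASYNC")
    drivers_of: dict latch name -> list of latches feeding its data input
    """
    excluded = set()
    for latch, sources in drivers_of.items():
        for src in sources:
            if clock_of[src] != clock_of[latch]:
                # Both the launching and the capturing latch are excluded.
                excluded.update((src, latch))
    return sorted(excluded)


# Hypothetical three-latch chain matching the figure's clock labels.
clock_of = {"lat_in": "NCLK", "lat_sync": "NASYNC", "lat_out": "NASYNC"}
drivers_of = {"lat_sync": ["lat_in"], "lat_out": ["lat_sync"]}
print(latches_without_slack_sharing(clock_of, drivers_of))
# ['lat_in', 'lat_sync']
```

The returned list could then feed a user-controlled parm that disables slack borrowing on exactly those latches, leaving the purely synchronous paths free to share slack.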

  29. Preliminary Results: Placed and Timed Gates of L2
     • Number of transistors including cache: 44M
     • Number of synthesizable transistors: 19M

  30. Preliminary Results: Slack Take Down of L2

  31. Preliminary Results: Clock Gating at Array Interface
     • Clock gating is not working
     • Red shape/line: current routing and placement
       • Violates timing at the array cell and the electrical check
     • Blue shape/line: desired routing and placement
     • Figure labels: L (latch), LCB

  32. Preliminary Results: Clock Gating at Array Interface
     • Clock gating is not working

  33. Conclusion and Future Work
     • With robust tool sets, the newly proposed synthesis methodology, and design guidelines, the L2 cache unit can take almost 50% fewer resources to design, even without dedicated unit timing and integration resources.
     • Preliminary data is very promising.
     • Further experiments with 10% less unit area are planned once the design is closed.
     • A methodology for timing at the sync-async interface in the synthesis flow is being developed with user-controlled parms.
     • Clock-gating work is in progress in collaboration with IBM's tool development team.
       • Saves 20% of design effort in the present application in RF design
       • Could lead to more physical design effort savings in all types of array design, e.g. SRAM, CAM, DRAM

  34. Acknowledgement
     • Advisor: Prof. Tom W. Chen
     • Committee Members: Prof. Yashwant Malaiya, Dr. Sudeep Pasricha, Dr. Ali Pezeshki
     • IBM: Joshua Friedrich, Dr. Vikas Agarwal, Chirag Desai, John Badar

  35. Back-ups

  36. Personal Background
     • Educational
       • BS in Electrical Engineering, BUET, Dhaka, Bangladesh
       • ME in Electrical Engineering, CUNY, New York
     • Professional
       • Product Development Engineer, Advanced Micro Devices (AMD), TX: 1994 – 1997
         • Circuit design, critical timing path analysis, layout for the K5 development team
       • Hardware Development Engineer, Mentor Graphics Corporation, NJ: 1997 – 1999
         • Test chip, data path design, Verilog models for ROM/RAM
       • Member of Technical Staff, Hewlett Packard (HP), CO: 1999 – 2002
         • Circuit design for FPU, high-speed IO driver, place and route, timing analysis
       • Senior Engineer, International Business Machines (IBM), TX: 2003 – Present
         • Fabric Unit interim/co-Circuit Lead: P6
         • GX, TP, CLIB, PC Unit Circuit Lead: P6 DD1
         • L2, L3, NCU Circuit Lead: P6 DD2
         • L2, NCU Circuit Lead: P7 DD1, DD2
         • Nest Circuit Lead for P8, P9
