Digital Integrated Circuits: A Design Perspective. Jan M. Rabaey, Anantha Chandrakasan, Borivoje Nikolic. Design Methodologies
The Design Productivity Challenge: a growing gap between design complexity and design productivity. Source: sematech97
A Simple Processor. [Figure: block diagram with memory, input/output, control, and datapath blocks.] The structure may be repeated many times on a single die (system-on-a-chip, SoC).
Datapath:
• All computations are performed here.
• It consists of an interconnection of basic combinational functions, such as logic operators (AND, OR, etc.) and arithmetic operators (addition, multiplication, etc.).
Control Module:
• Determines what actions happen in the processor at any given time.
• It can be viewed as a finite state machine (FSM).
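The datapath/control split can be illustrated with a small cycle-by-cycle model. The sketch below is a hypothetical example, not taken from the slides: a datapath with two registers, a comparator, and a subtractor computes a greatest common divisor, while a three-state FSM decides in each cycle which register the datapath updates. The names `State` and `run_gcd` are our own.

```python
from enum import Enum, auto


class State(Enum):
    LOAD = auto()     # load the operand registers
    COMPUTE = auto()  # one subtract step per cycle
    DONE = auto()     # result available in register a


def run_gcd(x, y):
    """Return (gcd, cycles) for positive integers x and y.

    Datapath: registers a and b, an equal/greater-than comparator, a subtractor.
    Control: an FSM that reads the comparator outputs and selects the datapath action.
    """
    if x <= 0 or y <= 0:
        raise ValueError("operands must be positive")
    state, a, b, cycles = State.LOAD, 0, 0, 0
    while state is not State.DONE:
        if state is State.LOAD:
            a, b = x, y                       # datapath: write both registers
            state = State.COMPUTE
        else:
            equal, a_greater = a == b, a > b  # datapath status signals to control
            if equal:
                state = State.DONE            # control: stop
            elif a_greater:
                a -= b                        # control selects "a <- a - b"
            else:
                b -= a                        # control selects "b <- b - a"
        cycles += 1
    return a, cycles


print(run_gcd(12, 18))  # prints (6, 4)
```

Only the control FSM changes if the algorithm changes; the datapath operators (compare, subtract) stay the same.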
A System-on-a-Chip: Example (Courtesy: Philips). The chip combines all the functions needed to realize a complete high-definition digital TV set; here it integrates two processors.
Impact of Implementation Choices
-- A flexible design may allow ‘late binding’, in which the application can still be changed after the chip has gone to fabrication.
-- The cost of this flexibility is lower performance or energy efficiency.
-- Providing programmability means adding overhead to the implementation.
[Figure: energy efficiency (in MOPS/mW) versus flexibility (or application scope), ranging from none through somewhat flexible to fully flexible]
-- Hardwired custom: 100-1000 MOPS/mW
-- Configurable/Parameterizable: 10-100 MOPS/mW
-- Domain-specific processor (e.g., DSP): 1-10 MOPS/mW
-- Embedded microprocessor: 0.1-1 MOPS/mW
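Efficiency in MOPS/mW is often easier to compare as energy per operation: 1 MOPS/mW = 10^6 op/s per 10^-3 J/s = 10^9 operations per joule, or 1 nJ per operation. The short sketch below does this conversion; the mapping of ranges to implementation classes is our reading of the figure (most efficient for hardwired custom, least for embedded microprocessors) and should be treated as an assumption.

```python
def energy_per_op_pj(mops_per_mw):
    """Energy per operation in picojoules for an efficiency given in MOPS/mW.

    1 MOPS/mW = 1e6 op/s per 1e-3 J/s = 1e9 op/J, i.e. 1 nJ (1000 pJ) per operation.
    """
    ops_per_joule = mops_per_mw * 1e9
    return 1e12 / ops_per_joule


# (low, high) efficiency bounds per class, as read from the figure (assumed mapping).
EFFICIENCY_MOPS_PER_MW = {
    "Hardwired custom": (100, 1000),
    "Configurable/Parameterizable": (10, 100),
    "Domain-specific processor (e.g., DSP)": (1, 10),
    "Embedded microprocessor": (0.1, 1),
}

for name, (low, high) in EFFICIENCY_MOPS_PER_MW.items():
    # Higher efficiency means lower energy per operation, so the bounds swap.
    print(f"{name}: {energy_per_op_pj(high):g} to {energy_per_op_pj(low):g} pJ/op")
```

Each step to the left on the flexibility axis buys roughly an order of magnitude in energy per operation.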
Design Methodology
• The design process iterates among three abstractions: behavior, structure, and geometry.
• Each of these steps is increasingly automated.
Digital Circuit Implementation Approaches (implementation choices)
-- Custom
-- Semicustom
   -- Cell-based: standard cells, compiled cells, macro cells
   -- Array-based: pre-diffused (gate arrays), pre-wired (FPGAs)
Custom design is handcrafted. It is used when performance or design density is important, but it has a high cost and a long time to market. Only the most performance-critical modules, such as PLLs and clock buffers, are still designed manually; library cell design is the only area where custom design still thrives today.
The Custom Approach: Intel 4004 (handcrafted). Courtesy Intel
Transition to Automation and Regular Structures: Intel 4004 (‘71), Intel 8080, Intel 8085, Intel 80286, Intel 80486. Courtesy Intel
Cell-based Design (or standard cells)
-- Reduces the implementation effort by reusing a limited library of cells.
-- The constrained nature of the library reduces the possibility of fine-tuning the design.
-- The layout is generated automatically from the cells in the library.
-- In the standard-cell philosophy, cells are placed in rows that are separated by routing channels.
-- All cells have the same height, but cell widths vary; complex cells are wider.
-- The availability of more interconnect layers reduces routing-channel requirements.
-- Interconnect wire length can be reduced with feedthrough cells.
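Because every cell has the same height, placement reduces to filling rows of fixed width with variable-width cells. The sketch below is a deliberately naive illustration of that row structure only: it packs cells greedily in the given order and ignores wire length, which real placers optimize. The cell names and widths are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Cell:
    name: str
    width: int  # hypothetical width units; the height is the same for every cell


def place_in_rows(cells, row_width):
    """Greedily pack cells, in the given order, into rows no wider than row_width."""
    rows = [[]]
    used = 0
    for cell in cells:
        if cell.width > row_width:
            raise ValueError(f"cell {cell.name} is wider than a row")
        if used + cell.width > row_width:  # row full: start a new row
            rows.append([])
            used = 0
        rows[-1].append(cell)
        used += cell.width
    return rows


library_cells = [Cell("INV", 1), Cell("NAND2", 2), Cell("NAND3", 3), Cell("DFF", 5), Cell("NOR2", 2)]
for i, row in enumerate(place_in_rows(library_cells, row_width=6)):
    print(f"row {i}:", [c.name for c in row])
```

Routing channels (or, with many metal layers, over-the-cell routing) would then connect the pins between these rows.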
Standard Cell: Example. Standard-cell design in a three-layer metal technology; the wiring channels occupy a substantial part of the chip area. [Brodersen92]
Standard Cell: The New Generation
-- Uses seven metal layers.
-- Density exceeds 90%, which means that virtually the entire chip area is covered by logic cells.
-- The cell structure is hidden under the interconnect layers.
Standard Cell: Example. A 3-input NAND cell (from ST Microelectronics); all characterization information is provided with the cell. C = load capacitance, T = input rise/fall time.
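Characterizing delay as a function of C and T lets a timing tool evaluate each instance of the cell in its actual context. As a minimal sketch, assume a linear model t_p = t0 + k_C·C + k_T·T. Both this form and the coefficients below are made up for illustration; they are not the ST Microelectronics data.

```python
# Hypothetical coefficients for illustration only (NOT the ST Microelectronics values).
T_INTRINSIC_NS = 0.05    # delay with no load and an ideal input step
K_LOAD_NS_PER_PF = 4.0   # added delay per pF of load capacitance C
K_SLOPE = 0.1            # added delay per ns of input rise/fall time T


def nand3_delay_ns(c_load_pf, t_input_ns):
    """Assumed linear delay model: t_p = t0 + k_C * C + k_T * T."""
    if c_load_pf < 0 or t_input_ns < 0:
        raise ValueError("C and T must be non-negative")
    return T_INTRINSIC_NS + K_LOAD_NS_PER_PF * c_load_pf + K_SLOPE * t_input_ns


# One instance driving 0.1 pF with a 0.2 ns input transition.
print(f"{nand3_delay_ns(0.1, 0.2):.2f} ns")
```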
Automatic Cell Generation. Process: initial transistor geometries → placed transistors → routed cell → compacted cell → finished cell. Motivation: the cost of implementing and characterizing a library of cells. Courtesy Acadabra
A Historical Perspective: the PLA. [Figure: PLA with inputs x0, x1, x2, an AND plane that generates the product terms, and an OR plane that produces outputs f0 and f1.] The PLA is a regular, structured design approach. This methodology enabled automatic layout generation of two-level logic circuits in a predictable fashion in terms of area and performance. The emergence of automatic logic synthesis tools for two-level logic made it possible to translate any Boolean expression into an optimized two-level (SOP/POS) logic.
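The two planes map naturally onto two small tables: the AND plane lists which literals each product term uses, and the OR plane lists which product terms each output sums. The sketch below evaluates such a PLA; the example functions f0 = x0·x1 + x2' and f1 = x0·x1·x2 are hypothetical and not the ones in the figure.

```python
from itertools import product

# AND plane: each product term lists the literal it needs on each input
# (1 = x_i, 0 = x_i'); inputs not listed are not connected (don't care).
AND_PLANE = [
    {0: 1, 1: 1},        # p0 = x0 x1
    {2: 0},              # p1 = x2'
    {0: 1, 1: 1, 2: 1},  # p2 = x0 x1 x2
]

# OR plane: each output sums a subset of the product terms.
OR_PLANE = [
    [0, 1],  # f0 = p0 + p1 = x0 x1 + x2'
    [2],     # f1 = p2 = x0 x1 x2
]


def pla_eval(inputs, and_plane=AND_PLANE, or_plane=OR_PLANE):
    """Evaluate a two-level (AND-OR) PLA for one input vector (x0, x1, x2, ...)."""
    terms = [all(inputs[i] == v for i, v in term.items()) for term in and_plane]
    return [int(any(terms[j] for j in outs)) for outs in or_plane]


for x in product((0, 1), repeat=3):
    print(x, pla_eval(x))
```

Product term p0 is shared by both planes' logic only once, which is the area advantage a PLA has over building each output separately.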
Two-Level Logic. Every logic function can be expressed in sum-of-products format (AND-OR), i.e., as a sum of minterms. The inverting format (NOR-NOR) is more effective: a NAND-NAND configuration with large fan-in needs long series transistor stacks and is therefore slow, although it has lower power dissipation.
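Why NOR-NOR still implements a sum of products follows from De Morgan's laws: a NOR of complemented literals is the AND of the true literals, and a NOR of the product terms is the complement of their sum, which an output inverter restores. The sketch below checks this exhaustively for a hypothetical function f = x0·x1 + x2'.

```python
from itertools import product


def nor(*xs):
    """NOR gate with arbitrary fan-in."""
    return int(not any(xs))


def and_or(x0, x1, x2):
    """Reference sum-of-products form: f = x0 x1 + x2'."""
    return int((x0 and x1) or not x2)


def nor_nor(x0, x1, x2):
    """Same function built from NOR gates only, as in a NOR-NOR PLA."""
    # AND plane: a NOR of the complemented literals equals the AND of the literals.
    p0 = nor(1 - x0, 1 - x1)  # x0 x1
    p1 = nor(x2)              # x2' (the complement of literal x2' is x2)
    # OR plane: a NOR of the product terms gives f'; the output inverter restores f.
    return 1 - nor(p0, p1)


# Exhaustive check over all 2**3 input combinations.
assert all(and_or(*x) == nor_nor(*x) for x in product((0, 1), repeat=3))
```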
PLA Layout: Exploiting Regularity. [Figure: PLA layout with an AND plane and an OR plane between the VDD and GND rails, producing output f; NOR-NOR implementation of the equation.]
MacroModules. Cells whose complexity surpasses what is found in a typical standard-cell library are called macrocells (hard or soft). [Figure: a 256×32 (8192-bit) SRAM generated by a hard-macro module generator.] Hard macro: a module with a given functionality and a predetermined physical design. The relative location of transistors and wiring within the module is fixed. Advantages: dense layout; optimized and predictable performance and power dissipation; reusability helps to offset the initial design cost. Disadvantages: the design is hard to port to other technologies.
“Soft” MacroModules. Soft macro: a module with a given functionality, but without a specific physical design. The relative location of transistors and wiring within the module may vary from instance to instance. This means that the timing data can only be determined after the final synthesis, placement, and routing steps; in other words, the outcome is less predictable. Advantages: well-defined timing guarantees; portability over a wide range of technologies and processes; minimum design effort and cost over a wide set of designs. Because the soft-macro generator has all the system information and constraints, it can produce clever structures that are more efficient than what logic synthesis would produce. Disadvantages: timing is less predictable, since it is known only after placement and routing. Example: Synopsys DesignCompiler.
Two instances of an 8×8 multiplier module with different aspect ratios are shown. The macrocell generator translates the compact input description into an optimized interconnection of standard cells that meets the timing constraints. The “soft” approach has the advantage that modules with different aspect ratios can be generated easily.
Macromodules are acquired from third-party vendors through license agreements or royalty payments. Such modules are called IP (“Intellectual Property”) blocks. Example: A Protocol Processor for Wireless.
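Semicustom Design Flow
-- Main flow: design capture (HDL) → logic synthesis → floorplanning → placement → routing → tape-out.
-- Verification: pre-layout simulation at the behavioral and structural levels; circuit extraction and post-layout simulation at the physical level.
-- Verification results drive design iteration back to earlier steps.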
The “Design Closure” Problem. Each of these iterations may take several days; just routing a complex chip may take a week on the most advanced computers! The number of iterations needed grows as technology scales. This problem, called timing closure, made it clear that a new solution and a change in design methodology were required. [Figure: iterative removal of timing violations (white lines). Courtesy Synopsys]
Integrating Synthesis with Physical Design: RTL, (timing) constraints, macromodules, and fixed netlists → physical synthesis → netlist with place-and-route information → place-and-route optimization → artwork.
Late-Binding Implementation. Array-based approaches: pre-diffused (gate arrays, or sea of gates) and pre-wired (FPGAs). Design automation does not address the time spent in the manufacturing process. All the design methodologies discussed so far require a complete run through the fabrication process, which can take from three weeks to several months and can substantially delay the introduction of a product. A number of alternative approaches have been devised that do not require a complete run through the manufacturing process. This comes at the expense of lower performance, lower integration density, or higher power dissipation.
Gate Array: Sea-of-Gates. [Figure: channeled and non-channeled gate arrays; an uncommitted cell and a committed cell wired as a 4-input NOR.]
Sea-of-Gates Primitive Cells. [Figure: primitive cells using oxide isolation between gates, and using gate isolation.] Multiple cells are needed to implement a flip-flop. Dog-bone terminations on the poly lines provide denser routing. In long rows of transistors that share the same diffusion area, some devices must be turned off to isolate neighboring gates: the NMOS gate is tied to GND and the PMOS gate to VDD. This technique, called gate isolation, wastes a few transistors but yields an overall higher transistor density.
Example: Base Cell of a Gate-Isolated Gate Array. The cell is one routing track wide and contains one nMOS and one pMOS transistor. Also shown is a base cell with all possible contact positions: there is room for 21 contacts in the vertical direction, so the cell is 21 tracks high. From Smith97
Designs with a large number of gates often have large memory requirements, so some area is set aside for dedicated memory modules, which are more efficient than memory built from gate-array cells. Mixing gate arrays with fixed macros is called the embedded gate-array approach. Other modules, such as microprocessors and microcontrollers, are also ideal candidates for embedding. [Figure: sea-of-gates random logic and memory subsystem on the LSI Logic LEA300K (0.6 µm CMOS). Courtesy LSI Logic]
Prewired Arrays
Classification of prewired arrays (or field-programmable devices):
• Based on programming technique
  • Fuse-based (program-once)
  • Non-volatile EPROM-based
  • RAM-based
• Programmable logic style
  • Array-based
  • Look-up table
• Programmable interconnect style
  • Channel routing
  • Mesh networks
These devices can be programmed in the field (i.e., outside the semiconductor foundry), avoiding dedicated manufacturing steps.
Fuse-Based FPGA. [Figure: antifuse structure with polysilicon over an ONO dielectric over an n+ antifuse diffusion; feature size 2λ.] The antifuse is open by default; applying a programming current pulse melts the dielectric and forms a permanent connection. From Smith97
Array-Based Programmable Logic. [Figure: three array organizations with inputs I and outputs O; one symbol marks a programmable connection, another a fixed connection.]
-- PLA: programmable AND array, programmable OR array
-- PROM: fixed AND array, programmable OR array
-- PAL: programmable AND array, fixed OR array
Programming a PROM. [Figure: PROM with inputs X0, X1, X2 and outputs f0, f1; programmed nodes are marked in the OR array.]
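In a PROM the fixed AND array is a full decoder: every input combination activates exactly one minterm line, and programming the OR array chooses which minterms each output includes. The sketch below models this; programming the device as a full adder (f0 = sum, f1 = carry) is a hypothetical example, not the function in the figure.

```python
def decoder(bits):
    """Fixed AND array: one minterm line per input combination, exactly one active.

    bits = (x0, x1, x2, ...) with x0 as the least-significant address bit.
    """
    address = sum(b << i for i, b in enumerate(bits))
    return [int(m == address) for m in range(2 ** len(bits))]


def prom_eval(bits, or_plane):
    """Programmable OR array: each output ORs the minterm lines programmed for it."""
    lines = decoder(bits)
    return [int(any(lines[m] for m in programmed)) for programmed in or_plane]


# Hypothetical programming: a full adder on inputs (x0, x1, x2).
FULL_ADDER = [
    {1, 2, 4, 7},  # f0 = x0 xor x1 xor x2 (sum): minterms with an odd number of ones
    {3, 5, 6, 7},  # f1 = majority(x0, x1, x2) (carry): minterms with two or more ones
]

print(prom_eval((1, 1, 0), FULL_ADDER))
```

Because the decoder generates all 2^n minterms, a PROM can implement any function of its inputs, but its size doubles with every added input.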
Look-up Table Based Logic Cell. A small memory, called a lookup table (LUT), is programmed with the truth table of the function; for a two-input function such as the EXOR in the figure, it holds 2^2 = 4 bits. The input variables serve as the control inputs of a multiplexer that selects the appropriate value from the memory.
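An n-input LUT is just 2^n configuration bits plus a multiplexer whose select lines are the inputs. The minimal sketch below models that; the class name `LUT` and the least-significant-bit-first indexing convention are our own assumptions.

```python
class LUT:
    """n-input lookup table: 2**n configuration bits selected by the inputs."""

    def __init__(self, truth_table):
        n = len(truth_table).bit_length() - 1
        if n < 0 or len(truth_table) != 2 ** n:
            raise ValueError("truth table length must be a power of two")
        self.n = n
        self.bits = list(truth_table)  # the "memory" of the LUT

    def __call__(self, *inputs):
        if len(inputs) != self.n:
            raise ValueError(f"expected {self.n} inputs")
        # The inputs act as multiplexer select lines (input 0 is the LSB).
        select = sum(b << i for i, b in enumerate(inputs))
        return self.bits[select]


# EXOR: entries for (a, b) = (0,0), (1,0), (0,1), (1,1).
xor = LUT([0, 1, 1, 0])
print([xor(a, b) for a in (0, 1) for b in (0, 1)])
```

Reprogramming the same cell as, say, AND only means loading [0, 0, 0, 1] instead; the hardware is unchanged.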
LUT-Based Logic Cell: Xilinx 4000 Series. [Figure: configurable logic block with control inputs C1-C4; two 4-input logic function generators F' (inputs F1-F4) and G' (inputs G1-G4); a third function generator H' of F', G', and H1; two D flip-flops with data input Din, clock enable EC, and set/reset S/R; and multiplexers controlled by the configuration program.] Courtesy Xilinx
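With two 4-input function generators feeding a 3-input one, such a logic block can implement any function of five inputs via Shannon expansion on the fifth input: F holds the function with x4 = 0, G holds it with x4 = 1, and H acts as a 2:1 multiplexer driven by H1 = x4. The sketch below assumes the F/G/H organization in the figure; the helper names are hypothetical and this is not Xilinx's mapping software.

```python
def bits_of(index, n):
    """Input bits (LSB first) that select entry `index` of an n-input table."""
    return [(index >> i) & 1 for i in range(n)]


def make_lut(truth_table):
    """A LUT indexed by its input bits, LSB first."""
    def lut(*inputs):
        return truth_table[sum(b << i for i, b in enumerate(inputs))]
    return lut


def five_input_clb(func):
    """Map any 5-input Boolean function onto F (4-LUT), G (4-LUT), and H (3-LUT)."""
    f = make_lut([func(*bits_of(k, 4), 0) for k in range(16)])  # cofactor x4 = 0
    g = make_lut([func(*bits_of(k, 4), 1) for k in range(16)])  # cofactor x4 = 1
    # H inputs are (F', G', H1): pass G' when H1 = 1, else F'.
    h = make_lut([b[1] if b[2] else b[0] for b in (bits_of(k, 3) for k in range(8))])

    def clb(x0, x1, x2, x3, x4):
        return h(f(x0, x1, x2, x3), g(x0, x1, x2, x3), x4)

    return clb


majority5 = five_input_clb(lambda *x: int(sum(x) >= 3))
print(majority5(1, 1, 0, 0, 1))
```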
Array-Based Programmable Wiring. [Figure: cells and input/output pins connected by horizontal and vertical tracks; programmed interconnections are made at the interconnect points.]
Mesh-Based Interconnect Network. [Figure: mesh of wiring tracks with switch boxes, connect boxes, and interconnect points.] Courtesy DeHon and Wawrzynek
Transistor Implementation of Mesh. Courtesy DeHon and Wawrzynek
Hierarchical Mesh Network: an overlaid mesh supports longer connections, with reduced fanout and reduced resistance. Courtesy DeHon and Wawrzynek
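Altera MAX Interconnect Architecture. [Figure: array-based (MAX 3000-7000): logic array blocks (LAB1 through LAB6) connected through a programmable interconnect array (PIA) with delay t_PIA; mesh-based (MAX 9000): LABs connected through row and column channels.] Courtesy Altera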