EECS 150 - Components and Design Techniques for Digital Systems. Lec 27 – Summary (whirlwind), 12-9-04. David Culler, Electrical Engineering and Computer Sciences, University of California, Berkeley. http://www.eecs.berkeley.edu/~culler http://www-inst.eecs.berkeley.edu/~cs150
Deep Digital Design Experience Fundamentals of Boolean Logic Synchronous Circuits Finite State Machines Timing & Clocking Device Technology & Implications Controller Design Arithmetic Units Bus Design Encoding, Framing Testing, Debugging Hardware Architecture HDL, Design Flow (CAD) Pgm Language Asm / Machine Lang CS 61C Instruction Set Arch Machine Organization HDL FlipFlops Gates Circuits Devices EE 40 Transistor Physics Transfer Function Background EECS150 Lec26 - Summary
Course Content: Components and Design Techniques for Digital Systems. Synchronous Digital Hardware Systems • Synchronous: “Clocked” - all changes in the system are controlled by a global clock and happen at the same time (not asynchronous) • Digital: All inputs/outputs and internal values (signals) take on discrete values (not analog). • Example digital representation (acoustic waveform): a series of numbers is used to represent the waveform, rather than a voltage or current, as in analog systems.
Trick you into building an extreme project • FPGA/SDRAM provides full game logic • Court, obstructions • Moving paddles • Moving, colliding ball • All the physics • Court displayed to NTSC (TV) Video Output • Real time Sound effects ??? • N64 controller (and switches) for input • How to make it multiplayer? • The network
Levels of Digital Design
What makes Digital Systems tick? [Figure labels: Combinational Logic, clk, time] What determines the system's performance?
The 150 “stuff” • Building blocks of computer systems • ICs (Chips), PCBs, Chassis, Cables & Connectors • CMOS Transistors • Voltage controlled switches • Complementary forms (nmos, pmos) • Logic gates from CMOS transistors • Logic gates implement particular boolean functions • N inputs, 1 output • Serial and parallel switches • Dual structure • P-type “pull up” transmit 1 • N-type • Complex gates: mux • Synchronous Sequential Elements • D FlipFlops
Combinational Logic (CL) Defined: $y_i = f_i(x_0, \ldots, x_{n-1})$, where $x, y \in \{0,1\}$. Y is a function of only X. • If we change X, Y will change immediately (well, almost!). • There is an implementation-dependent delay from X to Y.
Transistor-level Logic Circuits - NAND [Figure: inverter (NOT gate) and NAND gate transistor circuits] • NAND gate, written nand (out, a, b) • Logic Function: out = 0 iff both a AND b = 1, therefore out = (ab)’ • pFET network and nFET network are duals of one another.
a b | out
0 0 | 1
0 1 | 1
1 0 | 1
1 1 | 0
How about AND gate?
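As a minimal sketch of the question the slide ends on, the NAND truth table can be modeled in Python and an AND built by following the NAND with an inverter. The function names `nand`, `inv`, and `and_gate` are illustrative choices, not names from the lecture.

```python
def nand(a: int, b: int) -> int:
    """2-input NAND: output is 0 only when both inputs are 1."""
    return 0 if (a == 1 and b == 1) else 1


def inv(a: int) -> int:
    """Inverter built by tying both NAND inputs together."""
    return nand(a, a)


def and_gate(a: int, b: int) -> int:
    """AND = NAND followed by an inverter."""
    return inv(nand(a, b))


# Enumerate the truth table in the same row order as the slide.
truth_table = [(a, b, nand(a, b)) for a in (0, 1) for b in (0, 1)]
```

In this model an AND costs one more gate stage than a NAND, which is one reason NAND is the more natural primitive.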
Combinational logic summary • Logic functions, truth tables, and switches • NOT, AND, OR, NAND, NOR, XOR, . . ., minimal set • Axioms and theorems of Boolean algebra • Proofs by re-writing and perfect induction • Gate logic • Networks of Boolean functions and their time behavior • Canonical forms • Two-level and incompletely specified functions • Optimization • Two-level simplification using K-maps • Automation of simplification • Multi-level logic • Later • Design case studies • Time behavior
Transistor-level Logic Circuits - Latch, D FlipFlop • Positive level-sensitive latch (transistor level) • Positive edge-triggered flip-flop built from two level-sensitive latches [Figure: latch circuits clocked by clk and clk’]
D Flip-Flop [Figure: master stage and slave stage built from cross-coupled gates, 10 gates; signals D, CLK, S, R, P, P', Q, Q'] • Make S and R complements of each other in Master stage • Eliminates 1s catching problem • Input only needs to settle by clock edge • Can't just hold previous value (must have new value ready every clock period) • Value of D just before clock goes low is what is stored in flip-flop • Can make R-S flip-flop by adding logic to make D = S + R'Q
Timing Methodologies • Rules for interconnecting components and clocks • Guarantee proper operation of system when strictly followed • Approach depends on building blocks used for memory elements • Focus on systems with edge-triggered flip-flops • Found in programmable logic devices • Many custom integrated circuits focus on level-sensitive latches • Basic rules for correct timing: • (1) Correct inputs, with respect to time, are provided to the flip-flops • (2) No flip-flop changes state more than once per clocking event
Timing Methodologies (cont’d) • Definition of terms • clock: periodic event, causes state of memory element to change; can be rising or falling edge, or high or low level • setup time: minimum time before the clocking event by which the input must be stable (Tsu) • hold time: minimum time after the clocking event until which the input must remain stable (Th) • There is a timing "window" around the clocking event during which the input must remain stable and unchanged in order to be recognized. [Figure: D flip-flop with input and clock waveforms; data is marked stable inside the Tsu/Th window and changing outside it]
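The setup/hold window defined above can be checked mechanically. The sketch below is only an illustration: the function name `violates_window`, the list-of-transition-times input, and the numbers in the example are assumptions, not values from the lecture.

```python
def violates_window(data_edges, clock_edge, t_su, t_h):
    """Return True if any data transition falls strictly inside the
    window (clock_edge - t_su, clock_edge + t_h) around the clocking event."""
    return any(clock_edge - t_su < t < clock_edge + t_h for t in data_edges)


# Hypothetical numbers in ns: clock edge at 10.0, Tsu = 1.0, Th = 0.5.
late_change = violates_window([9.5], 10.0, 1.0, 0.5)        # changes 0.5 before the edge
safe_change = violates_window([8.0, 11.0], 10.0, 1.0, 0.5)  # changes well outside the window
```

A transition that lands inside the window means the flip-flop may not recognize the input correctly, which is what rule (1) of the timing methodology is meant to rule out.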
What’s an FSM? • Next state is function of state and input • Moore Machine: output is a function of the state • Mealy Machine: output is a function of state and input [Figure labels: Often PLAs; Moore state diagram with nodes "State / output" and arcs inputA, inputB; Mealy state diagram with nodes "State" and arcs inputA/outputA, inputB/outputB]
Formal Design Process for FSMs • Review of Design Steps: 1. Circuit functional specification 2. State Transition Diagram 3. Symbolic State Transition Table 4. Encoded State Transition Table 5. Derive Logic Equations 6. Circuit Diagram (FFs for state, CL for NS and OUT) • Logic equations from table: OUT = PS, NS = PS xor IN • Circuit Diagram: • XOR gate for ns calculation • DFF to hold present state • no logic needed for output [Figure labels: ps, ns]
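The two logic equations on this slide (OUT = PS, NS = PS xor IN) are enough to simulate the machine cycle by cycle. The following is a small sketch; the function name `run_fsm` and the reset state of 0 are assumptions made for illustration.

```python
def run_fsm(inputs, reset_state=0):
    """Simulate the one-bit FSM with OUT = PS and NS = PS xor IN.

    Returns the output seen in each cycle and the final state."""
    ps = reset_state
    outputs = []
    for x in inputs:
        outputs.append(ps)  # Moore output: depends on present state only
        ps = ps ^ x         # next state loaded into the DFF at the clock edge
    return outputs, ps


outputs, final_state = run_fsm([1, 0, 1, 1])
```

Because the state toggles on every 1 input, the final state is 1 exactly when an odd number of 1s has been seen.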
Composing FSMs into larger designs [Figure: two FSM blocks connected through CL blocks]
Sequential Synchronous Elements • Basic registers • Common control, MUXes • Simple, important FSMs • simple internal feedback • Ring counters, Pattern detectors • Binary Counters • Universal Shift Register • Using Counters to build controllers • Simplify control by controlling simpler FSM
150 and the changing times • Advancing technology changes the trade-offs and design techniques • 2x transistors per chip every 18 months • ASIC, Programmable Logic, Microprocessor • Programmable logic invests chip real-estate to reduce design time & time to market • FPGA: • programmable interconnect, • configurable logic blocks • LUT + storage • Block RAM • IO Blocks • PLAs • General devices for SoP or PoS logic
Virtex-E Configurable Logic Block (CLB) • CLB = 4 logic cells (LC) in two slices • LC: 4-input function generator, carry logic, storage element • 80 x 120 CLB array on 2000E [Figure labels: FF or latch; 16x1 synchronous RAM]
HDLs
Basic Idea: Language constructs describe circuits with two basic forms:
• Structural descriptions: similar to hierarchical netlist.
• Behavioral descriptions: use higher-level constructs (similar to conventional programming).
Originally designed to help in abstraction and simulation. Now “logic synthesis” tools exist to automatically convert from behavioral descriptions to gate netlist. Greatly improves designer productivity. However, this may lead you to falsely believe that hardware design can be reduced to writing programs!
“Structural” example:
  Decoder(output x0,x1,x2,x3; inputs a,b)
  {
    wire abar, bbar;
    inv(bbar, b);
    inv(abar, a);
    nand(x0, abar, bbar);
    nand(x1, abar, b );
    nand(x2, a, bbar);
    nand(x3, a, b );
  }
“Behavioral” example:
  Decoder(output x0,x1,x2,x3; inputs a,b)
  {
    case [a b]
      00: [x0 x1 x2 x3] = 0x0;
      01: [x0 x1 x2 x3] = 0x2;
      10: [x0 x1 x2 x3] = 0x4;
      11: [x0 x1 x2 x3] = 0x8;
    endcase;
  }
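To see what the structural Decoder netlist computes, it can be evaluated gate by gate. This Python sketch models only the structural example; the helper names `inv`, `nand`, and `decoder_structural` are illustrative. Evaluated this way, the netlist as written drives the selected output to 0 and the others to 1 (active-low), which is worth keeping in mind when comparing it with the constants in the behavioral example.

```python
def inv(a: int) -> int:
    """Inverter."""
    return 1 - a


def nand(a: int, b: int) -> int:
    """2-input NAND."""
    return 1 - (a & b)


def decoder_structural(a: int, b: int):
    """Gate-by-gate evaluation of the structural netlist, in slide order."""
    bbar = inv(b)
    abar = inv(a)
    x0 = nand(abar, bbar)
    x1 = nand(abar, b)
    x2 = nand(a, bbar)
    x3 = nand(a, b)
    return (x0, x1, x2, x3)


# One row per input combination [a b] = 00, 01, 10, 11.
rows = [decoder_structural(a, b) for a in (0, 1) for b in (0, 1)]
```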
Finite State Machines in Verilog [Figure labels: inputs, combinational logic, next state, current state, combinational logic, Mealy outputs, Moore outputs]
Design Methodology in Detail [Flowchart boxes: Postsynthesis Design Validation; Design Specification; Postsynthesis Timing Verification; Design Partition; Test Generation and Fault Simulation; Design Entry: Behavioral Modeling; Simulation/Functional Verification; Cell Placement/Scan Insertion/Routing; Verify Physical and Electrical Rules; Design Integration and Verification; Synthesize and Map Gate-level Net List; Pre-Synthesis Sign-Off; Design Sign-Off]
Configuring CLBs [Figure: logic block set by configuration bit-stream; 3-input "look up table" (3-LUT) with latch and FF; inputs, output] • Nextstate bit in FPGA CLB: nextstate = A2 xor A1 • NAND gate in FPGA CLB: out = ~(A1 A2 A3)
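A look-up table is just a small memory indexed by its inputs, so "configuring" it means filling in its truth table. The sketch below illustrates that idea for the two functions named on the slide; `make_lut`, `lut_read`, and the choice of the first input as the least significant index bit are assumptions for illustration, not the Virtex bit-stream format.

```python
def make_lut(func, n_inputs=3):
    """Compute the configuration bits: one stored bit per input combination.

    Input k of the function is bit k of the table index."""
    return [func(*[(i >> k) & 1 for k in range(n_inputs)])
            for i in range(2 ** n_inputs)]


def lut_read(bits, *inputs):
    """Evaluate the LUT: the inputs form an address into the stored bits."""
    index = sum(bit << k for k, bit in enumerate(inputs))
    return bits[index]


# nextstate = A2 xor A1 (A3 unused), and out = ~(A1 A2 A3).
xor_bits = make_lut(lambda a1, a2, a3: a1 ^ a2)
nand_bits = make_lut(lambda a1, a2, a3: 1 - (a1 & a2 & a3))
```

The same 3-LUT hardware implements either function; only the eight stored bits differ.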
Configuring Routes [Figure: two configured CLBs (3-LUT plus FF) with programmable routing between them; in; nextstate = A2 xor A1; out = ~(A1 A2 A3)]
Timing for Synchronous Circuits • In general, for correct operation: $T \geq \mathrm{time}(clk{\to}Q) + \mathrm{time}(CL) + \mathrm{time}(setup)$, that is, $T \geq \tau_{clk \to Q} + \tau_{CL} + \tau_{setup}$, for all paths. • How do we enumerate all paths? • Any circuit input or register output to any register input or circuit output. • “setup time” for circuit outputs depends on what it connects to • “clk-Q time” for circuit inputs depends on from where it comes.
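Since the constraint must hold for all paths, the minimum clock period is set by the slowest one. A minimal sketch follows; the function name, the path names, and the delay values are hypothetical and are chosen only to show the arithmetic.

```python
def min_clock_period(cl_delays, t_clk_q, t_setup):
    """Smallest T satisfying T >= clk-to-Q + CL delay + setup on every path.

    cl_delays maps a path name to its combinational logic delay."""
    return max(t_clk_q + d + t_setup for d in cl_delays.values())


# Hypothetical delays in ns for two register-to-register paths.
paths = {"counter_carry": 3.0, "alu_result": 5.5}
t_min = min_clock_period(paths, t_clk_q=1.0, t_setup=0.5)
```

With these assumed numbers the "alu_result" path is critical, so shortening the other path would not allow a faster clock.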
Typical SRAM Organization: 16-word x 4-bit [Figure: address decoder on A0-A3 selects one of Word 0 through Word 15; each word is a row of four SRAM cells; each column has a write driver (Din 3 to Din 0, gated by WrEn) and a sense amp (Dout 3 to Dout 0)]
Classical DRAM Organization (Square) • Row and Column Address together select 1 bit at a time • Each intersection represents a 1-T DRAM Cell • Square keeps the wires short: Power and speed advantages • Less RC, faster precharge and discharge is faster access time! [Figure: row decoder driving word (row) select lines across the RAM Cell Array; bit (data) lines into Column Selector & I/O Circuits; row address, Column Address, data]
DRAM with Column buffer • Pull column into fast buffer storage • Access sequence of bits from there [Figure: 11 address bits A0…A10 into row decoder; Memory Array (2,048 x 2,048); Storage Cell; Word Line; Sense Amps; Column Latches; MUX]
Digital Arithmetic • Circuit design for unsigned addition • Full adder per bit slice • Delay limited by Carry Propagation • Ripple is algorithmically slow, but wires are short • Carry select • Simple, resource-intensive • Excellent layout • Carry look-ahead • Excellent asymptotic behavior • Great at the board level, but wire length effects are significant on chip • Digital number systems • How to represent negative numbers • Simple operations • Clean algorithmic properties • 2s complement is most widely used • Circuit for unsigned arithmetic • Subtract by complement and carry in • Overflow when cin xor cout of sign-bit is 1
2s Complement Adder/Subtractor: $A - B = A + (-B) = A + \overline{B} + 1$
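The identity above, together with the overflow rule from the Digital Arithmetic slide (carry into the sign bit xor carry out of it), can be sketched as a bit-sliced ripple adder/subtractor. The function names, the 4-bit default width, and the single `sub` control input are assumptions for illustration.

```python
def full_adder(a, b, cin):
    """One bit slice: returns (sum, carry_out)."""
    s = a ^ b ^ cin
    cout = (a & b) | (cin & (a ^ b))
    return s, cout


def add_sub(a, b, sub, n=4):
    """n-bit ripple adder/subtractor on raw bit patterns.

    sub=1 inverts each bit of b and sets the carry-in, giving a + ~b + 1.
    Returns (n-bit result pattern, signed overflow flag)."""
    carry = sub
    result = 0
    carry_into_sign = 0
    for i in range(n):
        ai = (a >> i) & 1
        bi = ((b >> i) & 1) ^ sub      # XOR with sub complements b when subtracting
        if i == n - 1:
            carry_into_sign = carry    # cin of the sign-bit slice
        s, carry = full_adder(ai, bi, carry)
        result |= s << i
    overflow = carry_into_sign ^ carry  # cin xor cout of the sign bit
    return result, overflow


diff, diff_ovf = add_sub(5, 3, sub=1)    # 5 - 3
neg, neg_ovf = add_sub(3, 5, sub=1)      # 3 - 5, result is the pattern for -2
big, big_ovf = add_sub(7, 1, sub=0)      # 7 + 1 does not fit in 4-bit signed
```

Reusing the adder for subtraction costs only one XOR gate per bit plus the carry-in, which is the point of the slide's identity.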
Digital design - as we’ve seen it • System specification (in words) • Datapath specification • Controller specification • FSM generation • Comb. logic operations • Verilog dataflow • STT / STD / Encoding • Logic: nextstate/outputs • Gates / LUTs • Verilog behavior • Gates / LUTs / FF [Lecture annotations: Lec 2, 3: CMOS, FPGA; Lec 4: HDLs, Labs; Lec 5, 6: Logic min.; Lec 7, 8: FSM impl.; Lec 8, 9: Modeling FSMs]
Final Example: Ant Brain (Ward, MIT) • Sensors: L and R antennae, 1 if touching wall • Actuators: F - forward step, TL/TR - turn left/right slightly • Goal: find way out of maze • Strategy: keep the wall on the right
TX/RX – dealing with I/O [Figure: Keyboard (7 bits) into KeyboardDecode, then CharToSend (8), Send and AckS into Sender (ResetS, ClkS); Sender TxD over Serial Line to Receiver RxD (ResetR, ClkR); Receiver CharRcvd (8), Rcvd and AckR into DisplayControl, which drives the Display through RS, DB (8), E]
The GAME • CP1: N64 interface • CP2: Digital video encoder • CP3: SDRAM controller • CP4: IEEE 802.15.4 (cc2420) interface • Project CP: game engine • Endgame [Figure: FPGA containing Joystick Interface (N64 controller interface; player-0 input, player-1 input), Game Physics, board state, Render Engine, SDRAM Control, Video Encode; ITU 601/656 (8 bits) to ADV7194 for composite video; Control and Data (32 bits) lines to SDRAM]
Computer Organization • Computer design as an application of digital logic design procedures • Computer = processing unit + memory system • Processing unit = control + datapath • Control = finite state machine • Inputs = machine instruction, datapath conditions • Outputs = register transfer control signals, ALU operation codes • Instruction interpretation = instruction fetch, decode, execute • Datapath = functional units + registers + interconnect • Functional units = ALU, multipliers, dividers, etc. • Registers = program counter, shifters, storage registers • Interconnect = busses and wires • Instruction Interpreter vs Fixed Function Device
Design hierarchy [Figure labels, top to bottom: system; control, data-path; code registers, state registers, combinational logic, multiplexer, comparator; register, logic; switching networks]
Datapath vs Control [Figure: Datapath and Controller blocks connected by Control Points and signals] • Datapath: Storage, FU, interconnect sufficient to perform the desired functions • Inputs are Control Points • Outputs are signals • Controller: State machine to orchestrate operation on the data path • Based on desired function and signals
Datapath Design • Datapath consists of state (reg, reg file), function units (adders, ALUs), and interconnect (mux, tri-state & bus) • It can perform certain register transfers: source regs through function units and interconnect to dest reg • A set of register transfers occurs on each cycle • Each datapath element has control points • Reg (LD), FU (op), MUX (sel), TriState (OE) • Controller asserts the proper control points to cause the data path to carry out the requested register transfers • The RTLs associated with each step in the high level algorithm determine the STD of the controller • Controller inputs are datapath outputs (conditions) • Controller outputs are datapath inputs (control points)
Array Multiplier • Generates all n partial products simultaneously. • Each row: n-bit adder with AND gates • What is the critical path?
“Shift and Add” Multiplier • Sums each partial product, one at a time. • In binary, each partial product is a shifted version of A or 0. • Control Algorithm: 1. P ← 0, A ← multiplicand, B ← multiplier 2. If LSB of B==1 then add A to P else add 0 3. Shift [P][B] right 1 4. Repeat steps 2 and 3 n-1 times. 5. [P][B] has product. • Cost ∝ n, T = n clock cycles. • What is the critical path for determining the min clock period?
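The control algorithm above maps directly onto a loop over n clock cycles. This is a behavioral sketch, not the lecture's circuit: the function name, the 4-bit default, and letting P temporarily hold an extra carry bit (instead of modeling a separate carry flip-flop) are assumptions.

```python
def shift_add_multiply(a, b, n=4):
    """Unsigned n-bit shift-and-add multiply using P and B as one
    2n-bit shift register [P][B]."""
    mask = (1 << n) - 1
    a &= mask
    b &= mask
    p = 0                                   # step 1: P <- 0
    for _ in range(n):                      # steps 2 and 3, n times in total
        if b & 1:                           # step 2: LSB of B selects A or 0
            p += a                          # P may briefly need n+1 bits (carry out)
        b = (b >> 1) | ((p & 1) << (n - 1)) # step 3: LSB of P shifts into MSB of B
        p >>= 1                             # carry out shifts into MSB of P
    return (p << n) | b                     # step 5: [P][B] holds the product


product = shift_add_multiply(3, 5)
```

Each iteration uses the single adder once, so hardware cost grows with n while the latency is n cycles, in contrast to the array multiplier that forms all partial products at once.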
DIVIDE HARDWARE Version 2 • 32-bit Divisor register, 32-bit ALU, 64-bit Remainder register, 32-bit Quotient register [Figure: Divisor (32 bits); add/sub ALU; Remainder (64 bits; Shift Left, Write); Quotient (32 bits; Shift Left); Control]
Register Transfers - interconnect • Point-to-point connection • Dedicated wires • Muxes on inputs of each register • Common input from multiplexer • Load enables for each register • Control signals for multiplexer • Common bus with output enables • Output enables and load enables for each register [Figure: registers rs, rt, rd, R4 connected three ways: a MUX per register, one shared MUX, and a shared BUS]
Register Transfer Level Descriptions • A standard high-level representation for describing systems. • It follows from the fact that all synchronous digital systems can be described as a set of state elements connected by combinational logic (CL) blocks. • RTL comprises a set of register transfers with optional operators as part of the transfer. • Examples: regA ← regB; regC ← regA + regB; if (start==1) regA ← regC • Personal style: use “;” to separate transfers that occur on separate cycles. Use “,” to separate transfers that occur on the same cycle. • Example (2 cycles): regA ← regB, regB ← 0; regC ← regA;
List Processor Example • RTL gives us a framework for making high-level optimizations. • Fixed function unit • Approach extends to instruction interpreters • General design procedure outline: 1. Problem, Constraints, and Component Library Spec. 2. “Algorithm” Selection 3. Micro-architecture Specification 4. Analysis of Cost, Performance, Power 5. Optimizations, Variations 6. Detailed Design
3. Architecture #1 • Direct implementation of RTL description: If (START==1) NEXT←0, SUM←0; repeat { SUM←SUM + Memory[NEXT+1]; NEXT←Memory[NEXT]; } until (NEXT==0); R←SUM, DONE←1; [Figure: Datapath and Controller; Memory with address input A and data output D; adders; NEXT and SUM registers; control points A_SEL, NEXT_SEL, SUM_SEL, LD_NEXT, LD_SUM; condition NEXT_ZERO from an ==0 comparator]
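A software model of the RTL description can help check the algorithm before designing the datapath. The sketch below follows the RTL statement by statement; the function name `sum_list` and the example memory contents are hypothetical, and the node layout (pointer at address X, value at X+1) is read off the RTL's use of Memory[NEXT] and Memory[NEXT+1].

```python
def sum_list(memory):
    """Behavioral model of the list processor RTL.

    Each list node occupies two words: memory[x] is the address of the
    next node (0 terminates the list) and memory[x + 1] is its value."""
    next_ptr, total = 0, 0                 # NEXT <- 0, SUM <- 0
    while True:                            # repeat { ... } until (NEXT == 0)
        total += memory[next_ptr + 1]      # SUM <- SUM + Memory[NEXT+1]
        next_ptr = memory[next_ptr]        # NEXT <- Memory[NEXT]
        if next_ptr == 0:
            break
    return total                           # R <- SUM, DONE <- 1


# Hypothetical memory image: node at 0 (value 10) -> node at 4 (20) -> node at 2 (30) -> end.
example_memory = [4, 10, 0, 30, 2, 20]
result = sum_list(example_memory)
```

In the model the two transfers in the loop body happen one after the other; how many cycles and memory ports they need in hardware is exactly the kind of trade-off the architecture variations explore.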
Approaching an ISA • Instruction Set Architecture • Defines set of operations, instruction format, hardware supported data types, named storage, addressing modes, sequencing • Meaning of each instruction is described by RTL on architected registers and memory • Given technology constraints assemble adequate datapath • Architected storage mapped to actual storage • Function units to do all the required operations • Possible additional storage (eg. MAR, MBR, …) • Interconnect to move information among regs and FUs • Map each instruction to sequence of RTLs • Collate sequences into symbolic controller STD • Lower symbolic STD to control points • Implement controller
Instruction Sequencing • Example – an instruction to add the contents of two registers (Rx and Ry) and place result in a third register (Rz) • Step 1: Fetch the ADD instruction from memory into an instruction register • Step 2: Decode instruction • Instruction in IR has the code of an ADD instruction • Register indices used to generate output enables for registers Rx and Ry • Register index used to generate load signal for register Rz • Step 3: Execute instruction • Enable Rx and Ry output and direct to ALU • Setup ALU to perform ADD operation • Direct result to Rz so that it can be loaded into register
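The three steps above can be sketched as a tiny interpreter for a single ADD instruction. Everything concrete here is an assumption made for illustration: the tuple encoding ("ADD", rz, rx, ry), the list-based register file, and the function name `execute_add` do not come from the lecture.

```python
def execute_add(memory, regs, pc):
    """Fetch, decode, and execute one ADD instruction; returns the next PC."""
    ir = memory[pc]                  # Step 1: fetch into the instruction register
    opcode, rz, rx, ry = ir          # Step 2: decode opcode and register indices
    if opcode != "ADD":
        raise ValueError("this sketch only models the ADD instruction")
    alu_out = regs[rx] + regs[ry]    # Step 3: Rx and Ry drive the ALU, op = ADD
    regs[rz] = alu_out               # load the ALU result into Rz
    return pc + 1


# Hypothetical machine state: R1 = 7, R2 = 5, and one instruction "R3 <- R1 + R2".
regs = [0, 7, 5, 0]
program = [("ADD", 3, 1, 2)]
next_pc = execute_add(program, regs, 0)
```

In hardware the decode step produces output enables and a load signal rather than Python indices, but the data movement is the same.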
Instruction Execution • Control State Diagram (for each diagram) • Reset • Fetch instruction • Decode • Execute • Instructions partitioned into three classes • Branch • Load/store • Register-to-register • Different sequence through diagram for each instruction type • Controller manipulates the data path to perform the instruction [Figure: state diagram with states Reset, Init (Initialize Machine), Fetch Instr., XEQ Instr. branching to Load/Store, Register-to-Register, and Branch (Branch Taken, Branch Not Taken), then Incr. PC]
Networking Layers [Figure: Application layer: logical communication between send @sdata dest and rcv @rdata [src]; System layer: actual transfer of data wrapped in header and trailer; Hardware layer: actual transfer with its own header and trailer; Analog Transmitter to Analog Receiver over time]