EECS 150 - Components and Design Techniques for Digital Systems Lec 26 - WrapUp

David Culler
Electrical Engineering and Computer Sciences
University of California, Berkeley
http://www.eecs.berkeley.edu/~culler
http://inst.eecs.berkeley.edu/~cs150
http://www.youtube.com/watch?v=Tb2Q1GGEYA4

Announcements
• Final Exam
  • TUESDAY, DECEMBER 18, 2007 5-8P
  • Location: 106 STANLEY
  • Course Control Number: 26455
  • Final Exam Group: 15
• TA office hours tues AM
• Review Sunday 12/16 5-7 @ 125 Cory
• Project Partner forms into HW box
• Project Presentations Friday as per SignUp
• No lecture thurs, no labs, no discussion
• Office Hours
• HW 10 in box wed
EECS 150, Fa07, Lec 26 - wrap

Recall: Day 1

[Figure: project system diagram: Display, Camera (optional), i50choose, i50talk, notifications, Wireless Network, Internals, Audio, Hand input (limited)]
Congratulations
• You have accomplished a phenomenal task.

Day 1: What is EECS150 about?
[Figure: abstraction stack: Pgm Language, Asm / Machine Lang, Instruction Set Arch, Machine Organization, HDL, FlipFlops, Gates, Circuits, Devices, Transistor Physics, Transfer Function; CS 61C and EE 40 mark the neighboring courses]
• Deep Digital Design Experience
• Fundamentals of Boolean Logic
• Synchronous Circuits
• Finite State Machines
• Timing & Clocking
• Device Technology & Implications
• Controller Design
• Arithmetic Units
• Bus Design
• Encoding, Framing
• Testing, Debugging
• Hardware Architecture
• HDL, Design Flow (CAD)

Day 1: We Will Learn in EECS 150 …
• Language of logic design
  • Logic optimization, state, timing, CAD tools
• Concept of state in digital systems
  • Analogous to variables and program counters in software systems
• Hardware system building
  • Datapath + control = digital systems
• Hardware system design methodology
  • Hardware description languages: Verilog
  • Tools to simulate design behavior: output = function (inputs)
  • Logic compilers synthesize hardware blocks of our designs
  • Mapping onto programmable hardware (code generation)
• Contrast with software design
  • Both map specifications to physical devices
  • Both must be flawless …

Day 26: Ready to tackle ANY digital design

Tackling complex digital designs
• Step 1: Decompose the system into a collection of subsystems
  • Each has top-down requirements and bottom-up constraints
  • Interconnected through interfaces
  • Often with particular protocols
  • Potentially different clock domains
  • Rate matching, buffering, timing
• For example…

For Example
[Figure: project subsystems and their clock domains: ADV7194 clock domain (Video encoder, Display), CC2420 clock domain (Network), AC97 clock domain (Audio), plus Camera (optional) and Hand input (limited)]
• Encodings
• Protocols
• Synchronization
• Commands
• Formats
• Specifications
• Datasheets

Traversing Digital Design
[Figure: path through the design stack, annotated EE 40, EECS150 wks 1-6, EECS150 wks 6-15, and CS61C]

In Each: Datapath and Control
[Figure: Controller drives Control Points into the Datapath; the Datapath returns signals to the Controller]
• Datapath: storage, functional units (FU), and interconnect sufficient to perform the desired functions
  • Inputs are control points
  • Outputs are signals
• Controller: state machine to orchestrate operation on the datapath
  • Based on desired function and signals

Tackling complex digital designs
• Step 1: Decompose the system into a collection of subsystems
  • Each has top-down requirements and bottom-up constraints
  • Interconnected through interfaces
  • Often with particular protocols
  • Potentially different clock domains
  • Rate matching, buffering, timing
• For Each Subsystem
  • Step 2: Design the Datapath

What makes Digital Systems tick?
[Figure: state elements separated by Combinational Logic, driven by clk, with a timing diagram over time]

Register Transfer Level Descriptions
A standard high-level representation for describing systems. It follows from the fact that every synchronous digital system can be described as a set of state elements connected by combinational logic (CL) blocks.
RTL comprises a set of register transfers with optional operators as part of the transfer. Example:
  regA ← regB
  regC ← regA + regB
  if (start==1) regA ← regC
Personal style: use ";" to separate transfers that occur on separate cycles. Use "," to separate transfers that occur on the same cycle. Example (2 cycles):
  regA ← regB, regB ← 0;
  regC ← regA;
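A minimal Python sketch of the ","/";" convention, assuming registers are modeled as a dict and each transfer as a (destination, function-of-registers) pair (all names here are hypothetical). Every right-hand side in one cycle reads the values held before the clock edge, which is what makes `regA ← regB, regB ← 0` well defined.

```python
def clock_cycle(regs, transfers):
    """Apply one ','-separated group of transfers on a single clock edge.

    All right-hand sides are evaluated from the pre-edge values, then every
    destination loads at once, as edge-triggered registers do.
    """
    next_vals = {dst: expr(regs) for dst, expr in transfers}
    updated = dict(regs)
    updated.update(next_vals)
    return updated


def run_transfers(regs, cycles):
    """Apply a ';'-separated sequence of cycles in order."""
    for transfers in cycles:
        regs = clock_cycle(regs, transfers)
    return regs


# regA <- regB, regB <- 0;  regC <- regA;
program = [
    [("regA", lambda r: r["regB"]), ("regB", lambda r: 0)],
    [("regC", lambda r: r["regA"])],
]
final = run_transfers({"regA": 1, "regB": 7, "regC": 0}, program)
```

Because the group is evaluated simultaneously, a same-cycle swap `a ← b, b ← a` works without a temporary register.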

A Register Transfer
[Figure: registers A and B drive a 2:1 mux (Sel) onto a Bus; the Bus feeds register C (load enable Ld); all clocked by Clk; timing diagram shows A on Bus, B on Bus, and "Ld C from Bus ?"]
• C ← A: Sel ← 0; Ld ← 1
• C ← B: Sel ← 1; Ld ← 1
• One of potentially many source regs goes on the bus to one or more destination regs
• Register transfer on the clock

Register Transfers - interconnect
[Figure: three interconnect styles among registers rs, rt, rd, and R4: a MUX on each register input, a common MUX, and a shared BUS]
• Point-to-point connection
  • Dedicated wires
  • Muxes on inputs of each register
• Common input from multiplexer
  • Load enables for each register
  • Control signals for multiplexer
• Common bus with output enables
  • Output enables and load enables for each register

Data Path (Bit-slice)
[Figure: a 1-bit-wide slice (registers rs, rt, rd, R0, accumulator AC, ALU with carry in CI and carry out CO, input from memory) and a 2-bit-wide datapath built from two slices]
• Bit-slice concept: iterate to build n-bit wide datapaths
• Data bit busses run through the slice

Approaching an ISA
• Instruction Set Architecture
  • Defines set of operations, instruction format, hardware-supported data types, named storage, addressing modes, sequencing
• Meaning of each instruction is described by RTL on architected registers and memory
• Given technology constraints, assemble an adequate datapath
  • Architected storage mapped to actual storage
  • Function units to do all the required operations
  • Possible additional storage (e.g., MAR, MBR, …)
  • Interconnect to move information among regs and FUs
• Map each instruction to a sequence of RTLs
• Collate sequences into a symbolic controller state transition diagram (STD)
• Lower symbolic STD to control points
• Implement controller

Instruction Types
• Data Manipulation
  • Add, subtract
  • Increment, decrement
  • Multiply
  • Shift, rotate
  • Immediate operands
• Data Staging
  • Load/store data to/from memory
  • Register-to-register move
• Control
  • Conditional/unconditional branches in program flow
  • Subroutine call and return

Hardware Necessary To Implement Instructions
• Standard FSM elements
  • State register
  • Next-state logic
  • Output logic (datapath/control signaling)
  • Moore or synchronous Mealy machine to avoid loops unbroken by FF
• Plus additional "control" registers (in DP)
  • Instruction register (IR)
  • Program counter (PC)
• Inputs/Outputs
  • Outputs control elements of data path
  • Inputs from data path used to alter flow of program (test if zero)

FSM Controller for CPU
• Putting it all together and closing the loop: the famous instruction fetch-decode-execute cycle
[Figure: state diagram: reset → Fetch (instruction fetch) → Decode (instruction decode) → add (instruction execution) → back to Fetch]
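The loop can be sketched as a cycle-by-cycle Moore controller. This assumes a hypothetical toy ISA with a single `add` instruction stored as `(opcode, operand)` tuples and an accumulator `acc`; each state performs one RTL step, and the execute state closes the loop back to fetch.

```python
from enum import Enum


class State(Enum):
    FETCH = 0
    DECODE = 1
    EXEC_ADD = 2


def run_controller(program, cycles):
    """Simulate `cycles` clock cycles; returns (acc, pc, list of visited states)."""
    state, pc, ir, acc = State.FETCH, 0, None, 0   # reset state
    trace = []
    for _ in range(cycles):
        trace.append(state)
        if state is State.FETCH:
            ir = program[pc]            # IR <- Mem[PC]
            pc += 1                     # PC <- PC + 1
            state = State.DECODE
        elif state is State.DECODE:
            if ir[0] != "add":          # the only opcode in this toy ISA
                raise ValueError(f"unsupported opcode {ir[0]!r}")
            state = State.EXEC_ADD
        else:                           # State.EXEC_ADD
            acc += ir[1]                # ACC <- ACC + operand
            state = State.FETCH         # close the loop
    return acc, pc, trace
```

For example, `run_controller([("add", 5), ("add", 7)], 6)` walks Fetch, Decode, add twice and leaves the accumulator at 12.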

Representing Numbers
• What can be represented in N bits?
  • $2^N$ distinct symbols => values
  • Unsigned: $0$ to $2^N - 1$
  • 2s Complement: $-2^{N-1}$ to $2^{N-1} - 1$
  • ASCII: $-10^{N/8-2} - 1$ to $10^{N/8-1} - 1$
• But, what about?
  • Very large numbers? (seconds/century) $3{,}155{,}760{,}000_{10}$ ($3.15576_{10} \times 10^{9}$)
  • Very small numbers? (secs/nanosecond) $0.000000001_{10}$ ($1.0_{10} \times 10^{-9}$)
  • Bohr radius $0.000000000052917710$ m ($5.2917710 \times 10^{-11}$ m)
  • Rationals $2/3$ ($0.666666666\ldots$)
  • Irrationals $2^{1/2}$ ($1.414213562373\ldots$)
  • Transcendentals $e$ ($2.718\ldots$), $\pi$ ($3.141\ldots$)

2s Complement Overflow
• How can you tell an overflow occurred?
• Add two positive numbers to get a negative number, or two negative numbers to get a positive number
[Figure: two 4-bit 2s complement number wheels, 0000 (+0) through 0111 (+7) and 1000 (-8) through 1111 (-1)]
• $-7 - 2 = +7$!
• $5 + 3 = -8$!

Computer Arithmetic
• Circuit design for unsigned addition
  • Full adder per bit slice
  • Delay limited by carry propagation
    • Ripple is algorithmically slow, but wires are short
  • Carry select
    • Simple, resource-intensive
    • Excellent layout
  • Carry look-ahead
    • Excellent asymptotic behavior
    • Great at the board level, but wire length effects are significant on chip
• Digital number systems
  • How to represent negative numbers
  • Simple operations
  • Clean algorithmic properties
• 2s complement is most widely used
  • Uses the same circuit as unsigned arithmetic
  • Subtract by complement and carry in
  • Overflow when cin xor cout of the sign bit is 1
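A small Python model of a ripple-carry adder, one full adder per bit slice, with the overflow rule from the slide (carry into the sign-bit slice XOR carry out of it). It assumes `n >= 1`; the function names are illustrative.

```python
def full_adder(a, b, cin):
    """One bit slice: returns (sum bit, carry out)."""
    s = a ^ b ^ cin
    cout = (a & b) | (a & cin) | (b & cin)
    return s, cout


def ripple_add(x, y, n, cin=0):
    """n-bit ripple-carry adder; returns (n-bit sum, carry out, 2s complement overflow)."""
    total = 0
    carry = cin
    for i in range(n):
        carry_into_slice = carry          # remembered so the sign slice's cin is available
        s, carry = full_adder((x >> i) & 1, (y >> i) & 1, carry)
        total |= s << i
    # After the loop, carry_into_slice and carry belong to the sign-bit slice (i = n - 1)
    overflow = carry_into_slice ^ carry
    return total, carry, overflow
```

With 4 bits, `ripple_add(0b0101, 0b0011, 4)` gives sum `0b1000` (-8) with overflow set, and -7 + (-2), that is `ripple_add(0b1001, 0b1110, 4)`, gives `0b0111` (+7), also with overflow set, matching the two examples on the number wheel.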

2s Complement Adder/Subtractor
• $A - B = A + (-B) = A + \overline{B} + 1$
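A sketch of the adder/subtractor identity in Python, assuming an n-bit datapath modeled with masked integers: when `sub` is 1, XOR gates invert B and the same `sub` signal is used as the adder's carry-in, so one adder does both operations.

```python
def add_sub(a, b, sub, n):
    """n-bit adder/subtractor; returns (n-bit result, carry out)."""
    mask = (1 << n) - 1
    b_in = (b ^ (mask if sub else 0)) & mask   # XOR gates: pass B or ~B
    raw = (a & mask) + b_in + sub              # sub doubles as the carry-in
    return raw & mask, raw >> n


def to_signed(v, n):
    """Interpret an n-bit pattern as a 2s complement value."""
    return v - (1 << n) if v >> (n - 1) else v
```

For unsigned subtraction, a carry out of 1 means no borrow occurred: `add_sub(6, 2, 1, 4)` returns `(4, 1)`, while `add_sub(2, 6, 1, 4)` returns `(12, 0)`, and `to_signed(12, 4)` is -4.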

Combinational Multiplier: accumulation of partial products
[Figure: 4 × 4 multiplication of A3..A0 by B3..B0; each partial-product bit Ai Bj is aligned in column i+j, and the columns are summed to produce S7..S0]

Another Representation
[Figure: building block: full adder (+) and AND gate; a 4 × 4 array of building blocks followed by a carry-propagate adder (CPA)]
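The partial-product view can be checked in Python. This sketch forms each AND-gate bit A_i B_j and weights it by 2^(i+j), as in the 4 × 4 array; it uses ordinary integer addition for the row sums instead of modeling each full-adder cell.

```python
def array_multiply(a, b, n):
    """Multiply two n-bit unsigned numbers by summing shifted partial-product rows."""
    product = 0
    for j in range(n):
        bj = (b >> j) & 1
        row = 0
        for i in range(n):
            row |= (((a >> i) & 1) & bj) << i   # AND gate: partial-product bit A_i B_j
        product += row << j                      # row j is aligned starting at column j
    return product                               # always fits in 2n bits
```

For n = 4 the result fits in the eight bits S7..S0.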

Digital Number Systems
• Positional notation
  • $D_{n-1} D_{n-2} \ldots D_0$ represents $D_{n-1}B^{n-1} + D_{n-2}B^{n-2} + \cdots + D_0 B^0$, where $D_i \in \{0, \ldots, B-1\}$
• 2s Complement
  • $D_{n-1} D_{n-2} \ldots D_0$ represents $-D_{n-1}2^{n-1} + D_{n-2}2^{n-2} + \cdots + D_0 2^0$
  • MSB has negative weight
  • Binary point is effectively at the far right of the word
[Figure: 4-bit 2s complement number wheel, 0000 (+0) through 0111 (+7) and 1000 (-8) through 1111 (-1)]
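Both formulas translate directly into Python. Digit lists are most significant first (D_{n-1} … D_0); the 2s complement version is the positional sum with the MSB's weight negated. The function names are illustrative.

```python
from itertools import product


def positional_value(digits, base):
    """Sum of D_k * base^k, with digits given most significant first."""
    n = len(digits)
    return sum(d * base ** (n - 1 - k) for k, d in enumerate(digits))


def twos_complement_value(bits):
    """Same weights as unsigned binary, except the MSB weighs -2^(n-1)."""
    n = len(bits)
    return -bits[0] * 2 ** (n - 1) + positional_value(bits[1:], 2)


def all_twos_complement_values(n):
    """Values of every n-bit pattern, sorted."""
    return sorted(twos_complement_value(list(bits)) for bits in product((0, 1), repeat=n))
```

Enumerating all 4-bit patterns reproduces the wheel's range, -8 through +7; `1111` is -1 and `1000` is -8.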

Circuits for Fixed-Point Arithmetic
• Adders
  • Identical circuit
  • Position of the binary point is entirely in the interpretation
  • Be sure the interpretations match, i.e., the binary points line up
• Subtractors
• Multipliers
  • Position of the binary point just as you learned by hand
  • Multiplying two n-bit numbers yields a 2n-bit result, with the binary point determined by the binary points of the inputs
  • $2^{-k} \cdot 2^{-m} = 2^{-k-m}$
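A short Python sketch of these rules, assuming a fixed-point value is stored as an integer `x` plus a count `k` of fraction bits, so it represents $x \cdot 2^{-k}$ (the helper names are hypothetical). Multiplication just adds the fraction-bit counts. Addition uses the same integer adder after shifting one operand so the binary points line up.

```python
def fixed_mul(x, kx, y, ky):
    """(x * 2^-kx) * (y * 2^-ky) = (x * y) * 2^-(kx + ky)."""
    return x * y, kx + ky


def fixed_add(x, kx, y, ky):
    """Align binary points, then use an ordinary integer add."""
    k = max(kx, ky)
    return (x << (k - kx)) + (y << (k - ky)), k


def to_float(v, k):
    return v / 2 ** k
```

For example, 1.5 is `(3, 1)` and 2.25 is `(9, 2)`. Their product is `(27, 3)`, which is 3.375, and their sum is `(15, 2)`, which is 3.75.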

Let's build an FP function unit: mult
[Figure: inputs Sa, Ea (8 bits), 1.Ma (24 bits) and Sb, Eb (8 bits), 1.Mb (24 bits); outputs Sr, Er (8 bits), 1.Mr (24 bits); Ctrl?]

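[Figure: FP multiplier datapath: Adder(8) on Ea and Eb with a -127 bias correction; Multiplier(24) on 1.Ma and 1.Mb producing a 48-bit product; shifter and incrementer (inc) driven by an Unnorm? check; Round down to the 24-bit 1.Mr; outputs Sr, Er, 1.Mr; Ctrl?]
• What is the range of mantissas?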
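A behavioral sketch of this datapath in Python, operating on IEEE 754 single-precision fields obtained with the standard `struct` module. Because both significands lie in [1, 2), their product lies in [1, 4), so at most a one-place right shift plus an exponent increment is needed to renormalize. The sketch assumes normal (nonzero, non-denormal) inputs, ignores exponent overflow and underflow, and truncates where a real unit would round.

```python
import struct

BIAS = 127  # single-precision exponent bias


def f32_fields(x):
    """Split a value (rounded to float32) into sign, biased exponent, 23-bit fraction."""
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    return bits >> 31, (bits >> 23) & 0xFF, bits & 0x7FFFFF


def f32_from_fields(s, e, f):
    bits = (s << 31) | (e << 23) | f
    return struct.unpack(">f", struct.pack(">I", bits))[0]


def fp_mult(a, b):
    """Sketch of an FP32 multiply: XOR signs, add exponents, multiply 24-bit significands."""
    sa, ea, fa = f32_fields(a)
    sb, eb, fb = f32_fields(b)
    sr = sa ^ sb                      # result sign
    er = ea + eb - BIAS               # 8-bit adder; subtract one copy of the bias
    ma = (1 << 23) | fa               # restore the hidden 1: 1.Ma as a 24-bit integer
    mb = (1 << 23) | fb
    prod = ma * mb                    # 48-bit product, value in [1, 4) scaled by 2^46
    if prod >> 47:                    # product >= 2: shift right once, bump exponent
        prod >>= 1
        er += 1
    fr = (prod >> 23) & 0x7FFFFF      # keep 23 fraction bits below the leading 1 (truncate)
    if not 0 < er < 255:
        raise OverflowError("exponent out of normal range (not handled in this sketch)")
    return f32_from_fields(sr, er, fr)
```

For example, 1.5 × 1.5 = 2.25 exercises the renormalizing shift, while 1.5 × 2.0 = 3.0 does not.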

Cascaded Carry Lookahead
• 4-bit adders with internal carry lookahead
• Second-level carry lookahead unit extends lookahead to 16 bits
• One more level to 64 bits
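The two-level structure can be sketched in Python. Each 4-bit block computes bit generate $g_i = a_i b_i$ and propagate $p_i = a_i \oplus b_i$ (XOR is chosen here so $p_i$ also forms the sum bit). The same flat sum-of-products `lookahead` function then serves as both the in-block carry logic and the second-level unit working on block (G, P) pairs. Function names are illustrative.

```python
def lookahead(g, p, c0):
    """Carries c1..c4 plus group (G, P) from four (g, p) pairs, each a flat sum of products."""
    c1 = g[0] | (p[0] & c0)
    c2 = g[1] | (p[1] & g[0]) | (p[1] & p[0] & c0)
    c3 = g[2] | (p[2] & g[1]) | (p[2] & p[1] & g[0]) | (p[2] & p[1] & p[0] & c0)
    G = g[3] | (p[3] & g[2]) | (p[3] & p[2] & g[1]) | (p[3] & p[2] & p[1] & g[0])
    P = p[3] & p[2] & p[1] & p[0]
    c4 = G | (P & c0)
    return [c1, c2, c3, c4], G, P


def cla4(a, b, c0):
    """4-bit adder with internal lookahead: returns (4-bit sum, group G, group P)."""
    g = [(a >> i) & (b >> i) & 1 for i in range(4)]
    p = [((a >> i) ^ (b >> i)) & 1 for i in range(4)]
    carries, G, P = lookahead(g, p, c0)
    c = [c0] + carries[:3]                       # carry into each bit
    s = sum((p[i] ^ c[i]) << i for i in range(4))
    return s, G, P


def cla16(a, b, c0=0):
    """Four cla4 blocks plus a second-level lookahead unit; returns (16-bit sum, carry out)."""
    blocks_a = [(a >> 4 * k) & 0xF for k in range(4)]
    blocks_b = [(b >> 4 * k) & 0xF for k in range(4)]
    # Block G and P do not depend on the block's carry-in
    gp = [cla4(x, y, 0)[1:] for x, y in zip(blocks_a, blocks_b)]
    block_carries, _, _ = lookahead([t[0] for t in gp], [t[1] for t in gp], c0)
    cin = [c0] + block_carries[:3]
    total = 0
    for k in range(4):
        s, _, _ = cla4(blocks_a[k], blocks_b[k], cin[k])
        total |= s << (4 * k)
    return total, block_carries[3]
```

Adding a third level on top of four 16-bit groups would extend the same pattern to 64 bits.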

Parallel Prefix (generalizing CLA)
• Compute all the prefixes: for every $i$, $F_i \text{ op } F_{i-1} \text{ op } \cdots \text{ op } F_0$
• Assume associative and commutative
[Figure: prefix tree over inputs 0-7: pairs 10, 32, 54, 76 combine into ranges such as 30 and 74, then 70, with further combinations producing every prefix 00 through 70]
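A Python sketch of a log-depth prefix computation (the Kogge-Stone / Hillis-Steele pattern), applied to the carry operator on (generate, propagate) pairs. This code only relies on associativity: `op(hi, lo)` always takes the higher-indexed span first, which matters because the carry operator is associative but not commutative. Treating the carry-in as an extra lowest position is a modeling choice made here.

```python
def prefix_scan(items, op):
    """out[i] = items[i] op items[i-1] op ... op items[0], in ceil(log2 n) rounds."""
    out = list(items)
    d = 1
    while d < len(out):
        # every position combines with the partial result d places below it, all "in parallel"
        out = [op(out[i], out[i - d]) if i >= d else out[i] for i in range(len(out))]
        d *= 2
    return out


def carry_op(hi, lo):
    """Combine the (generate, propagate) of a higher span with the adjacent lower span."""
    g_hi, p_hi = hi
    g_lo, p_lo = lo
    return g_hi | (p_hi & g_lo), p_hi & p_lo


def prefix_add(a, b, n, cin=0):
    """n-bit adder whose carries all come from one parallel-prefix computation."""
    # Position 0 stands for the carry-in: it generates cin and never propagates
    gp = [(cin, 0)] + [((a >> i) & (b >> i) & 1, ((a >> i) ^ (b >> i)) & 1) for i in range(n)]
    carries = [g for g, _ in prefix_scan(gp, carry_op)]   # carries[i] = carry into bit i
    s = sum(((((a >> i) ^ (b >> i)) & 1) ^ carries[i]) << i for i in range(n))
    return s, carries[n]
```

With `op` as ordinary addition, `prefix_scan([1, 2, 3, 4], lambda hi, lo: hi + lo)` gives the running sums `[1, 3, 6, 10]`.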

Basic Memory Subsystem Block Diagram
[Figure: n address bits drive an address decoder that selects one of $2^n$ word lines; each memory cell sits at the crossing of a word line and one of m bit lines]
• What happens if n and/or m is very large?
• RAM/ROM naming convention:
  • 32 X 8, "32 by 8" => 32 8-bit words
  • 1M X 1, "1 meg by 1" => 1M 1-bit words

Typical SRAM Timing
[Figure: $2^N$ words × M bit SRAM with address A (N bits), data D (M bits), WE_L, and OE_L. Write timing: Data In on D with a Write Address on A, showing write setup time and write hold time around WE_L. Read timing: Read Address on A, with Data Out valid on D after the read access time and D at High Z or Junk otherwise]
• OE determines direction: Hi = Write, Lo = Read
• Writes are dangerous! Be careful!
• Double signaling: OE Hi, WE Lo

DRAM WRITE Timing
• Every DRAM access begins at the assertion of RAS_L
• 2 ways to write: early or late v. CAS
  • Early Wr Cycle: WE_L asserted before CAS_L
  • Late Wr Cycle: WE_L asserted after CAS_L
[Figure: 256K x 8 DRAM with 9 address pins A and 8 data pins D, controlled by RAS_L, CAS_L, WE_L, OE_L; timing shows Row Address then Col Address on A, Data In on D, the WR access time, and the DRAM WR cycle time]
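A 256K x 8 part has $2^{18}$ addresses but only 9 address pins, so the address is sent in two halves: the row with RAS_L, then the column with CAS_L. A minimal Python sketch, assuming the row comes from the upper 9 address bits (this split is an assumption, not taken from a datasheet):

```python
ROW_BITS = 9   # assumption: upper 9 address bits form the row
COL_BITS = 9   # lower 9 bits form the column; 2^18 = 256K addresses in total


def split_address(addr):
    """Return (row, col) as driven onto the shared pins: row with RAS_L, then col with CAS_L."""
    if not 0 <= addr < 1 << (ROW_BITS + COL_BITS):
        raise ValueError("address out of range for a 256K-word part")
    row = addr >> COL_BITS
    col = addr & ((1 << COL_BITS) - 1)
    return row, col


def join_address(row, col):
    """Inverse of split_address."""
    return (row << COL_BITS) | col
```

Multiplexing halves the pin count at the cost of two address phases per access.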

DRAM with Column buffer
[Figure: 2,048 x 2,048 memory array; 11 address bits A0…A10 drive the row decoder, which selects a word line; the storage cells feed sense amps, column latches, and a MUX]
• Pull column into fast buffer storage
• Access sequence of bits from there

Hamming Error Correcting Code
• Use more parity bits to pinpoint bit(s) in error, so they can be corrected.
• Example: Single error correction (SEC) on 4-bit data
  • Use 3 parity bits; with the 4 data bits this results in a 7-bit code word
  • 3 parity bits are sufficient to identify any one of the 7 code word bits
  • Overlap the assignment of parity bits so that a single error in the 7-bit word can be corrected
• Procedure: group parity bits so they correspond to subsets of the 7 bits:
  • p1 protects bits 1, 3, 5, 7 (bit 1 of the position number is on: 001, 011, 101, 111)
  • p2 protects bits 2, 3, 6, 7 (bit 2 is on: 010, 011, 110, 111)
  • p3 protects bits 4, 5, 6, 7 (bit 3 is on: 100, 101, 110, 111)
• Code word by bit position: 1 = p1, 2 = p2, 3 = d1, 4 = p3, 5 = d2, 6 = d3, 7 = d4
• Note: number bits from left to right.
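A Python sketch of this (7,4) code, assuming even parity. The check works because XOR-ing together the position numbers of all 1 bits performs all three parity checks at once. The result (the syndrome) is 0 for a valid code word and equals the position of a single flipped bit.

```python
from functools import reduce


def encode(d1, d2, d3, d4):
    """Return the 7-bit code word p1 p2 d1 p3 d2 d3 d4 (positions 1..7, left to right)."""
    p1 = d1 ^ d2 ^ d4          # covers positions 3, 5, 7
    p2 = d1 ^ d3 ^ d4          # covers positions 3, 6, 7
    p3 = d2 ^ d3 ^ d4          # covers positions 5, 6, 7
    return [p1, p2, d1, p3, d2, d3, d4]


def syndrome(word):
    """XOR of the position numbers (1..7) of every bit that is 1."""
    return reduce(lambda acc, pos: acc ^ pos, (pos for pos, bit in enumerate(word, 1) if bit), 0)


def correct(word):
    """Flip the bit named by the syndrome (if any); returns (fixed word, syndrome)."""
    s = syndrome(word)
    fixed = list(word)
    if s:
        fixed[s - 1] ^= 1
    return fixed, s


def data_bits(word):
    """Extract d1..d4 from positions 3, 5, 6, 7."""
    return [word[2], word[4], word[5], word[6]]
```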

Example: 8 bit SEC
• Takes four parity bits
  • In power-of-2 positions
  • Rest are the data bits
• Bits with i in their address feed into parity calculation for p_i
• What to do with bit 0?
[Figure: 12-bit code word, positions 1-12, with parity bits p1, p2, p3, p4 in positions 1, 2, 4, 8 and data bits d1-d8 in the remaining positions]
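The same construction works for any data width. This Python sketch (even parity assumed) places parity bits at power-of-2 positions and data bits everywhere else. It chooses the smallest r with $2^r \ge k + r + 1$, which gives 4 parity bits and a 12-bit word for 8 data bits. It corrects at most one flipped bit; multiple errors are not handled.

```python
def hamming_encode(data):
    """Single-error-correcting Hamming code word for a list of data bits."""
    r = 0
    while (1 << r) < len(data) + r + 1:   # enough parity bits to name every position
        r += 1
    n = len(data) + r
    word = [0] * (n + 1)                  # index 0 unused; positions 1..n, left to right
    bits = iter(data)
    for pos in range(1, n + 1):
        if pos & (pos - 1):               # not a power of two -> data bit
            word[pos] = next(bits)
    for j in range(r):
        p = 1 << j                        # parity bit p_(j+1) lives at position 2^j
        for pos in range(1, n + 1):
            if pos & p and pos != p:      # positions with this bit set in their address
                word[p] ^= word[pos]
    return word[1:]


def hamming_decode(word):
    """Correct at most one flipped bit and return the data bits."""
    syndrome = 0
    for pos, bit in enumerate(word, start=1):
        if bit:
            syndrome ^= pos               # XOR of the addresses of all 1 bits
    fixed = list(word)
    if syndrome:
        fixed[syndrome - 1] ^= 1          # the syndrome names the bad position
    return [b for pos, b in enumerate(fixed, start=1) if pos & (pos - 1)]
```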

Example: Ethernet CRC-32
[Figure: protocol stack: 7 Application (HTTP, FTP, DNS); 4 Transport (TCP, UDP); 3 Network (IP); 2 Data Link (Ethernet, 802.11b); 1 Physical]
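A bit-serial CRC-32 in Python, written the way a shift-register implementation processes one bit per step. It uses the bit-reflected form of the CRC-32 polynomial 0x04C11DB7, presets the register to all ones, and complements the result. This is the same computation as Python's `zlib.crc32`. The bit order in which an Ethernet MAC transmits the frame check sequence is a separate matter and is not modeled here.

```python
import zlib

POLY_REFLECTED = 0xEDB88320   # bit-reversed form of the CRC-32 polynomial 0x04C11DB7


def crc32_bitwise(data: bytes) -> int:
    crc = 0xFFFFFFFF                      # preset the shift register to all ones
    for byte in data:
        crc ^= byte                       # bits enter least significant first
        for _ in range(8):
            if crc & 1:
                crc = (crc >> 1) ^ POLY_REFLECTED
            else:
                crc >>= 1
    return crc ^ 0xFFFFFFFF               # final complement


assert crc32_bitwise(b"hello") == zlib.crc32(b"hello")   # cross-check against zlib
```

The standard check value, the CRC-32 of the ASCII string "123456789", is 0xCBF43926.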

General Model of Synchronous Circuit
• In general, for correct operation: $T \ge t_{clk \to Q} + t_{CL} + t_{setup}$ for all paths.
• How do we enumerate all paths?
  • Any circuit input or register output to any register input or circuit output.
• "setup time" for circuit outputs depends on what it connects to
• "clk-Q time" for circuit inputs depends on from where it comes.
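Checking every path can be done in a single pass over the combinational logic in topological order. This Python sketch uses the standard-library `graphlib.TopologicalSorter` (Python 3.9+). It assumes every source (register output or circuit input) launches at $t_{clk \to Q}$ and every endpoint needs $t_{setup}$. The netlist and delays in the example are hypothetical, in ns.

```python
from graphlib import TopologicalSorter


def min_clock_period(preds, delay, sources, t_clk_q, t_setup):
    """preds: gate -> set of its input nodes; delay: gate -> propagation delay.

    Returns t_clk_q + (longest combinational path) + t_setup.
    """
    arrival = {}
    for node in TopologicalSorter(preds).static_order():
        if node in sources:
            arrival[node] = t_clk_q                     # launched by a clock edge
        else:
            arrival[node] = max(arrival[p] for p in preds[node]) + delay[node]
    return max(arrival.values()) + t_setup


# Hypothetical netlist: a, b are register outputs; g1..g3 feed register inputs
preds = {"g1": {"a", "b"}, "g2": {"g1", "b"}, "g3": {"a"}}
delay = {"g1": 2, "g2": 3, "g3": 1}
T_min = min_clock_period(preds, delay, {"a", "b"}, t_clk_q=1, t_setup=2)
```

Here the critical path is a → g1 → g2, so $T_{min} = 1 + (2 + 3) + 2 = 8$.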

Gate Switching Behavior
[Figure: transistor-level Inverter and NAND gate, with source (s), gate (g), and drain (d) terminals labeled]
• When does it start?
• How quickly does it switch?

Xilinx Virtex-E Floorplan
• Configurable Logic Blocks
  • 4-input function gens
  • buffers
  • flipflop
• Input/Output Blocks
  • combinational, latch, and flipflop output
  • sampled inputs
• Block RAM
  • 4096 bits each
  • every 12 CLB columns

Limitations on Clock Rate
• Logic gate delay: what are typical delay values?
• Delays in flip-flops
• Both times contribute to limiting the clock period.
• What must happen in one clock cycle for correct operation?
  • Assuming perfect clock distribution (all flip-flops see the clock at the same time):
  • All signals must be ready and "setup" before rising edge of clock.

Timing Methodologies
• Rules for interconnecting components and clocks
  • Guarantee proper operation of system when strictly followed
• Approach depends on building blocks used for memory elements
  • Focus on systems with edge-triggered flip-flops
    • Found in programmable logic devices
  • Many custom integrated circuits focus on level-sensitive latches
• Basic rules for correct timing:
  • (1) Correct inputs, with respect to time, are provided to the flip-flops
  • (2) No flip-flop changes state more than once per clocking event

Master-Slave Structure
• Construct D flip-flop from two D latches
[Figure: two D latches in series, clocked on opposite phases clk and clk']

Master-Slave Structure
[Figure: master stage and slave stage, each an R-S latch (master inputs S, R with outputs P, P'; slave outputs Q, Q'), clocked by CLK and CLK']
• Break flow by alternating clocks (like an air-lock)
  • Use positive clock to latch inputs into one R-S latch
  • Use negative clock to change outputs with another R-S latch
• View pair as one basic unit
  • master-slave flip-flop
  • twice as much logic
  • output changes a few gate delays after the falling edge of clock but does not affect any cascaded flip-flops
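A behavioral Python sketch of the air-lock idea, using D latches rather than R-S latches for brevity. The master is transparent while the clock is 1 and the slave while it is 0, so the output only takes a new value just after a falling edge. Each sample evaluates the master before the slave, which is a simplification of real gate timing.

```python
class DLatch:
    """Level-sensitive latch: transparent while enable is 1, holds otherwise."""

    def __init__(self):
        self.q = 0

    def step(self, d, enable):
        if enable:
            self.q = d
        return self.q


def master_slave(d_seq, clk_seq):
    """Sample-by-sample output of a master-slave pair (master on clk=1, slave on clk=0)."""
    master, slave = DLatch(), DLatch()
    out = []
    for d, clk in zip(d_seq, clk_seq):
        m = master.step(d, clk)            # master follows D only while clk is high
        out.append(slave.step(m, 1 - clk))  # slave copies the master only while clk is low
    return out
```

In `master_slave([0, 1, 1, 0, 0, 0, 1, 1], [1, 1, 0, 0, 1, 1, 0, 0])`, D changes while the clock is low (samples 3 and 6) have no effect. The output follows the value D held just before each falling edge.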

(neg) Edge-Triggered Flip-Flops
• More efficient solution: only 6 gates
  • sensitive to inputs only near edge of clock signal (not while high)
[Figure: negative edge-triggered D flip-flop (D-FF) gate diagram with Clk=1, internal nodes D and D' with R and S at 0, and outputs Q and Q'; annotations: "holds D' when clock goes low" and "holds D when clock goes low"]
• 4-5 gate delays
• Must respect setup and hold time constraints to successfully capture input
• Characteristic equation: $Q(t+1) = D$

Two-phase non-overlapping clocks
[Figure: latches a and b clocked by clk0 and clk1 with combinational logic (c/l) between them; waveforms of clk0 and clk1]
• Sequential elements partition into two classes
  • phase0 elements feed phase1
  • phase1 elements feed phase0
• Approximate single phase: each register replaced by a pair of latches on two phases
• Can push logic across (retiming)
• Can always slow down the clocks to meet all timing constraints

Tackling complex digital designs
• Step 1: Decompose the system into a collection of subsystems
  • Each has top-down requirements and bottom-up constraints
  • Interconnected through interfaces
  • Often with particular protocols
  • Potentially different clock domains
  • Rate matching, buffering, timing
• For Each Subsystem
  • Step 2: Design the Datapath
  • Step 3: Design the Controller
