Digital Interface Design EECS150 Fall 2008 – Lecture #23 Greg Gibeling Slides adapted from everywhere EECS150 Lecture #23
Motivation • Any useful system includes at least two interfaces: input and output • In a computer: keyboard & screen • In your project: audio & video • The most difficult work in any system is matching incompatible interfaces • Compare CS70 and CS61B • Compare K-maps or adder design and your project • You will be designing interfaces • Either hardware or software • The basic ideas presented here apply fairly widely
Outline • Quick Review: SDRAM and Audio • Principles • Metrics: Bandwidth, Latency, Pin Count & Logic Overhead • Datapath & Control (States & Events) • Synchronization: Clock & Reset • Handshaking (Ready/Valid) • Protocols (structure, syntax, semantics) • Interfaces • Simple Interfaces: SPI, I2C, UART, N64 • Intermediate Interfaces: LCD, Ethernet (10M-10G), Interchip • CPU Interfaces: ISA, PCIe • Design • Back to principles • Reuse & Standardization • Modeling, Verification & Debugging
Quick Review (1 of 4) • So What? • Almost everything needs storage • Lots of space -> DRAM • SDRAM • SDRAM is BIG • Time multiplex address lines • 2 Dimensional Address (Row & Column) • Often Shared • Arbitration for access • Affects performance
Quick Review (2 of 4) • SDRAM (cont) • Steps to Read/Write • Send Row Address (RAS) • Send Column Address (CAS) • Send/Get Data (For 2,4,8 cycles) • Wait (precharge, autorefresh, etc) • Synchronous Interface • Uses a clock & bursts to increase bandwidth • Control requires precise timing • Issue sequences of commands • Timing must be matched to clock frequency
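The read steps above can be sketched as a small Python model. The field widths (9 column bits, 13 row bits, 2 bank bits), the command names, and both function names are assumptions chosen for illustration, not values taken from the lecture or from any particular SDRAM part.

```python
def split_sdram_address(addr, col_bits=9, row_bits=13, bank_bits=2):
    """Split a linear word address into (bank, row, column) fields.

    The widths are illustrative assumptions; a real part's datasheet
    defines the actual geometry.
    """
    col = addr & ((1 << col_bits) - 1)
    row = (addr >> col_bits) & ((1 << row_bits) - 1)
    bank = (addr >> (col_bits + row_bits)) & ((1 << bank_bits) - 1)
    return bank, row, col


def read_command_sequence(addr, burst_len=4):
    """List the steps for one burst read, in the order given on the slide."""
    bank, row, col = split_sdram_address(addr)
    return [
        ("ACTIVE", bank, row),   # send row address (RAS)
        ("READ", bank, col),     # send column address (CAS)
        ("DATA", burst_len),     # data arrives for burst_len cycles
        ("PRECHARGE", bank),     # wait: close the row before the next access
    ]
```

Because row and column share the same address pins, the controller has to issue them in separate cycles, which is why the sequence is a list of timed commands rather than a single request.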
Quick Review (3 of 4) • So what? • Example data stream • Low bandwidth • Includes control • Audio • Primary interfaces are analog • Audio is analog • Mixers, etc… • Bit Serial • Low & fixed bandwidth • Low complexity • Expandable (e.g. 5.1, 7.1)
Quick Review (4 of 4) • Audio (cont) • Driver • Pair of shifters • Simple sync framing • Control • Abstract registers • Highly stateful • VERY low bandwidth
Metrics (1 of 3) • So What? • We need some way to judge good vs bad • Allows us to compare interfaces without guessing • Evaluate tradeoffs and requirements in a formal manner • Objective Metrics • Bandwidth • Latency • Pin Count • State & Logic Overhead • Subjective Metrics • Documentation • Ease of use or debugging • Elegance
Metrics (2 of 3) • Bandwidth • High or Low • Higher is always better, but e.g. humans can only hear so much • Video and audio are the classic examples, but programs need instructions, which means DRAM bandwidth • Fixed or Variable • Raw video or audio have fixed bandwidth, compression (e.g. MP3) can make this vary • Network bandwidth varies because of sharing • Latency • High or Low • Lower is usually better • If there’s no elastic buffer (no way to say “I’m not ready”) • This can cause data loss or require extra buffering, which is costly • Humans are very sensitive to gross latency • Generally reducing latency is VERY HARD without affecting the clock rate • Fixed or variable • Variable latency is generally referred to as “jitter” • E.g. on VoIP phones, audio is fixed latency, network is variable, so we have a problem
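The "fixed bandwidth" of raw audio and video is simple arithmetic, sketched below in Python. The sample formats (48 kHz 16-bit stereo audio, 640x480 24-bit video at 30 frames per second) and the function name are illustrative assumptions, not figures from the lecture.

```python
def raw_bandwidth_bps(samples_per_second, bits_per_sample, channels=1):
    """Bits per second needed by an uncompressed sample stream."""
    return samples_per_second * bits_per_sample * channels


# Assumed example formats, for illustration only.
audio_bps = raw_bandwidth_bps(48_000, 16, channels=2)   # stereo audio
video_bps = raw_bandwidth_bps(640 * 480 * 30, 24)       # pixels per second x bits per pixel
```

With these assumed formats the audio stream needs 1,536,000 bit/s and the video stream 221,184,000 bit/s, which is why audio suits a bit-serial link while video drives DRAM bandwidth requirements.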
Metrics (3 of 3) • Pin Count • Fast becoming a major problem • Chip area grows with N², pins grow with N for DIPs or N² for BGAs • Either way pins are just physically large • They require a lot of area • They are slow and power hungry • Serial vs Parallel • Old: Parallel for high bandwidth • New: Serial for high bandwidth • What changed? • State & Logic Overhead • This is where major cost & complexity come into play • The bigger the circuit the more places to have a bug • Also affects power, yield and price • Interfaces can be very large • For example DDR2 SDRAM on a Virtex2 Pro • The FPGA couldn’t support the clocking/handshaking easily • Required an incredible amount of logic to make up for this • Never very reliable as a result
Datapath & Control (1 of 4) • So What? • Separates the data & control • Allows us to understand the meaning of signals • Separates timing from dataflow • Datapath • Variable information not known until runtime • Regular structure or meaning (e.g. all integers) • Easy to design and debug • Control • Circuits which deal with meaning and timing • Small, irregular and complicated • Difficult to design and debug, even harder to extend
Datapath & Control (2 of 4) • Datapath Signals • Wires which carry a value with temporal significance • Form the backbone of the datapath • May include “control” values • E.g. that this is a value to be written to DRAM • This is common in “data stationary control” • Coding • Common Codes • Binary: easy to understand, easy to work with • One-hot: allows inexpensive decoding • Gray Code: asynchronous logic, one bit change at a time • Other issues: state coding, floating point, etc
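The three common codes can be compared with a short Python sketch. The function names are made up for this example; the conversions themselves are the standard ones (one-hot as a single set bit, reflected binary Gray code as `n ^ (n >> 1)`).

```python
def one_hot(n):
    """One-hot code: exactly one bit set, at position n."""
    return 1 << n


def to_gray(n):
    """Binary to reflected Gray code."""
    return n ^ (n >> 1)


def from_gray(g):
    """Gray code back to binary by XOR-ing all right shifts of g."""
    n = 0
    while g:
        n ^= g
        g >>= 1
    return n


def bits_changed(a, b):
    """Number of bit positions in which two codewords differ."""
    return bin(a ^ b).count("1")
```

Counting from 3 to 4 in binary flips three bits (`011` to `100`), while the Gray codewords for 3 and 4 differ in exactly one bit, which is the property that makes Gray code attractive when a value crosses an asynchronous boundary.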
Datapath & Control (3 of 4) • Control Signals • Wires which carry timing, but little data • Form the backbone of the control logic • Enables, resets, and so forth fall into this category • Event Coding • Edge (neg or pos) • Generally we only use the clock edge in FPGA designs • Latch based designs use edges all the time, of course • Pulse High • Do something when a wire is 1, usually relative to a clock edge • Pulse Change • Do something when a signal is different than on the last cycle • Time • Do something a certain amount of time after a previous event • Measured with a clock in synchronous systems • Possible to build “delay lines” using transistors and gates
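The event codings listed above can be illustrated on a list of per-cycle samples of one wire. This is a minimal software sketch with hypothetical function names; in hardware each detector would be a register holding last cycle's value plus a little logic.

```python
def pulse_high(samples):
    """Cycles in which the wire is 1 (Pulse High)."""
    return [i for i, s in enumerate(samples) if s == 1]


def rising_edges(samples):
    """Cycles in which the wire is 1 and was 0 on the previous cycle."""
    return [i for i in range(1, len(samples))
            if samples[i - 1] == 0 and samples[i] == 1]


def changes(samples):
    """Cycles in which the wire differs from the previous cycle (Pulse Change)."""
    return [i for i in range(1, len(samples))
            if samples[i] != samples[i - 1]]


def delayed(events, cycles):
    """Time coding: act a fixed number of cycles after each earlier event."""
    return [e + cycles for e in events]
```

The same waveform therefore carries a different set of events depending on which coding the receiver assumes, so both ends of an interface must agree on it.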
Synchronization (1 of 4) • Clocking • 1 Clock • Fully synchronous, no need to worry about the issue • May have multiple resets • E.g. hold video in reset until SDRAM is ready • Can get pretty complex (e.g. CPU & JTAG) • 2 Clocks • Clock Crossing, easy to keep straight • Often use Async FIFOs and dual port RAMs on FPGAs • These are expensive in ASICs, use synchronizers • Obviously multiple resets • Local Clocks & LocalResetGen • Often restricted to use in an interface (e.g. interchip) • May not be free-running • Often require careful design to avoid issues
Synchronization (2 of 4) • Reset • 1 Clock, no initialization • Multistage Initialization • Reset for one module depends on state of another • Using the ButtonParser is an example of this • 2 Clocks • Usually reset is synchronous to one clock • May need a shift register to resynchronize reset • Self starting • Useful for generating a reset for the rest of the system • Any device which “just works” on power-up has one • Can be built on FPGA by using a shift register with an initial value • Local Resets & LocalResetGen • Reset logic can affect clocking & reliability • May be requirements like holding reset for some time
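The self-starting reset built from a shift register with an initial value can be modeled cycle by cycle. This is a behavioral Python sketch, not the course's LocalResetGen module; the function name and the default depth of 4 are assumptions for illustration.

```python
def reset_generator(cycles, depth=4):
    """Model a shift register that powers up holding all ones.

    The last stage drives the reset output. Zeros are shifted in from
    the other end, so reset stays asserted for `depth` cycles after
    power-up and then remains low.
    """
    shift = [1] * depth          # initial value loaded at configuration time
    output = []
    for _ in range(cycles):
        output.append(shift[-1])           # reset seen by the rest of the system
        shift = [0] + shift[:-1]           # shift a zero in on every clock edge
    return output
```

Choosing a larger depth is one way to satisfy a requirement such as "hold reset for some time" without any external reset pin.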
Synchronization (3 of 4)
Synchronization (4 of 4)
Handshaking (1 of 4) • So What? • When things happen is vital • Hardware modules must cooperate in order to be useful • Planning out all interaction timings on the drawing board is best, but often hopeless • Handshakes • Pipelined (None) • 2 & 4 Cycle (Self-timed) • Ready/Valid (Synchronous)
Handshaking (2 of 4) • 4 Cycle • RTZ: Return to Zero • Fewer transistors • Easier to debug • 2 Cycle • More transistors • Not really faster • NRTZ: Non-RTZ • Can be synchronous • GasP • RTZ handshaking • Carefully delay matched circuits • No clock!
Handshaking (3 of 4) • Ready/Valid • Independent • Avoid combinational loops • Simplifies generation and checking • Symmetric • Composable • Allows the pass-through • Coregen FIFOs asymmetric • Latency Insensitive • Allows modules to run at their own pace • Trades cost to do this!! • Send/Accept • Same signals, new names! • Why? Read on….
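A ready/valid handshake can be sketched as a cycle-by-cycle Python simulation. The function name and the per-cycle pattern lists are assumptions for this example. The key property modeled is that the producer's valid and the consumer's ready are computed independently, and a word moves only in a cycle where both are 1.

```python
def simulate_ready_valid(items, valid_pattern, ready_pattern):
    """Return (received_items, transfer_cycles) for one producer/consumer pair.

    valid_pattern[i] says whether the producer is willing to offer data in
    cycle i; ready_pattern[i] says whether the consumer can accept in cycle i.
    Neither side looks at the other's signal to compute its own.
    """
    received, transfer_cycles = [], []
    index = 0
    for cycle, (offer, ready) in enumerate(zip(valid_pattern, ready_pattern)):
        valid = bool(offer) and index < len(items)
        if valid and ready:                 # transfer happens only here
            received.append(items[index])
            transfer_cycles.append(cycle)
            index += 1
    return received, transfer_cycles
```

Either side can stall for as long as it likes without losing data, which is what makes the interface latency insensitive; the cost is the extra cycles spent waiting.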
Handshaking (4 of 4) • Composition Failure • Arbiter chooses one of two inputs • Router chooses one of two outputs • Ready0 & Valid1 • Any time two modules are connected by two paths… • Classes • Class1: No dependencies • Class2: Dependencies between ports • Class3: Dependencies within ports
Protocols (1 of 5) • So What? • Knowing the data isn’t enough, we need meaning • Just like language we build representations of meaning • Knowing the patterns of meaning allows us to abstract it • Structure • Parallel: all the bits at once • Counted: there are a fixed number of words, we count them off • Framed: adding a higher level handshake allows variable length • Syntax • How the data fits together • We’ll cover this more in the next few slides • Semantics • What the data means • Highly dependent on the interface in question • Terms: The Band • In Band: the data we’re trying to move • Out of Band: control, metadata and other issues
Protocols (2 of 5) • Dataflow Based • Audio, video, instructions in a CPU • Generally when there’s little (no) OOB data • Usually parallel or counted for simplicity • Benefits • Excellent handling of LTI or independent data values • Simple production and consumption • Little or no state, e.g. a valid bit is all you need • Allows construction of specialized hardware (DSP designs for example) • Drawbacks • Very difficult, if not impossible to deal with exceptions • For playing audio: what if you need data but it’s not there? • When things fail there’s often nothing you can do
Protocols (3 of 5) • Command Based • Useful for low bandwidth peripherals • Organized according to master/slave • E.g. draw a line, write a word to memory • Benefits • Very easy to build new slaves • Clear demarcation of responsibility (Good for CPUs) • Generally very easy to expand, just add new commands • Drawbacks • Tends to be very low performance • Overhead to specify command • No parallelism • Usually requires some polling (interrupts are poll based) • Requires master to know state at all times
Protocols (4 of 5) • Register Based • Stateful peripherals with lots of config • Organized according to master/slave • Often used alongside a dataflow interface • Benefits • Provides a memory-like abstraction • Allows the master to read state easily • Easy to deal with exceptional conditions (error flag) • Drawbacks • Medium performance • Overhead to specify read/write and register address • DMA can help with this • Requires a clear master, often meaning an FSM/CPU
Protocols (5 of 5) • Layering • Uncommon to have one syntax • They are easy to layer • Dataflow on top of command • Each command can be a “write <data>” • Not entirely efficient, but gets the job done • This is how software FIFOs and networks work • Register on top of command • Two commands: read & write • Relatively common, allows command wires to be shared • This is how most memories, especially DRAMs work • Command on top of register • Writing a certain value to a register indicates the command • Perhaps a series of writes to registers • Many CPU peripherals do this
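"Register on top of command" can be sketched in a few lines of Python. Both classes are hypothetical and exist only for this example: the device understands just two commands, read and write, and the wrapper presents them to the master as a memory-like register file.

```python
class CommandDevice:
    """Hypothetical slave that understands only two commands."""

    def __init__(self):
        self._storage = {}

    def execute(self, op, address, data=None):
        if op == "write":
            self._storage[address] = data
            return None
        if op == "read":
            return self._storage.get(address, 0)   # unwritten registers read as 0
        raise ValueError(f"unknown command: {op}")


class RegisterFile:
    """Register syntax layered on the two commands above."""

    def __init__(self, device):
        self._device = device

    def __getitem__(self, address):
        return self._device.execute("read", address)

    def __setitem__(self, address, value):
        self._device.execute("write", address, value)
```

The master now writes `regs[0x10] = 0xAB` and reads `regs[0x10]` without knowing that every access is carried as a command, at the cost of the per-access command overhead noted on the earlier slides.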
Simple Interfaces (1 of 4) • So What? • Uses few wires • No tristates • Synchronous • SPI • Signals: SO, SI, CS, CLK • Uses: CC2420, ADC • Bit Serial • Bidirectional • Often used with register syntax
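The bit-serial, bidirectional nature of SPI comes from two shift registers wired head to tail. The sketch below models one transfer in Python; the function name is made up, and MSB-first ordering is an assumption (it is common, but devices differ).

```python
def spi_exchange(master_byte, slave_byte, bits=8):
    """Shift `bits` bits in both directions at once, MSB first.

    On each clock the master drives its top bit to the slave while the
    slave drives its top bit to the master, and both shift left by one.
    """
    mask = (1 << bits) - 1
    m, s = master_byte & mask, slave_byte & mask
    for _ in range(bits):
        to_slave = (m >> (bits - 1)) & 1    # master's serial output
        to_master = (s >> (bits - 1)) & 1   # slave's serial output
        m = ((m << 1) | to_master) & mask
        s = ((s << 1) | to_slave) & mask
    return m, s
```

After eight clocks the two registers have swapped contents, which is why every SPI write is also a read.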
Simple Interfaces (2 of 4) • So What? • Fewest pins (almost) • Control, not data • Long distance • I2C • Uses two wires • Master/Slave • Includes handshake • Bit Serial • Bidirectional • Often used with register syntax
Simple Interfaces (3 of 4) • History • In IBM PCs • RS232 and RS485 • Still widely used • Simple/cheap • Noise resistant • Problems • Low bandwidth • Limited by internal timing clocks • Very low level protocol • So What? • Very few pins (3) • No clock required • Long distance • UART • Bit serial • No clock signal • Good & Bad • Relies on timing for events • Often used with dataflow syntax
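Because a UART has no clock wire, the receiver recovers bit boundaries purely from timing. The Python sketch below assumes the common 8N1 format (one start bit, eight data bits LSB first, one stop bit) and a receiver that oversamples the line 16 times per bit; the format, the oversampling factor, and all function names are assumptions for illustration.

```python
def uart_encode(byte):
    """Frame one byte as 8N1: start bit (0), 8 data bits LSB first, stop bit (1)."""
    return [0] + [(byte >> i) & 1 for i in range(8)] + [1]


def uart_line(data, oversample=16, idle=8):
    """Build the sampled line waveform for a byte string; the line idles high."""
    line = [1] * idle
    for byte in data:
        for bit in uart_encode(byte):
            line += [bit] * oversample
    return line + [1] * idle


def uart_receive(line, oversample=16):
    """Find each start bit by its falling edge, then sample mid-bit."""
    out = []
    i = 1
    while i < len(line):
        if line[i - 1] == 1 and line[i] == 0:        # falling edge: start bit
            mid = i + oversample // 2                # middle of the start bit
            if mid + 9 * oversample >= len(line):
                break                                # incomplete frame
            bits = [line[mid + k * oversample] for k in range(10)]
            if bits[0] == 0 and bits[9] == 1:        # valid start and stop bits
                out.append(sum(b << n for n, b in enumerate(bits[1:9])))
            i = mid + 9 * oversample                 # resume inside the stop bit
        else:
            i += 1
    return bytes(out)
```

Sampling in the middle of each bit gives the most margin against clock mismatch between the two ends, which is the "good and bad" of relying on timing for events.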
Simple Interfaces (4 of 4) • So What? • N64 Controllers • Used in projects • N64 • Asynchronous • More robust than UART • Command Syntax • Main: Reset & Read Buttons • Other: Status, Mempack, EEPROM • Receiving a bit: • Look for 1’b1 (Stop) -> 1’b0 (Start) • Wait 1us (why 1us?!?) • Capture Data
Intermediate Interfaces (1 of 4) • So What? • HD44780, standard • 4 or 8b operation • Interesting timing • LCD • Interface • LCD_DB[7:0]: Data • LCD_RS: Register select • LCD_RW: Read/Write • LCD_E • Enable/Strobe • Provides timing
Intermediate Interfaces (2 of 4) • So What? • Used everywhere • Framed structure • Dataflow syntax • 10M-1G Ethernet • Bit Serial Link • 4/5bit Encoding takes 20% overhead • Bit5 is used for Data-Valid and Error • Preamble used for clock extraction • Inter Frame Gap ensures packets aren’t back-to-back • CRC used to detect errors from transmission
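The 20% figure for 4-bit/5-bit encoding follows directly from sending 5 line bits for every 4 data bits. The Python sketch below works it through with exact fractions; the function names are made up, and the 100 Mb/s example rate is an assumption chosen because 100 Mb/s Ethernet is a well-known user of such a code.

```python
from fractions import Fraction


def line_rate(data_rate, data_bits=4, code_bits=5):
    """Symbol rate on the wire needed to carry data_rate payload bits/s."""
    return data_rate * Fraction(code_bits, data_bits)


def overhead_fraction(data_bits=4, code_bits=5):
    """Share of the line rate that carries no payload."""
    return Fraction(code_bits - data_bits, code_bits)
```

For 100 Mb/s of payload the wire must run at 125 Mbaud, and one bit in five (20%) is encoding overhead; the spare codewords are what leave room for signaling such as data-valid and error.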
Intermediate Interfaces (3 of 4) • 10M-1G Ethernet • Receive • Wait for DataValid & SFD • Start shifting/FIFOing data • Wait for DataValid to go low • Check CRC, discard/mark packet • Transmit is similar • CRC • An LFSR based code • Appended to the end of each frame • Used to ensure nothing is corrupted
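The LFSR view of a CRC can be made concrete with a bit-at-a-time CRC-32 in Python. This sketch uses the standard reflected CRC-32 polynomial constant `0xEDB88320`, the same CRC computed by Python's `zlib.crc32`; the function name is made up, and the slide itself does not specify which variant the course hardware implements.

```python
import zlib


def crc32_bitwise(data):
    """Compute CRC-32 one bit at a time, as a shift register with XOR taps."""
    crc = 0xFFFFFFFF                       # register preset to all ones
    for byte in data:
        crc ^= byte                        # feed the next 8 input bits
        for _ in range(8):
            if crc & 1:                    # bit shifted out selects the taps
                crc = (crc >> 1) ^ 0xEDB88320
            else:
                crc >>= 1
    return crc ^ 0xFFFFFFFF                # final inversion


# The bitwise version agrees with the library implementation.
assert crc32_bitwise(b"123456789") == zlib.crc32(b"123456789")
```

Each inner-loop iteration is one clock of the LFSR, so in hardware the whole computation costs 32 flip-flops and a handful of XOR gates running alongside the data shifter.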
Intermediate Interfaces (4 of 4) • So What? • Source Synchronous • Very high bandwidth • 966Mbps per pair • Interchip • Dataflow structure • Send clock alongside data • Requires async FIFO • Differential pairs require special signaling for this
CPU Interfaces (1 of 3) • So What? • Allow CPU to control peripherals • Old: Simplicity of I/O devices (no FPGAs back in the day) • New: Bandwidth (audio & video) • Key Assumptions • CPU is in control • Separation of data (high bandwidth) and control (very low latency) • Basic Organization • Historically “bus” based • Single arbiter, or even single master • Most devices are simple and respond only • Memory/register centric (e.g. read/write ops) • Newer point to point designs • PCIe, HyperTransport • Based on command packets (e.g. read/write ops)
CPU Interfaces (2 of 3) • So What? • Very widespread standard • Simple enough to describe here • ISA • Synchronous bus • Assumes 1 cycle access • 8MHz standard • Basic Operations • Address (CPU -> IO) • Control (CPU -> IO) • Data (CPU <-> IO) • Extensions • DMA • Interrupts • History • IBM PC XT • 8b and then 16b • PnP Added Later • Open Standard
CPU Interfaces (3 of 3) • So What? • Higher bandwidth than old parallel buses • Overcomes pin limitations • Separates physical and logical transport to allow more complex analog design • PCIe • Based on bit-serial lanes • Very high bandwidth • Channel bonding, similar to 10Gbps Ethernet • Point to Point • Packet/Switch Based • High overhead for small messages (interrupts) • Layers • Physical • Data Link (ack/nak) • Transactions (memory/int) • History • Developed by Intel • 2.5 GT/s, 5 GT/s …
Design (1 of 3) • So What? • Well, you’ve been designing some interfaces • You will keep using them • Similar principles apply to hardware and software • Back to Principles • What do you want from the interface (SHOULD) • What do you need from the interface (MUST)
Design (2 of 3) • Reuse & Standardization • May introduce overhead • Leverage well tested modules • Eases debugging & documentation • Modeling, Verification & Debugging • Requires two implementations • E.g. transmitter & receiver • Automated testing • Allows you to quickly verify any changes • Greatly simplifies life for someone else
Design (3 of 3) • Good Interfaces • Simplify the interacting modules • Both the design and implementation • Simplify doesn’t always mean “making smaller” • Are self-documenting • Are naturally widely applicable • Bad Interfaces • Are complex, or hard to debug • Are expensive to design and implement • Make incorrect assumptions • Do more work than necessary • Eliminating timing assumptions, when we know the timing • Otherwise checking invariants we know to be true
A Case Study (1 of 2) • The RAMP DRAM Interface • What MUST we do • Convey address to the controller • Convey data in both directions • Support handshaking to deal with variable latency in controller • What should we do • Allow multiple users to share DRAM • Support extremely high bandwidth • The Design • 3 FIFOs with Ready/Valid • Command: read/write and address to controller • DataIn: data to be written (and mask) • DataOut: data which was read (and any error counts for ECC)
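The three-FIFO design can be sketched as a behavioral Python model. This is not the RAMP implementation: the class, its method, and the command tuple format are hypothetical, and write masks and ECC error counts are left out to keep the sketch short. Python deques stand in for the ready/valid FIFOs, with an empty FIFO playing the role of "not valid".

```python
from collections import deque


class DramModel:
    """Hypothetical controller model with Command, DataIn and DataOut FIFOs."""

    def __init__(self):
        self.command = deque()    # (op, address) pairs from the user
        self.data_in = deque()    # words to be written
        self.data_out = deque()   # words that were read
        self.memory = {}

    def step(self):
        """Try to retire one command; return False if the controller stalls."""
        if not self.command:
            return False                          # nothing valid on Command
        op, address = self.command[0]
        if op == "write":
            if not self.data_in:
                return False                      # wait for write data
            self.memory[address] = self.data_in.popleft()
        elif op == "read":
            self.data_out.append(self.memory.get(address, 0))
        else:
            raise ValueError(f"unknown command: {op}")
        self.command.popleft()
        return True
```

Because the three FIFOs are decoupled, a user can queue several commands before any data moves, and the controller can take as many cycles as it needs per command, which is how the design absorbs the controller's variable latency.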
A Case Study (2 of 2) • Metrics • Bandwidth: maximized by using wide data FIFOs • Latency: minimized by avoiding any serialization • Pin Count: dictated by need for maximum bandwidth • Complexity: low thanks to ready/valid • Datapath & Control • All 3 FIFOs are datapath • Separate initialization & power state for control • Clocking: Each FIFO can have a separate clock • Handshaking is Ready/Valid • Protocol • Low level: dataflow • Intermediate level: commands • High level: register
Summary (1 of 2) • Any useful system includes at least two interfaces: input and output • The most difficult work in any system is matching incompatible interfaces • Principles • Metrics: Bandwidth, Latency, Pin Count & Logic Overhead • Datapath & Control (States & Events) • Synchronization: Clock & Reset • Handshaking (Ready/Valid) • Protocols (structure, syntax, semantics) • Design • Back to principles • Reuse & Standardization • Modeling, Verification & Debugging
Summary (2 of 2) • Interfaces • Simple Interfaces • SPI, I2C, UART, N64 • JTAG, Slave Serial, MDI (Ethernet) • Intermediate Interfaces • SDRAM, Audio, LCD, Ethernet (10M-10G), Interchip • CC2420, Video Encoder/Decoder • CPU Interfaces • ISA, PCIe • MCA, PCI, PCI-X, HyperTransport, Intel FSB, AGP, AMBA