Advances in Bus Architectures: Contenders For the Post-PCI Era Solomon Bien and Denis Perelyubskiy December 6, 2001 EE202A
Bus Design Issues • Packet-switched vs. multi-drop bus • Clocking issues • Synchronous vs. asynchronous • Clock control line vs. embedded clock • Implementation of signals • Through sideband wires vs. within messages • Bus arbitration • Serial vs. parallel
Serial vs. Parallel • Limitations of parallel buses • Crosstalk • Common ground interference problems • Transient switching noise effects • Routing delays for parallel lines: signals must arrive at the same time. The faster you switch, the more critical this is. • Serial LVDS-based buses help alleviate above problems • Result in generally higher speeds
PCI: why? – historical perspective • (E)ISA bus – (Extended) Industry Standard Architecture • 16-bit data lines: problem for 32-bit transfers (although EISA is a 32-bit bus) • No arbitration • Multiple devices don’t play nice • May actually prevent “memory refresh” from occurring, if no specific provisions • Became a bottleneck: SLOW… (@ 4.77 or 8.33 MHz) • EISA: fast, but few cards available. EXPENSIVE • MCA (Micro Channel Architecture), introduced by IBM (late 80s) • Higher speeds • Bus arbitration • Automatic configuration • PROPRIETARY – never caught on • In 1992 VLB (VESA Local Bus) appeared as an alternative (VESA: Video Electronics Standards Association) • Relied on 486: extension of processor/memory bus (runs at same speed as well) • Bus was directly driven by the CPU • Intel discouraged such practice
PCI: that’s why… • Open standard - unlike MCA • Compatible with ISA - large installed base (through a bridge) • Has arbitration circuit - unlike ISA • “Low” pin count • Processor independent - unlike VLB • 33/66 MHz – “fast”(er) • 32-bit data path
PCI: detailed highlights • Synchronous, parallel, cross between system and I/O buses • Synchronous: all operations are relative to some clock • Central arbiter • A master requests the bus by asserting REQ# (active low, so driven low) • Arbitration overlaps the current transfer, so it does not consume bus cycles • Granted per access rather than by time slot • Arbitration algorithm is not specified (other than the latency of decision making) • Multiplexed data/address pins • More complex, but saves area • 33 MHz/32-bit: 132 MB/s peak; 66 MHz/32-bit: 264 MB/s peak (528 MB/s with the 64-bit extension) • 66 MHz operation (Rev. 2.1/2.2) is more recent, but less used due to higher costs • Terminology: Bus master (initiator), Bus slave (target).
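The peak rates quoted for PCI are just clock rate times bus width. A minimal sketch of that arithmetic follows; the function name is illustrative (not from the PCI specification), the clocks are the nominal 33/66 MHz values, and the result is a theoretical peak that ignores address phases and wait states.

```python
def peak_bandwidth_mb_s(clock_mhz: float, width_bits: int) -> float:
    """Theoretical peak in MB/s: one transfer of width_bits per clock."""
    return clock_mhz * width_bits / 8


# Nominal PCI configurations: (clock in MHz, data width in bits).
for clock_mhz, width_bits in [(33, 32), (66, 32), (66, 64)]:
    rate = peak_bandwidth_mb_s(clock_mhz, width_bits)
    print(f"{clock_mhz} MHz x {width_bits}-bit: {rate:.0f} MB/s")
```

Note the units: these are megabytes per second, so 33 MHz with 32 bits gives 132 MB/s, and 264 MB/s corresponds to 66 MHz at 32 bits (or 33 MHz at 64 bits).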
PCI: detailed highlights (2) • Processor independent • Bridge separates/buffers data between CPU and bus • Separate clock • Limited to 3-4 expansion slots per bus, but extendable through PCI-to-PCI bridges • Transfers: in bursts (for both memory and I/O address spaces) • Burst is an address phase + 1 or more data phases • Either side may terminate the transfer: the initiator by deasserting FRAME#, the target by asserting STOP# • Optional 64-bit extension: • “true” 64 bits (meaning 32 more data pins) + some extra pins. • Only memory commands make sense when doing 64-bit transfers. • Command interface or any other aspect of operation does not change
PCI-X: Present-day PCI • PCI-SIG is talking about PCI 2.3, 3.0, etc., but it looks like PCI-X is next • Backwards compatible with PCI: operates at the frequency of a legacy device if one is present • 64-bit/133 MHz/1066 MB/s • Not enough time anymore to decode a signal within one cycle • “Normal” PCI protocol: the signal is driven on the rising edge; the target receives it, decodes it, and sets bits in response in the next clock cycle • PCI-X: after the signal propagates to the target, it is latched and held until the following cycle, during which it is decoded. Only after that does the target assert a line in response. • Adds a couple of cycles to the request phase of the PCI burst, but transfers are much faster • This is called a register-to-register protocol
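To see why a target can no longer receive and decode a signal within a single cycle, it helps to compare clock periods. The sketch below only converts nominal clock rates to periods; the helper name and the list of labels are illustrative assumptions, and no propagation or decode delays from any specification are modelled.

```python
def clock_period_ns(clock_mhz: float) -> float:
    """Length of one clock cycle in nanoseconds."""
    return 1000.0 / clock_mhz


# Nominal clock rates in MHz for conventional PCI and PCI-X.
for name, mhz in [("PCI 33", 33.33), ("PCI 66", 66.66), ("PCI-X 133", 133.33)]:
    print(f"{name}: {clock_period_ns(mhz):.1f} ns per cycle")
```

At 133 MHz the whole cycle is about 7.5 ns, a quarter of the roughly 30 ns available at 33 MHz, which is the budget the register-to-register protocol works around by latching first and decoding in the following cycle.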
PCI-X: Present-day PCI (2) • New transaction phase added: attribute phase • Comes right after address phase • 36-bit field which describes the bus transaction in more detail than conventional PCI • Provides: • Relaxed ordering – requests from multiple devices not handled FCFS anymore • Non-cache-coherent transactions – a bit that tells the system’s cache controllers: “don’t snoop” • Transaction byte count – more efficient buffer management on the bridge and better bus utilization; a bridge that does not know how much data is going to be requested fetches some default (1-2) lines per data request • Sequence number – a transaction sequence number, supposedly increasing bus utilization • Split transaction support • Initiator makes a request and goes on about its business until notified by the target that data is available • Optimized “wait states” – devices which are not ready simply remove themselves from the bus
Need a successor to PCI • Bus is becoming a bottleneck • Many high speed peripherals • PCI is a parallel bus, and as such has inherent limitations, which preclude it from becoming the next-generation bus • Can’t increase width • Skew • More pins—higher cost, less general interfaces • Point to point communication reduces load on a bus
“The Replacements” • 3GIO (aka Arapahoe) • Intel, Compaq, HP, Microsoft, endorsed by PCI-SIG • Has not arrived yet, but is supposed to be a good one… • HyperTransport • AMD (inventor), Sun Microsystems, API NetWorks, Cisco, PMC-Sierra, Inc., NVIDIA Corporation, Transmeta Corporation, and Apple Computer • Technology formerly known as LDT (Lightning Data Transport) • RapidIO • Motorola (originally introduced), Cisco, Lucent, Nortel Networks, and Xilinx • InfiniBand (“Infinite bandwidth”) • Intel, Dell, Hitachi, Sun Microsystems, HP, IBM, 3Com • To be used for server-to-server and server-to-storage interconnects • We do not cover it, because it is not targeted as a PCI replacement (even though it is sometimes made to sound like one)
3GIO: The Basics • Used to connect high speed peripherals • Aimed at all types of platforms • High bandwidth/speed serial I/O bus • Highly scalable • Point-to-point, packet-switched connections • Embedded clock • Follows PCI’s software model • Very few signals used • QoS features • Hot swappable • Power management
3GIO: What’s cool • Multiple point-to-point connections • Facilitated by switches • Packets don’t need to go to host bridge • Good scalability • Allows for an interesting partition of system • [Figure: example 3GIO system. CPU and memory attach to a memory bridge; 3GIO links connect it to graphics and to an I/O bridge (Serial ATA HDD, USB 2.0, local I/O, PCI) and to a 3GIO switch (mobile docking, Gb Ethernet, add-ins).] Figure from http://developer.intel.com/technology/3gio/
3GIO: How it works • Config/OS Layer • PCI PnP model (init, enum, config) • Software Layer • PCI software/driver model: keeps PCI’s software model • Transaction Layer • Packet-based protocol • Matches responses with requests • Optional QoS flags • Message Signaled Interrupt (MSI) • Data Link Layer • Data integrity: CRC, sequence numbers • Flow control protocol • Physical Layer • 2.5+ Gb/s per lane; point-to-point, serial, differential, hot-plug, inter-op form factors • Link bandwidth scales linearly with the number of lanes • 8b/10b encoding (embeds the clock in the data stream) • Link width arranged by the two communicating entities • Figure from http://developer.intel.com/technology/3gio/
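The physical-layer numbers can be turned into a usable payload rate. This is a minimal sketch assuming the 2.5 Gb/s signalling rate and 8b/10b coding from the slide (10 line bits carry 8 data bits); the function names and the lane counts in the loop are illustrative, and packet and protocol overhead are ignored.

```python
def lane_payload_mb_s(line_rate_gbps: float = 2.5) -> float:
    """Payload rate of one lane, one direction, after 8b/10b overhead."""
    payload_gbps = line_rate_gbps * 8 / 10   # 8 data bits per 10 line bits
    return payload_gbps * 1000 / 8           # Gb/s -> MB/s


def link_payload_mb_s(lanes: int, line_rate_gbps: float = 2.5) -> float:
    """Bandwidth scales linearly with the number of lanes."""
    return lanes * lane_payload_mb_s(line_rate_gbps)


# Illustrative lane counts; the slide does not list the allowed widths.
for lanes in (1, 4, 16):
    print(f"{lanes:2d} lane(s): {link_payload_mb_s(lanes):.0f} MB/s per direction")
```

Under these assumptions a single lane carries 250 MB/s in each direction, so even a narrow link is competitive with 32-bit/66 MHz PCI while using far fewer signals.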
HyperTransport: what a marketing person would say • Significantly faster than PCI for the same number of pins • Low latency • Low pin count • Compatible with legacy PC buses • PCI software driver compatibility is a major goal: HyperTransport-based I/O systems are able to use PCI driver software • Not clear whether there are provisions for using PCI cards with this bus. • From the FAQ: “Why do we believe HyperTransport™ is a superior "in the box" interconnect?” • “… because it has all the functions for inbox connectivity at amazing speed, low pin count and relatively low complexity. By optimizing to the needs specific to in-the-box interconnect, HyperTransport is a fast, simple and less expensive approach.” • But the good news is that we are not marketing people…
HyperTransport: details • Packet-switched, point-to-point links • Each link may be 2, 4, 8, 16, or 32 bits wide; the width is negotiated at initialization • Links interconnecting components may be asymmetric in terms of width. • Multiplexed • Commands, addresses, data all use the same links. • Clock rates: 200 MHz – 800 MHz (in 2002) • Speeds depend on the widths of the links, but some numbers for dual (one per direction) links at 800 MHz: • 2-bit wide – 400 MB/s each direction • 32-bit wide – 6.4 GB/s each direction • Asynchronous clock forwarding • “Clock forwarding”: the transmitter sends its clock along with the data, and the receiver uses that clock to capture the data • Despite extensive searching, don’t know how “asynchronous” ties in. • Implementation: one clock line for every 8 data lines – clock skew reduced • No side-band wires • Interrupts supported as packet messages • Although legacy hardware may in fact be supported through sideband wires.
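The per-direction speeds follow from link width, clock rate, and the fact that data moves on both clock edges. A minimal sketch of that calculation, assuming the 200 to 800 MHz clocks and the link widths listed on the slide; the function name is illustrative and packet overhead is ignored.

```python
VALID_WIDTHS = (2, 4, 8, 16, 32)  # link widths in bits, from the slide


def ht_bandwidth_mb_s(clock_mhz: float, width_bits: int) -> float:
    """Peak MB/s in ONE direction; data is sent on both clock edges."""
    if width_bits not in VALID_WIDTHS:
        raise ValueError(f"unsupported link width: {width_bits}")
    transfers_per_us = clock_mhz * 2          # rising and falling edge
    return transfers_per_us * width_bits / 8  # bits -> bytes


for width in VALID_WIDTHS:
    print(f"{width:2d}-bit @ 800 MHz: {ht_bandwidth_mb_s(800, width):.0f} MB/s")
```

At 800 MHz this gives 400 MB/s for a 2-bit link and 6400 MB/s (6.4 GB/s) for a 32-bit link, per direction; the figures are in bytes, not bits.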
HyperTransport: details (2) • More on physical layer • LVDS – 2 pins per bit (Receiver recognizes to 200 mV) • Info transferred on rising AND falling edge • See diagram for pin counts. • Trace lengths up to 24 inches, so may span board inter-connects. • Plug and play configuration possible, if BIOS support is there
HyperTransport: details (3) • Packets: • Multiples of 4 bytes • If links are narrower than 32 bits, use adjacent bit-times • Data packets are anywhere between 4 and 64 bytes long • Nop packets: flow control (buffer depth) info • See example of command packet • Data packet, however, contains data only • Selection of command vs. data based on a control wire • Packet communications • Concept of streams: may have multiple streams (tagged) between devices • Devices may be daisy-chained, so streams are forwarded • Ordering within streams, but not between streams • Each packet has device ID – up to 32 IDs per bus chain, NOT counting the daisy-chained devices • HT switches • Enable devices to communicate • Communicate with PCI buses through PCI bridge • Bit-time: half of a clock period in duration. Two data bits are transmitted on each signal per cycle.
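The rule that narrower links spread a packet over adjacent bit-times can be made concrete. The sketch below assumes only what the slide states: packets are multiples of 4 bytes, a link moves one link-width of bits per bit-time, and a bit-time is half a clock period. The function names are illustrative.

```python
def bit_times(packet_bytes: int, link_width_bits: int) -> int:
    """Number of bit-times needed to move one packet across a link."""
    if packet_bytes <= 0 or packet_bytes % 4 != 0:
        raise ValueError("packets are multiples of 4 bytes")
    return packet_bytes * 8 // link_width_bits


def clock_cycles(packet_bytes: int, link_width_bits: int) -> float:
    """Two bit-times fit in one clock period."""
    return bit_times(packet_bytes, link_width_bits) / 2


for width in (2, 8, 32):
    print(f"4-byte packet on {width:2d}-bit link: "
          f"{bit_times(4, width)} bit-times, {clock_cycles(4, width)} clocks")
```

A 4-byte packet fits in a single bit-time on a 32-bit link, but needs 4 bit-times on an 8-bit link and 16 on a 2-bit link.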
RapidIO • Architecture is a protocol, independent of physical implementation • Packet-switched • Supports message passing and globally shared distributed memory models • Source synchronous clock signal (clocks data on rising and falling edges) • Supports multiple peer-to-peer transactions • Open standard • Control symbols sent in packets
RapidIO (cont.) • System-level bus • Intended for use in embedded communications devices Figure from http://www.rapidio.org/
RapidIO: The Layers • Logical Spec • Message passing • Globally shared distributed memory • Transport Spec • Source routing • Physical Spec • Can be implemented in any way (currently parallel LVDS) • Flow control mechanisms • Layering allows for flexibility at all levels • Figure from http://www.rapidio.org/
Our predictions • They’re all so similar that it’s difficult to say…
References • 3GIO • http://www.pcisig.com/news_room/3gio • http://developer.intel.com/technology/3gio/ • General Bus Information • http://www.pcguide.com/ref/mbsys/buses/func.htm • http://www.speedingedge.com/docs/busses.pdf • ISA • http://sunsite.tut.fi/hwb/co_ISA_Tech.html • PCI • http://www.techfest.com/hardware/bus/pci.htm • http://www.ee.vt.edu/~mishra/Lecture5.pdf • Set of not-very-detailed slides about system Buses • PCI Local Bus Specification (Revision 2.1)
References (cont.) • HyperTransport • http://www.hypertransport.org/downloads/HT_IOLink_Spec.pdf • Specification version 1.3 • http://www.hypertransport.org/documentation • FAQ • http://www.hypertransport.org/downloads/whitepapers/HT_busarch.pdf • Technology bus architecture white paper • http://www.hypertransport.org/downloads/whitepapers/25012A.pdf • I/O link white paper • http://www.apinetworks.com/silicon/hypertransport.pdf • Overview • http://www.planetee.com/planetee/servlet/DisplayDocument?ArticleID=5994#img3 • More protocol overview + operation • PCI-X • ftp://ftp.compaq.com/pub/supportinformation/papers/tc990903tb.pdf
References (cont.) • Serial Buses • http://www.planetee.com/planetee/servlet/DisplayDocument?ArticleID=2032 • http://www.eetimes.com/story/OEG20011115S0062 • http://www.byteandswitch.com/document.asp?site=byteandswitch&doc_id=7851 • Interference • http://www.ate.agilent.com/emt/LIBRARY/IN-CIRCUIT/DOCS/Ground_Bounce.pdf • Ground bounce • http://www.innoveda.com/products/datasheets/CrosstalkPCB.pdf • crosstalk