Reconfigurable FPGAs (The Xilinx Virtex II Pro / ProX FPGA family) Shadab Ambat 03/02/2006
Introduction • Virtex-II Pro FPGAs were introduced in 2002 and are fabricated in a 130 nm, 1.5 V process technology. • They incorporate up to two embedded IBM PowerPC (400 MHz) processors and 3.125 Gbps RocketIO serial transceivers. • The Virtex-II Pro X is an extended version, introduced in 2003, that raises the transceiver data rate to 6.25 Gbps.
Introduction (contd.) • The devices scale up or down in features, density, I/O and performance depending on the requirements. • Embedded and distributed memory • Digital Clock Management for on-chip/off-chip clock synthesis and synchronization • XCITE digitally controlled I/O impedance to improve signal integrity and reduce board space • Full/partial FPGA reconfiguration • Up to 444 embedded 18 x 18-bit multipliers • Extensive library of DSP algorithms • DSP tools such as The MathWorks MATLAB/Simulink, the Xilinx System Generator for DSP, and Cadence SPW • Over 200 IP cores from Xilinx and partners
Virtex-II Pro X • Two devices: 2VPX20 and 2VPX70 • Has all the features of the Pro version • Additional features include new embedded RocketIO X serial transceivers with: • Eight or 20 channels per device • 2.488 to 6.25 Gbps per channel • 20X and 16X clocks and data paths • 8b/10b and 64b/66b encoding • SONET/SDH OC-48 jitter compliance • Programmable receiver equalization for enhanced signal integrity • Ideally suited for 10G Ethernet, SONET OC-48, and proprietary backplane applications
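The line-coding options listed above trade raw line rate for payload. The following is a minimal Python sketch of that arithmetic; the function name and the four-lane example (a XAUI-style 10G Ethernet link) are illustrative assumptions, not taken from the slides.

```python
def payload_gbps(line_rate_gbps, data_bits, coded_bits, lanes=1):
    """Usable data rate after line-coding overhead, summed over all lanes."""
    return line_rate_gbps * data_bits / coded_bits * lanes

# One RocketIO channel at 3.125 Gbps with 8b/10b coding
rocketio = payload_gbps(3.125, 8, 10)

# Four such channels bonded together (assumed XAUI-style 10G Ethernet link)
four_lanes = payload_gbps(3.125, 8, 10, lanes=4)

# One RocketIO X channel at 6.25 Gbps with 64b/66b coding
rocketio_x = payload_gbps(6.25, 64, 66)

print(rocketio, four_lanes, round(rocketio_x, 3))
```

8b/10b spends 20% of the line rate on coding, while 64b/66b spends about 3%, which is why the latter is attractive at the higher RocketIO X rates.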
Architecture Overview • Apart from the RocketIO cores and the RISC CPU, the FPGA consists of the following: Input/Output Blocks (IOBs): • IOBs are bidirectional blocks that can be programmed as inputs or outputs. Inputs and outputs can have optional SDR or DDR data registers, and outputs can also have optional tri-state buffers. Configurable Logic Blocks (CLBs): • FPGAs consist of an array of CLBs arranged in rows and columns. These are the blocks that implement the sequential and combinatorial logic of the FPGA.
Architecture Overview (contd.) Block SelectRAM+ Memory: • Each block provides 18 Kb of true dual-port RAM, programmable from 16K x 1 bit to 512 x 36 bits, in various depth and width configurations. 18 x 18-Bit Multipliers: • A multiplier block is associated with each SelectRAM+ memory block. • The multiplier block is a dedicated 18 x 18-bit 2's complement signed multiplier. • Read/multiply/accumulate operations and DSP filter structures are extremely efficient. Global Clocking: • Implemented using Digital Clock Managers (DCMs) and global clock multiplexer buffers. • Up to 12 DCMs are available, providing deskewed and 90-, 180- and 270-degree phase-shifted versions of their output clocks.
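The depth and width trade-off of a block SelectRAM+ can be tabulated with a short Python sketch. Only the two end points (16K x 1 and 512 x 36) come from the slide; the intermediate shapes are assumptions that should be checked against the data sheet, and the helper name is hypothetical.

```python
# (depth, width) pairs; only the first and last appear in the slides,
# the intermediate shapes are assumed and should be verified.
CONFIGS = [(16384, 1), (8192, 2), (4096, 4), (2048, 9), (1024, 18), (512, 36)]

def address_bits(depth):
    """Number of address lines needed for a power-of-two depth."""
    return depth.bit_length() - 1

for depth, width in CONFIGS:
    total_bits = depth * width
    print(f"{depth:>5} x {width:<2} -> {address_bits(depth)} address bits, "
          f"{total_bits} bits total")
```

Every shape stays within the 18 Kb (18,432-bit) capacity of one block; the narrow shapes use 16 Kb because they have no room for the extra parity bits that the 9-, 18- and 36-bit widths include.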
Architecture Overview (contd.) Routing Resources: • All the above blocks use the same interconnect scheme and the same access to the global routing matrix. Boundary Scan: • Boundary-scan (JTAG) instructions and associated data registers support a standard methodology for accessing and configuring Virtex-II Pro devices. • The device can operate in system mode, in which it continues to function while executing non-test boundary-scan instructions, or in test mode, in which boundary-scan test instructions control the I/O pins for testing purposes. Configuration: • Virtex-II Pro / Virtex-II Pro X devices are configured by loading the bitstream into internal configuration memory using one of the following modes: • Slave-serial mode • Master-serial mode • Slave SelectMAP mode • Master SelectMAP mode • Boundary-Scan mode (IEEE 1532)
Architecture Overview (contd.) Readback and Integrated Logic Analyzer: • Configuration data stored in the device configuration memory can be read back for verification. • Xilinx ChipScope Integrated Logic Analyzer cores and Integrated Bus Analyzer cores, together with the corresponding software, can be used to access and verify designs in the device.
Configurable Logic Blocks • A conceptual model of the FPGA is shown in the figure†, with its three basic elements: the IOBs, the programmable interconnect and, most importantly, the CLBs. • Although FPGAs depend more heavily on interconnect than CPLDs do, the logic functions are mainly realized by the CLBs. †http://www.coe.montana.edu/ee/courses/ee/ee367/pdffiles/truegamer.pdf
Configurable Logic Blocks (contd.) • CLB resources consist of four slices and two 3-state buffers. • The slices are equivalent, and each contains: • Two function generators (F & G) • Two storage elements • Arithmetic logic gates • Large multiplexers • Wide function capability • Fast carry look-ahead chain • Horizontal cascade chain (OR gate) • The function generators F & G are configurable as 4-input look-up tables (LUTs), as 16-bit shift registers, or as 16-bit distributed SelectRAM+ memory.
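A function generator used as a 4-input LUT is just a 16-entry truth table addressed by its inputs. The Python sketch below models that idea; the helper name, the input ordering (a3 as the most significant address bit) and the example truth-table constants are illustrative assumptions and not a description of the Xilinx primitives.

```python
def make_lut4(init):
    """Return a 4-input Boolean function backed by the 16-bit truth table `init`."""
    if not 0 <= init < (1 << 16):
        raise ValueError("init must fit in 16 bits")

    def lut(a3, a2, a1, a0):
        # The four inputs form an address into the 16-entry table.
        index = (a3 << 3) | (a2 << 2) | (a1 << 1) | a0
        return (init >> index) & 1

    return lut

and4 = make_lut4(0x8000)  # only entry 15 (all inputs high) is 1
xor4 = make_lut4(0x6996)  # entry is 1 when an odd number of inputs is high

print(and4(1, 1, 1, 1), and4(1, 0, 1, 1), xor4(1, 0, 0, 0), xor4(1, 1, 0, 0))
```

The same 16 storage bits explain the other two uses named above: read and written by address they behave as a 16 x 1 distributed RAM, and shifted by one position per clock they behave as a 16-bit shift register.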
Configurable Logic Blocks (contd.) • The two storage elements are either edge-triggered D-type flip-flops or level-sensitive latches. • Each CLB has internal fast interconnect and connects to a switch matrix to access general routing resources. • Each slice has a dedicated OR gate for implementing a Sum of Products chain.
Configurable Logic Blocks (contd.) Fig.: A Virtex II Pro Slice (Top Half)
IP Cores • The Virtex II Pro FPGA supports several hardware and software Intellectual Property cores. Some of the prominent ones are: • Hardware Cores: • Bus Infrastructure cores (arbiters, bridges etc.) • Memory cores (like DDR, Flash) • Peripheral cores (UART, IIC) • Networking cores (e.g. ATM, Ethernet) • Software Cores: • Boot code • Test code • Device drivers • Protocol stacks • RTOS integration • Customized board support package
IP Cores (contd.) • The IP cores are modular, portable, RTOS independent, and CoreConnect compatible, which allows designs to be migrated.
18-bit x 18-bit Multipliers • The device has several dedicated, embedded 18-bit x 18-bit 2's complement signed multiplier blocks. • The multipliers can be associated with an 18 Kb block SelectRAM+ resource or can be used independently. • They are optimized for high-speed operation and consume less power than an 18-bit x 18-bit multiplier built from CLB slices. • Each memory and multiplier block is tied to four switch matrices. • The SelectRAM+ memory can be used only up to 18 bits wide when the multiplier is used, because the multiplier shares inputs with the upper data bits of its memory.
18-bit x 18-bit Multipliers (contd.) • Both A and B are 18-bit-wide inputs, and the output is 36 bits. Fig.: The memory and multiplier block connections Fig.: A multiplier block
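The behaviour of an 18 x 18-bit signed multiplier with a 36-bit result can be modelled in a few lines of Python. This is a behavioural sketch only; the function names are hypothetical and nothing here reflects the timing or pipelining of the hardware block.

```python
def to_signed(value, bits):
    """Interpret the low `bits` bits of `value` as a two's complement number."""
    value &= (1 << bits) - 1
    if value & (1 << (bits - 1)):
        return value - (1 << bits)
    return value

def mult18x18(a, b):
    """Multiply two 18-bit two's complement words; return the raw 36-bit product."""
    product = to_signed(a, 18) * to_signed(b, 18)
    return product & ((1 << 36) - 1)

raw = mult18x18(0x3FFFF, 2)          # 0x3FFFF is -1 as an 18-bit signed value
print(hex(raw), to_signed(raw, 36))  # raw 36-bit pattern and its signed meaning
```

A 36-bit output is enough for every input pair: the largest magnitude, (-2^17) x (-2^17) = 2^34, still fits in a 36-bit signed word.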
Device Configuration • The FPGA is configured by loading application-specific configuration data into the internal configuration memory. • Configuration is carried out using a subset of the device pins, some of which are dedicated, while others can be re-used as general-purpose I/Os once configuration is complete. • Three mode pins, M0, M1 and M2, select one of the five configuration modes. • The mode pins should be pulled up or down through resistors, or tied directly to ground or VCCAUX (2.5 V), so that they hold a stable level during configuration. They should not be toggled during or after configuration.
Configuration Modes • Virtex-II Pro has five configuration modes: • Slave-Serial Mode • Master-Serial Mode • Slave SelectMAP Mode • Master SelectMAP Mode • Boundary-Scan (JTAG, IEEE 1532) Mode
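Mode selection amounts to a three-bit lookup on M2, M1 and M0. The Python sketch below shows the idea; the slides do not give the pin encodings, so the values in the table are assumptions that must be verified against the mode-pin table in the Xilinx data sheet, and the function name is hypothetical.

```python
# (M2, M1, M0) -> mode. ASSUMED encodings, not stated in the slides;
# verify against the data sheet before relying on them.
MODES = {
    (0, 0, 0): "Master-Serial",
    (1, 1, 1): "Slave-Serial",
    (0, 1, 1): "Master SelectMAP",
    (1, 1, 0): "Slave SelectMAP",
    (1, 0, 1): "Boundary-Scan",
}

def decode_mode(m2, m1, m0):
    """Map the sampled mode-pin levels to a configuration mode name."""
    return MODES.get((m2, m1, m0), "unassigned")

print(decode_mode(1, 1, 0))
```

Three pins give eight combinations for five modes, so a decoder of this kind needs an explicit fallback for the unused codes.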
Configuration Modes (contd.) Slave-Serial Mode: • In slave-serial mode, the FPGA receives configuration data in bit-serial form from a serial PROM or another serial source of configuration data. • The CCLK pin on the FPGA is an input in this mode. • CCLK is generated externally. • Multiple FPGAs can be daisy-chained for configuration from a single source. Master-Serial Mode: • In this mode, the Virtex-II Pro FPGA drives the configuration clock (CCLK). • The interface is identical to slave-serial mode except that an internal oscillator is used to generate the configuration clock. • A wide range of frequencies can be selected for CCLK, which always starts at a slow default frequency.
Configuration Modes (contd.) Slave SelectMAP Mode: • This is the fastest configuration option. • Byte-wide data is written into the device with a BUSY flag controlling the flow of data. An external data source provides a byte stream, CCLK, an active-Low Chip Select (CS_B) signal and a Write signal (RDWR_B). If BUSY is asserted (High) by the FPGA, the data must be held until it goes Low. • Data can also be read using the SelectMAP mode. If RDWR_B is asserted, configuration data is read out of the FPGA as part of a readback operation. Master SelectMAP Mode: • This is a master version of the previous mode. • The device is configured byte-wide on a CCLK supplied internally by the FPGA. Timing is similar to the Slave SelectMAP mode except that CCLK is supplied by the FPGA.
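The BUSY flow control described above can be illustrated with a small Python simulation. `FakeSelectMapPort` is a hypothetical stand-in for the FPGA side with a scripted BUSY pattern; it is a sketch of the hold-until-not-busy rule only and does not model real pin timing.

```python
class FakeSelectMapPort:
    """Hypothetical model of the FPGA side; BUSY follows a scripted pattern."""

    def __init__(self, busy_pattern):
        self.busy_pattern = list(busy_pattern)
        self.received = []
        self.cycle = 0

    def clock(self, data):
        """One CCLK cycle with CS_B and RDWR_B driven for a write; returns BUSY."""
        busy = self.cycle < len(self.busy_pattern) and self.busy_pattern[self.cycle]
        self.cycle += 1
        if not busy:
            self.received.append(data)  # byte accepted on this cycle
        return busy

def write_bytes(port, payload):
    """Write a byte stream, holding each byte while BUSY is High."""
    cycles = 0
    for byte in payload:
        while port.clock(byte):
            cycles += 1  # BUSY was High: present the same byte again
        cycles += 1      # cycle on which the byte was accepted
    return cycles

port = FakeSelectMapPort([False, True, True, False])
used = write_bytes(port, b"\x01\x02\x03")
print(port.received, used)
```

In this run the second byte is held for two extra cycles, so three bytes take five clock cycles and none are lost or duplicated.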
Configuration Modes (contd.) Boundary-Scan (JTAG, IEEE 1532) Mode: • In boundary-scan mode, dedicated pins are used for configuring the device. The configuration is done entirely through the IEEE 1149.1 Test Access Port (TAP). • Configuration through the boundary-scan port is always available, independent of the mode selection.
Readback • In this mode, configuration data from the Virtex-II Pro FPGA can be read back. • Readback is supported only in the SelectMAP (master and slave) and Boundary-Scan modes. • Along with the configuration data, the contents of all registers, distributed SelectRAM+ and block RAM resources can be read back, which is useful for debugging.
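Verification by readback comes down to comparing the data read from the device with the data that was loaded. The Python sketch below shows one way to do that; the optional mask of "don't care" bits (for example, locations holding register or RAM contents that change at run time) is an assumption added for illustration and is not described in the slides.

```python
def verify_readback(expected, readback, mask=None):
    """Return byte offsets where readback differs from expected.

    Bits set to 1 in `mask` are ignored (assumed 'don't care' positions).
    """
    if len(expected) != len(readback):
        raise ValueError("expected and readback must have the same length")
    if mask is None:
        mask = bytes(len(expected))  # all zeros: compare every bit
    if len(mask) != len(expected):
        raise ValueError("mask must have the same length as the data")

    mismatches = []
    for offset, (e, r, m) in enumerate(zip(expected, readback, mask)):
        if (e ^ r) & ~m & 0xFF:
            mismatches.append(offset)
    return mismatches

print(verify_readback(b"\x0f\xf0", b"\x0f\xf1"))
print(verify_readback(b"\x0f\xf0", b"\x0f\xf1", mask=b"\x00\x01"))
```

An empty result means every compared bit matched; a non-empty result lists the byte offsets to inspect.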
Partial Reconfiguration • Partial reconfiguration can be accomplished in either Slave SelectMAP mode or Boundary-Scan mode. • New data is loaded into a specified area of the chip, while the rest of the chip remains in operation. • The chip does not need to be reset, as it does for a full configuration. • Data is loaded on a column basis, with the smallest load unit being a configuration "frame" of the bitstream (its size depends on the device). • Partial reconfiguration is useful for applications that require different designs to be loaded into the same area of a chip, or that require the ability to change portions of a design without having to reset or reconfigure the entire chip (dynamic reconfiguration).
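The frame-based update described above can be pictured as overwriting a contiguous range of frames in the configuration memory while leaving all other frames alone. The Python sketch below is a toy model under that assumption; the class, its method and the frame sizes are hypothetical and do not correspond to the real bitstream format or addressing scheme.

```python
class ConfigMemory:
    """Toy model: configuration memory as a list of fixed-size frames."""

    def __init__(self, num_frames, frame_bytes):
        self.frame_bytes = frame_bytes
        self.frames = [bytes(frame_bytes) for _ in range(num_frames)]

    def write_frames(self, start, new_frames):
        """Overwrite frames start..start+len-1; all other frames stay untouched."""
        if start < 0 or start + len(new_frames) > len(self.frames):
            raise IndexError("frame range outside configuration memory")
        if any(len(frame) != self.frame_bytes for frame in new_frames):
            raise ValueError("a frame is the smallest unit and must be written whole")
        for offset, frame in enumerate(new_frames):
            self.frames[start + offset] = bytes(frame)

mem = ConfigMemory(num_frames=8, frame_bytes=4)
mem.write_frames(2, [b"\xaa" * 4, b"\xbb" * 4])
print([frame.hex() for frame in mem.frames])
```

Validating the whole request before writing anything keeps the model from ending up half-updated, which mirrors the requirement that the untouched part of the design keeps running.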