
Spartan Architecture: Simulation vs Synthesis | FPGA Design Guide

Explore the Spartan architecture for FPGA design, covering top-level attributes, CLB slices, I/O blocks, clock management, power modes, and memory options like RAM and shift registers. Dive into VHDL synthesis, CLB design, and clock distribution for optimal hardware implementation.

jhobart

Presentation Transcript


  1. Chapter 8: Writing VHDL for Synthesis • Spartan architecture • Simulation versus synthesis differences • Think about hardware • Use of subprogram • Synthesis process EE514

  2. Spartan Architecture • Features • Top-level architecture and attributes • Configurable logic block (CLB) • Memory • I/O block • Three-state buses • Clocks and delay-locked loops • Power down mode • Configuration

  3. Spartan Top-level Architecture • Configurable logic blocks • Implement logic here! • I/O blocks • Communicate with other chips • Choose from 16 signal standards • Block RAM • On-chip memory for higher performance

  4. Spartan Top-level Architecture • Clocks and delay locked loops • Synchronize to clock on and off chip • Rich interconnect resources • Three-state internal buses • Power down mode • Lower quiescent power

  5. CLB Slice (Simplified) • 1 CLB holds 2 slices, which have two sets of the following: • Four-input LUT • Any 4-input logic function • Or 16-bit x 1 RAM • Or 16-bit shift register • Carry & control • Fast arithmetic logic • Multiplexer logic • Storage element • Latch or flip-flop • Set and reset • True or inverted inputs • Sync. or async. control
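
The "any 4-input logic function" bullet can be read as a 16-entry truth table addressed by the four inputs. The following is a minimal sketch of that idea, not a vendor primitive: the entity name `LUT4_SKETCH`, the generic `INIT`, and the port names are hypothetical, and it uses the standard `numeric_std` package rather than the `std_logic_arith` package seen later in the slides.

```vhdl
library IEEE;
use IEEE.std_logic_1164.all;
use IEEE.numeric_std.all;

-- Hypothetical entity: a 4-input function stored as a 16-entry truth table.
entity LUT4_SKETCH is
  generic (INIT : std_logic_vector(15 downto 0) := x"8000");  -- 4-input AND
  port (I : in  std_logic_vector(3 downto 0);
        O : out std_logic);
end LUT4_SKETCH;

architecture RTL of LUT4_SKETCH is
begin
  -- The four inputs form an address that selects one bit of the table.
  O <= INIT(to_integer(unsigned(I)));
end RTL;
```

Changing `INIT` changes the function without changing the structure, which is why one LUT can implement any function of four inputs.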

  6. Dedicated Expansion Multiplexers • MUXF5 combines 2 LUTs to form • 4x1 multiplexer • Or any 5-input function • MUXF6 combines 2 slices to form • 8x1 multiplexer • Or any 6-input function • (Figure: CLB with two slices of two LUTs each, a MUXF5 in each slice and a MUXF6 joining the slices)
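
To make the MUXF5 bullet concrete, here is a small sketch of a 4x1 multiplexer written as two 2:1 multiplexers (each small enough for one 4-input LUT) followed by a final select stage that plays the role of MUXF5. The entity and signal names are hypothetical, and whether a given synthesis tool actually maps the last stage onto MUXF5 is tool dependent.

```vhdl
library IEEE;
use IEEE.std_logic_1164.all;

-- Hypothetical entity: 4x1 multiplexer split the way the slide describes.
entity MUX4_SKETCH is
  port (D : in  std_logic_vector(3 downto 0);
        S : in  std_logic_vector(1 downto 0);
        Y : out std_logic);
end MUX4_SKETCH;

architecture RTL of MUX4_SKETCH is
  signal LOW_PAIR, HIGH_PAIR : std_logic;
begin
  -- Each 2:1 multiplexer has three inputs, so it fits one 4-input LUT.
  LOW_PAIR  <= D(1) when S(0) = '1' else D(0);
  HIGH_PAIR <= D(3) when S(0) = '1' else D(2);

  -- Final select between the two LUT outputs (the MUXF5 role).
  Y <= HIGH_PAIR when S(1) = '1' else LOW_PAIR;
end RTL;
```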

  7. Dedicated 16x16 Multiplier Logic • Highly efficient ‘shift & add’ implementation

  8. Look-up Table Shift Registers • Each LUT can be configured as shift register • Serial in, serial out • Dynamically addressable delay up to 16 cycles • (Figure: LUT drawn as a chain of flip-flops 0 to 15 with IN, CE, and CLK inputs; ADDR[3:0] selects the tap that drives OUT)
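
A shift register with a dynamically addressable tap can be described behaviorally as below. This is a sketch under assumptions: the entity name `SRL16_SKETCH` and its ports are hypothetical, and whether a tool packs the description into a single LUT depends on the tool and on the absence of a reset, which is why none is coded here.

```vhdl
library IEEE;
use IEEE.std_logic_1164.all;
use IEEE.numeric_std.all;

-- Hypothetical entity: 16-stage shift register with an addressable tap.
entity SRL16_SKETCH is
  port (CLK, CE, DIN : in  std_logic;
        ADDR         : in  std_logic_vector(3 downto 0);
        DOUT         : out std_logic);
end SRL16_SKETCH;

architecture RTL of SRL16_SKETCH is
  signal TAPS : std_logic_vector(15 downto 0) := (others => '0');
begin
  shift : process (CLK)
  begin
    if (CLK'event and CLK = '1') then
      if (CE = '1') then
        TAPS <= TAPS(14 downto 0) & DIN;  -- serial in at stage 0
      end if;
    end if;
  end process;

  -- ADDR = 0 gives a 1-cycle delay, ADDR = 15 gives a 16-cycle delay.
  DOUT <= TAPS(to_integer(unsigned(ADDR)));
end RTL;
```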

  9. Flexible Cycle Delays • Use for programmable clock delay • Cascade for greater cycle delays • Use CLB flip-flops to add depth • (Figure: same LUT shift register diagram as slide 8)

  10. Memory Bandwidth and Flexibility • Spartan on-chip SelectRAM+ memory • Distributed RAM (16x1, bytes; shallow/wide): DSP coefficients, small FIFOs • Block RAM (4Kx1, 2Kx2, 1Kx4, 512x8, 256x16, kilobytes; deep/wide): large FIFOs, packet buffers, video line buffers, cache tag memory • External RAM (megabytes): SDRAM, ZBTRAM, SSRAM, SGRAM • 200 MHz memory continuum • Highest performance FPGA memory system

  11. Spartan-II Dual-R/W Port Block RAM • Block RAM provides 4K bits each • Dual read/write ports (Port A, Port B), each with: • Independent clock, R/W, and enable • Independently configurable data width from 4Kx1 to 256x16 • Data flow in Spartan-II: A to B yes, B to A yes, A to A yes, B to B yes
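
As an illustration of the 256x16 organization (256 x 16 = 4096 bits, one 4K-bit block), the sketch below describes a memory with independent clocks on its two ports. It is a simplified assumption, not the full primitive: port A only writes and port B only reads, whereas the slide says each real port can do both. The entity name `BRAM_SKETCH` and all port names are hypothetical, and inference into block RAM depends on the synthesis tool.

```vhdl
library IEEE;
use IEEE.std_logic_1164.all;
use IEEE.numeric_std.all;

-- Hypothetical entity: 256x16 memory, write on port A, read on port B.
entity BRAM_SKETCH is
  port (CLKA, WEA : in  std_logic;
        ADDRA     : in  std_logic_vector(7 downto 0);
        DINA      : in  std_logic_vector(15 downto 0);
        CLKB      : in  std_logic;
        ADDRB     : in  std_logic_vector(7 downto 0);
        DOUTB     : out std_logic_vector(15 downto 0));
end BRAM_SKETCH;

architecture RTL of BRAM_SKETCH is
  type RAM_TYPE is array (0 to 255) of std_logic_vector(15 downto 0);
  signal RAM : RAM_TYPE := (others => (others => '0'));
begin
  port_a : process (CLKA)
  begin
    if (CLKA'event and CLKA = '1') then
      if (WEA = '1') then
        RAM(to_integer(unsigned(ADDRA))) <= DINA;  -- synchronous write
      end if;
    end if;
  end process;

  port_b : process (CLKB)
  begin
    if (CLKB'event and CLKB = '1') then
      DOUTB <= RAM(to_integer(unsigned(ADDRB)));   -- synchronous read
    end if;
  end process;
end RTL;
```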

  12. Block RAM Timing • Clock-to-output (glitch-free): 2.5 ns typ. • Address/data input setup: 1.0 ns typ. • Lookup table based RAM provides additional small memories (16x1) • Same timing as CLB logic • Both easily initialized at configuration to simulate ROM

  13. I/O Block (Simplified) • Registered input, output, 3-state control
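
The three registers named on this slide (input, output, 3-state control) can be sketched around a bidirectional pad as follows. The entity name `IOB_SKETCH` and the port names are hypothetical, and the active-high output enable is an assumption made for the example. Whether the registers end up inside the I/O block is decided by the tools.

```vhdl
library IEEE;
use IEEE.std_logic_1164.all;

-- Hypothetical entity: registered input, output, and 3-state control.
entity IOB_SKETCH is
  port (CLK      : in    std_logic;
        OE       : in    std_logic;   -- '1' drives the pad (assumption)
        TO_PAD   : in    std_logic;
        FROM_PAD : out   std_logic;
        PAD      : inout std_logic);
end IOB_SKETCH;

architecture RTL of IOB_SKETCH is
  signal OUT_REG, OE_REG : std_logic := '0';
begin
  regs : process (CLK)
  begin
    if (CLK'event and CLK = '1') then
      OUT_REG  <= TO_PAD;  -- registered output
      OE_REG   <= OE;      -- registered 3-state control
      FROM_PAD <= PAD;     -- registered input
    end if;
  end process;

  -- Release the pad to high impedance when the output is disabled.
  PAD <= OUT_REG when OE_REG = '1' else 'Z';
end RTL;
```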

  14. IOBs Organized As Independent Banks • Up to eight banks on a device • Each bank can support any of the 16 signal standards • Logic level translation • Boards with mixed standards

  15. Spartan-II As Center for Signal Translation • Chip to Chip • LVTTL, LVCMOS • Chip to Memory • SSTL2-I, SSTL2-II, SSTL3-I, SSTL3-II, HSTL-I, HSTL-III, HSTL-IV, CTT • Chip to Backplane • PCI33-5V, PCI33-3.3V, GTL, GTL+, AGP • (Figure labels: SSTL, HSTL, SDRAM, LVTTL, LVCMOS, CTT, SRAM, GTL+)

  16. High Performance Routing • Hierarchical routing • Sparse longer connections for high speed • Routing delay depends primarily on distance • Predictable for early design analysis • (Figure: vector-based interconnect over the CLB array, segments labeled 2 ns)

  17. General Clock Support • Four dedicated global low skew buffers • Dedicated input pin (clock distribution only) • PCI with 500-ps maximum skew • Additional shared resources (e.g., long lines) • Distribute low-skew/high-fanout signals • Four delay-locked loops on each device • Two global buffers with each DLL pair

  18. Delay-locked Loop Functions • Eliminate clock distribution delay • System synchronization • Phase-shifted clocks • Clock multiplication and division • Clean up clocks with 50/50 duty cycle correction • Clock lock for synchronization • DLL feedback connected internally or externally • Can synchronize to DLL lock

  19. Improved Clock-to-out Using DLL • Clock-to-out delays reduced over 50% • Output standard = LVTTL Fast 16 mA; Temp = room, Vdd = 2.5 V, Vcco = 3.3 V • Waveforms: 1: CLKIN, 2: DATA OUT (no DLL), 3: DATA OUT (DLL deskewed) • Timing without DLL: r->r 3.6 ns, r->f 3.5 ns • Timing with DLL: r->r 1.4 ns, r->f 1.4 ns

  20. Spartan-II DLLs Improve Clock Networks • Deskew clocks on board • Deskew clocks on chip • Cascade DLLs • Manage up to 4 system clocks • Convert clock levels using Select I/O • Generate clocks: multiply, divide, shift • Delay-locked loops synchronize on-chip and board-level clocks • (Figure: DLL1 to DLL4)

  21. Power-down Mode • Controlled by single power down pin • All inputs blocked, appear low internally • All outputs disabled • All register states preserved • Power-down status pin • Synchronous wake up • 100 uA typical

  22. Configuration Modes • There are four ways to program a Spartan FPGA

  23. Partial Reconfiguration • Frame by frame reconfiguration supported while device is running • Routing changes affect device operation • Re-initializing a block RAM requires stopping all access in that column • Can dynamically load the required logic at any time • Minimizes cost by time-multiplexing the logic resources

  24. Spartan-II Family Overview

  30. MIPS • >200 MicroBlaze Processors fit in Virtex V5 • Effective cost per MicroBlaze processor $1.4

  33. Simulation versus synthesis differences • The process sensitivity list has a major impact on the simulation of the corresponding process statement. • Synthesis tools usually ignore the process sensitivity list and still generate the corresponding hardware.
      latch : process (LATCH_EN, AD)
      begin
        if (LATCH_EN = '1') then
          ADDR_LATCH <= AD;
        end if;
      end process;
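
The latch process on this slide lists both LATCH_EN and AD. The sketch below shows what the sensitivity list changes when AD is deliberately left out: in simulation the process only wakes on LATCH_EN events, so a change on AD while LATCH_EN is '1' is missed, whereas a synthesis tool that ignores the list would still build a transparent latch. The entity name, architecture name, and the 8-bit width are assumptions for illustration.

```vhdl
library IEEE;
use IEEE.std_logic_1164.all;

-- Hypothetical entity wrapping the latch process from the slide.
entity LATCH_SKETCH is
  port (LATCH_EN   : in  std_logic;
        AD         : in  std_logic_vector(7 downto 0);
        ADDR_LATCH : out std_logic_vector(7 downto 0));
end LATCH_SKETCH;

architecture INCOMPLETE of LATCH_SKETCH is
begin
  -- AD is deliberately omitted from the sensitivity list.
  latch : process (LATCH_EN)
  begin
    if (LATCH_EN = '1') then
      ADDR_LATCH <= AD;  -- simulated only when LATCH_EN changes
    end if;
  end process;
end INCOMPLETE;
```

Restoring AD to the sensitivity list, as on the slide, makes the simulated behavior follow AD while LATCH_EN is high.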

  34. Think about hardware
      library IEEE;
      use IEEE.std_logic_1164.all;
      use IEEE.std_logic_arith.all;
      entity LE64 is
        port (
          DATA1 : in  std_logic_vector(63 downto 0);
          DATA2 : in  std_logic_vector(63 downto 0);
          LE    : out std_logic);
      end LE64;

      architecture RTL1 of LE64 is
      begin
        LE <= '1' when (DATA1 <= DATA2) else '0';
      end RTL1;

      architecture RTL2 of LE64 is
      begin
        process (DATA1, DATA2)
          variable x, y, z : unsigned (64 downto 0);
        begin
          x := '0' & unsigned(DATA1);
          y := '0' & unsigned(DATA2);
          z := y - x;
          if (z(64) = '0') then
            LE <= '1';
          else
            LE <= '0';
          end if;
        end process;
      end RTL2;

  35. Think about hardware • Both architectures build the same function; however, the second one will synthesize with less hardware since no comparator is inferred • RTL1 has delay 31 ns (FPGA synthesized 298 ns) • RTL2 has delay 15 ns (FPGA synthesized 124 ns)
      architecture RTL1 of LE64 is
      begin
        LE <= '1' when (DATA1 <= DATA2) else '0';
      end RTL1;
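
RTL2 replaces the comparison with a subtraction and a test of the extra top bit. A minimal self-checking sketch of why the two agree is shown below on 4-bit operands so that every pair can be checked exhaustively; the entity name `LE_CHECK`, the reduced width, and the use of `numeric_std` instead of the slides' `std_logic_arith` are assumptions for illustration.

```vhdl
library IEEE;
use IEEE.std_logic_1164.all;
use IEEE.numeric_std.all;

-- Hypothetical testbench: checks the "subtract and test the top bit"
-- trick against the plain comparison for all 4-bit operand pairs.
entity LE_CHECK is
end LE_CHECK;

architecture SIM of LE_CHECK is
begin
  check : process
    variable x, y, z : unsigned(4 downto 0);
  begin
    for a in 0 to 15 loop
      for b in 0 to 15 loop
        x := '0' & to_unsigned(a, 4);  -- zero-extend by one bit
        y := '0' & to_unsigned(b, 4);
        z := y - x;                    -- top bit is set only when b < a
        assert (z(4) = '0') = (a <= b)
          report "mismatch between subtraction and comparison"
          severity failure;
      end loop;
    end loop;
    wait;  -- all pairs checked, stop the process
  end process;
end SIM;
```

The extra bit acts as a borrow flag: when the second operand is smaller, the subtraction wraps around and sets it, which is the same condition the comparison in RTL1 evaluates.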

  36. Think about hardware

  37. Think about hardware
      architecture RTL2 of LE64 is
      begin
        process (DATA1, DATA2)
          variable x, y, z : unsigned (64 downto 0);
        begin
          x := '0' & unsigned(DATA1);
          y := '0' & unsigned(DATA2);
          z := y - x;
          if (z(64) = '0') then
            LE <= '1';
          else
            LE <= '0';
          end if;
        end process;
      end RTL2;

  38. Think about hardware

  39. Think about hardware • Use concurrent signal assignments if possible
      library IEEE;
      use IEEE.std_logic_1164.all;
      use IEEE.std_logic_arith.all;
      entity RESOURCE is
        port (
          RSTn, CLOCK : in  std_logic;
          SEL         : in  std_logic_vector(1 downto 0);
          A, B, C, D  : in  std_logic_vector(15 downto 0);
          DOUT        : out std_logic_vector(15 downto 0));
      end RESOURCE;

  40. Think about hardware • Architecture RTL computes the four decrements with concurrent signal assignments outside the process. Architecture RTL1 performs them inside the mutually exclusive branches of the case statement, where the hardware can be optimized.
      architecture RTL of RESOURCE is
        signal A1, B1, C1, D1 : std_logic_vector(15 downto 0);
      begin
        A1 <= unsigned(A) - 1;
        B1 <= unsigned(B) - 1;
        C1 <= unsigned(C) - 1;
        D1 <= unsigned(D) - 1;
        seq : process (RSTn, CLOCK)
        begin
          if (RSTn = '0') then
            DOUT <= (others => '0');
          elsif (CLOCK'event and CLOCK = '1') then
            case SEL is
              when "00"   => DOUT <= A1;
              when "01"   => DOUT <= B1;
              when "10"   => DOUT <= C1;
              when others => DOUT <= D1;
            end case;
          end if;
        end process;
      end RTL;

      architecture RTL1 of RESOURCE is
      begin
        seq : process (RSTn, CLOCK)
        begin
          if (RSTn = '0') then
            DOUT <= (others => '0');
          elsif (CLOCK'event and CLOCK = '1') then
            case SEL is
              when "00"   => DOUT <= unsigned(A) - 1;
              when "01"   => DOUT <= unsigned(B) - 1;
              when "10"   => DOUT <= unsigned(C) - 1;
              when others => DOUT <= unsigned(D) - 1;
            end case;
          end if;
        end process;
      end RTL1;

  41. Synthesized schematic for RTL of RESOURCE • delay 47 ns • area 81 • number of flip-flops 16

  42. Synthesized schematic for RTL1 of RESOURCE • delay 57 ns • area 65 • number of flip-flops 16

  43. Use of subprogram • Note: • Use functions and procedures to improve readability • They can be defined in a package to be shared. • From the synthesis point of view, functions and procedures are lumped with other VHDL constructs for optimization. • Synthesis tools do not optimize beyond the boundaries of an entity • Synthesis tools may restrict the use of signal attributes inside a subprogram.
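
As a small illustration of a function placed in a package so that several designs can share it, the sketch below defines a parity function. The package name `PARITY_PKG` and the function name `PARITY` are hypothetical and are not taken from the slides.

```vhdl
library IEEE;
use IEEE.std_logic_1164.all;

-- Hypothetical package holding a shared helper function.
package PARITY_PKG is
  function PARITY (V : std_logic_vector) return std_logic;
end PARITY_PKG;

package body PARITY_PKG is
  -- Returns the XOR of all bits of V ('1' for an odd number of ones).
  function PARITY (V : std_logic_vector) return std_logic is
    variable RESULT : std_logic := '0';
  begin
    for I in V'range loop
      RESULT := RESULT xor V(I);
    end loop;
    return RESULT;
  end PARITY;
end PARITY_PKG;
```

A design would make the function visible with `use work.PARITY_PKG.all;` and call it in an ordinary signal assignment; the loop unrolls into an XOR tree that the tool optimizes together with the surrounding logic.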

  44. Synthesis process

  45. Synthesis process • Input delay (hold time) specifies the duration of time for the input signals to be stable • Output delay (setup time) specifies the time by which the output signals must be ready before the rising edge of the clock

  46. Synthesis process • Example design with delay constraints
      library IEEE;
      use IEEE.std_logic_1164.all;
      entity CONSYN is
        port (
          RSTn, CLK : in std_logic;
          D0, D1, D2, D3, D4, D5, D6, D7 : in std_logic;
          FF_OUT, COMB_OUT, FF_COMB_OUT : out std_logic);
      end CONSYN;

      architecture RTL of CONSYN is
        signal XOR8, FF : std_logic;
      begin
        XOR8 <= D7 xor D6 xor D5 xor D4 xor D3 xor D2 xor D1 xor D0;
        COMB_OUT <= XOR8;
        FF_OUT <= FF;
        FF_COMB_OUT <= FF xor D5 xor D4 xor D3 xor D2 xor D1 xor D0;
        p0 : process (RSTn, CLK)
        begin
          if (RSTn = '0') then
            FF <= '0';
          elsif (CLK'event and CLK = '1') then
            FF <= XOR8;
          end if;
        end process;
      end RTL;

  47. Synthesis process • The following constraints are used to synthesize the circuit
      read -f vhdl "consyn.vhd"          -- reads source code file
      include compile.common             -- command file which specifies
                                         -- wires, operating conditions, library
      -- next 5 lines specify timing constraints
      create_clock "CLK" -name clk -period 10.0 -waveform {0.0 5.0}
      set_clock_skew -uncertainty 1.0 clk
      set_input_delay 1.0 -add_delay -clock clk {D*}
      set_output_delay 3.0 -add_delay -clock clk {COMB_OUT}
      set_output_delay 3.0 -add_delay -clock clk {FF_COMB_OUT}
      set_dont_touch_network {CLK RSTn}  -- clock and reset latched directly
      compile -map_effort medium         -- start the synthesis
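
The constraint values on this slide translate into time budgets for the logic between ports and registers. The arithmetic below is a sketch under a stated assumption, namely that the tool subtracts the clock uncertainty and the external input and output delays from the clock period; the slides do not spell this out. The flip-flop setup time and clock-to-output delay come from the target library and are not given in the slides, so they stay symbolic.

```latex
\begin{align*}
T &= 10.0\ \text{ns}, \quad t_{\text{unc}} = 1.0\ \text{ns}, \quad
     t_{\text{in}} = 1.0\ \text{ns}, \quad t_{\text{out}} = 3.0\ \text{ns} \\
t_{\text{comb}}(\text{D*} \to \text{COMB\_OUT})
  &\le T - t_{\text{unc}} - t_{\text{in}} - t_{\text{out}}
   = 10.0 - 1.0 - 1.0 - 3.0 = 5.0\ \text{ns} \\
t_{\text{comb}}(\text{D*} \to \text{FF})
  &\le T - t_{\text{unc}} - t_{\text{in}} - t_{\text{setup}}
   = 8.0\ \text{ns} - t_{\text{setup}} \\
t_{\text{comb}}(\text{FF} \to \text{FF\_COMB\_OUT})
  &\le T - t_{\text{unc}} - t_{\text{out}} - t_{\text{clk-q}}
   = 6.0\ \text{ns} - t_{\text{clk-q}}
\end{align*}
```

Under the same assumption, raising an input delay shrinks the budget of every path that starts at that input, which gives the tool a reason to place late-arriving inputs closer to the output.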

  48. Synthesis process

  49. Synthesis process • If the same circuit is synthesized with different delay constraints, the synthesis results in a different architecture
      set_input_delay 1.0 -add_delay -clock clk {D0}
      set_input_delay 2.0 -add_delay -clock clk {D1}
      set_input_delay 3.0 -add_delay -clock clk {D2}
      set_input_delay 4.0 -add_delay -clock clk {D3}
      set_input_delay 5.0 -add_delay -clock clk {D4}
      set_input_delay 6.0 -add_delay -clock clk {D5 D6 D7}
      set_output_delay 2.0 -add_delay -clock clk {COMB_OUT}
      set_output_delay 3.0 -add_delay -clock clk {FF_COMB_OUT}

  50. Synthesis process
