ECE 448: Spring 11
Lab 3
Sequential Logic for Synthesis
FPGA Design Flow Based on Aldec Active-HDL
Agenda for today
• Part 1: Introduction to the new Lab Assignment: Square Root Unit based on CORDIC
• Part 2: FPGA Design Flow based on Aldec Active-HDL
  - using Xilinx XST
  - using Synplify Premier DP
• Part 3: Demos of Lab 2
Part 1: Introduction to the new Lab Assignment: Square Root Unit based on CORDIC
CORDIC Algorithms - Motivation
• Operations such as trigonometric functions, division, and logarithms are not directly synthesizable by typical synthesis tools.
• Some alternative methods:
  • Lookup tables: can require large amounts of memory.
  • Taylor/Maclaurin series: require multipliers.
  • CORDIC algorithms: small area (inexpensive in hardware), but high latency.
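CORDIC Algorithm for Square Root
• Calculates sqrt_x = $\lfloor \sqrt{x} \rfloor$ for an N-bit unsigned x; the result has N/2 bits.
• Pseudocode:
    y = 0
    for i = N/2-1 downto 0 do
      temp = (y + 2^i)^2
      if temp ≤ x then
        y = y + 2^i
      end if
    end for
    sqrt_x = y
• $(y + 2^i)^2 = y^2 + 2^{i+1}y + 2^{2i}$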
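For simulation it can help to have the first pseudocode as a behavioral golden model. The package below is a minimal sketch, assuming VHDL-93 with ieee.numeric_std and an even input width N; the names `sqrt_ref_pkg` and `isqrt_direct` are hypothetical, not part of the lab handout. It keeps the squaring (a multiplier), so it is a reference for checking a design rather than the intended hardware.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical reference package (name chosen for this sketch only).
package sqrt_ref_pkg is
  -- floor(sqrt(x)) for an N-bit x; N assumed even, result has N/2 bits.
  function isqrt_direct(x : unsigned) return unsigned;
end package sqrt_ref_pkg;

package body sqrt_ref_pkg is
  function isqrt_direct(x : unsigned) return unsigned is
    constant N     : natural := x'length;
    variable y     : unsigned(N/2 - 1 downto 0) := (others => '0');
    variable trial : unsigned(N/2 - 1 downto 0);
    variable temp  : unsigned(N - 1 downto 0);
  begin
    for i in N/2 - 1 downto 0 loop
      -- y + 2^i: bits i and below of y are still 0, so adding 2^i sets bit i
      trial    := y;
      trial(i) := '1';
      temp     := trial * trial;  -- (y + 2^i)^2, exactly N bits wide
      if temp <= x then
        y := trial;
      end if;
    end loop;
    return y;
  end function isqrt_direct;
end package body sqrt_ref_pkg;
```

For example, `isqrt_direct(to_unsigned(26, 8))` returns the 4-bit value 5, matching the worked example.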
Modified Pseudocode
    y = 0
    y_sq = 0
    for i = N/2-1 downto 0 do
      temp = y_sq + 2^(i+1)·y + 2^(2i)
      if temp ≤ x then
        y = y + 2^i
        y_sq = temp
      end if
    end for
    sqrt_x = y
• All computations are performed using only additions, bit shifts, and comparisons: 2^(i+1)·y is y shifted left by i+1 bits, and 2^(2i) is a 1 shifted left by 2i bits.
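The modified pseudocode can be sketched the same way with the squaring removed, using only `+`, `shift_left`, and `<=`. This is again a hypothetical package (`sqrt_shift_pkg`, `isqrt_shift_add`), assuming numeric_std and an even N. Because y_sq always equals y², temp equals (y + 2^i)² ≤ (2^(N/2) − 1)² < 2^N, so N bits suffice for y_sq and temp and no addition overflows.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical package name; the algorithm follows the modified pseudocode.
package sqrt_shift_pkg is
  -- floor(sqrt(x)) for an N-bit x (N assumed even), N/2-bit result,
  -- computed with additions, shifts and comparisons only.
  function isqrt_shift_add(x : unsigned) return unsigned;
end package sqrt_shift_pkg;

package body sqrt_shift_pkg is
  function isqrt_shift_add(x : unsigned) return unsigned is
    constant N    : natural := x'length;
    variable y    : unsigned(N/2 - 1 downto 0) := (others => '0');
    variable y_sq : unsigned(N - 1 downto 0)   := (others => '0');
    variable temp : unsigned(N - 1 downto 0);
  begin
    for i in N/2 - 1 downto 0 loop
      temp := y_sq
              + shift_left(resize(y, N), i + 1)        -- 2^(i+1) * y
              + shift_left(to_unsigned(1, N), 2 * i);  -- 2^(2i)
      if temp <= x then
        y    := y + shift_left(to_unsigned(1, N/2), i);  -- y + 2^i
        y_sq := temp;
      end if;
    end loop;
    return y;
  end function isqrt_shift_add;
end package body sqrt_shift_pkg;
```

Keeping y_sq in a register is what removes the multiplier: each iteration only adds two shifted terms to the previous square.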
Example
• N = 8, x = 26
• i = 3: temp = 0 + 2(0)(8) + 8^2 = 64 > 26, so y = 0, y_sq = 0
• i = 2: temp = 0 + 2(0)(4) + 4^2 = 16 ≤ 26, so y = 4, y_sq = 16
• i = 1: temp = 16 + 2(4)(2) + 2^2 = 36 > 26, so y = 4, y_sq = 16
• i = 0: temp = 16 + 2(4)(1) + 1^2 = 25 ≤ 26, so y = 5, y_sq = 25
• Done! sqrt_x = 5
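To reproduce a trace like the one above in a simulator, a simulation-only sketch can report one line per iteration. It uses VHDL integers instead of bit vectors for readability, so it assumes a small N (well below 32); the entity name `sqrt_trace`, its generics `N` and `X`, and the `result` port are hypothetical.

```vhdl
-- Simulation-only trace of the modified pseudocode (integer arithmetic).
entity sqrt_trace is
  generic (
    N : positive := 8;   -- input width (assumed even and small)
    X : natural  := 26); -- input value
  port (result : out natural);
end entity sqrt_trace;

architecture sim of sqrt_trace is
begin
  process
    variable y, y_sq, temp : natural := 0;
  begin
    for i in N/2 - 1 downto 0 loop
      temp := y_sq + (2 ** (i + 1)) * y + 2 ** (2 * i);
      if temp <= X then
        y    := y + 2 ** i;
        y_sq := temp;
      end if;
      -- one line per iteration: temp, then y and y_sq after the decision
      report "i = " & integer'image(i) & ", temp = " & integer'image(temp)
           & ", y = " & integer'image(y) & ", y_sq = " & integer'image(y_sq);
    end loop;
    result <= y;
    wait;
  end process;
end architecture sim;
```

With the defaults (N => 8, X => 26) the reported values match the four rows of the example and `result` becomes 5. A generic map of N => 12, X => 672 replays the bonus example (2M = 12 bits, x_shifted = 672) and yields 25.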
Block Diagram
[Figure: Block diagram of the square root unit: an x register (load); an N/2-bit y register and an N-bit y_sq register (en, rst); an N/2-bit shift register initialized to "1000…000" that supplies 2^i; a down counter for i loaded with N/2-1; shifters computing 2^(i+1)·y and (2^i)(2^i) = 2^(2i); adders forming temp; an A ≥ B comparator; control signals in_valid, load, ld_en, en, rst, out_valid. L = ceil(log2(N)).]
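One possible mapping of the block diagram onto RTL is sketched below, with one iteration of the modified pseudocode per clock cycle and the down counter i driving the two shifters. It is a sketch under assumptions, not the reference solution: the entity name `sqrt_seq`, the synchronous active-high `rst`, the busy/out_valid handshake, and replacing the "1000…000" shift register with a constant 1 shifted left by i are all choices made for this example.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical sequential square root unit (one iteration per clock).
entity sqrt_seq is
  generic (N : positive := 8);  -- input width, assumed even
  port (
    clk       : in  std_logic;
    rst       : in  std_logic;  -- synchronous, active high (assumption)
    in_valid  : in  std_logic;  -- starts a new computation when idle
    x         : in  std_logic_vector(N - 1 downto 0);
    out_valid : out std_logic;  -- high while sqrt_x holds a finished result
    sqrt_x    : out std_logic_vector(N/2 - 1 downto 0));
end entity sqrt_seq;

architecture rtl of sqrt_seq is
  signal x_reg : unsigned(N - 1 downto 0)   := (others => '0');
  signal y     : unsigned(N/2 - 1 downto 0) := (others => '0');
  signal y_sq  : unsigned(N - 1 downto 0)   := (others => '0');
  signal i     : natural range 0 to N/2 - 1 := 0;  -- down counter
  signal busy  : std_logic := '0';
  signal temp  : unsigned(N - 1 downto 0);
begin
  -- temp = y_sq + 2^(i+1)*y + 2^(2i), built from shifters and adders only
  temp <= y_sq
          + shift_left(resize(y, N), i + 1)
          + shift_left(to_unsigned(1, N), 2 * i);

  process (clk)
  begin
    if rising_edge(clk) then
      if rst = '1' then
        busy      <= '0';
        out_valid <= '0';
      elsif busy = '0' then
        if in_valid = '1' then
          -- load: capture x, clear y and y_sq, start at i = N/2-1
          x_reg     <= unsigned(x);
          y         <= (others => '0');
          y_sq      <= (others => '0');
          i         <= N/2 - 1;
          busy      <= '1';
          out_valid <= '0';
        end if;
      else
        -- one iteration: keep bit i of the result if temp <= x
        if temp <= x_reg then
          y    <= y + shift_left(to_unsigned(1, N/2), i);  -- y + 2^i
          y_sq <= temp;
        end if;
        if i = 0 then
          busy      <= '0';
          out_valid <= '1';
        else
          i <= i - 1;
        end if;
      end if;
    end if;
  end process;

  sqrt_x <= std_logic_vector(y);
end architecture rtl;
```

After the clock edge that samples in_valid = '1', the unit needs N/2 further clock cycles; out_valid then stays high, with sqrt_x valid, until the next start.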
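Bonus
• Make the output width M a variable (generic), with M ≥ N/2.
• Allows greater output precision.
• Output is of form: sqrt_x ≈ $\sqrt{x} \cdot 2^{M - N/2}$, i.e., an M-bit fixed-point number with N/2 integer bits and M − N/2 fractional bits.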
Bonus Pseudocode
    y = 0
    y_sq = 0
    x_shifted = x << (2M - N)
    for i = M-1 downto 0 do
      temp = y_sq + 2^(i+1)·y + 2^(2i)
      if temp ≤ x_shifted then
        y = y + 2^i
        y_sq = temp
      end if
    end for
    sqrt_x = y
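The bonus pseudocode differs from the basic one only in the pre-shift of x and in the widths (M-bit y, 2M-bit y_sq and temp). A minimal sketch as a function follows, assuming numeric_std and M ≥ N/2 so that the shift amount 2M − N is non-negative; the names `sqrt_frac_pkg` and `isqrt_frac` are hypothetical.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical package following the bonus pseudocode. Assumes 2*M >= x'length.
package sqrt_frac_pkg is
  -- M-bit result equal to floor(sqrt(x * 2**(2M - N))), i.e. sqrt(x)
  -- with M - N/2 fractional bits (truncated).
  function isqrt_frac(x : unsigned; M : positive) return unsigned;
end package sqrt_frac_pkg;

package body sqrt_frac_pkg is
  function isqrt_frac(x : unsigned; M : positive) return unsigned is
    constant N         : natural := x'length;
    -- x_shifted = x << (2M - N), held in 2M bits
    constant x_shifted : unsigned(2 * M - 1 downto 0) :=
      shift_left(resize(x, 2 * M), 2 * M - N);
    variable y    : unsigned(M - 1 downto 0)     := (others => '0');
    variable y_sq : unsigned(2 * M - 1 downto 0) := (others => '0');
    variable temp : unsigned(2 * M - 1 downto 0);
  begin
    for i in M - 1 downto 0 loop
      temp := y_sq
              + shift_left(resize(y, 2 * M), i + 1)        -- 2^(i+1) * y
              + shift_left(to_unsigned(1, 2 * M), 2 * i);  -- 2^(2i)
      if temp <= x_shifted then
        y    := y + shift_left(to_unsigned(1, M), i);      -- y + 2^i
        y_sq := temp;
      end if;
    end loop;
    return y;
  end function isqrt_frac;
end package body sqrt_frac_pkg;
```

With M = N/2 the pre-shift is zero and the function reduces to the basic integer square root; `isqrt_frac(to_unsigned(42, 8), 6)` gives 25, as in the bonus example.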
Example
• N = 8, M = 6, x = 42
• x_shifted = 42 << (2(6) - 8) = 42 << 4 = 672
• i = 5: temp = 0 + 2(0)(32) + 32^2 = 1024 > 672, so y = 0, y_sq = 0
• i = 4: temp = 0 + 2(0)(16) + 16^2 = 256 ≤ 672, so y = 16, y_sq = 256
• i = 3: temp = 256 + 2(16)(8) + 8^2 = 576 ≤ 672, so y = 24, y_sq = 576
• i = 2: temp = 576 + 2(24)(4) + 4^2 = 784 > 672, so y = 24, y_sq = 576
• i = 1: temp = 576 + 2(24)(2) + 2^2 = 676 > 672, so y = 24, y_sq = 576
• i = 0: temp = 576 + 2(24)(1) + 1^2 = 625 ≤ 672, so y = 25, y_sq = 625
• Done! sqrt_x = 25
• 25/2^2 = 6.25
• Check: sqrt(42) = 6.481
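The division by 2^2 in the example follows from the pre-shift of x. A short derivation, assuming M ≥ N/2 so that 2M − N ≥ 0 (here k = M − N/2 is the number of fractional bits):

```latex
\begin{aligned}
x_{\text{shifted}} &= x \cdot 2^{2M-N} \\
\sqrt{x_{\text{shifted}}} &= \sqrt{x}\cdot 2^{(2M-N)/2} = \sqrt{x}\cdot 2^{k}, \qquad k = M - \tfrac{N}{2} \\
\text{sqrt\_x} = \left\lfloor \sqrt{x_{\text{shifted}}} \right\rfloor
  &\;\Rightarrow\; \frac{\text{sqrt\_x}}{2^{k}} \le \sqrt{x} < \frac{\text{sqrt\_x} + 1}{2^{k}}
\end{aligned}
```

For N = 8, M = 6, x = 42 this gives k = 2 and 25/4 = 6.25 ≤ 6.481 < 26/4 = 6.5, so the output is √x truncated to two fractional bits.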
Bonus Diagram
[Figure: Block diagram for the bonus unit: as in the basic unit, but x is first shifted left by (2M-N) into a 2M-bit value; y and the "1000…000" shift register are M bits wide; y_sq and temp are 2M bits wide; the down counter for i is loaded with M-1. L = ceil(log2(2M)).]
Part 2: FPGA Design Flow based on Aldec Active-HDL
FPGA Design process (1)
• Specification (Lab Assignments), for example: "Design and implement a simple unit that speeds up encryption with an RC5-like cipher with a fixed key set on an 8031 microcontroller. Unlike in experiment 5, this time your unit has to perform the encryption algorithm by itself, executing 32 rounds….."
• On-paper hardware design (Block diagram & ASM chart)
• VHDL description (Your Source Files), for example:
    library IEEE;
    use IEEE.std_logic_1164.all;
    use IEEE.numeric_std.all;  -- standard arithmetic package (instead of std_logic_unsigned)

    entity RC5_core is
      port(
        clock, reset, encr_decr : in  std_logic;
        data_input  : in  std_logic_vector(31 downto 0);
        data_output : out std_logic_vector(31 downto 0);
        out_full    : in  std_logic;
        key_input   : in  std_logic_vector(31 downto 0);
        key_read    : out std_logic  -- no semicolon after the last port
      );
    end RC5_core;
• Functional simulation
• Synthesis
• Post-synthesis simulation
FPGA Design process (2)
• Implementation
• Timing simulation
• Configuration
• On-chip testing
Synthesis Tools
• Synplify Premier DP
• Xilinx XST
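Logic Synthesis
• VHDL description → Circuit netlist
• VHDL description (example):
    library IEEE;
    use IEEE.STD_LOGIC_1164.all;

    -- entity reconstructed from the signals used in the architecture
    entity MLU is
      port(
        A, B, NEG_A, NEG_B, NEG_Y, L1, L0 : in  STD_LOGIC;
        Y                                 : out STD_LOGIC
      );
    end MLU;

    architecture MLU_DATAFLOW of MLU is
      signal A1 : STD_LOGIC;
      signal B1 : STD_LOGIC;
      signal Y1 : STD_LOGIC;
      signal MUX_0, MUX_1, MUX_2, MUX_3 : STD_LOGIC;
      signal SEL : STD_LOGIC_VECTOR(1 downto 0);  -- operation select code
    begin
      -- optional inversion of the inputs and of the output
      A1 <= A when (NEG_A = '0') else not A;
      B1 <= B when (NEG_B = '0') else not B;
      Y  <= Y1 when (NEG_Y = '0') else not Y1;

      MUX_0 <= A1 and B1;
      MUX_1 <= A1 or B1;
      MUX_2 <= A1 xor B1;
      MUX_3 <= A1 xnor B1;

      -- a named select signal keeps the select expression legal in VHDL-93
      SEL <= L1 & L0;
      with SEL select
        Y1 <= MUX_0 when "00",
              MUX_1 when "01",
              MUX_2 when "10",
              MUX_3 when others;
    end MLU_DATAFLOW;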
Implementation
• After synthesis, the entire implementation process is performed by the FPGA vendor tools: Xilinx ISE/WebPACK.