ECE 656M Embedded Systems Design And Prototyping Term 3, 2011-2012
Cesar A. Llorente
Research and teaching interests:
• reconfigurable computing
• machine vision
• energy systems
Contact: Electronics and Communications Engineering, College of Engineering, cesar.llorente@dlsu.edu.ph
ECE 545
Lecture
• Homework: 10 %
Projects
• Project 1: 30 %
• Project 2: 20 %
Exams
• Quiz (in class): 20 %
• Final (take home): 20 %
Lecture (1)
• Lecture 1 - Introduction to Embedded Systems
• Lecture 2 - Introduction to VHDL Combinational Logic. Packages and Components.
• Hands-on Session 1: XST Synthesis and Simulation
• Lecture 3 - Behavioral Modeling of Sequential Logic. Registers, Counters, Shift Registers. Simple Testbenches.
• Lecture 4 - Introduction to FPGA Devices & Tools
• Hands-on Session 2: Tools for FPGA Synthesis and Implementation
• Lecture 5 - Finite State Machines
• Lecture 6 - Algorithmic State Machines. Memories: RAM, ROM.
• Lecture 7 - Advanced Testbenches. File I/O.
• Lecture 8 - Mixed Style RTL Modeling
• Quiz 1
Textbooks
Required Textbooks:
• Volnei A. Pedroni, Circuit Design with VHDL, The MIT Press, 2004
• Sundar Rajan, Essential VHDL: RTL Synthesis Done Right, S & G Publishing, 1998
Supplementary Textbooks:
• Stephen Brown and Zvonko Vranesic, Fundamentals of Digital Logic with VHDL Design, 2nd Edition, McGraw-Hill, 2005
• Peter J. Ashenden, The Designer's Guide to VHDL, 2nd Edition, San Francisco: Morgan Kaufmann, 1996, 2002
Quiz
• 2 hours 30 minutes
• in class
• design-oriented
• open book, open notes
Tentative date:
Final Examination
• take-home
• full design, including logic synthesis and timing analysis
• for FPGAs
Tentative date:
Project technologies FPGA: Field Programmable Gate Arrays
World of Integrated Circuits
Integrated Circuits
• Full-Custom ASICs
• Semi-Custom ASICs
• User Programmable
  • PLD: PAL, PLA, PML
  • FPGA: LUT (Look-Up Table), MUX, Gates
Two competing implementation approaches
FPGA (Field Programmable Gate Array)
• bought off the shelf and reconfigured by designers themselves
• no physical layout design; design ends with a bitstream used to configure a device
ASIC (Application Specific Integrated Circuit)
• designs must be sent for expensive and time-consuming fabrication in a semiconductor foundry
• designed all the way from behavioral description to physical layout
Which Way to Go?
ASICs
• High performance
• Low power
• Low cost in high volumes
FPGAs
• Off-the-shelf
• Low development cost
• Short time to market
• Reconfigurability
What is an FPGA Chip?
• Field Programmable Gate Array
• A chip that can be configured by the user to implement different digital hardware
• Configurable Logic Blocks and Programmable Switch Matrices
• Bitstream to configure: the function of each block and the interconnection between logic blocks
[Figure: FPGA chip diagram with I/O Blocks labeled]
Source: [Brown99]
CLB Slice
[Figure: slice containing two Look-Up Tables (inputs G4-G1 and F4-F1), two Carry & Control Logic blocks, and two flip-flops (D, Q, CK, EC, S, R); other labeled signals: CIN, COUT, CLK, CE, SR, BY, F5IN, X, Y, XB, YB]
LUT (Look-Up Table) Functionality
• Look-Up Tables are the primary elements for logic implementation
• Each LUT can implement any function of 4 inputs
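A 4-input LUT can be thought of as a 16-entry truth table addressed by its inputs. The sketch below is a behavioral model only, not a vendor primitive; the entity name lut4_model, the generic INIT, and the port names i and o are assumptions made for illustration.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical behavioral model of a 4-input look-up table.
entity lut4_model is
  generic (
    -- 16-bit truth table; bit n is the output for input value n.
    -- x"6996" encodes the 4-input XOR (parity) function.
    INIT : std_logic_vector(15 downto 0) := x"6996"
  );
  port (
    i : in  std_logic_vector(3 downto 0);
    o : out std_logic
  );
end entity lut4_model;

architecture behavioral of lut4_model is
begin
  -- The four inputs select one of the 16 stored bits.
  o <= INIT(to_integer(unsigned(i)));
end architecture behavioral;
```

Changing INIT changes the implemented function without changing the structure, which is why one LUT can realize any function of 4 inputs.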
Major FPGA Vendors
SRAM-based FPGAs
• Xilinx, Inc.
• Altera Corp.
  (Xilinx and Altera share over 60% of the market)
• Atmel
• Lattice Semiconductor
Flash & antifuse FPGAs
• Actel Corp.
• QuickLogic Corp.
Xilinx FPGA Families
• Old families
  • XC3000, XC4000, XC5200
  • Old 0.5µm, 0.35µm and 0.25µm technology. Not recommended for modern designs.
• Low-cost families
  • Spartan/XL - derived from XC4000
  • Spartan-II - derived from Virtex
  • Spartan-IIE - derived from Virtex-E
  • Spartan-3
• High-performance families
  • Virtex (0.22µm)
  • Virtex-E, Virtex-EM (0.18µm)
  • Virtex-II, Virtex-II PRO (0.13µm)
  • Virtex-4 (0.09µm)
Design process (1)
Specification
Design and implement a simple unit that speeds up encryption with an RC5-like cipher with a fixed key set on an 8031 microcontroller. Unlike in experiment 5, this time your unit has to be able to perform an encryption algorithm by itself, executing 32 rounds…..
→ VHDL description (your VHDL source files)
    library ieee;
    use ieee.std_logic_1164.all;
    use ieee.std_logic_unsigned.all;

    entity RC5_core is
      port(
        clock, reset, encr_decr : in  std_logic;
        data_input  : in  std_logic_vector(31 downto 0);
        data_output : out std_logic_vector(31 downto 0);
        out_full    : in  std_logic;
        key_input   : in  std_logic_vector(31 downto 0);
        key_read    : out std_logic   -- no semicolon after the last port
      );
    end RC5_core;
→ Functional simulation
→ Synthesis
→ Post-synthesis simulation
Design process (2)
→ Implementation (Mapping, Placing & Routing)
→ Timing simulation
→ Configuration
→ On-chip testing
Simulation Tools Many others…
Logic Synthesis
VHDL description → Circuit netlist
    architecture MLU_DATAFLOW of MLU is
      signal A1 : STD_LOGIC;
      signal B1 : STD_LOGIC;
      signal Y1 : STD_LOGIC;
      signal MUX_0, MUX_1, MUX_2, MUX_3 : STD_LOGIC;
    begin
      A1 <= A when (NEG_A = '0') else not A;
      B1 <= B when (NEG_B = '0') else not B;
      Y  <= Y1 when (NEG_Y = '0') else not Y1;

      MUX_0 <= A1 and B1;
      MUX_1 <= A1 or B1;
      MUX_2 <= A1 xor B1;
      MUX_3 <= A1 xnor B1;

      with (L1 & L0) select
        Y1 <= MUX_0 when "00",
              MUX_1 when "01",
              MUX_2 when "10",
              MUX_3 when others;
    end MLU_DATAFLOW;
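The MLU_DATAFLOW architecture needs a matching entity declaration before it can be compiled. The slide does not show one, so the sketch below is an assumption: the port names are taken from the signals the architecture uses, and every port is assumed to be a single STD_LOGIC bit.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical entity for the MLU_DATAFLOW architecture.
-- Port names are inferred from the architecture body; directions
-- follow from how each signal is read or driven there.
entity MLU is
  port (
    NEG_A, NEG_B, NEG_Y : in  std_logic;  -- optional inversion of A, B and Y
    A, B                : in  std_logic;  -- data inputs
    L1, L0              : in  std_logic;  -- operation select: and, or, xor, xnor
    Y                   : out std_logic   -- result
  );
end entity MLU;
```

Some tools that follow older VHDL revisions may reject the selector expression (L1 & L0) in the with-select statement; assigning the concatenation to a two-bit signal first is a common workaround.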
Synthesis Tools … and others
Features of synthesis tools
• Interpret RTL code
• Produce a synthesized circuit netlist in the standard EDIF format
• Give preliminary performance estimates
• Some can display circuit schematics corresponding to the EDIF netlist
Implementation
• After synthesis, the entire implementation process is performed by FPGA vendor tools
Mapping
[Figure: LUT0-LUT5 and flip-flops FF1, FF2]
Placing
[Figure: FPGA with CLB slices]
Routing
[Figure: FPGA with programmable connections]
Top Level ASIC Digital Design Flow
Design Inception → RTL Design → Synthesis + Macro Development → Place + Route → Physical Verification → Design Complete
RTL Design (from Design Inception to Synthesis + Macro Development)
Design Function: Digital Tool
• RTL Design: Cadence NC Verilog, Mentor Graphics ModelSim
• Lint Checking: Cadence Hal (user's discretion)
• FPGA Verification: Xilinx ISE (user's discretion)
• Code Coverage: Cadence ICT (user's discretion)
• Testbench Development: Cadence NC Verilog, Mentor Graphics ModelSim
• Mixed Mode Simulation: Cadence AMS Designer
• Formal Verification: Cadence Conformal
• System Interface Simulation: Agilent ADS, Matlab
Synthesis + Macro Development (from RTL to Place + Route)
Design Function: Digital Tool
• Synthesis: Synopsys DC, Cadence RC
• Macro Generation: Artisan
• DFT: Synopsys DFT Compiler, Cadence RC
• Macro Verification: Mentor Graphics Calibre
• Static Timing Analysis: Synopsys PrimeTime
• Macro Rules Generation / Library Generation: Artisan / Cadence DFII
• Logical Equivalency Verification: Cadence Conformal
• Gate-Level Simulation: Cadence NC Verilog, Mentor Graphics ModelSim
Place + Route (from Synthesis to Verification)
Design Function: Digital Tool
• Floorplan, Macro Placement / Std Cell Placement, Placement-Based Optimization, Clock Tree Synthesis: Cadence Encounter
• Static Timing Analysis: Synopsys PrimeTime
• Route: Cadence NanoRoute
• Spare Cells / Decoupling Cap Filler Cells: Cadence Encounter
• ATPG: Mentor Graphics FastScan
• RC Extraction: Cadence Fire & Ice QX
• Signal Integrity: Cadence CeltIC / VoltageStorm
• Metal Fill: Cadence Encounter
Physical Verification (from Placed + Routed Design to Design Complete)
Design Function: Digital Tool
• GDSII Preparation: Cadence DFII
• Simulation Preparation / Schematic Preparation: Cadence DFII
• Layout Chip Finishing: Cadence Virtuoso
• Back Annotated Simulation: Cadence NC Verilog
• DRC, LVS, ERC: Mentor Graphics Calibre
• Top-Level Simulation: Synopsys Nanosim, Cadence AMS Designer
CAD software available at DLSU (1)
VHDL simulators
• Xilinx ISE 12.3 (under Windows)
  • available in the STRC111 Intel Microprocessors Lab
  • Free Student Edition: ISE WebPack
• VCS (under Linux)
  • available in the STRC111 Intel Microprocessors Lab
CAD software available at DLSU (2)
Tools used for logic synthesis
FPGA synthesis
• Xilinx XST / EDK / SDK (under Windows)
  • available in the STRC111 Intel Microprocessors Lab
CAD software available at DLSU (3)
Tools used for implementation (mapping, placing & routing) in the FPGA technology
FPGA synthesis
• Xilinx XST (under Windows)
  • available in the STRC111 Intel Microprocessors Lab
Projects - Overview
Project 1 (35 points), January to February (~6 weeks)
• Application: Game Application using Microblaze Processor
• Technology: FPGA
• Target: synthesizable code, downloadable code
Project 2 (35 points), March (~4 weeks)
• Application: Game Software using state machines
• Technology: FPGA
• Target: synthesizable code, downloadable code
Projects 1, 2
• choice between two project topics
  • cryptography (e.g., encryption, authentication, hash)
  • digital signal processing (e.g., digital filter, FFT, image processing, etc.)
• both topics specified by the instructor
• initial specification in the form of
  - pseudocode and/or flowchart
  - detailed interface
• design and source code are required to be scalable, i.e., work for different parameters and operand sizes, specified at the time of synthesis
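In VHDL, scalability at synthesis time is usually expressed with generics. The sketch below is a minimal illustration, not a project template; the entity name scalable_adder, the generic W, and the port names are assumptions.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical example of a scalable unit: addition modulo 2**W.
entity scalable_adder is
  generic (
    W : positive := 32  -- operand size, fixed when the design is synthesized
  );
  port (
    a, b : in  std_logic_vector(W-1 downto 0);
    sum  : out std_logic_vector(W-1 downto 0)
  );
end entity scalable_adder;

architecture rtl of scalable_adder is
begin
  -- unsigned "+" returns a W-bit result, so the carry out is discarded
  sum <= std_logic_vector(unsigned(a) + unsigned(b));
end architecture rtl;
```

The same source can then be instantiated with, for example, generic map (W => 16) or (W => 64) without editing the architecture.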
Example: Last year's project - RC6 cipher
(+ and - are modulo 2^w, ⊕ is bitwise XOR, <<< and >>> are rotations, w is the word size in bits, r is the number of rounds)

Decryption
    Input: (A, B, C, D), Table S[0..2r+3]
    C = C - S[2r+3]
    A = A - S[2r+2]
    for i = r downto 1 do {
        (A, B, C, D) = (D, A, B, C)
        u = (D*(2D+1)) <<< log2(w)
        t = (B*(2B+1)) <<< log2(w)
        C = ((C - S[2i+1]) >>> t) ⊕ u
        A = ((A - S[2i]) >>> u) ⊕ t
    }
    D = D - S[1]
    B = B - S[0]
    Output: (A, B, C, D)

Encryption
    Input: (A, B, C, D), Table S[0..2r+3]
    B = B + S[0]
    D = D + S[1]
    for i = 1 to r do {
        t = (B*(2B+1)) <<< log2(w)
        u = (D*(2D+1)) <<< log2(w)
        A = ((A ⊕ t) <<< u) + S[2i]
        C = ((C ⊕ u) <<< t) + S[2i+1]
        (A, B, C, D) = (B, C, D, A)
    }
    A = A + S[2r+2]
    C = C + S[2r+3]
    Output: (A, B, C, D)
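One iteration of the encryption loop could be described as a combinational block along the following lines. This is a sketch that assumes a fixed word size w = 32 (so log2(w) = 5); the entity name rc6_enc_round and all port names are hypothetical, and pre-whitening, post-whitening, and the round-key memory are left out.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical combinational block: one RC6 encryption round, w = 32.
entity rc6_enc_round is
  port (
    a_in, b_in, c_in, d_in     : in  unsigned(31 downto 0);
    s_2i, s_2i_plus_1          : in  unsigned(31 downto 0);  -- S[2i], S[2i+1]
    a_out, b_out, c_out, d_out : out unsigned(31 downto 0)
  );
end entity rc6_enc_round;

architecture dataflow of rc6_enc_round is
  -- (x * (2x + 1)) mod 2**32, rotated left by log2(32) = 5 bits
  function f(x : unsigned(31 downto 0)) return unsigned is
    variable two_x_plus_1 : unsigned(31 downto 0);
    variable product      : unsigned(63 downto 0);
  begin
    two_x_plus_1 := shift_left(x, 1) + 1;
    product      := x * two_x_plus_1;
    return rotate_left(product(31 downto 0), 5);
  end function f;

  signal t, u, a_new, c_new : unsigned(31 downto 0);
begin
  t <= f(b_in);
  u <= f(d_in);

  -- Data-dependent rotations use the low 5 bits of u and t
  a_new <= rotate_left(a_in xor t, to_integer(u(4 downto 0))) + s_2i;
  c_new <= rotate_left(c_in xor u, to_integer(t(4 downto 0))) + s_2i_plus_1;

  -- (A, B, C, D) = (B, C, D, A)
  a_out <= b_in;
  b_out <= c_new;
  c_out <= d_in;
  d_out <= a_new;
end architecture dataflow;
```

A scalable version would replace the constants 32 and 5 with generics, in line with the project requirement that operand sizes be set at synthesis time.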
Required interface
[Figure: block diagram of two units]
• Encryption/decryption unit with control & i/o interface: clock, reset, enc_dec, data_in (m bits), data_out (m bits), data_available, data_read, write, full, ready
• Key memory unit: round number, round key(s) S_i (w bits), key_available, key_read
Projects 1, 2: Optimization Criteria
Maximum ratio: Throughput / Circuit Area
or
Minimum product: Latency · Circuit Area
Primary timing parameters
• Latency: time to process a single block of data
• Throughput: number of bits processed in a unit of time
[Figure: a circuit with input Xi and output Yi (latency), and a circuit with inputs Xi, Xi+1, Xi+2 and outputs Yi, Yi+1, Yi+2 (throughput)]
Throughput = (Block_size · Number_of_blocks_processed_simultaneously) / Latency
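A short worked example may make the relation concrete. All numbers here are hypothetical: a 128-bit block, a latency of 20 clock cycles at a 10 ns clock period, first with one block in flight and then with four blocks processed simultaneously at the same latency.

```latex
\text{Latency} = 20 \times 10\ \text{ns} = 200\ \text{ns}

\text{Throughput}_{1\ \text{block}} = \frac{128\ \text{bits} \times 1}{200\ \text{ns}} = 640\ \text{Mbit/s}

\text{Throughput}_{4\ \text{blocks}} = \frac{128\ \text{bits} \times 4}{200\ \text{ns}} = 2560\ \text{Mbit/s}
```

Processing more blocks at once raises throughput without changing latency, which is why the two optimization criteria can favor different architectures.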
Infinite Impulse Response (IIR) Filter Equations (1) Transfer function
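An IIR transfer function is commonly written in the rational form below, together with its equivalent difference equation. This is a sketch in one common sign convention; the coefficient names b_k and a_k and the orders M and N are assumptions, not taken from the slide.

```latex
H(z) = \frac{Y(z)}{X(z)}
     = \frac{\sum_{k=0}^{M} b_k z^{-k}}{1 + \sum_{k=1}^{N} a_k z^{-k}}
\qquad\Longleftrightarrow\qquad
y[n] = \sum_{k=0}^{M} b_k\, x[n-k] \;-\; \sum_{k=1}^{N} a_k\, y[n-k]
```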
Two investigated architectures
Architecture 1: Direct Form II
Architecture 2: Cascade of second-order systems
[Figure (b): second-order section Fi(z)]
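In a cascade realization, the overall transfer function is factored into second-order sections, and each section can itself be computed in Direct Form II. The equations below are a sketch under assumed notation: K sections, coefficients b_{0i}, b_{1i}, b_{2i}, a_{1i}, a_{2i}, and an internal state sequence w_i[n] are all names introduced here for illustration.

```latex
H(z) = \prod_{i=1}^{K} F_i(z),
\qquad
F_i(z) = \frac{b_{0i} + b_{1i} z^{-1} + b_{2i} z^{-2}}{1 + a_{1i} z^{-1} + a_{2i} z^{-2}}

% Direct Form II computation of section i (input x_i, output y_i):
w_i[n] = x_i[n] - a_{1i}\, w_i[n-1] - a_{2i}\, w_i[n-2]

y_i[n] = b_{0i}\, w_i[n] + b_{1i}\, w_i[n-1] + b_{2i}\, w_i[n-2]
```

The output of section i feeds the input of section i+1, so each section needs only two delay elements.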