ECE 232 Hardware Organization and Design
Lecture 4: Performance, Design with VHDL
Maciej Ciesielski
www.ecs.umass.edu/ece/labs/vlsicad/ece232/spr2002/index_232.html
Outline
• Performance, evaluation
• Metrics: MIPS, CPI, execution time
• Amdahl’s law
• VHDL basics
• Combinational logic
• Examples
• Instruction formats, cont’d
• Addressing classes, modes
• Examples
• MIPS assembly
Two notions of “performance”

Plane        NY to Paris   Speed      Passengers   Throughput (passengers x mph)
Boeing 747   6.5 hours     610 mph    470          286,700
Concorde     3 hours       1350 mph   132          178,200

Which has higher performance?
° Time to do the task (Execution Time): execution time, response time, latency
° Tasks per day, hour, week, sec, ns, ... (Performance): throughput, bandwidth
Response time and throughput are often in opposition.
Performance - Example
• Performance: in units of things per time unit (bigger is better)
• Flying time of Concorde vs. Boeing 747?
  • Concorde is 1350 mph / 610 mph = 2.2 times faster (equivalently, 6.5 hours / 3 hours = 2.2)
• Throughput of Concorde vs. Boeing 747? (pmph = passengers x mph)
  • Concorde is 178,200 pmph / 286,700 pmph = 0.62 “times faster”
  • Boeing is 286,700 pmph / 178,200 pmph = 1.6 “times faster”
• Boeing is 1.6 times (“60%”) faster in terms of throughput
• Concorde is 2.2 times (“120%”) faster in terms of flying time
• We will focus primarily on execution time for a single job
Metrics of performance
Levels of the system, top to bottom: Application, Programming Language, Compiler, ISA, Datapath, Control, Function Units, Transistors, Wires, Pins
Metrics, listed from the application level down toward the hardware:
• Answers per month
• Useful operations per second
• (Millions of) instructions per second: MIPS
• (Millions of) floating-point operations per second: MFLOP/s
• Megabytes per second
• Cycles per second (clock rate)
Each metric has a place and a purpose, and each can be misused.
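Review: Aspects of CPU Performance

$$\text{CPU time} = \frac{\text{Seconds}}{\text{Program}} = \frac{\text{Instructions}}{\text{Program}} \times \frac{\text{Cycles}}{\text{Instruction}} \times \frac{\text{Seconds}}{\text{Cycle}}$$

Which aspect affects which factor:

               Instr count   CPI   Clock rate
Program        X
Compiler       X             X
Instr. Set     X             X
Organization                 X     X
Technology                         X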
MIPS, CPI

$$\text{CPU time} = \text{ClockCycleTime} \times \sum_{i=1}^{n} \text{CPI}_i \times \text{Instr}_i$$

• MIPS = millions of instructions per second
• MIPS = Instruction count / (Execution time x 10^6)
• CPI = average number of cycles per instruction
  CPI = Clock Cycles / Instruction Count = (CPU Time x Clock Rate) / Instruction Count
• With CPI_i the cycles per instruction for instruction class i:

$$\text{CPI} = \sum_{i=1}^{n} \text{CPI}_i \times F_i, \qquad F_i = \frac{\text{Instr}_i}{\text{Instruction Count}}$$

  F_i is the "instruction frequency" of class i.
• Invest resources where time is spent!
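The two definitions on this slide can be combined into a form of MIPS that depends only on clock rate and CPI. The short derivation below is a sketch of that step; the 100 MHz clock rate in the numeric line is an assumed value chosen only for illustration, and the CPI of 2.2 is borrowed from the RISC example in this lecture.

```latex
\begin{align*}
\text{Execution time} &= \frac{\text{Instruction count} \times \text{CPI}}{\text{Clock rate}} \\
\text{MIPS} &= \frac{\text{Instruction count}}{\text{Execution time} \times 10^{6}}
             = \frac{\text{Clock rate}}{\text{CPI} \times 10^{6}} \\
\text{e.g. (assumed 100 MHz clock, CPI} = 2.2\text{):}\quad
\text{MIPS} &= \frac{100 \times 10^{6}}{2.2 \times 10^{6}} \approx 45.5
\end{align*}
```

Because the instruction count cancels, MIPS says nothing about how many instructions a program needs, which is one way the metric can be misused.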
Evaluating Instruction Sets
Design-time metrics:
° Can it be implemented, in how long, at what cost?
° Can it be programmed? Ease of compilation?
Static Metrics:
° How many bytes does the program occupy in memory?
Dynamic Metrics:
° How many instructions are executed?
° How many bytes does the processor fetch to execute the program?
° How many clocks are required per instruction?
° How "lean" a clock is practical?
Best Metric: Time to execute the program!
NOTE: this depends on the instruction set, processor organization, and compilation techniques.
Example (RISC processor)
Base Machine (Reg / Reg), typical mix:

Op       Freq   Cycles   CPI(i)   % Time
ALU      50%    1        0.5      23%
Load     20%    5        1.0      45%
Store    10%    3        0.3      14%
Branch   20%    2        0.4      18%
Total                    2.2

• How much faster would the machine be if a better data cache reduced the average load time to 2 cycles?
• How does this compare with using branch prediction to shave a cycle off the branch time?
• What if two ALU instructions could be executed at once?
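The three questions can be worked through with the CPI formula, CPI = sum of CPI_i x F_i. The sketch below assumes that instruction count and clock rate stay fixed, so speedup is simply the ratio of old CPI to new CPI. For the last question it further assumes that every ALU instruction can be paired with another one, which halves the effective ALU cycle count to 0.5; real pairing rates would be lower.

```latex
\begin{align*}
\text{CPI}_{\text{base}}   &= 0.5(1) + 0.2(5) + 0.1(3) + 0.2(2) = 2.2 \\
\text{CPI}_{\text{cache}}  &= 0.5(1) + 0.2(2) + 0.1(3) + 0.2(2) = 1.6,
  & \text{speedup} &= 2.2 / 1.6 \approx 1.38 \\
\text{CPI}_{\text{branch}} &= 0.5(1) + 0.2(5) + 0.1(3) + 0.2(1) = 2.0,
  & \text{speedup} &= 2.2 / 2.0 = 1.10 \\
\text{CPI}_{\text{2 ALU}}  &= 0.5(0.5) + 0.2(5) + 0.1(3) + 0.2(2) = 1.95,
  & \text{speedup} &= 2.2 / 1.95 \approx 1.13
\end{align*}
```

Under these assumptions the better data cache helps most, because loads account for the largest share of execution time (45%).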
Amdahl's Law
Speedup due to enhancement E:

$$\text{Speedup}(E) = \frac{\text{Exec\_time without } E}{\text{Exec\_time with } E} = \frac{\text{Performance with } E}{\text{Performance without } E}$$

Suppose that enhancement E accelerates a fraction F of the task by a factor S, and the remainder of the task is unaffected. Then:

$$\text{Exec\_time}(\text{with } E) = \left(\frac{F}{S} + (1 - F)\right) \times \text{Exec\_time}(\text{without } E)$$

$$\text{Speedup}(\text{with } E) = \frac{1}{(1 - F) + F/S}$$
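As a worked instance, Amdahl's law can be applied to the data-cache question from the RISC example. Loads take 1.0 of the 2.2 cycles per instruction, so F is about 0.455, and cutting load time from 5 cycles to 2 gives S = 2.5. The last line, the limit as S grows without bound, is added here as an illustration and is not on the slide.

```latex
\begin{align*}
F &= \frac{1.0}{2.2} \approx 0.455, \qquad S = \frac{5}{2} = 2.5 \\
\text{Speedup} &= \frac{1}{(1 - F) + F/S}
               = \frac{1}{0.545 + 0.182} \approx 1.38 \\
\lim_{S \to \infty} \text{Speedup} &= \frac{1}{1 - F} \approx 1.83
\end{align*}
```

Even if loads took no time at all, the machine would be less than twice as fast, because the other 55% of the execution time is untouched.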
Review: Summary of the Design Process
• Hierarchical Design to manage complexity
• Top Down vs. Bottom Up vs. Successive Refinement
• Importance of Design Representations:
  • Block Diagrams
  • Decomposition into Bit Slices
  • Truth Tables, K-Maps
  • Circuit Diagrams
  • Other Descriptions: state diagrams, timing diagrams, register transfer, . . .
• Optimization Criteria:
  • Gate Count
  • [Package Count]
  • Logic Levels
  • Fan-in/Fan-out
  • Area
  • Delay
  • Power
  • Cost
  • Design time
  • Pin Out
Hardware Representation Languages
• Block Diagrams: FUs, Registers, & Dataflows
• Register Transfer Diagrams: Choice of busses to connect FUs, Regs
• Flowcharts and State Diagrams: two different ways to describe sequencing & microoperations
• Hardware Description Languages: Verilog HDL, VHDL
  • HW modules described like programs with i/o ports, internal state, & parallel execution of assignment statements
  • Descriptions in these languages can be used as input to:
    • simulation systems ("software breadboard")
    • synthesis systems (generate hw from high level description)
"To Design is to Represent"
VHDL (VHSIC Hardware Description Language)
• Goals:
  • Support design, documentation, and simulation of hardware
  • Digital system level to gate level
  • “Technology Insertion”
• Concepts:
  • Design entity
  • Time-based execution model
Design Entity = Hardware Component
Interface = External Characteristics
Architecture (Body) = Internal Behavior or Structure
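VHDL Example: nand Gate
[Figure: nand gate with inputs a, b and output y]

  -- "nand" is a reserved word in VHDL (the operator), so the entity is named nand2 here
  ENTITY nand2 IS
    PORT (a, b: IN BIT; y: OUT BIT);
  END nand2;

  ARCHITECTURE behavioral OF nand2 IS
  BEGIN
    y <= a NAND b;
  END behavioral;

• Entity describes the interface
• Architecture gives the behavior (function)
• Entity and architecture names (nand2, behavioral) are chosen by the designer
• y is a signal, not a variable: it changes whenever the inputs change
• The NAND assignment behaves like a process in an infinite loop
• BIT is 0 or 1. Can also use STD_LOGIC (0, 1, Z, X)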
Modeling Delays
• Model temporal, as well as functional, behavior with delays in signal assignment statements. Time is one difference from programming languages
• Output y changes 1 ns after a or b changes
• Delay clauses are not supported by synthesis tools (non-synthesizable)

  -- "nand" is a reserved word in VHDL (the operator), so the entity is named nand2 here
  ENTITY nand2 IS
    PORT (a, b: IN BIT; y: OUT BIT);
  END nand2;

  ARCHITECTURE behavioral OF nand2 IS
  BEGIN
    y <= a NAND b AFTER 1 ns;
  END behavioral;
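One way to see the 1 ns delay in simulation is a small self-checking testbench. The sketch below is not from the lecture: the entity name nand_delay_tb, the stimulus times, and the report messages are all made up for illustration, and the delayed assignment is written inline rather than instantiating the gate so that the example is self-contained.

```vhdl
-- Hypothetical testbench (not from the lecture) illustrating "AFTER 1 ns".
ENTITY nand_delay_tb IS
END nand_delay_tb;

ARCHITECTURE test OF nand_delay_tb IS
  SIGNAL a, b, y : BIT;  -- BIT signals start at '0'
BEGIN
  -- Same delayed assignment as on the slide, written inline.
  y <= a NAND b AFTER 1 ns;

  stimulus: PROCESS
  BEGIN
    -- a = b = '0': after the initial 1 ns delay, y settles to '1'.
    WAIT FOR 10 ns;
    ASSERT y = '1' REPORT "y should be 1 for a=0, b=0" SEVERITY ERROR;

    -- Drive both inputs high; y must not change immediately.
    a <= '1';
    b <= '1';
    WAIT FOR 500 ps;
    ASSERT y = '1' REPORT "y changed before 1 ns elapsed" SEVERITY ERROR;

    -- 1.5 ns after the input change, the new value is visible.
    WAIT FOR 1 ns;
    ASSERT y = '0' REPORT "y should be 0 for a=1, b=1" SEVERITY ERROR;

    WAIT;  -- suspend forever: end of stimulus
  END PROCESS;
END test;
```

In a simulator that supports the standard time units, no assertion should fire; removing the AFTER clause would make the second check fail, since y would then change one delta cycle after the inputs.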
Bit-vector Operators
[Figure: nand32 block with inputs a[31:0], b[31:0] and output y[31:0]]
• A STD_LOGIC_VECTOR can be converted to a 32-bit integer

  LIBRARY ieee;
  USE ieee.std_logic_1164.all;

  ENTITY nand32 IS
    PORT (a, b: IN STD_LOGIC_VECTOR(31 DOWNTO 0);
          y: OUT STD_LOGIC_VECTOR(31 DOWNTO 0));
  END nand32;

  ARCHITECTURE behavioral OF nand32 IS
  BEGIN
    y <= a NAND b;  -- bitwise NAND over all 32 bits
  END behavioral;
Simple Operators
[Figure: mux2to1 with a on input 0, b on input 1, select sel, output y]

  LIBRARY ieee;
  USE ieee.std_logic_1164.all;

  ENTITY mux2to1 IS
    PORT (a, b, sel: IN STD_LOGIC;
          y: OUT STD_LOGIC);
  END mux2to1;

  ARCHITECTURE logic OF mux2to1 IS
  BEGIN
    WITH sel SELECT
      y <= a WHEN '0',
           b WHEN OTHERS;
  END logic;

• Must use “others”, since a std_logic sel can take values besides 0 and 1 (e.g. Z, X)
You can also use other constructs: IF … THEN, WHEN, etc.
Arithmetic Operations

  LIBRARY ieee;
  USE ieee.std_logic_1164.all;

  ENTITY add32 IS
    PORT (a, b: IN STD_LOGIC_VECTOR(31 DOWNTO 0);
          y: OUT STD_LOGIC_VECTOR(31 DOWNTO 0));
  END add32;

  ARCHITECTURE behavioral OF add32 IS
  BEGIN
    y <= addum(a, b);
  END behavioral;

• “addum” adds two n-bit vectors to produce an n+1 bit vector
• Alternatively, you can declare a, b, y as INTEGER, and use y <= a + b.
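Since “addum” is not defined on the slide, a variant that compiles with only the standard IEEE packages may be useful for comparison. The sketch below swaps in ieee.numeric_std in place of addum; the entity name add32_numeric is hypothetical, and unlike addum as described above, this version keeps the result at 32 bits and drops the carry.

```vhdl
LIBRARY ieee;
USE ieee.std_logic_1164.ALL;
USE ieee.numeric_std.ALL;  -- provides UNSIGNED and "+" on it

-- Hypothetical entity name; not from the lecture.
ENTITY add32_numeric IS
  PORT (a, b : IN  STD_LOGIC_VECTOR(31 DOWNTO 0);
        y    : OUT STD_LOGIC_VECTOR(31 DOWNTO 0));
END add32_numeric;

ARCHITECTURE behavioral OF add32_numeric IS
BEGIN
  -- Interpret both vectors as unsigned numbers, add, and convert back.
  -- numeric_std "+" returns a result as wide as the wider operand
  -- (32 bits here), so the carry out of bit 31 is discarded.
  y <= STD_LOGIC_VECTOR(UNSIGNED(a) + UNSIGNED(b));
END behavioral;
```

To keep the carry, one option is to resize both operands to 33 bits before adding and widen y to match.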
Control Constructs

  LIBRARY ieee;
  USE ieee.std_logic_1164.all;

  ENTITY mux32 IS
    PORT (A, B: IN STD_LOGIC_VECTOR(31 DOWNTO 0);
          DOUT: OUT STD_LOGIC_VECTOR(31 DOWNTO 0);
          SEL: IN BIT);
  END mux32;

  ARCHITECTURE behavior OF mux32 IS
  BEGIN
    mux32_process: PROCESS (A, B, SEL)
    BEGIN
      IF (SEL = '0') THEN  -- SEL is a BIT, so compare against the literal '0'
        DOUT <= A;
      ELSE
        DOUT <= B;
      END IF;
    END PROCESS;
  END behavior;

• Process fires whenever a signal in its “sensitivity list” changes
• Evaluates the body sequentially
• VHDL provides case statements as well
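The case statement mentioned in the last bullet can express the same multiplexer. The sketch below is one possible rewrite, not the lecture's own code; the entity name mux32_case is hypothetical. Because SEL is a BIT, the two choices '0' and '1' cover every possible value, so no WHEN OTHERS branch is needed.

```vhdl
LIBRARY ieee;
USE ieee.std_logic_1164.ALL;

-- Hypothetical entity name; same ports as mux32 above.
ENTITY mux32_case IS
  PORT (A, B : IN  STD_LOGIC_VECTOR(31 DOWNTO 0);
        SEL  : IN  BIT;
        DOUT : OUT STD_LOGIC_VECTOR(31 DOWNTO 0));
END mux32_case;

ARCHITECTURE behavior OF mux32_case IS
BEGIN
  mux32_process: PROCESS (A, B, SEL)
  BEGIN
    -- A case statement must cover every value of its expression;
    -- for type BIT that is exactly '0' and '1'.
    CASE SEL IS
      WHEN '0' => DOUT <= A;
      WHEN '1' => DOUT <= B;
    END CASE;
  END PROCESS;
END behavior;
```

If SEL were declared as STD_LOGIC instead, a WHEN OTHERS branch would be required to cover the remaining values.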