SPREE Tutorial Peter Yiannacouras April 13, 2006
Processors on FPGAs
• You have all used FPGAs (ECE241)
  • Adders
  • 7-segment decoders
  • Etc.
• We are putting whole microprocessors on them
• We call these soft processors
Hard Versus Soft Processors
• Soft processor
  • Written in HDL (Verilog)
  • Programmed onto the chip
• Hard processor
  • Made of transistors
  • Costs millions to make
  • Faster, smaller, less power
Processors and FPGA Systems
• FPGAs are a common platform for digital systems
  • Diagram: an FPGA holding a UART, soft processor, memory interface, custom logic, and Ethernet
• The soft processor performs coordination and even computation
• Better processors => less hardware to design
We aim to improve soft processors by customizing them
Our Research Problem
• Soft processors have worse
  • Area
  • Speed
  • Power
• But they are
  • Flexible: use this to counteract the drawbacks
• How? Customize the processor’s architecture
  • E.g. Intel vs AMD
  • E.g. Motorola 68360 vs 68010
Research Goals
• Understand tradeoffs in soft processors
  • E.g. a hardware multiplier is big but performs multiplies fast
• Customize the processor to the application
  • E.g. bubble sort doesn’t use multiplies, so remove the hardware multiplier and save area
We developed SPREE, software to help us do both
SPREE System (Soft Processor Rapid Exploration Environment)
• Input: Processor description (ISA + datapath)
• SPREE System
  • Verify ISA against datapath
  • Datapath instantiation
  • Control generation
• Output: Synthesizable Verilog
Input: Instruction Set Architecture (ISA) Description
• Graph of Generic Operations (GENOPs)
  • Edges indicate flow of data
• ISA currently fixed (subset of MIPS I)
• Example: MIPS ADD – add rd, rs, rt
  • GENOP nodes in the graph: FETCH, RFREAD, RFREAD, ADD, RFWRITE
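A minimal sketch of how a GENOP graph for ADD could be held in memory. `GenopGraph`, `addNode`, `addEdge`, `fanIn`, and `buildMipsAdd` are hypothetical names made up for illustration, not SPREE's actual ISA description format, and the edge list is an assumed reading of the diagram (the slide text only gives the node names).

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for an ISA description: a directed graph whose nodes
// are generic operations (GENOPs) and whose edges carry data between them.
struct GenopGraph {
    std::vector<std::string> nodes;          // GENOP name of each node
    std::vector<std::pair<int, int>> edges;  // (producer index, consumer index)

    // Adds a node and returns its index.
    int addNode(const std::string& genop) {
        nodes.push_back(genop);
        return static_cast<int>(nodes.size()) - 1;
    }

    // Adds a data-flow edge between two existing nodes.
    void addEdge(int from, int to) {
        assert(from >= 0 && from < static_cast<int>(nodes.size()));
        assert(to >= 0 && to < static_cast<int>(nodes.size()));
        edges.emplace_back(from, to);
    }

    // Counts the edges that feed the given node.
    int fanIn(int node) const {
        int count = 0;
        for (const auto& e : edges) {
            if (e.second == node) {
                ++count;
            }
        }
        return count;
    }
};

// MIPS ADD (add rd, rs, rt) using the GENOP names from the slide.
// The edges below are an assumption about how the diagram is connected.
GenopGraph buildMipsAdd() {
    GenopGraph g;
    const int fetch  = g.addNode("FETCH");
    const int readRs = g.addNode("RFREAD");
    const int readRt = g.addNode("RFREAD");
    const int add    = g.addNode("ADD");
    const int write  = g.addNode("RFWRITE");
    g.addEdge(fetch, readRs);  // rs field selects the first operand
    g.addEdge(fetch, readRt);  // rt field selects the second operand
    g.addEdge(readRs, add);
    g.addEdge(readRt, add);
    g.addEdge(add, write);     // sum goes back to the register file
    return g;
}
```

Calling `buildMipsAdd()` yields five nodes, with the ADD node fed by both register reads. A check such as "every GENOP input is driven" is the kind of question a graph like this makes easy to ask.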
Input: Datapath Description
• Interconnection of hand-coded components
  • Allows efficient synthesis
• Described using C++
• SPREE Component Library (RTL): Ifetch, Reg File, Mul, Shifter, ALU, Data Mem, Write Back
Component Selection
• Select by name
• Names looked up in library
  • Stored in cpugen/rtl_lib

RTLComponent *ifetch=new RTLComponent("ifetch");
RTLComponent *reg_file=new RTLComponent("reg_file");
Datapath Wiring Example
• Ifetch ports: rd, rs, rt, offset
• Regfile ports: dst, a_reg, a_data, b_reg, b_data, writedata
• ALU ports: opA, opB, result

proc.addConnection(ifetch,"rs",reg_file,"a_reg");
proc.addConnection(ifetch,"rt",reg_file,"b_reg");
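To make the wiring calls concrete, here is a self-contained sketch that records connections the same way the two `addConnection` calls read. `MockComponent`, `MockConnection`, `MockProcessor`, and `wireRegisterReads` are stand-ins invented for this example. SPREE's real `RTLComponent` and processor classes are not shown in the slides and will differ.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Stand-in for a library component selected by name (e.g. "ifetch").
struct MockComponent {
    std::string name;
};

// One wire: an output port of one component to an input port of another.
struct MockConnection {
    std::string srcComponent;
    std::string srcPort;
    std::string dstComponent;
    std::string dstPort;
};

// Stand-in for the processor description that collects the wiring.
struct MockProcessor {
    std::vector<MockConnection> connections;

    // Records a connection; the argument order mirrors the slide's calls.
    void addConnection(const MockComponent& src, const std::string& srcPort,
                       const MockComponent& dst, const std::string& dstPort) {
        assert(!srcPort.empty() && !dstPort.empty());
        connections.push_back({src.name, srcPort, dst.name, dstPort});
    }
};

// Wires the fetch unit's rs/rt fields to the register file's read ports,
// as in the two calls on the slide.
MockProcessor wireRegisterReads() {
    MockComponent ifetch{"ifetch"};
    MockComponent regFile{"reg_file"};
    MockProcessor proc;
    proc.addConnection(ifetch, "rs", regFile, "a_reg");
    proc.addConnection(ifetch, "rt", regFile, "b_reg");
    return proc;
}
```

The remaining ports listed on the slide (for example a_data to opA) would presumably be wired with further calls of the same shape, but the slide does not show them.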
SPREE System + Backend (Soft Processor Rapid Exploration Environment)
• Processor description → SPREE generator (spegen) → Verilog
• Verilog → Quartus II CAD Software (specadflow)
  1. Area
  2. Clock Frequency
  3. Power
• Benchmarks → Mint MIPS Simulator (simulator/run) and Modelsim Verilog Simulator (spebenchmark)
  • Compare traces
  4. Cycle Count
Walking through an Example (see README.txt)
• Choose a pre-built processor
  • cpugen/src/arch lists all the processors
• Let’s choose pipe3_serialshift
  • 3-stage pipeline with serial shifter
Using SPREE on a Processor
• Generate, benchmark, synthesize

% spegen pipe3_serialshift         ← Generates Verilog
% spebenchmark pipe3_serialshift   ← Runs benchmarks
% specadflow pipe3_serialshift     ← Synthesizes processor
% specompare pipe3_serialshift     ← Displays results
spegen – Generating Processors
• Input: Processor description
• Syntax: spegen <processor name>
• Output:
  • A folder named after the processor
  • Hand-coded Verilog modules
  • system.v
    • Generated hookup and control
  • OUT.cpugen
    • Stages per instruction
    • Hazard window/branch penalty
  • test_bench.v
    • Test bench for Modelsim simulation
Benchmarking
• Run programs on the processor
  • Measure time taken until completion
  • Verify functionality
• Can do this without knowing anything about the benchmarks themselves
spebenchmark – Benchmarking
• Input: Processor implementation
• Syntax: spebenchmark <processor>
• Output: (ideally)
  • Cycle counts of all benchmarks
  • Traces: /tmp/modelsim_trace.txt

******* Benchmarking pipe3_serialshift ********
Simulating bubble_sort ... Success! Cycle count=2994
Simulating crc ... Success! Cycle count=112750
Simulating des ... Success! Cycle count=5129
Simulating fft ... Success! Cycle count=5077
Simulating fir ... Success! Cycle count=1214
...
Benchmarking – under the hood
• C source benchmarks → Compiler (gcc - MIPS) → Binary executable
• Binary executable → Mint MIPS Simulator (simulator/run) → Trace
  • applications/<benchmark name>/mint
• Binary executable + Verilog → Modelsim Verilog Simulator (spebenchmark) → Trace and cycle count
  • /tmp/modelsim_trace.txt
  • /tmp/modelsim_store_trace.txt
• spebenchmark compares the two traces
specompiler - Setup compiler
• Choose the path to your compiler (prebuilt)
  • Default: /jayar/b/b0/yiannac/spe/compiler
    • GCC 3.3.3, software division
  • Another: /jayar/b/b0/yiannac/spe/compiler-softmul
    • GCC 3.3.3, software division and software multiplication
• specompiler will:
  • Compile all benchmarks (and store binaries)
  • Simulate all benchmarks (and store traces)

% specompiler /jayar/b/b0/yiannac/spe/compiler-softmul

After this point, you can just run spebenchmark
spebenchmark - failure
• Shows the discrepancy between Mint and Modelsim

******* Benchmarking pipe3_serialshift ********
Simulating bubble_sort ... Error: Trace does not match, Cycle count=381
Discrepancy found at 6800000 ps
Modelsim: PC=04000064 | IR=24090001 | 05: 00000000
Mint: PC=040000b8 | IR=8c47004c | 07: 00000064

• PC and IR: clues to where the error occurred
• 05 / 07: destination register
• 00000000 / 00000064: value being written
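The trace comparison can be pictured as a walk down two lists of register writes until they disagree. The sketch below is only an illustration of that idea: `TraceEntry`, `sameWrite`, and `firstDiscrepancy` are hypothetical names, and comparing only the destination register and written value (while keeping PC and IR as clues for the report) is an assumption, since the slides do not say which fields spebenchmark actually compares.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// One register-file write, with the fields shown in the failure report.
struct TraceEntry {
    std::uint32_t pc;     // program counter (reported as a clue)
    std::uint32_t ir;     // instruction word (reported as a clue)
    std::uint8_t  dst;    // destination register
    std::uint32_t value;  // value being written
};

// Assumption: two entries agree when they write the same value to the
// same register; PC and IR are not part of the comparison.
bool sameWrite(const TraceEntry& a, const TraceEntry& b) {
    return a.dst == b.dst && a.value == b.value;
}

// Returns the index of the first entry where the traces disagree, or -1 if
// they match entry for entry. A trace that ends early counts as a
// mismatch at the position where it ran out.
long firstDiscrepancy(const std::vector<TraceEntry>& reference,
                      const std::vector<TraceEntry>& simulated) {
    const std::size_t common = std::min(reference.size(), simulated.size());
    for (std::size_t i = 0; i < common; ++i) {
        if (!sameWrite(reference[i], simulated[i])) {
            return static_cast<long>(i);
        }
    }
    if (reference.size() != simulated.size()) {
        return static_cast<long>(common);
    }
    return -1;
}
```

With the Mint trace as `reference` and the Modelsim trace as `simulated`, the returned index points at the pair of entries a report like the one above would print.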
spebenchmark - waveforms
• Can see any signal within the processor

% sim_gui bubble_sort pipe3_serialshift
Modelsim
• LEARN IT!!!
• The Quartus simulator is vastly inferior, and even unusable for our purposes
The Testbench (test_bench.v)
• What is it?
  • The stimulus and monitor for your circuit
• SPREE generates it automatically
  • And hence it works right away
• Hand-coding your own processor means
  • You have to interface with the test bench yourself
  • Once you have the testbench you can use spebenchmark
Manual Interfacing with the Testbench
• Need only 6 wires
  • To track writes to the register file and data memory
• Wires between test_bench.v and your soft processor
  • regfile_we
  • regfile_dst
  • regfile_data
  • datamem_we
  • datamem_addr
  • datamem_data
specadflow – Synthesis
• Input: Processor implementation
• Syntax: specadflow <processor name>
• Performs a “seed sweep”
  • Average several runs since results are noisy
  • Run several instances of Quartus
  • Across several machines in parallel
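The averaging step of a seed sweep can be sketched as below. `SynthesisRun`, `SweepSummary`, and `averageSweep` are hypothetical names, and the use of a plain arithmetic mean is an assumption: the slide says only that several runs are averaged, not how specadflow does it.

```cpp
#include <cassert>
#include <vector>

// Result of one synthesis run with a particular seed.
struct SynthesisRun {
    int seed;
    double area;      // LEs or ALUTs
    double clockMHz;  // clock frequency in MHz
};

// Averaged figures over the whole sweep.
struct SweepSummary {
    double meanArea;
    double meanClockMHz;
};

// Arithmetic mean of area and clock frequency over all seeds.
// Requires at least one run.
SweepSummary averageSweep(const std::vector<SynthesisRun>& runs) {
    assert(!runs.empty());
    double areaSum = 0.0;
    double clockSum = 0.0;
    for (const auto& run : runs) {
        areaSum += run.area;
        clockSum += run.clockMHz;
    }
    const double n = static_cast<double>(runs.size());
    return {areaSum / n, clockSum / n};
}
```

Because each run is independent of the others, the runs themselves can be farmed out to separate machines and only the small per-run results need to be gathered for this step.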
specadflow Output
• Output:
  • Synthesis results (hidden)
  • Summary output

Started Tue 6:27PM, Waiting for processes: 10.0.0.61 10.0.0.57 10.0.0.56 10.0.0.55 10.0.0.54 10.0.0.51
Finished Tue 6:33PM
1081 75.7812 0.99822
... Waiting on eda writer

• 1081: Area (LEs or ALUTs)
• 75.7812: Clock Frequency (MHz)
• 0.99822: Estimated energy dissipated per cycle (nJ/cycle)
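The cycle counts from spebenchmark and the figures from specadflow combine into wall-clock time and total energy with simple arithmetic: time = cycles / frequency, energy = cycles × energy per cycle. The helpers below are my own illustration (`runtimeMicroseconds` and `totalEnergyNanojoules` are hypothetical names, not part of SPREE).

```cpp
#include <cassert>

// Wall-clock time in microseconds.
// Cycles divided by a frequency in MHz gives microseconds directly.
double runtimeMicroseconds(double cycles, double clockMHz) {
    assert(clockMHz > 0.0);
    return cycles / clockMHz;
}

// Total energy in nanojoules for a run of the given length.
double totalEnergyNanojoules(double cycles, double energyPerCycleNj) {
    return cycles * energyPerCycleNj;
}
```

As a rough worked example, and only if the bubble_sort count of 2994 cycles and the 75.7812 MHz / 0.99822 nJ/cycle summary both describe the same processor, the run would take about 39.5 µs and dissipate about 2989 nJ.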
Any Questions?
• For technical support, ask me
Setup/Install
• Copy and unpack the SPREE tarball:
  • /jayar/b/b0/yiannac/spree.tar.gz
• Build all the SPREE software
  • Follow the instructions in INSTALL.txt
  • If there are any errors, email me

% cd spree
% make
SPREE Directory Structure
spree
  applications   Benchmarks C source
  compiler
    binutils
    gcc
    newlib
  cpugen         the cpu generator + processor descriptions
  simulator      MIPS simulator
  quartus        synthesis
  modelsim       Verilog simulator
Setup cluster
• Choose the cluster you’re using
  • aenao – high performance, limited access
  • eecg – any eecg-connected machine
    • Edit quartus/machines.txt
    • Put in a list of 11 or so good eecg machines

% specluster eecg
OR
% specluster aenao