LegUp: High-Level Synthesis for FPGA-Based Processor/Accelerator Systems • Students: Andrew Canis, Jongsok Choi, Mark Aldham, Victor Zhang, Ahmed Kammoona • Faculty: Jason Anderson, Stephen Brown • Industrial Advisors: Tom Czajkowski
Motivation • Hardware design has advantages over software: • Speed • Energy efficiency • Hardware design is difficult and skills are rare: • 10 software engineers for every hardware engineer* • We need a CAD flow that simplifies hardware design for software engineers *US Bureau of Labor Statistics ‘08
Top-Level Vision (flow diagram) • C source, e.g. int FIR(int ntaps, int sum) { int i; for (i=0; i < ntaps; i++) sum += h[i] * z[i]; return (sum); } • C compiler produces program code for a self-profiling MIPS processor • Profiling data: execution cycles, power, cache misses • Profiling suggests program segments to target to HW • High-level synthesis turns those segments into hardened program segments in the FPGA fabric • Altered SW binary calls the HW accelerators • Names labelled on the diagram: Mark Aldham, Andrew Canis, Victor Zhang, Ahmed Kammoona, Jongsok Choi
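The FIR fragment on this slide refers to arrays h and z that are declared elsewhere. A self-contained version might look like the sketch below; the array length NTAPS, the coefficient values, and the delay-line values are assumptions made up for illustration and do not come from the slides.

```c
#include <assert.h>

#define NTAPS 4  /* hypothetical filter length */

/* Hypothetical coefficients and delay-line samples, for illustration only. */
static const int h[NTAPS] = {1, 2, 3, 4};
static int z[NTAPS] = {5, 6, 7, 8};

/* Multiply-accumulate over the first ntaps taps, starting from sum. */
int FIR(int ntaps, int sum)
{
    int i;
    assert(ntaps >= 0 && ntaps <= NTAPS);  /* stay inside the arrays */
    for (i = 0; i < ntaps; i++)
        sum += h[i] * z[i];
    return sum;
}
```

A loop of this shape (one multiply and one add per iteration) is the kind of program segment the flow would consider moving to hardware.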
LegUp: Key Features • C to Verilog high-level synthesis • 13 C code benchmarks • MIPS processor • Hardware profiler • Automated verification tests • Open source, freely downloadable • Like ABC (Synthesis) or VPR (Place & Route)
System Architecture (block diagram) • On the FPGA: MIPS processor, hardware accelerators, on-chip memory, and memory controller, attached to the Avalon bus • Outside the FPGA: off-chip memory
High-Level Synthesis Framework • Leverage LLVM compiler infrastructure: • Language support: C/C++ • Standard compiler optimizations • We support a large subset of ANSI C:
LLVM-Based High-Level Synthesis • Flexible compiler pass architecture • Passes can be swapped for alternate algorithms
High-Level Synthesis Framework • Scheduler: As Soon As Possible • Operator chaining • Multi-cycle operations: divide, multiply • Binding: Weighted Bipartite Matching • Multiplexers are expensive on an FPGA • Only share dividers and multipliers • FPGA is register-rich • No register sharing
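A minimal sketch of as-soon-as-possible scheduling with operator chaining, as named on the slide above. The Op record, the asap_schedule function, the nanosecond delays, and the rule that a multi-cycle operation may start in the cycle where its chained inputs arrive are illustrative assumptions, not LegUp's actual implementation. Operations are assumed to be listed in topological order.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_PREDS 4

/* Hypothetical operation record; not LegUp's internal data structure. */
typedef struct {
    int    latency;           /* 0 = combinational (chainable), >0 = cycles taken */
    double delay_ns;          /* combinational delay, used for chaining */
    int    npreds;            /* number of data dependencies */
    int    preds[MAX_PREDS];  /* indices of earlier ops this one depends on */
    int    start_cycle;       /* output: cycle in which the op starts */
    double arrival_ns;        /* output: time within the cycle when the result is ready */
} Op;

/* Assign each op the earliest legal cycle; return the schedule length in cycles. */
int asap_schedule(Op *ops, size_t n, double period_ns)
{
    int length = 0;
    for (size_t i = 0; i < n; i++) {
        Op *op = &ops[i];
        int cycle = 0;
        double offset = 0.0;

        assert(op->npreds >= 0 && op->npreds <= MAX_PREDS);
        for (int k = 0; k < op->npreds; k++) {
            assert(op->preds[k] >= 0 && (size_t)op->preds[k] < i);
            const Op *p = &ops[op->preds[k]];
            /* Multi-cycle results are registered: ready at the start of a later cycle. */
            int ready_cycle = p->start_cycle + p->latency;
            double ready_ns = (p->latency > 0) ? 0.0 : p->arrival_ns;
            if (ready_cycle > cycle) {
                cycle = ready_cycle;
                offset = ready_ns;
            } else if (ready_cycle == cycle && ready_ns > offset) {
                offset = ready_ns;
            }
        }

        /* Chaining: stay in the same cycle only if the combined delay fits the period. */
        if (op->latency == 0 && offset + op->delay_ns > period_ns) {
            cycle += 1;
            offset = 0.0;
        }

        op->start_cycle = cycle;
        op->arrival_ns = (op->latency > 0) ? 0.0 : offset + op->delay_ns;

        int end = cycle + (op->latency > 0 ? op->latency : 1);
        if (end > length)
            length = end;
    }
    return length;
}
```

For example, with a 7 ns clock period, a 2 ns operation followed by a dependent 3 ns operation share one cycle, while a further dependent 3 ns operation would exceed the period and is pushed to the next cycle.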
13 C Benchmarks • 12 CHStone Benchmarks (JIP’09) and Dhrystone • Too large/complex for academic HLS tools • Include golden input/output test vectors • Not supported by academic tools
Experimental Results • 1. Pure software on MIPS • Hybrid (software/hardware): • 2. Second most compute-intensive function (and descendants) in H/W • 3. Same as 2 but with the most compute-intensive function • 4. Pure hardware using LegUp • 5. Pure hardware using eXcite (commercial tool)
Energy Consumption • 18x less energy than software
Comparison: LegUp vs eXcite • Benchmarks compiled to hardware • eXcite: Commercial high-level synthesis tool • Couldn’t compile Dhrystone
Circuit Runtime: LegUp vs eXcite • Geomean: 0.82
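For reference, "geomean" is presumably the geometric mean of the per-benchmark ratios. The symbols r_i and n below are introduced here for illustration; the slide does not state which tool's runtime is the numerator.

\[
\text{geomean}(r_1, \dots, r_n) = \Bigl( \prod_{i=1}^{n} r_i \Bigr)^{1/n},
\qquad r_i = \text{runtime ratio of the two flows on benchmark } i
\]

The geometric mean is the usual choice for averaging ratios because inverting every ratio inverts the mean, so the summary does not depend on which flow is treated as the baseline.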
Comparison: Software vs Hardware • Software: Benchmarks run on MIPS • Hardware: LegUp flow (targeting 100% HW)
Benchmark Runtime: LegUp vs MIPS • Geomean: 8x
Ongoing Work • Architecture • Memory hierarchy • Multiple clock domains • High-level synthesis • Modulo Scheduling for loop pipelining • Refactoring code for release in March • Profiling • Automatically detect functions to move to H/W