
Field-Programmable Gate Array Research Speeds HPC “up to 100X”



Presentation Transcript


  1. Field-Programmable Gate Array Research Speeds HPC “up to 100X”
     Olaf O. Storaasli
     Future Technologies Group, Computer Science and Mathematics Division

  2. Explore FPGAs for future ORNL HPC
     THE SUPERCOMPUTER COMPANY
     Why HPC vendors offer FPGAs:
     Virtex4 FPGA blades “accelerate mission-critical applications > 100X.”
     Steve Scott, CTO, HPCWire 24/3/2006
     “After exhaustive analysis, Cray concluded that, although multi-core commodity processors will deliver some improvement, exploiting parallelism through a variety of processor technologies using scalar, vector, multithreading and hardware accelerators (e.g., FPGAs or ClearSpeed co-processors) creates the greatest opportunity for application acceleration.”
     ORNL benefit: Exceed petaflops and reduce power
     Contents
     • Background: Why FPGAs?
     • ORNL success: FPGA systems, tools and up to 100X speedup
     • Partners: Research Lab, [logo], SRC, [logo], [logo], [logo]

  3. What’s an FPGA? Your “custom chip”
     [Figure labels: FPGA, logic slice]
     Xilinx Virtex4 FPGA: 25K slices (miniCPUs)
     • Logic array: user-tailored to application
     • On-chip RAM, multipliers and PowerPCs
     • Gigabit transceivers/DSP blocks => fast I/O and precision
     • 100–1000 operations/clock cycle

  4. Why FPGAs?
     • Performance: optimal silicon use (maximize parallel ops/cycle)
     • Rapid growth: cells, speed, I/O
     • Power: 1/10th that of CPUs
     • Flexible: tailor to application
     [Charts: logic cells (thousands) and clock speed (MHz), Pentium vs. Virtex-4, 2002–2008]
     [Chart: computation (GOPS), memory bandwidth (GB/sec) and I/O bandwidth (Gbps), FPGA Virtex4 vs. Pentium]

  5. ORNL FPGA hardware/tools
     • SRC-6 (Carte), Digilent (Viva, VHDL), Nallatech (Viva)
     • Cray XD1 (MitrionC, VHDL): 6 FPGAs + 144 Opterons
     • SGI RASC-Altix/Virtex4s (MitrionC)
     • CHiMPS (Bee2 => Cray XD1 => DRC => XT4)
     [Images: Cray XD1, SGI RASC]

  6. Ported HPC code spectral transform shallow water model (STSWM) to FPGAs
     • Goal: more GF/$ and GF/Watt; model faster
     • Profile: find parallelism (80% FFTs)
     • Flow: HLL developer profiles => HLL compiler (CHiMPS, Mitrion: “FPGA Tools Inside”) => FPGA speedup
     [Call-graph figure: 8 calls in parallel, 3 functions in parallel, 2 calls in parallel; routine labels: FTTdd, STEP, FTRNPE, COMP1, FTRNDE, FTRNVX, FTRNEX, FFT, UV, FFT, SHTRNS]
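The profile figure on this slide (80% of runtime in FFTs) bounds what FPGA offload can deliver for the whole model. The sketch below applies Amdahl's law, which the slide itself does not cite. It assumes the remaining 20% stays on the CPU at its original speed, and the kernel speedups tried (2X, 10X, 100X) are hypothetical values chosen for illustration, not measurements from the talk.

```python
def amdahl_speedup(accelerated_fraction: float, kernel_speedup: float) -> float:
    """Overall speedup when only part of the runtime is accelerated (Amdahl's law)."""
    if not 0.0 <= accelerated_fraction <= 1.0:
        raise ValueError("accelerated_fraction must be between 0 and 1")
    if kernel_speedup <= 0.0:
        raise ValueError("kernel_speedup must be positive")
    untouched = 1.0 - accelerated_fraction  # share of runtime left on the CPU
    return 1.0 / (untouched + accelerated_fraction / kernel_speedup)


FFT_FRACTION = 0.80  # from the slide's profile: 80% FFTs

if __name__ == "__main__":
    # Hypothetical FFT kernel speedups; none of these are figures from the talk.
    for kernel_speedup in (2.0, 10.0, 100.0):
        overall = amdahl_speedup(FFT_FRACTION, kernel_speedup)
        print(f"FFT kernel {kernel_speedup:5.0f}X -> whole model {overall:.2f}X")
    # Limit as the FFT kernel becomes infinitely fast.
    print(f"upper bound: {1.0 / (1.0 - FFT_FRACTION):.2f}X")
```

Under these assumptions the whole-model speedup cannot exceed about 5X however fast the FFT kernel runs, so larger gains would require moving more of the remaining 20% onto the FPGA as well.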

  7. Exploring programming options
     • Compiler, simulator, and debugger
     • Gauss matrix solver
     • Viva: graphical icons, 3-dimensional
     • MitrionC: text/flow, 1-dimensional
     • + Carte/SRC, CHiMPS-VHDL/Xilinx

  8. 37X* LU decomposition speedup; 10X for matrix equation solver
     [Chart: execution time (us) vs. matrix size (64, 96, 128) for S10e5, single and double data types]
     [Chart: speedup of LU and solver by design data type (double, single, S10e5); bar values range from 7.7 to 36.6]
     Benefits:
     • High performance of LP arithmetic
     • High precision accuracy
     • Speedup increases with matrix size (as LU dominates calculations)
     First mixed-precision LU and solver for FPGAs
     *FPGA vs 2.2 GHz Opteron
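For readers unfamiliar with the kernel being accelerated, here is a minimal pure-Python sketch of LU decomposition with partial pivoting followed by the triangular solves. It is the textbook algorithm in ordinary double precision, not the FPGA design from the talk, and it does not reproduce the mixed-precision (S10e5) arithmetic. The function names and the `lu_flop_share` helper are hypothetical; the helper uses leading-order textbook operation counts (about 2n³/3 for the factorization, about 2n² for the two triangular solves) as a rough stand-in for time.

```python
def lu_decompose(a):
    """Factor a square matrix so that P*A = L*U, using partial pivoting.

    Returns (lu, perm). lu packs L below the diagonal (unit diagonal implied)
    and U on and above it; perm[k] is the original index of the row now at k.
    """
    n = len(a)
    lu = [row[:] for row in a]  # work on a copy
    perm = list(range(n))
    for k in range(n):
        # Pick the largest remaining entry in column k as the pivot.
        pivot = max(range(k, n), key=lambda i: abs(lu[i][k]))
        if lu[pivot][k] == 0.0:
            raise ValueError("matrix is singular")
        if pivot != k:
            lu[k], lu[pivot] = lu[pivot], lu[k]
            perm[k], perm[pivot] = perm[pivot], perm[k]
        for i in range(k + 1, n):
            lu[i][k] /= lu[k][k]  # multiplier, stored in L's slot
            factor = lu[i][k]
            for j in range(k + 1, n):
                lu[i][j] -= factor * lu[k][j]
    return lu, perm


def lu_solve(lu, perm, b):
    """Solve A x = b given the packed factors from lu_decompose."""
    n = len(lu)
    y = [b[perm[i]] for i in range(n)]  # apply the row permutation to b
    for i in range(n):  # forward substitution: L y = P b
        for j in range(i):
            y[i] -= lu[i][j] * y[j]
    x = y[:]
    for i in range(n - 1, -1, -1):  # back substitution: U x = y
        for j in range(i + 1, n):
            x[i] -= lu[i][j] * x[j]
        x[i] /= lu[i][i]
    return x


def lu_flop_share(n):
    """Approximate share of operations spent in LU (leading-order counts)."""
    lu_ops = 2.0 * n**3 / 3.0
    solve_ops = 2.0 * n**2
    return lu_ops / (lu_ops + solve_ops)


if __name__ == "__main__":
    a = [[2.0, 1.0, 1.0], [4.0, -6.0, 0.0], [-2.0, 7.0, 2.0]]
    b = [5.0, -2.0, 9.0]
    lu, perm = lu_decompose(a)
    print(lu_solve(lu, perm, b))
    for n in (64, 96, 128):  # matrix sizes shown on the slide's chart
        print(n, round(lu_flop_share(n), 3))
```

With these counts the LU share simplifies to n / (n + 3), which rises with n across the slide's matrix sizes. That is consistent with the slide's remark that speedup increases with matrix size as LU dominates, although operation counts alone do not predict the measured timings.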

  9. 100X* DNA sequence speedup: Bacillus anthracis human DNA comparison
     8 hrs => 5 min
     [Chart: FPGA speedup (0–120) vs. genome sequence # (24–40) for 8K w/align, 16K w/align, 8K w/o align and 16K w/o align]
     # 24 = Sequence AE17024
     *Virtex-4 FPGA vs 2.2 GHz Opteron on Cray XD1
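The slide does not name the comparison algorithm. As an illustration of the kind of kernel involved, the sketch below assumes Smith-Waterman local alignment, a common choice for DNA sequence comparison, and computes only the best alignment score. The function name and the scoring parameters (match +2, mismatch -1, linear gap -1) are hypothetical and not taken from the talk.

```python
def smith_waterman_score(query, target, match=2, mismatch=-1, gap=-1):
    """Best local-alignment score between two sequences (linear gap penalty).

    Keeps only two rows of the dynamic-programming table, so memory use
    grows with len(target) rather than with the full table size.
    """
    prev = [0] * (len(target) + 1)
    best = 0
    for q in query:
        curr = [0]  # column 0 is always zero for local alignment
        for j, t in enumerate(target, start=1):
            diag = prev[j - 1] + (match if q == t else mismatch)
            up = prev[j] + gap        # gap in the target
            left = curr[j - 1] + gap  # gap in the query
            cell = max(0, diag, up, left)
            curr.append(cell)
            best = max(best, cell)
        prev = curr
    return best


if __name__ == "__main__":
    print(smith_waterman_score("TTACGTTT", "GGACGTGG"))
```

Each cell depends only on its upper, left and upper-left neighbours, so all cells on one anti-diagonal can be computed at the same time; that independence is what makes this class of algorithm a natural fit for parallel hardware. As a check on the headline figure, 8 hours to 5 minutes is 480 / 5 = 96, in line with the roughly 100X speedup in the slide title.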

  10. FPGA speedup grows with query size

  11. Summary
     ORNL FPGA research:
     • Increasing HPC relevance
     • FPGA systems: Cray, SRC, Nallatech, Digilent, SGI
     • Compilers: Mitrion-C, Carte, Viva, DSPlogic, CHiMPS
     • Speedup: 10X equation solution, 100X DNA sequencing
     • Partners: Xilinx, UT, Mitrion, Cray, SGI
     • Next: Explore DRC, more FPGAs and CHiMPS
     Acknowledgement: This research is supported by the Office of Science of the U.S. Department of Energy under Contract No. DE-AC05-00OR22725.

  12. Contact
     • Olaf Storaasli
     • Future Technologies Group, Computer Science and Mathematics Division
     • olaf@ornl.gov
     • Google: Olaf ORNL
     Storaasli_ReconfigHPC_SC07
