Self-adaptive DSP Software

Vienna University of Technology, Institute for Applied Mathematics and Numerical Analysis, SFB AURORA, Group 5

Presentation Transcript


  1. Summit on Self-adaptive Numerical Software
     Self-adaptive DSP Software
     Vienna University of Technology, Institute for Applied Mathematics and Numerical Analysis, SFB AURORA, Group 5
     Franz Franchetti, Wilfried Gansterer, Ernst Haunschmid, Helmut Hlavacs, Florian Kaltenberger, Stefan Kral, Dieter Kvasnicka, Juergen Lorenz, Josef Schneid, Christoph Ueberhuber

  2. Topics
     • Ongoing Efforts in Vienna
       • FFTs for PCs, ... up to IBM's Blue Gene
       • MAP: Special purpose compiler
       • SALT, GRIM: Grid library with intelligent resource management
     • Visions for Common Goals
       • Self-adapting Numerical Software: Special purpose kernel compiler to replace general purpose compilers
       • Software for Mini-Grids: Simulation based scheduling

  3. Efforts in Vienna
     • FFTs for IBM's Blue Gene
       • BG/L: 260 Tflop/s
       • BG/C: >1000 Tflop/s
     • SALT: Self-Adapting Library for Transforms
       • Mini-Grids
     • MAP: Special purpose kernel compiler
     • Cooperations
       • UTK: MAP
       • CMU: SPIRAL
       • Drexel: Dimensionless FFTs
       • MIT: FFTW
       • IBM: Blue Gene

  4. Efforts in Vienna (2)
     • Single processor activities
       • SIMD vectorization
         • Automatic 2-way vectorization
         • Portable n-way vectorization
       • FMA optimization
       • Streaming memory access
       • Special purpose backend
     • Parallel machine activities
       • Overlapping
       • Adaptive communication
     • Theory and application
       • Dimensionless FFT
       • Reduced Transform Algorithm

  5. Current Hardware Trends
     • CPU, memory, network: the performance gap is widening (60% per year)
     • Hardware “tricks” partially hide this effect
     • ISA extensions for high performance
     • General purpose compilers fail to generate satisfactory numerical kernels
     • Performance modeling is becoming more and more difficult

  6. General Purpose Compilers
     Irix C compiler: Bad register allocation
     GNU C compiler: Too many integer instructions
     Intel C compiler: Bad instruction scheduling; processor specific options deoptimize
     Example: FMA optimization
     Original code: 2 FMAs
     Optimized code: 3 FMAs

  7. Register Allocation
     Example: Matrix-matrix multiply

       # 1  C[ 0 + 0 * ldc ] +=
       #    A[ 0 + 0 * lda ] * B[ 0 + 0 * ldb ];
       ldc1    $f2,0($5)
       ldc1    $f1,0($4)
       ldc1    $f0,0($6)
       madd.d  $f0,$f0,$f1,$f2
       sdc1    $f0,0($6)
       .loc 22 2 1

     Irix C compiler with -O -mips4: only 30 % of peak performance
     Only 3 out of 32 fp registers used!
     Hand scheduled code: 90 % of peak performance
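The compiler output above reloads all three operands for a single update and touches only three floating-point registers. One common way to expose more register reuse from C is to keep a small block of C in local scalars across the whole inner loop. The following is a minimal sketch of that idea, not the hand scheduled code from the talk: the function name `matmul_2x2_block`, the 2x2 block size, and the column-major indexing (chosen to mirror `C[ 0 + 0 * ldc ]`) are assumptions made for illustration.

```c
#include <stddef.h>

/* Illustrative helper (not from the slides):
   C(0:1,0:1) += A(0:1,0:k-1) * B(0:k-1,0:1), column-major storage,
   element (i,j) of a matrix M lives at M[i + j * ldm]. */
void matmul_2x2_block(size_t k,
                      const double *A, size_t lda,
                      const double *B, size_t ldb,
                      double *C, size_t ldc)
{
    /* Four accumulators stay in local scalars for the whole loop,
       so a compiler can keep them in registers. */
    double c00 = C[0 + 0 * ldc], c10 = C[1 + 0 * ldc];
    double c01 = C[0 + 1 * ldc], c11 = C[1 + 1 * ldc];

    for (size_t p = 0; p < k; ++p) {
        /* Each loaded operand is reused for two multiply-adds. */
        double a0 = A[0 + p * lda], a1 = A[1 + p * lda];
        double b0 = B[p + 0 * ldb], b1 = B[p + 1 * ldb];
        c00 += a0 * b0;
        c10 += a1 * b0;
        c01 += a0 * b1;
        c11 += a1 * b1;
    }

    /* C is written back once, after the loop. */
    C[0 + 0 * ldc] = c00;
    C[1 + 0 * ldc] = c10;
    C[0 + 1 * ldc] = c01;
    C[1 + 1 * ldc] = c11;
}
```

Whether a given compiler actually keeps the accumulators in registers still depends on the compiler, which is the motivation the slides give for a special purpose backend.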

  8. GNU C vs. Kernel Backend
     Integer instruction statistics: AMD K7, FFTW 2.1.3 no-twiddle codelets

  9. Vendor Compiler vs. Kernel Backend
     Runtime experiment: Intel Pentium 4 1800 MHz, SSE2, nfftw2, Intel C++ Compiler 5.0 vs. kernel backend

  10. Compilers: Summary
      General purpose compilers
      • Fail to produce fast numerical kernels
      • Fail to utilize ISA extensions satisfactorily
      • Require compiler specific tricks
      High performance is very hard to achieve using general purpose compilers!

  11. Performance Study: FFT (1998)
      Slow-down factors
      One processor of an SGI Power Challenge XL
      Adaptive software outperforms commercial software (NAG, IMSL) significantly

  12. FFTW-SIMD vs. Non-SIMD
      AMD Athlon XP 1800+ (1533 MHz), single precision, Intel C++ Compiler 5.0, gcc 2.96 and g77 2.96

  13. SPIRAL-SIMD vs. Non-SIMD
      Intel Pentium 4 (2530 MHz), single precision, Intel C++ Compiler 6.0
      [Plot: Gflop/s, computed from the runtime T as 5 N ld N / T, versus ld N]
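The axis label on this slide refers to the usual normalized performance measure for FFTs: a nominal operation count of 5 N ld N (ld being the base-2 logarithm) divided by the measured runtime T. The sketch below shows that arithmetic under stated assumptions: the function name `fft_pseudo_gflops` and its signature are made up for illustration, N is taken to be a power of two, and T is given in seconds.

```c
#include <assert.h>

/* Illustrative helper (not from the slides): normalized Gflop/s for an FFT
   of size N = 2^ld_n that took `seconds` to run. The count 5 N ld N is a
   nominal figure, not the number of operations actually executed. */
double fft_pseudo_gflops(unsigned ld_n, double seconds)
{
    assert(seconds > 0.0);   /* a runtime must be positive */
    assert(ld_n < 63);       /* keep the shift below well defined */

    double n = (double)(1ULL << ld_n);
    double nominal_flops = 5.0 * n * (double)ld_n;
    return nominal_flops / seconds / 1e9;
}
```

Because the operation count is nominal, the resulting number is best read as a scaled inverse runtime that makes different transform sizes comparable on one plot.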

  14. Mini-Grids (Departmental Grids)
      Observations: Desktop PCs
      • Tremendous computing power
      • Short product cycle
      • “Everything” is connected
      Mini-Grids transparently use exactly those resources needed to solve your problem efficiently
      Numerical libraries have to do that!

  15. High Performance Numerical Software
      Requirements:
      • Utilization of ISA extensions
      • Special purpose kernel compiler
      • Automatic adaptation
        • Computation
        • Communication
      Self-adapting Numerical Software

  16. Self-adapting Numerical DSP Software
      Algorithm decomposed into
      • Adaptive communication layer
        • Find optimal communication pattern w.r.t. actual machine
      • Adaptive computation layer
        • Find optimal computing kernels w.r.t. computing nodes and communication pattern
      Kernels generated automatically by special purpose compilers

  17. Dimensionless FFT (Jeremy Johnson et al.)
      • DFT of Abelian groups
        • Unifies 1D and MD DFTs, Cooley-Tukey, Good-Thomas, Vector Radix FFT, ...
      • Automatic optimization of parallel FFT algorithms
        • Data flow description of FFTs
        • Adaptation of parallel FFT algorithms to architecture
        • Find optimal data flow by
          • Simulation
          • Symbolic analysis

  18. SALT: DSP Library for Grids
      • Target environments
        • Heterogeneous clusters, Mini-Grids
        • High-throughput computing
        • Metacomputing
      • Multiple adaptation strategies
        • Fastest turnaround time
        • Lowest job cost
        • Limited bandwidth usage
      • Interfaces to standard Grid environments
      • Application level scheduler required

  19. SALT: Coarse Structure
      [Diagram labels: GRIM (Grid Resource Intelligent Manager), SALT, Scheduler, Adaptation, SALT scheduling, MPICH-G2, Globus, system]

  20. Scheduling
      • CLUE (CLUster Evaluator)
        • Execution driven simulation
        • Computation: executed
        • Communication: modeled
        • Target: Heterogeneous Clusters
      • GRIM (Grid Resource Intelligent Manager)
        • Scheduling: Problem specific, Simulation based
        • Computation: modeled
        • Communication: modeled
        • Target: Mini-Grids
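When both computation and communication are modeled rather than executed, a scheduler can compare candidate placements without running anything. The sketch below is a deliberately simplified illustration of that idea and is not GRIM's actual algorithm, which the slides do not describe: the `node_model` fields, the linear cost model, and the function names are all assumptions, and the selection rule implements only the "fastest turnaround time" strategy.

```c
#include <stddef.h>

/* Hypothetical per-node model; the field names are illustrative. */
typedef struct {
    double flops_per_s;  /* modeled compute speed */
    double bytes_per_s;  /* modeled link bandwidth */
    double latency_s;    /* modeled fixed communication latency */
} node_model;

/* Predicted turnaround = modeled compute time + modeled transfer time. */
double predicted_turnaround(const node_model *node,
                            double job_flops, double job_bytes)
{
    return job_flops / node->flops_per_s
         + node->latency_s
         + job_bytes / node->bytes_per_s;
}

/* Returns the index of the node with the smallest predicted turnaround.
   Requires count >= 1; ties go to the lower index. */
size_t pick_fastest_node(const node_model *nodes, size_t count,
                         double job_flops, double job_bytes)
{
    size_t best = 0;
    double best_time = predicted_turnaround(&nodes[0], job_flops, job_bytes);

    for (size_t i = 1; i < count; ++i) {
        double t = predicted_turnaround(&nodes[i], job_flops, job_bytes);
        if (t < best_time) {
            best_time = t;
            best = i;
        }
    }
    return best;
}
```

With this model, a job that moves a lot of data can be placed on a slower processor behind a faster link, while a compute-only job goes to the fastest processor.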

  21. Generating High-Performance Kernels
      • Standard Compilers
        • Priority: Program flow transformations
        • General purpose backend
        • Cannot take advantage of special code structure
        • Do not utilize processor extensions well
      • Code Generators
        • Generate C / Fortran code:
          • Loss of performance
          • No “advanced code”
        • Generate native assembly code:
          • Loss of portability
          • Huge effort
      MAP: Special purpose kernel compiler

  22. MAP: Concept
      Special purpose kernel compiler
      • C macro framework to express “advanced” code but hide the details, and to support “standard” code
      • High level transformations: Automatic 2-way vectorization, ...
      • Special purpose backend to take advantage of special code structure and to produce high performance kernels
      Goal: provide code generators with
      • Control: as assembly language
      • Portability: as high level language
      • Domain specific optimization

  23. Code generation without MAP
      Problem Description
      -> Code generator: Generate efficient “standard” code
      -> High level language program (C, Fortran)
      -> General purpose compiler: Maybe tries some “advanced” optimization
      -> Object code

  24. Code generation with MAP
      Problem Description
      -> Code generator
         • Generate efficient “advanced” code
           • SIMD vectorization
           • Prefetching
         • or “standard” code
      -> Straight line code: Implementation details are hidden by the macro API
      -> MAP, special purpose kernel compiler
         • Domain specific optimization
           • Automatic 2-way vectorization
           • Index computation
           • Register allocation
         • Special purpose backend
           • Processor extensions
      -> Assembly language

  25. Example: ATLAS with MAP
      “matmul 8x8, FMA, 4-way SIMD”
      -> Code generator
         • Generate matmul code with
           • generic 4-way SIMD
           • generic FMA
         • Implementation details are hidden
      -> Straight line code, only C macros
      -> MAP, special purpose kernel compiler
         • Architecture specific implementation
         • Efficient address computation
         • Target SIMD implementation
         • Does not destroy FMAs
      -> ATLAS kernel for Motorola AltiVec

  26. C Macro Framework
      Basic operations
      • Load/store: stack, register, memory, cache
      • Scalar arithmetic
      • Vector arithmetic
      • Vector permutations
      • Prefetching
      Implementation
      • Optimizing backend
      • ANSI C compiler
      • Machine specific C compiler: Intrinsic functions
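To illustrate how a macro layer can hide implementation details from a code generator, the sketch below defines three 4-way vector macros with a portable plain-C fallback and a straight-line kernel written only in terms of them. The macro names `VEC4_LOAD`, `VEC4_FMA` and `VEC4_STORE`, the `vec4f` type, and the kernel are hypothetical: the slides do not show MAP's actual macro API. On a SIMD target the same macros could instead expand to compiler intrinsics or be consumed by a special purpose backend.

```c
/* Portable fallback (C99): a 4-way vector is a struct of four floats.
   All names here are illustrative and not taken from MAP. */
typedef struct { float lane[4]; } vec4f;

#define VEC4_LOAD(dst, ptr) \
    do { for (int i_ = 0; i_ < 4; ++i_) (dst).lane[i_] = (ptr)[i_]; } while (0)

#define VEC4_STORE(ptr, src) \
    do { for (int i_ = 0; i_ < 4; ++i_) (ptr)[i_] = (src).lane[i_]; } while (0)

/* acc += a * b, lane by lane */
#define VEC4_FMA(acc, a, b) \
    do { for (int i_ = 0; i_ < 4; ++i_) \
             (acc).lane[i_] += (a).lane[i_] * (b).lane[i_]; } while (0)

/* Straight-line kernel of the kind a code generator might emit:
   y[0..3] += a[0..3] * x[0..3], expressed with macros only. */
void fma4_kernel(float *y, const float *a, const float *x)
{
    vec4f va, vx, vy;
    VEC4_LOAD(va, a);
    VEC4_LOAD(vx, x);
    VEC4_LOAD(vy, y);
    VEC4_FMA(vy, va, vx);
    VEC4_STORE(y, vy);
}
```

The kernel body never mentions how a vector is stored or how a multiply-add is carried out, so retargeting it means changing only the macro definitions.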

  27. Conclusion
      Self-adapting numerical software
      • Special purpose kernel compiler: MAP
      • Adaptive communication: SALT
      • Problem specific resource management: GRIM
      Target architectures
      • Standard computer systems
      • Computational Mini-Grids
      • IBM Blue Gene
      Prototypes prove performance
      • FFTW-SIMD: Vectorization: AMD, Intel, Motorola; Backend: AMD, Intel
      • SPIRAL-SIMD: Vectorization: AMD, Intel
