Compilers as Collaborators and Competitors of High-Level Specification Systems David Padua University of Illinois at Urbana-Champaign
Towards a Synthesis • There is much interaction and overlap between compilers and code generation from very high-level specifications. • Both technologies could merge into “supercompiler” technology. • Thesis, antithesis, synthesis.
Higher Levels of Abstraction… • One of the main goals of Software Research is to facilitate program development. • Raise the level of abstraction. What rather than how. • Subroutines – Control abstraction • Data abstraction mechanisms
… Higher Levels of Abstraction • Programming is simplified by using macro operations from a catalog. • Modules (subroutines/classes/…) • Part of the language (Fortran 90, MATLAB, SETL) • Standard libraries • Hand-written • Automatically generated • Application specific (usually hand-written)
Performance and Abstraction • In many cases the main mechanism to attain high performance is to develop high-performance library routines. • For example, MATLAB programming style is to use functions as much as possible. • This approach does not always work. Real applications make little use of pre-existing libraries. • One reason: Data structures are not always in the right format. • Another: The overhead associated with class accesses. • For this reason, with current technology, Higher-level => Lower performance
Automatic Generation of Modules from Specifications… • Several systems aim at generating the fastest possible routines for certain classes of computations. • The algorithms are relatively simple. • Very high-performance implementations can be tedious and time-consuming to develop. • Examples of these systems include • ATLAS • FFTW • Spiral
… Automatic Generation of Modules from Specifications • Other systems try to simplify the generation of complete applications. Although performance is also a concern, language design and correctness are the most important issues. • Ellpack • GPSS • Many CAD systems
ATLAS • Generate several versions of BLAS routines • Different tile sizes • Different degrees of unrolling • Loop ordering is fixed • Run all and choose the fastest
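A minimal sketch in C of this generate-and-measure loop, assuming a single kernel parameterized by tile size (real ATLAS instead emits and compiles a specialized source file per candidate, and varies the unroll factors as well):

    #include <stdio.h>
    #include <string.h>
    #include <time.h>

    #define N 256

    /* One tiled matrix-multiply variant.  In ATLAS each candidate is a
       separately generated source file with the tile size and unroll
       factors baked in as constants; a run-time parameter stands in here. */
    static void mm_tiled(int T, double *C, const double *A, const double *B) {
        for (int ii = 0; ii < N; ii += T)
            for (int jj = 0; jj < N; jj += T)
                for (int kk = 0; kk < N; kk += T)
                    for (int i = ii; i < ii + T; i++)
                        for (int j = jj; j < jj + T; j++) {
                            double s = C[i*N + j];
                            for (int k = kk; k < kk + T; k++)
                                s += A[i*N + k] * B[k*N + j];
                            C[i*N + j] = s;
                        }
    }

    int main(void) {
        static double A[N*N], B[N*N], C[N*N];
        for (int i = 0; i < N*N; i++) { A[i] = 1.0; B[i] = 2.0; }

        int tiles[] = {8, 16, 32, 64};   /* candidate tile sizes */
        int best_tile = 0;
        double best_time = 1e30;

        /* Run every version and keep the fastest. */
        for (int v = 0; v < 4; v++) {
            memset(C, 0, sizeof C);
            clock_t t0 = clock();
            mm_tiled(tiles[v], C, A, B);
            double secs = (double)(clock() - t0) / CLOCKS_PER_SEC;
            if (secs < best_time) { best_time = secs; best_tile = tiles[v]; }
        }
        printf("fastest tile size: %d (%.3f s)\n", best_tile, best_time);
        return 0;
    }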
FFTW • Recursive divide-and-conquer based on the Cooley-Tukey factorization F_rs = (F_r ⊗ I_s) T^rs_s (I_r ⊗ F_s) L^rs_r, where T^rs_s is the diagonal matrix of twiddle factors and L^rs_r is a stride permutation. • Plan: a factorization tree (e.g., F1024 → F8 ⊗ F128, then F128 → F8 ⊗ F16). • Factorization stops at certain sizes. • Execution: call codelets. • Codelets • Subroutines for small-size FFTs • Optimized and fully unrolled • Generated by a dedicated compiler • Adapts to the environment at run-time • Plan selection uses dynamic programming
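For concreteness, a toy radix-2 version of the divide-and-conquer scheme in C (this is not FFTW’s code: FFTW recurses according to the plan chosen by its planner and replaces the base case with large, fully unrolled codelets):

    #include <complex.h>
    #include <math.h>

    #define PI 3.14159265358979323846

    /* Toy radix-2 decimation-in-time FFT; n must be a power of two.
       FFTW stops the recursion at small sizes and calls a fully
       unrolled, machine-generated codelet instead. */
    void fft(double complex *x, int n) {
        if (n == 1) return;                  /* trivial "codelet" */
        int h = n / 2;
        double complex even[h], odd[h];      /* C99 VLAs; fine for a sketch */
        for (int i = 0; i < h; i++) { even[i] = x[2*i]; odd[i] = x[2*i+1]; }
        fft(even, h);
        fft(odd, h);
        for (int k = 0; k < h; k++) {        /* combine with twiddle factors */
            double complex t = cexp(-2.0 * I * PI * k / n) * odd[k];
            x[k]     = even[k] + t;
            x[k + h] = even[k] - t;
        }
    }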
SPIRAL • System pipeline: a DSP Transform enters the Formula Generator, which produces SPL Formulae; the SPL Compiler translates them into C/FORTRAN Programs; Performance Evaluation on the Target Architecture feeds a Search Engine, which steers the Formula Generator; the best implementations found form the DSP Libraries.
Supercompilers … • Integration of very high-level specifications with conventional languages. • Besides conventional subroutines (selected from a catalog), the languages accepted by supercompilers would also call “macros” that could be used to generate code as a function of the • Target machine • Value of data • Structure of data • Shape of data • Rest of the program • Numerical properties
… Supercompilers … • Macros could be subroutines or class methods. Class expansion could include selecting the data representation (including the data distribution); a sketch of such a representation-selecting macro follows below. • SETL • Automatic dense/sparse selection techniques • Automatic data distribution techniques
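A crude illustration in C of such representation selection for matrix-vector multiplication, choosing between dense and compressed-sparse-row (CSR) storage from the measured density (the routine names and the 25% threshold are invented for the example; a supercompiler could make the choice at compile time when the density is known):

    #include <stdlib.h>

    /* Compressed sparse row (CSR) storage. */
    typedef struct { int n, nnz; int *rowptr, *col; double *val; } csr_t;

    /* y = A*x with A dense, row-major n x n. */
    void matvec_dense(int n, const double *A, const double *x, double *y) {
        for (int i = 0; i < n; i++) {
            double s = 0.0;
            for (int j = 0; j < n; j++) s += A[i*n + j] * x[j];
            y[i] = s;
        }
    }

    /* y = A*x with A in CSR form. */
    void matvec_csr(const csr_t *A, const double *x, double *y) {
        for (int i = 0; i < A->n; i++) {
            double s = 0.0;
            for (int k = A->rowptr[i]; k < A->rowptr[i+1]; k++)
                s += A->val[k] * x[A->col[k]];
            y[i] = s;
        }
    }

    /* The "macro": inspect the data, then pick a representation. */
    void matvec(int n, const double *A, const double *x, double *y) {
        int nnz = 0;
        for (int i = 0; i < n*n; i++) nnz += (A[i] != 0.0);
        if ((double)nnz / (n*n) > 0.25) {   /* invented threshold */
            matvec_dense(n, A, x, y);
        } else {
            /* Build CSR once, then run the sparse kernel. */
            csr_t S = { n, nnz, malloc((n+1) * sizeof(int)),
                        malloc(nnz * sizeof(int)), malloc(nnz * sizeof(double)) };
            int k = 0;
            for (int i = 0; i < n; i++) {
                S.rowptr[i] = k;
                for (int j = 0; j < n; j++)
                    if (A[i*n + j] != 0.0) {
                        S.col[k] = j; S.val[k] = A[i*n + j]; k++;
                    }
            }
            S.rowptr[n] = k;
            matvec_csr(&S, x, y);
            free(S.rowptr); free(S.col); free(S.val);
        }
    }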
… Supercompilers • In theory at least, generating code from specifications rather than from specific HLL implementations should lead to better performance. • All the benefits of abstraction without the performance penalty.
Vectorizers and High Level Specifications

Original loop:

    do i=1,n
      a(i)=b(i)+c(i)
      d(i)=a(i)+d(i-1)
      if (m > d(i)) m=d(i)
    end do

After loop distribution:

    do i=1,n
      a(i)=b(i)+c(i)
    end do
    do i=1,n
      d(i)=a(i)+d(i-1)
    end do
    do i=1,n
      if (m > d(i)) m=d(i)
    end do

After recognition of the high-level operations (an array assignment, a linear recurrence, and a min reduction):

    a(1:n)=b(1:n)+c(1:n)
    d(1:n)=lin-rec(a,d,1,n)
    m=min(m,d(1:n))
Back End Compilers and Supercompilers … • Back End Compilers take care of • Machine code generation • Register allocation • Conventional optimizations • But not really trusted by today’s module generation systems (Competitors) • The existence of ATLAS is just an indictment of current compiler technology. • FFTW does clustering to improve register allocation. • Spiral does a variety of conventional optimizations.
Optimizations in Spiral • Formula Generator: high-level scheduling, loop transformations. • SPL Compiler: high-level optimizations (constant folding, copy propagation, CSE, dead code elimination). • C/Fortran Compiler: low-level optimizations (instruction scheduling, register allocation).
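As a hand-made illustration (not actual SPL compiler output), here is the effect of those high-level optimizations on straight-line, codelet-style C code:

    /* Before optimization: as naively generated from a formula. */
    void codelet_before(const double *x, double *y) {
        double t0 = x[0] + x[1];
        double t1 = x[0] + x[1];     /* duplicate of t0: CSE target */
        double t2 = 2.0 * 0.5 * t1;  /* 2.0*0.5 folds to 1.0 */
        double t3 = t2;              /* copy propagation target */
        double t4 = t0 - t1;         /* never used: dead code */
        (void)t4;
        y[0] = t0 + t3;
    }

    /* After constant folding, copy propagation, CSE, and
       dead code elimination: */
    void codelet_after(const double *x, double *y) {
        double t0 = x[0] + x[1];
        y[0] = t0 + t0;
    }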
Basic Optimizations (FFT, N=2^5, PII, g77 -O6 -malign-double) [performance chart]
Can Module Generators Rely on Back End Compilers? • Not always, but using back end compilers will always be necessary for portability (Collaborators). • But… compilers can hinder efforts to get good performance. • For example, bad register allocation can have a serious negative impact. • Need a standard set of commands to control the transformations applied by the compiler; an example of today’s ad hoc alternative follows below.
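No such standard exists today; the closest thing is a scatter of vendor-specific pragmas. For example, GCC accepts an unrolling directive (GCC-specific; other compilers spell it differently, and nothing comparable covers tiling, interchange, or register allocation):

    /* GCC-specific: request an unroll factor of 4 for the next loop.
       A standard set of such commands would let module generators
       control the back end portably. */
    void saxpy(int n, float a, const float *x, float *y) {
        #pragma GCC unroll 4
        for (int i = 0; i < n; i++)
            y[i] += a * x[i];
    }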
… Back End Compilers and Supercompilers • In supercompilers, transformations should be done by the back end whenever possible. • Reason: they then apply to all parts of the program, not only to the very high-level components.
Search … • Search is an important component of module generators. • Also used by conventional compilers, but compilers usually work with static predictions rather than actual execution times. • KAP tried all possible loop permutations. • SGI-PRO tries many combinations of unrolling. • Superoptimizer and similar systems. • Most compiler optimization algorithms are heuristics with no search involved.
… Search … • In Supercompilers search could also be done across several algorithms looking for a good data representation and data distribution for the whole program.
… Search … • Search strategy could make use of actual execution times combined with static performance prediction • Static prediction not very accurate today. • Tight performance bounds to prune the search. • Some decisions could be made at run-time • IF statements/multiversion loops • JIT compilers
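A minimal sketch of the multiversion-loop idea in C (the no-overlap test is an invented example of a run-time decision; a JIT could instead compile the chosen version on demand):

    /* Two generated versions of the same loop: one promises the
       compiler that x and y do not overlap (vectorizable), one makes
       no such assumption. */
    void add_fast(int n, const double *restrict x, double *restrict y) {
        for (int i = 0; i < n; i++) y[i] += x[i];  /* may be vectorized */
    }

    void add_safe(int n, const double *x, double *y) {
        for (int i = 0; i < n; i++) y[i] += x[i];  /* conservative version */
    }

    /* The multiversion entry point: a cheap IF selects a version.
       (The pointer comparison is a simplification of a real overlap test.) */
    void add(int n, const double *x, double *y) {
        if (x + n <= y || y + n <= x)
            add_fast(n, x, y);
        else
            add_safe(n, x, y);
    }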
… Search • Some search could be based on data-dependent behavior • Profiling • “Representative” data sets • Search strategy is important given that the space of possibilities is often large and not monotonic, and it is difficult to know how far the search process is from the optimum. • Need to develop tight bounds.
Coverage • Need a class of specifications large enough to represent most of the computation. • The effectiveness of the approach will depend on coverage. • Current libraries are a good start. • But… it is not clear how much these libraries typically cover. • To impact programming in general, current approaches would have to be extended to other domains such as sparse computations, sorting, searching, …
Conclusions • As we better understand algorithm choices and their impact on performance, it becomes feasible to automate much of the process of selecting data structures and algorithms to maximize performance. • A first step: a repository of routines/classes with several implementations of each subroutine. • But generation based on context could lead to better performance. • In particular, generation from very high-level specifications could allow the generation of code combining several operations in ways that are impossible to conceive with current encapsulation mechanisms.