A Fast Fourier Transform Compiler Silvio D Carnevali
Contents
• FFTW and genfft: an introduction
• genfft: how it works
  1.) DAG creation
  2.) Simplifier
  3.) Scheduler
  4.) Unparsing
• Conclusion: similar applications
genfft
• Special-purpose compiler
• Written in Objective Caml
• Produces DFT subroutines
• Outputs C code
• Parameterized according to:
  - Input length
  - Data type
FFTW
• Collection of “codelets”
• Codelets: fragments of C code, generated by genfft
• Plan: a composition of codelets
  - The optimal plan depends on input size and hardware
  - Selected automatically by FFTW (FJ98)
Performance of FFTW
[Plots: performance for sizes that are powers of 2, and for sizes built from powers of 2, 3, 5, 7]
genfft: creation of the codelet’s DAG
• Nodes: a data type that encodes arithmetic expressions
  - Uses real numbers only, for C compatibility
• Generic node = operator
• Children = operands
• The algorithm used to build the DAG depends on the input size
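The node structure described on this slide can be sketched as follows. This is a minimal illustration in Python, not genfft's actual data type: the class names `Load` and `Op`, the input names `re0`, `im0`, `re1`, `im1`, and the helper `dft2_dag` are all hypothetical. Each complex value is kept as a pair of real-valued nodes, and the example builds the DAG of a size-2 DFT.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Load:
    """Leaf node: one real-valued input, e.g. the real part of X[0]."""
    name: str


@dataclass(frozen=True)
class Op:
    """Generic node: an operator applied to its operand nodes (children)."""
    operator: str   # "+", "-" or "*"
    operands: tuple


def evaluate(node, env):
    """Evaluate a node given a dict mapping input names to real numbers."""
    if isinstance(node, Load):
        return env[node.name]
    a, b = (evaluate(child, env) for child in node.operands)
    if node.operator == "+":
        return a + b
    if node.operator == "-":
        return a - b
    if node.operator == "*":
        return a * b
    raise ValueError(node.operator)


def dft2_dag():
    """Size-2 DFT: Y[0] = X[0] + X[1], Y[1] = X[0] - X[1].

    Every complex value is a (real part, imaginary part) pair of real nodes.
    """
    re0, im0, re1, im1 = (Load(n) for n in ("re0", "im0", "re1", "im1"))
    y0 = (Op("+", (re0, re1)), Op("+", (im0, im1)))
    y1 = (Op("-", (re0, re1)), Op("-", (im0, im1)))
    return [y0, y1]
```

Evaluating the returned nodes with `evaluate` and a dictionary of input values gives the real and imaginary parts of each output.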
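FT Equation
$$Y[i] = \sum_{j=0}^{n-1} X[j]\,\omega_n^{-ij}, \qquad 0 \le i < n$$
• X = input vector (length n)
• Y = FT of X
• $\omega_n = e^{2\pi\sqrt{-1}/n}$ = primitive nth root of unity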
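As a reference point, the transform can be computed directly from its definition. The sketch below assumes the common forward-transform convention Y[i] = Σ_j X[j] ω_n^(-ij) with ω_n = e^(2π√-1/n); the function name `dft` is illustrative. It takes O(n²) operations and only serves to check faster code against.

```python
import cmath


def dft(x):
    """Direct O(n^2) DFT of a non-empty sequence, straight from the definition."""
    n = len(x)
    w = cmath.exp(2j * cmath.pi / n)  # primitive nth root of unity
    return [sum(x[j] * w ** (-i * j) for j in range(n)) for i in range(n)]
```

For example, `dft([1, 0, 0, 0])` yields four values equal to 1 up to rounding error, since a unit impulse has a flat spectrum.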
genfft: DAG Simplifier
• Bottom-up traversal of the DAG
• Local improvements:
  - Algebraic transformations (constant folding, simplification of + and *)
  - Common-subexpression elimination (CSE): eliminates existing common subexpressions and creates new ones
  - DFT-specific improvements
Algebraic transformations
• Simplifies multiplication by 1, 0 or -1
• Simplifies addition of 0
• Distribution: kx + ky = k(x + y)
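The rules listed on this slide can be sketched as a bottom-up rewrite over expression trees. The tuple encoding (`("num", v)`, `("var", name)`, `("neg", e)`, `("+", a, b)`, `("*", a, b)`) and the function name `simplify` are assumptions made for this example, not genfft's representation.

```python
def simplify(e):
    """Apply the local algebraic rules bottom-up to an expression tree."""
    tag = e[0]
    if tag in ("num", "var"):
        return e
    if tag == "neg":
        return ("neg", simplify(e[1]))
    a, b = simplify(e[1]), simplify(e[2])
    if tag == "*":
        if a[0] == "num" and b[0] == "num":
            return ("num", a[1] * b[1])          # constant folding
        if b[0] == "num":
            a, b = b, a                          # keep the constant on the left
        if a == ("num", 0):
            return ("num", 0)                    # 0 * x = 0
        if a == ("num", 1):
            return b                             # 1 * x = x
        if a == ("num", -1):
            return ("neg", b)                    # -1 * x = -x
        return ("*", a, b)
    if tag == "+":
        if a[0] == "num" and b[0] == "num":
            return ("num", a[1] + b[1])          # constant folding
        if a == ("num", 0):
            return b                             # 0 + x = x
        if b == ("num", 0):
            return a                             # x + 0 = x
        if a[0] == "*" and b[0] == "*" and a[1][0] == "num" and a[1] == b[1]:
            # Distribution: k*x + k*y = k*(x + y) saves one multiplication.
            return ("*", a[1], simplify(("+", a[2], b[2])))
        return ("+", a, b)
    raise ValueError(tag)
```

For instance, `3*x + 3*y` is rewritten to `3*(x + y)`, and `0*x + y` collapses to `y`.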
DFT-Specific improvements
• Numeric constants made positive (local rule)
  - Constants generally appear as both k and -k
  - Reduces the number of loads
• DAG transposition (valid for linear functions)
  - Simplify the DAG, then transpose + simplify, then transpose + simplify again
  - Reduces the number of multiplications only
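The first rule can be illustrated with a small rewrite that turns a product with a negative constant into the negation of a product with the positive constant, so that k and -k share one stored constant. The tuple encoding and the names `make_constants_positive` and `constants` are hypothetical; only constants that appear as the left operand of a product are handled in this sketch.

```python
def make_constants_positive(e):
    """Rewrite (-k) * x as -(k * x) so that only positive constants remain."""
    tag = e[0]
    if tag in ("num", "var"):
        return e
    if tag == "neg":
        return ("neg", make_constants_positive(e[1]))
    a, b = make_constants_positive(e[1]), make_constants_positive(e[2])
    if tag == "*" and a[0] == "num" and a[1] < 0:
        return ("neg", ("*", ("num", -a[1]), b))
    return (tag, a, b)


def constants(e):
    """Set of distinct numeric constants used by an expression."""
    if e[0] == "num":
        return {e[1]}
    if e[0] == "var":
        return set()
    return set().union(*(constants(child) for child in e[1:]))


x, y = ("var", "x"), ("var", "y")
before = ("+", ("*", ("num", 0.5), x), ("*", ("num", -0.5), y))
after = make_constants_positive(before)
```

Here `before` needs the two constants 0.5 and -0.5, while `after` needs only 0.5.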
DFT-Specific improvements
[Figure: a linear network connecting X, Y and A, B with edge weights 5, 2, 3, 4, drawn three times]
DAG D → Simplify → DAG E → Transpose → DAG E^T → Simplify → DAG F^T → Transpose → DAG F
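Transposition of a linear network can be sketched by reversing every edge and swapping the roles of inputs and outputs; the transposed network then computes the transposed matrix. The network below is a made-up example (it is not claimed to be the one in the figure), and the edge-list encoding and the names `transpose` and `matrix` are assumptions for illustration.

```python
def transpose(edges):
    """Reverse every weighted edge (src, dst, weight) of a linear network."""
    return [(dst, src, w) for (src, dst, w) in edges]


def matrix(edges, inputs, outputs):
    """Matrix M with M[i][j] = coefficient of inputs[j] in outputs[i]."""
    def coeff(node, source):
        # Sum over all paths from source to node of the product of weights.
        # The network is assumed to be acyclic, so the recursion terminates.
        if node == source:
            return 1
        return sum(w * coeff(src, source)
                   for (src, dst, w) in edges if dst == node)
    return [[coeff(o, i) for i in inputs] for o in outputs]


# Hypothetical network: T = 2X + 3Y, A = 5T, B = 4T.
edges = [("X", "T", 2), ("Y", "T", 3), ("T", "A", 5), ("T", "B", 4)]
forward = matrix(edges, ["X", "Y"], ["A", "B"])
backward = matrix(transpose(edges), ["A", "B"], ["X", "Y"])
```

In this example `forward` is `[[10, 15], [8, 12]]` and `backward` is its transpose, `[[10, 8], [15, 12]]`. A simplifier that only finds savings in one direction gets a second chance on the transposed DAG.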
genfft: DAG Scheduler
• Goal: minimize register use
• No instruction scheduling
• Recursively partitions the DAG in two
  - Register mapping
  - Optimal for n = 2^k
  - Partitioning heuristics
• Optimality? Not for n ≠ 2^k
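Why the order of instructions matters for register use can be seen by counting how many values are live at once under two schedules of the same DAG. The sketch below does not implement genfft's partitioning; it only measures the quantity the scheduler tries to keep small. The function `max_live` and the node names are hypothetical.

```python
def max_live(schedule, operands):
    """Peak number of simultaneously live values for a given schedule.

    schedule: node names in execution order (a valid topological order).
    operands: dict mapping a node to the nodes it reads.
    A value is live from its definition until its last use.
    """
    remaining = {n: 0 for n in schedule}   # uses not yet executed
    for n in schedule:
        for op in operands.get(n, ()):
            remaining[op] += 1
    live, peak = set(), 0
    for n in schedule:
        for op in operands.get(n, ()):
            remaining[op] -= 1
            if remaining[op] == 0:
                live.discard(op)           # last use: the register is free
        if remaining[n] > 0:
            live.add(n)                    # defined and still needed later
        peak = max(peak, len(live))
    return peak


# Two independent sums, each stored right after it is computed.
operands = {"s1": ("a", "b"), "s2": ("c", "d"),
            "store1": ("s1",), "store2": ("s2",)}
all_loads_first = ["a", "b", "c", "d", "s1", "s2", "store1", "store2"]
one_half_at_a_time = ["a", "b", "s1", "store1", "c", "d", "s2", "store2"]
```

With these inputs, `max_live(all_loads_first, operands)` is 4 and `max_live(one_half_at_a_time, operands)` is 2: finishing one part of the DAG before starting the other halves the register pressure.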
genfft: Unparsing
• The schedule is unparsed to C
• Pipeline usage is managed by the C compiler
• genfft + C compiler: performance problems
  - egcs “optimizer”
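Unparsing amounts to printing the scheduled instructions as straight-line C. The sketch below is an illustration only: the instruction tuples, the function name `unparse` and the shape of the emitted C (a function taking `in` and `out` arrays of `double`) are assumptions, not genfft's real output format.

```python
def unparse(schedule, name="codelet"):
    """Print a schedule as a C function, one statement per instruction.

    Each instruction is (dest, op, a, b):
      ("t0", "load", 0, None)    ->  double t0 = in[0];
      ("t2", "+", "t0", "t1")    ->  double t2 = t0 + t1;
      (0, "store", "t2", None)   ->  out[0] = t2;
    """
    lines = [f"void {name}(const double *in, double *out)", "{"]
    for dest, op, a, b in schedule:
        if op == "load":
            lines.append(f"    double {dest} = in[{a}];")
        elif op == "store":
            lines.append(f"    out[{dest}] = {a};")
        else:
            lines.append(f"    double {dest} = {a} {op} {b};")
    lines.append("}")
    return "\n".join(lines)


schedule = [
    ("t0", "load", 0, None),
    ("t1", "load", 1, None),
    ("t2", "+", "t0", "t1"),
    (0, "store", "t2", None),
    ("t3", "-", "t0", "t1"),
    (1, "store", "t3", None),
]
c_source = unparse(schedule, name="dft2_real")
```

The generated text keeps the order chosen by the scheduler; register allocation and instruction scheduling are then up to the C compiler.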
Conclusion & future work
• FFTW: The best of the best of the best…
• Over 100 downloads every week!
• genfft: specialized for linear functions
  - Crystallographic FT
  - FIR & IIR filters
  - Image processing (JPEG discrete cosine transform)