110 likes | 201 Views
FFT Exploration and Optimization. Stages. Divide and Conquer Rules. Cooley- Tukey Good-Thomas Rader. Rule Application. At each state a transition function is applied to produce a new program based on the following rules for a DFT of size n
E N D
Divide and Conquer Rules • Cooley-Tukey • Good-Thomas • Rader
Rule Application • At each state a transition function is applied to produce a new program based on the following rules for a DFT of size n • If n = mk and gcd(m,k) = 1 apply Good-Thomas • If n=mk apply Cooley-Tukey • If n is prime apply Raders
Example DFT 32 (define F4 (compose (tensor F2 (I 2) ) (T 4 2) (tensor (I 2) F2 ) (L 4 2) ) ) (define F8 (compose (tensor F2 (I 4) ) (T 8 4) (tensor (I 2) F4 ) (L 8 2) ) ) (define F16 (compose (tensor F2 (I 8) ) (T 16 8) (tensor (I 2) F8 ) (L 16 2) ) ) (define F32 (compose (tensor F16 (I 2) ) (T 32 2) (tensor (I 16) F2 ) (L 32 16) ) ) (F32) Different permutations of the same rules (define F4 (compose (tensor F2 (I 2) ) (T 4 2) (tensor (I 2) F2 ) (L 4 2) ) ) (define F8 (compose (tensor F4 (I 2) ) (T 8 2) (tensor (I 4) F2 ) (L 8 4) ) ) (define F16 (compose (tensor F2 (I 8) ) (T 16 8) (tensor (I 2) F8 ) (L 16 2) ) ) (define F32 (compose (tensor F16 (I 2) ) (T 32 2) (tensor (I 16) F2 ) (L 32 16) ) ) (F32) Different derivation (define F4 (compose (tensor F2 (I 2) ) (T 4 2) (tensor (I 2) F2 ) (L 4 2) ) ) (define F8 (compose (tensor F4 (I 2) ) (T 8 2) (tensor (I 4) F2 ) (L 8 4) ) ) (define F16 (compose (tensor F8 (I 2) ) (T 16 2) (tensor (I 8) F2 ) (L 16 8) ) ) (define F32 (compose (tensor F8 (I 4) ) (T 32 4) (tensor (I 8) F4 ) (L 32 8) ) ) (F32)
Example DFT 35 (define F3 (compose (RI 3) (tensor (I 1) F2 ) (E 3) (tensor (I 1) F2 ) (R 3) ) ) (define F4 (compose (tensor F2 (I 2) ) (T 4 2) (tensor (I 2) F2 ) (L 4 2) ) ) (define F5 (compose (RI 5) (tensor (I 1) F4 ) (E 5) (tensor (I 1) F4 ) (R 5) ) ) (define F6 (compose (PI 6 3 2) (tensor F3 (I 2) ) (tensor (I 3) F2 ) (P 6 3 2) ) ) (define F7 (compose (RI 7) (tensor (I 1) F6 ) (E 7) (tensor (I 1) F6 ) (R 7) ) ) (define F35 (compose (PI 35 7 5) (tensor F7 (I 5) ) (tensor (I 7) F5 ) (P 35 7 5) ) ) (F35) Different permutations of the same rules (define F3 (compose (RI 3) (tensor (I 1) F2 ) (E 3) (tensor (I 1) F2 ) (R 3) ) ) (define F4 (compose (tensor F2 (I 2) ) (T 4 2) (tensor (I 2) F2 ) (L 4 2) ) ) (define F5 (compose (RI 5) (tensor (I 1) F4 ) (E 5) (tensor (I 1) F4 ) (R 5) ) ) (define F6 (compose (PI 6 2 3) (tensor F2 (I 3) ) (tensor (I 2) F3 ) (P 6 2 3) ) ) (define F7 (compose (RI 7) (tensor (I 1) F6 ) (E 7) (tensor (I 1) F6 ) (R 7) ) ) (define F35 (compose (PI 35 7 5) (tensor F7 (I 5) ) (tensor (I 7) F5 ) (P 35 7 5) ) ) (F35)
Icode Generation • Separate front end generates un optimized Icode • Uses templates to perform generation (template (F 2) [0 == 0] ( $out(0) = $in(0) + $in(1) $out(1) = $in(0) - $in(1) ) ) ;; T(m, n) (template (T m_ n_) [0 == 0] ( do $i0 = 0,m_-0 $r0 = $i0 $out(0) = W(m_,n_,$r0) * $in(0) end ) ) ;; L(m,n) (template (L m_ n_) [n_ != 0] ( $r0 = m_ / n_ do $i0 = 0,n_-0 do $i1 = 0,$r0-0 $out($r0 1) = $in(1 n_) end end ) )
ICode Compiler • Converts program to SSA • Folds Constants • Propagates Copies • Removes Dead Code • Performs Common Sub expression Elimination ? • Outputs C