CS 584: Fast Fourier Transform
Fast Fourier Transform • Used in many scientific applications • Transforms a periodic signal into the frequency spectrum of the signal
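FFT • Given a sequence <X[0], X[1], … X[n-1]> • Transform into <Y[0], Y[1], … Y[n-1]> • Where $Y[i] = \sum_{k=0}^{n-1} X[k]\,\omega^{ki}$ for $0 \le i < n$, and $\omega$ is a primitive $n$-th root of unity • Evaluating this sum directly takes O(n^2) operations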
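The O(n^2) cost of the direct definition can be seen in a minimal Python sketch. The slide does not fix the sign of the exponent, so the choice ω = exp(-2πj/n) below is an assumption, and the name `naive_dft` is only illustrative.

```python
import cmath

def naive_dft(x):
    """Evaluate Y[i] = sum_k X[k] * w**(k*i) directly.

    Each of the n outputs needs n multiply-adds, hence O(n^2) work.
    """
    n = len(x)
    # Assumed convention: w = exp(-2*pi*j/n); the slides leave the sign open.
    w = cmath.exp(-2j * cmath.pi / n)
    return [sum(x[k] * w ** (k * i) for k in range(n)) for i in range(n)]
```

For example, `naive_dft([1, 1, 1, 1])` puts all of the signal into Y[0], up to rounding error.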
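FFT • In 1965 Cooley and Tukey showed that the transform can be evaluated in O(n log n) operations by splitting the sum into its even- and odd-indexed terms, resulting in: $$Y[i] = \sum_{k=0}^{(n/2)-1} X[2k]\,\tilde{\omega}^{ki} + \omega^{i} \sum_{k=0}^{(n/2)-1} X[2k+1]\,\tilde{\omega}^{ki}$$ where $\tilde{\omega} = \omega^{2}$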
FFT
Procedure RecursiveFFT(X, Y, n, ω)
  if (n == 1)
    Y[0] = X[0]
  else
    RecursiveFFT(<X[0], X[2], … X[n-2]>, <Q[0], Q[1], … Q[n/2-1]>, n/2, ω^2);
    RecursiveFFT(<X[1], X[3], … X[n-1]>, <T[0], T[1], … T[n/2-1]>, n/2, ω^2);
    for i = 0 to n-1
      Y[i] = Q[i mod (n/2)] + ω^i * T[i mod (n/2)];
end
• Optimization Opportunity
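A runnable Python sketch of the recursive procedure might look like the following. It assumes n is a power of two and ω = exp(-2πj/n), neither of which the slide states explicitly. One plausible reading of the "Optimization Opportunity" note is that ω^(i + n/2) = -ω^i, so each product ω^i * T[i] can serve two outputs; the sketch uses that identity, but this is an interpretation rather than something the slide confirms.

```python
import cmath

def recursive_fft(x, w=None):
    """Recursive radix-2 FFT; len(x) is assumed to be a power of two."""
    n = len(x)
    if w is None:
        # Assumed convention: w = exp(-2*pi*j/n).
        w = cmath.exp(-2j * cmath.pi / n)
    if n == 1:
        return [x[0]]
    q = recursive_fft(x[0::2], w * w)   # transform of even-indexed elements
    t = recursive_fft(x[1::2], w * w)   # transform of odd-indexed elements
    half = n // 2
    y = [0j] * n
    wi = 1 + 0j                         # running value of w**i
    for i in range(half):
        prod = wi * t[i]
        y[i] = q[i] + prod
        y[i + half] = q[i] - prod       # because w**(i + n/2) == -(w**i)
        wi *= w
    return y
```

Calling `recursive_fft([1, 2, 3, 4])` should agree with a direct evaluation of the defining sum, up to rounding error.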
FFT Something looks familiar?
Parallelization of FFT • Parallelize by looking at the data patterns • Two algorithms • Binary Exchange • Matrix Transpose
Binary Exchange FFT • Data exchange takes place between all pairs of processors whose labels differ in exactly one bit. • One element per processor: easy • Multiple elements per processor: assign contiguous blocks to processors; same algorithm, just exchange blocks
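The one-bit exchange pattern can be simulated sequentially. The sketch below is a minimal illustration with one element per simulated process, assuming a decimation-in-frequency ordering (most significant bit first), n a power of two, and ω = exp(-2πj/n); the slides do not specify these details, and the function names are hypothetical.

```python
import cmath

def binary_exchange_fft(x):
    """Simulate an n-process binary-exchange FFT, one element per process.

    The result is left in bit-reversed order, since no final reordering
    step is performed.
    """
    n = len(x)                          # assumed to be a power of two
    w = cmath.exp(-2j * cmath.pi / n)   # assumed sign convention
    a = list(x)                         # a[r] is the value held by process r
    h = n // 2                          # the single bit in which partners differ
    while h >= 1:
        nxt = [0j] * n
        for r in range(n):
            partner = r ^ h             # label differs from r in exactly one bit
            if r & h == 0:
                nxt[r] = a[r] + a[partner]
            else:
                nxt[r] = (a[partner] - a[r]) * w ** ((r % h) * (n // (2 * h)))
        a = nxt
        h //= 2
    return a

def bit_reverse(i, bits):
    """Reverse the lowest `bits` bits of i."""
    out = 0
    for _ in range(bits):
        out = (out << 1) | (i & 1)
        i >>= 1
    return out
```

With n = 4, `[a[bit_reverse(i, 2)] for i in range(4)]` recovers the outputs in natural order from the returned list `a`.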
Binary Exchange FFT • Communication volume grows with n, so the bandwidth requirement is large • Powers of ω cannot be precalculated once and shared • ω^i is used at different times on different processors • This leads to duplicated computation
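To make the point about powers of ω concrete, the following sketch lists which exponent each process needs at each stage. It assumes one element per process and a decimation-in-frequency ordering in which stage strides run n/2, n/4, …, 1; both are assumptions for illustration, and `twiddle_schedule` is a hypothetical helper.

```python
def twiddle_schedule(n):
    """For each process r, list the exponent e of w**e needed at each stage.

    None means the process only adds at that stage and needs no power of w.
    n is assumed to be a power of two.
    """
    schedule = [[] for _ in range(n)]
    h = n // 2                      # stride of the current stage
    while h >= 1:
        for r in range(n):
            if r & h:
                schedule[r].append((r % h) * (n // (2 * h)))
            else:
                schedule[r].append(None)
        h //= 2
    return schedule
```

For n = 8, process 7 needs exponents 3, 2, 0 over the three stages while process 6 needs 2, 0 and then none, so the same power is computed by different processes at different stages.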
The Transpose FFT • Assume that sqrt(n) is a power of 2 • The data is arranged in a sqrt(n) x sqrt(n) two-dimensional square array
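Parallelization of Transpose FFT • Notice (with log n = 4 iterations, i.e. n = 16) • The first two iterations are columnwise • The last two iterations are rowwise • In general, the first (log n)/2 iterations are columnwise and the remaining (log n)/2 are rowwise • Rather than do an exchange at every iteration, transpose the matrix halfway through the algorithm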
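The columnwise-then-rowwise structure can be checked with a small sequential sketch. It assumes a decimation-in-frequency ordering, n = m * m with m a power of two, row-major placement of X in the m x m array, and ω = exp(-2πj/n); none of these are spelled out on the slides, and the helper names are hypothetical. After the transpose, the former row stages again pair entries within a column, so a column-striped process never needs a partner's data during a stage.

```python
import cmath
import math

def column_stages(a, n, index_of, scale):
    """Run log2(m) butterfly stages that only pair entries in the same column."""
    m = len(a)
    w = cmath.exp(-2j * cmath.pi / n)       # assumed sign convention
    hr = m // 2
    while hr >= 1:
        h = hr * scale                      # stride in the flat index
        nxt = [[0j] * m for _ in range(m)]
        for r in range(m):
            for c in range(m):
                i = index_of(r, c)          # flat index of the element stored here
                u, v = a[r][c], a[r ^ hr][c]
                if i & h == 0:
                    nxt[r][c] = u + v
                else:
                    nxt[r][c] = (v - u) * w ** ((i % h) * (n // (2 * h)))
        a = nxt
        hr //= 2
    return a

def transpose_fft(x):
    """Transpose-style FFT; returns the outputs in bit-reversed order."""
    n = len(x)
    m = math.isqrt(n)                       # assumes n is a perfect square
    a = [list(x[r * m:(r + 1) * m]) for r in range(m)]
    a = column_stages(a, n, lambda r, c: r * m + c, m)   # first half: columnwise
    a = [list(col) for col in zip(*a)]                   # transpose halfway through
    a = column_stages(a, n, lambda r, c: c * m + r, 1)   # second half, now columnwise
    return [a[i % m][i // m] for i in range(n)]          # read back in flat order
```

The returned list is in bit-reversed order, as is typical when no final reordering pass is added.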
The Transpose FFT • Transposing a stripe-partitioned array requires all-to-all communication • Is it less expensive to follow the binary-exchange pattern all the way through, or to do the transpose?
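Why the transpose needs all-to-all communication can be seen by simulating it. In the sketch below, `local_rows[q]` stands for the rows owned by process q under a row-striped partition; the function name, the message counter, and the assumption that p divides the matrix size are all illustrative choices, not part of the slides.

```python
def striped_transpose(local_rows, p):
    """Transpose an m x m matrix that is row-striped across p processes.

    Returns the new per-process rows and the number of blocks that had to
    cross between distinct processes.
    """
    m = sum(len(rows) for rows in local_rows)
    b = m // p                                   # rows per process; assumes p divides m
    inbox = [[None] * p for _ in range(p)]
    messages = 0
    # All-to-all step: every process sends one b x b block to every process.
    for src in range(p):
        for dst in range(p):
            inbox[dst][src] = [row[dst * b:(dst + 1) * b] for row in local_rows[src]]
            if src != dst:
                messages += 1
    # Local step: each process transposes the blocks it received and joins them.
    result = []
    for dst in range(p):
        rows = []
        for k in range(b):
            row = []
            for src in range(p):
                row.extend(inbox[dst][src][j][k] for j in range(b))
            rows.append(row)
        result.append(rows)
    return result, messages
```

Every process sends a block to every other process, so the count of inter-process blocks is p * (p - 1), which is the cost that has to be weighed against the repeated pairwise exchanges of the binary-exchange algorithm.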
Which is better? • It Depends • Architecture and amount of data play together to create tradeoffs. • Transpose algorithm is easy to generalize to higher dimensions