
What is the WHT anyway, and why are there so many ways to compute it?

Jeremy Johnson. 1, 2, 6, 24, 112, 568, 3032, 16768, … Walsh-Hadamard Transform: y = WHT_N x, N = 2^n. WHT algorithms factor WHT_N into a product of sparse structured matrices and compute y = (M_1 M_2 … M_t) x.


Presentation Transcript


  1. What is the WHT anyway, and why are there so many ways to compute it? Jeremy Johnson 1, 2, 6, 24, 112, 568, 3032, 16768,…

  2. Walsh-Hadamard Transform • y = WHT_N x, N = 2^n
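The slide states only y = WHT_N x. A minimal sketch of what that means, assuming the standard definition of WHT_N as the n-fold Kronecker power of WHT_2 = [[1, 1], [1, -1]]; the function names `kron`, `wht_matrix`, and `wht_by_definition` are illustrative and not taken from the presentation.

```python
def kron(a, b):
    """Kronecker product of two matrices given as lists of rows."""
    return [[x * y for x in row_a for y in row_b]
            for row_a in a for row_b in b]


WHT2 = [[1, 1], [1, -1]]


def wht_matrix(n):
    """Return WHT_N for N = 2**n as the n-fold Kronecker power of WHT_2."""
    m = [[1]]
    for _ in range(n):
        m = kron(m, WHT2)
    return m


def wht_by_definition(x):
    """Compute y = WHT_N x with a dense matrix-vector product (O(N^2) work)."""
    n = len(x).bit_length() - 1
    assert len(x) == 1 << n, "length must be a power of two"
    return [sum(c * v for c, v in zip(row, x)) for row in wht_matrix(n)]
```

This dense product is only a reference point: the factorizations on the following slides reduce the work to N lg N additions and subtractions.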

  3. WHT Algorithms • Factor WHT_N into a product of sparse structured matrices • Compute y = (M_1 M_2 … M_t) x by applying the factors from right to left: y_t = M_t x, …, y_2 = M_2 y_3, y = M_1 y_2

  4. Factoring the WHT Matrix • AC ⊗ BD = (A ⊗ B)(C ⊗ D) • A ⊗ B = (A ⊗ I)(I ⊗ B) • A ⊗ (B ⊗ C) = (A ⊗ B) ⊗ C • I_m ⊗ I_n = I_mn • WHT_2 ⊗ WHT_2 = (WHT_2 ⊗ I_2)(I_2 ⊗ WHT_2)
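The Kronecker-product identities on this slide can be checked numerically. The sketch below uses small hand-picked integer matrices; the helpers `kron`, `matmul`, and `identity` and the matrices A, B, C, D are assumptions made for illustration only.

```python
def kron(a, b):
    """Kronecker product of two matrices given as lists of rows."""
    return [[x * y for x in row_a for y in row_b]
            for row_a in a for row_b in b]


def matmul(a, b):
    """Ordinary matrix product."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def identity(n):
    return [[1 if i == j else 0 for j in range(n)] for i in range(n)]


A = [[1, 2], [3, 4]]                      # 2 x 2
C = [[0, 1], [1, 1]]                      # 2 x 2
B = [[0, 1, 2], [1, 0, 1], [2, 1, 0]]     # 3 x 3
D = [[1, 0, 0], [2, 1, 0], [0, 3, 1]]     # 3 x 3

# AC (x) BD = (A (x) B)(C (x) D)
mixed_lhs = kron(matmul(A, C), matmul(B, D))
mixed_rhs = matmul(kron(A, B), kron(C, D))

# A (x) B = (A (x) I_3)(I_2 (x) B)
split_lhs = kron(A, B)
split_rhs = matmul(kron(A, identity(3)), kron(identity(2), B))

print(mixed_lhs == mixed_rhs, split_lhs == split_rhs)
```

The second identity is the one that turns a dense Kronecker product into two sparse factors, which is what every WHT factorization in the talk relies on.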

  5. Recursive and Iterative Factorization • WHT_8 = (WHT_2 ⊗ I_4)(I_2 ⊗ WHT_4) • = (WHT_2 ⊗ I_4)(I_2 ⊗ ((WHT_2 ⊗ I_2)(I_2 ⊗ WHT_2))) • = (WHT_2 ⊗ I_4)(I_2 ⊗ (WHT_2 ⊗ I_2))(I_2 ⊗ (I_2 ⊗ WHT_2)) • = (WHT_2 ⊗ I_4)(I_2 ⊗ (WHT_2 ⊗ I_2))((I_2 ⊗ I_2) ⊗ WHT_2) • = (WHT_2 ⊗ I_4)(I_2 ⊗ WHT_2 ⊗ I_2)((I_2 ⊗ I_2) ⊗ WHT_2) • = (WHT_2 ⊗ I_4)(I_2 ⊗ WHT_2 ⊗ I_2)(I_4 ⊗ WHT_2)

  6. Recursive Algorithm • WHT_8 = (WHT_2 ⊗ I_4)(I_2 ⊗ ((WHT_2 ⊗ I_2)(I_2 ⊗ WHT_2)))
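A minimal sketch of the recursive algorithm, generalizing the size-8 formula to WHT_N = (WHT_2 ⊗ I_{N/2})(I_2 ⊗ WHT_{N/2}). The function name `wht_recursive` is illustrative; the sketch copies sublists for clarity rather than working in place as a tuned implementation would.

```python
def wht_recursive(x):
    """Apply WHT_N = (WHT_2 (x) I_{N/2})(I_2 (x) WHT_{N/2}), factors right to left."""
    n = len(x)
    assert n > 0 and n & (n - 1) == 0, "length must be a power of two"
    if n == 1:
        return list(x)
    half = n // 2
    # I_2 (x) WHT_{N/2}: transform each half independently.
    top = wht_recursive(x[:half])
    bottom = wht_recursive(x[half:])
    # WHT_2 (x) I_{N/2}: one butterfly between corresponding entries of the halves.
    return ([a + b for a, b in zip(top, bottom)] +
            [a - b for a, b in zip(top, bottom)])
```

For example, `wht_recursive([1, 2, 3, 4])` first transforms the halves to `[3, -1]` and `[7, -1]`, then combines them into `[10, -2, -4, 0]`.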

  7. Iterative Algorithm • WHT_8 = (WHT_2 ⊗ I_4)(I_2 ⊗ WHT_2 ⊗ I_2)(I_4 ⊗ WHT_2) • [Figure: the 8×8 matrix WHT_8 written out as the product of these three sparse factors, each with two nonzero entries (±1) per row.]
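The iterative factorization maps directly onto a loop over stages, one per sparse factor. This is a sketch under the assumption that the input length is a power of two; `wht_iterative` is an illustrative name.

```python
def wht_iterative(x):
    """Apply the factors I_{N/(2s)} (x) WHT_2 (x) I_s for s = 1, 2, 4, ..., N/2."""
    y = list(x)
    n = len(y)
    assert n > 0 and n & (n - 1) == 0, "length must be a power of two"
    stride = 1  # the rightmost factor, I_{N/2} (x) WHT_2, is applied first
    while stride < n:
        for block in range(0, n, 2 * stride):
            for k in range(block, block + stride):
                a, b = y[k], y[k + stride]
                y[k], y[k + stride] = a + b, a - b
        stride *= 2
    return y
```

For N = 8 the three passes use strides 1, 2, and 4, matching (I_4 ⊗ WHT_2), (I_2 ⊗ WHT_2 ⊗ I_2), and (WHT_2 ⊗ I_4) in that order.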

  8. WHT Algorithms • Recursive • Iterative • General

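  9. WHT Implementation • Definition/formula: N = N_1 N_2 ⋯ N_t, N_i = 2^{n_i}; WHT_{2^n} = ∏_{i=1}^{t} (I_{2^{n_1 + ··· + n_{i-1}}} ⊗ WHT_{2^{n_i}} ⊗ I_{2^{n_{i+1} + ··· + n_t}}); x^M_{b,s} = (x(b), x(b+s), …, x(b+(M-1)s)) is the M-element subvector of x with base b and stride s • Implementation (nested loop): R = N; S = 1; for i = t, …, 1 { R = R/N_i; for j = 0, …, R-1 { for k = 0, …, S-1 { x^{N_i}_{jN_iS+k, S} = WHT_{N_i} · x^{N_i}_{jN_iS+k, S} } } S = S·N_i }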
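The nested loop above can be sketched as follows. The names `wht_small` and `wht_split` are hypothetical, and `wht_small` is a plain loop standing in for whatever base-case routine a real implementation would call; the base offset j·N_i·S + k and stride S follow from the factor I_R ⊗ WHT_{N_i} ⊗ I_S.

```python
def wht_small(x, size, base, stride):
    """In-place WHT on the `size` points x[base], x[base + stride], ..."""
    h = 1
    while h < size:
        for blk in range(0, size, 2 * h):
            for k in range(blk, blk + h):
                i = base + k * stride
                j = base + (k + h) * stride
                x[i], x[j] = x[i] + x[j], x[i] - x[j]
        h *= 2


def wht_split(x, exponents):
    """Apply WHT_N in place for N = 2**sum(exponents), one factor per n_i."""
    n_total = 1 << sum(exponents)
    assert len(x) == n_total
    r, s = n_total, 1
    for n_i in reversed(exponents):          # i = t, ..., 1
        size = 1 << n_i                      # N_i
        r //= size                           # R = R / N_i
        for j in range(r):
            for k in range(s):
                wht_small(x, size, j * size * s + k, s)
        s *= size                            # S = S * N_i
    return x
```

Every choice of `exponents` with the same sum produces the same output; `[1, 1, 1]` is the iterative algorithm for N = 8, while `[1, 2]` is one level of the recursive one.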

  10. Partition Trees • [Figure: four example partition trees, labeled Left Recursive, Right Recursive, Balanced, and Iterative.]

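  11. Ordered Partitions • There is a one-to-one correspondence between ordered partitions of n and (n-1)-bit binary numbers. • There are 2^{n-1} ordered partitions of n. • Example: 162 = 10100010 in binary; placing the bits between nine 1s, with each 1 bit acting as a cut, gives 1|1 1|1 1 1 1|1 1, that is, 1+2+4+2 = 9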
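The correspondence in the example can be sketched in a few lines. The function names are illustrative, and the convention that bits are read from most to least significant is taken from the 162 example on the slide.

```python
def bits_to_partition(code, n):
    """Map an (n-1)-bit number to an ordered partition of n.

    Bits are read from most to least significant; a 1 bit closes the
    current part, a 0 bit extends it by one.
    """
    parts, current = [], 1
    for i in range(n - 2, -1, -1):
        if (code >> i) & 1:
            parts.append(current)
            current = 1
        else:
            current += 1
    parts.append(current)
    return parts


def partition_to_bits(parts):
    """Inverse map: each part except the last ends with a 1 bit (a cut)."""
    code = 0
    for p in parts[:-1]:
        code = (code << p) | 1
    return code << (parts[-1] - 1)
```

Enumerating `code` from 0 to 2^{n-1} - 1 therefore visits every ordered partition of n exactly once, which is how the root splits of partition trees can be enumerated.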

  12. Enumerating Partition Trees • [Figure: the six partition trees with root 3, each labeled by the binary code of its root partition: 00, 01, 01, 10, 10, 11.]

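  13. Search Space • Optimization of the WHT becomes a search, over the space of partition trees, for the fastest algorithm. • The number of trees T_n satisfies T_1 = 1 and T_n = 1 + Σ_{n_1 + ··· + n_t = n, t > 1} T_{n_1} ⋯ T_{n_t}, which gives 1, 2, 6, 24, 112, 568, 3032, 16768, …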
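The sequence on the title slide can be reproduced by counting partition trees directly. This sketch assumes a tree is either a single leaf or a split of its root n into an ordered partition with at least two parts, each part carrying its own tree; `count_partition_trees` is an illustrative name.

```python
def count_partition_trees(max_n):
    """Return [T_1, ..., T_max_n], the number of partition trees for each root size."""
    trees = [0] * (max_n + 1)      # trees[n]: partition trees with root n
    forests = [0] * (max_n + 1)    # forests[n]: nonempty ordered sequences of
                                   # trees whose root sizes sum to n
    for n in range(1, max_n + 1):
        # Sequences of length >= 2: a first tree of size k < n, then a
        # nonempty sequence covering the remaining n - k.
        splits = sum(trees[k] * forests[n - k] for k in range(1, n))
        trees[n] = 1 + splits      # one leaf, plus every possible split
        forests[n] = trees[n] + splits
    return trees[1:]


print(count_partition_trees(8))
```

The count grows quickly enough that exhaustive timing of every tree is only practical for small n, which motivates the search strategies later in the talk.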

  14. Size of Search Space • Let T(z) be the generating function for T_n. Then T_n = Θ(α^n / n^{3/2}), where α = 4 + √8 ≈ 6.8284 • Restricting to binary trees, T_n = Θ(5^n / n^{3/2})

  15. WHT Package, Püschel & Johnson (ICASSP ’00) • Allows easy implementation of any of the possible WHT algorithms • Partition tree representation: W(n) = small[n] | split[W(n_1), …, W(n_t)] • Tools • Measure runtime of any algorithm • Measure hardware events • Search for good implementation • Dynamic programming • Evolutionary algorithm
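To illustrate how a partition tree determines an algorithm, here is a sketch of a tree interpreter. It is not the WHT package's code or API: trees are written as nested lists in the bracket notation used later in the talk (`[3]` for small[3], `[[1], [2]]` for split[small[1], small[2]]), and `is_leaf`, `tree_size`, `small`, and `apply_tree` are hypothetical names.

```python
def is_leaf(tree):
    return len(tree) == 1 and isinstance(tree[0], int)


def tree_size(tree):
    """Exponent n such that the tree computes WHT of size 2**n."""
    return tree[0] if is_leaf(tree) else sum(tree_size(c) for c in tree)


def small(x, n, base, stride):
    """Stand-in for an unrolled base case: in-place WHT on 2**n strided points."""
    size, h = 1 << n, 1
    while h < size:
        for blk in range(0, size, 2 * h):
            for k in range(blk, blk + h):
                i, j = base + k * stride, base + (k + h) * stride
                x[i], x[j] = x[i] + x[j], x[i] - x[j]
        h *= 2


def apply_tree(tree, x, base=0, stride=1):
    """Apply the WHT algorithm described by `tree` to x in place."""
    if is_leaf(tree):
        small(x, tree[0], base, stride)
        return x
    r, s = 1 << tree_size(tree), 1
    for child in reversed(tree):             # rightmost factor first
        size = 1 << tree_size(child)
        r //= size
        for j in range(r):
            for k in range(s):
                apply_tree(child, x, base + (j * size * s + k) * stride, s * stride)
        s *= size
    return x
```

All trees of the same size compute the same transform; they differ only in the order and stride of memory accesses and in call overhead, which is what the package's measurement tools compare.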

  16. Histogram (n = 16, 10,000 samples) • Wide range in performance despite equal number of arithmetic operations (n·2^n flops) • Pentium III consumes more run time (more pipeline stages) • UltraSPARC II spans a larger range

  17. Operation Count • Theorem. Let W_N be a WHT algorithm of size N. Then the number of floating point operations (flops) used by W_N is N lg(N). • Proof. By induction.
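The slide leaves the induction implicit. One way to fill it in, assuming each base case of size N_i = 2^{n_i} uses N_i lg N_i additions and subtractions (for example, WHT_2 uses 2): a split applies W_{N_i} exactly N/N_i times for each i, so

```latex
\begin{align*}
\operatorname{flops}(W_N)
  &= \sum_{i=1}^{t} \frac{N}{N_i}\,\operatorname{flops}(W_{N_i})
   = \sum_{i=1}^{t} \frac{N}{N_i}\, N_i \lg N_i \\
  &= N \sum_{i=1}^{t} n_i
   = N \lg N ,
\end{align*}
```

using N = N_1 N_2 ⋯ N_t and n = n_1 + ··· + n_t. The count is independent of the partition tree, which is why the performance differences in the histograms cannot come from arithmetic alone.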

  18. Instruction Count Model • A(n) = number of calls to the WHT procedure • α = number of instructions outside loops • A_l(n) = number of calls to the base case of size l • α_l = number of instructions in the base case of size l • L_i = number of iterations of the outer (i=1), middle (i=2), and inner (i=3) loop • β_i = number of instructions in the outer (i=1), middle (i=2), and inner (i=3) loop body

  19. Small[1]
          .file "s_1.c"
          .version "01.01"
      gcc2_compiled.:
          .text
          .align 4
          .globl apply_small1
          .type apply_small1,@function
      apply_small1:
          movl 8(%esp),%edx       // load stride S to EDX
          movl 12(%esp),%eax      // load x array's base address to EAX
          fldl (%eax)             // st(0)=R7=x[0]
          fldl (%eax,%edx,8)      // st(0)=R6=x[S]
          fld %st(1)              // st(0)=R5=x[0]
          fadd %st(1),%st         // R5=x[0]+x[S]
          fxch %st(2)             // st(0)=R5=x[0], st(2)=R7=x[0]+x[S]
          fsubp %st,%st(1)        // st(1)=st(0)-st(1), then pop: st(0)=R6=x[0]-x[S] (AT&T syntax reverses the operands of fsubp)
          fxch %st(1)             // st(0)=R6=x[0]+x[S], st(1)=R7=x[0]-x[S]
          fstpl (%eax)            // store x[0]=x[0]+x[S]
          fstpl (%eax,%edx,8)     // store x[S]=x[0]-x[S]
          ret

  20. Recurrences

  21. Recurrences

  22. Histogram using Instruction Model (P3) • α_1 = 12, α_2 = 34, and α_3 = 106 • α = 27 • β_1 = 18, β_2 = 18, and β_3 = 20

  23. Algorithm Comparison

  24. Dynamic Programming • T_n attains min_{n_1 + … + n_t = n} Cost(tree with root n and subtrees T_{n_1}, …, T_{n_t}), where T_n is the optimal tree of size n. • This depends on the assumption that Cost depends only on the size of a tree and not on where it is located (true for instruction count, IC, but false for runtime). • For IC, the optimal tree is iterative with appropriate leaves. • For runtime, DP is a good heuristic (used with binary trees).
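A sketch of dynamic programming over binary splits, as the slide suggests for runtime. The cost function here is a deliberately simple, hypothetical stand-in (it counts base-case calls); a real search would plug in measured runtime or an instruction count. The names `dp_search`, `toy_cost`, and `max_leaf` are illustrative, and trees use nested-list bracket notation.

```python
def is_leaf(tree):
    return len(tree) == 1 and isinstance(tree[0], int)


def tree_size(tree):
    return tree[0] if is_leaf(tree) else sum(tree_size(c) for c in tree)


def toy_cost(tree):
    """Hypothetical cost model: the number of base-case calls the tree makes."""
    if is_leaf(tree):
        return 1
    total = 1 << tree_size(tree)
    # A child of size 2**m is applied total / 2**m times.
    return sum((total >> tree_size(c)) * toy_cost(c) for c in tree)


def dp_search(max_n, cost, max_leaf):
    """For each n, keep the cheapest tree built from the best smaller trees."""
    best = {}
    for n in range(1, max_n + 1):
        candidates = [[n]] if n <= max_leaf else []
        candidates += [[best[k], best[n - k]] for k in range(1, n)]
        best[n] = min(candidates, key=cost)
    return best
```

Because only the best tree of each smaller size is reused, the search evaluates a number of candidates that grows quadratically in n instead of exploring the full space of partition trees. That saving is valid only when the cost of a subtree does not depend on the stride at which it is called.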

  25. Optimal Formulas (one per size n = 1, …, 20, in order)
      UltraSPARC: [1], [2], [3], [4], [5], [6], [[3],[4]], [[4],[4]], [[4],[5]], [[5],[5]], [[5],[6]], [[4],[[4],[4]]], [[[4],[5]],[4]], [[4],[[5],[5]]], [[[5],[5]],[5]], [[[5],[5]],[6]], [[4],[[[4],[5]],[4]]], [[4],[[4],[[5],[5]]]], [[4],[[[5],[5]],[5]]], [[5],[[[5],[5]],[5]]]
      Pentium: [1], [2], [3], [4], [5], [6], [7], [[4],[4]], [[5],[4]], [[5],[5]], [[5],[6]], [[2],[[5],[5]]], [[2],[[5],[6]]], [[2],[[2],[[5],[5]]]], [[2],[[2],[[5],[6]]]], [[2],[[2],[[2],[[5],[5]]]]], [[2],[[2],[[2],[[5],[6]]]]], [[2],[[2],[[2],[[2],[[5],[5]]]]]], [[2],[[2],[[2],[[2],[[5],[6]]]]]], [[2],[[2],[[2],[[2],[[2],[[5],[5]]]]]]]

  26. Different Strides • Dynamic programming assumption is not true. Execution time depends on stride.
