
CSE 246: Computer Arithmetic Algorithms and Hardware Design

Lecture 6.1: Multiplication Arithmetic. Instructor: Prof. Chung-Kuan Cheng. Topics: Karatsuba’s Method (1962), Toom’s Method (1963), Modular Method, FFT.


Presentation Transcript


  1. CSE 246: Computer Arithmetic Algorithms and Hardware Design
     Lecture 6.1: Multiplication Arithmetic
     Instructor: Prof. Chung-Kuan Cheng

  2. Topics:
     • Karatsuba’s Method (1962)
     • Toom’s Method (1963)
     • Modular Method
     • FFT

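  3. Karatsuba’s Method
     • $U = 2^n U_1 + U_0$, $V = 2^n V_1 + V_0$
     • $UV = 2^{2n} U_1 V_1 + 2^n (U_1 V_0 + U_0 V_1) + U_0 V_0 = (2^{2n} + 2^n) U_1 V_1 + 2^n (U_1 - U_0)(V_0 - V_1) + (2^n + 1) U_0 V_0$
     $T(2n) \le 3T(n) + cn$
     $T(2^k) \le c(3^k - 2^k)$
     $T(n) = T(2^{\lg n}) \le c(3^{\lg n} - 2^{\lg n}) < 3c\,n^{\lg 3}$, where $\lg 3 \approx 1.585$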
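The identity on the slide above needs only three half-size products: $U_1 V_1$, $U_0 V_0$, and $(U_1 - U_0)(V_0 - V_1)$. The following is a minimal Python sketch of that recursion, not the lecture's own code. The function name `karatsuba` and the `cutoff_bits` threshold are assumptions made for illustration.

```python
def karatsuba(u: int, v: int, cutoff_bits: int = 32) -> int:
    """Multiply non-negative integers using three half-size products per level."""
    # Base case (hypothetical threshold): fall back to the built-in multiply.
    if u.bit_length() <= cutoff_bits or v.bit_length() <= cutoff_bits:
        return u * v

    # Split U = 2^n U1 + U0 and V = 2^n V1 + V0.
    n = max(u.bit_length(), v.bit_length()) // 2
    mask = (1 << n) - 1
    u1, u0 = u >> n, u & mask
    v1, v0 = v >> n, v & mask

    hi = karatsuba(u1, v1, cutoff_bits)   # U1 * V1
    lo = karatsuba(u0, v0, cutoff_bits)   # U0 * V0

    # (U1 - U0)(V0 - V1) can be negative, so recurse on magnitudes
    # and restore the sign afterwards.
    du, dv = u1 - u0, v0 - v1
    sign = -1 if (du < 0) != (dv < 0) else 1
    mid = sign * karatsuba(abs(du), abs(dv), cutoff_bits)

    # UV = (2^{2n} + 2^n) U1V1 + 2^n (U1 - U0)(V0 - V1) + (2^n + 1) U0V0
    return ((1 << (2 * n)) + (1 << n)) * hi + (1 << n) * mid + ((1 << n) + 1) * lo
```

A small `cutoff_bits` (for example 4) forces several levels of recursion, which is useful when checking the identity on modest inputs.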

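  4. Toom’s Method
     • $U = 2^{rn} U_r + \cdots + 2^n U_1 + U_0$
     • $V = 2^{rn} V_r + \cdots + 2^n V_1 + V_0$
     • $U(x) = x^r U_r + \cdots + x U_1 + U_0$
     • $V(x) = x^r V_r + \cdots + x V_1 + V_0$
     • $U(x)V(x) = W(x) = x^{2r} W_{2r} + \cdots + x W_1 + W_0$
     Set up $2r+1$ equations: $W(0) = U(0)V(0)$, $W(1) = U(1)V(1)$, …, $W(2r) = U(2r)V(2r)$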

  5. Toom’s Method
     • $T((r+1)n) \le (2r+1)T(n) + cn$
     • $T(n) \le c\,n^{\log_{r+1}(2r+1)} < c\,n^{1 + \log_{r+1} 2}$
     Theorem: Given $\epsilon > 0$, there exists a multiplication algorithm such that the number of elementary operations $T(n)$ needed to multiply two $n$-bit numbers satisfies, for some constant $c(\epsilon)$ independent of $n$, $T(n) < c(\epsilon)\,n^{1+\epsilon}$
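To see how the exponent $\log_{r+1}(2r+1)$ approaches 1 as $r$ grows, it can be tabulated directly. This is a small illustrative sketch; the function names are hypothetical.

```python
import math

def toom_exponent(r: int) -> float:
    """Exponent log_{r+1}(2r+1) from the recurrence T((r+1)n) <= (2r+1)T(n) + cn."""
    return math.log(2 * r + 1, r + 1)

def toom_exponent_bound(r: int) -> float:
    """The looser bound 1 + log_{r+1} 2 quoted on the slide."""
    return 1 + math.log(2, r + 1)

for r in (1, 2, 3, 4, 8, 16, 64):
    print(f"r={r:3d}  exponent={toom_exponent(r):.4f}  bound={toom_exponent_bound(r):.4f}")
```

For $r = 1$ the exponent is $\lg 3$, which matches Karatsuba’s method.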

  6. Toom’s Method
     • $U = (4, 13, 2)_{16}$, $V = (9, 2, 5)_{16}$
     • $U(x) = 4x^2 + 13x + 2$, $V(x) = 9x^2 + 2x + 5$
     • $W(x) = U(x)V(x)$
     • $W(0) = 10$, $W(1) = 304$, $W(2) = 1980$
     • $W(3) = 7084$, $W(4) = 18526$
     • $W(x) = x^{2r} W_{2r} + \cdots + x W_1 + W_0$
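The five values on the slide above come from evaluating $U(x)$ and $V(x)$ at $x = 0, 1, \ldots, 2r$ and multiplying pointwise. A minimal sketch, where `poly_eval` is a hypothetical helper name:

```python
def poly_eval(coeffs_high_to_low, x):
    """Horner evaluation; coefficients are listed from highest degree to lowest."""
    acc = 0
    for c in coeffs_high_to_low:
        acc = acc * x + c
    return acc

U = [4, 13, 2]   # digits of U in radix 16, most significant first
V = [9, 2, 5]    # digits of V in radix 16, most significant first
r = len(U) - 1   # here r = 2, so 2r + 1 = 5 evaluation points are needed

W_values = [poly_eval(U, x) * poly_eval(V, x) for x in range(2 * r + 1)]
print(W_values)  # [10, 304, 1980, 7084, 18526]
```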

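  7. Toom’s Method
     • $W(x) = x^{2r} W_{2r} + \cdots + x W_1 + W_0$
     • Rewrite $W(x) = a_{2r} x^{\underline{2r}} + \cdots + a_1 x^{\underline{1}} + a_0$, where $x^{\underline{k}} = x(x-1)\cdots(x-k+1)$
     $W(x+1) - W(x) = 2r\,a_{2r} x^{\underline{2r-1}} + (2r-1)\,a_{2r-1} x^{\underline{2r-2}} + \cdots + a_1$
     $(W(x+2) - W(x+1)) - (W(x+1) - W(x)) = 2r(2r-1)\,a_{2r} x^{\underline{2r-2}} + (2r-1)(2r-2)\,a_{2r-1} x^{\underline{2r-3}} + \cdots + 2a_2$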

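  8. Toom’s Method (primes denote successive forward differences of the row above)
     • $W(*) = 10, 304, 1980, 7084, 18526$
     • $W'(*) = 294, 1676, 5104, 11442$
     • $W''(*) = 1382, 3428, 6338$
     • $W''(*)/2 = 691, 1714, 3169$
     • $W'''(*)/2 = 1023, 1455$
     • $W'''(*)/6 = 341, 485$
     • $W''''(*)/6 = 144$
     • $W''''(*)/24 = 36$
     • $W(x) = 36x^{\underline{4}} + 341x^{\underline{3}} + 691x^{\underline{2}} + 294x^{\underline{1}} + 10 = (((36(x-3) + 341)(x-2) + 691)(x-1) + 294)x + 10 = 36x^4 + 125x^3 + 64x^2 + 69x + 10$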
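The difference table above can be reproduced mechanically: the $j$-th forward difference at 0, divided by $j!$, is the coefficient $a_j$ of $x^{\underline{j}}$, and the nested form then expands to ordinary powers. The sketch below assumes the five values $W(0), \ldots, W(4)$ from the slide; the helper names are hypothetical. It divides by $j!$ in one step rather than progressively as the slide does, which gives the same coefficients.

```python
from math import factorial

def falling_coeffs(values):
    """Given W(0), ..., W(m), return [a_0, ..., a_m] with W(x) = sum a_j * x^(falling j)."""
    a, row, j = [], list(values), 0
    while row:
        a.append(row[0] // factorial(j))  # a_j = (j-th forward difference at 0) / j!
        row = [row[i + 1] - row[i] for i in range(len(row) - 1)]
        j += 1
    return a

def to_power_basis(a):
    """Expand the nested form (((a_m (x-(m-1)) + a_{m-1}) (x-(m-2)) + ...) x + a_0.

    Returns ordinary coefficients, lowest degree first.
    """
    m = len(a) - 1
    poly = [a[m]]
    for j in range(m - 1, -1, -1):
        shifted = [0] + poly                    # poly * x
        scaled = [-j * c for c in poly] + [0]   # poly * (-j)
        poly = [s + t for s, t in zip(shifted, scaled)]
        poly[0] += a[j]                         # ... + a_j
    return poly

W_values = [10, 304, 1980, 7084, 18526]
a = falling_coeffs(W_values)        # [10, 294, 691, 341, 36]
W = to_power_basis(a)               # [10, 69, 64, 125, 36]

# Substituting the radix x = 16 recovers the integer product U * V.
product = sum(c * 16 ** i for i, c in enumerate(W))
print(a, W, product)
```

With $U = (4, 13, 2)_{16} = 1234$ and $V = (9, 2, 5)_{16} = 2341$, the final value equals $1234 \times 2341$.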

  9. Toom’s Method

  10. Toom and Cook’s Method
      • Theorem: There is a constant $c$ such that the execution time of Toom and Cook’s method is less than $c\,n\,2^{3.5\sqrt{\lg n}}$ cycles

  11. Modular Method (Schönhage)
      • Recursive formula: $q_0 = 1$, $q_{k+1} = 3q_k - 1$
      • Thus, we have $q_k = \frac{1}{2}(3^k + 1)$
      • Relatively prime $p_i$: $6q_k - 1$, $6q_k + 1$, $6q_k + 2$, $6q_k + 3$, $6q_k + 5$, $6q_k + 7$
      • Set six moduli $m_i = 2^{p_i} - 1$
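The recurrence, its closed form, and the coprimality of the six exponents (and hence of the six moduli) can be checked numerically for small $k$. A minimal sketch with hypothetical function names:

```python
from itertools import combinations
from math import gcd

def q(k: int) -> int:
    """q_0 = 1, q_{k+1} = 3 q_k - 1."""
    qk = 1
    for _ in range(k):
        qk = 3 * qk - 1
    return qk

def exponents(k: int):
    """The six exponents p_i = 6 q_k + d for d in (-1, 1, 2, 3, 5, 7)."""
    return [6 * q(k) + d for d in (-1, 1, 2, 3, 5, 7)]

def moduli(k: int):
    """The six moduli m_i = 2^{p_i} - 1."""
    return [(1 << p) - 1 for p in exponents(k)]

def pairwise_coprime(xs) -> bool:
    return all(gcd(a, b) == 1 for a, b in combinations(xs, 2))

for k in range(4):
    print(k, q(k), exponents(k), pairwise_coprime(moduli(k)))
```

Since $\gcd(2^a - 1, 2^b - 1) = 2^{\gcd(a, b)} - 1$, pairwise coprime exponents give pairwise coprime moduli.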

  12. Modular Method
      • Given $U$ and $V$, find $W = U \times V$
      • Compute $u_i = U \bmod m_i$, $v_i = V \bmod m_i$
      • Compute $w_i = u_i \times v_i \bmod m_i$
      • Recover $W$
      $T(n) = O(n^{\log_3 6}) = O(n^{1.631})$
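The four steps above can be sketched with the smallest set of moduli, $m_i = 2^{p_i} - 1$ for $p_i \in \{5, 7, 8, 9, 11, 13\}$ (the $q_0 = 1$ case). The recovery step below uses the Chinese remainder theorem, which is an assumption: the slide does not say how $W$ is recovered. Function names are hypothetical, and the code needs Python 3.8 or later for `math.prod` and the three-argument `pow` with exponent `-1`.

```python
from math import prod

EXPONENTS = (5, 7, 8, 9, 11, 13)              # pairwise coprime exponents p_i
MODULI = [(1 << p) - 1 for p in EXPONENTS]    # m_i = 2^{p_i} - 1
M = prod(MODULI)                              # results are recoverable only below M

def to_residues(x: int):
    """Step 2: reduce x modulo each m_i."""
    return [x % m for m in MODULI]

def from_residues(residues) -> int:
    """Step 4 (assumed): Chinese remainder reconstruction."""
    total = 0
    for r, m in zip(residues, MODULI):
        Mi = M // m
        total += r * Mi * pow(Mi, -1, m)      # pow(Mi, -1, m) is the inverse of Mi mod m
    return total % M

def modular_multiply(u: int, v: int) -> int:
    # u * v < 2^(bits(u) + bits(v)); require that bound to stay below M.
    if u.bit_length() + v.bit_length() > M.bit_length() - 1:
        raise ValueError("operands too large for this set of moduli")
    ws = [(a * b) % m for a, b, m in zip(to_residues(u), to_residues(v), MODULI)]  # step 3
    return from_residues(ws)

print(modular_multiply(1234567, 7654321))
```

Each residue product involves operands far shorter than $U$ and $V$, which is where the recursive cost bound comes from.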

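  13. FFT
      Given $U(t) = (u_0, u_1, \ldots, u_{K-1})$ and $V(t) = (v_0, v_1, \ldots, v_{K-1})$, find $P(t) = (p_0, p_1, \ldots, p_{K-1})$, where $p_t = \sum_{i+j \equiv t \pmod K} u_i v_j$. Index $t$ labels the original sequences and index $s$ labels their transforms.
      • Set $w = \exp(2\pi\sqrt{-1}/K)$, i.e. $w^K = 1$
      • $u_s = \sum_{0 \le t < K} w^{st} u_t$
      • $v_s = \sum_{0 \le t < K} w^{st} v_t$
      • $U(s)V(s) = (u_0 v_0, u_1 v_1, \ldots, u_{K-1} v_{K-1})$, the componentwise product of the transforms
      • $P(s) = U(s)V(s)$, $p_s = u_s v_s$
      • $p_s = \sum_{0 \le t < K} w^{st} p_t$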

  14. FFT
      • $K \ge 2n - 1$, $u_n = u_{n+1} = \cdots = u_{K-1} = 0$
      • $v_n = v_{n+1} = \cdots = v_{K-1} = 0$
      • $p_t = \sum_{i+j \equiv t \pmod K} u_i v_j = u_t v_0 + u_{t-1} v_1 + \cdots + u_0 v_t$
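With zero padding to $K \ge 2n - 1$, the cyclic convolution equals the ordinary product of the digit polynomials. The sketch below uses a direct $O(K^2)$ transform for clarity rather than a fast one, and inverts with the conjugate root $w^{-1}$ and a division by $K$, which is a standard choice assumed here. Function names are hypothetical.

```python
import cmath

def dft(x, sign=1):
    """Direct transform x_s = sum_t w^{st} x_t with w = exp(sign * 2*pi*i / K)."""
    K = len(x)
    w = cmath.exp(sign * 2j * cmath.pi / K)
    return [sum(w ** (s * t) * x[t] for t in range(K)) for s in range(K)]

def cyclic_convolution(u, v):
    """p_t = sum over i + j = t (mod K) of u_i v_j, computed via p_s = u_s * v_s."""
    K = len(u)
    ps = [a * b for a, b in zip(dft(u), dft(v))]
    # Inverse transform: conjugate root, then divide by K; round back to integers.
    return [round((z / K).real) for z in dft(ps, sign=-1)]

def multiply_digits(u_digits, v_digits):
    """Digits are least significant first; returns the 2n-1 product coefficients."""
    n = max(len(u_digits), len(v_digits))
    K = 2 * n - 1
    u = list(u_digits) + [0] * (K - len(u_digits))   # u_n = ... = u_{K-1} = 0
    v = list(v_digits) + [0] * (K - len(v_digits))   # v_n = ... = v_{K-1} = 0
    return cyclic_convolution(u, v)

# U = (4, 13, 2)_16 and V = (9, 2, 5)_16, written least significant digit first.
coeffs = multiply_digits([2, 13, 4], [5, 2, 9])
print(coeffs)
print(sum(c * 16 ** i for i, c in enumerate(coeffs)))
```

The coefficients are not yet normalized digits; carries are resolved when the radix is substituted in the last line.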

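  15. FFT ($K = 2^k$, $t = (t_{k-1}, \ldots, t_0)$)
      • Set $A_0(t_{k-1}, \ldots, t_0) = u_t$, i.e. $A_0(t) = u_t$
      • Set $A_1(s_{k-1}, t_{k-2}, \ldots, t_0) = A_0(0, t_{k-2}, \ldots, t_0) + w^{2^{k-1} s_{k-1}} A_0(1, t_{k-2}, \ldots, t_0)$
      • Set $A_2(s_{k-1}, s_{k-2}, t_{k-3}, \ldots, t_0) = A_1(s_{k-1}, 0, t_{k-3}, \ldots, t_0) + w^{2^{k-2}(s_{k-2} s_{k-1})_2} A_1(s_{k-1}, 1, t_{k-3}, \ldots, t_0)$
      • Set $A_k(s_{k-1}, s_{k-2}, s_{k-3}, \ldots, s_0) = A_{k-1}(s_{k-1}, \ldots, s_1, 0) + w^{(s_0 s_1 \ldots s_{k-1})_2} A_{k-1}(s_{k-1}, \ldots, s_1, 1)$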

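  16. FFT ($K = 2^k$, $t = (t_{k-1}, \ldots, t_0)$)
      • Replace $t_{k-1}$ with $s_{k-1}$
      • $s_{k-1}$ determines $w^{2^{k-1} s_{k-1}}$
      • Replace $t_{k-2}$ with $s_{k-2}$
      • $s_{k-1}, s_{k-2}$ determine $w^{2^{k-2}(s_{k-2} s_{k-1})_2}$
      • Replace $t_0$ with $s_0$
      • $s_{k-1}, s_{k-2}, \ldots, s_0$ determine $w^{(s_0 s_1 \ldots s_{k-1})_2}$
      • Binary $s = (s_0, s_1, \ldots, s_{k-1})_2$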

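  17. FFT ($K = 2^k$, $t = (t_{k-1}, \ldots, t_0)$)
      By induction, we have
      $A_j(s_{k-1}, \ldots, s_{k-j}, t_{k-j-1}, \ldots, t_0) = \sum_{t_{k-1}, \ldots, t_{k-j}} w^{2^{k-j} (s_{k-j}, \ldots, s_{k-1})_2 \, (t_{k-1}, \ldots, t_{k-j})_2} u_t$
      $A_k(s_{k-1}, \ldots, s_0) = \sum_{t_{k-1}, \ldots, t_0} w^{(s_0, \ldots, s_{k-1})_2 \, (t_{k-1}, \ldots, t_0)_2} u_t = u_s$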
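The passes $A_0 \to A_1 \to \cdots \to A_k$ can be written as $k$ sweeps over an array of length $K$, where pass $j$ converts bit position $k - j$ of the index from $t_{k-j}$ to $s_{k-j}$. The following sketch is one reading of the slides, not the lecture's code. It keeps a fresh array per pass for clarity, and all names are hypothetical. The final array is in bit-reversed order because $s = (s_0, s_1, \ldots, s_{k-1})_2$.

```python
import cmath

def fft_bitwise(u):
    """Compute u_s = sum_t w^{st} u_t for len(u) = 2^k via the A_j recurrences."""
    K = len(u)
    k = K.bit_length() - 1
    if 1 << k != K:
        raise ValueError("length must be a power of two")
    w = cmath.exp(2j * cmath.pi / K)

    A = list(u)                                  # A_0(t) = u_t
    for j in range(1, k + 1):
        pos = k - j                              # bit converted from t to s in this pass
        B = [0] * K
        for idx in range(K):
            # e = (s_{k-j} s_{k-j+1} ... s_{k-1})_2, read from bit `pos` upward,
            # so s_{k-j} is the most significant bit of e.
            e = 0
            for b in range(pos, k):
                e = (e << 1) | ((idx >> b) & 1)
            lo = idx & ~(1 << pos)               # same index with bit `pos` = 0
            hi = idx | (1 << pos)                # same index with bit `pos` = 1
            B[idx] = A[lo] + w ** ((1 << pos) * e) * A[hi]
        A = B

    # A_k(s_{k-1}, ..., s_0) holds u_s with s = (s_0 ... s_{k-1})_2: undo the bit reversal.
    out = [0] * K
    for idx in range(K):
        s = 0
        for b in range(k):
            s = (s << 1) | ((idx >> b) & 1)
        out[s] = A[idx]
    return out

def naive_dft(u):
    """Direct O(K^2) evaluation, used only as a reference."""
    K = len(u)
    w = cmath.exp(2j * cmath.pi / K)
    return [sum(w ** (s * t) * u[t] for t in range(K)) for s in range(K)]
```

Each pass does $K$ additions and multiplications, so the whole transform costs $O(K \log K)$ operations.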

  18. FFT: k=2

  19. FFT: k=2

  20. FFT: k=2

  21. FFT: k=2

  22. FFT: k=3

  23. FFT: k=3

  24. FFT: k=3

  25. FFT: k=3

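  26. FFT
      • $u(s) = u_0 + u_1 s + u_2 s^2 + \cdots + u_{2^k-1} s^{2^k-1}$
      • $u(s) = u_0 + u_2 s^2 + \cdots + u_{2^k-2} s^{2^k-2} + u_1 s + u_3 s^3 + \cdots + u_{2^k-1} s^{2^k-1}$
      • $u(s) = F_e(s^2) + s F_d(s^2)$, where
        $F_e(s^2) = u_0 + u_2 s^2 + \cdots + u_{2^k-2} s^{2^k-2}$
        $F_d(s^2) = u_1 + u_3 s^2 + \cdots + u_{2^k-1} s^{2^k-2}$
      • $u(s) = F_{ee}(s^4) + s^2 F_{ed}(s^4) + s[F_{de}(s^4) + s^2 F_{dd}(s^4)]$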

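  27. FFT
      • $u(s) = u_0 + u_1 s + u_2 s^2 + \cdots + u_{2^k-1} s^{2^k-1}$
      • $u(s) = F_{ee}(s^4) + s^2 F_{ed}(s^4) + s[F_{de}(s^4) + s^2 F_{dd}(s^4)]$
      • $u(s) = F_{eee}(s^8) + s^4 F_{eed}(s^8) + s^2[F_{ede}(s^8) + s^4 F_{edd}(s^8)] + s\{[F_{dee}(s^8) + s^4 F_{ded}(s^8)] + s^2[F_{dde}(s^8) + s^4 F_{ddd}(s^8)]\}$
      • $F_{x \ldots x}(s^{2^{k-1}}) = F_{x \ldots xe}(s^{2^k}) + s^{2^{k-1}} F_{x \ldots xd}(s^{2^k})$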

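  28. FFT
      • $u(s) = u_0 + u_1 s + u_2 s^2 + u_3 s^3 + u_4 s^4 + u_5 s^5 + u_6 s^6 + u_7 s^7$
      • $u(s) = F_e(s^2) + s F_d(s^2)$, where $F_e(s^2) = u_0 + u_2 s^2 + u_4 s^4 + u_6 s^6$ and $F_d(s^2) = u_1 + u_3 s^2 + u_5 s^4 + u_7 s^6$
      • $F_e(s^2) = F_{ee}(s^4) + s^2 F_{ed}(s^4)$, where $F_{ee}(s^4) = u_0 + u_4 s^4$ and $F_{ed}(s^4) = u_2 + u_6 s^4$
      • $F_d(s^2) = F_{de}(s^4) + s^2 F_{dd}(s^4)$, where $F_{de}(s^4) = u_1 + u_5 s^4$ and $F_{dd}(s^4) = u_3 + u_7 s^4$
      • $F_x(s = w^0) = F_x(s = w^4)$, $F_x(s = w^2) = F_x(s = w^6)$, $F_x(s = w) = F_x(s = w^5)$, $F_x(s = w^3) = F_x(s = w^7)$ for $x = e, d$; $(s_0, s_1, s_2) = (-, 0, 0), (-, 0, 1), (-, 1, 0), (-, 1, 1)$
      • $F_{xx}(s = w^0) = F_{xx}(s = w^2) = F_{xx}(s = w^4) = F_{xx}(s = w^6)$, $F_{xx}(s = w) = F_{xx}(s = w^3) = F_{xx}(s = w^5) = F_{xx}(s = w^7)$ for $xx = ee, ed, de, dd$; $(s_0, s_1, s_2) = (-, -, 0), (-, -, 1)$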
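The even/odd split $u(s) = F_e(s^2) + s F_d(s^2)$ leads directly to a recursive transform: because $F_e$ and $F_d$ depend only on $s^2$, their values at $s = w^j$ and $s = w^{j + K/2}$ coincide and are computed once. A minimal recursive sketch, with hypothetical function names:

```python
import cmath

def fft_even_odd(u):
    """Evaluate u(s) at s = w^0, ..., w^{K-1}, with w = exp(2*pi*i / K), K a power of two."""
    K = len(u)
    if K == 1:
        return list(u)
    if K % 2:
        raise ValueError("length must be a power of two")

    Fe = fft_even_odd(u[0::2])   # F_e at the K/2 distinct values of s^2
    Fd = fft_even_odd(u[1::2])   # F_d at the same points

    w = cmath.exp(2j * cmath.pi / K)
    out = [0] * K
    for j in range(K // 2):
        twiddle = w ** j * Fd[j]
        out[j] = Fe[j] + twiddle             # s = w^j
        out[j + K // 2] = Fe[j] - twiddle    # s = w^{j + K/2} = -w^j, same s^2
    return out

def naive_dft(u):
    """Direct O(K^2) evaluation, used only as a reference."""
    K = len(u)
    w = cmath.exp(2j * cmath.pi / K)
    return [sum(w ** (s * t) * u[t] for t in range(K)) for s in range(K)]

print([round(abs(z), 6) for z in fft_even_odd([1, 2, 3, 4, 5, 6, 7, 8])])
```

For $K = 8$ the recursion depth is 3, matching the three levels $F_x$, $F_{xx}$, $F_{xxx}$ on the slides.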

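  29. FFT (Inversion)
      • Applying the transform a second time to the transformed sequence gives
        $\sum_{0 \le s < K} w^{rs} u_s = \sum_{0 \le s, t < K} w^{rs} w^{st} u_t = \sum_{0 \le t < K} u_t \sum_{0 \le s < K} w^{s(t+r)} = K u_{(-r) \bmod K}$
        since $\sum_{0 \le s < K} w^{sj} = K$ if $j \bmod K = 0$, and $0$ otherwise.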
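The inversion identity says no separate inverse routine is needed: run the forward transform again, read the result at index $(-r) \bmod K$, and divide by $K$. A small sketch using a direct $O(K^2)$ transform, with hypothetical function names:

```python
import cmath

def transform(x):
    """Forward transform x_s = sum_t w^{st} x_t with w = exp(2*pi*i / K)."""
    K = len(x)
    w = cmath.exp(2j * cmath.pi / K)
    return [sum(w ** (s * t) * x[t] for t in range(K)) for s in range(K)]

def inverse_via_forward(x_hat):
    """Recover u_t from its transform: the second transform at r equals K * u_{(-r) mod K}."""
    K = len(x_hat)
    y = transform(x_hat)
    return [y[(-t) % K] / K for t in range(K)]

u = [1, 2, 3, 4, 5, 6, 7, 8]
recovered = inverse_via_forward(transform(u))
print([round(z.real, 6) for z in recovered])
```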

  30. FFT
      • $2n \le 2^k g < 4n$, $K = 2^k$
      • Precision $m = 6k$
      • Let $M$ = time of $m$-bit multiplication
      • Total time to multiply $n$-bit numbers: $O(n) + O(Mnk/g)$
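The constraint $2n \le 2^k g < 4n$ can be checked for a concrete choice of $k$. The sketch below assumes that $g$ is the number of bits in each piece the operands are split into, which the slide does not state. It picks the smallest $g$ that satisfies the lower bound; the function name is hypothetical.

```python
def fft_parameters(n: int, k: int) -> dict:
    """For n-bit operands and K = 2^k, pick the smallest g with 2n <= K*g < 4n."""
    K = 1 << k
    g = (2 * n + K - 1) // K            # ceil(2n / K)
    if not (2 * n <= K * g < 4 * n):
        raise ValueError("no valid g for this n and k (K is too large relative to n)")
    return {"K": K, "g": g, "m": 6 * k}  # m = 6k bits of precision, as on the slide

print(fft_parameters(1024, 5))   # {'K': 32, 'g': 64, 'm': 30}
print(fft_parameters(1000, 5))   # {'K': 32, 'g': 63, 'm': 30}
```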
