Multiprecision
Interest • Public-key algorithms (RSA, ECC, LUC, El-Gamal …) require prime numbers with several hundred digits • Checking primality requires additions, multiplications and divisions on numbers of arbitrary length • Processors only possess arithmetic units of fixed size
How to avoid this course? • Schoolbook algorithms are sufficient for numbers with several tens of digits and become inadequate only for numbers with more than about 100 digits • Ready-made multiprecision libraries already exist for most programming languages
Complexity notions • We assume a computer able to perform elementary operations • These are fixed-time operations (example: addition or subtraction on 32- or 64-bit words) • The computer is also assumed to have a memory where the data of the problem are stored • The size of the problem, denoted T, is the number of bits required to store the problem data • The complexity of the problem, denoted C, is a function such that a problem of size T requires at most C(T) elementary operations to be solved
Complexity notions • Rather than computing the function C exactly, most of the time only an equivalent is determined • Example: schoolbook addition has a linear complexity, sorting n integers has a complexity equivalent to n·log(n), the Meissel-Lehmer algorithm has a complexity equivalent to n^(2/3)·log(n), Karatsuba's algorithm has a complexity equivalent to n^1.58, Chernikova's algorithm has an exponential complexity • Complexity comparison:
n               1      5      10     50
n^(2/3)·log(n)  0      4.66   10.53  51.72
n·log(n)        0      8.05   23.03  195.60
n^1.58          1      12.72  38.02  483.48
2^n             2      32     1024   1.12e15
Schoolbook addition • A=(an-1 an-2 … a1 a0) represents A = Σ ai·D^i with D = 2^32 or 2^64 (depending on the size of the available operators) • B=(bn-1 bn-2 … b1 b0). Denote S=(sn sn-1 … s1 s0) with S = A+B • The computation is recursive: c-1 = 0, sk = ak + bk + ck-1 (mod D) and ck = 1 if bk + ck-1 > D-1-ak and 0 otherwise • The modulo operation is obtained automatically and the carry test only involves numbers less than D • The complexity is linear (the number of operations is at most 4n)
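A minimal sketch of this schoolbook addition in Python (the digit lists, the helper name mp_add and the base D = 2^32 are illustrative; Python's unbounded integers stand in for the fixed-size arithmetic unit, so the mod/div below play the role of the carry test):

```python
# Schoolbook addition on little-endian lists of base-D digits (illustrative sketch).
D = 2**32

def mp_add(a, b):
    """Add two numbers given as little-endian lists of base-D digits."""
    n = max(len(a), len(b))
    a = a + [0] * (n - len(a))      # pad to the same length
    b = b + [0] * (n - len(b))
    s, carry = [], 0
    for k in range(n):
        t = a[k] + b[k] + carry     # at most 2D - 1
        s.append(t % D)             # s_k = a_k + b_k + c_{k-1} (mod D)
        carry = t // D              # c_k is 0 or 1
    s.append(carry)
    return s

# Example: (2**32 - 1) + 1 = 2**32, i.e. digits [0, 1]
print(mp_add([2**32 - 1], [1]))     # [0, 1]
```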
Addition: complexity • Schoolbook addition is linear • When adding A=(an-1 an-2 … a1 a0) and B=(bn-1 bn-2 … b1 b0), the slightest modification of an ak or a bk modifies the final result • An addition algorithm must therefore "know" all the ak and bk • No addition algorithm can do better than linear time
Schoolbook multiplication • We wish to multiply A=(an-1 an-2 … a1 a0) and B=(bn-1 bn-2 … b1 b0) • We form the partial products A·bk and add them with shifts (the products ai·bk have to be computed in double precision: 4 single-precision multiplications and 3 single-precision additions) • The cost of computing A·bk is linear • We have to compute n of them and add them up • The global cost is proportional to n^2
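A short sketch of the schoolbook product on the same digit representation (mp_mul_school is an illustrative name; the inner loop accumulates the shifted partial products A·bk):

```python
# Schoolbook multiplication on little-endian base-D digit lists (illustrative sketch).
D = 2**32

def mp_mul_school(a, b):
    res = [0] * (len(a) + len(b))
    for k, bk in enumerate(b):                   # one partial product A·b_k per digit of B
        carry = 0
        for i, ai in enumerate(a):
            t = res[i + k] + ai * bk + carry     # double-precision product plus carries
            res[i + k] = t % D
            carry = t // D
        res[len(a) + k] += carry
    return res

# 2**32 * 2**32 = 2**64, i.e. a 1 in position 2
print(mp_mul_school([0, 1], [0, 1]))             # [0, 0, 1, 0]
```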
Karatsuba’s method (1962) • We wish to multiply A=(a2n-1 a2n-2 … a1 a0) and B=(b2n-1 b2n-2 … b1 b0) • Let A=A1·D^n+A0 and B=B1·D^n+B0 with A1=(a2n-1 a2n-2 … an+1 an), A0=(an-1 an-2 … a1 a0), B1=(b2n-1 b2n-2 … bn+1 bn), B0=(bn-1 bn-2 … b1 b0) • M=A·B=M2·D^2n+M1·D^n+M0 with M0=A0·B0, M1=A0·B1+A1·B0, M2=A1·B1 • We have M1=(A0+A1)·(B0+B1)-M0-M2 • Only 3 multiplications are required instead of 4 for the schoolbook multiplication • We continue recursively (if the number of words is odd, we split into two parts as evenly as possible) • The complexity of multiplying two n-word numbers is proportional to n^(log2 3) ≈ n^1.58
Karatsuba: Example • A=171 659, B=891 512 • Let D=1000, A1=171, A0=659, B1=891, B0=512 • M2= A1·B1= 171·891= 152 361, M0= A0·B0= 659·512= 337 408 • M1= (A0+A1)·(B0+B1)-M0-M2= (830·1 403)-M0-M2= 1 164 490-M0-M2= 674 721 • R=D^2·M2+D·M1+M0=153 036 058 408
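A recursive Karatsuba sketch in Python, splitting on decimal digits so it can reproduce the example above (the function name and the cut-off threshold are illustrative, not a library API):

```python
# Karatsuba multiplication of Python integers, split on decimal digits (illustrative sketch).
def karatsuba(a, b, threshold=4):
    if a < 10**threshold or b < 10**threshold:
        return a * b                              # small operands: direct product
    n = max(len(str(a)), len(str(b))) // 2        # split roughly in the middle
    d = 10**n
    a1, a0 = divmod(a, d)
    b1, b0 = divmod(b, d)
    m0 = karatsuba(a0, b0)
    m2 = karatsuba(a1, b1)
    m1 = karatsuba(a0 + a1, b0 + b1) - m0 - m2    # only 3 recursive multiplications
    return m2 * d * d + m1 * d + m0

print(karatsuba(171659, 891512))   # 153036058408, matching the example
```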
Toom-Cook method • We split into three parts • A=A2·D^2n+A1·D^n+A0, B=B2·D^2n+B1·D^n+B0 • M=A·B= M4·D^4n+M3·D^3n+M2·D^2n+M1·D^n+M0 • Let P1=(A2+A1+A0)·(B2+B1+B0), P-1=(A2-A1+A0)·(B2-B1+B0), P2=(4A2+2A1+A0)·(4B2+2B1+B0) • Then M0=A0B0, M4=A2B2, M1=2A2B2-(1/2)A0B0-(1/3)P-1+P1-(1/6)P2, M2=-A2B2-A0B0+(1/2)P-1+(1/2)P1 and M3= -2A2B2+(1/2)A0B0-(1/6)P-1-(1/2)P1+(1/6)P2 • Only 5 multiplications are required • The complexity is proportional to n^(log3 5) ≈ n^1.46
Toom-Cook: Example • A=171 659, B=891 512 • D=100, A2=17, A1=16, A0=59, B2=89, B1=15, B0=12 • P1=(A2+A1+A0)·(B2+B1+B0)=10 672 • P-1=(A2-A1+A0)·(B2-B1+B0)=5 160 • P2=(4A2+2A1+A0)·(4B2+2B1+B0)=63 282 • M0=A0B0=708, M4=A2B2=1 513, M1=2M4-(1/2)M0-(1/3)P-1+P1-(1/6)P2=1 077, M2=-M4-M0+(1/2)P-1+(1/2)P1=5 695, M3=-2M4+(1/2)M0-(1/6)P-1-(1/2)P1+(1/6)P2=1 679 • R=D^4·M4+D^3·M3+D^2·M2+D·M1+M0=153 036 058 408
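A Toom-Cook 3-way sketch in Python using the evaluation points 1, -1, 2 and the interpolation formulas above; it assumes both operands fit in three base-d parts (the name toom3 and the default d = 100 are illustrative):

```python
# Toom-Cook 3-way on Python integers; requires a, b < d**3 (illustrative sketch).
def toom3(a, b, d=100):
    a2, r = divmod(a, d * d); a1, a0 = divmod(r, d)
    b2, r = divmod(b, d * d); b1, b0 = divmod(r, d)
    m0, m4 = a0 * b0, a2 * b2
    p1  = (a2 + a1 + a0) * (b2 + b1 + b0)                    # value at  1
    pm1 = (a2 - a1 + a0) * (b2 - b1 + b0)                    # value at -1
    p2  = (4 * a2 + 2 * a1 + a0) * (4 * b2 + 2 * b1 + b0)    # value at  2
    # Interpolation; each division below is exact over the integers.
    m1 = (12 * m4 - 3 * m0 - 2 * pm1 + 6 * p1 - p2) // 6
    m2 = (p1 + pm1) // 2 - m4 - m0
    m3 = (-12 * m4 + 3 * m0 - pm1 - 3 * p1 + p2) // 6
    return (((m4 * d + m3) * d + m2) * d + m1) * d + m0

print(toom3(171659, 891512))   # 153036058408, matching the example above
```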
Generalisation • We can generalise and split into r = 4, 5, 6, … parts • The complexity is proportional to n^(log(2r-1)/log(r)) • Theoretically we can thus obtain multiplication algorithms whose complexity is proportional to n^(1+ε) with ε > 0 as small as desired • But the intermediate computations are heavy • In practice even the Toom-Cook algorithms are rarely implemented: when needed, algorithms based on Fourier transforms are preferred
Fourier transform • Multiplying numbers or polynomials is essentially the same thing • A=(an-1 an-2 … a1 a0) and B=(bn-1 bn-2 … b1 b0) • A(X)= an-1·X^(n-1)+an-2·X^(n-2)+ … +a1·X+a0, B(X)= bn-1·X^(n-1)+bn-2·X^(n-2)+ … +b1·X+b0 • If P(X)=A(X)·B(X), then A·B=P(D) • The degree of P is 2n-2; if we know the values of P at 2n distinct points, we can determine P (for instance by Lagrange interpolation) • The Fourier transform uses the points wk=exp(2iπk/2n), 0 ≤ k < 2n • Let Aj=A(wj), Bj=B(wj) and Pj=P(wj) • Then Pj=Aj·Bj
Fourier transform • We compute the Aj and Bj by Fourier transform • We compute Pj = Aj·Bj • We perform an inverse Fourier transform to recover the coefficients of P • The computation of the Fourier transform and of the inverse Fourier transform has a complexity proportional to n·log(n) • The computation of the Pj is linear • The global cost is therefore of the same order as n·log(n)
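An illustrative sketch of the idea in Python, using a complex floating-point FFT on decimal digits and rounding the results (real libraries use number-theoretic transforms to avoid the rounding handled here by round(); all names are ad hoc):

```python
# FFT-based multiplication: digits as coefficients, pointwise products, inverse FFT.
import cmath

def fft(a, invert=False):
    n = len(a)                       # n must be a power of two
    if n == 1:
        return a[:]
    sign = 1 if invert else -1
    even = fft(a[0::2], invert)
    odd = fft(a[1::2], invert)
    out = [0] * n
    for k in range(n // 2):
        w = cmath.exp(sign * 2j * cmath.pi * k / n)
        out[k] = even[k] + w * odd[k]
        out[k + n // 2] = even[k] - w * odd[k]
    return out

def fft_multiply(x, y):
    a = [int(c) for c in reversed(str(x))]   # little-endian decimal digits
    b = [int(c) for c in reversed(str(y))]
    size = 1
    while size < len(a) + len(b):
        size *= 2
    a += [0] * (size - len(a))
    b += [0] * (size - len(b))
    fa, fb = fft(a), fft(b)
    fp = [u * v for u, v in zip(fa, fb)]     # pointwise products P_j = A_j.B_j
    coeffs = [round(c.real / size) for c in fft(fp, invert=True)]
    return sum(c * 10**i for i, c in enumerate(coeffs))   # carries via big integers

print(fft_multiply(171659, 891512))          # 153036058408
```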
Complexity: multiplication • The schoolbook algorithm has a complexity proportional to n^2 • Karatsuba’s algorithm is proportional to n^1.58 • Toom-Cook’s algorithm is proportional to n^1.46 • Schönhage’s Fourier-transform algorithm is proportional to n·log(n) • But in fact, Winograd has shown that the complexity of multiplication is just linear
Equation f(x)=0 • Let f be a real function, continuously differentiable over an interval ]a,b[, which has a zero in this interval • We define recursively x0 ∈ ]a,b[ and xn+1=xn-f(xn)/f’(xn) for n ≥ 0 • Depending on f and x0, the sequence (xn) converges to a limit l such that f(l)=0 • This is Newton’s method • The convergence is generally quadratic (the number of exact decimals doubles at each iteration)
Inverse computation • Let a be a real positive value; we wish to compute 1/a • Let f(x)=1/x-a • The iteration is xn+1=xn·(2-a·xn) • If |xn-1/a| = ε then |xn+1-1/a| = a·ε^2 • If ε < 1/a, then a·ε^2 < ε • Thus if x0 ∈ ]0, 2/a[, the sequence converges to 1/a • We need a rough estimate of 1/a to initialise the algorithm • The convergence is quadratic
Inverse: Example • a=0.52369817 • By looking at the first bits of a, we see that 0.5 < a < 1, thus 1 < 1/a < 2; we can initialise with x0=1.5 • x1=x0(2-a·x0)=1.82167911 • x2=x1(2-a·x1)=1.90545810 • x3=x2(2-a·x2)=1.90948830 • x4=x3(2-a·x3)=1.90949683
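A small sketch of this iteration in ordinary floating point, just to exhibit the quadratic convergence (newton_inverse is an illustrative name; a real multiprecision version would increase the working precision at each step):

```python
# Newton reciprocal iteration x_{n+1} = x_n (2 - a x_n), no division needed (sketch).
def newton_inverse(a, x0, steps=5):
    x = x0
    for n in range(1, steps + 1):
        x = x * (2 - a * x)             # one Newton step
        print(f"x{n} = {x:.8f}")
    return x

newton_inverse(0.52369817, 1.5)         # converges to 1/a = 1.9094968...
```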
Integer division • Let A and B be two integers; we want to compute A/B • We compute q ≈ 1/B, rounded up (resp. rounded down), with a precision better than 1/A • We compute A·q and round it down to the nearest integer (resp. up to the nearest integer)
Integer division (Example) • A=309 641 782 117, B=143 557 • Considering the first bits of A, we have A < 5·10^11, so we have to compute q with a precision of 2·10^-12 • We have q = 0.000 006 965 874 2 ± 10^-12 • This value of q is rounded up (the next iteration gives a slightly smaller value) • A·q = 2 156 925.639 with an error less than 1 • We round down to the integer below: A/B = 2 156 925
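A sketch of division through a reciprocal estimate; Fraction stands in for multiprecision fixed point, and a final fix-up loop guards the borderline cases (the function name is illustrative):

```python
# Integer division via an approximate reciprocal, then a small correction (sketch).
from fractions import Fraction

def int_divide(A, B):
    q = Fraction(round(Fraction(10**12) / B), 10**12)   # 1/B to about 12 decimal places
    quotient = int(A * q)                               # candidate quotient
    while quotient * B > A:                             # adjust down if too large
        quotient -= 1
    while (quotient + 1) * B <= A:                      # adjust up if too small
        quotient += 1
    return quotient

A, B = 309641782117, 143557
print(int_divide(A, B), A // B)                         # 2156925 2156925
```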
Modulo computation • Let A and B be two integers; we wish to compute r = A mod B • We compute the integer quotient q = A/B by the previous algorithm • We then compute r = A - q·B • If B is small with respect to A, there are other methods
Square root • Let a be a real positive number; we wish to compute √a • Let f(x)=x^2-a and use Newton’s method • xn+1=(xn+a/xn)/2 • This method has been known since antiquity as Heron’s method • Problem: a division is necessary at each step
Square root: 2nd method • f(x)=1/x^2-a • The iteration is xn+1=(3xn-a·xn^3)/2 • The sequence converges to 1/√a; we just have to multiply by a to obtain √a • No more divisions • If x0 ∈ ]0, √(3/a)[, the sequence converges quadratically to 1/√a
Square root: Example • a=5, 4 < a < 16, so 2 < √a < 4 and 1/4 < 1/√a < 1/2; we take x0=0.375 • x1=(3x0-a·x0^3)/2=0.43066406 • x2=(3x1-a·x1^3)/2=0.44630628 • x3=(3x2-a·x2^3)/2=0.44721083 • x4=(3x3-a·x3^3)/2=0.44721359 • x5=(3x4-a·x4^3)/2=0.44721360 • x6=(3x5-a·x5^3)/2=0.44721360 (x6=x5, we stop) • √a ≈ 5·x6=2.23606800; the value is exact with an error of at most 5·10^-8 • In fact √a= 2.23606797…
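A small sketch of the division-free iteration in floating point (illustrative names; the final multiplication by a recovers √a):

```python
# Division-free square root via the reciprocal square root iteration (sketch).
def sqrt_via_rsqrt(a, x0, steps=8):
    x = x0
    for _ in range(steps):
        x = (3 * x - a * x**3) / 2       # converges to 1/sqrt(a)
    return a * x                         # a * (1/sqrt(a)) = sqrt(a)

print(sqrt_via_rsqrt(5.0, 0.375))        # about 2.2360679..., i.e. sqrt(5)
```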
Higher-order iterations • We wish to solve f(X)=0 • We suppose that x is near a zero of the equation • f(x+h)=f(x)+h·f’(x)+(h^2/2)·f’’(x)+ … (Taylor) • We solve in h the equation f(x)+h·f’(x)+(h^2/2)·f’’(x)=0 • We get h=-(f’(x)/f’’(x))·(1-(1-g(x))^(1/2)) with g(x)=2·f(x)·f’’(x)/f’(x)^2 • f(x) is supposed to be small, and if f is a sufficiently regular function, g(x) is also small
Higher-order iterations (cont.) • If ε is small then 1-(1-ε)^(1/2) ≈ ε/2+ε^2/8 (Taylor) • Replacing ε with g(x), we get 1-(1-g(x))^(1/2) ≈ (g(x)/2)·(1+g(x)/4) • So h ≈ -(f(x)/f’(x))·(1+f(x)·f’’(x)/(2·f’(x)^2)) • Let xn+1=xn-(f(xn)/f’(xn))·(1+f(xn)·f’’(xn)/(2·f’(xn)^2)) • The sequence converges cubically to the zero of the equation f(X)=0 • This is Householder’s iteration (1970)
Higher-order iterations (end) • We can continue and define quartic, quintic, … iterations • But not all functions are suitable • The successive derivatives have to be easy to compute • The extra computational cost has to be compensated by the increase in convergence speed
Example: Inverse • Computation of the inverse of a • f(x)=1/x-a • We choose an initial value x0 near 1/a • Let hn=1-a·xn • Quadratic iteration: xn+1=xn+xn·hn (Newton) • Cubic iteration: xn+1=xn+xn·(hn+hn^2) (Householder) • Quartic iteration: xn+1=xn+xn·(hn+hn^2+hn^3) • Quintic iteration: xn+1=xn+xn·(hn+hn^2+hn^3+hn^4) • Etc.
Example: Inverse • a=0.52369817, x0=1.5
        quadratic      cubic          quartic
n=1     1.821679118    1.890664087    1.905458103
n=2     1.905458103    1.909495007    1.909496839
n=3     1.909488296    1.909496839    1.909496839
n=4     1.909496839    1.909496839    1.909496839
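A sketch that reproduces this table: the quadratic, cubic and quartic iterations only differ in how many powers of hn = 1 - a·xn are summed (names are illustrative):

```python
# Inverse of a by iterations of order p: x_{n+1} = x_n (1 + h + ... + h^(p-1)) (sketch).
def inverse_iteration(a, x0, order, steps):
    x = x0
    for _ in range(steps):
        h = 1 - a * x
        x = x * sum(h**k for k in range(order))    # 1 + h + ... + h^(order-1)
    return x

a, x0 = 0.52369817, 1.5
for n in range(1, 5):
    row = [inverse_iteration(a, x0, p, n) for p in (2, 3, 4)]
    print(f"n={n}: " + "  ".join(f"{v:.9f}" for v in row))
```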
Computation of elementary functions • Computation of functions like sin, cos, log, … • We use the properties of the function to reduce the argument to a given interval • For a limited precision (about ten decimal digits), the CORDIC algorithm can be used; it requires only additions (it is the algorithm of pocket calculators) • For a greater precision, we can use Taylor expansions around the middle of the interval (to speed up convergence) • For maximal precision, we use techniques like binary splitting or the AGM
Example: sine • We have sin(x+2π)=sin(x), sin(x+π)=-sin(x), sin(-x)=-sin(x), sin(x)=cos(π/2-x) • We can therefore limit the computations to the interval [-π/4, π/4] • sin(x)=x-x^3/6+x^5/120-…=Σ(-1)^n·x^(2n+1)/(2n+1)! • cos(x)=1-x^2/2+x^4/24-…=Σ(-1)^n·x^(2n)/(2n)! • The truncated expansions are evaluated with Horner’s scheme in order to limit the number of multiplications • Example: 1-x^2/2+x^4/24-x^6/720 = 1-(x^2/(1·2))·(1-(x^2/(3·4))·(1-x^2/(5·6)))
Numerical example • Computation of sin(8) with 5 exact decimals • sin(8)=sin(3π-8)=cos(8-5π/2) • We have (8-5π/2)^6 < 10^-5, so we can limit the expansion of cos(x) to its first three terms • We then use cos(x) ≈ 1-(x^2/(1·2))·(1-x^2/(3·4)) • Computations by Horner’s scheme are stable (the rounding errors do not propagate) • Numerically we obtain sin(8) ≈ 0.98935826008034 • In fact sin(8)=0.98935824662338
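A sketch of this computation in Python, with the reduction sin(8) = cos(8 - 5π/2) followed by the three-term Horner evaluation:

```python
# sin(8) via argument reduction and a truncated cosine series in Horner form (sketch).
import math

x = 8 - 5 * math.pi / 2                        # reduced argument, about 0.146
x2 = x * x
approx = 1 - (x2 / (1*2)) * (1 - x2 / (3*4))   # three-term cosine expansion
print(approx)                                  # 0.9893582...
print(math.sin(8))                             # 0.9893582466...
```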
Example: Computation of e with 10 decimal digits • We have e=1+1/1!+1/2!+1/3!+ … • We have 15! > 10^10, so we can stop at the term 1/15! • Let x15=1/15 and xn=(1+xn+1)/n for n=14, …, 1; then exp(1) ≈ 1+x1 • Successive values of xn:
n=15: 0.066666666667   n=14: 0.076190476191   n=13: 0.082783882784
n=12: 0.090231990232   n=11: 0.099111999112   n=10: 0.10991119991
n=9:  0.12332346666    n=8:  0.14041543333    n=7:  0.16291649048
n=6:  0.19381941508    n=5:  0.23876388301    n=4:  0.30969097075
n=3:  0.43656365692    n=2:  0.71828182846    n=1:  1.7182818285
• In fact e ≈ 2.718281828459 • The computed value 1+x1=2.7182818285 is exact to 10 decimal digits
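A sketch of the backward recurrence in Python, using exact rationals (Fraction) so no rounding error is introduced:

```python
# e by the backward Horner recurrence x_15 = 1/15, x_n = (1 + x_{n+1})/n (sketch).
from fractions import Fraction

x = Fraction(1, 15)
for n in range(14, 0, -1):
    x = (1 + x) / n
print(float(1 + x))          # 2.71828182845..., correct to 10 decimal places
```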
Further … • Web site: http://numbers.computation.free.fr • Arithmétique en précision arbitraire, Paul Zimmermann, INRIA research report no. 4272, September 2001 • AGM: Fast multiple-precision evaluation of elementary functions, Richard P. Brent, Journal of the ACM, vol. 23, no. 2, pages 242-251 (1976) • Computation of π: Pi and the AGM, J.M. and P.B. Borwein, on the web site http://www.cecm.sfu.ca