Parallel Fermat’s Integer Factorization Method
F. Serdar TAŞEL
Computer Engineering Department, Cankaya University
Instructor: Cem ÖZDOĞAN
What is integer factorization?
• Integer factorization is the process of decomposing an integer into its prime factors.
• Multiplying primes is easy, but the product is not easily reversed into its factors.
• The integer factorization problem is frequently used in cryptology.
• The well-known RSA cryptosystem relies on the difficulty of integer factorization for its security.
Some definitions
• Composite number: a positive integer greater than 1 that is not prime. Examples: 21 = 7·3, 9 = 3·3, 70 = 7·5·2
• Semiprime: the product of two primes that are not necessarily distinct. 21 and 9 are semiprimes. 70 is not a semiprime.
RSA cryptosystem
• p and q are large primes.
• n = p·q (n is a semiprime)
• φ = (p − 1)(q − 1)
• e is an integer with 1 < e < φ and gcd(e, φ) = 1 (commonly chosen as a Fermat prime, e = 2^r + 1)
• d is an integer with 1 < d < φ and e·d ≡ 1 (mod φ) (d is the inverse of e modulo φ)
• (n, e) is the public key
• (n, d) is the private key
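The key-generation steps above can be sketched in a few lines of Python. This is a toy illustration only: the function name `make_rsa_key`, the tiny primes 61 and 53, and the default e = 17 are assumptions chosen for readability, and the primality of p and q is not checked.

```python
from math import gcd

def make_rsa_key(p, q, e=17):
    """Build a toy RSA key pair from two primes p and q (primality is not checked)."""
    n = p * q                      # public modulus, a semiprime
    phi = (p - 1) * (q - 1)
    if not (1 < e < phi) or gcd(e, phi) != 1:
        raise ValueError("e must satisfy 1 < e < phi and gcd(e, phi) = 1")
    d = pow(e, -1, phi)            # inverse of e modulo phi (Python 3.8+)
    return (n, e), (n, d)

public_key, private_key = make_rsa_key(61, 53)
```

Real keys use primes hundreds of digits long; the structure of the computation stays the same.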
RSA cryptosystem (cont.)
• Encryption: C = P^e mod n
• Decryption: P = C^d mod n
• P is the plaintext, C is the ciphertext.
• Since n and e form the public key, they are known.
• To compute d, an attacker needs φ, and computing φ requires factoring n.
• p and q should be neither too far apart nor too close to each other.
• Currently, 1024-bit RSA is commonly used.
• For military applications, 4096-bit RSA is preferred.
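A minimal sketch of encryption, decryption, and the reason factoring n breaks the scheme. The helper names and the toy values n = 3233 = 61·53, e = 17, d = 2753 are assumptions for illustration, not values from the presentation.

```python
def encrypt(P, public_key):
    n, e = public_key
    return pow(P, e, n)            # C = P^e mod n

def decrypt(C, private_key):
    n, d = private_key
    return pow(C, d, n)            # P = C^d mod n

def recover_private_exponent(p, q, e):
    """Once n is factored into p and q, phi and therefore d follow directly."""
    phi = (p - 1) * (q - 1)
    return pow(e, -1, phi)         # modular inverse, Python 3.8+

n, e, d = 3233, 17, 2753           # toy key, n = 61 * 53
ciphertext = encrypt(65, (n, e))
plaintext = decrypt(ciphertext, (n, d))
```

Here `recover_private_exponent(61, 53, 17)` yields the same d as the key owner's, which is why the security of the toy key rests entirely on keeping the factors of n secret.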
Some integer factorization methods
• Trial division
• Pollard’s rho algorithm
• Pollard’s p-1 algorithm
• Fermat’s factorization method
• Dixon’s algorithm
• Quadratic sieve*
• Number field sieve*
Several of these methods (Fermat, Dixon, QS, NFS) are based on a congruence of squares to find factors of a number.
* QS and NFS are the most efficient known factorization algorithms for large N.
Trial division
• Simplest method
• Useful for small N
• Linear speed-up is expected for the parallel method.
• Parallel method: [figure missing]
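Since the slide's description of the parallel method is not available, the following is only one plausible scheme, assumed here for illustration: each processor tests every nproc-th candidate divisor, starting from a different offset. The ranks are simulated one after another in a single process; the function names are hypothetical.

```python
from math import isqrt

def trial_division_slice(N, rank, nproc):
    """Test divisors 2+rank, 2+rank+nproc, ... up to sqrt(N) for one processor."""
    d = 2 + rank
    limit = isqrt(N)
    while d <= limit:
        if N % d == 0:
            return d
        d += nproc
    return None

def trial_division(N, nproc=1):
    # Simulated ranks: a real parallel run would execute the slices concurrently.
    found = [trial_division_slice(N, r, nproc) for r in range(nproc)]
    found = [d for d in found if d is not None]
    return min(found) if found else None   # None means N has no divisor <= sqrt(N)
```

Because the slices are disjoint and of nearly equal size, each processor does about 1/nproc of the divisions, which is where the expected linear speed-up comes from.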
Pollard’s rho algorithm
• The algorithm uses a pseudo-random function f: y = f(x), where y is the next number after x in the sequence. Example for f: [formula missing]
• Very fast for an N that has a small factor.
• Returns failure if d = gcd(|x − y|, n) equals n, which means x ≡ y (mod n), so the sequence has cycled and continuing any further would only repeat previous work.
• A different function f should be used if a failure occurs.
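A minimal sketch of the algorithm using Floyd's cycle detection. The polynomial f(x) = (x² + c) mod n is a common textbook choice and is an assumption here, since the slide's own example for f is not available; the retry helper and its list of constants are likewise hypothetical.

```python
from math import gcd

def pollard_rho(n, c=1, x0=2):
    """Return a non-trivial factor of n, or None on failure (d == n)."""
    def f(x):
        return (x * x + c) % n     # assumed pseudo-random function
    x = y = x0
    d = 1
    while d == 1:
        x = f(x)                   # slow pointer: one step
        y = f(f(y))                # fast pointer: two steps
        d = gcd(abs(x - y), n)
    return None if d == n else d

def factor_with_retries(n, constants=(1, 2, 3, 5)):
    """Switch to a different f (a different c) whenever an attempt fails."""
    for c in constants:
        d = pollard_rho(n, c=c)
        if d is not None:
            return d
    return None
```

The loop always terminates because the sequence is eventually periodic modulo n, at which point x and y coincide and the gcd becomes n.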
Pollard’s rho algorithm - parallel
• A plausible use of parallelism is to try several different pseudo-random sequences on different processors. Example for f: [formula missing]
• Non-linear speed-up
• Expected speed-up = nproc^(1/2)
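The idea of running one sequence per processor can be imitated in a single process by advancing all sequences in lockstep. In this sketch, processor r is assumed to use f_r(x) = (x² + r + 1) mod n; that family of functions, the function name, and the `max_steps` cutoff are illustrative assumptions.

```python
from math import gcd

def parallel_rho_simulated(n, nproc, max_steps=100000):
    """Step nproc independent rho sequences in lockstep; return (factor, rank, step)."""
    xs = [2] * nproc               # slow pointers, one per simulated processor
    ys = [2] * nproc               # fast pointers
    for step in range(1, max_steps + 1):
        for r in range(nproc):
            c = r + 1              # each rank uses a different constant
            xs[r] = (xs[r] * xs[r] + c) % n
            y = (ys[r] * ys[r] + c) % n
            ys[r] = (y * y + c) % n
            d = gcd(abs(xs[r] - ys[r]), n)
            if 1 < d < n:
                return d, r, step  # first sequence to succeed wins
    return None
```

The run ends as soon as the luckiest sequence finds a factor, which is why the gain grows more slowly than the number of processors.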
Fermat’s factorization method
• Fermat’s factorization method uses a congruence of squares to find a factor.
• N = p·q
• N = x^2 − y^2 = (x − y)(x + y)
• If we find x and y, then p = x − y and q = x + y.
• The running time depends less on the size of N than on the distance between the factors, so the algorithm performs best when the factors are close to each other.
Fermat’s factorization method (cont.)
• Try successive values of “a”, starting from ⌈√N⌉, and compute b^2 = a^2 − N.
• If a^2 − N is a perfect square b^2, then p = a − b and q = a + b.
• Note that p and q may be composite if N is not a semiprime.
• If p or q is composite, repeat the method with N = p or N = q.
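The serial method can be sketched as follows. The function names are hypothetical, and the sketch assumes an odd N (factors of 2 are stripped first in the recursive helper), since for odd N the search is guaranteed to stop at the latest with the trivial pair (1, N).

```python
from math import isqrt

def fermat_factor(N):
    """Return (p, q) with p * q == N and p <= q, for odd N > 1."""
    if N % 2 == 0:
        raise ValueError("this sketch expects an odd N")
    a = isqrt(N)
    if a * a < N:
        a += 1                     # a = ceil(sqrt(N))
    while True:
        b2 = a * a - N
        b = isqrt(b2)
        if b * b == b2:            # a^2 - N is a perfect square
            return a - b, a + b
        a += 1

def fermat_prime_factors(N):
    """Repeat the method on composite factors until only primes remain."""
    factors = []
    while N % 2 == 0:
        factors.append(2)
        N //= 2
    stack = [N] if N > 1 else []
    while stack:
        m = stack.pop()
        p, q = fermat_factor(m)
        if p == 1:
            factors.append(q)      # only the trivial pair exists: m is prime
        else:
            stack.extend((p, q))
    return sorted(factors)
```

For N = 5959 the search starts at a = 78 and succeeds at a = 80, where 80² − 5959 = 441 = 21², giving 59 and 101.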
Fermat’s factorization - parallel
• Share the values of “A” among the processors.
• Only one processor will find the factor while the others keep working, so a mechanism is required to stop them.
• Choose a coordinator (master) to control the slaves.
• Define an iteration count “I” and an attempt order “try”.
• All processors, including the master, attempt to find a factor for “I” iterations. They calculate their “A” values according to “try”.
• All slaves report their status to the master, indicating whether or not they found a factor.
• If no processor finds a factor, the master starts a new attempt.
• The amount of communication depends on “I”.
• Linear speed-up is expected if a suitable value is chosen for “I”.
Fermat’s factorization – parallel (cont.)
• Parallel algorithm: [figure missing]
• Example of shared “A” numbers: nproc = 2, I = 10, try = 0
  rank 0: 0, 2, 4, 6, 8, 10, 12, 14, 16, 18
  rank 1: 1, 3, 5, 7, 9, 11, 13, 15, 17, 19
• For each attempt, nproc · I = 2 · 10 = 20 values are tested.
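The interleaved sharing in this example is consistent with the indexing rule below. The rule itself is inferred from the listed numbers rather than taken from the slides, and `offsets_for` is a hypothetical name; the offsets are presumably added to the starting value ⌈√N⌉ to obtain the actual “A” values.

```python
def offsets_for(rank, nproc, I, attempt):
    """Offsets tested by one processor in a given attempt (the slides' "try")."""
    base = attempt * nproc * I     # values consumed by all earlier attempts
    return [base + i * nproc + rank for i in range(I)]

rank0 = offsets_for(rank=0, nproc=2, I=10, attempt=0)
rank1 = offsets_for(rank=1, nproc=2, I=10, attempt=0)
```

With attempt = 1 the same call continues from offset 20, so consecutive attempts never retest a value.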
Fermat’s factorization – parallel (cont.)
Master | Slaves
Broadcast N and other parameters | Wait for master’s command
Start an attempt | Busy
Busy | Report status
Evaluate reports | Wait for master’s command
Start a new attempt or send exit command |
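The master/slave cycle can be imitated in a single process to see how attempts and reports interact. This is a simulation under stated assumptions, not the MPI program from the presentation: the ranks run one after another, the interleaved indexing of “A” values is inferred from the earlier example, N is assumed odd, and all function names are hypothetical.

```python
from math import isqrt

def run_attempt(N, a0, rank, nproc, I, attempt):
    """Work of one processor during one attempt; returns (p, q) or None."""
    base = a0 + attempt * nproc * I
    for i in range(I):
        a = base + i * nproc + rank
        b2 = a * a - N
        b = isqrt(b2)
        if b * b == b2:
            return a - b, a + b
    return None

def parallel_fermat_simulated(N, nproc, I):
    """Return ((p, q), attempts_used) for odd N."""
    a0 = isqrt(N)
    if a0 * a0 < N:
        a0 += 1                    # starting value ceil(sqrt(N))
    attempt = 0
    while True:
        # Start an attempt: every rank, master included, works on its share.
        reports = [run_attempt(N, a0, r, nproc, I, attempt) for r in range(nproc)]
        # Evaluate reports: stop if any rank found a factor pair.
        found = [pq for pq in reports if pq is not None]
        if found:
            # Prefer the pair with the smallest a = (p + q) / 2.
            return min(found, key=lambda pq: pq[0] + pq[1]), attempt + 1
        attempt += 1               # otherwise start a new attempt
```

With N = 5959 and nproc = 2, choosing I = 10 finishes in one attempt, while I = 1 needs two, which shows how I trades communication rounds against possibly wasted work.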
Fermat’s factorization – parallel (cont.)
• “I” should be chosen carefully.
• If “I” is too low, communication time dominates computation time.
• If “I” is too high, unwanted extra computation occurs.
• In most cases, “I” should be chosen according to the communication capacity of the system. (For a tiny cluster, I = 50000 or 250000 is plausible.)
• Communication time can be minimized by reducing the number of attempts to 1, if “I” is estimated from the maximum possible distance between the factors. For instance, if the bit sizes of the factors of an RSA number are known, one can estimate “dist” and “I”.
• Reached distance for each attempt: [formula missing]
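The slide's own formula for the reached distance is not available. One way to relate the quantities, derived here from b² = a² − N and q − p = 2b rather than taken from the slides, is: after testing every value up to a_max, all factor pairs with q − p ≤ 2·⌊√(a_max² − N)⌋ have been covered. The sketch below assumes the processors jointly test nproc · I consecutive values per attempt; the function names are hypothetical.

```python
from math import isqrt

def ceil_sqrt(N):
    a = isqrt(N)
    return a if a * a == N else a + 1

def iterations_needed(p, q):
    """Count of a-values tested until a = (p + q) / 2 is hit (p, q odd)."""
    return (p + q) // 2 - ceil_sqrt(p * q) + 1

def attempts_needed(p, q, nproc, I):
    """Attempts required when each attempt covers nproc * I values."""
    return -(-iterations_needed(p, q) // (nproc * I))   # ceiling division

def reached_distance(N, nproc, I, attempts):
    """Largest factor distance q - p covered after the given number of attempts."""
    tested = nproc * I * attempts
    if tested < 1:
        raise ValueError("at least one value must have been tested")
    a_max = ceil_sqrt(N) + tested - 1
    return 2 * isqrt(a_max * a_max - N)
```

Read the other way round, a bound on q − p gives the a_max that must be reached, and from that an “I” large enough to finish in a single attempt.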
Possible Improvements
• Instead of a fixed number of iterations per attempt, use a time bound for each attempt.
• For instance, a bound of 1 second per attempt makes the processors communicate every second. The unwanted computation time is then at most equal to the time bound.
• If the system has processors with different speeds, the calculated “A” values drift apart, so load balancing is needed.
• For load balancing, define processor coefficients that reflect the processor speeds and are relatively prime (if they are, the “A” values tested by each processor stay closer to each other).
• The attempt order “try” now becomes the block order.
Load Balancing
Blocks: try = 0 | try = 1 | try = 2 | other blocks
Coefficients per block: c_0, c_1, ..., c_n | c_0, c_1, ..., c_n | c_0, c_1, ...
“A” values: A_0+ε_0, A_0+ε_1, A_0+ε_2, ..., A_0+ε_n | A_1+ε_0, A_1+ε_1, A_1+ε_2, ..., A_1+ε_n | A_2+ε_0, A_2+ε_1, A_2+ε_2, ...
• c_i: computation coefficient of the i-th processor.
• s: number of computations in each block.
• A_try: number of computations done from the beginning up to the current block.
• ε_rank: number of computations to shift by to obtain the processor’s “A” value.
• Note that “try” is independent of the attempt order.
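The slides do not give formulas for s, A_try and ε_rank, so the sketch below rests on one possible reading, labeled as an assumption: a block contains s = c_0 + ... + c_n computations, A_try = try · s, and ε_rank is the sum of the coefficients of the lower-ranked processors, so that processor i tests c_i consecutive offsets per block. The function name is hypothetical.

```python
def block_offsets(coeffs, rank, block):
    """Offsets tested by one processor in one block, under the assumed layout."""
    s = sum(coeffs)                # computations in each block
    A_block = block * s            # computations done before this block
    eps = sum(coeffs[:rank])       # shift of this processor inside the block
    start = A_block + eps
    return list(range(start, start + coeffs[rank]))

coeffs = [3, 2, 1]                 # hypothetical speeds: rank 0 is three times as fast as rank 2
first_block = [block_offsets(coeffs, r, 0) for r in range(len(coeffs))]
```

Under this reading, small relatively prime coefficients keep s small, so the values tested by different processors at any moment stay close together.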
Implementation
• Both serial and parallel codes are in one program.
• For nproc = 1, the serial code is executed.
• The GMP bignum library is used.
• All message passing operations are blocking.
• GMP integers are pointer-based structures, so they cannot be sent directly. Each GMP integer is exported to a binary data buffer and then transmitted by MPI. The received binary data is imported back into a GMP integer structure.
• 500 semiprimes were chosen to test the program.
• Iteration values were chosen as 50000 and 250000.
• nproc values were chosen from 1 to 7.
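The export/transmit/import round trip can be illustrated with Python's built-in big integers, which play the role of GMP integers here. This is an analogy rather than the presentation's code: GMP offers `mpz_export` and `mpz_import` for this purpose, but which calls the original program used is not stated, and the helper names below are hypothetical.

```python
def export_int(value):
    """Serialize a non-negative big integer into a big-endian byte buffer."""
    if value < 0:
        raise ValueError("only non-negative integers are handled in this sketch")
    length = max(1, (value.bit_length() + 7) // 8)
    return value.to_bytes(length, "big")

def import_int(buffer):
    """Rebuild the integer from a received byte buffer."""
    return int.from_bytes(buffer, "big")

N = 2**200 + 12345
buffer = export_int(N)             # this flat buffer is what would be handed to MPI
restored = import_int(buffer)
```

A flat byte buffer has no internal pointers, so it can be sent with an ordinary byte-typed message, usually together with its length.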
References
• Richard P. Brent, Some Parallel Algorithms for Integer Factorisation (1999), Oxford University
• Stephen S. McMath, Parallel Integer Factorization Using Quadratic Forms (2005), United States Naval Academy
• Donald Knuth, The Art of Computer Programming, Volume 2: Seminumerical Algorithms, Third Edition. Addison-Wesley, 1997. ISBN 0-201-89684-2. Section 4.5.4: Factoring into Primes, pp. 379–417.
• Bart Preneel, An Introduction to Cryptology (1998), Katholieke Universiteit Leuven
• http://en.wikipedia.org (Jan 2007)
• http://www.di-mgt.com.au/rsa_alg.html (Jan 2007)
• http://www.swox.com/gmp/ (Jan 2007)