1 / 13

Solving the Greatest Common Divisor Problem in Parallel

Solving the Greatest Common Divisor Problem in Parallel. Derrick Coetzee University of California, Berkeley CS 273, Fall 2010, Prof. Satish Rao. The problem and serial solutions. Given two positive integers (represented in binary), find the large positive integer that divides both

Download Presentation

Solving the Greatest Common Divisor Problem in Parallel

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Solving the Greatest Common Divisor Problem in Parallel Derrick Coetzee University of California, Berkeley CS 273, Fall 2010, Prof. SatishRao

  2. The problem and serial solutions • Given two positive integers (represented in binary), find the large positive integer that divides both • Let n will be the total number of bits in the inputs • Solvable in O(n) divisions and O(n2) bit operations with Euclidean algorithm: • while b ≠ 0: (a, b) ← (b, a mod b) • Matrix formulation: • Schönhage uses this to solve in O(n log2n log logn)

  3. Adding parallelism: Algorithm PM • Brent & Kung (1982): reduce time of each step to O(1) (O(n) bit complexity) • α← n; β ← n (where a,b ≤ 2n and a odd)while b ≠ 0: while b ≡0 (mod 2) { b ← b/2; β← β− 1 } if α ≥β { swap(a,b); swap(α, β) } if a+b≡ 0 (mod 4): b ← ⌊(a+b)/2⌋else: b ← ⌊(a−b)/2⌋ • Test “a+b≡ 0 (mod 4)” uses only 2 lowest bits of sum, enables effective pipelining of the steps • Number of steps is still linear, requires n processors

  4. Algorithm PM: Example • GCD(105, 30) (a = 11010012, b = 111102) • α ← 7;β ← 7 • b ← 11112; β ← 6 • swap (a ← 11112, b ← 11010012,α ← 6,β ← 7) • a + b ≡ 112 + 012≡ 002 (mod 4) • Compute trailing zeroes of new b and 2 more bits: • b ← ⌊(a+b)/2⌋ = (011112 + …010012)/2= …1100 • b ← …112; β ← 5 • swap (a ← …112, b ← 11112,α ← 5,β ← 6)

  5. Algorithm PM: Example • a =…112, b =11112, α= 5,β= 6 • a + b ≡ 112 + 112≡ 102 (mod 4) • Compute LSB of new b while previous iteration computes next bit of a in parallel: • b' = ⌊(a−b)/2⌋ = …0 • a = …1112 • With another bit of a, can compute next bit of b’ • Final result: a=11112 , b’ = 0

  6. Sublinear time GCD • Kannan et al (1987): Break the problem into n/log n steps, each of which takes log logn time and eliminates log n bits of each input. • CRCW, O(n log logn/log n) time, O(n2 log2n) processors • Idea: instead of a mod b, compute pa mod b for all 0≤p≤n • If gcd(a,b) = d and 0 ≤ p ≤ n, then gcd(pa,b)/d ≤ n • Pigeonhole principle gives first log n bits same for at least two p • gcd(a,b) = gcd(b,(p1a mod b)-(p2a mod b)) (ignoring factors ≤ n) • Only need to know first log n bits to identify p1, p2 – can be computed in log logn time

  7. Sublinear time GCD: Example • GCD(143, 221), n = 8 • (0×143) mod 221 = 000…2 • (1×143) mod 221 = 100…2 • (2×143) mod 221 = 010…2 • (3×143) mod 221 = 110…2 • (4×143) mod 221 = 100…2 • (5×143) mod 221 = 001…2 • (6×143) mod 221 = 110…2 • (7×143) mod 221 = 011…2 • (8×143) mod 221 = 001…2 • (4×143) mod 221 − (1×143) mod 221= 13

  8. Sublinear time GCD: Using matrix form • Problem: (p1a mod b)− (p2a mod b) takes log n time • Carry lookahead adder needs log n time to add n-bit numbers • Idea: Delay updates to (a, b) for log n steps, collecting updates in a 2 × 2 matrix T, then compute T(a,b) in log n time • Results in n/log2n phases each taking log n time • Each step performs constant number of log2n-bit operations • Ensures that each step still takes log logn time • Time dominated by cost of the O(n/log n) steps

  9. Getting rid of the log logn factor • Chor and Goldreich (1990): Parallelize Brent & Kung’s PM algorithm instead of Euclidean algorithm • α ← n; β ← nwhile b ≠ 0: while b = 0 (mod 2) { b ← b/2; β← β − 1 } if α ≥ β { swap(a,b); swap(α, β) } if a+b≡ 0 (mod 4): b ← ⌊(a+b)/2⌋ else: b ← ⌊(a−b)/2⌋ • Idea: pack log n iterations of PM algorithm into each phase • Each phase can be done in constant time • Requires O(n/log n) time on O(n1+ε) processors • Best-known algorithm

  10. log n iterations in constant time • α← n; β ← nδ← 0while b ≠ 0: while b = 0 (mod 2) { b ← b/2; β← β − 1δ← δ + 1} if α ≥βδ ≥ 0 { swap(a,b); swap(α, β)δ← −δ } if a+b≡ 0 (mod 4): b ← ⌊(a+b)/2⌋ else: b ← ⌊(a−b)/2⌋ • δtogether with the last log n+1 bits of a and b determine the transformation matrix of the next log n iterations • Precompute a table – lookup, do the matrix multiplication • Technicality: Need to treat |δ| ≤ log n and |δ| > log n differently

  11. Multiplication in constant time • How to multiply an n-bit number by a log n bit number in constant time? • Represent both numbers in base 2log n • Precompute a multiplication table for this base • Separate the larger number into a sum of two numbers, each having zero for every other digit, e.g. 1234 = 1030 + 204 • Multiply both by the smaller number; no carries are required • Example: 1234× 7 = (1030+204)×7 = 7210+1428 = 8638 • Finally, Chandra et al (1983) shows we can add two n-bit numbers in O(1) time with O(n2) processors • But this requires concurrent writes

  12. Complexity perspective: an analogy • P = NP? • Open problem, but most practical problems have either been shown to be in P or else NP-hard. • Exceptions: integer factorization, graph isomorphism • If integer factorization is NP-complete, NP = coNP • If graph isomorphism is NP-complete, PH collapses to 2nd level • NC = P? • Open problem, but most practical problems have either been shown to be in NC or else P-hard. • Exceptions: GCD • What would GCD being P-complete imply?

  13. Questions?

More Related