Elliptic Curve Arithmetic. MAPLD 2003. Siddaveerasharan Devarkal Duncan A. Buell. Elliptic Curve Cryptography. Emerging as new generation of cryptosystems based on public key cryptography No sub-exponential algorithm to solve the discrete logarithm problem
Elliptic Curve Arithmetic
MAPLD 2003
Siddaveerasharan Devarkal, Duncan A. Buell

Elliptic Curve Cryptography
• Emerging as a new generation of public-key cryptosystems
• No sub-exponential algorithm is known for the elliptic curve discrete logarithm problem
• Smallest key size and highest strength per bit compared to other public-key cryptosystems
• Smaller key sizes are well suited to hardware implementation

Reconfigurable Platform
• Starbridge Systems HC 36m Hypercomputer
• Four Virtex-II 6000 FPGA chips form the primary source of reconfigurable logic
• Another Virtex-II 6000 chip and two Virtex-II 4000 chips are used for communication between FPGAs, and to the host
• Software tool is called Viva
  • Graphical editor to create designs
  • Built-in synthesizer tool
  • Calls Xilinx tools for Place and Route

Quad Structure [figure]

Hypercomputer System [figure]

NIST standards
• NIST has proposed a specific set of elliptic curves for cryptographic use
• Elliptic curves are defined for prime fields GF(p) and binary polynomial fields GF(2^m)
• Prime fields for 192, 224, 256, 384 and 521 bits
• Binary fields for 163, 233, 283, 409 and 571 bits
• Operands this wide require multi-precise arithmetic

Elliptic Curve Arithmetic
Arithmetic Operation Hierarchy [figure]

Elliptic Curve Arithmetic (contd.)
• For a 192-bit operand, the naïve M * P operation involves 191 elliptic doublings and, on average, about 96 elliptic additions
• For our chosen curve
  • 5 squarings, 8 multiplications, 5 additions and 5 shifts for a single point doubling
  • 2 squarings, 12 multiplications, 7 additions and 2 shifts for a single point addition
• Multiplication is the kernel operation

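The operation counts above can be checked with a short sketch. It assumes the "naïve M * P" is the usual left-to-right binary double-and-add method, which the slide does not state explicitly; the function names are illustrative, and the per-operation costs are copied from the slide.

```python
# Field-operation costs per point operation, as quoted on the slide
# for the authors' chosen curve: (squarings, multiplications).
DOUBLING_COST = (5, 8)
ADDITION_COST = (2, 12)


def double_and_add_counts(m):
    """Return (doublings, additions) for binary double-and-add with scalar m.

    Assumption: left-to-right scanning that starts from the leading 1 bit,
    so every later bit costs one doubling and every later 1 bit costs
    one extra addition.
    """
    bits = bin(m)[2:]
    doublings = len(bits) - 1
    additions = bits.count("1") - 1
    return doublings, additions


def field_op_totals(doublings, additions):
    """Return total (squarings, multiplications) for the given point-op counts."""
    squarings = doublings * DOUBLING_COST[0] + additions * ADDITION_COST[0]
    multiplications = doublings * DOUBLING_COST[1] + additions * ADDITION_COST[1]
    return squarings, multiplications


if __name__ == "__main__":
    # Worst case for a 192-bit scalar: every bit set.
    print(double_and_add_counts(2**192 - 1))   # (191, 191)
    # The slide's figures: 191 doublings and 96 additions.
    print(field_op_totals(191, 96))            # (1147, 2680)
```

A random 192-bit scalar has roughly half of its lower 191 bits set, which is where the figure of about 96 additions comes from. The totals show why multiplication dominates the cost.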
Multi-precise Multiplication
• Naïve Divide-and-Conquer multiplier
  • Four base multiplier units
• Karatsuba's Divide-and-Conquer multiplier
  • Three base multiplier units
  • If $A = A_h \cdot 2^{n/2} + A_l$ and $B = B_h \cdot 2^{n/2} + B_l$, then
    $A \cdot B = T_0 \cdot 2^{n} + (T_2 - (T_1 + T_0)) \cdot 2^{n/2} + T_1$
    where
    $T_0 = A_h B_h$
    $T_1 = A_l B_l$
    $T_2 = (A_h + A_l)(B_h + B_l)$

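The Karatsuba identity can be sketched in software as follows. This is an illustration of the recurrence only, not the authors' hardware design; the function name, the `base_bits` cutoff and the use of Python's built-in `*` as a stand-in for a base multiplier unit are all assumptions.

```python
def karatsuba(a, b, n, base_bits=16):
    """Multiply non-negative integers a and b, treated as n-bit operands.

    Uses three half-width products per level instead of four.
    base_bits (assumed, must be >= 2) is the width at which the
    recursion stops and a base multiplier is used.
    """
    if n <= base_bits:
        return a * b  # stands in for one base multiplier unit

    half = n // 2
    mask = (1 << half) - 1
    ah, al = a >> half, a & mask   # A = Ah * 2^half + Al
    bh, bl = b >> half, b & mask   # B = Bh * 2^half + Bl

    t0 = karatsuba(ah, bh, half, base_bits)             # Ah * Bh
    t1 = karatsuba(al, bl, half, base_bits)             # Al * Bl
    # The sums can be one bit wider than the halves.
    t2 = karatsuba(ah + al, bh + bl, half + 1, base_bits)

    # A*B = T0 * 2^(2*half) + (T2 - (T1 + T0)) * 2^half + T1
    return (t0 << (2 * half)) + ((t2 - (t1 + t0)) << half) + t1


if __name__ == "__main__":
    x = 0xDEADBEEFCAFEBABE0123456789ABCDEF
    y = 0x0FEDCBA9876543211122334455667788
    print(karatsuba(x, y, 128) == x * y)
```

The middle term works because $(A_h + A_l)(B_h + B_l) - A_h B_h - A_l B_l = A_h B_l + A_l B_h$, so one extra addition-heavy product replaces two multiplications.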
Modular Multiplication
• Division
  • Expensive in hardware
• Montgomery Reduction
  • Two multiplications and a few low-cost shifts
• Algorithm
  • For modulus $N$, choose $R = 2^k > N$ and compute $R'$ and $N'$ such that $R R' - N N' = 1$
  • Given a double-length product $T$, compute
    $m = (T \bmod R)\, N' \bmod R$
    $T \leftarrow (T + m N) / R$
  • "mod R" means "choose the rightmost k bits"
  • "/R" means "shift right by k bits"

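A minimal software sketch of the reduction step, assuming an odd modulus N and an input T below R·N. The function names are illustrative. The final conditional subtraction is not on the slide; it is the usual last step that brings the result into the range [0, N). The modulus used in the example is the NIST P-192 prime, chosen here only as a convenient 192-bit test value.

```python
def montgomery_setup(n, k):
    """Return (R, N') for odd modulus n and R = 2^k > n.

    N' satisfies N * N' = -1 (mod R), which is equivalent to the
    slide's condition R*R' - N*N' = 1 for some integer R'.
    """
    r = 1 << k
    if n % 2 == 0 or n >= r:
        raise ValueError("n must be odd and smaller than 2^k")
    n_prime = (-pow(n, -1, r)) % r  # modular inverse needs Python 3.8+
    return r, n_prime


def montgomery_reduce(t, n, k, n_prime):
    """Return t * R^-1 mod n for 0 <= t < R*n, using only masks and shifts."""
    mask = (1 << k) - 1
    m = ((t & mask) * n_prime) & mask   # "mod R": keep the rightmost k bits
    u = (t + m * n) >> k                # "/R": shift right by k bits (exact)
    return u - n if u >= n else u       # final correction, not on the slide


if __name__ == "__main__":
    n = 2**192 - 2**64 - 1              # NIST P-192 prime, used as a test modulus
    k = 192
    r, n_prime = montgomery_setup(n, k)
    a, b = 0x1234567890ABCDEF, 0xFEDCBA0987654321
    reduced = montgomery_reduce(a * b, n, k, n_prime)
    print(reduced == (a * b * pow(r, -1, n)) % n)
```

The reduction itself costs two multiplications (by N' and by N). Together with the product that forms T, that gives three multiplications per modular multiplication.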
D&Q Implementation
• Base multipliers are the 18-bit hardware multipliers
• Each takes only one clock cycle, offering a considerable improvement in speed
• The D&Q (divide-and-conquer) multiplier is recursive and is designed bottom-up
• A 32-bit D&Q multiplier uses 3 hardware multipliers
• Virtex-II 6000 has 144 hardware multipliers
• Going up the hierarchy, we can scale up to a 256-bit multiplier on a single chip

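The scaling claim can be checked with a small count, sketched here under the assumption that each level of the hierarchy uses three half-width multipliers and that operands of 16 bits or fewer map onto one 18-bit hardware multiplier (the cutoff is inferred from the 32-bit figure on the slide, not stated by the authors).

```python
HW_MULTIPLIERS_PER_CHIP = 144  # Virtex-II 6000, from the slide


def dq_hardware_multipliers(bits, base_bits=16):
    """Count hardware multipliers in a three-way D&Q multiplier of the given width.

    Assumption: the width is halved at each level until it fits one
    hardware multiplier, and each level triples the multiplier count.
    """
    count = 1
    while bits > base_bits:
        bits //= 2
        count *= 3
    return count


if __name__ == "__main__":
    for width in (32, 64, 128, 256, 512):
        needed = dq_hardware_multipliers(width)
        print(width, needed, needed <= HW_MULTIPLIERS_PER_CHIP)
```

Under these assumptions a 256-bit multiplier needs 81 hardware multipliers, which fits within 144, while a 512-bit multiplier would need 243 and does not.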
Results
Virtex-II 6000 has 33,792 slices

ECCAdd Implementation
• For our chosen curve we require 14 Montgomery multiplications (includes 2 squarings)
• Each Montgomery multiplier requires 3 D&Q multipliers
• 14 * 3 = 42 D&Q multipliers
• A 32-bit D&Q multiplier requires 3 hardware multipliers
• A 32-bit ECCAdd would require 42 * 3 = 126 hardware multipliers
• A single chip can fit an ECCAdd of up to 32 bits
• For higher bit-widths on a single chip we run out of hardware multipliers before running out of slices

Results
Virtex-II 6000 has 33,792 slices

Multi-Chip Implementation
• A 64-bit D&Q multiplier requires 9 hardware multipliers
• A 64-bit ECCAdd would require 42 * 9 = 378 hardware multipliers
• On two chips, only 144 * 2 = 288 (< 378) are available
• For anything between 32 and 64 bits the design has to be spread across 3 chips
• The Hypercomputer provides 50-bit wide communication between FPGAs
• Additional costs are incurred for large bit-width data movement between chips

Practical Bit-width Implementation
• A 128-bit ECCAdd would require 42 * 27 = 1134 hardware multipliers
• For ECCAdd at 192, 224, 256, 384 and 521 bits we run out of hardware multipliers on all four chips before running out of slices
• Arithmetic units have to be re-used
  • Complex control circuitry
  • Partitioning and scheduling become an issue

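The multiplier budget behind these slides can be tabulated with a short sketch. The constants come from the slides (42 D&Q multipliers per ECCAdd, 144 hardware multipliers per Virtex-II 6000, four such chips); the function names and the 16-bit base cutoff are assumptions made for illustration, and only power-of-two widths are covered because the slides give no counts for the others.

```python
import math

DQ_PER_ECCADD = 42             # 14 Montgomery multiplications * 3 D&Q each
HW_MULTIPLIERS_PER_CHIP = 144  # Virtex-II 6000
CHIPS_AVAILABLE = 4            # FPGAs available for logic


def dq_hardware_multipliers(bits, base_bits=16):
    """Hardware multipliers in one D&Q multiplier (three per halving, assumed)."""
    count = 1
    while bits > base_bits:
        bits //= 2
        count *= 3
    return count


def eccadd_budget(bits):
    """Return (hardware multipliers needed, chips needed) with no component re-use."""
    needed = DQ_PER_ECCADD * dq_hardware_multipliers(bits)
    chips = math.ceil(needed / HW_MULTIPLIERS_PER_CHIP)
    return needed, chips


if __name__ == "__main__":
    for width in (32, 64, 128):
        needed, chips = eccadd_budget(width)
        fits = chips <= CHIPS_AVAILABLE
        print(f"{width}-bit ECCAdd: {needed} multipliers, {chips} chip(s), fits: {fits}")
```

The four chips together offer 576 hardware multipliers, so the 1134 needed at 128 bits already exceeds the machine without re-using arithmetic units.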
Future Work
• Continue the work on multi-chip implementations without component re-use
• Explore the design space for practical bit-width implementations
• Use hybrid multipliers in place of D&Q