An Efficient Polynomial Multiplier in GF(2^m) and its Application to ECC Designs
Steffen Peter and Peter Langendörfer
Outline • Motivation and introduction into ECC • Basic polynomial multiplication approaches • Combinatorial polynomial multiplier • Iterative polynomial multiplier • Implications for the ECC design
Elliptic Curve Cryptography • Asymmetric cryptography • Trapdoor: Elliptic Curve Point Multiplication • one can compute: Q = kP • it is infeasible to determine k for given Q and P • Same security level as RSA with much shorter keys • Recommended key lengths [Lenstra & Verheul “Selecting Cryptographic Key Sizes”]
ECC in Software or Hardware? • 233-bit ECC on a MIPS processor (software) or on an ECC hardware accelerator? • Time for one ECPM (EC point multiplication): • MIPS: 410 ms • HW: 0.4 ms • Energy for one ECPM: • MIPS: 16.5 mWs • HW: 0.03 mWs
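Dividing the software figures by the hardware figures gives the gain of the accelerator for one 233-bit ECPM:

```latex
\frac{t_{\text{MIPS}}}{t_{\text{HW}}} = \frac{410\ \text{ms}}{0.4\ \text{ms}} = 1025
\qquad
\frac{E_{\text{MIPS}}}{E_{\text{HW}}} = \frac{16.5\ \text{mWs}}{0.03\ \text{mWs}} = 550
```

That is, roughly three orders of magnitude faster and 550 times less energy per point multiplication.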
EC Cryptographic Operations • Cryptographic protocols • Signature generation/verification • Encryption/decryption • Executed on a CPU • May use an ECC accelerator for sub-routines • [Figure: CPU (MIPS, ARM, LEON,…) with ECC co-processor]
EC Point Operations • Operations on points on the elliptic curve • Point addition: point + point • Point multiplication: integer · point • (Montgomery/Lopez-Dahab point multiplication) • Executed on the co-processor • [Figure: CPU with ECC co-processor]
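To make the point operations concrete, here is a minimal Python sketch of point addition, doubling and a Montgomery ladder computing Q = kP. It assumes a hypothetical toy curve y^2 + xy = x^3 + x^2 + 1 over GF(2^4) with field polynomial x^4 + x + 1, and it uses affine coordinates, so every addition and doubling costs one field inversion. The Montgomery/Lopez-Dahab method works with projective x-coordinates precisely to avoid those inversions; this sketch does not attempt that. All names are illustrative.

```python
# Toy curve y^2 + x*y = x^3 + A*x^2 + B over GF(2^4).
# F, A and B are hypothetical toy parameters chosen for illustration only.
M = 4
F = 0b10011  # f(x) = x^4 + x + 1
A, B = 1, 1


def gf_mul(a, b):
    """Shift-and-add multiplication in GF(2^4) with interleaved reduction."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a >> M:  # degree reached M: subtract (XOR) f(x)
            a ^= F
    return r


def gf_inv(a):
    """Inverse as a^(2^M - 2); this is the expensive field division."""
    r, e = 1, (1 << M) - 2
    while e:
        if e & 1:
            r = gf_mul(r, a)
        a = gf_mul(a, a)
        e >>= 1
    return r


def on_curve(P):
    if P is None:  # None represents the point at infinity
        return True
    x, y = P
    x2 = gf_mul(x, x)
    return (gf_mul(y, y) ^ gf_mul(x, y)) == (gf_mul(x2, x) ^ gf_mul(A, x2) ^ B)


def ec_double(P):
    """Affine doubling: lam = x + y/x, x3 = lam^2 + lam + A, y3 = x^2 + (lam + 1)*x3."""
    if P is None or P[0] == 0:  # points with x = 0 have order 2
        return None
    x1, y1 = P
    lam = x1 ^ gf_mul(y1, gf_inv(x1))
    x3 = gf_mul(lam, lam) ^ lam ^ A
    return (x3, gf_mul(x1, x1) ^ gf_mul(lam ^ 1, x3))


def ec_add(P, Q):
    """Affine point addition (one field inversion)."""
    if P is None:
        return Q
    if Q is None:
        return P
    (x1, y1), (x2, y2) = P, Q
    if x1 == x2:
        # Same x: either Q = -P = (x1, x1 + y1), giving infinity, or Q = P (doubling).
        return ec_double(P) if y1 == y2 else None
    lam = gf_mul(y1 ^ y2, gf_inv(x1 ^ x2))
    x3 = gf_mul(lam, lam) ^ lam ^ x1 ^ x2 ^ A
    return (x3, gf_mul(lam, x1 ^ x3) ^ x3 ^ y1)


def ladder(k, P):
    """Montgomery ladder: keeps R1 = R0 + P, one addition and one doubling per key bit."""
    R0, R1 = None, P
    for bit in bin(k)[2:]:
        if bit == "1":
            R0, R1 = ec_add(R0, R1), ec_double(R1)
        else:
            R0, R1 = ec_double(R0), ec_add(R0, R1)
    return R0


# All affine points of the toy curve, found by exhaustive search.
POINTS = [(x, y) for x in range(1 << M) for y in range(1 << M) if on_curve((x, y))]
```

For any `P` in `POINTS`, `ladder(k, P)` agrees with adding `P` to itself k times. Running a fixed add-and-double pattern per key bit is also why the ladder is attractive in hardware.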
Finite Field Operations • Operations in the finite field • Addition/subtraction (m-bit XOR) • Multiplication (m-bit · m-bit) • Squaring (much faster than multiplication) • Division (very expensive) • Each EC point operation requires operations in the finite field • E.g., one 233-bit EC point multiplication requires: • 1200 additions • 1500 multiplications (233-bit) • 800 squarings • 1 division
Basic Field Operations • Prime fields (GF(p)) • p is a very large prime (about 200 bits) • additions require carries • preferred for software implementations • Binary extension fields (GF(2^m)) • m is the bit length of the field elements (typically 160–283 bits) • easy hardware representation (m-bit array) • no carries (additions are simple XOR operations) • preferred for hardware implementations
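A minimal Python sketch of the GF(2^m) operations listed above, for m = 233. It assumes a polynomial basis with elements stored as integers (bit i is the coefficient of x^i) and the NIST reduction trinomial x^233 + x^74 + 1; it models the arithmetic only, not the hardware. Inversion is done by exponentiation, which shows why division is so much more expensive than the other operations. Function names are illustrative.

```python
# GF(2^233) in polynomial basis; bit i of an int is the coefficient of x^i.
# Reduction polynomial: the NIST trinomial f(x) = x^233 + x^74 + 1.
M = 233
F = (1 << 233) | (1 << 74) | 1


def gf_add(a, b):
    """Addition (= subtraction): bitwise XOR, no carries."""
    return a ^ b


def clmul(a, b):
    """Carry-less product of two polynomials (degree up to 2m - 2)."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        b >>= 1
    return r


def gf_reduce(c):
    """Reduce modulo f(x), clearing bits from the top down to x^M."""
    for i in range(c.bit_length() - 1, M - 1, -1):
        if (c >> i) & 1:
            c ^= F << (i - M)
    return c


def gf_mul(a, b):
    return gf_reduce(clmul(a, b))


def gf_sqr(a):
    """Squaring is linear over GF(2): coefficient a_i moves to x^(2i), then reduce."""
    r = 0
    for i in range(a.bit_length()):
        if (a >> i) & 1:
            r |= 1 << (2 * i)
    return gf_reduce(r)


def gf_inv(a):
    """Inversion as a^(2^m - 2): hundreds of squarings and multiplications."""
    r, e = 1, (1 << M) - 2
    while e:
        if e & 1:
            r = gf_mul(r, a)
        a = gf_sqr(a)
        e >>= 1
    return r
```

For instance, `gf_mul(2, 1 << 232)` computes x · x^232 = x^233 ≡ x^74 + 1 and returns `(1 << 74) | 1`.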
Utilization/Area of Functional Blocks • [Figure: utilization and silicon area per functional block] • Utilization: 15%, 95%, 50% • Area: 70%, 5%, 20%
Classic (school) Polynomial Multiplication • Each coefficient $b_i$ of $b(x)$ is ANDed with $a(x)$; the $m$ partial products, shifted by $i$ positions, are summed with XOR: $c(x) = a(x) \cdot b(x) = \sum_{i=0}^{m-1} b_i \, a(x) \, x^{i}$
Classic Polynomial Multiplication • Gate count: $m^2$ AND gates, $(m-1)^2$ XOR gates • Longest path: 1 AND + $\log_2 m$ XOR
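The gate counts above can be checked with a small Python model of the classic multiplier that builds all m^2 AND partial products and sums each output column with XORs. The function name and the returned tuple are illustrative assumptions; the XOR depth assumes each column is summed by a balanced XOR tree.

```python
def classic_mul(a, b, m):
    """Classic (school) multiplication of two m-bit GF(2) polynomials, modelled
    as a gate network. Returns (product, and_gates, xor_gates, xor_depth)."""
    # Column k collects the partial-product bits a_j AND b_i with i + j = k.
    columns = [[] for _ in range(2 * m - 1)]
    and_gates = 0
    for i in range(m):
        for j in range(m):
            columns[i + j].append(((a >> j) & 1) & ((b >> i) & 1))  # one AND gate
            and_gates += 1
    c, xor_gates, xor_depth = 0, 0, 0
    for k, bits in enumerate(columns):
        s = 0
        for bit in bits:
            s ^= bit
        c |= s << k
        # n inputs need n - 1 two-input XORs; a balanced tree has depth ceil(log2 n)
        xor_gates += len(bits) - 1
        xor_depth = max(xor_depth, (len(bits) - 1).bit_length())
    return c, and_gates, xor_gates, xor_depth
```

For m = 233 it reports 54,289 AND gates, 53,824 XOR gates and an XOR depth of 8, in line with m^2, (m-1)^2 and log2 m.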
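Classic Karatsuba Multiplication • Split the factors into halves: $a(x) = A_1 x^{m/2} + A_0$, $b(x) = B_1 x^{m/2} + B_0$ • $c(x) = a(x) \cdot b(x) = A_1 B_1 x^{m} + \left[ (A_1 + A_0)(B_1 + B_0) + A_1 B_1 + A_0 B_0 \right] x^{m/2} + A_0 B_0$ • 4 additions (XOR) + 3 multiplications per level (CPM: 3 additions + 4 multiplications)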
Classic Karatsuba Multiplication • Gate count: $m^{\log_2 3}$ AND gates (for $m$ a power of 2, recursing down to single bits) and $O(m^{\log_2 3})$ XOR gates • Longest path: 1 AND + $3 \log_2 m$ XOR (3 XOR delays per recursion level)
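A recursive Karatsuba sketch in Python, following the split a(x) = A1·x^(m/2) + A0: three half-size products per level, combined with XORs. It assumes m is a power of two and counts each 1-bit leaf product as one AND gate; `schoolbook` is a hypothetical reference helper included only for checking.

```python
def karatsuba(a, b, m):
    """Recursive Karatsuba multiplication of two m-bit GF(2) polynomials
    (m a power of two). Returns (product, and_gates); each 1-bit leaf
    product counts as one AND gate."""
    if m == 1:
        return a & b, 1
    h = m // 2
    mask = (1 << h) - 1
    a0, a1 = a & mask, a >> h
    b0, b1 = b & mask, b >> h
    low, n0 = karatsuba(a0, b0, h)            # A0*B0
    high, n1 = karatsuba(a1, b1, h)           # A1*B1
    mid, n2 = karatsuba(a0 ^ a1, b0 ^ b1, h)  # (A1+A0)*(B1+B0)
    mid ^= low ^ high                         # = A1*B0 + A0*B1
    return low ^ (mid << h) ^ (high << m), n0 + n1 + n2


def schoolbook(a, b):
    """Reference carry-less product, used only to check karatsuba()."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        b >>= 1
    return r
```

For m = 256 the AND count is 3^8 = 6561, compared with 256^2 = 65,536 for the classic multiplier; the price is the extra XOR layers on the longest path.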
Iterative Karatsuba Multiplication • Split the factors into 4 segments: A(x) = a3…a0, B(x) = b3…b0 • Perform 9 partial multiplications • The result consists of 8 segments: C(x) = c7…c0
Iterative Karatsuba Multiplication (2) • Optimized aggregation plan reduces the number of XOR operations to 34 (instead of 40 for classic Karatsuba) • No additional costs: • same number of ANDs • same longest path • Can be applied recursively: • 256-bit mul = 9 × 64-bit mul • 64-bit mul = 9 × 16-bit mul • 16-bit mul = 9 × 4-bit mul
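The recursive use of the 4-segment split can be sketched in Python as below. It forms the 9 partial products as two flattened Karatsuba levels and combines them with a straightforward aggregation, not the optimized 34-XOR plan described above. It assumes m is a power of 4 and switches to a classic multiplier at 4 bits, which is one possible reading of the hybrid scheme; `mul4`, `classic_base` and the `stats` counters are hypothetical names.

```python
def classic_base(a, b, m, stats):
    """Classic m x m multiplier at the bottom of the recursion (m^2 AND gates)."""
    stats["base_muls"] += 1
    stats["and_gates"] += m * m
    r = 0
    for i in range(m):
        if (b >> i) & 1:
            r ^= a << i
    return r


def mul4(a, b, m, stats):
    """Split m-bit operands into 4 segments and form 9 partial products
    (two flattened Karatsuba levels), recursing until m <= 4.
    Assumes m is a power of 4 so that segment widths divide evenly."""
    if m <= 4:
        return classic_base(a, b, m, stats)
    s = m // 4
    mask = (1 << s) - 1
    A = [(a >> (s * i)) & mask for i in range(4)]  # a0..a3
    B = [(b >> (s * i)) & mask for i in range(4)]  # b0..b3

    def kara(x0, x1, y0, y1):
        # (x1*X^s + x0)(y1*X^s + y0) using 3 sub-products of width s
        p0 = mul4(x0, y0, s, stats)
        p1 = mul4(x1, y1, s, stats)
        pm = mul4(x0 ^ x1, y0 ^ y1, s, stats) ^ p0 ^ p1
        return p0 ^ (pm << s) ^ (p1 << (2 * s))

    low = kara(A[0], A[1], B[0], B[1])                              # 3 products
    high = kara(A[2], A[3], B[2], B[3])                             # 3 products
    mid = kara(A[0] ^ A[2], A[1] ^ A[3], B[0] ^ B[2], B[1] ^ B[3])  # 3 products
    mid ^= low ^ high
    return low ^ (mid << (2 * s)) ^ (high << (4 * s))
```

For m = 256 this uses 9^3 = 729 classic 4-bit multipliers, i.e. 11,664 AND gates, versus 65,536 AND gates for a single 256-bit classic multiplier. Only ANDs are counted here; the XOR count depends on the aggregation plan.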
Comparison • Hybrid RAIK is the smallest polynomial multiplication unit • But: CPM is faster • [Figure: recursive decomposition, 9× per level]
Recursive combinatorial multiplication units • Perform a multiplication within one clock cycle • Do not need state information • Technically feasible up to 256 bit • huge complexity • high latency • Practically questionable • Data transport/bus becomes the bottleneck • [Figure: 256-bit MUL block computing C = A·B from inputs A and B in 16 ns]
Iterative multiplication units • More than one clock cycle per multiplication • Iterative unit embeds a smaller recursive unit • Highly regular structure • flexible • little overhead • [Figure: Control; Selection of 64-bit segments from 256-bit operands A and B; 64-bit partial multiplier (128-bit output) used 9 times; Aggregation into the 511-bit result C]
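The iterative unit can be modelled behaviourally in Python: a control table drives one 64-bit partial multiplication per cycle, the selection stage XORs operand segments together, and the aggregation stage XORs each partial product into the 511-bit result at fixed offsets. This is a sketch, not RTL; the schedule assumes the 9 partial products of a two-level Karatsuba split, and `SCHEDULE`, `select` and `iterative_mul` are hypothetical names.

```python
SEG = 64       # width of the embedded partial multiplier (64 x 64 bit)
N = 4 * SEG    # 256-bit operands, up to a 511-bit product

# Hypothetical control table, one entry per cycle:
# (segments XORed together to form both multiplier inputs,
#  offsets in units of SEG at which the partial product is XORed into C).
# It encodes a two-level Karatsuba split into 9 partial products.
SCHEDULE = [
    ((0,), (0, 1, 2, 3)),
    ((1,), (1, 2, 3, 4)),
    ((0, 1), (1, 3)),
    ((2,), (2, 3, 4, 5)),
    ((3,), (3, 4, 5, 6)),
    ((2, 3), (3, 5)),
    ((0, 2), (2, 3)),
    ((1, 3), (3, 4)),
    ((0, 1, 2, 3), (3,)),
]


def partial_mul(x, y):
    """Stand-in for the combinational partial multiplier (carry-less product)."""
    r = 0
    while y:
        if y & 1:
            r ^= x
        x <<= 1
        y >>= 1
    return r


def select(v, segs):
    """Selection stage: XOR the chosen SEG-bit segments of an operand."""
    out = 0
    for i in segs:
        out ^= (v >> (SEG * i)) & ((1 << SEG) - 1)
    return out


def iterative_mul(a, b):
    """Behavioural model: one partial multiplication per cycle, 9 cycles total."""
    assert a < (1 << N) and b < (1 << N)
    c = 0       # aggregation register
    cycles = 0
    for segs, offsets in SCHEDULE:
        p = partial_mul(select(a, segs), select(b, segs))
        for k in offsets:  # aggregation: XOR the shifted partial product into C
            c ^= p << (SEG * k)
        cycles += 1
    return c, cycles
```

Because the table is expressed in units of `SEG`, setting `SEG = 72` would give 288-bit operands, enough for a 283-bit field, without touching the control logic; this illustrates the flexibility noted above.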
Iterative multiplication units • 256 bit polynomial multipliers
Set up an ECC accelerator design • 283 bit • Bus • Registers • ALU • Speed requirements • 4-segment multiplier (72-bit embedded partial multiplier) • Adapt control logic
ECC designs 163 – 571 bit • Time per ECPM
ECC designs 163 – 571 bit • Energy per ECPM and silicon area (IHP 0.25 µm CMOS)
Conclusions • Polynomial multiplication is the most challenging operation in the finite field: • executed 1500 times for one 233-bit ECPM • Most silicon area (70%) • Highest utilization (95%) • Large combinatorial multipliers are feasible • hRAIK is the smallest • Classic polynomial multiplication is the fastest • For ECC designs, iterative Karatsuba approaches are well suited • Adaptable • Small • Energy efficient
Thank You Questions? peter@ihp-microelectronics.com