Unified Architectures for Efficient and Compact Crypto-Processing
Erkay Savaş
Sabancı University
Outline
• Research Motivation
• Public Key Cryptography
• Unified Arithmetic
• High-Radix Multiplication
• Dual-Radix Multiplication
• Support for GF(3^n) Arithmetic
• Implementation Results
• Future Research
Motivation
• Compatibility
  • support for fast arithmetic in different finite fields and groups
• Savings in area
  • improve the {time × area} metric
• Algorithm agility
  • NTRU, ECC
Public Key Cryptography (PKC)
• Each user has a pair of keys:
  • Private Key: known only to the owner
  • Public Key: known to everyone in the system, with assurance
• Encryption:
  • Encryption with the Public Key of the receiver
• Decryption:
  • Only the receiver can decrypt the message, using her/his Private Key
Public Key Cryptography in Use
• RSA, Rabin's scheme
  • Integer factorization, square roots modulo a composite number
• Discrete logarithm based algorithms
  • Diffie-Hellman key exchange, ElGamal
• Elliptic curve DH key exchange, ECDSA
  • Discrete logarithm over elliptic curves
• IBE
  • pairings over elliptic curve points
RSA
• Most popular PKC
• Invented by Rivest/Shamir/Adleman in 1977 at MIT.
• Its patent expired in 2000.
• Based on the integer factorization problem
• Each user has a public and private key pair.
RSA Encryption & Decryption
• Encryption is done with the public key: y = x^e mod n, where x, y < n
• Decryption is done with the private key: x = y^d mod n
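As a small illustration of the two formulas above, the following Python sketch runs RSA with toy parameters. The primes 61 and 53, the exponent e = 17 and the function names are assumptions chosen for illustration only; they are far too small to be secure.

```python
# Toy RSA parameters (hypothetical, insecure sizes chosen for readability).
p, q = 61, 53
n = p * q                    # public modulus
phi = (p - 1) * (q - 1)      # Euler's totient of n
e = 17                       # public exponent, coprime to phi
d = pow(e, -1, phi)          # private exponent (needs Python 3.8+)

def rsa_encrypt(x, e, n):
    """y = x^e mod n, with x < n."""
    return pow(x, e, n)

def rsa_decrypt(y, d, n):
    """x = y^d mod n."""
    return pow(y, d, n)

x = 65
y = rsa_encrypt(x, e, n)
assert rsa_decrypt(y, d, n) == x   # decryption recovers the message
```

Both directions are modular exponentiations, so their cost is dominated by repeated modular multiplication.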
DL Based Cryptosystems
• Fundamental operation: g^x mod p, where x, g < p and g is a primitive element
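The exponentiation g^x mod p is normally computed by square-and-multiply, which reduces it to a sequence of modular multiplications. The sketch below is a minimal right-to-left version; the function name `mod_exp` and the sample numbers are assumptions for illustration.

```python
def mod_exp(g, x, p):
    """Compute g^x mod p (p > 1) with right-to-left square-and-multiply."""
    result = 1
    base = g % p
    while x > 0:
        if x & 1:                        # current exponent bit is 1
            result = (result * base) % p
        base = (base * base) % p         # square for the next bit
        x >>= 1
    return result

# Agrees with Python's built-in three-argument pow.
assert mod_exp(5, 117, 1019) == pow(5, 117, 1019)
```

For a k-bit exponent this performs k squarings and at most k multiplications, which is why a fast modular multiplier matters so much.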
Elliptic Curve Cryptography 1/2
• Emerging public key cryptography standard for constrained devices.
• A 160-bit ECC key is equivalent in cryptographic strength to 1024-bit RSA.
• 313-bit ECC is equivalent to 4096-bit RSA.
• Elliptic curves, as algebraic/geometric entities, have been studied extensively for the past 150 years.
  • Rich and deep theory suitable for cryptography
• First proposed for cryptographic use in 1985, independently by Neal Koblitz and Victor Miller
Elliptic Curve Cryptography 2/2
• Dominant fundamental operation
  • Multiplication in GF(q), where q = p^k and p is prime
• Alternatives
  • GF(p): k = 1
  • GF(2^k): p = 2
  • GF(p^k)
  • GF(3^k): p = 3
Identity Based Encryption (IBE)
• Public key can be any string
  • e-mail address, name, etc.
• No need for certificates
• Anonymity achieved
  • users can choose any public key without revealing their ID
  • they can easily change it
IBE – Bilinear Mapping
• e(xP, yQ) = e(P, Q)^{xy} = e(yP, xQ) = g
  • g is an element of (an extension of) the underlying field.
• Bilinear mapping over elliptic curves
  • Weil pairing
  • Tate pairing
• Resource consuming
• Most efficient bilinear mappings
  • defined on curves over GF(3^k)
An Introduction to Unified Arithmetic
• Three types of finite fields are heavily used
  • Prime fields, GF(p)
  • Binary extension fields, GF(2^k)
  • Ternary extension fields, GF(3^k) (recently, due to IBE schemes)
• These finite fields have dissimilar properties
  • Different implementations on specialized hardware
Unified Arithmetic
• Unified hardware design methodology requires
  • A single (unified) datapath
  • A single (unified) control
  • Insignificant overhead in area
  • Insignificant overhead in time complexity (e.g. critical path delay)
  • Good {time × area} metric
Unified Arithmetic (GF(p) + GF(2^k))
• A unified hardware design methodology for both fields is possible since:
  • the elements of either field are represented using almost the same data structures in digital systems
  • the algorithms for basic arithmetic operations in both fields have structural similarities (i.e. the steps of the algorithms are almost identical)
• Hence, unified arithmetic is possible
Finite Field Operations in ECC
• Addition in GF(p) and GF(2^k)
  • Relatively inexpensive in area and time complexity
• Multiplicative inversion in GF(p) and GF(2^k)
  • Prohibitively expensive in terms of time
  • Possible to avoid some of them
• Multiplication in GF(p) and GF(2^k)
  • Expensive in terms of time and area
  • Usually the most important operation
  • Our focus
Montgomery Multiplication
• Very efficient way of doing multiplication in GF(p) and GF(2^k) (now also in GF(3^k))
  • Faster (replaces division with shifts)
  • Suitable for unified design
  • Suitable for scalable design
  • Highly parallel
  • Suitable for pipelining
Montgomery Multiplication
• Definition:
  • Given a, b ∈ GF(p), MonMul(a, b) = a·b·R^{-1} mod p, where R = 2^k mod p and k = ⌈log_2 p⌉.
• Algorithm
  • c := 0
  • for i = 0 to k-1
    • c := c + a_i · b
    • c := (c + c_0 · p)/2
  • if c ≥ p then c := c - p (final subtraction)
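The bit-serial loop above can be sketched directly in Python. This is a minimal software model under the assumption that p is an odd prime and a, b < p; the function name `mon_mul` and the sample values are illustrative, not part of the original design.

```python
def mon_mul(a, b, p):
    """Radix-2 Montgomery multiplication: returns a*b*2^(-k) mod p.

    Assumes p is odd and a, b < p; k is the bit length of p.
    """
    k = p.bit_length()
    c = 0
    for i in range(k):
        a_i = (a >> i) & 1          # i-th bit of the multiplier a
        c = c + a_i * b             # conditional addition of b
        c = (c + (c & 1) * p) >> 1  # add p if c is odd, then halve exactly
    if c >= p:                      # c < 2p always holds here
        c -= p
    return c

# Usage: multiply 100 * 200 mod 239 via the Montgomery domain.
p = 239
R = 1 << p.bit_length()             # R = 2^k
a_m, b_m = (100 * R) % p, (200 * R) % p
prod_m = mon_mul(a_m, b_m, p)       # equals 100*200*R mod p
assert mon_mul(prod_m, 1, p) == (100 * 200) % p
```

The only operations inside the loop are an addition, a parity test and a one-bit shift, which is what makes the algorithm attractive for hardware.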
Algorithm for GF(2^k)
• Input: a(x), b(x) ∈ GF(2^k), p(x) and k
• Output: c(x) = a(x)·b(x)·x^{-k} ∈ GF(2^k)
  • c(x) := 0
  • for i = 0 to k-1
    • c(x) := c(x) + a_i · b(x)
    • c(x) := (c(x) + c_0 · p(x))/x
  • No final subtraction
• Note that
  • additions of polynomials are carried out modulo 2 (bitwise XOR)
  • c/2 and c(x)/x are implemented in an identical way in SW and HW (a one-bit right shift)
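The same loop for GF(2^k) can be modelled by storing a polynomial as an integer whose bit i is the coefficient of x^i, so that addition becomes XOR. The sketch below makes that assumption; `mon_mul_gf2` and the reference routine `gf2_mul_mod` are hypothetical names, and the degree-8 polynomial x^8 + x^4 + x^3 + x + 1 is used only as a familiar example.

```python
def mon_mul_gf2(a, b, p, k):
    """Montgomery multiplication in GF(2^k): a(x)*b(x)*x^(-k) mod p(x).

    p has degree k and constant term 1; a and b have degree < k.
    """
    c = 0
    for i in range(k):
        if (a >> i) & 1:
            c ^= b          # c(x) := c(x) + a_i * b(x)
        if c & 1:
            c ^= p          # make c(x) divisible by x
        c >>= 1             # c(x) := c(x) / x
    return c                # degree stays below k, no final subtraction

def gf2_mul_mod(a, b, p, k):
    """Plain shift-and-add multiplication modulo p(x), used as a reference."""
    c = 0
    for i in range(k):
        if (b >> i) & 1:
            c ^= a
        a <<= 1
        if (a >> k) & 1:
            a ^= p          # reduce when the degree reaches k
    return c

# Usage: multiplying the Montgomery result by x^k mod p(x) undoes the x^(-k).
P, K = 0x11B, 8
xk = P ^ (1 << K)           # x^k mod p(x)
m = mon_mul_gf2(0x57, 0x83, P, K)
assert gf2_mul_mod(m, xk, P, K) == gf2_mul_mod(0x57, 0x83, P, K)
```

Apart from XOR replacing integer addition and the missing final subtraction, the loop body has the same shape as the GF(p) version.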
Representation
• Addition
  • Atomic operation: multiplication is performed as repeated addition
• Unified addition
  • most efficient when carry-save representation is used for elements of GF(p)
• Carry-save representation
  • an integer is represented as the sum of two other integers
  • x := x_s + x_c (sum and carry parts, resp.)
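A carry-save addition can be modelled with bitwise operations on whole words: every bit position acts as an independent full adder, so no carry ripples across the word. The sketch below is a software illustration only; the function name and test values are assumptions.

```python
def carry_save_add(xs, xc, y):
    """Add y to the carry-save pair (xs, xc) without carry propagation.

    Returns a new pair (s, c) such that s + c == xs + xc + y.
    """
    s = xs ^ xc ^ y                              # per-bit sum
    c = ((xs & xc) | (xs & y) | (xc & y)) << 1   # per-bit carry, shifted left
    return s, c

# Usage: accumulate three numbers, resolving carries only once at the end.
s, c = carry_save_add(0, 0, 1234)
s, c = carry_save_add(s, c, 5678)
s, c = carry_save_add(s, c, 91011)
assert s + c == 1234 + 5678 + 91011
```

Delaying the carry-propagating addition until the very end is what keeps the critical path short in each step of the multiplication loop.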
Scalability
• The original Montgomery multiplication algorithm performs full-precision integer additions
  • Not scalable
• Instead,
  • long integers are divided into words
  • additions of words are handled separately by word adders
• Choice of word length depends on the precision, area and speed requirements
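The idea of replacing one full-precision addition by a sequence of word additions can be sketched as follows. The helper names and the 8-bit word size in the usage lines are assumptions for illustration; the sketch shows only the word-serial addition, not the complete word-based multiplier.

```python
def to_words(x, w, e):
    """Split x into e words of w bits each, least significant word first."""
    mask = (1 << w) - 1
    return [(x >> (w * j)) & mask for j in range(e)]

def from_words(words, w):
    """Reassemble an integer from its w-bit words."""
    return sum(word << (w * j) for j, word in enumerate(words))

def word_add(a_words, b_words, w):
    """Add two equal-length word vectors using only w-bit additions."""
    mask = (1 << w) - 1
    carry = 0
    out = []
    for a_j, b_j in zip(a_words, b_words):
        t = a_j + b_j + carry      # one pass through the word adder
        out.append(t & mask)
        carry = t >> w             # carry into the next word
    out.append(carry)              # final carry becomes an extra word
    return out

# Usage: a 32-bit addition carried out on an 8-bit word adder.
a, b = 0xDEADBEEF, 0x12345678
total = word_add(to_words(a, 8, 4), to_words(b, 8, 4), 8)
assert from_words(total, 8) == a + b
```

The same adder handles any operand length by iterating over more words, which is the sense in which the design is scalable.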
Word-Based Multiplication
[Figure: processing units PU_i and PU_{i+1}, fed by multiplier bits a_i and a_{i+1}, operating on the words b(j), b(j+1), p(j), p(j+1), c(j), c(j+1); the bits of each word c(j) are labelled c(j)_0 ... c(j)_{w-1}]
[Figure: Dependency Graph]
Processing Unit (PU) with w = 2
[Figure: four Dual-Field Adders; labels: FSEL, C1(j), C0(j)]
Dual-Field Adder (DFA) 1/2
• Almost identical to a full adder (FA)
• Difference
  • it has an additional (control) input (FSEL) which suppresses the carry output of the adder when it is set to logic-0
  • Namely, when FSEL = 0 the adder operates in GF(2^k); otherwise it behaves as a regular FA
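The behaviour described above can be captured in a one-bit software model. This is a sketch of the logic function only, with the signal names A, B, C, FSEL, S and Cout taken from the slides and the function name assumed for illustration.

```python
def dual_field_adder(a, b, c, fsel):
    """One-bit dual-field adder.

    fsel = 1: regular full adder, a + b + c == s + 2*cout.
    fsel = 0: GF(2) mode, the carry output is suppressed (cout == 0).
    """
    s = a ^ b ^ c                                   # sum bit, same in both modes
    cout = fsel & ((a & b) | (a & c) | (b & c))     # carry gated by FSEL
    return s, cout

# Usage: the same inputs in both modes.
assert dual_field_adder(1, 1, 0, fsel=1) == (0, 1)  # integer mode: 1 + 1 = 2
assert dual_field_adder(1, 1, 0, fsel=0) == (0, 0)  # GF(2) mode: 1 + 1 = 0
```

The only extra logic compared with a full adder is the gating of the carry by FSEL, which is why the area overhead of the unified datapath stays small.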
DFA 2/2
[Figure: dual-field adder cell with signals A, B, C, FSEL, S, Cout]
Pipeline Organization with two PUs
[Figure: blocks SR-a, RAM-a, PU-1, PU-2, RAM-b, RAM-p, SR-C]
s: the number of PUs
Total Computation Time (in clock cycles)
w: word size, k: precision, e := k/w, s: the number of PUs
Example Execution Times
• Example: k = 1024, w = 32
  • s = 17: T = 2105
  • s = 15: T = 2305
  • s = 10: T = 3415
  • s = 1: T = 33792
• Example: k = 2048, w = 32
  • s = 33: T = 4221
  • s = 30: T = 4543
  • s = 10: T = 13343
  • s = 1: T = 133120
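The cycle counts above can be explored with a small calculator. The piecewise formula in the sketch below is an assumption: it is the form commonly given for pipelined scalable Montgomery multipliers, with e taken as ⌈k/w⌉, and it reproduces the listed values except for k = 1024, s = 10, where it evaluates to 3417 rather than 3415. The function name `total_cycles` is hypothetical.

```python
def total_cycles(k, w, s):
    """Estimated clock cycles for one k-bit multiplication.

    k: precision in bits, w: word size in bits, s: number of PUs.
    Assumed model: each PU handles one multiplier bit per pass and needs
    e + 1 cycles per pass; consecutive PUs start two cycles apart.
    """
    e = -(-k // w)            # number of words, ceil(k / w)
    passes = -(-k // s)       # passes through the pipeline, ceil(k / s)
    if e + 1 > 2 * s:
        # The pipeline is shorter than one operand: no stalls between passes.
        return passes * (e + 1) + 2 * (s - 1)
    # The pipeline is long enough that each pass costs 2*s cycles.
    return 2 * s * passes + e - 1

# Usage: compare with the k = 1024, w = 32 example.
for s in (17, 15, 1):
    print(s, total_cycles(1024, 32, s))
```

Under this model, adding PUs beyond roughly (e + 1)/2 yields diminishing returns, since the pipeline then has to wait for words from the previous pass.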
Comparison to the single-field (GF(p)) design
[Figure/table: w: word size; 1.2 μm CMOS technology]
Design Alternatives
• Higher radix
  • The original design is radix-2
    • Namely, the multiplier bits are scanned one bit per clock cycle
  • It is possible to scan two or more bits of the multiplier a per cycle
    • Radix-4: two bits
    • Radix-8: three bits
• More complex design: lower clock frequency, larger area
• Fewer clock cycles, hence faster execution of multiplication
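Scanning r multiplier bits per iteration can be modelled in software with a radix-2^r Montgomery loop. The sketch below is a generic textbook formulation, not the hardware design described here; the function name, the precomputed constant `p_inv_neg` and the quotient digit `q` are assumptions introduced for illustration.

```python
def mon_mul_radix(a, b, p, k, r):
    """Radix-2^r Montgomery multiplication for odd p and a, b < p.

    Scans r bits of a per iteration and returns a*b*2^(-r*n) mod p,
    where n = ceil(k / r) is the number of iterations.
    """
    beta = 1 << r
    p_inv_neg = (-pow(p, -1, beta)) % beta   # -p^(-1) mod 2^r (Python 3.8+)
    n = -(-k // r)
    c = 0
    for i in range(n):
        a_i = (a >> (r * i)) & (beta - 1)    # next r-bit digit of a
        c += a_i * b
        q = (c * p_inv_neg) & (beta - 1)     # digit that clears the low r bits
        c = (c + q * p) >> r                 # exact division by 2^r
    if c >= p:                               # c < 2p always holds here
        c -= p
    return c

# Usage: radix-4 needs 4 iterations for an 8-bit modulus instead of 8.
p = 239
expected = (100 * 200 * pow(2, -8, p)) % p
assert mon_mul_radix(100, 200, p, 8, 2) == expected
```

Each iteration now needs a multi-bit quotient digit and multiples of b and p, which is the extra complexity that lowers the clock frequency and increases the area.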
Comparison
• Higher radix vs. single radix
• Metric
  • area × time
• For small total area (i.e. < 10000 equivalent NAND gates), the performances of radix-2 and radix-8 are comparable
• The radix-8 multiplier outperforms the radix-2 multiplier by more than a factor of 3 when the total area is around 25000 NAND gates
Dual-Radix Multiplier
• Radix-2 for GF(p) and radix-4 for GF(2^k)
[Figure: MUX-1, MUX-2, Selection Logic, 3x2 Dual Field Adder]
Dual-Radix Multiplier
• Three multipliers
  • A1: GF(p)-only multiplier
  • A2: single-radix unified multiplier (with precomp.)
  • A3: dual-radix multiplier
• Performance (area × time)
  • A3 performs slightly worse than A1 and A2 (by 7% to 19%) in GF(p) mode
  • A3 outperforms A2 by 38% to 46% in GF(2^k) mode
Unified Arithmetic?
• Unified multiplier
  • carry-save adders are used in the multiplier
• Other arithmetic operations, such as subtraction and comparison (essential in inversion), are not easy to perform in carry-save representation
New Redundant Representation
• Recall:
  • Carry-save representation
  • X = x_s + x_c
• New redundant representation
  • Redundant signed digit (RSD) representation
  • X = x_p - x_n
• Subtraction is equivalent to addition
  • X - Y = (x_p - x_n) - (y_p - y_n) = (x_p - x_n) + (y_n - y_p)
• Comparison is relatively easy
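The identity above says that negating a number in this representation just swaps its two components, so a subtractor is an adder with swapped inputs. The sketch below models this at the level of values only, as a pair (x_p, x_n) of non-negative integers; a hardware RSD adder works digit by digit without carry propagation, which this model does not attempt to show. All function names are assumptions.

```python
def rsd_value(x):
    """Value represented by the pair x = (x_p, x_n), namely x_p - x_n."""
    x_p, x_n = x
    return x_p - x_n

def rsd_neg(x):
    """Negation swaps the positive and negative components."""
    x_p, x_n = x
    return (x_n, x_p)

def rsd_add(x, y):
    """Component-wise addition: (x_p + y_p) - (x_n + y_n)."""
    return (x[0] + y[0], x[1] + y[1])

def rsd_sub(x, y):
    """X - Y computed as X + (-Y), reusing the adder."""
    return rsd_add(x, rsd_neg(y))

# Usage: 12 - 30, with both operands given in redundant form.
x = (20, 8)      # represents 12
y = (35, 5)      # represents 30
assert rsd_value(rsd_sub(x, y)) == 12 - 30
```

Because the result may be negative without any special handling, the sign of a difference is available directly, which is what makes comparison easier than in carry-save form.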
RSD
• All previous multipliers require a reverse transformation to non-redundant form after each multiplication
  • There are thousands of multiplications in ECC
• With RSD, all the computation can be done in RSD form without any reverse transformation
  • a single transformation is necessary if the result is needed in non-redundant form
Support for GF(3^n) Arithmetic
• RSD lends itself to a unified arithmetic architecture that efficiently supports GF(3^n) arithmetic
Analysis
• A1: GF(p)-only architecture
• A2: GF(2^k)-only architecture
• A3: GF(3^n)-only architecture
• A4: Unified architecture (GF(p) + GF(2^k))
• A5: Unified architecture (GF(p) + GF(2^k) + GF(3^n))
• A1 + A2: Hypothetical architecture that has separate datapaths for GF(p) and GF(2^k)
Analysis
• Metric: area × time
• A4 over A1 + A2: 7.94%
• A5 over A1 + A2 + A3: 33.54%
• A5 over A4 + A3: 28.36%
Implementation Results
• 2.38 GHz, 0.13 μm CMOS
• 4 PUs: ~11,000 NAND gates; 8 PUs: ~15,000 NAND gates
Research Directions
• Embed the unified architectures into common general-purpose processors
• Unified inversion using RSD
• Unified architectures for other PKC
Ending…
• Questions
• Contact
  • Erkay Savaş
  • erkays@sabanciuniv.edu
  • http://people.sabanciuniv.edu/~erkays