Optimum Implementation of Elliptic Curve Cryptosystems on the SRC-6E Reconfigurable Computer

Nghi Nguyen (1), Kris Gaj (1), David Caliga (2), Tarek El-Ghazawi (3)
(1) George Mason University, (2) SRC Computers, (3) The George Washington University

What is a reconfigurable computer?
[Diagram: a microprocessor system (μP ... μP, μP memories, I/O) connected through an interface to a reconfigurable processor system (FPGA ... FPGA, FPGA memories, I/O).]

Characteristic Features
• close integration of the microprocessor system and the FPGA system
• integrated programming environment
• programming does not require hardware expertise
• suitable for a wide range of applications
• permits run-time reconfiguration of the FPGA system

SRC Hardware & Software

SRC Hardware Architecture

SRC vs. FPGA Accelerator Boards
[Diagram comparing how FPGA accelerator boards and SRC are programmed, split into software and hardware parts. Labels: HLL, HDL, graphical data flow diagram.]

SRC Compilation Process

Run Time Reconfiguration in SRC
Program in C or Fortran:
  Main program:
    ……
    Function_1(a, d, e)
    ……
    Function_2(d, e, f)
    ……
  Function_1:
    Macro_1(a, b, c)
    Macro_2(b, d)
    Macro_2(c, e)
  Function_2:
    Macro_3(s, t)
    Macro_1(n, b)
    Macro_4(t, k)
[Diagram: FPGA contents after the Function_1 call. Macro_1 takes a and produces b and c; two Macro_2 instances take b and c and produce d and e.]

Elliptic Curve Cryptosystems

Elliptic Curve Cryptosystems
• public key (asymmetric) cryptosystems
• first true alternative to RSA
• keys several times shorter
• fast and compact implementations, in particular in hardware
• a family of cryptosystems, instead of a single cryptosystem

Three Classes of Elliptic Curves
[Tree diagram: elliptic curves are built over K = GF(p) or K = GF(2^m); for GF(2^m), elements use either a normal basis representation or a polynomial basis representation. Annotations on the three classes: "Arithmetic operations present in many libraries", "Fast in hardware", "Compact in hardware".]
• Secure m: m = 155 .. 512
• Our m: m = 233

ECC Hierarchy
• High-level functions: kP
• Medium-level functions: 2P, P+Q
• Low-level functions: MUL, INV, XOR

Basic operations of Elliptic Curve Cryptosystems (1)
Basic operations in Galois Field GF(2^m)
• addition and subtraction (xor): x + y, x - y
• multiplication: x · y
• inversion: x^-1
Basic operations on points of an Elliptic Curve over Galois Field GF(2^m)
• addition of points: P + Q
• doubling a point: 2P
where P = (x_P, y_P), Q = (x_Q, y_Q)
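As a small illustration of why addition and subtraction in GF(2^m) both reduce to XOR, the sketch below stores a field element as a Python integer whose bit i is the coefficient of x^i. The function names `gf2m_add` and `gf2m_sub` are hypothetical and chosen only for this example.

```python
# Elements of GF(2^m) stored as ints: bit i is the coefficient of x^i.

def gf2m_add(x: int, y: int) -> int:
    """Add two GF(2^m) elements: coefficient-wise addition mod 2, i.e. XOR."""
    return x ^ y

# Subtraction is the same operation, because -1 = 1 (mod 2).
gf2m_sub = gf2m_add

a = 0b1011            # x^3 + x + 1
b = 0b0110            # x^2 + x
s = gf2m_add(a, b)    # x^3 + x^2 + 1
```

No carries propagate between bit positions, which is one reason this operation is cheap in hardware.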

Basic operations of Elliptic Curve Cryptosystems (2)
Complex operations on points of an Elliptic Curve over Galois Field GF(2^m)
• scalar multiplication: k · P = P + P + … + P (k times)
• double scalar multiplication: k · P + l · Q

Addition, R = P + Q
• P = (x_P, y_P)
• Q = (x_Q, y_Q)
• R = (x_R, y_R)
• x_R = λ^2 + λ + x_P + x_Q + a_2
• y_R = λ(x_P + x_R) + x_R + y_P
• where λ = (y_P + y_Q)(x_P + x_Q)^-1
• Number of field operations: 3 multiplications, 1 inversion

Doubling, R = 2P
• P = (x_P, y_P)
• R = (x_R, y_R)
• x_R = a_6(x_P^-1)^2 + x_P^2
• y_R = x_P^2 + (x_P + y_P x_P^-1) x_R + x_R
• Number of field operations: 5 multiplications, 1 inversion

a_2, a_6: coefficients of the curve
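The addition and doubling formulas can be checked numerically. The sketch below is illustrative only: it assumes the curve y^2 + xy = x^3 + a_2 x^2 + a_6 (the usual form with coefficients a_2 and a_6), a toy field GF(2^4) instead of m = 233, and arbitrary coefficient values. For the sum it uses y_R = λ(x_P + x_R) + x_R + y_P, the standard expression for that curve form. All function and constant names are hypothetical.

```python
M = 4                      # toy field GF(2^4), far too small for real use
POLY = 0b10011             # x^4 + x + 1, irreducible over GF(2)
A2, A6 = 0b1000, 0b1001    # hypothetical curve coefficients (A6 != 0)

def mul(a, b):
    """Shift-and-add multiplication in GF(2^M) modulo POLY."""
    c = 0
    for i in range(M - 1, -1, -1):
        c <<= 1
        if (b >> i) & 1:
            c ^= a
        if (c >> M) & 1:
            c ^= POLY
    return c

def inv(a):
    """a^(2^M - 2) equals a^-1 for a != 0; acceptable for a toy field."""
    r = 1
    for _ in range(2**M - 2):
        r = mul(r, a)
    return r

def on_curve(pt):
    x, y = pt
    lhs = mul(y, y) ^ mul(x, y)
    rhs = mul(mul(x, x), x) ^ mul(A2, mul(x, x)) ^ A6
    return lhs == rhs

def point_add(p, q):
    """P + Q, valid when x_P != x_Q: 3 multiplications, 1 inversion."""
    (xp, yp), (xq, yq) = p, q
    lam = mul(yp ^ yq, inv(xp ^ xq))
    xr = mul(lam, lam) ^ lam ^ xp ^ xq ^ A2
    yr = mul(lam, xp ^ xr) ^ xr ^ yp
    return xr, yr

def point_double(p):
    """2P, valid when x_P != 0: 5 multiplications, 1 inversion."""
    xp, yp = p
    xp_inv = inv(xp)
    xp2 = mul(xp, xp)
    xr = mul(A6, mul(xp_inv, xp_inv)) ^ xp2
    yr = xp2 ^ mul(xp ^ mul(yp, xp_inv), xr) ^ xr
    return xr, yr

# Enumerate all affine points of the toy curve by brute force.
points = [(x, y) for x in range(2**M) for y in range(2**M) if on_curve((x, y))]
```

Every sum and every double of points in `points` should again satisfy `on_curve`, which gives a quick sanity check on the formulas. The special cases (P = Q, P = -Q, x_P = 0, point at infinity) are deliberately left out of this sketch.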

Scalar Multiplication - kP
R = kP = P + P + … + P (k times)
k = (k_{m-1}, k_{m-2}, ..., k_1, k_0)_2

R = O
S = P
for i = 0 to m-1
    if k_i = 1
        R = R + S
    end if
    S = 2S
end for
return R

The point addition (R = R + S) and the point doubling (S = 2S) can be performed in parallel.
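The right-to-left loop on this slide can be sketched generically. In the example below the group operations are passed in as functions, and plain integers under addition stand in for curve points so the result is easy to check; that stand-in, and the name `scalar_mult`, are assumptions made for illustration. On a real curve, `add` and `double` would be the point operations and `identity` the point at infinity O.

```python
from typing import Callable, TypeVar

T = TypeVar("T")

def scalar_mult(k: int, P: T,
                add: Callable[[T, T], T],
                double: Callable[[T], T],
                identity: T) -> T:
    """Right-to-left binary method: scan the bits of k from k_0 upward."""
    R = identity                       # R = O
    S = P                              # S = P
    for i in range(k.bit_length()):    # the slide uses a fixed m iterations
        if (k >> i) & 1:               # if k_i = 1
            R = add(R, S)              # R = R + S
        S = double(S)                  # S = 2S (does not depend on the new R)
    return R

# Integers under addition stand in for curve points: 13 * 5 = 65.
result = scalar_mult(13, 5,
                     add=lambda r, s: r + s,
                     double=lambda s: 2 * s,
                     identity=0)
```

Within one iteration, the update of R reads the old S and the update of S does not read R at all, so the two operations have no data dependency on each other.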

ECC Hierarchy
• High-level functions: kP
• Medium-level functions: 2P, P+Q
• Low-level functions: MUL, INV, XOR

Investigated Partitioning Schemes

SRC Program Partitioning
• C function for μP: μP system, HLL
• C function for MAP: FPGA system, HLL
• VHDL macro: FPGA system, HDL

H00 Partitioning (μP Software Only)
• C function for μP: H (kP)
• C function for MAP: 0
• VHDL macro: 0

00H Partitioning (VHDL only)
• C function for μP: 0
• C function for MAP: 0
• VHDL macro: H (kP)

HML Partitioning
• C function for μP: H (kP)
• C function for MAP: M (2P, P+Q)
• VHDL macro: L (INV, XOR, MUL)

0HL Partitioning
• C function for μP: 0
• C function for MAP: H (kP, P+Q, 2P)
• VHDL macro: L (INV, XOR, MUL)

0HM Partitioning
• C function for μP: 0
• C function for MAP: H (kP)
• VHDL macro: M (P+Q, 2P)

GF(2^m) Multiplier
• Input: A, B ∈ GF(2^m)
• Output: C = A*B mod P
1. C = 0
2. for i = m-1 downto 0 do
3.     C = (C << 1) + A*b_i
4.     C = C + c_m*P
5. end for
6. return C
[Block diagram: Input A, Input B and Constant P feed m-bit registers; B and C are shifted left by one (<<1) each cycle; AND gates select A and P, which are added into C; C holds the Result.]
m+1 clock cycles per multiplication
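The multiplier loop maps directly to a few lines of software. The sketch below follows the same most-significant-bit-first shift-and-add steps, with field elements stored as integers (bit i is the coefficient of x^i) and P given with its x^m term included. The function name `gf2m_mul` is hypothetical, and the check uses the small field GF(2^8) with the AES polynomial 0x11B rather than m = 233, purely because a published product is available for it.

```python
def gf2m_mul(a: int, b: int, p: int, m: int) -> int:
    """Compute a * b mod p in GF(2^m); a and b must have degree below m."""
    c = 0
    for i in range(m - 1, -1, -1):   # bits of b from b_{m-1} down to b_0
        c <<= 1                      # C = C << 1
        if (b >> i) & 1:
            c ^= a                   # C = C + A * b_i
        if (c >> m) & 1:
            c ^= p                   # C = C + c_m * P (clears bit m)
    return c

# Worked example from the AES specification: {57} * {83} = {c1}.
product = gf2m_mul(0x57, 0x83, 0x11B, 8)
```

Each loop iteration corresponds to one clock cycle of the hardware unit, which is where the roughly m cycles per multiplication come from.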

GF(2^m) Inverter
Modified Almost Inverse Algorithm
• Input: A ∈ GF(2^m)
• Output: C = A^-1 mod P
1. Y = A, D = P, B = 0, Z = 1
2. loop
3.     while y_0 = 0 do
4.         Y = Y >> 1
5.         Z = (Z + z_0*P) >> 1
6.     end while
7.     if Y = 1 then return Z
8.     if D > Y then
9.         D <=> Y, B <=> Z
10.    Y = Y + D, Z = Z + B
11. end loop
[Block diagram: Input A and Constant P load the m-bit registers Y, D, Z and B (initial values 0 and 1 for B and Z); Y and Z are shifted right by one (>>1) inside the while loop; D/Y and B/Z are exchanged by swapping logic; Z holds the Result.]
Time of inversion is input-dependent. Typically 3 to 4 times m, on average.
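A software sketch of the inverter loop is shown below. It keeps the register names Y, D, Z and B from the slide, updates Z in the inner loop, and interprets the test "D > Y" as a comparison of the bit patterns as integers; that interpretation and the names `gf2m_inv` and `gf2m_mul` are assumptions of this example. The multiplication helper is included only so the result can be verified, and the check uses GF(2^8) with the AES polynomial 0x11B instead of m = 233.

```python
def gf2m_mul(a: int, b: int, p: int, m: int) -> int:
    """Shift-and-add multiplication in GF(2^m), used here only for checking."""
    c = 0
    for i in range(m - 1, -1, -1):
        c <<= 1
        if (b >> i) & 1:
            c ^= a
        if (c >> m) & 1:
            c ^= p
    return c

def gf2m_inv(a: int, p: int) -> int:
    """Return a^-1 mod p for a nonzero, reduced a and irreducible p."""
    if a == 0:
        raise ZeroDivisionError("0 has no inverse in GF(2^m)")
    y, d = a, p          # invariants: z*a = y (mod p) and b*a = d (mod p)
    z, b = 1, 0
    while True:
        while (y & 1) == 0:      # while y_0 = 0: divide Y and Z by x
            y >>= 1
            if z & 1:
                z ^= p           # Z + z_0*P is divisible by x
            z >>= 1
        if y == 1:
            return z
        if d > y:                # keep the larger polynomial in Y
            y, d = d, y
            z, b = b, z
        y ^= d                   # Y = Y + D
        z ^= b                   # Z = Z + B

inverse = gf2m_inv(0x53, 0x11B)
check = gf2m_mul(0x53, inverse, 0x11B, 8)   # should be 1
```

The number of passes through the loops depends on the bit pattern of the input, which matches the slide's remark that inversion time is input-dependent.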

Unrolled Implementation Approach Using Two FPGA Devices
[Diagram: kP, 2P and P+Q with I/O, mapped onto FPGA1 and FPGA2, using eight MUL units and two INV units.]

Iterative Implementation Approach Using Two FPGA Devices
[Diagram: kP, 2P and P+Q with I/O, mapped onto FPGA1 and FPGA2, using three MUL units and two INV units.]

Iterative Implementation Approach Using One FPGA Device
[Diagram: kP, P+Q and 2P with I/O, using three MUL units and two INV units; FPGA1 and FPGA2 are both drawn.]

Results

Timing Measurements
[Timing diagram: a .c file and a .mc file, each labeled with the MAP function. Phases: MAP Alloc., MAP Free, FPGA Configure, DMA Data In, FPGA Computation, DMA Data Out. Measured intervals: End-to-End time (HW), End-to-End time (SW), MAP Allocation time, MAP Release time, Configuration time.]

Timing measurements

Resource Utilization

Number of lines of code

End-to-End Latency for Different Partitioning Approaches
[Chart; one data label reads 101,145.]

FPGA Resource Usage for Different Partitioning Approaches

Conclusions
• Elliptic Curve Cryptosystem implementation is challenging for reconfigurable computers because of
    • optimization for latency rather than throughput
    • limited amount of parallelism
• An 8 to 9 times speed-up over a highly optimized microprocessor implementation was demonstrated using four different algorithm partitioning schemes
    • 0HL iterative 2-chip
    • 0HL unrolled 2-chip
    • 0HM 2-chip
    • 00H 1-chip

Conclusions (cont.)
Clear trade-offs:
• Resources
• Timing
• Ease of programming

Conclusions (cont.)
Assuming focus on:
• Resources
• Timing
• Ease of programming

Conclusions (cont.)
The best implementation approach: 0HL partitioning scheme, 2-chip, unrolled
• C function for μP: 0
• C function for MAP: H (kP, P+Q, 2P)
• VHDL macro: L (INV, XOR, MUL)
Only an 8% increase in execution time compared to pure VHDL
