Towards FPGA Architectures Optimized For Cryptographic Algorithms 唐明
Table of Contents • Antecedents • Motivation • General and Specific Objectives • State of the Art • Results • Publications • Future Work • Conclusions
Antecedents • Cryptographic algorithms can be implemented in: • Software • ASICs • FPGAs • The choice of platform depends on: • Algorithm performance • Cost • Flexibility
Antecedents (continued) • Software: most flexible, low performance, low cost • ASIC: high performance, no flexibility at all, high cost • FPGAs: highly flexible, low cost, high performance
Motivation • FPGAs: potential features • Cryptographic algorithms: basic functions
Configurable Logic Block (figure: two 4-input elements, each usable either as combinational logic in Logic Mode or as a 16x1 RAM in Memory Mode, feeding four 1-bit registers)
Virtex-II Pro • 1 Logic Cell = one 4-input LUT + one flip-flop (FF) + carry logic • 1 CLB = 4 slices • http://www.xilinx.com/products/tables/fpga.htm#v2p
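A 4-input LUT is simply a 16-entry truth table, which is why the same element can act as combinational logic or as a 16x1 RAM. A minimal Python sketch of that idea, assuming an LSB-first ordering of the inputs; `make_lut4` and `lut4_eval` are hypothetical helpers, not a Xilinx API:

```python
from typing import Callable


def make_lut4(func: Callable[[int, int, int, int], int]) -> list[int]:
    """'Program' a 16x1 LUT: entry i stores func applied to the four bits of i."""
    table = []
    for i in range(16):
        a, b, c, d = i & 1, (i >> 1) & 1, (i >> 2) & 1, (i >> 3) & 1
        table.append(func(a, b, c, d) & 1)
    return table


def lut4_eval(table: list[int], a: int, b: int, c: int, d: int) -> int:
    """Logic mode: the four inputs form the read address of the 16x1 memory."""
    return table[a | (b << 1) | (c << 2) | (d << 3)]


# A 4-input XOR, the kind of bit-level operation that is common in ciphers.
xor4 = make_lut4(lambda a, b, c, d: a ^ b ^ c ^ d)
```

Any 4-input Boolean function, however irregular, fits in exactly one such LUT, so XOR trees and small lookups map directly onto the fabric.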
Cryptographic Algorithms on FPGAs • Cryptographic algorithms contain: • Simple logical operations at the bit level • Replicated blocks • Large block lengths • They can benefit from FPGAs because: • FPGAs natively handle bit-level operations • Blocks can simply be replicated • Parallelism is possible (high number of I/Os) • More physical security • Flexibility • High density
Objectives • General: to achieve optimized implementations of cryptographic algorithms • Specific objectives: • DES: Data Encryption Standard • AES: Advanced Encryption Standard • ECC: Elliptic Curve Cryptography
Background The Advanced Encryption Standard (AES) is a computer security standard issued by NIST to replace DES; it became effective on May 26, 2002. AES is a symmetric block cipher that encrypts and decrypts 128-bit blocks of data. Its standard key lengths are 128, 192, and 256 bits.
AES: Advanced Encryption Standard (figure: 128-bit plaintext and 128-bit AES key in, 128-bit ciphertext out) • AES processes: • Key scheduling • Encryption • Decryption
AES: Advanced Encryption Standard Input = 128 bits = 16 bytes
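The 16 input bytes are laid out as a 4x4 byte matrix (the state), filled column by column as specified in FIPS-197. A short Python sketch of that mapping; the function names are illustrative only:

```python
def bytes_to_state(block: bytes) -> list[list[int]]:
    """Arrange 16 input bytes into the 4x4 AES state: state[r][c] = block[r + 4c]."""
    if len(block) != 16:
        raise ValueError("AES input must be exactly 16 bytes (128 bits)")
    return [[block[r + 4 * c] for c in range(4)] for r in range(4)]


def state_to_bytes(state: list[list[int]]) -> bytes:
    """Inverse mapping: read the state back out column by column."""
    return bytes(state[r][c] for c in range(4) for r in range(4))


# FIPS-197 Appendix B plaintext.
plain = bytes.fromhex("3243f6a8885a308d313198a2e0370734")
state = bytes_to_state(plain)
```

Row 0 of `state` is `[0x32, 0x88, 0x31, 0xe0]`: bytes 0, 4, 8, and 12 of the input.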
Key Scheduling
AES Encryption Algorithm Flow (figure: the USER KEY is expanded into SUB KEYs; IN, then ARK; rounds 1 to 9: BS, SR, MC, ARK; final round: BS, SR, ARK; OUT) • BS: Byte Substitution • SR: Shift Rows • MC: Mix Column • ARK: Add Round Key
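The flow above can be modelled in software to check a hardware datapath. This is a plain, unoptimized Python sketch of AES-128 encryption (initial ARK, nine full rounds, and a final round without MC), assuming the column-major state layout of FIPS-197; the S-Box is computed as AF(MI(x)) rather than stored, and all function names are illustrative, not the authors' FPGA design:

```python
def xtime(b: int) -> int:
    """Multiply by 02 in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1."""
    return ((b << 1) ^ 0x1B) & 0xFF if b & 0x80 else b << 1


def gf_mul(a: int, b: int) -> int:
    """Shift-and-add multiplication in GF(2^8)."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a = xtime(a)
        b >>= 1
    return r


def build_sbox() -> list[int]:
    """S-Box = affine transform (AF) applied to the multiplicative inverse (MI)."""
    sbox = []
    for x in range(256):
        inv = 0 if x == 0 else next(y for y in range(1, 256) if gf_mul(x, y) == 1)
        s = inv
        for n in range(1, 5):
            s ^= ((inv << n) | (inv >> (8 - n))) & 0xFF  # XOR of 8-bit rotations
        sbox.append(s ^ 0x63)
    return sbox


SBOX = build_sbox()


def expand_key(key: bytes) -> list[list[int]]:
    """AES-128 key schedule: 11 round keys of 16 bytes each."""
    words = [list(key[4 * i:4 * i + 4]) for i in range(4)]
    rcon = 0x01
    for i in range(4, 44):
        temp = list(words[i - 1])
        if i % 4 == 0:
            temp = [SBOX[b] for b in temp[1:] + temp[:1]]  # RotWord, then SubWord
            temp[0] ^= rcon
            rcon = xtime(rcon)
        words.append([w ^ t for w, t in zip(words[i - 4], temp)])
    return [sum(words[4 * r:4 * r + 4], []) for r in range(11)]


# The state is a flat list of 16 bytes; row r of column c sits at index r + 4*c.
def sub_bytes(s):  # BS
    return [SBOX[b] for b in s]


def shift_rows(s):  # SR: row r rotates left by r positions
    return [s[r + 4 * ((c + r) % 4)] for c in range(4) for r in range(4)]


def mix_columns(s):  # MC: each column times the circulant 02/03/01/01 matrix
    out = []
    for c in range(4):
        a = s[4 * c:4 * c + 4]
        out += [xtime(a[i]) ^ gf_mul(a[(i + 1) % 4], 3) ^ a[(i + 2) % 4] ^ a[(i + 3) % 4]
                for i in range(4)]
    return out


def add_round_key(s, k):  # ARK
    return [a ^ b for a, b in zip(s, k)]


def encrypt_block(plaintext: bytes, key: bytes) -> bytes:
    if len(plaintext) != 16 or len(key) != 16:
        raise ValueError("AES-128 needs a 16-byte block and a 16-byte key")
    round_keys = expand_key(key)
    s = add_round_key(list(plaintext), round_keys[0])  # initial ARK
    for rnd in range(1, 10):  # rounds 1..9
        s = add_round_key(mix_columns(shift_rows(sub_bytes(s))), round_keys[rnd])
    s = add_round_key(shift_rows(sub_bytes(s)), round_keys[10])  # final round: no MC
    return bytes(s)
```

With the FIPS-197 Appendix C.1 key `000102...0f` and plaintext `00112233445566778899aabbccddeeff`, `encrypt_block` yields the ciphertext `69c4e0d86a7b0430d8cdb78070b4c55a`.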
Byte Substitution (figure: each byte of the state matrix is replaced through the 16x16 S-Box; BS is the first step of the BS, SR, MC, ARK round)
ShiftRow (SR) (figure: rows 0 to 3 of the state are cyclically shifted by offsets 0, 1, 2, and 3; the inverse, ISR, applies the same offsets in the opposite direction)
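ShiftRows and its inverse only move bytes, so in hardware they cost no logic, just wiring. A small Python sketch on a 4x4 state stored as a list of rows (an illustrative layout):

```python
State = list[list[int]]


def shift_rows(state: State) -> State:
    """SR: cyclically shift row r to the left by offset r (0, 1, 2, 3)."""
    return [row[r:] + row[:r] for r, row in enumerate(state)]


def inv_shift_rows(state: State) -> State:
    """ISR: cyclically shift row r to the right by offset r, undoing SR."""
    return [row[(4 - r) % 4:] + row[:(4 - r) % 4] for r, row in enumerate(state)]


s = [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11], [12, 13, 14, 15]]
shifted = shift_rows(s)  # [[0, 1, 2, 3], [5, 6, 7, 4], [10, 11, 8, 9], [15, 12, 13, 14]]
```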
MixColumn (MC) and Inverse MixColumn (IMC) (figure: MC and IMC act on each state column i = 0, 1, 2, 3)
AddRoundKey (ARK) (figure: each byte of the state is XORed with the corresponding byte of the round key) $a_{i,j} = b_{i,j} \oplus k_{i,j}, \quad 0 \le i, j \le 3$
Our Contributions • Design 1: Encryptor Core • Sequential vs. Pipelined Architecture • Design 2: Encryptor/Decryptor Core • MixColumn & Inv. MixColumn modified • Design 3: Encryptor/Decryptor Core • S-Box & Inv. S-Box
Our Contributions • Design 1: Encryptor Core • Sequential vs. Pipelined Architecture
AES Algorithm Implementation: Sequential Approach (figure: PLAIN TEXT passes through RND 0, a latched RND 1-9 unit that is iterated, and RND 10 to give the CIPHER TEXT; a KGEN unit with RCON and a latch derives each ROUND KEY from the USER KEY, clocked by CLK)
AES Algorithm Implementation: Pipelined Approach (figure: IN REG, then unrolled stages RND 0 to RND 10, then OUT; one KGEN unit per stage derives round keys RK 0 to RK 10 from the USER KEY)
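A rough cycle-count model of the two architectures, assuming (for illustration only) one round per clock cycle, a sequential core that starts a new block only after the previous one finishes, and a pipelined core that accepts one block per cycle. The stage count and function names are hypothetical, not taken from the authors' design:

```python
def sequential_cycles(n_blocks: int, rounds: int = 10) -> int:
    """Iterative core: one round per clock, one block in flight at a time."""
    return n_blocks * rounds


def pipelined_cycles(n_blocks: int, stages: int = 10) -> tuple[int, list[int]]:
    """Simulate a `stages`-deep pipeline fed one new block per clock.

    Returns the cycle in which the last block completes and the completion order.
    """
    pipeline = [None] * stages  # pipeline[i] = block processed by stage i this cycle
    next_block = 0
    finished = []
    cycle = 0
    while len(finished) < n_blocks:
        cycle += 1
        incoming = next_block if next_block < n_blocks else None
        if incoming is not None:
            next_block += 1
        pipeline = [incoming] + pipeline[:-1]  # every block advances one stage
        if pipeline[-1] is not None:  # the last stage finishes its block this cycle
            finished.append(pipeline[-1])
    return cycle, finished
```

For 5 blocks and 10 rounds the model gives 50 cycles for the sequential core and 14 (that is, 10 + 5 - 1) for the pipelined one; the gap widens with the number of blocks, at the price of replicating the round and key-generation hardware.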
Our Contributions • Design 2: Encryptor/Decryptor Core • MixColumn & Inv. MixColumn Modified
BS and Inverse BS (figure: S-Box = MI followed by AF; Inverse S-Box = IAF followed by MI; an E/D signal selects between the two paths)
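MixColumn (MC) and Inverse MixColumn (IMC) Revisited • MC:
$$\begin{bmatrix} 02 & 03 & 01 & 01 \\ 01 & 02 & 03 & 01 \\ 01 & 01 & 02 & 03 \\ 03 & 01 & 01 & 02 \end{bmatrix}$$
• IMC:
$$\begin{bmatrix} 0E & 0B & 0D & 09 \\ 09 & 0E & 0B & 0D \\ 0D & 09 & 0E & 0B \\ 0B & 0D & 09 & 0E \end{bmatrix}$$
• Every entry is an element of GF(2^8), written in hexadecimal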
MixColumn (MC) and Inverse MixColumn (IMC) (continued) • For MC, the biggest coefficient is 03, where 03·x = (02·x) ⊕ x • For IMC, the coefficients are larger: 09, 0B, 0D, and 0E • The IMC coefficients have higher Hamming weight • Is IMC therefore a costly operation?
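MixColumn (MC) and Inverse MixColumn (IMC) (continued) • We observe that the IMC matrix factors into the MC matrix (1) times a sparse matrix (2):
$$
\begin{bmatrix} 0E & 0B & 0D & 09 \\ 09 & 0E & 0B & 0D \\ 0D & 09 & 0E & 0B \\ 0B & 0D & 09 & 0E \end{bmatrix}
=
\underbrace{\begin{bmatrix} 02 & 03 & 01 & 01 \\ 01 & 02 & 03 & 01 \\ 01 & 01 & 02 & 03 \\ 03 & 01 & 01 & 02 \end{bmatrix}}_{(1)}
\underbrace{\begin{bmatrix} 05 & 00 & 04 & 00 \\ 00 & 05 & 00 & 04 \\ 04 & 00 & 05 & 00 \\ 00 & 04 & 00 & 05 \end{bmatrix}}_{(2)}
$$
• The biggest coefficient in Eq. 2 is 05 • Eq. 1 is MC, which we already have • Eq. 2 can therefore be computed first and its result passed through the existing MC circuit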
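The factorization can be checked numerically. This Python sketch, assuming the standard AES reduction polynomial, computes IMC directly with coefficients 0E, 0B, 0D, 09 and, alternatively, as MC applied after a pre-multiplication by the circulant 05/00/04/00 matrix (Eq. 2, called `mod_m` here after the ModM stage of the data path). Both matrices are circulant, so they commute and the order of the two steps is a free design choice:

```python
def xtime(b: int) -> int:
    """Multiply by 02 in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1."""
    return ((b << 1) ^ 0x1B) & 0xFF if b & 0x80 else b << 1


def gf_mul(a: int, b: int) -> int:
    """General GF(2^8) multiplication, used only for the direct IMC reference."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a = xtime(a)
        b >>= 1
    return r


def mix_column(a: list[int]) -> list[int]:
    """MC (Eq. 1): 02*a_i + 03*a_{i+1} + a_{i+2} + a_{i+3}, with 03*x = xtime(x) ^ x."""
    return [xtime(a[i]) ^ xtime(a[(i + 1) % 4]) ^ a[(i + 1) % 4] ^ a[(i + 2) % 4] ^ a[(i + 3) % 4]
            for i in range(4)]


def mod_m(a: list[int]) -> list[int]:
    """Eq. 2: 05*a_i + 04*a_{i+2}, with 04*x = xtime(xtime(x)) and 05*x = 04*x ^ x."""
    return [xtime(xtime(a[i])) ^ a[i] ^ xtime(xtime(a[(i + 2) % 4])) for i in range(4)]


def inv_mix_column_direct(a: list[int]) -> list[int]:
    """IMC computed straight from the 0E/0B/0D/09 matrix."""
    return [gf_mul(a[i], 0x0E) ^ gf_mul(a[(i + 1) % 4], 0x0B)
            ^ gf_mul(a[(i + 2) % 4], 0x0D) ^ gf_mul(a[(i + 3) % 4], 0x09) for i in range(4)]


def inv_mix_column_shared(a: list[int]) -> list[int]:
    """IMC reusing the MC circuit: Eq. 2 first, then Eq. 1."""
    return mix_column(mod_m(a))
```

The only extra hardware for decryption is the Eq. 2 stage, whose coefficients need at most two `xtime` steps and one XOR, instead of a full 0E/0B/0D/09 multiplier.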
Data Path for Encryption/Decryption (stages in order) • Encryption: MI, AF, SR, MC, ARK • Decryption: ISR, IAF, MI, ModM, MC, ARK
Our Contributions • Design 3: Encryptor/Decryptor Core • S-Box & Inv. S-Box
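Byte Substitution (Revisited) (figure: the S-Box and the Inverse S-Box share the MI stage, with IAF on the inverse path; each byte of the state matrix is substituted through the 16x16 S-Box)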
MI: 1st Approach • MI with a lookup table • The same S-Box (MI) serves both encryption and decryption • Memory requirements are halved • BRAMs store the MI values • No initialization time is needed to prepare them (figure: one MI block shared by an E/D-selected datapath with AF, SR, MC, ARK for encryption and ISR, IAF, IMC, IARK for decryption)
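A Python sketch of the lookup-table approach, assuming a single 256-entry MI table (computed here by brute force; on the FPGA it would be preloaded into a BRAM) wrapped by AF on the encryption path and IAF on the decryption path. `byte_sub` and its `encrypt` flag are hypothetical names standing in for the E/D select:

```python
def xtime(b: int) -> int:
    """Multiply by 02 in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1."""
    return ((b << 1) ^ 0x1B) & 0xFF if b & 0x80 else b << 1


def gf_mul(a: int, b: int) -> int:
    r = 0
    while b:
        if b & 1:
            r ^= a
        a = xtime(a)
        b >>= 1
    return r


# One table shared by encryption and decryption; 0 maps to 0 by convention.
MI_TABLE = [0] + [next(y for y in range(1, 256) if gf_mul(x, y) == 1) for x in range(1, 256)]


def rotl8(b: int, n: int) -> int:
    return ((b << n) | (b >> (8 - n))) & 0xFF


def affine(b: int) -> int:
    """AF: b ^ rotl(b,1) ^ rotl(b,2) ^ rotl(b,3) ^ rotl(b,4) ^ 0x63."""
    return b ^ rotl8(b, 1) ^ rotl8(b, 2) ^ rotl8(b, 3) ^ rotl8(b, 4) ^ 0x63


def inv_affine(b: int) -> int:
    """IAF: rotl(b,1) ^ rotl(b,3) ^ rotl(b,6) ^ 0x05."""
    return rotl8(b, 1) ^ rotl8(b, 3) ^ rotl8(b, 6) ^ 0x05


def byte_sub(b: int, encrypt: bool) -> int:
    """E/D-selectable byte substitution around the single MI table."""
    if encrypt:
        return affine(MI_TABLE[b])  # S-Box: MI, then AF
    return MI_TABLE[inv_affine(b)]  # Inverse S-Box: IAF, then MI
```

One 256 x 8-bit table (2,048 bits) replaces separate S-Box and Inverse S-Box tables (4,096 bits together), which matches the halving of memory claimed above; AF and IAF reduce to a few XORs each.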
MI: 2nd Approach, Three-Stage Strategy [S. Morioka and A. Satoh, CHES 2002] • MI with composite fields GF((2^2)^2) and GF((2^4)^2) • 1st transformation (M): map the element A ∈ GF(2^8) to a composite field F • MI manipulation: compute the multiplicative inverse over the field F, using arithmetic in GF(2^4) • 2nd transformation (M^-1): map back from the field F to GF(2^8)
MI Implementation • Let A ∈ F = GF((2^4)^2) and A = A_H y + A_L, with A_H, A_L ∈ GF(2^4); then it can be shown that:
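The middle stage of the three-stage strategy needs only GF(2^4) arithmetic. As a hedged sketch, assuming GF(2^4) is built with x^4 + x + 1 and the extension GF((2^4)^2) with an irreducible y^2 + y + λ (these polynomials may differ from the ones Morioka and Satoh use, and the mappings M and M^-1 are omitted), the inverse of A = A_H y + A_L is (A_H Δ^-1) y + (A_H + A_L) Δ^-1 with Δ = A_H^2 λ + A_H A_L + A_L^2:

```python
GF16_POLY = 0b10011  # assumed GF(2^4) modulus: x^4 + x + 1


def gf16_mul(a: int, b: int) -> int:
    """Multiply two 4-bit elements of GF(2^4), reducing by x^4 + x + 1."""
    r = 0
    for _ in range(4):
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a & 0x10:
            a ^= GF16_POLY
    return r


def gf16_inv(a: int) -> int:
    """Inverse in GF(2^4) by search (a 16-entry table in hardware)."""
    if a == 0:
        raise ZeroDivisionError("0 has no multiplicative inverse")
    return next(y for y in range(1, 16) if gf16_mul(a, y) == 1)


# Assumed extension polynomial y^2 + y + LAM: take the first LAM with no root in GF(2^4),
# so the polynomial is irreducible and GF((2^4)^2) is a field with 256 elements.
LAM = next(lam for lam in range(1, 16)
           if all(gf16_mul(t, t) ^ t ^ lam for t in range(16)))


def comp_mul(p: tuple[int, int], q: tuple[int, int]) -> tuple[int, int]:
    """(p_h*y + p_l)(q_h*y + q_l) in GF((2^4)^2), using y^2 = y + LAM."""
    ph, pl = p
    qh, ql = q
    hh = gf16_mul(ph, qh)
    return (hh ^ gf16_mul(ph, ql) ^ gf16_mul(pl, qh), gf16_mul(hh, LAM) ^ gf16_mul(pl, ql))


def comp_inv(a: tuple[int, int]) -> tuple[int, int]:
    """Inverse of A = A_H*y + A_L using only GF(2^4) operations and one GF(2^4) inversion."""
    ah, al = a
    delta = gf16_mul(gf16_mul(ah, ah), LAM) ^ gf16_mul(ah, al) ^ gf16_mul(al, al)
    d_inv = gf16_inv(delta)  # delta != 0 for A != 0 because y^2 + y + LAM is irreducible
    return (gf16_mul(ah, d_inv), gf16_mul(ah ^ al, d_inv))
```

The 8-bit inversion thus shrinks to a 4-bit inversion plus a few 4-bit multiplications, which is what trades memory for logic in this approach.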
AES Algorithm Implementation Results
Metrics to measure • (1) Throughput: $\text{Throughput} = \dfrac{f_{\text{clk}} \times \text{No. of bits}}{\text{No. of rounds}}$, where $f_{\text{clk}}$ is the clock frequency • (2) FPGA resources used: • CLB slices • BRAMs • etc.
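The throughput formula above is simple enough to evaluate directly. A small Python sketch, where `cycles_per_block` plays the role of the number of rounds for an iterative core and equals 1 for a fully pipelined core; the 100 MHz clock below is a hypothetical value, not a reported result:

```python
def throughput_mbps(freq_mhz: float, block_bits: int = 128, cycles_per_block: float = 10) -> float:
    """Throughput = f x (bits per block) / (cycles per block), in Mbit/s."""
    return freq_mhz * block_bits / cycles_per_block


sequential = throughput_mbps(100)                      # 10 rounds, one round per cycle
pipelined = throughput_mbps(100, cycles_per_block=1)   # one block leaves per cycle
```

At a hypothetical 100 MHz this gives 1,280 Mbit/s for the sequential formula and 12,800 Mbit/s for a fully pipelined core, which is why slice and BRAM counts must be reported alongside throughput.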
Sequential vs. Pipelined Design (results table: Sequential Design, Pipelined Design)
MixColumn vs. Inverse MixColumn • Two approaches for MC/IMC • Fewer BRAMs • Fewer slices • Highest throughput reported to date
S-Box vs. Inverse S-Box • Two approaches for MI • Key scheduling included • No initial delay • The first design uses a lookup table for MI: fast, but with high memory requirements • The second design uses the composite field approach for MI: slower, but with lower memory requirements • Both are efficient compared to previously reported designs
Our Contributions • Elliptic Curve Cryptography
Elliptic Curve Cryptography • Scalar multiplication: Q = kP • Elliptic curve operations: • Point doubling: Q = 2P • Point addition: R = P + Q • GF(2^m) arithmetic: multiplication, squaring, addition, etc.
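The hierarchy above means Q = kP is built from repeated point doublings and additions. A generic left-to-right double-and-add sketch (one common method; the slides do not state which scalar-multiplication algorithm the authors used), with the group operations passed in as functions so it can be checked on plain integers:

```python
from typing import Callable, TypeVar

T = TypeVar("T")


def scalar_multiply(k: int, p: T, add: Callable[[T, T], T],
                    double: Callable[[T], T], identity: T) -> T:
    """Left-to-right double-and-add: Q = kP using only doubling and addition."""
    if k < 0:
        raise ValueError("k must be non-negative")
    q = identity
    for bit in bin(k)[2:]:  # scan the scalar from its most significant bit
        q = double(q)       # Q = 2Q
        if bit == "1":
            q = add(q, p)   # Q = Q + P
    return q


# Integers under addition stand in for curve points here.
result = scalar_multiply(13, 7, lambda a, b: a + b, lambda a: 2 * a, 0)  # 91
```

On the curve, `add` and `double` would be the point-addition and point-doubling formulas over GF(2^191), and each of those in turn calls the field multiplier and squarer.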
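GF(2^191) Arithmetic: Squaring • A = 1111 • A^2 = 1010101 • Over GF(2), squaring inserts a 0 between consecutive bits, because $\left(\sum_i a_i x^i\right)^2 = \sum_i a_i x^{2i}$; the result is then reduced modulo the field polynomial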
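A Python sketch of GF(2^191) squaring as bit spreading followed by reduction. The slides do not give the field polynomial; this sketch assumes the irreducible trinomial x^191 + x^9 + 1, and polynomials are stored as Python ints with bit i holding the coefficient of x^i:

```python
M = 191
F = (1 << 191) | (1 << 9) | 1  # assumed field polynomial x^191 + x^9 + 1


def spread_bits(a: int) -> int:
    """Square in GF(2)[x]: bit i of a moves to bit 2i, with zeros in between."""
    r, i = 0, 0
    while a:
        if a & 1:
            r |= 1 << (2 * i)
        a >>= 1
        i += 1
    return r


def reduce_mod_f(c: int) -> int:
    """Reduce modulo F by clearing every bit at position >= M, from the top down."""
    for i in range(c.bit_length() - 1, M - 1, -1):
        if (c >> i) & 1:
            c ^= F << (i - M)
    return c


def gf_square(a: int) -> int:
    return reduce_mod_f(spread_bits(a))
```

`spread_bits(0b1111)` returns `0b1010101`, matching the example on the slide. Because squaring is linear over GF(2), in hardware it is mostly rewiring plus a few XORs for the reduction.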
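Karatsuba Multiplier for GF(2^191) • Split each operand into halves: $A = A_H x^{n} + A_L$, $B = B_H x^{n} + B_L$ • Then the polynomial multiplication of A and B is: $AB = A_H B_H x^{2n} + (A_H B_L + A_L B_H)\,x^{n} + A_L B_L$ • The Karatsuba idea is that the above product can be written as: $AB = A_H B_H x^{2n} + \left[(A_H + A_L)(B_H + B_L) + A_H B_H + A_L B_L\right] x^{n} + A_L B_L$ • This needs three half-size multiplications instead of four (over GF(2), addition and subtraction coincide)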
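A recursive Python sketch of Karatsuba multiplication over GF(2)[x] (polynomials as ints, bit i = coefficient of x^i). The 32-bit threshold at which it falls back to schoolbook carry-less multiplication is an arbitrary assumption, and reduction modulo the field polynomial would follow as a separate step:

```python
def clmul(a: int, b: int) -> int:
    """Schoolbook carry-less multiplication: shift and XOR, no carries."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        b >>= 1
    return r


def karatsuba(a: int, b: int, threshold: int = 32) -> int:
    """Product of two GF(2)[x] polynomials using three half-size products per level."""
    n = max(a.bit_length(), b.bit_length())
    if n <= threshold:
        return clmul(a, b)
    half = n // 2
    mask = (1 << half) - 1
    a_l, a_h = a & mask, a >> half  # A = A_H x^half + A_L
    b_l, b_h = b & mask, b >> half
    low = karatsuba(a_l, b_l, threshold)
    high = karatsuba(a_h, b_h, threshold)
    # Middle term: (A_H + A_L)(B_H + B_L) + A_H B_H + A_L B_L, with + being XOR.
    mid = karatsuba(a_l ^ a_h, b_l ^ b_h, threshold) ^ low ^ high
    return (high << (2 * half)) ^ (mid << half) ^ low
```

For 191-bit operands the product has degree at most 380; a recursive split like this trades one multiplier per level for a few extra XORs.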
Point Addition over GF(2^191): Hessian Form
Point Doubling over GF(2^191): Hessian Form
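A Python sketch of projective point addition and doubling on a Hessian curve X^3 + Y^3 + Z^3 = D·X·Y·Z, using the widely cited Chudnovsky/Smart formulas; the slides do not show which variant the authors used. To keep it checkable, the sketch assumes a toy field GF(2^7) with x^7 + x + 1 and D = 2 instead of GF(2^191); in characteristic 2 every minus sign becomes an XOR:

```python
M, POLY = 7, 0b10000011  # assumed toy field GF(2^7), modulus x^7 + x + 1
D = 0b10                 # assumed curve parameter (D^3 != 1, so the curve is nonsingular)


def fmul(a: int, b: int) -> int:
    """Shift-and-add multiplication in GF(2^7)."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a >> M:
            a ^= POLY
    return r


def cube(a: int) -> int:
    return fmul(fmul(a, a), a)


def on_curve(p: tuple[int, int, int]) -> bool:
    x, y, z = p
    return cube(x) ^ cube(y) ^ cube(z) == fmul(fmul(fmul(D, x), y), z)


def hessian_add(p, q):
    """P + Q for P != Q: X3 = Y1^2 X2 Z2 - Y2^2 X1 Z1, and cyclic analogues."""
    x1, y1, z1 = p
    x2, y2, z2 = q
    x3 = fmul(fmul(y1, y1), fmul(x2, z2)) ^ fmul(fmul(y2, y2), fmul(x1, z1))
    y3 = fmul(fmul(x1, x1), fmul(y2, z2)) ^ fmul(fmul(x2, x2), fmul(y1, z1))
    z3 = fmul(fmul(z1, z1), fmul(x2, y2)) ^ fmul(fmul(z2, z2), fmul(x1, y1))
    return (x3, y3, z3)


def hessian_double(p):
    """2P: X3 = Y1(X1^3 - Z1^3), Y3 = X1(Z1^3 - Y1^3), Z3 = Z1(Y1^3 - X1^3)."""
    x1, y1, z1 = p
    cx, cy, cz = cube(x1), cube(y1), cube(z1)
    return (fmul(y1, cx ^ cz), fmul(x1, cz ^ cy), fmul(z1, cy ^ cx))


def neg(p):
    """-(X : Y : Z) = (Y : X : Z) on a Hessian curve."""
    x, y, z = p
    return (y, x, z)


IDENTITY = (1, 1, 0)  # (1 : -1 : 0), and -1 = 1 in characteristic 2

# Brute-force the affine points (Z = 1) of the toy curve for experiments.
POINTS = [(x, y, 1) for x in range(1 << M) for y in range(1 << M) if on_curve((x, y, 1))]
```

The formulas use only field multiplications, squarings, and XORs, with no field inversion, which is what makes the projective Hessian form attractive for the GF(2^191) datapath.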
Performance Results • Tool: Xilinx Foundation F4.1i • Device: XCV2600E • ECC scalar multiplication: • Previously reported timing: 170 µs [Gerardo, CHES 2000] • Our estimated timing: < 100 µs