Low Cost Design of Advanced Encryption Standard (AES) Processor Ming-Chih Chen Department of Electronic Engineering National Kaohsiung First University of Science and Technology
Outline • Introduction • Previous AES Design Methods • Two Proposed Substructure Sharing Methods for XOR-based Operations • Two Proposed CSE Algorithms for Sum-of-Product Operations • Comparisons and Implementations • Conclusions
Introduction • In Oct. 2000, NIST (National Institute of Standards and Technology) selected the Rijndael algorithm as the new Advanced Encryption Standard (AES). • The Rijndael AES algorithm is a symmetric block cipher that processes 128-bit data blocks using cipher keys of 128, 192, or 256 bits. • Applications of AES include wireless network security (IEEE 802.11), smart cards, etc.
Advanced Encryption Standard • Finite Field Operations • AES Transformations & Algorithm
Finite Field Addition • Bitwise XOR operation (or modulo-2 addition) • The same sum can be written in polynomial, binary, or hexadecimal notation
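The worked equations of this slide did not survive extraction. As a minimal sketch, the Python below adds the two bytes {57} and {83} (the operands used on the multiplication slide; choosing them here is an assumption) and prints the sum in the three notations the slide names. The helper names `gf_add` and `poly_str` are illustrative only.

```python
def gf_add(a, b):
    """Add two GF(2^8) elements: modulo-2 addition of coefficients is a bitwise XOR."""
    return a ^ b


def poly_str(byte):
    """Render a byte as a polynomial in x, highest power first."""
    terms = []
    for i in range(7, -1, -1):
        if (byte >> i) & 1:
            terms.append("1" if i == 0 else "x" if i == 1 else f"x^{i}")
    return " + ".join(terms) if terms else "0"


a, b = 0x57, 0x83
s = gf_add(a, b)
print(f"({poly_str(a)}) + ({poly_str(b)}) = {poly_str(s)}")  # polynomial notation
print(f"{{{a:08b}}} XOR {{{b:08b}}} = {{{s:08b}}}")          # binary notation
print(f"{{{a:02x}}} XOR {{{b:02x}}} = {{{s:02x}}}")          # hexadecimal notation
```

Because addition is XOR, every element is its own additive inverse, which is the property D + D = 0 used later in the byte-level MixColumns optimization.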
Multiplication in GF(2^8) • Multiplication of two polynomials modulo the irreducible polynomial m(x) = x^8 + x^4 + x^3 + x + 1 • Ex: {57}·{83} = {c1} • Multiplicative identity: {01} • The multiplicative inverse of b(x) is denoted by b^{-1}(x) • Extended Euclidean algorithm: b(x)a(x) + m(x)c(x) = 1 ⇒ b^{-1}(x) = a(x) mod m(x)
Multiplication by x • If b7 = 0: left shift • If b7 = 1: left shift followed by bitwise XOR with {1b} • This operation is denoted by xtime( )
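The xtime( ) rule, and the shift-and-add multiplication that can be built from it, might be sketched in Python as follows. The name `gf_mul` is an illustrative choice; the constant 0x11B encodes m(x) = x^8 + x^4 + x^3 + x + 1, so XORing it after the shift is the same as XORing {1b} into the low byte.

```python
def xtime(b):
    """Multiply a GF(2^8) element by x, i.e. by {02}."""
    b <<= 1
    if b & 0x100:      # b7 was 1 before the shift: reduce modulo m(x)
        b ^= 0x11B
    return b & 0xFF


def gf_mul(a, b):
    """Multiply two GF(2^8) elements by repeated xtime() and XOR."""
    result = 0
    while b:
        if b & 1:      # this power of x is present in b
            result ^= a
        a = xtime(a)
        b >>= 1
    return result


print(f"{gf_mul(0x57, 0x83):02x}")
```

The final line evaluates the example from the multiplication slide, {57}·{83}.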
Polynomials with Coefficients in GF(2^8) • Each coefficient of a polynomial is a byte (8 bits) • Polynomial addition: a(x) + b(x) • Byte-wise XOR of corresponding coefficients • Polynomial multiplication modulo x^4 + 1 • d(x) = a(x)·b(x) mod (x^4 + 1) (similar to cyclic convolution)
Inputs and Outputs • Input and output • Sequences of blocks with a block length of 128 bits (Nb = 4 words per block) • Cipher key • Sequence of 128, 192, or 256 bits (Nk = 4, 6, or 8 words per key)
Byte Representation • Block length = 128 bits = 16 bytes • Key length = 128, 192, or 256 bits = 16, 24, or 32 bytes • Finite field element representation • Polynomial: {01100011} = x^6 + x^5 + x + 1 • Hexadecimal: {01100011} = {63} • One extra bit to the left of a byte: {01}{1b}
State • 2-D 4 × 4 array of bytes: a state has four rows and Nb columns • 1-D array of 32-bit words w0, w1, w2, w3, with each word wi composed of one column of the 2-D state
Rijndael AES Algorithm • Figure panels: (a) Encryption, (b) Direct Decryption, (c) Modified Decryption
Four Transformations in Cipher • SubBytes( ):SB • Nonlinear byte substitution • ShiftRows( ):SR • Cyclically left-shift the last three rows of the state • MixColumns( ):MC • Transformation on each column of the state • AddRoundKey( ):ARK • Each column is XORed with a 32-bit key schedule word generated from the key expansion
SubBytes( ) • Take the multiplicative inverse (MI) in GF(2^8): S → S^{-1} • Apply the affine transformation (AF) over GF(2): S' = M·S^{-1} + C, with C = {63}_16 • where S and S' are the input and output bytes in 8-dimensional vector format
Overall Effect of SubBytes( ) • Substitution table (S-box)
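The two SubBytes( ) steps can be combined into a table, as a rough Python sketch. The matrix M is not reproduced in the extracted slides, so the code assumes the standard FIPS-197 affine map, in which output bit i is the XOR of input bits i, i+4, i+5, i+6, and i+7 (indices mod 8) plus bit i of C = {63}. The names `gf_inv`, `affine`, and `SBOX` are illustrative, and the inverse is computed by the slow but simple identity b^{-1} = b^254.

```python
def gf_mul(a, b):
    """Multiply in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return result


def gf_inv(a):
    """Multiplicative inverse as a^254; {00} is mapped to {00} by convention."""
    result = 1
    for _ in range(254):
        result = gf_mul(result, a)
    return result if a else 0


def rotl8(x, k):
    """Rotate an 8-bit value left by k positions."""
    return ((x << k) | (x >> (8 - k))) & 0xFF


def affine(x):
    """Assumed FIPS-197 affine transformation S' = M*x + C over GF(2)."""
    return x ^ rotl8(x, 1) ^ rotl8(x, 2) ^ rotl8(x, 3) ^ rotl8(x, 4) ^ 0x63


# Overall effect of SubBytes(): one 256-entry substitution table.
SBOX = [affine(gf_inv(x)) for x in range(256)]
print(f"{SBOX[0x53]:02x}")
```

A hardware design would not iterate 254 multiplications; the composite-field and A^{-1} decompositions discussed later in the slides exist precisely to replace this step.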
MixColumns( ) • Polynomial multiplication by the fixed term a(x) = {03}x^3 + {01}x^2 + {01}x + {02} modulo x^4 + 1
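Since multiplication modulo x^4 + 1 behaves like a cyclic convolution of the coefficient vectors, MixColumns( ) on a single column can be sketched in Python as below. Coefficients are stored lowest power first, so a(x) becomes `[0x02, 0x01, 0x01, 0x03]`; the function names and the sample column are illustrative assumptions.

```python
def gf_mul(a, b):
    """Multiply in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return result


def poly_mul_mod_x4_plus_1(a, b):
    """Multiply two 4-term polynomials over GF(2^8) modulo x^4 + 1."""
    d = [0, 0, 0, 0]
    for i in range(4):
        for j in range(4):
            # x^(i+j) wraps around because x^4 = 1 modulo x^4 + 1
            d[(i + j) % 4] ^= gf_mul(a[i], b[j])
    return d


A_POLY = [0x02, 0x01, 0x01, 0x03]  # a(x), lowest power first


def mix_column(column):
    """Apply MixColumns() to one 4-byte state column."""
    return poly_mul_mod_x4_plus_1(A_POLY, column)


print([f"{b:02x}" for b in mix_column([0xDB, 0x13, 0x53, 0x45])])
```

Expanding the convolution gives the familiar per-byte form, for example the first output byte is {02}s0 + {03}s1 + s2 + s3.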
Key Expansion • For i not a multiple of Nk (and, when Nk = 8, i-4 not a multiple of Nk) − w[i] = w[i-1] ⊕ w[i-Nk] • For i a multiple of Nk − w[i] = transformation1(w[i-1]) ⊕ w[i-Nk] − Transformation 1 consists of RotWord(), followed by SubWord(), followed by XOR with Rcon[i/Nk] • If Nk = 8 and i-4 is a multiple of Nk − w[i] = transformation2(w[i-1]) ⊕ w[i-Nk] − Transformation 2 consists of SubWord() only
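The three cases of the key expansion can be put together in a short software sketch. This is not the on-the-fly hardware structure of the slides; it simply fills the whole word array w. The S-box is rebuilt from the field inverse and the standard FIPS-197 affine map (an assumption, since the slides do not list the table), and names such as `key_expansion` are illustrative.

```python
def gf_mul(a, b):
    """Multiply in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return result


def rotl8(x, k):
    return ((x << k) | (x >> (8 - k))) & 0xFF


def sbox_entry(x):
    # Brute-force inverse ({00} maps to {00}), then the assumed affine map.
    inv = next((y for y in range(1, 256) if gf_mul(x, y) == 1), 0)
    return inv ^ rotl8(inv, 1) ^ rotl8(inv, 2) ^ rotl8(inv, 3) ^ rotl8(inv, 4) ^ 0x63


SBOX = [sbox_entry(x) for x in range(256)]


def sub_word(w):
    """Apply the S-box to each byte of a 32-bit word."""
    return int.from_bytes(bytes(SBOX[b] for b in w.to_bytes(4, "big")), "big")


def rot_word(w):
    """Cyclically rotate a 32-bit word left by one byte."""
    return ((w << 8) | (w >> 24)) & 0xFFFFFFFF


def key_expansion(key):
    """Expand a 16-, 24-, or 32-byte key into 4*(Nr+1) words."""
    nk = len(key) // 4
    nr = nk + 6
    w = [int.from_bytes(key[4 * i:4 * i + 4], "big") for i in range(nk)]
    rcon = 0x01                      # Rcon[1]; doubled in GF(2^8) after each use
    for i in range(nk, 4 * (nr + 1)):
        temp = w[i - 1]
        if i % nk == 0:              # transformation 1
            temp = sub_word(rot_word(temp)) ^ (rcon << 24)
            rcon = gf_mul(rcon, 0x02)
        elif nk == 8 and i % nk == 4:  # transformation 2 (256-bit keys only)
            temp = sub_word(temp)
        w.append(temp ^ w[i - nk])
    return w


words = key_expansion(bytes.fromhex("2b7e151628aed2a6abf7158809cf4f3c"))
print(f"{words[4]:08x}")
```

The sample key is the 128-bit example key from FIPS-197 Appendix A; the printed word is the first one derived through transformation 1.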
Key Expansion Structure: On-the-Fly • [Figure: on-the-fly key expansion datapath; signal labels include w(i)/w(i+4), w(i+1)/w(i+5), w(i+2)/w(i+6), w(i+3)/w(i+7), w(i+3)/w(i+3), w(i+4)/w(i), w(i+5)/w(i+1), w(i+6)/w(i+2), w(i+7)/w(i+3)]
Four Transformations in Inverse Cipher • InvSubBytes( ):ISB • Nonlinear byte substitution • InvShiftRows( ):ISR • Cyclically right-shift the last three rows of the state • InvMixColumns( ):IMC • Transformation on each column of the state • AddRoundKey( ):ARK • Each column is XORed with a 32-bit key schedule word generated from the key expansion
InvSubBytes( ) • Apply the inverse affine (IAF) transformation over GF(2): S^{-1} = M^{-1}(S' + C) • Take the multiplicative inverse (MI) in GF(2^8): S^{-1} → S • Overall effect: inverse S-box (S^{-1}-box)
InvShiftRows • Cyclically right-shift the last three rows of the state.
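A small Python sketch shows that the right shifts undo the left shifts of ShiftRows( ). The slides do not state the shift amounts, so the code assumes the FIPS-197 offsets for Nb = 4 (row r is rotated by r bytes); the state is modelled as a list of four rows, and the function names are illustrative.

```python
def shift_rows(state):
    """Cyclically left-shift row r of the state by r bytes (row 0 is unchanged)."""
    return [row[r:] + row[:r] for r, row in enumerate(state)]


def inv_shift_rows(state):
    """Cyclically right-shift row r of the state by r bytes."""
    shifted = []
    for r, row in enumerate(state):
        k = (len(row) - r) % len(row)  # a right shift by r is a left shift by Nb - r
        shifted.append(row[k:] + row[:k])
    return shifted


state = [
    [0, 1, 2, 3],
    [4, 5, 6, 7],
    [8, 9, 10, 11],
    [12, 13, 14, 15],
]
print(shift_rows(state))
print(inv_shift_rows(shift_rows(state)) == state)
```

In hardware both operations are pure wiring, which is why the optimization slides concentrate on SB (ISB) and MC (IMC).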
InvMixColumns( ) • Polynomial multiplication by the fixed term a^{-1}(x) = {0b}x^3 + {0d}x^2 + {09}x + {0e} modulo x^4 + 1
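That a^{-1}(x) really inverts the MixColumns( ) polynomial a(x) = {03}x^3 + {01}x^2 + {01}x + {02} can be checked numerically. The sketch below multiplies the two polynomials modulo x^4 + 1 and round-trips one column; coefficient lists are lowest power first, and the names and the sample column are illustrative assumptions.

```python
def gf_mul(a, b):
    """Multiply in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return result


def poly_mul_mod_x4_plus_1(a, b):
    """Multiply two 4-term polynomials over GF(2^8) modulo x^4 + 1."""
    d = [0, 0, 0, 0]
    for i in range(4):
        for j in range(4):
            d[(i + j) % 4] ^= gf_mul(a[i], b[j])
    return d


A_POLY = [0x02, 0x01, 0x01, 0x03]      # a(x) used by MixColumns()
A_INV_POLY = [0x0E, 0x09, 0x0D, 0x0B]  # a^{-1}(x) used by InvMixColumns()

# The product of the two fixed polynomials should be the constant {01}.
product = poly_mul_mod_x4_plus_1(A_POLY, A_INV_POLY)

column = [0xDB, 0x13, 0x53, 0x45]
mixed = poly_mul_mod_x4_plus_1(A_POLY, column)
restored = poly_mul_mod_x4_plus_1(A_INV_POLY, mixed)
print(product, restored == column)
```

The larger coefficients {09}, {0b}, {0d}, {0e} are what make IMC more expensive than MC in hardware, which motivates the shared MC/IMC structures discussed later.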
Three Categories of Transformation Optimization • The optimization of separate transformations. • The optimization of combined round transformations. • The optimization of integrated encryption/decryption transformations.
The Optimization of Separate Transformations (1) • Two major transformations: SB (ISB) and MC (IMC) • SB (ISB): performs the MI (multiplicative inverse) in GF(2^8) followed by the AF. Four implementation approaches: • 1. Use a 256 × 8-bit table look-up ROM (S-box) to store all precalculated results. • 2. Move the calculation of the MI from GF(2^8) to the composite field GF((2^4)^2). • 3. Move the calculation of the MI from GF(2^8) to the composite field GF(((2^2)^2)^2). • 4. Calculate the MI in GF(2^8) based on the matrix decomposition of A^{-1}.
Calculation of Multiplicative Inverse (MI) in GF((2^4)^2) (1.2a) • There are three stages in the calculation of the MI in GF((2^4)^2).
Calculation of Multiplicative Inverse (MI) in GF((2^4)^2) (1.2b) • Stage 1: • Translate from GF(2^8) to the composite field GF((2^4)^2) using the transformation T. • [Figure: expansion of T] • The implementation of the T transformation has area = 17 A_XOR and delay = 3 T_XOR.
Calculation of Multiplicative Inverse (MI) in GF((2^4)^2) (1.2c) • Stage 2: • Find the MI of the two numbers in GF(2^4), with constants A = (0001)_2 and B = (1001)_2.
Calculation of Multiplicative Inverse (MI) in GF((2^4)^2) (1.2d) • Stage 3: • Convert the number in GF((2^4)^2) back to a number in GF(2^8) using T^{-1}.
Calculation of Multiplicative Inverse (MI) Using A^{-1} (1.4) • A^{-1}: • The A^{-1} (MI) can be calculated by • It requires four GF(2^8) multipliers, plus one A^2 and three A^4 components.
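The equation on this slide did not survive extraction. Since A^{-1} = A^254 for every nonzero A in GF(2^8), one decomposition that happens to match the stated component count (four multipliers, one A^2 block, three A^4 blocks) is A^254 = ((A^3·(A^3)^4)^4)^4 · (A^2·(A^3)^4). Whether this is the decomposition the slide showed is an assumption; the Python sketch below only verifies that this particular chain computes the inverse.

```python
def gf_mul(a, b):
    """Multiply in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return result


def square(a):
    """A^2 component (squaring is linear over GF(2), so it is XOR-only in hardware)."""
    return gf_mul(a, a)


def power4(a):
    """A^4 component."""
    return square(square(a))


def gf_inv_chain(a):
    """Compute A^254 with four multiplications, one A^2 and three A^4 blocks."""
    a2 = square(a)                # A^2 block
    a3 = gf_mul(a, a2)            # multiplier 1: A^3
    a12 = power4(a3)              # A^4 block 1: A^12
    a15 = gf_mul(a3, a12)         # multiplier 2: A^15
    a14 = gf_mul(a2, a12)         # multiplier 3: A^14
    a240 = power4(power4(a15))    # A^4 blocks 2 and 3: A^240
    return gf_mul(a240, a14)      # multiplier 4: A^254 = A^{-1}


print(f"{gf_inv_chain(0x53):02x}")
```

The exponents add up as 15·16 + 14 = 254, and {00} is mapped to {00}, as required for the S-box.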
The Optimization of Separate Transformations (2) • MC (IMC): • 1. Byte-level optimization: the multiplication block (XTime) multiplies a byte by the constant {02}_16; different byte-level sharing methods then reduce the number of XTime blocks. • Ex1: MC: D'' = {01}A + {01}B + {02}D + {03}E = A + B + XTime(D) + XTime(E) + E • Ex2: MC: D'' = {02}(D + E) + (A + B + D + E) + D, using {02}D = {02}D + D + D and D + D = 0
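The two example expressions for one MixColumns( ) output byte can be compared directly in Python. Ex1 needs two XTime blocks, while Ex2 needs only one because XTime is linear over XOR. The function names `mc_ex1` and `mc_ex2` are illustrative.

```python
def xtime(b):
    """XTime block: multiply a byte by {02} in GF(2^8)."""
    b <<= 1
    if b & 0x100:
        b ^= 0x11B
    return b & 0xFF


def mc_ex1(a, b, d, e):
    """Ex1: A + B + XTime(D) + XTime(E) + E, using two XTime blocks."""
    return a ^ b ^ xtime(d) ^ xtime(e) ^ e


def mc_ex2(a, b, d, e):
    """Ex2: {02}(D + E) + (A + B + D + E) + D, using one XTime block."""
    t = d ^ e                         # shared sub-expression D + E
    return xtime(t) ^ (a ^ b ^ t) ^ d


print(f"{mc_ex1(0x53, 0x45, 0xDB, 0x13):02x}", f"{mc_ex2(0x53, 0x45, 0xDB, 0x13):02x}")
```

In Ex2 the sum A + B + D + E is the XOR of all four bytes of the column, so the same term can be reused by all four output bytes of that column.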
The Optimization of Separate Transformations (3) • 2. Bit-level optimization: the common sub-expression elimination (CSE) algorithm extracts as many common factors as possible to further reduce the hardware cost. • Ex (bits listed from bit 7 down to bit 0): {02}A = {a6, a5, a4, a3+a7, a2+a7, a1, a0+a7, a7}, {03}A = {a6+a7, a5+a6, a4+a5, a3+a4+a7, a2+a3+a7, a1+a2, a0+a1+a7, a0+a7} • The factor a0+a7 appears in bit 1 of {02}A and in bits 0 and 1 of {03}A; it can be extracted and replaced with a8 = (a0+a7). • The factor a3+a7 appears in bit 4 of {02}A and in bits 3 and 4 of {03}A; it can likewise be extracted and replaced with a9 = (a3+a7).
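The bit-level example can be checked with a short Python sketch that builds {02}A and {03}A from the shared terms a8 and a9 and compares them with ordinary GF(2^8) arithmetic. The function name `mul2_mul3_cse` and the bit-list helpers are illustrative; this reproduces only the slide's example, not the full CSE algorithm.

```python
def xtime(b):
    """Reference: multiply a byte by {02} in GF(2^8)."""
    b <<= 1
    if b & 0x100:
        b ^= 0x11B
    return b & 0xFF


def to_bits(byte):
    """Return [a0, a1, ..., a7], least significant bit first."""
    return [(byte >> i) & 1 for i in range(8)]


def from_bits(bits):
    """Pack a least-significant-first bit list back into a byte."""
    return sum(bit << i for i, bit in enumerate(bits))


def mul2_mul3_cse(a_byte):
    """Compute ({02}A, {03}A) with the shared factors a8 and a9."""
    a0, a1, a2, a3, a4, a5, a6, a7 = to_bits(a_byte)
    a8 = a0 ^ a7   # shared by bit 1 of {02}A and bits 0, 1 of {03}A
    a9 = a3 ^ a7   # shared by bit 4 of {02}A and bits 3, 4 of {03}A
    mul2 = [a7, a8, a1, a2 ^ a7, a9, a4, a5, a6]
    mul3 = [a8, a8 ^ a1, a1 ^ a2, a2 ^ a9, a9 ^ a4, a4 ^ a5, a5 ^ a6, a6 ^ a7]
    return from_bits(mul2), from_bits(mul3)


print([f"{v:02x}" for v in mul2_mul3_cse(0x57)])
```

Counting two-input XORs in this sketch, the unshared expressions need 14 (3 for {02}A and 11 for {03}A), while the shared version needs 10, including the two that form a8 and a9.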
The Optimization of Combined Round Transformations (1) • Combine SB, SR, and MC in encryption, or ISB, ISR, and IMC in decryption. • 1. Table-lookup ROM (T-box or T^{-1}-box):
The Optimization of Combined Round Transformations (2) • 2. Combined IMC/ISR/IAF and AF/SR/MC with shared MI in GF((2^4)^2) • Figure panels: (a) Combined AF/SR/MC, (b) Combined IMC/ISR/IAF • Figure: Integration of AES encryption and decryption with shared MI in GF((2^4)^2)
The Optimization of Integrated Encryption/Decryption Transformations (1) • Two major integrations: • Integration of SB and ISB, and integration of MC and IMC. • SB/ISB: • Shares the same MI logic in GF(2^8) but multiplexes the AF and IAF.
The Optimization of Integrated Encryption/Decryption Transformations (2) • MC/IMC: • 1. Share the common factor, the XTime block, when constructing one output byte of MC and IMC, as shown in the following figure. • 2. Decompose the constant matrix of IMC as IMC = MC × C, where C is a constant matrix as shown in the following equation.
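The matrix C from the slide is not reproduced in the extracted text. A well-known factorization of the inverse polynomial is a^{-1}(x) = a(x)·({04}x^2 + {05}) mod (x^4 + 1), which corresponds to a circulant C with entries {05}, {00}, {04}, {00}; assuming that this is the kind of C the slide refers to, the decomposition IMC = MC × C can be checked in Python as follows. All names are illustrative.

```python
def gf_mul(a, b):
    """Multiply in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return result


def circulant(coeffs):
    """4x4 matrix of 'multiply by this polynomial modulo x^4 + 1' (lowest power first)."""
    return [[coeffs[(i - j) % 4] for j in range(4)] for i in range(4)]


def mat_mul(x, y):
    """Multiply two 4x4 matrices over GF(2^8)."""
    out = [[0] * 4 for _ in range(4)]
    for i in range(4):
        for j in range(4):
            for k in range(4):
                out[i][j] ^= gf_mul(x[i][k], y[k][j])
    return out


MC = circulant([0x02, 0x01, 0x01, 0x03])    # MixColumns matrix
IMC = circulant([0x0E, 0x09, 0x0D, 0x0B])   # InvMixColumns matrix
C = circulant([0x05, 0x00, 0x04, 0x00])     # assumed constant matrix

print(mat_mul(MC, C) == IMC)
```

With such a C, the decryption datapath can reuse the MC hardware and add only the cheaper multiplications by {04} and {05} in front of it.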
The Optimization of Integrated Encryption/Decryption Transformations (3) • 3. Decompose IMC = MC + F + G, where F and G are two constant matrix multiplications.
Our Proposed Substructure Sharing Methods for XOR-based Operations • Bit-level Expressions of AES Transformations • Proposed Method: Bit-level Substructure Sharing
Bit-level Expressions of AES Transformations • The two major kinds of transformation, SB (ISB) and MC (IMC), account for about 65% of the total area cost of an AES implementation. • They can be expressed as bit-level XOR-based sum-of-product (SoP) operations. • SB: Out_SB = MI + AF • ISB: Out_ISB = IAF + MI • MI: GF((2^4)^2), GF(((2^2)^2)^2) • MC: Out_MC = {01}A + {01}B + {02}D + {03}E (1-byte output) • IMC: Out_IMC = {0d}A + {09}B + {0e}D + {0b}E (1-byte output)
Two Proposed CSE Algorithms for Sum-of-Product Operations • Bit-level SoP Expressions • Proposed Method III: Vertical CSE Algorithm • Proposed Method IV: Horizontal CSE Algorithm
Bit-level Expressions (1) • A group of P bit-level equations (z_0, z_1, ..., z_{P-1}) with M_0 primary input variables (a_0, a_1, …, a_{M_0-1}) and N_0 product terms (w_0, w_1, …, w_{N_0-1}) can be expressed in the following matrix product form:
Bit-level Expressions (2) • The N_0 intermediate bit variables w_i are expressed as product terms of the primary input variables, where · denotes the bitwise AND operation.