
Parallel Analysis of the Rijndael Block Cipher

This paper discusses the analysis of the Rijndael block cipher and explores how parallel models of computation can be used to achieve optimal performance. Topics covered include cost models, prefix sum computation, and the various transformations involved in the Rijndael cipher.


Presentation Transcript


  1. Parallel Analysis of the Rijndael Block Cipher • Philip Brisk, Adam Kaplan, Majid Sarrafzadeh • Embedded & Reconfigurable Systems Lab, Computer Science Department • IASTED-PDCS, November 2003

  2. Outline • Introduction • Background Material • Analysis of the Rijndael Cipher • Concluding Remarks

  3. Parallel Models of Computation and Cryptography • Achieving optimal performance of cryptographic algorithms is imperative! • Goal: Understand how to accelerate performance by studying cryptography under parallel models of computation.

  4. What can we Learn from Parallel Models of Computation? • Identification of performance bottlenecks. • How to design efficient cryptographic hardware. • Techniques to improve future algorithms.

  5. Outline • Introduction • Background Material • Cost Model • Prefix Sum Computation • Analysis of the Rijndael Cipher • Concluding Remarks

  6. Cost Model • n: problem size • t(n): number of steps • p(n) = N > 1: number of processors • c(n) = p(n) x t(n): cost • s(n) = t1(n) / t(n): speedup, where t1(n) is the number of steps taken by the sequential algorithm

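  7. Cost Optimality • Cost ≡ the number of steps executed collectively by all processors. • An algorithm is cost-optimal on a parallel model of computation if its cost matches the sequential running time up to a constant factor: c(n) = p(n) x t(n) = O(t1(n)), where t1(n) is the number of steps taken by the sequential algorithm.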
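
As a rough illustration of the cost model, the sketch below computes cost and speedup for a hypothetical workload that is not taken from the slides: summing n numbers with a tree of n/2 processors. The function names and the workload are assumptions made for this example only.

```python
import math


def cost(processors: int, steps: int) -> int:
    """Cost: the number of steps executed collectively by all processors."""
    return processors * steps


def speedup(sequential_steps: int, parallel_steps: int) -> float:
    """Speedup: sequential step count divided by parallel step count."""
    return sequential_steps / parallel_steps


n = 1024
t_seq = n - 1                    # one processor: n - 1 additions
t_tree = int(math.log2(n))       # n/2 processors: one tree level per step
c_tree = cost(n // 2, t_tree)    # grows like n log n, not like n

print("sequential steps:", t_seq)
print("parallel steps:", t_tree, "cost:", c_tree)
print("speedup:", speedup(t_seq, t_tree))
```

In this example the parallel cost exceeds the sequential step count, so the configuration is fast but not cost-optimal; using fewer processors is the usual remedy.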

  8. Prefix Sum Computation • P: a set of N processors {P1, …, PN}. • Processor Pi holds a value ai. • For each processor Pi, compute the sum Si = a1 + a2 + … + ai. • Sequential algorithm (with S0 = 0): for i = 1 to N: Si = Si-1 + ai. • Addition can be generalized to any binary associative operation.
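
The sequential loop can be sketched in Python as follows. The name `prefix_scan` and its signature are illustrative choices, and the operator is passed in so that addition can be swapped for any other binary associative operation.

```python
import operator
from typing import Callable, List, Sequence


def prefix_scan(values: Sequence[int], op: Callable[[int, int], int]) -> List[int]:
    """Sequential inclusive prefix computation: S_i = a_1 op a_2 op ... op a_i."""
    result: List[int] = []
    for value in values:
        # The first element has no predecessor; every later one extends S_{i-1}.
        result.append(value if not result else op(result[-1], value))
    return result


print(prefix_scan([3, 6, 1, 4], operator.add))
print(prefix_scan([3, 6, 1, 4], operator.xor))
```

With `operator.add` and the inputs 3, 6, 1, 4 this yields the running sums 3, 9, 10, 14.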

  9. Prefix Sum Computation • Meijer and Akl [1987] described a solution using a binary tree of processors. • Example (animated over several slides): the leaves hold the inputs 3, 6, 1, 4; the subtree sums 9 (= 3 + 6) and 5 (= 1 + 4) are formed and passed through the tree; the leaves finish with the prefix sums 3, 9, 10, 14.
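
The tree-based idea can be simulated sequentially as below. This is a sketch in the spirit of a binary-tree scan, not a reproduction of Meijer and Akl's algorithm: the function name, the level-by-level layout, and the power-of-two restriction are assumptions of this example. Each list built per level stands for work that the processors of that level would do simultaneously.

```python
import operator
from typing import Callable, List, Optional


def tree_scan(values: List[int], op: Callable[[int, int], int]) -> List[int]:
    """Inclusive scan organized as a binary tree (length must be a power of two)."""
    n = len(values)
    assert n > 0 and n & (n - 1) == 0, "sketch assumes a power-of-two input size"

    # Upward phase: every internal node combines the values of its two children.
    levels = [list(values)]
    while len(levels[-1]) > 1:
        prev = levels[-1]
        levels.append([op(prev[i], prev[i + 1]) for i in range(0, len(prev), 2)])

    # Downward phase: every node receives the combination of everything to its
    # left (None means nothing lies to the left) and forwards it to its children.
    incoming: List[Optional[int]] = [None]
    for level in reversed(levels[:-1]):
        forwarded: List[Optional[int]] = []
        for parent_index, left_of_parent in enumerate(incoming):
            left_child = level[2 * parent_index]
            forwarded.append(left_of_parent)  # left child: same prefix as parent
            forwarded.append(
                left_child if left_of_parent is None else op(left_of_parent, left_child)
            )  # right child: parent's prefix combined with the left sibling
        incoming = forwarded

    # Each leaf combines what arrived from the left with its own value.
    return [v if pre is None else op(pre, v) for pre, v in zip(incoming, values)]


print(tree_scan([3, 6, 1, 4], operator.add))
```

For N leaves the simulation performs about log2(N) combining levels upward and the same number downward, which is where the logarithmic step count of tree-based scans comes from.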

  15. A Cost-Optimal Prefix Sum • To achieve cost optimality:

  16. Outline • Introduction • Background Material • Analysis of the Rijndael Cipher • Concluding Remarks

  17. The Rijndael Cipher • The cipher iterates in a series of rounds. • Each round requires a key. • Using the same key every round is not secure. • Providing a sequence of keys as an input is unreasonable. • A key schedule uses the original key to compute a new key for each round.

  18. The Rijndael Cipher • Key Schedule: Key Expansion expands the original key analogously to prefix-sum computation; Round Key Selection divides the expanded key between the rounds of the cipher. • Round Transformation: 4 sub-transformations applied during each round: ByteSub, ShiftRow, MixColumn, AddRoundKey.

  19. The Rijndael Cipher: Parameters • Nb: block length in 32-bit words (the number of columns in the state) • Nk: key length in 32-bit words • Nr: number of rounds • The key and state are represented as 2-dimensional arrays of bytes.

  20. Representation of the State • The state is represented by a 4 x Nb array of bytes (Nb = 4, 6, or 8): 4 rows and Nb columns, shown here for Nb = 4.
        a0,0  a0,1  a0,2  a0,3
        a1,0  a1,1  a1,2  a1,3
        a2,0  a2,1  a2,2  a2,3
        a3,0  a3,1  a3,2  a3,3
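
A minimal sketch of the state layout, assuming the column-by-column byte order used by the Rijndael specification (input byte number r + 4c lands in row r, column c). The helper names are hypothetical.

```python
from typing import List

State = List[List[int]]  # 4 rows, Nb columns, one byte per entry


def bytes_to_state(block: bytes) -> State:
    """Arrange a block of 4 * Nb bytes as a 4 x Nb array, filled column by column."""
    assert len(block) % 4 == 0, "block length must be a multiple of 4 bytes"
    nb = len(block) // 4
    return [[block[row + 4 * col] for col in range(nb)] for row in range(4)]


def state_to_bytes(state: State) -> bytes:
    """Inverse of bytes_to_state: read the array back column by column."""
    nb = len(state[0])
    return bytes(state[row][col] for col in range(nb) for row in range(4))


state = bytes_to_state(bytes(range(16)))  # Nb = 4
for row in state:
    print(row)
```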

  21. The ByteSub Transformation • Apply an S-box (an 8-bit lookup table) to every byte in the state: bi,j = S-box[ai,j]. • 1 processor: t(n) = O(Nb). • 4 x Nb processors: t(n) = O(1).
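
The sketch below applies an S-box to every byte of a state. The slides treat the S-box as a given lookup table; here it is rebuilt from the standard Rijndael construction (inverse in GF(2^8) followed by an affine transformation) so that the example is self-contained. All function names are illustrative.

```python
from typing import List


def gf_mul(a: int, b: int) -> int:
    """Multiply two bytes in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1 (0x11B)."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return result


def rotl8(x: int, shift: int) -> int:
    """Rotate an 8-bit value left by `shift` bits."""
    return ((x << shift) | (x >> (8 - shift))) & 0xFF


def build_sbox() -> List[int]:
    """Build the 256-entry S-box: field inverse, then the affine transformation."""
    sbox = []
    for x in range(256):
        # Brute-force inverse; 0 has no inverse and is mapped to 0 by convention.
        inv = 0 if x == 0 else next(y for y in range(1, 256) if gf_mul(x, y) == 1)
        sbox.append(inv ^ rotl8(inv, 1) ^ rotl8(inv, 2) ^ rotl8(inv, 3) ^ rotl8(inv, 4) ^ 0x63)
    return sbox


def byte_sub(state: List[List[int]], sbox: List[int]) -> List[List[int]]:
    """Replace every byte of the state by its S-box entry; bytes are independent."""
    return [[sbox[b] for b in row] for row in state]


SBOX = build_sbox()
print([hex(b) for b in byte_sub([[0x00, 0x01, 0x53]], SBOX)[0]])
```

Because each output byte depends on exactly one input byte, the 4 x Nb lookups can be handed to 4 x Nb processors and finished in a single step.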

  25. The Shift-Row Transformation • Cyclically shift each row of the state by a constant; in the 4-column example, row 0 is unchanged and rows 1, 2, and 3 are rotated left by 1, 2, and 3 positions. • 1 processor: t(n) = O(Nb). • 4 x Nb processors: t(n) = O(1).
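
A small sketch of the row rotation for the 4-column case shown on the slide. The default offsets (0, 1, 2, 3) match that figure; they are exposed as a parameter because other block sizes may use different constants, which the slides do not list.

```python
from typing import List, Sequence


def shift_row(state: List[List[int]], offsets: Sequence[int] = (0, 1, 2, 3)) -> List[List[int]]:
    """Rotate row i of the state left by offsets[i] positions."""
    return [row[k % len(row):] + row[:k % len(row)] for row, k in zip(state, offsets)]


# Label each entry as 10 * row + column so the movement is easy to follow.
state = [[10 * r + c for c in range(4)] for r in range(4)]
for row in shift_row(state):
    print(row)
```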

  28. The Mix-Column Transformation • Apply the Mix-Column operation (multiplication by a 4x4 byte matrix) to each column of the state: (a0,j, a1,j, a2,j, a3,j) → (b0,j, b1,j, b2,j, b3,j). • 1 processor: t(n) = O(Nb). • O(Nb) processors: t(n) = O(1).
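
The column mixing can be sketched as follows, assuming the standard Rijndael matrix with rows (2, 3, 1, 1), (1, 2, 3, 1), (1, 1, 2, 3), (3, 1, 1, 2) over GF(2^8); the slides only say that a 4x4 byte matrix is used, so the concrete coefficients and the helper names are assumptions of this example.

```python
from typing import List


def xtime(b: int) -> int:
    """Multiply a byte by 2 in GF(2^8) modulo 0x11B."""
    b <<= 1
    return (b ^ 0x11B) & 0xFF if b & 0x100 else b


def mix_single_column(col: List[int]) -> List[int]:
    """Multiply one 4-byte column by the MixColumn matrix (3*x = xtime(x) ^ x)."""
    a0, a1, a2, a3 = col
    return [
        xtime(a0) ^ (xtime(a1) ^ a1) ^ a2 ^ a3,
        a0 ^ xtime(a1) ^ (xtime(a2) ^ a2) ^ a3,
        a0 ^ a1 ^ xtime(a2) ^ (xtime(a3) ^ a3),
        (xtime(a0) ^ a0) ^ a1 ^ a2 ^ xtime(a3),
    ]


def mix_column(state: List[List[int]]) -> List[List[int]]:
    """Apply mix_single_column to every column of a 4 x Nb state."""
    nb = len(state[0])
    columns = [mix_single_column([state[r][c] for r in range(4)]) for c in range(nb)]
    return [[columns[c][r] for c in range(nb)] for r in range(4)]


print([hex(b) for b in mix_single_column([0xDB, 0x13, 0x53, 0x45])])
```

Columns do not interact, so one processor per column (O(Nb) processors in total) finishes the transformation in constant time.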

  32. The Add-Round-Key Transformation • XOR each state byte with the corresponding key byte: bi,j = ai,j XOR ki,j. • 1 processor: t(n) = O(Nb). • 4 x Nb processors: t(n) = O(1).
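
A minimal sketch of the key addition, with the state and the round key both held as 4 x Nb arrays of bytes. The function name and the sample values are illustrative.

```python
from typing import List


def add_round_key(state: List[List[int]], round_key: List[List[int]]) -> List[List[int]]:
    """XOR every state byte with the round-key byte at the same position."""
    return [[s ^ k for s, k in zip(s_row, k_row)] for s_row, k_row in zip(state, round_key)]


state = [[0x00, 0xFF], [0x0F, 0xF0], [0xAA, 0x55], [0x12, 0x34]]
key = [[0xFF, 0xFF], [0x0F, 0x0F], [0x55, 0x55], [0x00, 0x34]]
mixed = add_round_key(state, key)
print(mixed)
```

Applying the same round key a second time restores the original state, since XOR is its own inverse.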

  35. The Round Transformation
        For i = 1 to Nr – 1
          State ← ByteSub(State)
          State ← ShiftRow(State)
          State ← MixColumn(State)
          State ← AddRoundKey(State, Key)
        Final Round:
          State ← ByteSub(State)
          State ← ShiftRow(State)
          State ← AddRoundKey(State, Key)
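
The round structure can be sketched as a driver that receives the four sub-transformations as callables. The function `run_rounds` and its signature are hypothetical. The sketch follows the slide literally and therefore takes one key per round; the Rijndael specification additionally XORs a round key into the state before the first round, which the slide does not show.

```python
from typing import Callable, List, Sequence

State = List[List[int]]
Transform = Callable[[State], State]
KeyedTransform = Callable[[State, State], State]


def run_rounds(
    state: State,
    round_keys: Sequence[State],
    byte_sub: Transform,
    shift_row: Transform,
    mix_column: Transform,
    add_round_key: KeyedTransform,
) -> State:
    """Run Nr - 1 full rounds followed by a final round without MixColumn."""
    nr = len(round_keys)
    for i in range(1, nr):  # rounds 1 .. Nr - 1
        state = byte_sub(state)
        state = shift_row(state)
        state = mix_column(state)
        state = add_round_key(state, round_keys[i - 1])
    # Final round: same sequence, but MixColumn is skipped.
    state = byte_sub(state)
    state = shift_row(state)
    return add_round_key(state, round_keys[nr - 1])


# Usage with stand-in transformations that only record the call order.
calls: List[str] = []


def recorder(name: str):
    def transform(state: State, *extra: State) -> State:
        calls.append(name)
        return state
    return transform


run_rounds([[0]], [[[0]]] * 3, recorder("ByteSub"), recorder("ShiftRow"),
           recorder("MixColumn"), recorder("AddRoundKey"))
print(calls)
```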

  36. The Round Transformation • Sequential model: p(n) = 1, t(n) = O(Nb x Nr). • Fully parallel model: p(n) = O(Nb), t(n) = O(Nr), s(n) = O(Nb), c(n) = O(Nb x Nr). • We have achieved cost-optimality!
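
The claim can be checked by substituting the slide's bounds into the usual definitions of cost and speedup. The subscripts seq and par are notation introduced here for the sequential and fully parallel models.

```latex
\begin{align*}
t_{\mathrm{seq}}(n) &= O(N_b \cdot N_r), \qquad
p(n) = O(N_b), \qquad
t_{\mathrm{par}}(n) = O(N_r) \\
c(n) &= p(n) \cdot t_{\mathrm{par}}(n) = O(N_b) \cdot O(N_r) = O(N_b \cdot N_r) \\
s(n) &= \frac{t_{\mathrm{seq}}(n)}{t_{\mathrm{par}}(n)} = O(N_b)
\end{align*}
```

The parallel cost has the same order as the sequential running time, which is the condition for cost-optimality.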

  37. Key Expansion Algorithm
        For j = 0 to Nk – 1
          W[j] = (Key[4j], Key[4j+1], Key[4j+2], Key[4j+3])
        For j = Nk to Nb x (Nr+1) – 1
          temp = W[j-1]
          if (j % Nk == 0)
            temp = SubByte(RotByte(temp)) XOR Rcon[j/Nk]
          else if (Nk > 6 && j % Nk == 4)
            temp = SubByte(temp)
          W[j] = W[j-Nk] XOR temp
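
A runnable sketch of the sequential key expansion, using 0-based word indices. The S-box is rebuilt from the standard Rijndael construction so that the example is self-contained, and the round constant is generated by repeated doubling in GF(2^8) instead of being read from an Rcon table. Function names are illustrative, and the sample key is the 128-bit example key from the AES standard.

```python
from typing import List


def gf_mul(a: int, b: int) -> int:
    """Multiply two bytes in GF(2^8) modulo 0x11B."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return result


def rotl8(x: int, shift: int) -> int:
    return ((x << shift) | (x >> (8 - shift))) & 0xFF


def build_sbox() -> List[int]:
    """S-box: inverse in GF(2^8) followed by the affine transformation."""
    sbox = []
    for x in range(256):
        inv = 0 if x == 0 else next(y for y in range(1, 256) if gf_mul(x, y) == 1)
        sbox.append(inv ^ rotl8(inv, 1) ^ rotl8(inv, 2) ^ rotl8(inv, 3) ^ rotl8(inv, 4) ^ 0x63)
    return sbox


SBOX = build_sbox()


def key_expansion(key: bytes, nb: int, nr: int) -> List[List[int]]:
    """Expand the key into Nb * (Nr + 1) four-byte words W[0], W[1], ..."""
    nk = len(key) // 4
    w = [list(key[4 * j:4 * j + 4]) for j in range(nk)]
    rcon = 0x01  # first round constant; doubled in GF(2^8) after each use
    for j in range(nk, nb * (nr + 1)):
        temp = w[j - 1]
        if j % nk == 0:
            rotated = temp[1:] + temp[:1]        # RotByte
            temp = [SBOX[b] for b in rotated]    # SubByte (builds a new list)
            temp[0] ^= rcon
            rcon = gf_mul(rcon, 2)
        elif nk > 6 and j % nk == 4:
            temp = [SBOX[b] for b in temp]
        w.append([a ^ b for a, b in zip(w[j - nk], temp)])
    return w


words = key_expansion(bytes.fromhex("2b7e151628aed2a6abf7158809cf4f3c"), nb=4, nr=10)
print(len(words), bytes(words[4]).hex(), bytes(words[-1]).hex())
```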

  38. Key Expansion Algorithm on a Uniprocessor (Sequential) Machine • Basic algorithm structure: For j = 0 to Nk – 1 { … } (Nk iterations), then For j = Nk to Nb x (Nr+1) – 1 { … } (Nb x (Nr + 1) – Nk iterations). • Total: Nb x (Nr + 1) iterations. • 1 processor: t(n) = O(Nb x Nr).

  39. Key Expansion Algorithm on a Parallel Machine • The loop-carried dependence appears to render the algorithm impossible to parallelize. • However, XOR is a binary associative operation. • The algorithm is therefore a variant of Prefix Sum with XOR instead of +.
        For j = Nk to Nb x (Nr+1) – 1
          temp = W[j-1]
          …
          W[j] = W[j-Nk] XOR temp
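
To illustrate why associativity matters, the sketch below computes a running XOR in two ways: strictly left to right, and in two independently scanned blocks that are stitched together afterwards. It is a simplified model that covers only the XOR chain; the SubByte and RotByte steps of the real key expansion are left out, and the input values are arbitrary.

```python
from itertools import accumulate
from operator import xor

values = [0x0F, 0x33, 0x55, 0xF0]  # arbitrary sample bytes

# Left-to-right running XOR: y[j] = y[j-1] XOR x[j].
sequential = list(accumulate(values, xor))

# Two blocks scanned independently (as two groups of processors could do),
# then the total of the left block is folded into every entry of the right block.
left = list(accumulate(values[:2], xor))
right = list(accumulate(values[2:], xor))
combined = left + [left[-1] ^ r for r in right]

print([hex(v) for v in sequential])
print(combined == sequential)
```

Regrouping the operations does not change the result, which is the property a parallel prefix computation relies on.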

  42. Key Expansion Algorithm • To compute the prefix sum cost-optimally:

  43. Round Key Selection • Words W[Nb x i] through W[Nb x (i+1) – 1] form the round key for round i. • Can be interleaved with the Key Expansion phase with no additional overhead. • Layout: W[0..Nb-1], W[Nb..2Nb-1], …, W[NbNr..Nb(Nr+1)-1]
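
Round key selection amounts to slicing the expanded key into consecutive groups of Nb words, as sketched below. The function name is hypothetical, and plain integers stand in for the expanded key words.

```python
from typing import List, Sequence, TypeVar

Word = TypeVar("Word")


def select_round_keys(w: Sequence[Word], nb: int) -> List[List[Word]]:
    """Round key i consists of W[nb * i] through W[nb * (i + 1) - 1]."""
    assert len(w) % nb == 0, "expanded key must hold a whole number of round keys"
    return [list(w[nb * i:nb * (i + 1)]) for i in range(len(w) // nb)]


# Stand-in expanded key for Nb = 4, Nr = 10: 44 words labeled 0..43.
round_keys = select_round_keys(list(range(44)), nb=4)
print(len(round_keys), round_keys[0], round_keys[-1])
```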

  44. Key Schedule • Sequential Algorithm • Parallel (Prefix-Sum) Algorithm

  45. The Rijndael Cipher: Sequential Model • Key Schedule: t(n) = O(Nb x Nr) • Round Transformation: t(n) = O(Nb x Nr) • Overall: t(n) = O(Nb x Nr)

  46. The Rijndael Cipher: Parallel Model • Key Schedule • Round Transformation

  47. The Rijndael Cipher: Parallel Model • Altogether • This model does NOT yield a cost-optimal solution!

  48. Achieving Cost Optimality with a Parallel Model of Computation • Reduce the number of processors from • The Round Transformation requires time • The Key Schedule requires time

  49. Achieving Cost Optimality • Final Results: • Speedup and Cost:

  50. Summary of Results • Fastest Model • Cost-Optimal Model
