
Concurrent Error Detection Architectures for Symmetric Block Ciphers

Concurrent Error Detection Architectures for Symmetric Block Ciphers. Ramesh Karri, Kaijie Wu, Y. Kim and P. Mishra CAD Lab Department of Electrical Engineering Polytechnic University ( ramesh@india.poly.edu , kwu03@utopia.poly.edu , ykim01@utopia.poly.edu , pmishr01@utopia.poly.edu ).


Presentation Transcript


  1. Concurrent Error Detection Architectures for Symmetric Block Ciphers Ramesh Karri, Kaijie Wu, Y. Kim and P. Mishra CAD Lab Department of Electrical Engineering Polytechnic University (ramesh@india.poly.edu, kwu03@utopia.poly.edu, ykim01@utopia.poly.edu, pmishr01@utopia.poly.edu)

  2. Purpose • Investigate systematic approaches to low-cost, low-latency Concurrent Error Detection (CED) schemes for symmetric block ciphers

  3. Outline • Describe symmetric block ciphers • structure • operation • inverse properties at various levels • Discuss Advanced Encryption Standard (AES) candidate algorithms • Review hardware and time redundancy based CED approaches • Present encryption-decryption-inversion based CED approach • Implementation results • Case study • Conclusion

  4. Symmetric Block Ciphers • Basic iterative looping structure • Optional pre- and post-processing • Each round uses multiple operations and one or more round keys • Decryption consists of applying the same or inverse operations in reverse order
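The iterative structure can be illustrated with a minimal Python sketch. The cipher below is a toy, not any of the AES candidates: the three round operations, the rotation amount, and the names `encrypt`, `decrypt`, `rotl` and `rotr` are all assumptions chosen only to show that decryption applies the inverse operations in reverse order.

```python
MASK = (1 << 128) - 1  # all arithmetic is on 128-bit blocks


def rotl(x, n):
    """Rotate a 128-bit value left by n bits."""
    return ((x << n) | (x >> (128 - n))) & MASK


def rotr(x, n):
    """Rotate a 128-bit value right by n bits (inverse of rotl)."""
    return ((x >> n) | (x << (128 - n))) & MASK


def encrypt(block, round_keys):
    """Toy encryption: one round per key, three operations per round."""
    for k in round_keys:
        block = (block + k) & MASK  # operation 1: keyed addition mod 2^128
        block = rotl(block, 13)     # operation 2: fixed rotation
        block ^= k                  # operation 3: keyed exclusive or
    return block


def decrypt(block, round_keys):
    """Toy decryption: inverse operations, rounds and operations reversed."""
    for k in reversed(round_keys):
        block ^= k                  # inverse of operation 3
        block = rotr(block, 13)     # inverse of operation 2
        block = (block - k) & MASK  # inverse of operation 1
    return block
```

Calling `decrypt(encrypt(p, keys), keys)` returns `p` for any 128-bit `p`, which is the property the later CED approaches rely on.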

  5. 128-Bit Symmetric Block Cipher [Figure: block diagram of the encryption and decryption datapaths. Labels: 128-bit plain text, pre round, operations 1 to i, rounds 1 to r, round keys 1 to r, post round, 128-bit cipher text.]

  6. Operations used in 128-bit Symmetric Block Ciphers • RC6: × (mod 2^32), 5-bit rotation, exclusive or, variable rotation, + (mod 2^32) (key) • Rijndael: s-box, fixed rotation, × (GF(2^8)), exclusive or (key) • Serpent: exclusive or (key), s-box, exclusive or (linear transform) • Twofish: s-box, × (GF(2^8)), + (mod 2^32), + (mod 2^32) (key), exclusive or, 1-bit rotation • MARS: s-box, +, × (mod 2^32) (key), exclusive or, variable rotation, 5-bit and 13-bit rotation, - (mod 2^32) • s-box: non-linear bit-wise substitution • GF: Galois field

  7. Hardware architecture for Symmetric Block Ciphers • An encryption device consists of: • an encryption module • a decryption module • a key RAM • an input and an output data port • Only one operation, either encryption or decryption, is performed at a time, so a single key RAM and a single set of I/O data ports suffice

  8. CED architectures for Symmetric Block Ciphers - Motivation • Fault-based attacks use radiation or some other external source to introduce errors into an encryption device • Then inputs are applied to the faulty device and its outputs observed to obtain the keys stored in the device or discover the implementation structure • CED followed by suppression of output on fault is an effective approach against such fault-based side-channel cryptanalysis

  9. CED architectures for Symmetric Block Ciphers • Straightforward duplication of the basic hardware followed by comparison: minimum detection latency, but more than 100% area overhead • Re-computation using the basic hardware followed by comparison: minimum area overhead, but more than 100% time overhead and only transient-fault detection capability • Proposed approach: exploits the inverse relationship between encryption and decryption, with up to about 40% area overhead and low fault-detection latency

  10. Encryption-decryption Inverse relationship • Plain text = Decryption(Encryption(plain text, key), key) • Cipher text = Encryption(Decryption(cipher text, key), key) • True at the algorithm level, round level and operation level [Figure: block diagram of the encryption and decryption datapaths. Labels: plain text, pre round, operations 1 to i, rounds 1 to r, round keys 1 to r, post round, cipher text.]

  11. Approach 1: Algorithm Level CED • Output of encryption is fed to decryption; result is compared with original plain text • Low area overhead: a 128-bit register, four (2:1) 128-bit multiplexers and a 128-bit comparator • 100% performance penalty (time for encryption = 2 × # of rounds (r) × cycles per round (n)) • Large fault detection latency (2 × # of rounds (r) × cycles per round (n)) [Figure: block diagram. Labels: plain text, register, encryption module (rounds 1 to r), decryption module (rounds 1 to r), comparator, random value, to output data port.]
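A minimal Python sketch of the algorithm-level check follows. The round functions are toy stand-ins rather than a real cipher, and `encrypt_with_algorithm_ced` and its `fault_round` parameter (which flips one bit after a chosen encryption round to imitate a transient fault) are hypothetical names introduced for illustration. On a mismatch the sketch releases a random value instead of the cipher text, following the "random value" label in the slide's diagram.

```python
import secrets

MASK = (1 << 128) - 1


def enc_round(x, k):
    """Toy encryption round: keyed addition then a fixed 7-bit rotation."""
    x = (x + k) & MASK
    return ((x << 7) | (x >> 121)) & MASK


def dec_round(x, k):
    """Inverse of enc_round."""
    x = ((x >> 7) | (x << 121)) & MASK
    return (x - k) & MASK


def encrypt_with_algorithm_ced(plain, keys, fault_round=None):
    """Encrypt, then decrypt the result and compare with the saved plain text.

    Returns (output, error_detected). fault_round is a hypothetical
    fault-injection hook used only to exercise the check.
    """
    x = plain
    for i, k in enumerate(keys):
        x = enc_round(x, k)
        if i == fault_round:
            x ^= 1  # injected single-bit fault in the encryption datapath
    cipher = x

    # Check pass: the full decryption runs only after encryption finishes,
    # which is why this scheme doubles both the time and the latency.
    y = cipher
    for k in reversed(keys):
        y = dec_round(y, k)

    if y != plain:
        return secrets.randbits(128), True  # suppress the faulty cipher text
    return cipher, False
```

Because the fault-free decryption is a bijection, any corruption of the cipher text produced by the encryption pass leads to a mismatch in this model.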

  12. Approach 2: Round Level CED • Output of an encryption round is fed to the corresponding decryption round and the result is compared with the input to the encryption round • Larger area overhead: a 128-bit register, two (3:1) 128-bit multiplexers, two (2:1) 128-bit multiplexers and a 128-bit comparator • Encryption and decryption for CED can be carried out concurrently • Lower performance penalty (time for encryption = # of cycles for encryption + # of cycles for one round of decryption) • Additional delays in the critical path leading to slower clock • Low fault detection latency (2 × cycles per round) [Figure: block diagram. Labels: plain text, registers, ENC round 1 to ENC round n, DEC round n to DEC round 1, comparators, cipher, random value, to output data port.]
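The round-level check can be sketched in Python as below. This is a sequential software model under stated assumptions: the toy round functions, the name `encrypt_with_round_ced`, and the `fault_round` injection hook are hypothetical, and the model does not capture the concurrency of the hardware, where round i of decryption overlaps with round i+1 of encryption.

```python
MASK = (1 << 128) - 1


def enc_round(x, k):
    """Toy encryption round: keyed xor then a fixed 9-bit rotation."""
    x ^= k
    return ((x << 9) | (x >> 119)) & MASK


def dec_round(x, k):
    """Inverse of enc_round."""
    x = ((x >> 9) | (x << 119)) & MASK
    return x ^ k


def encrypt_with_round_ced(plain, keys, fault_round=None):
    """Check every encryption round against its matching decryption round.

    Returns (cipher, None) on success, or (None, i) when the check after
    round i fails and the output is suppressed.
    """
    x = plain
    for i, k in enumerate(keys):
        round_in = x                 # saved in the register
        x = enc_round(x, k)
        if i == fault_round:
            x ^= 1 << 5              # injected single-bit fault (illustrative)
        if dec_round(x, k) != round_in:
            return None, i           # detected within the faulty round
    return x, None
```

Unlike the algorithm-level scheme, the fault is flagged in the round where it occurs, so the remaining rounds are never executed on corrupted data.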

  13. Approach 3: Operation Level CED • Output of the encryption module's operation is fed to the decryption module's respective inverse operation and the result is compared with the input to the encryption module's operation • Largest area overhead of the three: multiple 128-bit registers, 128-bit multiplexers and 128-bit comparators with complex inter-connections • Encryption and decryption for CED can be carried out concurrently • Lowest performance penalty (time for encryption = # of cycles for encryption + # of cycles for one operation of decryption) • Lowest fault detection latency (2 cycles per operation of encryption/decryption) • Maximum delay in the critical path leading to slowest clock [Figure: block diagram. Labels: plain text, registers, operations 1 to m, encryption round r, decryption round n-r+1, comparators, intermediate cipher.]
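A Python sketch of the operation-level check, assuming each round is described as a list of (operation, inverse operation) pairs. The three toy operations, `make_ops`, `encrypt_with_operation_ced`, and the `fault_at` injection hook are hypothetical and stand in for the cipher-specific operations.

```python
MASK = (1 << 128) - 1


def rotl(x, n):
    return ((x << n) | (x >> (128 - n))) & MASK


def rotr(x, n):
    return ((x >> n) | (x << (128 - n))) & MASK


def make_ops(k):
    """Return one round as (forward, inverse) operation pairs for key k."""
    return [
        (lambda x: x ^ k, lambda x: x ^ k),                    # key xor
        (lambda x: rotl(x, 11), lambda x: rotr(x, 11)),        # fixed rotation
        (lambda x: (x + k) & MASK, lambda x: (x - k) & MASK),  # keyed addition
    ]


def encrypt_with_operation_ced(plain, keys, fault_at=None):
    """Check every single operation against its inverse.

    fault_at is an optional (round, operation) index where one bit is
    flipped. Returns (cipher, None) or (None, (round, operation)).
    """
    x = plain
    for r, k in enumerate(keys):
        for j, (op, inv) in enumerate(make_ops(k)):
            op_in = x
            x = op(x)
            if (r, j) == fault_at:
                x ^= 1               # injected single-bit fault (illustrative)
            if inv(x) != op_in:
                return None, (r, j)  # located down to the faulty operation
    return x, None
```

The finer granularity localizes the fault to one operation, at the cost of one register and one comparator per checked operation in hardware.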

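  14. Comparison of 128-Bit Symmetric Block Ciphers (all values in clock cycles)

      | Algorithm | W/o CED: encryption | Algorithm-level CED: encryption | Algorithm-level CED: detection latency | Round-level CED: encryption | Round-level CED: detection latency | Operation-level CED: encryption | Operation-level CED: detection latency |
      | --- | --- | --- | --- | --- | --- | --- | --- |
      | RC6 | 42 | 84 | 84 | 44 | 4 | 43 | 2 |
      | MARS | 32 | 64 | 64 | 34 | 4 | 33 | 2 |
      | Serpent | 64 | 128 | 128 | 66 | 4 | 65 | 2 |
      | Twofish | 34 | 68 | 68 | 36 | 4 | 35 | 2 |
      | Rijndael | 44 | 88 | 88 | 48 | 8 | 45 | 2 |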
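The cycle counts in this comparison follow from the formulas on the three approach slides. The Python sketch below reproduces them under two assumptions that are inferred from the numbers rather than stated in the slides: one checked operation takes one cycle, and the cycles per round are 2 for Serpent and 4 for Rijndael. The function name `ced_cycles` is hypothetical.

```python
def ced_cycles(enc_cycles, cycles_per_round, cycles_per_operation=1):
    """Return {level: (encryption cycles, detection latency)} for each CED level.

    enc_cycles is the encryption time without CED. cycles_per_operation=1
    is an assumption inferred from the reported figures.
    """
    return {
        # Full decryption follows full encryption.
        "algorithm": (2 * enc_cycles, 2 * enc_cycles),
        # One extra decryption round; a fault surfaces after one
        # encryption round plus one decryption round.
        "round": (enc_cycles + cycles_per_round, 2 * cycles_per_round),
        # One extra inverse operation; a fault surfaces after one
        # operation plus its inverse.
        "operation": (enc_cycles + cycles_per_operation,
                      2 * cycles_per_operation),
    }
```

For Serpent, `ced_cycles(64, 2)` gives 128/128, 66/4 and 65/2 cycles for the algorithm, round and operation levels, matching the reported row.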

  15. FPGA Implementation and Validation • Xilinx Virtex device, XCV1000BG560-6 • VHDL modeling • Functional verification: Modeltech’s Modelsim VHDL simulator • Synthesis: Synplify • Place and route: Xilinx Foundation PAR tool

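  16. Implementation Metrics • Area = number of Virtex slices used • Each Virtex slice contains two lookup tables • Each lookup table can implement any 4-input, 1-output logic function • Throughput = (128 bits × clock frequency) / (# of cycles per encryption) • Performance degradation = 1 - (throughput with CED / throughput w/o CED)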
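The throughput and degradation metrics can be sketched in Python as follows. The definitions used here (block size times clock frequency divided by cycles per encryption, and one minus the throughput ratio) are assumptions; they were chosen because they reproduce the Serpent figures reported in the implementation results (28.64 MHz and 64 cycles giving 57.28 Mbps). The function names are hypothetical.

```python
def throughput_mbps(freq_mhz, cycles_per_encryption, block_bits=128):
    """Throughput in Mbps: one block of block_bits every cycles_per_encryption cycles."""
    return block_bits * freq_mhz / cycles_per_encryption


def performance_degradation(throughput_ced, throughput_base):
    """Fraction of throughput lost when CED is added (0.25 means 25% slower)."""
    return 1 - throughput_ced / throughput_base
```

For example, `throughput_mbps(28.64, 64)` evaluates to about 57.28, and `performance_degradation(30.37, 57.28)` to about 0.47, in line with the Serpent algorithm-level entry. The results slides print the degradation as a negative percentage.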

  17. Implementation Results: Area Overhead

      | Algorithm | W/o CED: area (# of slices) | Algorithm level: area (# of slices) | Algorithm level: overhead (%) | Round level: area (# of slices) | Round level: overhead (%) | Operation level: area (# of slices) | Operation level: overhead (%) |
      | --- | --- | --- | --- | --- | --- | --- | --- |
      | Rijndael | 3973 | 4806 | 20.97 | 5100 | 28.37 | * | * |
      | RC6 | 2397 | 3028 | 26.3 | 3153 | 31.5 | 3337 | 39.20 |
      | Twofish | 3262 | 3474 | 6.49 | 3467 | 6.28 | ** | ** |
      | Serpent | 8073 | 9376 | 16.14 | 9659 | 19.15 | 9974 | 23.55 |

      * Result not available ** Result not applicable • Increasing granularity of CED increases area overhead • Decrease in fault detection latency/increase in area overhead is more significant between algorithm level and round level than between round level and operation level CED
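The overhead percentages appear to be the relative increase in slice count over the design without CED. A short Python sketch under that assumption (the function name `area_overhead_percent` is hypothetical):

```python
def area_overhead_percent(slices_base, slices_ced):
    """Relative area increase, in percent, of a CED design over the base design."""
    return 100.0 * (slices_ced - slices_base) / slices_base
```

With Serpent's 8073 slices without CED and 9974 slices for operation-level CED, this gives about 23.55%, matching the reported value.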

  18. Implementation Results: Clock period degradation

      | Algorithm | W/o CED: max freq (MHz) | Algorithm level: max freq (MHz) | Algorithm level: degradation (%) | Round level: max freq (MHz) | Round level: degradation (%) | Operation level: max freq (MHz) | Operation level: degradation (%) |
      | --- | --- | --- | --- | --- | --- | --- | --- |
      | Rijndael | 46.93 | 36.44 | -22.35 | 36.06 | -23.17 | * | * |
      | RC6 | 23.99 | 21.76 | -9.30 | 20.74 | -13.54 | 16.87 | -29.70 |
      | Twofish | 20.16 | 18.98 | -5.85 | 19.07 | -5.41 | ** | ** |
      | Serpent | 28.64 | 30.37 | 6.04 | 26.27 | -8.08 | 26.759 | -6.56 |

      * Result not available ** Result not applicable • Increasing granularity of CED decreases clock frequency • Decrease in fault detection latency/decrease in clock frequency is more significant between algorithm level and round level than between round level and operation level CED

  19. Implementation Results: Performance degradation

      | Algorithm | W/o CED: throughput (Mbps) | Algorithm level: throughput (Mbps) | Algorithm level: degradation (%) | Round level: throughput (Mbps) | Round level: degradation (%) | Operation level: throughput (Mbps) | Operation level: degradation (%) |
      | --- | --- | --- | --- | --- | --- | --- | --- |
      | Rijndael | 136.53 | 53.04 | -61.15 | 96.16 | -29.57 | * | * |
      | RC6 | 73.11 | 33.16 | -54.64 | 60.33 | -17.48 | 50.22 | -31.31 |
      | Twofish | 75.90 | 35.73 | -52.92 | 67.80 | -10.67 | ** | ** |
      | Serpent | 57.28 | 30.37 | -46.98 | 50.95 | -11.05 | 52.69 | -8.01 |

      * Result not available ** Result not applicable • Increasing granularity of CED decreases throughput • Decrease in fault detection latency/decrease in throughput is more significant between algorithm level and round level than between round level and operation level CED

  20. Case Study: 128-bit Serpent Cipher • 32 rounds, each using one round key, except for the last one, which uses two; plus a pre- and post-processing step • Operations in a round of encryption: Key-Xor → Non-linear byte substitution (S-Box) → Linear transform • Operations in a round of decryption: Inverse linear transform → Inverse non-linear byte substitution (S-Box^-1) → Key-Xor • In our implementation: • Round keys are generated and stored in a key RAM (128 × 33 bits) • One round of encryption consumes 2 cycles • Entire encryption process consumes 2 × 32 = 64 cycles
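The round ordering can be sketched in Python on a state of four 32-bit words. This is an illustration, not a Serpent implementation: the linear transform is modeled on the rotate, shift and xor pattern of Serpent's linear transformation but has not been validated against official test vectors, and the 4-bit substitution table and its nibble-by-nibble application are illustrative stand-ins for the real S-boxes. All function names are hypothetical.

```python
M32 = 0xFFFFFFFF
# Illustrative 4-bit permutation; a stand-in for the cipher's S-boxes.
SBOX = [3, 8, 15, 1, 10, 6, 5, 11, 14, 13, 4, 2, 7, 0, 9, 12]
INV_SBOX = [SBOX.index(v) for v in range(16)]


def rotl(x, n):
    return ((x << n) | (x >> (32 - n))) & M32


def rotr(x, n):
    return ((x >> n) | (x << (32 - n))) & M32


def substitute(words, box):
    """Apply a 4-bit substitution to every nibble of every word."""
    return [sum(box[(w >> s) & 0xF] << s for s in range(0, 32, 4))
            for w in words]


def linear(words):
    x0, x1, x2, x3 = words
    x0 = rotl(x0, 13); x2 = rotl(x2, 3)
    x1 ^= x0 ^ x2
    x3 ^= x2 ^ ((x0 << 3) & M32)
    x1 = rotl(x1, 1); x3 = rotl(x3, 7)
    x0 ^= x1 ^ x3
    x2 ^= x3 ^ ((x1 << 7) & M32)
    return [rotl(x0, 5), x1, rotl(x2, 22), x3]


def inv_linear(words):
    x0, x1, x2, x3 = words
    x2 = rotr(x2, 22); x0 = rotr(x0, 5)
    x2 ^= x3 ^ ((x1 << 7) & M32)
    x0 ^= x1 ^ x3
    x3 = rotr(x3, 7); x1 = rotr(x1, 1)
    x3 ^= x2 ^ ((x0 << 3) & M32)
    x1 ^= x0 ^ x2
    return [rotr(x0, 13), x1, rotr(x2, 3), x3]


def enc_round(words, key):
    """Key-Xor, then S-Box, then linear transform."""
    words = [w ^ k for w, k in zip(words, key)]
    return linear(substitute(words, SBOX))


def dec_round(words, key):
    """Inverse linear transform, then inverse S-Box, then Key-Xor."""
    words = substitute(inv_linear(words), INV_SBOX)
    return [w ^ k for w, k in zip(words, key)]
```

Each decryption step undoes the matching encryption step in reverse order, so `dec_round(enc_round(state, key), key)` returns `state`; this per-round and per-operation invertibility is what the round-level and operation-level checks compare against.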

  21. Serpent: Algorithm Level CED • Area overhead: a 128-bit register, three (2:1) 128-bit multiplexers and a 128-bit comparator • Fault detection latency: 64 cycles for encryption + 64 cycles for decryption = 128 cycles [Figure: block diagram. Labels: plain text, pre-whitening, register, encryption module (rounds 1 to 32), decryption module (rounds 1 to 32), post-whitening, comparator.]

  22. Serpent: Round Level CED • Area overhead: a 128-bit register, two (3:1) 128-bit multiplexers, a (2:1) 128-bit multiplexer and a 128-bit comparator • Fault detection latency: 2 cycles/round for encryption + 2 cycles/round for decryption = 4 cycles [Figure: block diagram. Labels: plain text, pre-whitening, register, encryption round (Key-Xor, S-Box, linear transform), decryption round (inverse linear transform, S-Box^-1, Key-Xor), decryption round output, post-whitening, comparator.]

  23. Serpent: Operation Level CED • Area overhead: multiple 128-bit registers, 128-bit multiplexers and 128-bit comparators with complex inter-connections • Fault detection latency: 2 cycles [Figure: block diagram. Labels: plain text, pre-whitening, registers, encryption round (Key-Xor, S-Box, linear transform), inverse operations (Key-Xor, S-Box^-1, inverse linear transform), decryption round output, comparators.]

  24. Conclusions • Hardware redundancy based approach requires more than 100% area overhead • Time redundancy based approach requires more than 100% time overhead • Proposed CED approach provides better fault detection latency with significantly smaller area overhead (up to about 40%) • As we move from high level to low level CED, we get: • better fault detection latency, • lower throughput, and • higher area overhead • Round level CED balances this trade-off better than algorithm level and operation level CED and is therefore the better choice
