Kris Gaj, Electrical and Computer Engineering, George Mason University
Towards secure cryptographic transformations efficient in both software and hardware: A case for synergy among math, computing, and engineering
http://ece.gmu.edu/crypto-text.htm
Criteria used to evaluate cryptographic transformations: Security, Hardware Efficiency, Software Efficiency, Flexibility
Flexibility
• Additional key sizes and block sizes
• Ability to function efficiently and securely in a wide variety of platforms and applications
  • low-end smartcards, wireless: small memory requirements
  • IPSec, ATM: small key setup time in hardware
  • B-ISDN, satellite communication: high encryption speed
Advanced Encryption Standard (AES) Contest, 1997-2001
• June 1998, Round 1: 15 candidates from USA, Canada, Belgium, France, Germany, Norway, UK, Israel, Korea, Japan, Australia, Costa Rica. Evaluation criteria: security, software efficiency, flexibility
• August 1999, Round 2: 5 final candidates (Mars, RC6, Rijndael, Serpent, Twofish). Evaluation criteria: security, hardware efficiency
• October 2000: 1 winner, Rijndael (Belgium)
Europe: NESSIE Project (New European Schemes for Signatures, Integrity, and Encryption), 2000-2002
Japan: CRYPTREC Project, 2000-2002
NESSIE, CRYPTREC
Multiple types of transformations:
• Symmetric-key block ciphers
• Stream ciphers
• Hash functions
• MACs
• Asymmetric encryption schemes
• Asymmetric digital signature schemes
• Asymmetric identification schemes
Development of a methodology for fair evaluation and comparison of algorithms belonging to the same class, including software and hardware efficiency
Speed of the final AES candidates in hardware (K. Gaj, P. Chodowiec, AES3, April 2000)
[Bar chart: speed in Mbit/s (axis 0-500) for Mars, RC6, Serpent, Rijndael, Twofish]
Survey completed by 167 participants of the Third AES Conference, April 2000
[Bar chart: number of votes (axis 0-100) for Mars, RC6, Rijndael, Serpent, Twofish]
Results of the NSA group: hardware speed
[Bar chart: speed in Mbit/s (axis 0-700), NSA ASIC vs. GMU FPGA, for RC6, Twofish, Mars, Rijndael, Serpent; bar labels shown in the chart: 606, 431, 414, 202, 177, 143, 105, 103, 61, 57]
Efficiency in software: NIST-specified platform (200 MHz Pentium Pro, Borland C++)
[Bar chart: speed in Mbit/s (axis 0-30) for 128-bit, 192-bit, and 256-bit keys; ciphers: Mars, Rijndael, Serpent, RC6, Twofish]
NIST Report: Security
[Plot: security margin (Adequate to High) vs. complexity (Simple to Complex)]
• High security margin: MARS, Serpent, Twofish
• Adequate security margin: Rijndael, RC6
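Security: Theoretical attacks better than exhaustive key search
Number of rounds in the attack / total number of rounds (axis 0-35):
• Serpent: 9 of 32 (23 rounds not attacked)
• Twofish: 6 of 16 (10 rounds not attacked)
• Mars: 11 of 16, without the 16 mixing rounds (5 rounds not attacked)
• Rijndael: 7 of 10 (3 rounds not attacked)
• RC6: 15 of 20 (5 rounds not attacked)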
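Security: Theoretical attacks better than exhaustive key search
Number of rounds in the attack / total number of rounds, as a percentage (axis 0-100%); the remainder is the share of rounds not attacked:
• Serpent: 28% (72% not attacked)
• Twofish: 38% (62% not attacked)
• Mars: 69% (31% not attacked)
• Rijndael: 70% (30% not attacked)
• RC6: 75% (25% not attacked)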
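The percentages on this slide follow from the round counts on the previous one. A minimal sketch of that arithmetic is given here; the dictionary name and the helper functions are illustrative, and the Mars entry is read as 11 of 16 core rounds (without the mixing rounds), which is an assumption about how the chart labels pair up.

```python
# (rounds covered by the best attack, total rounds), as read from the slide.
ATTACKS = {
    "Serpent": (9, 32),
    "Twofish": (6, 16),
    "Mars": (11, 16),      # assumption: core rounds only, no mixing rounds
    "Rijndael": (7, 10),
    "RC6": (15, 20),
}


def percent(part, whole):
    """Integer percentage, rounded half up, using integer arithmetic only."""
    return (200 * part + whole) // (2 * whole)


def attacked_and_margin(name):
    """Return (share of rounds attacked, share of rounds left) in percent."""
    attacked, total = ATTACKS[name]
    share = percent(attacked, total)
    return share, 100 - share


if __name__ == "__main__":
    for cipher in ATTACKS:
        share, margin = attacked_and_margin(cipher)
        print(f"{cipher:9s} attacked {share:3d}%  margin {margin:3d}%")
```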
Security and hardware speed for hash functions (GMU team, May 2002)
[Bar chart: speed in hardware in Mbit/s (axis 0-700) for SHA-1 and SHA-512; bar labels shown in the chart: 610, 359]
Complexity of the best attack:
• SHA-1: 2^80, the same as Skipjack
• SHA-512: 2^256, the same as AES-256
What’s more important: software or hardware?
Historical view (timeline 1970-2000)
Secret-key ciphers:
• 1970s: DES, optimized for hardware
• 1990s: Fast Software Encryption, ciphers optimized for software, e.g., RC5, Blowfish, RC4
• 2000: AES, optimized for software and hardware
Hash functions:
• DES-based hash functions, optimized for hardware
• MD4 family, optimized primarily for software
Software or hardware?
HARDWARE:
• security of data during transmission
• speed
• random key generation
• access control to keys
• tamper resistance (viruses, internal attacks)
SOFTWARE:
• low cost
• flexibility (new cryptoalgorithms, protection against new attacks)
Primary efficiency indicators
• Hardware: Area, Speed, Memory, Power consumption
• Software: Speed, Memory
Efficiency parameters
• Latency: time to encrypt/decrypt a single block of data (one block Mi in, one block Ci out)
• Throughput = Speed: number of bits encrypted/decrypted in a unit of time (several blocks Mi, Mi+1, Mi+2 in flight at once)
Throughput = Block_size · Number_of_blocks_processed_simultaneously / Latency
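The throughput formula above can be evaluated directly. The sketch below is only an illustration: the function name and the sample numbers (a 128-bit block and a 200 ns latency) are assumptions chosen for the example, not measurements from the talk.

```python
def throughput_gbit_per_s(block_size_bits, blocks_in_flight, latency_ns):
    """Throughput = block_size * blocks_processed_simultaneously / latency.

    Bits per nanosecond is numerically the same as Gbit/s.
    """
    return block_size_bits * blocks_in_flight / latency_ns


if __name__ == "__main__":
    # One 128-bit block at a time, 200 ns to encrypt it (hypothetical values).
    print(throughput_gbit_per_s(128, 1, 200))
    # Same latency, but ten blocks processed simultaneously.
    print(throughput_gbit_per_s(128, 10, 200))
```

Latency stays the same in both calls; only the number of blocks in flight changes, which is why throughput and latency have to be reported separately.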
What’s more important: Speed or area?
Non-Feedback Cipher Modes ECB, counter
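Comparison for non-feedback cipher modes, e.g., Counter Mode (CTR)
[Diagram: the counter values IV, IV+1, IV+2, ..., IV+N-1, IV+N are each encrypted by E, and each result is XORed with the corresponding message block to give the ciphertext block]
Ci = Mi ⊕ E(IV+i) for i = 0..N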
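A minimal sketch of the counter-mode equation Ci = Mi XOR E(IV+i) follows. The block function `toy_block_encrypt` is a stand-in built from SHA-256 purely so the example runs without external libraries; it is an assumption for illustration and not one of the ciphers discussed in the talk. All names are hypothetical.

```python
import hashlib

BLOCK_BYTES = 16  # 128-bit blocks, as in the AES candidates


def toy_block_encrypt(key: bytes, block: bytes) -> bytes:
    """Stand-in for E: a keyed hash truncated to one block (NOT a real cipher)."""
    return hashlib.sha256(key + block).digest()[:BLOCK_BYTES]


def ctr_xcrypt(key: bytes, iv: int, data: bytes) -> bytes:
    """Encrypt or decrypt: C_i = M_i XOR E(IV + i). The same call does both."""
    out = bytearray()
    for offset in range(0, len(data), BLOCK_BYTES):
        index = offset // BLOCK_BYTES
        counter = (iv + index) % (1 << (8 * BLOCK_BYTES))
        keystream = toy_block_encrypt(key, counter.to_bytes(BLOCK_BYTES, "big"))
        chunk = data[offset:offset + BLOCK_BYTES]
        out += bytes(m ^ k for m, k in zip(chunk, keystream))
    return bytes(out)


if __name__ == "__main__":
    key, iv = b"0123456789abcdef", 42
    ciphertext = ctr_xcrypt(key, iv, b"non-feedback modes parallelize well")
    print(ctr_xcrypt(key, iv, ciphertext))
```

No iteration of the loop depends on the result of another, so all blocks could be processed at once; this is the property that parallel and pipelined hardware exploits.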
Increasing speed by parallel processing
[Diagram: six encryption/decryption units operating in parallel]
Increasing speed using pipelining
[Diagram: two ciphers, Cipher 1 with rounds 1 to 10 and Cipher 2 with rounds 1 to 16, each divided into pipeline stages that fit a target clock period, e.g., 20 ns]
Speed = block_size / target_clock_period
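The relation Speed = block_size / target_clock_period describes a full pipeline in steady state. The sketch below adds the standard fill-time count for a linear pipeline (stages + blocks - 1 cycles), which is textbook pipelining rather than something stated on the slide; the function names and sample parameters are assumptions.

```python
def pipeline_cycles(n_blocks, n_stages):
    """Clock cycles for n_blocks to pass through a linear n_stages pipeline."""
    return n_stages + n_blocks - 1


def effective_speed_gbit_per_s(n_blocks, n_stages, block_bits, clock_ns):
    """Bits processed divided by elapsed time (bits/ns equals Gbit/s)."""
    elapsed_ns = pipeline_cycles(n_blocks, n_stages) * clock_ns
    return n_blocks * block_bits / elapsed_ns


if __name__ == "__main__":
    # Hypothetical 10-stage pipeline, 128-bit blocks, 20 ns clock period.
    for n in (1, 10, 1000, 1_000_000):
        print(n, effective_speed_gbit_per_s(n, 10, 128, 20))
```

For a single block the pipeline gives no gain over the basic architecture; as the number of blocks grows, the speed approaches block_size / target_clock_period.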
Pipelined operation of the encryption unit
[Timing diagram: clock cycles 1 to 16 against pipeline stages; blocks B1, B2, B3, ... enter in consecutive clock cycles and each advances by one stage per cycle]
Encryption in non-feedback modes (ECB, counter), decryption in all modes
[Plot: speed in Mbit/s (axis 0-7000) vs. area in CLB slices (axis 0-60000) for Rijndael, Serpent, RC6, Mars, Twofish, assuming clock frequency = 50 MHz; Rijndael is marked at 6.4 Gbit/s]
Our Results: Full mixed pipelining, Virtex FPGA
[Bar chart: throughput in Gbit/s (axis 0-18) for Serpent, RC6, Twofish, Rijndael; bar labels shown in the chart: 16.8, 15.2, 13.1, 12.2]
Our Results: Full mixed pipelining, Area
[Bar chart: area in CLB slices (axis 0-50000) for Serpent, Twofish, RC6, Rijndael; bar labels shown in the chart: 46,900; 21,000; 19,700; 12,600; one bar is annotated with 80 dedicated memory blocks (RAMs)]
NIST Report + GMU Report: Hardware Efficiency
Non-feedback cipher modes: ECB, CTR
[Plot: speed (Low, Medium, High) vs. area (Small, Medium, Large) for Rijndael, Serpent, Twofish, RC6, Mars]
Feedback cipher modes CBC, CFB, OFB
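Feedback cipher modes: CBC
[Diagram: each message block M1, M2, M3, ..., MN-1, MN is XORed with the previous ciphertext block (the IV for the first block) and then encrypted by E to give C1, C2, C3, ..., CN-1, CN]
C1 = E(M1 ⊕ IV)
Ci = E(Mi ⊕ Ci-1) for i = 2..N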
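The CBC recurrence can be sketched as follows. The block function here is a toy invertible map on 128-bit integers (XOR with the key, then multiplication by an odd constant modulo 2^128); it is an assumption made so the example is self-contained and it offers no security. The constant and all names are hypothetical. The modular inverse via `pow(x, -1, m)` requires Python 3.8 or later.

```python
MODULUS = 1 << 128
ODD = 0x9E3779B97F4A7C15F39CC0605CEDC835   # arbitrary odd constant (toy only)
ODD_INV = pow(ODD, -1, MODULUS)            # exists because ODD is odd


def toy_encrypt(key: int, block: int) -> int:
    """Toy invertible stand-in for E on 128-bit integers (NOT secure)."""
    return ((block ^ key) * ODD) % MODULUS


def toy_decrypt(key: int, block: int) -> int:
    return ((block * ODD_INV) % MODULUS) ^ key


def cbc_encrypt(key: int, iv: int, blocks):
    """C_1 = E(M_1 XOR IV), C_i = E(M_i XOR C_{i-1})."""
    previous, out = iv, []
    for m in blocks:
        previous = toy_encrypt(key, m ^ previous)
        out.append(previous)
    return out


def cbc_decrypt(key: int, iv: int, blocks):
    """M_i = D(C_i) XOR C_{i-1}, with C_0 = IV."""
    previous, out = iv, []
    for c in blocks:
        out.append(toy_decrypt(key, c) ^ previous)
        previous = c
    return out


if __name__ == "__main__":
    message = [1, 2, 3, 4]
    ciphertext = cbc_encrypt(0xABCDEF, 99, message)
    print(cbc_decrypt(0xABCDEF, 99, ciphertext))
```

Each call to `toy_encrypt` needs the previous ciphertext block, so encryption cannot start on block i before block i-1 is finished. Decryption has no such dependency, which matches the earlier grouping of "decryption in all modes" with the non-feedback case.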
Typical Flow Diagram of a Secret-Key Block Cipher
• Initial transformation, using Round Key[0]
• i := 1
• Cipher Round, using Round Key[i] (executed #rounds times)
• i := i+1
• i < #rounds? If yes, return to the Cipher Round
• Final transformation, using Round Key[#rounds+1]
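The flow diagram can be sketched as a generic skeleton. This is an illustration under assumptions: the function name and the callable parameters are hypothetical, the loop is written so that the cipher round runs exactly #rounds times with Round Key[1] to Round Key[#rounds], and the toy transformations in the demo are placeholders rather than any real cipher.

```python
def iterated_cipher(block, round_keys, n_rounds, initial, round_fn, final):
    """Generic block-cipher skeleton following the flow diagram.

    round_keys[0] feeds the initial transformation, round_keys[1..n_rounds]
    feed the cipher rounds, round_keys[n_rounds + 1] feeds the final one.
    """
    state = initial(block, round_keys[0])
    for i in range(1, n_rounds + 1):
        state = round_fn(state, round_keys[i])
    return final(state, round_keys[n_rounds + 1])


def xor_key(state, key):
    """Placeholder initial/final transformation: key whitening by XOR."""
    return state ^ key


def add_key_mod_256(state, key):
    """Placeholder round function on 8-bit values."""
    return (state + key) % 256


if __name__ == "__main__":
    keys = [1, 2, 3, 4]          # n_rounds + 2 round keys for n_rounds = 2
    print(iterated_cipher(0, keys, 2, xor_key, add_key_mod_256, xor_key))
```

In the basic iterative hardware architecture, the body of the `for` loop corresponds to the single block of combinational logic that is reused once per clock cycle.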
Basic iterative architecture
[Diagram: multiplexer, register, and combinational logic implementing one round, connected in a loop]
Increasing speed in cipher feedback modes
[Plot: speed vs. area, showing the basic architecture and loop unrolling with k=2, k=3, k=4, k=5]
GMU Results: Encryption in cipher feedback modes (CBC, CFB, OFB), Virtex FPGA
[Plot: throughput in Mbit/s (axis 0-500) vs. area in CLB slices (axis 0-5000) for Serpent I8, Rijndael, Twofish, Serpent I1, RC6, Mars]
NSA Results: Encryption in cipher feedback modes (CBC, CFB, OFB), ASIC, 0.5 μm CMOS
[Plot: throughput in Mbit/s (axis 0-700) vs. area [CLB slices] (axis 0-40) for Rijndael, Serpent I1, Mars, RC6, Twofish]
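Decreasing area by resource sharing
[Diagram, Before: inputs D0 and D1 each pass through their own copy of function F, giving D0' and D1'. After: a multiplexer selects D0 or D1 for a single shared F, and registers hold the results D0' and D1']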
Resource sharing: Speed vs. Area
[Plot: throughput vs. area, comparing the basic architecture with the resource-sharing architecture]
NIST Report + GMU Report: Hardware Efficiency
Feedback cipher modes: CBC, CFB
[Plot: speed (Low, Medium, High) vs. area (Small, Medium, Large) for Rijndael, Serpent, Twofish, RC6, MARS]
Aren’t software and hardware optimizations equivalent?
Efficiency in software: NIST-specified platform (200 MHz Pentium Pro, Borland C++)
[Bar chart: speed in Mbit/s (axis 0-30) for 128-bit, 192-bit, and 256-bit keys; ciphers: Rijndael, Serpent, RC6, Twofish, Mars]
Our Results: Basic architecture, Speed
[Bar chart: throughput in Mbit/s (axis 0-500) for Mars, RC6, Serpent, Rijndael, Twofish]
Basic atomic operations of secret-key ciphers and hash functions
Atomic operations used in 41 most popular secret-key ciphers (1) B. Chetwynd, MS Thesis, WPI Considered ciphers: Blowfish, CAST, CAST-128, CAST-256, CRYPTON, CS-Cipher, DEAL, DES, DFC, E2, FEAL, FROG, GOST, Hasty Pudding, ICE, IDEA, Khafre, Khufu, LOKI91, LOKI97, Lucifer, MacGuffin, MAGENTA, MARS, MISTY1, MISTY2, MMB, RC2, RC5, RC6, Rijndael, SAFER K, SAFER+, Serpent, SQUARE, SHARK, Skipjack, TEA, Twofish, WAKE, WiderWake
Major atomic operations used in 41 most popular secret-key ciphers (2) B. Chetwynd, MS Thesis, WPI
[Bar chart: number of ciphers (axis 0-40) using each operation: variable rotation, modular multiplication, GF(2^n) multiplication, modular inversion, S-box; bar labels shown in the chart: 30, 7, 7, 10, 1]
Auxiliary atomic operations used in 41 most popular secret-key ciphers (3) B. Chetwynd, MS Thesis, WPI
[Bar chart: number of ciphers (axis 0-40) using each operation: modular addition & subtraction, fixed rotation, permutation, Boolean (XOR, AND, OR, etc.); bar labels shown in the chart: 40, 25, 20, ?]
Major cipher operations (1): S-box (n x m)
Software:
• C: WORD S[1<<n]= { 0x23, 0x34, 0x56 . . . . . . . . . . . . . . }
• ASM: S DW 23H, 34H, 56H …..
Hardware:
• ROM: n-bit address, 2^n words of m bits each, m-bit output
• direct logic: outputs y1, y2, ..., ym computed from inputs x1, x2, ..., xn
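In software, an n x m S-box is a table of 2^n entries indexed by the n input bits. The sketch below applies a 4 x 4 S-box to every nibble of a 32-bit word. The table contents are a made-up permutation chosen only so the example is invertible; they are not the S-box of any cipher named in the talk, and the function names are hypothetical.

```python
N_BITS = 4                                   # S-box input width n
# Hypothetical 4x4 S-box: an affine permutation of 0..15 (illustration only).
SBOX = [(7 * x + 3) % 16 for x in range(1 << N_BITS)]

# Inverse table, built by swapping index and value.
INV_SBOX = [0] * len(SBOX)
for index, value in enumerate(SBOX):
    INV_SBOX[value] = index


def substitute_word(word, table, width_bits=32):
    """Replace each 4-bit group of `word` by its table entry."""
    result = 0
    for shift in range(0, width_bits, N_BITS):
        nibble = (word >> shift) & 0xF       # n-bit address into the table
        result |= table[nibble] << shift     # m-bit output, here m = 4
    return result


if __name__ == "__main__":
    scrambled = substitute_word(0xDEADBEEF, SBOX)
    print(hex(scrambled), hex(substitute_word(scrambled, INV_SBOX)))
```

The loop performs eight sequential lookups per 32-bit word, whereas hardware can place eight small ROMs or logic blocks side by side and evaluate them in one clock cycle.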
S-box: Memory in hardware
• 4x4 S-boxes: 32 x 4 = 128 bits processed; Memory = 32 · 2^4 · 4 bits = 2 kbit
• 8x8 S-boxes: 16 x 8 = 128 bits processed; Memory = 16 · 2^8 · 8 bits = 32 kbit = 16 · 2 kbit
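The memory figures above follow from count · 2^n · m bits for a bank of n x m S-boxes. A minimal sketch of that arithmetic, with a hypothetical helper name:

```python
def sbox_memory_bits(count, n_in, m_out):
    """Bits of ROM for `count` S-boxes with n_in address bits and m_out outputs."""
    return count * (1 << n_in) * m_out


if __name__ == "__main__":
    small = sbox_memory_bits(32, 4, 4)   # thirty-two 4x4 S-boxes
    large = sbox_memory_bits(16, 8, 8)   # sixteen 8x8 S-boxes
    print(small // 1024, "kbit vs.", large // 1024, "kbit")
```

Both banks substitute 128 bits per pass, yet the 8x8 bank needs 16 times the memory, because table size grows exponentially with the input width.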