Dynamic High-Performance Multi-Mode Architectures for AES Encryption
Eric Swankoski, Naval Research Lab
Vijay Narayanan, Penn State University
Swankoski MAPLD 2005 / B103

Background & Motivation
• Bandwidth and throughput capabilities of modern optical networks are skyrocketing
• Protecting transmitted data is becoming more and more critical
• Current encryption architectures generally cannot keep up with high-speed environments
• Single-event upset (SEU) effects are rarely, if ever, considered

Plan of Attack: FPGA Encryption
• Algorithm: Advanced Encryption Standard (AES)
  • Supports multiple key lengths
  • Supports multiple encryption modes
  • Supports multiple levels of pipelining
• Target Architecture: Xilinx FPGAs
  • Can be adapted to ASIC devices
  • Virtex-II, Virtex-4
• Target Performance: 60+ gigabits per second
  • Requires both inner-round and outer-round pipelining

The AES Algorithm
• 10 rounds of encryption with a 128-bit key (12 and 14 rounds for 192- and 256-bit keys); the block size is always 128 bits
• Four basic operations:
  • SubBytes: 8-bit substitution (16 parallel operations per round)
  • ShiftRows: byte reordering and rotation (4 parallel operations per round)
  • MixColumns: polynomial multiplication (4 parallel operations per round)
  • AddRoundKey: simple 128-bit XOR

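The three linear round operations listed above can be sketched in a few lines of Python. This is a software illustration only, not the FPGA design from the talk. The function names and the column-major state layout (index = row + 4 × column) are assumptions made for the example, and SubBytes is omitted for brevity.

```python
def xtime(b):
    """Multiply a byte by 2 in GF(2^8), reducing by x^8 + x^4 + x^3 + x + 1."""
    b <<= 1
    if b & 0x100:
        b ^= 0x11B
    return b & 0xFF


def shift_rows(state):
    """Rotate row r of the 4x4 state left by r bytes (state is column-major)."""
    return [state[r + 4 * ((c + r) % 4)] for c in range(4) for r in range(4)]


def mix_column(col):
    """Multiply one 4-byte column by the fixed MixColumns polynomial."""
    a0, a1, a2, a3 = col
    return [
        xtime(a0) ^ (xtime(a1) ^ a1) ^ a2 ^ a3,
        a0 ^ xtime(a1) ^ (xtime(a2) ^ a2) ^ a3,
        a0 ^ a1 ^ xtime(a2) ^ (xtime(a3) ^ a3),
        (xtime(a0) ^ a0) ^ a1 ^ a2 ^ xtime(a3),
    ]


def mix_columns(state):
    """Apply mix_column to each of the four columns independently."""
    out = []
    for c in range(4):
        out.extend(mix_column(state[4 * c:4 * c + 4]))
    return out


def add_round_key(state, round_key):
    """XOR the 16-byte state with the 16-byte round key."""
    return [s ^ k for s, k in zip(state, round_key)]
```

Each of the four columns (and each of the four rows) is processed independently, which is the parallelism the slide counts per round.
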
Optimizing for Performance
• Exploit all possible parallelism
• Alternative byte substitution methods
  • 1 cycle for a lookup-based substitution
  • 5 cycles for a mathematical transformation
• Utilize pipelining
  • Outer-Round: 1 cycle per round
  • Inner-Round:
    • 4 cycles per round (lookup-based byte substitution)
    • 8 cycles per round (pipelined byte substitution)

Combinatorial Byte Substitution
• Computes the actual mathematical transformation
• Conventional implementation cannot be pipelined
  • Simple (atomic) 8x8 lookup table
• Smaller than the lookup table
• Faster than the lookup table
• Utilizes a five-stage pipeline
• All internal operands are four bits wide

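For reference, the mathematical transformation behind the S-box is a multiplicative inverse in GF(2^8) followed by a fixed affine map. The sketch below computes it directly on 8-bit values. It does not reproduce the four-bit-operand, five-stage decomposition described on the slide, and all function names are illustrative assumptions.

```python
def gf_mul(a, b):
    """Multiply two bytes in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return result


def gf_inv(a):
    """Multiplicative inverse in GF(2^8), computed as a^254; 0 maps to 0."""
    if a == 0:
        return 0
    result = 1
    for _ in range(254):
        result = gf_mul(result, a)
    return result


def rotl8(x, n):
    """Rotate an 8-bit value left by n bits."""
    return ((x << n) | (x >> (8 - n))) & 0xFF


def sbox_byte(a):
    """AES SubBytes for one byte: field inverse, then the affine transformation."""
    inv = gf_inv(a)
    return inv ^ rotl8(inv, 1) ^ rotl8(inv, 2) ^ rotl8(inv, 3) ^ rotl8(inv, 4) ^ 0x63


# The equivalent 256-entry lookup table (the "atomic" 8x8 alternative).
SBOX = [sbox_byte(a) for a in range(256)]
```

The last line builds the table that a lookup-based implementation would store, so the two approaches on the slide produce identical outputs and differ only in area and timing.
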
Encryption Round Diagram
• Atomic S-Box: 40 pipeline stages
• Combinatorial S-Box: 76 pipeline stages
• Needs a constant stream to be effective
• Parallel Key Scheduling
  • No performance penalty
• Offline Key Scheduling
  • Precomputed keys can be stored in registers

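A rough back-of-the-envelope check relates the pipeline depths above to the 60 Gbps target. It assumes a fully pipelined core that accepts one 128-bit block per clock cycle once the pipeline is full. The function names are illustrative, and the clock figure is derived from the target rather than taken from measured results.

```python
BLOCK_BITS = 128  # AES block size in bits


def required_clock_hz(target_bps):
    """Clock rate needed if one block completes every cycle (assumption)."""
    return target_bps / BLOCK_BITS


def latency_ns(stages, clock_hz):
    """Time for one block to traverse the whole pipeline, in nanoseconds."""
    return stages / clock_hz * 1e9


clock = required_clock_hz(60e9)      # 468.75 MHz for 60 Gbps
atomic_latency = latency_ns(40, clock)         # 40-stage atomic S-box pipeline
combinatorial_latency = latency_ns(76, clock)  # 76-stage combinatorial pipeline
```

Under this assumption the deeper pipeline does not reduce throughput at a given clock rate; it only adds latency, which is why it needs a constant input stream to pay off.
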
Counter (CTR) Mode
• Effectively converts AES into a stream cipher
• High security – similar to CBC
• Supports inner-round and outer-round pipelining
• No error propagation – errors are completely isolated

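The structure of CTR mode can be sketched as follows. To keep the example dependency-free, `toy_block_encrypt` is a hypothetical SHA-256-based stand-in for the AES block function; it is not AES and is used only to show the data flow. The nonce and counter layout is likewise an assumption.

```python
import hashlib


def toy_block_encrypt(key, block):
    """Hypothetical stand-in for AES: a keyed 16-byte pseudorandom function."""
    return hashlib.sha256(key + block).digest()[:16]


def ctr_xcrypt(key, nonce, data):
    """Encrypt or decrypt: XOR data with a keystream derived from a counter."""
    out = bytearray()
    for i in range(0, len(data), 16):
        counter_block = nonce + (i // 16).to_bytes(8, "big")
        keystream = toy_block_encrypt(key, counter_block)
        out.extend(b ^ k for b, k in zip(data[i:i + 16], keystream))
    return bytes(out)
```

Every keystream block depends only on the key and the counter, so blocks can enter a pipeline back to back, and a flipped ciphertext bit changes only the same bit of the recovered plaintext.
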
Cipher Block Chaining (CBC) Mode
• Most secure – no patterns are observed
• Cannot be pipelined
• 100% downstream corruption resulting from data loss or single-event upsets (SEUs) during encryption
• Errors are isolated during decryption

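The serial dependency that prevents pipelining can be seen in a short sketch of CBC encryption. `toy_block_encrypt` is a hypothetical, non-invertible SHA-256-based stand-in for AES, so the example shows only the encryption-side data flow; it does not model decryption or the SEU behavior listed above.

```python
import hashlib


def toy_block_encrypt(key, block):
    """Hypothetical stand-in for AES: a keyed 16-byte pseudorandom function."""
    return hashlib.sha256(key + block).digest()[:16]


def cbc_encrypt(key, iv, data):
    """Return the list of ciphertext blocks; each one feeds the next."""
    previous = iv
    blocks = []
    for i in range(0, len(data), 16):
        mixed = bytes(p ^ c for p, c in zip(data[i:i + 16], previous))
        previous = toy_block_encrypt(key, mixed)  # must finish before the next block starts
        blocks.append(previous)
    return blocks
```

Because block n cannot start until block n - 1 has left the cipher, a deep pipeline sits mostly empty in this mode, and a change to an early block alters every ciphertext block after it.
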
Electronic Codebook (ECB) Mode
• Supports full pipelining
• No error propagation – errors are completely isolated
• Least secure – identical input gives identical output
• Patterns observable in video and image data

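The pattern leak is easy to reproduce in a sketch. As before, `toy_block_encrypt` is a hypothetical SHA-256-based stand-in for AES, used only because each block is processed independently in the same way.

```python
import hashlib


def toy_block_encrypt(key, block):
    """Hypothetical stand-in for AES: a keyed 16-byte pseudorandom function."""
    return hashlib.sha256(key + block).digest()[:16]


def ecb_encrypt(key, data):
    """Encrypt each 16-byte block independently with the same key."""
    return [toy_block_encrypt(key, data[i:i + 16]) for i in range(0, len(data), 16)]


# Three identical plaintext blocks followed by one different block.
plaintext = b"A" * 48 + b"B" * 16
ciphertext_blocks = ecb_encrypt(b"k" * 16, plaintext)
distinct_blocks = len(set(ciphertext_blocks))
```

The three identical input blocks yield three identical output blocks, which is the kind of repetition that makes flat regions of an image visible after encryption.
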
Staggered CBC Mode
• Pipelined with Output Feedback
• Each encrypted block n depends on itself and the block (n – x) where x is the latency of the pipeline
• Maintains security while mitigating some error propagation problems

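One possible reading of the staggered scheme is that block n is chained to ciphertext block n - x, which amounts to x interleaved CBC chains. The sketch below follows that reading. The use of x separate initialization vectors for the first x blocks is an assumption, and `toy_block_encrypt` is a hypothetical SHA-256-based stand-in for AES rather than the real cipher.

```python
import hashlib


def toy_block_encrypt(key, block):
    """Hypothetical stand-in for AES: a keyed 16-byte pseudorandom function."""
    return hashlib.sha256(key + block).digest()[:16]


def scbc_encrypt(key, ivs, data):
    """Chain block n to ciphertext block n - x, where x = len(ivs)."""
    x = len(ivs)
    out = []
    for n in range(len(data) // 16):
        feedback = ivs[n] if n < x else out[n - x]  # assumed: one IV per chain
        block = data[16 * n:16 * n + 16]
        mixed = bytes(p ^ f for p, f in zip(block, feedback))
        out.append(toy_block_encrypt(key, mixed))
    return out
```

With x equal to the pipeline latency, the feedback value for block n is already available when block n enters the pipeline, so the pipeline stays full while each chain still behaves like CBC. A disturbance in block n reaches only blocks n + x, n + 2x, and so on.
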
More Challenges
• Error-Tolerant Encryption
• Maintaining High Security
• Maintaining High Performance

Error-Tolerant Encryption
• Are errors acceptable?
  • Possibly, but better to assume not
• How do the multiple modes of encryption deal with upsets?
• Is there a benefit to triple modular redundancy (TMR)?
  • Is it what we expect?

Error-Tolerant Encryption
• CTR and ECB encryption isolate errors
  • Transmission integrity is largely preserved even without SEU mitigation
  • TMR can ensure 100% transmission integrity
• TMR is REQUIRED for CBC encryption

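The voting step at the heart of TMR can be sketched as a bitwise majority over three redundant outputs. This is a generic software illustration with an assumed function name, not the voter circuit used in the design.

```python
def majority_vote(a, b, c):
    """Bitwise 2-of-3 majority over three equal-length byte strings."""
    return bytes((x & y) | (x & z) | (y & z) for x, y, z in zip(a, b, c))


# One of three copies suffers a single-bit upset; the vote masks it.
good = bytes(range(16))
upset = bytearray(good)
upset[5] ^= 0x08
voted = majority_vote(good, bytes(upset), good)
```

A single upset copy is outvoted by the other two. The same upset in two copies at once would win the vote, so the guarantee holds only while at most one copy is faulty at a time.
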
Error-Tolerant Encryption
• Image 1: Error-Free Plaintext Image
  • Before Encryption / After Decryption
  • CTR, ECB, or CBC with mitigation
• Image 2: Decrypted Plaintext Image
  • One corrupted block
  • CTR or ECB without mitigation
• Image 3: Decrypted Plaintext Image
  • One block corrupted during encryption
  • CBC without mitigation

Maintaining High Security
• How do the multiple modes of encryption affect security?
• Is physical protection of the key necessary?
  • Depends on the environment
• How is throughput affected by increased security?
  • Hopefully, not at all…

Maintaining High Security
• ECB-encrypted image has observable patterns
• CTR/CBC/SCBC (staggered CBC) encryption looks like random noise

Maintaining High Security
• Physical Key Protection
  • Not required in aerospace applications
• Power Analysis / Soft Attacks
  • Countermeasures not mode specific
• Throughput Effects
  • ECB & CTR far outperform CBC
  • Why is CBC an official mode?

System-Level Diagram
• Supports ECB, CTR, CBC, and SCBC modes
• Supports two types of TMR
  • System: triplicates all control, key hardware, and mode logic
  • Encryption: triplicates only encryption and key scheduling hardware

Performance Results – Virtex-4
• Key Scheduling
  • Offline uses precomputed and stored keys (compile or design time)
  • Online uses dynamically computed keys (run time)
• Significant performance improvement for combinatorial byte substitution in pipelined mode
• Virtex-II Pro performs better with the ROM implementation (56.42 & 60.35 Gbps)
• Better CBC performance can be achieved through other architectures

Lessons Learned
• Don’t try to over-optimize FPGA code
  • Returns diminish quickly
  • Sometimes less is more
• Know your synthesis tool
  • Now why did it do THAT?
• Check your system’s memory
  • RAM does fail at inopportune times…
  • ESPECIALLY if it has a lifetime warranty

Lessons Learned
• Over-optimization
  • In a highly pipelined FPGA design, routing plays a MAJOR role in the clock frequency
    • 70%-80% of the total delay
  • What would work in an ASIC (or in theory, or on paper…) might actually make things worse
  • Manual floorplanning and P&R might help, but usually provide minimal (if any) improvement
• Moral? – Try reducing the pipeline depth as well as increasing it; it just might help!