Study of AES Encryption/Decryption Optimizations EE800 Term Project Nathan Windels
Outline • Introduction • AES Algorithm • Areas of Optimization • Progress/Conclusion
Introduction Three major implementation methods:
• Software
-Typically much slower than hardware implementations.
• FPGA
-Implemented as a hardware module connected directly to pins.
-Implemented as a peripheral to a soft-core processor (communicates via the on-chip bus).
-Implemented as tightly coupled hardware exposed through an extended instruction set.
• Custom Hardware (ASIC)
Introduction (2) • High-throughput implementations are mainly used in high-end devices such as accelerator cards for e-commerce services and secure trunk communications. • These implementations typically unroll the round loop of the AES algorithm and pipeline the 128-bit datapath. • Although they typically achieve very high throughput, their area is very large.
Introduction (3) • The 32-bit AES implementations mainly multiplex the 128-bit datapath onto 32 bits. • This reduces circuit area at the expense of speed. • This type of implementation is well suited to embedded applications. • My goal is to provide synthesis results for the different implementations, as well as simulation/implementation results if time permits.
AES Algorithm: Input (figure; labels: "to Encryption Process", "to Key Schedule")
AES Algorithm: Data Path (figure; label: "From Key Schedule")
AES Algorithm: Data Path – Add Key (figure; labels: "Data", "Round Key")
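The Add Key step named on the slide above combines the data with the round key. In AES this combination is a byte-wise XOR, so a minimal sketch is short. The function name `add_round_key` is chosen here for illustration, and the example values are taken from the worked example in the AES standard (FIPS-197, Appendix B), not from the slides.

```python
def add_round_key(state: bytes, round_key: bytes) -> bytes:
    """XOR a 16-byte state with a 16-byte round key, byte by byte."""
    if len(state) != 16 or len(round_key) != 16:
        raise ValueError("state and round key must both be 16 bytes")
    return bytes(s ^ k for s, k in zip(state, round_key))


# Example input block and cipher key from FIPS-197 Appendix B.
data = bytes.fromhex("3243f6a8885a308d313198a2e0370734")
key0 = bytes.fromhex("2b7e151628aed2a6abf7158809cf4f3c")

mixed = add_round_key(data, key0)
print(mixed.hex())
```

Because XOR is its own inverse, applying the same round key a second time restores the original data, which is why the same hardware can serve both encryption and decryption for this step.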
AES Algorithm: Key Schedule • Without going into too much detail, the round keys are generated in a ‘similar’ way. • In each round, a new round key is generated from the previous one. • This key is added (XORed) to the data at the end of the round.
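The round-by-round derivation described above can be sketched in software. This is a minimal sketch assuming a 128-bit key (the slides do not state the key size); the names `gf_mul`, `sbox_entry`, `next_round_key`, and `expand_key_128` are illustrative, and the S-box is computed rather than typed in so the block stays self-contained.

```python
def gf_mul(a: int, b: int) -> int:
    """Multiply two bytes in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return result


def sbox_entry(a: int) -> int:
    """AES S-box value: multiplicative inverse followed by the affine map."""
    inv = 0
    if a:
        inv = 1
        for _ in range(254):          # a^254 is the inverse of a in GF(2^8)
            inv = gf_mul(inv, a)
    out = inv
    for shift in range(1, 5):         # XOR with the inverse rotated left by 1..4
        out ^= ((inv << shift) | (inv >> (8 - shift))) & 0xFF
    return out ^ 0x63


SBOX = [sbox_entry(a) for a in range(256)]


def next_round_key(prev: bytes, rcon: int) -> bytes:
    """Derive the next AES-128 round key from the previous round key."""
    # Rotate the last word by one byte, substitute it, add the round constant.
    t = [SBOX[b] for b in prev[13:16] + prev[12:13]]
    t[0] ^= rcon
    out = []
    for i in range(16):
        # Each new word is the matching old word XOR the word just produced.
        t[i % 4] ^= prev[i]
        out.append(t[i % 4])
    return bytes(out)


def expand_key_128(key: bytes) -> list:
    """Return all 11 round keys for AES-128, starting with the key itself."""
    keys = [key]
    rcon = 1
    for _ in range(10):
        keys.append(next_round_key(keys[-1], rcon))
        rcon = gf_mul(rcon, 2)        # round constants double in GF(2^8)
    return keys


round_keys = expand_key_128(bytes.fromhex("2b7e151628aed2a6abf7158809cf4f3c"))
print(round_keys[1].hex())
```

Since `next_round_key` needs only the previous round key and a round constant, the same step can either be run ahead of time to fill a table or run alongside each encryption round.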
Optimization: Key Expansion
• Pre-calculated in software and then stored in hardware (loaded when needed)
-Low area
-Hardware has to wait if new key is introduced (not good for continually changing key)
• Calculated in parallel with the corresponding iteration
-This allows for a changing key to be calculated on the fly
-Extra hardware/area cost (not good for (embedded) fixed key applications)
• Calculated in hardware ahead of time and stored
-High hardware cost – introduces latency when a new key is introduced
-The circuit can be ‘turned off’ in ASIC solution
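To put a rough number on the two "stored" options above, the sketch below computes how much memory a full set of round keys occupies. It assumes every round key is kept, including the initial one, and uses the standard AES round counts; the name `round_key_storage_bits` is illustrative only.

```python
BLOCK_BITS = 128                              # every AES round key is one block wide
ROUNDS = {128: 10, 192: 12, 256: 14}          # rounds per key size (AES standard)


def round_key_storage_bits(key_bits: int) -> int:
    """Bits needed to hold all round keys, counting the initial key addition."""
    return (ROUNDS[key_bits] + 1) * BLOCK_BITS


for key_bits in ROUNDS:
    bits = round_key_storage_bits(key_bits)
    print(f"AES-{key_bits}: {ROUNDS[key_bits] + 1} round keys, {bits} bits ({bits // 8} bytes)")
```

By comparison, the on-the-fly option only has to hold the round key currently in use, which is where its storage saving comes from, at the cost of the extra expansion logic.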
Optimization: Shift Row • 16x8 memory with shifting ability • 2 shift registers • Rearrangement of wires (requires no extra area, but may cause congestion in the wiring)
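The "rearrangement of wires" option works because Shift Row is a fixed permutation of the 16 state bytes, with no arithmetic involved. The sketch below assumes the usual AES column-major byte order (byte index = row + 4 × column); `SHIFT_ROWS_PERM` and `shift_rows` are illustrative names.

```python
# Output byte (row r, column c) is wired to input byte (row r, column (c + r) mod 4),
# i.e. row r is rotated left by r positions.
SHIFT_ROWS_PERM = [r + 4 * ((c + r) % 4) for c in range(4) for r in range(4)]


def shift_rows(state: bytes) -> bytes:
    """Apply Shift Row as a pure re-indexing of the 16 state bytes."""
    return bytes(state[src] for src in SHIFT_ROWS_PERM)


print(SHIFT_ROWS_PERM)
print(shift_rows(bytes.fromhex("d42711aee0bf98f1b8b45de51e415230")).hex())
```

In hardware the permutation table corresponds to how the wires are routed between two stages, which is why it costs no logic but can add routing congestion.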
Optimization: Substitute Byte
• LUT
-Easy to implement and understand. Depending on the application, it would be a good idea to use the on-chip ROM rather than logic elements (LEs).
-Uses lots of resources
• Combinational logic
-No need for memories (an XOR circuit could be good in an FPGA, as we’ve seen earlier in this class)
-Slow due to the complex circuit.
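The two Substitute Byte options above compute the same function, which the following sketch illustrates. The computed path (field inverse followed by an affine XOR network) is only a software stand-in for combinational logic, not a gate-level design; the names `gf_mul`, `gf_inverse`, `sub_byte_computed`, and `sub_bytes_lut` are illustrative.

```python
def gf_mul(a: int, b: int) -> int:
    """Multiply two bytes in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return result


def gf_inverse(a: int) -> int:
    """Multiplicative inverse in GF(2^8); 0 maps to 0 by AES convention."""
    if a == 0:
        return 0
    inv = 1
    for _ in range(254):              # a^254 == a^-1 in GF(2^8)
        inv = gf_mul(inv, a)
    return inv


def rotl8(x: int, n: int) -> int:
    """Rotate a byte left by n bits."""
    return ((x << n) | (x >> (8 - n))) & 0xFF


def sub_byte_computed(a: int) -> int:
    """'Combinational' option: inverse, then an XOR-only affine transform."""
    b = gf_inverse(a)
    return b ^ rotl8(b, 1) ^ rotl8(b, 2) ^ rotl8(b, 3) ^ rotl8(b, 4) ^ 0x63


# 'LUT' option: evaluate once, then every substitution is a single table read.
SBOX = [sub_byte_computed(a) for a in range(256)]


def sub_bytes_lut(state: bytes) -> bytes:
    """Substitute all 16 state bytes through the 256-entry table."""
    return bytes(SBOX[b] for b in state)


print(hex(SBOX[0x53]))
print(sub_bytes_lut(bytes.fromhex("193de3bea0f4e22b9ac68d2ae9f84808")).hex())
```

The table holds 256 entries of 8 bits each (2048 bits), which is the memory the LUT option spends to avoid the inverse computation.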
Optimization: Mix Columns
• Multiplication and XOR done in combinational logic
-Easy to implement
-Could be slow and cover a large area
• Combine the MixCols multiplication with the S-box and leave the XOR in the LEs
-Uses very few LEs. Removes multiplication from the equation.
-Quadruples the size of the necessary ROM, which could be a drawback
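Both Mix Columns options can be compared on a single column. The sketch below assumes the combined option is realized as a widened table whose entries hold the S-box output already multiplied by 2, 1, 1, and 3 (commonly called a T-table); that arrangement is an assumption here, since the slides do not spell it out. All names (`xtime`, `mix_column`, `T0`, `sub_mix_column_table`) are illustrative.

```python
def xtime(a: int) -> int:
    """Multiply a byte by 2 in GF(2^8)."""
    a <<= 1
    return (a ^ 0x11B) & 0xFF if a & 0x100 else a


def mix_column(col):
    """Option 1: multiply by 2 and 3 in logic, then XOR the terms."""
    out = []
    for i in range(4):
        a, b, c, d = (col[(i + k) % 4] for k in range(4))
        out.append(xtime(a) ^ (xtime(b) ^ b) ^ c ^ d)   # 2a + 3b + c + d
    return out


def gf_mul(a: int, b: int) -> int:
    """General GF(2^8) multiply, used only to build the S-box below."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a = xtime(a)
        b >>= 1
    return result


def sbox_entry(a: int) -> int:
    """AES S-box value: field inverse followed by the affine map."""
    inv = 0
    if a:
        inv = 1
        for _ in range(254):
            inv = gf_mul(inv, a)
    out = inv
    for shift in range(1, 5):
        out ^= ((inv << shift) | (inv >> (8 - shift))) & 0xFF
    return out ^ 0x63


SBOX = [sbox_entry(a) for a in range(256)]

# Option 2: each entry stores (2*S[x], S[x], S[x], 3*S[x]), 32 bits instead of 8.
T0 = [(xtime(s), s, s, xtime(s) ^ s) for s in SBOX]


def sub_mix_column_table(col):
    """Substitute and mix one column using table reads and XORs only."""
    out = [0, 0, 0, 0]
    for j, x in enumerate(col):
        entry = T0[x]
        for i in range(4):
            out[i] ^= entry[(i - j) % 4]   # rotate the entry for input position j
    return out


print([hex(v) for v in mix_column([0xD4, 0xBF, 0x5D, 0x30])])
print([hex(v) for v in sub_mix_column_table([0x19, 0xF4, 0x8D, 0x08])])
```

The plain S-box needs 256 × 8 = 2048 bits, while the widened table needs 256 × 32 = 8192 bits, which matches the fourfold ROM growth noted above.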
Conclusion: So Far.... • Studied papers that address several of the optimizations listed above • Decided on an approach to modify and test existing code • Began modifying the code that I’ve decided to use as a starting point • ...don’t quite have synthesis results yet...
Papers
• “Embedded a Low Area 32-bit AES for Image Encryption/Decryption Application”
• “Exploring HW/SW Co-Design of AES Algorithm Using Custom Instructions”
• “Improved Method to Increase AES System Speed”
• “An AES Tightly Coupled Hardware Accelerator in an FPGA-based Embedded Processor Core”
• “DSP’s, BRAM’s and Pinch of Logic: New Recipes for AES on FPGA’s”