Final presentation: Encryption/Decryption on embedded system. Winter 2013, Part A. Supervisor: Ina Rivkin. Students: Chen Ponchek, Liel Shoshan.
Motivation • Nowadays, there are many portable storage devices with large memories that contain valuable data (such as disk-on-key drives, tablets, etc.). • Therefore, there is a concrete need for portable cryptography systems suited to such devices. • In our project, we aim to provide a system that answers this need.
Project Goal • Main goal: • Implement an embedded data cryptography system using the AES algorithm, and find a suitable architecture for a portable system.
Project Specifications • Implementation on a Zynq SoPC by Xilinx. • Suitable for portable systems (disk-on-key, tablets, etc.): a low-power system. • Transparent system (while storing/loading files): the cryptography system must not create traffic bottlenecks. • Finding the best architecture according to the requirements above: • Profiling the AES algorithm. • Finding the balance between using the ARM processor and using the FPGA (the hardware accelerator needs more power).
AES algorithm • The Advanced Encryption Standard, also known as “Rijndael”, is a block cipher. • The cipher is iterative, fast and convenient to implement in both software and hardware, and it does not have high memory requirements. • Most of the AES calculations are done in 10 rounds (the round count for a 128-bit key). • The data block, called the state, is represented as a 2D, 4x4 array of bytes. • Each round uses a “Round Key” produced by the key-expansion process. • Each round consists of 4 steps (the final round omits MixColumns): • SubBytes • ShiftRows • MixColumns • AddRoundKey
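The state layout and two of the simpler round steps can be sketched in C. This is a minimal illustration, not the project's code: the names `aes_state_t`, `add_round_key` and `shift_rows` are hypothetical, and the `state[row][col]` indexing is an assumption.

```c
#include <stdint.h>

/* AES state: 4 rows x 4 columns of bytes, indexed as state[row][col]
 * (the indexing convention is an assumption of this sketch). */
typedef uint8_t aes_state_t[4][4];

/* AddRoundKey: XOR every state byte with the matching round-key byte. */
void add_round_key(aes_state_t state, aes_state_t round_key)
{
    for (int r = 0; r < 4; r++)
        for (int c = 0; c < 4; c++)
            state[r][c] ^= round_key[r][c];
}

/* ShiftRows: rotate row r to the left by r positions (row 0 is unchanged). */
void shift_rows(aes_state_t state)
{
    for (int r = 1; r < 4; r++) {
        uint8_t tmp[4];
        for (int c = 0; c < 4; c++)
            tmp[c] = state[r][(c + r) % 4];
        for (int c = 0; c < 4; c++)
            state[r][c] = tmp[c];
    }
}
```

Because AddRoundKey is a plain XOR, applying it twice with the same round key restores the state, which is why the same function can serve both encryption and decryption.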
System Block Diagram • Figure: ZedBoard with the Zynq (PS and PL), DDR, BRAM and an RS232 UART; decrypted and encrypted data paths.
System Block Diagram, project part A • Figure: ZedBoard with the Zynq (PS and PL), DDR, BRAM and an RS232 UART; AES runs in software; decrypted and encrypted data paths. • Implementation of the AES algorithm on the ARM and code optimization.
Software Engineering • Each step is implemented as a separate function. • Each function is independent of the other functions. • The program can encrypt and decrypt the data.
Software Engineering • The input data is entered by the user via a PuTTY terminal. • The program outputs the data after encryption, and the encrypted data after decryption.
Development stages • XPS/EDK, configuring the ARM system: • Creating the ARM processor interface to the RS-232 UART. • Adding the BRAM and BRAM Controller IP and connecting them to the AXI Interconnect.
Development stages • PlanAhead: • Creating the top-level entity in VHDL. • Generating the bitstream. • Exporting the hardware to SDK.
Development stages • SDK: • Generating the software platform project: • Creating the Board Support Package (BSP). • Selecting the memory: DDR vs. BRAM. • Testing in hardware: • Downloading the application to the ARM processor. • Running and profiling the application.
Profiling: BRAM vs. DDR • Encryption and decryption of 10x16 bytes: • BRAM: 111.54 ms • DDR: 2.754 ms
Software optimization #1 • The MixColumns and InvMixColumns functions take around 65%-70% of the total execution time. • Improving them will significantly reduce the delay.
Software optimization #1 • We implement the MixColumns function using lookup tables (LUTs) instead of arithmetic operations and if/else statements. • This should speed up the calculations.
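The difference between the arithmetic and the LUT approach can be sketched in C as follows. This is an illustration under assumptions, not the project's actual code: the function and table names are hypothetical, and the tables are filled at start-up here only to keep the example short (a real build might store them as constants).

```c
#include <stdint.h>

/* Arithmetic version: multiply by 2 in GF(2^8), with the conditional
 * reduction by the AES polynomial (0x1B) that the LUT version avoids. */
uint8_t gf_mul2(uint8_t x)
{
    if (x & 0x80)
        return (uint8_t)((x << 1) ^ 0x1B);
    else
        return (uint8_t)(x << 1);
}

/* Lookup tables for multiplication by 2 and by 3 (hypothetical names). */
static uint8_t mul2_lut[256];
static uint8_t mul3_lut[256];

/* Fill the tables once; 3*x equals (2*x) XOR x in GF(2^8). */
void init_mix_luts(void)
{
    for (int i = 0; i < 256; i++) {
        mul2_lut[i] = gf_mul2((uint8_t)i);
        mul3_lut[i] = (uint8_t)(mul2_lut[i] ^ i);
    }
}

/* MixColumns on a single column, using only table lookups and XORs. */
void mix_column_lut(uint8_t col[4])
{
    uint8_t a0 = col[0], a1 = col[1], a2 = col[2], a3 = col[3];
    col[0] = (uint8_t)(mul2_lut[a0] ^ mul3_lut[a1] ^ a2 ^ a3);
    col[1] = (uint8_t)(a0 ^ mul2_lut[a1] ^ mul3_lut[a2] ^ a3);
    col[2] = (uint8_t)(a0 ^ a1 ^ mul2_lut[a2] ^ mul3_lut[a3]);
    col[3] = (uint8_t)(mul3_lut[a0] ^ a1 ^ a2 ^ mul2_lut[a3]);
}
```

After `init_mix_luts()` has run, each output byte costs two table reads and three XORs, with no branch in the inner computation.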
Profiling: BRAM vs. DDR • With an improved MixColumns implementation: • BRAM: 88.06 ms • DDR: 2.626 ms
Software optimization #1 • BRAM: • The total execution time decreased from 111.5 ms to 88 ms. • A decrease of 21%. • DDR: • The total execution time decreased from 2.754 ms to 2.626 ms. • A decrease of 5%.
Software optimization #2 • We implement the MixColumns and InvMixColumns functions using LUTs and without for loops. • This should speed up the calculations even further.
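A loop-free InvMixColumns built on LUTs might look like the C sketch below. It is an assumption-laden illustration rather than the project's code: the names are hypothetical, the 16-byte state is assumed to be stored column by column, and the tables are generated at start-up by a generic multiply that is used only for that purpose.

```c
#include <stdint.h>

/* Tables for multiplication by 9, 11, 13 and 14 in GF(2^8) (hypothetical names). */
static uint8_t mul9[256], mul11[256], mul13[256], mul14[256];

/* Generic GF(2^8) multiply (shift-and-add); used only to build the tables. */
static uint8_t gf_mul(uint8_t a, uint8_t b)
{
    uint8_t p = 0;
    while (b) {
        if (b & 1)
            p ^= a;
        a = (uint8_t)((a << 1) ^ ((a & 0x80) ? 0x1B : 0x00));
        b >>= 1;
    }
    return p;
}

void init_inv_mix_luts(void)
{
    for (int i = 0; i < 256; i++) {
        mul9[i]  = gf_mul((uint8_t)i, 9);
        mul11[i] = gf_mul((uint8_t)i, 11);
        mul13[i] = gf_mul((uint8_t)i, 13);
        mul14[i] = gf_mul((uint8_t)i, 14);
    }
}

/* One column, with all four output bytes written out explicitly. */
static void inv_mix_column(uint8_t *c)
{
    uint8_t a0 = c[0], a1 = c[1], a2 = c[2], a3 = c[3];
    c[0] = (uint8_t)(mul14[a0] ^ mul11[a1] ^ mul13[a2] ^ mul9[a3]);
    c[1] = (uint8_t)(mul9[a0]  ^ mul14[a1] ^ mul11[a2] ^ mul13[a3]);
    c[2] = (uint8_t)(mul13[a0] ^ mul9[a1]  ^ mul14[a2] ^ mul11[a3]);
    c[3] = (uint8_t)(mul11[a0] ^ mul13[a1] ^ mul9[a2]  ^ mul14[a3]);
}

/* Assumed layout: bytes 4*c .. 4*c+3 of state form column c.
 * The four columns are handled by four explicit calls, not a loop. */
void inv_mix_columns_unrolled(uint8_t state[16])
{
    inv_mix_column(state + 0);
    inv_mix_column(state + 4);
    inv_mix_column(state + 8);
    inv_mix_column(state + 12);
}
```

Writing the columns out removes the loop counter and its branch from the hot path, at the price of slightly larger code.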
Profiling: BRAM vs. DDR • With optimized MixColumns and InvMixColumns implementations: • BRAM: 47.427 ms • DDR: 1.145 ms
Software optimization #2 • BRAM: • The total execution time decreased from 111.5 ms to 47.427 ms. • A decrease of 57%. • DDR: • The total execution time decreased from 2.754 ms to 1.145 ms. • A decrease of 58%.
Hardware optimization • The ARM processor clock: • At first, we used the default clock rate, which was 160 MHz. • We now set the clock rate to 225 MHz (the maximum clock rate).
Profiling: BRAM vs. DDR • With a higher clock rate (160 MHz to 225 MHz): • BRAM: 34.798 ms • DDR: 0.819 ms
Hardware optimization • BRAM: • The total execution time decreased from 111.5 ms to 34.8 ms. • A decrease of 69%. • DDR: • The total execution time decreased from 2.754 ms to 0.82 ms. • A decrease of 70%.
Optimizations • Chart: execution time improvement.
Optimizations • Chart: execution speed improvement.
Execution speed improvement • Every optimization we made decreased the total time and improved the speed. • The most significant improvement came from the second software optimization. • Both the DDR and BRAM speeds eventually increased by a factor of more than 3.
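The percentages and the overall factor quoted in these slides follow from the measured times. A small C sketch of that arithmetic is given below; the helper names are hypothetical, and the times in the usage note are the ones reported on the profiling slides.

```c
/* Relative decrease in execution time, in percent. */
double percent_decrease(double before_ms, double after_ms)
{
    return 100.0 * (before_ms - after_ms) / before_ms;
}

/* Speed-up factor: how many times faster the optimized run is. */
double speedup(double before_ms, double after_ms)
{
    return before_ms / after_ms;
}
```

For BRAM, `percent_decrease(111.54, 34.798)` is about 69% and `speedup(111.54, 34.798)` is about 3.2. For DDR, `percent_decrease(2.754, 0.819)` is about 70% and `speedup(2.754, 0.819)` is about 3.4.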
BRAM vs. DDR • In every optimization, running the application from BRAM was significantly slower than running it from DDR. • This is due to: • DDR has its own dedicated bus. • The DDR clock rate is 550 MHz, while the BRAM clock rate is 160 MHz. • DDR transfers data on both the rising and falling clock edges.
Transmission rate • The maximum data rate in USB is typically 1.5 MB/s (typical rates are around 0.5 MB/s). • The encryption rate we achieved in the end is 323 KB/s, about 1.5 times slower than the typical 0.5 MB/s rate. • Conclusion: a hardware accelerator is needed.
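The comparison with the USB rates is a simple ratio, sketched below in C. The helper names are hypothetical, and the sketch assumes decimal units (1 KB = 1000 bytes, 1 MB = 1000 KB).

```c
/* Throughput in KB/s for a given byte count and duration.
 * Assumption: 1 KB = 1000 bytes. */
double rate_kb_per_s(double bytes, double seconds)
{
    return bytes / seconds / 1000.0;
}

/* How many times slower the achieved rate is than a reference rate. */
double slowdown_factor(double reference_kb_s, double achieved_kb_s)
{
    return reference_kb_s / achieved_kb_s;
}
```

With the figures from the slide, `slowdown_factor(500.0, 323.0)` is about 1.5 against the typical 0.5 MB/s rate, and `slowdown_factor(1500.0, 323.0)` is about 4.6 against the 1.5 MB/s maximum.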
Project Specifications • Implementation on a Zynq SoPC by Xilinx. • Suitable for portable systems (disk-on-key, tablets, etc.): a low-power system. • Finding the best architecture according to the requirements above: • Profiling the AES algorithm.