Digital signature using MD5 algorithm Hardware Acceleration. Final Presentation. Students: Eyal Mendel & Aleks Dyskin Instructor: Evgeny Fiksman. High Speed Digital Systems Laboratory. Agenda. Introduction. HW/SW System Design. Performance Evaluation. Conclusions & Summary. Agenda.
Digital signature using MD5 algorithm Hardware Acceleration Final Presentation Students: Eyal Mendel & Aleks Dyskin Instructor: Evgeny Fiksman High Speed Digital Systems Laboratory
Agenda Introduction HW/SW System Design Performance Evaluation Conclusions & Summary
Project Goals Introduction • Hardware accelerator design & implementation • Evaluation of the C-to-FPGA technique • Case study: MD5 algorithm • Tool: ASC – A Stream Compiler
MD5 Goals/Usage Introduction Goal: The MD5 (Message Digest 5) algorithm is intended for digital signature applications, where a large file must be "compressed" in a secure manner before being encrypted with a private (secret) key under a public-key cryptosystem. Usage: MD5 is widely used as a cryptographic hash function. Specified in RFC 1321, MD5 has been employed in a wide variety of security applications and is commonly used to check the integrity of files.
MD5 steps (1) Introduction Step 1: Append Padding Bits The message is "padded" so that its length (in bits) is congruent to 448, modulo 512. Step 2: Append Length A 64-bit representation of b (the length of the message before the padding bits were added) is appended to the result of the previous step.
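Steps 1 and 2 can be sketched in a few lines of Python. This is a minimal illustration, not the project's code: the function name `md5_pad` is hypothetical, and the details that the padding is a single 1 bit followed by zeros and that the length is stored low-order byte first come from RFC 1321 rather than from the slide.

```python
def md5_pad(message: bytes) -> bytes:
    """Apply MD5 steps 1 and 2 to a byte string (hypothetical helper)."""
    # b: message length in bits before padding, truncated to 64 bits.
    bit_len = (8 * len(message)) & 0xFFFFFFFFFFFFFFFF
    # Step 1: append a single 1 bit (0x80), then zero bits until the
    # length is congruent to 448 mod 512 bits (56 mod 64 bytes).
    padded = message + b"\x80"
    padded += b"\x00" * ((56 - len(padded) % 64) % 64)
    # Step 2: append b as a 64-bit value, low-order byte first.
    padded += bit_len.to_bytes(8, "little")
    return padded
```

Padding is always applied, so a 56-byte message grows to two 64-byte blocks, while a 55-byte message still fits in one.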
MD5 steps (2) Introduction Step 3: Initialize MD buffer a=0x67452301; b=0xefcdab89; c=0x98badcfe; d=0x10325476 Step 4-5: Process message in 16-word blocks and Output
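For reference, steps 3 to 5 can be written as a compact software model. The sketch below follows RFC 1321 and is an assumption about what a SW reference might look like, not the project's actual module; the names `INIT`, `T`, `S`, `process_block` and `md5_hex` are illustrative.

```python
import math
import struct

# Step 3: initial MD buffer values a, b, c, d as listed on the slide.
INIT = (0x67452301, 0xEFCDAB89, 0x98BADCFE, 0x10325476)
# Sine-derived constant table: T[i] = floor(2^32 * abs(sin(i + 1))).
T = [int(abs(math.sin(i + 1)) * 2**32) & 0xFFFFFFFF for i in range(64)]
# Left-rotation amounts for the 64 steps (four rounds of 16 steps).
S = [7, 12, 17, 22] * 4 + [5, 9, 14, 20] * 4 + [4, 11, 16, 23] * 4 + [6, 10, 15, 21] * 4


def rotl(x, n):
    """Rotate a 32-bit value left by n bits."""
    return ((x << n) | (x >> (32 - n))) & 0xFFFFFFFF


def process_block(state, block):
    """Step 4: fold one 64-byte block (16 little-endian words) into the state."""
    x = struct.unpack("<16I", block)
    a, b, c, d = state
    for i in range(64):
        if i < 16:
            f, k = (b & c) | (~b & d), i
        elif i < 32:
            f, k = (d & b) | (~d & c), (5 * i + 1) % 16
        elif i < 48:
            f, k = b ^ c ^ d, (3 * i + 5) % 16
        else:
            f, k = c ^ (b | ~d), (7 * i) % 16
        f = (f + a + T[i] + x[k]) & 0xFFFFFFFF
        a, d, c, b = d, c, b, (b + rotl(f, S[i])) & 0xFFFFFFFF
    return tuple((s + v) & 0xFFFFFFFF for s, v in zip(state, (a, b, c, d)))


def md5_hex(message: bytes) -> str:
    """Steps 1-5 for a whole message, returning the digest as hex."""
    data = message + b"\x80"
    data += b"\x00" * ((56 - len(data) % 64) % 64)
    data += struct.pack("<Q", (8 * len(message)) & 0xFFFFFFFFFFFFFFFF)
    state = INIT
    for offset in range(0, len(data), 64):
        state = process_block(state, data[offset:offset + 64])
    # Step 5: output a, b, c, d, each low-order byte first.
    return struct.pack("<4I", *state).hex()
```

The RFC 1321 test suite gives `900150983cd24fb0d6963f7d28e17f72` for the input `"abc"`, which is a convenient check for any HW or SW variant.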
ASC Overview Introduction • ASC (A Stream Compiler) simplifies exploration of hardware accelerators by transforming the hardware design task into a software design process using only ’gcc’ and ’make’ to obtain a hardware netlist. • A single C++ program with custom types and operators is the only syntax needed. • ASC provides the full environment and implements all the protocols needed for communication between the HW module and the CPU.
SW Model Evaluation (1) Introduction Accelerated Part • Maximum speed-up in the ideal case is: (process and speed_up take 0 sec to evaluate) • The finish stage was evaluated for the worst case, i.e. when the append_bits step is performed. • In the general case, append_bits is performed only once per file/string. • All measurements were taken on the Xilinx PowerPC.
SW Model Evaluation (2) Introduction For a large number of chunks, the total speed-up will be: • Where: • n is the number of chunks • Tsw1, Thw1 are the average execution times of a not_last chunk • Tsw2, Thw2 are the average execution times of the last chunk
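The formula itself did not survive on this slide. One model consistent with the definitions, offered here as an assumption rather than the authors' exact expression, is the ratio of total SW time to total HW time over n chunks, where n-1 chunks are not_last and one is last. The function name and the timing values in the usage note are hypothetical.

```python
def total_speedup(n, t_sw1, t_hw1, t_sw2, t_hw2):
    """Assumed model: total SW time over total HW time for n chunks.

    t_sw1, t_hw1: average time of a not_last chunk (SW, HW)
    t_sw2, t_hw2: average time of the last chunk (SW, HW)
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    sw_total = (n - 1) * t_sw1 + t_sw2
    hw_total = (n - 1) * t_hw1 + t_hw2
    return sw_total / hw_total
```

With made-up timings such as `total_speedup(n, 10.0, 4.0, 27.0, 20.0)`, the result starts at the last-chunk ratio for n = 1 and approaches the not_last ratio `t_sw1 / t_hw1` as n grows.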
System High-Level SW/HW System Design This module serves as input/output of the system, starting and finishing the process. Manages MD5 hardware interface. Serial communication manager between PC and M310 board Step 4 implementation SW reference module for comparison
SW/HW algorithm flow SW/HW System Design
HW Accelerator insights SW/HW System Design Basic structure of the hardware module after the initial design “on paper”:
Processing Unit SW/HW System Design Detailed explanation of one process cycle: Problem: which result is relevant for a given ‘i’? The process cycle runs 16 times per 512-bit input (32 bits × 16 = 512 bits).
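The selection problem can be illustrated in software by computing all four MD5 auxiliary functions side by side and then picking one, much as parallel hardware units feeding a multiplexer would. This is only a sketch: the slides do not preserve how ‘i’ is defined, so the selector below is indexed by the MD5 step number (0 to 63), which is an assumption, and the function names are hypothetical.

```python
MASK = 0xFFFFFFFF


def round_functions(b, c, d):
    """Compute the four RFC 1321 auxiliary functions F, G, H, I together."""
    f = (b & c) | (~b & d)
    g = (b & d) | (c & ~d)
    h = b ^ c ^ d
    i = c ^ (b | ~d)
    # Mask to 32 bits because Python integers are unbounded and ~x is negative.
    return tuple(v & MASK for v in (f, g, h, i))


def select_result(step, b, c, d):
    """Mux: steps 0-15 use F, 16-31 use G, 32-47 use H, 48-63 use I."""
    return round_functions(b, c, d)[step // 16]
```

Computing every candidate and discarding three of them costs extra logic but keeps the datapath free of data-dependent control, which is the usual reason for this trade-off in hardware.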
Function Masking SW/HW System Design
T-Table access (1) SW/HW System Design • Every process cycle we need to fetch 32×4 = 128 bits from the T-table. a. Problem: ASC supports only 32-bit-wide memories. b. Using a 2-port BRAM results in 2 clock cycles.
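The access cost can be checked with a small model. The T-table values follow the RFC 1321 definition; grouping four consecutive entries per process cycle is a hypothetical layout chosen for illustration, since the slides do not show how the table was actually arranged in memory.

```python
import math

WORD_BITS = 32        # memory width supported by ASC, per the slide
WORDS_PER_CYCLE = 4   # 4 x 32 = 128 bits needed per process cycle
BRAM_PORTS = 2        # a 2-port BRAM delivers two words per clock

# RFC 1321 constant table: T[i] = floor(2^32 * abs(sin(i + 1))).
T = [int(abs(math.sin(i + 1)) * 2**32) & 0xFFFFFFFF for i in range(64)]


def words_for_cycle(cycle):
    """Return the four table words for one process cycle (assumed layout)."""
    start = cycle * WORDS_PER_CYCLE
    return T[start:start + WORDS_PER_CYCLE]


# Clocks needed to read four 32-bit words through two ports.
clocks_per_fetch = math.ceil(WORDS_PER_CYCLE / BRAM_PORTS)
bits_per_fetch = WORDS_PER_CYCLE * WORD_BITS
```

Under these assumptions `clocks_per_fetch` evaluates to 2 and `bits_per_fetch` to 128, matching the figures on the slide.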
T-Table Access (2) SW/HW System Design
HW Module Performance Performance Evaluation Processing one 512-bit data block takes 680 ns (@clock_freq = 100 MHz) • S_CYCLE = 4 clock cycles • S_LOOP = 16+1
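The 680 ns figure is consistent with multiplying the two parameters by the clock period, as the short calculation below shows. Reading the block time as S_CYCLE × S_LOOP × clock period is an inference from the numbers, not something the slide states, and the throughput line covers only the processing step and ignores any communication overhead.

```python
CLOCK_FREQ_HZ = 100_000_000  # 100 MHz, from the slide
S_CYCLE = 4                  # clock cycles per process cycle
S_LOOP = 16 + 1              # process cycles per 512-bit block

clock_period_ns = 1e9 / CLOCK_FREQ_HZ                    # 10 ns per clock
clocks_per_block = S_CYCLE * S_LOOP                      # 68 clocks
block_time_ns = clocks_per_block * clock_period_ns       # 680 ns

# Raw processing rate in bits per nanosecond, i.e. Gbit/s.
raw_throughput_gbps = 512 / block_time_ns
```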
Measurements (1) Performance Evaluation All times are in µs • Finish_SW = appendBits_SW + Process_SW + Output_SW • Finish_HW = appendBits_SW + Process_HW + Output_SW • Average HW-over-SW speed-up = 1.34998 times
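The two Finish equations make it easy to see why the worst-case speed-up is bounded: only the Process term is accelerated, while the append-bits and output terms stay in software. The sketch below uses hypothetical function and parameter names; the measured values from the slide's table are not reproduced here, so any numbers passed in are placeholders.

```python
def finish_time(append_bits, process, output):
    """Finish stage time: append bits + process + output."""
    return append_bits + process + output


def finish_speedup(append_bits_sw, process_sw, process_hw, output_sw):
    """Speed-up of the finish stage when only Process moves to hardware."""
    finish_sw = finish_time(append_bits_sw, process_sw, output_sw)
    finish_hw = finish_time(append_bits_sw, process_hw, output_sw)
    return finish_sw / finish_hw


def ideal_finish_speedup(append_bits_sw, process_sw, output_sw):
    """Upper bound: the hardware Process step is assumed to take zero time."""
    return finish_speedup(append_bits_sw, process_sw, 0.0, output_sw)
```

For example, with placeholder times of 10, 20 and 2 for append bits, process and output, a hardware process time of 5 gives 32/17, and even an instantaneous hardware step cannot exceed 32/12.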
Measurements (2) Performance Evaluation All times are in usec
Conclusions (1) Conclusions & Summary • x1.35 speed-up with the HW implementation (worst case). • The expected speed-up in the ideal case for one chunk is: • A theoretical speed-up larger than 1.35 can be achieved with large data chunks, when append_bits is evaluated only for the last chunk. In that case an ideal speed-up of 2.83 is expected, but in reality a speed-up of ~2.75 is reached in measurements (graph on the next slide). • The ASC tool proved able to implement complicated hardware modules using a few software commands, and its code is easy to read.
Conclusions (2) Speed Up Prediction • Where: • T1s, T1h are the average execution times of a not_last chunk • T2s, T2h are the average execution times of the last chunk • su2 is the speed-up for a not_last chunk • su1 is the speed-up for the last chunk • n is the number of chunks
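The prediction formula is missing from the extracted slide. A form consistent with the symbols defined here, given as an assumption rather than the authors' own expression, treats the total speed-up as total SW time over total HW time and shows how it moves from su1 toward su2 as the number of chunks grows:

```latex
\[
  SU(n) = \frac{(n-1)\,T_{1s} + T_{2s}}{(n-1)\,T_{1h} + T_{2h}},
  \qquad
  su_2 = \frac{T_{1s}}{T_{1h}},
  \qquad
  su_1 = \frac{T_{2s}}{T_{2h}}
\]
\[
  SU(1) = su_1,
  \qquad
  \lim_{n \to \infty} SU(n) = \frac{T_{1s}}{T_{1h}} = su_2
\]
```

Under this model the single-chunk case is governed by the last-chunk ratio, while long inputs are governed by the not_last ratio.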
Summary Conclusions & Summary • We learned ASC: design approach, debugging and synthesis process. • We showed the feasibility of an MD5 implementation with ASC • Designing the implementation of the algorithm from pseudo code to hardware • Masking mechanism • Parallel processing and mux-ing of the appropriate result • Overcoming hardware limitations with a creative approach (memory imp.) • Flow control • Project goals were partially achieved • The File version was not implemented
Further Work Conclusions & Summary • Further acceleration can be reached using a pipeline architecture. • Further development of the File version.
The End Thank you for your time.