
Digital signature using MD5 algorithm Hardware Acceleration

Final Presentation. Students: Eyal Mendel & Aleks Dyskin. Instructor: Evgeny Fiksman. High Speed Digital Systems Laboratory. Agenda: Introduction; HW/SW System Design; Performance Evaluation; Conclusions & Summary.


Presentation Transcript


  1. Digital signature using MD5 algorithm Hardware Acceleration. Final Presentation. Students: Eyal Mendel & Aleks Dyskin. Instructor: Evgeny Fiksman. High Speed Digital Systems Laboratory.

  2. Agenda: Introduction; HW/SW System Design; Performance Evaluation; Conclusions & Summary.


  4. Introduction: Project Goals. Hardware accelerator design and implementation. Evaluation of a C-to-FPGA technique. Case study: the MD5 algorithm. Tool: ASC (A Stream Compiler).

  5. Introduction: MD5 Goals/Usage. Goal: the MD5 (Message Digest 5) algorithm is intended for digital signature applications, where a large file must be "compressed" in a secure manner before being encrypted with a private (secret) key under a public-key cryptosystem. Usage: MD5 is widely used as a cryptographic hash function. Specified in RFC 1321, it has been employed in a wide variety of security applications and is commonly used to check the integrity of files.
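
The file-integrity use mentioned on this slide can be sketched with Python's standard `hashlib` module. The helper names `md5_hex` and `matches_checksum` are illustrative and do not come from the presentation.

```python
import hashlib


def md5_hex(data: bytes) -> str:
    """Return the MD5 digest of `data` as 32 hexadecimal characters."""
    return hashlib.md5(data).hexdigest()


def matches_checksum(data: bytes, expected_hex: str) -> bool:
    """Integrity check: compare the computed digest with a published one."""
    return md5_hex(data) == expected_hex.lower()


# "abc" is one of the test strings listed in RFC 1321.
print(md5_hex(b"abc"))  # 900150983cd24fb0d6963f7d28e17f72
```

A downloaded file would be read as bytes and passed to `matches_checksum` together with the checksum published by its distributor.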

  6. Introduction: MD5 steps (1). Step 1: Append Padding Bits. The message is "padded" so that its length (in bits) is congruent to 448 modulo 512. Step 2: Append Length. A 64-bit representation of b (the length in bits of the message before the padding bits were added) is appended to the result of the previous step.
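
Steps 1 and 2 can be sketched for byte-aligned messages as follows. Per RFC 1321, the padding is a single 1 bit followed by 0 bits, and the length is stored low-order byte first. The function name `md5_pad` is illustrative, not taken from the project.

```python
import struct


def md5_pad(message: bytes) -> bytes:
    """Apply MD5 steps 1 and 2 to a byte-aligned message."""
    # b: message length in bits, reduced modulo 2^64 as RFC 1321 requires.
    bit_len = (len(message) * 8) & 0xFFFFFFFFFFFFFFFF
    # Step 1: a single 1 bit (0x80 for byte-aligned input), then zero bytes
    # until the length is 56 bytes (448 bits) modulo 64 bytes (512 bits).
    padded = message + b"\x80"
    padded += b"\x00" * ((56 - len(padded) % 64) % 64)
    # Step 2: append b as a 64-bit little-endian integer.
    return padded + struct.pack("<Q", bit_len)
```

The result is always a whole number of 64-byte blocks. A 56-byte message, for example, grows to 128 bytes because the padding bit no longer fits in the first block.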

  7. Introduction: MD5 steps (2). Step 3: Initialize MD buffer: a = 0x67452301; b = 0xefcdab89; c = 0x98badcfe; d = 0x10325476. Steps 4-5: Process the message in 16-word blocks and output the digest.
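
As a software reference for steps 3 to 5, here is a minimal pure-Python sketch of the RFC 1321 algorithm. It is not the project's SW model or its ASC code; the names `process_block` and `md5` are illustrative.

```python
import math
import struct

# Per-step left-rotation amounts from RFC 1321, four rounds of 16 steps.
S = [7, 12, 17, 22] * 4 + [5, 9, 14, 20] * 4 + [4, 11, 16, 23] * 4 + [6, 10, 15, 21] * 4
# T-table: T[i] = floor(2^32 * |sin(i + 1)|).
T = [int(abs(math.sin(i + 1)) * 2**32) & 0xFFFFFFFF for i in range(64)]
# Step 3: initial MD buffer (a, b, c, d).
INIT = (0x67452301, 0xEFCDAB89, 0x98BADCFE, 0x10325476)


def rotl(x: int, n: int) -> int:
    """Rotate a 32-bit value left by n bits."""
    return ((x << n) | (x >> (32 - n))) & 0xFFFFFFFF


def process_block(state, block: bytes):
    """Step 4: fold one 16-word (512-bit) block into the MD buffer."""
    a, b, c, d = state
    x = struct.unpack("<16I", block)
    for i in range(64):
        if i < 16:
            f, g = (b & c) | (~b & d), i
        elif i < 32:
            f, g = (d & b) | (~d & c), (5 * i + 1) % 16
        elif i < 48:
            f, g = b ^ c ^ d, (3 * i + 5) % 16
        else:
            f, g = c ^ (b | ~d), (7 * i) % 16
        f = (f + a + T[i] + x[g]) & 0xFFFFFFFF
        a, d, c, b = d, c, b, (b + rotl(f, S[i])) & 0xFFFFFFFF
    return tuple((s + v) & 0xFFFFFFFF for s, v in zip(state, (a, b, c, d)))


def md5(message: bytes) -> str:
    """Steps 1 to 5 for a byte-aligned message; returns the hex digest."""
    bit_len = (len(message) * 8) & 0xFFFFFFFFFFFFFFFF
    data = message + b"\x80"
    data += b"\x00" * ((56 - len(data) % 64) % 64) + struct.pack("<Q", bit_len)
    state = INIT
    for off in range(0, len(data), 64):
        state = process_block(state, data[off:off + 64])
    # Step 5: output a, b, c, d, low-order byte first.
    return struct.pack("<4I", *state).hex()
```

Only `process_block` corresponds to the part a hardware accelerator would take over. Padding and output formatting stay outside it.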

  8. Introduction: ASC Overview. • ASC (A Stream Compiler) simplifies exploration of hardware accelerators by transforming the hardware design task into a software design process that uses only 'gcc' and 'make' to obtain a hardware netlist. • A single C++ program with custom types and operators is the only syntax needed. • ASC provides the whole environment and implements all the protocols needed for communication between the HW module and the CPU.

  9. Introduction: SW Model Evaluation (1). Accelerated Part. • Maximum speed-up in the ideal case (process and speed_up take 0 sec to evaluate): [formula missing from transcript]. • The evaluation of the finish stage was done for the worst case, i.e. when the append_bits step is performed. • In the general case, append_bits is performed only once per file/string. • All measurements were taken on a Xilinx PowerPC.

  10. Introduction: SW Model Evaluation (2). For a large number of chunks the total speed-up will be: [formula missing from transcript], where: • n is the number of chunks • Tsw1, Thw1 are the average execution times of a not_last chunk • Tsw2, Thw2 are the average execution times of the last chunk.
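
The symbols defined on this slide suggest the following form for the total speed-up. This is a reconstruction from those definitions, assuming n - 1 not_last chunks followed by one last chunk. It is not the slide's own formula, and the name S(n) is introduced here for illustration.

```latex
S(n) = \frac{(n-1)\,T_{sw1} + T_{sw2}}{(n-1)\,T_{hw1} + T_{hw2}},
\qquad
\lim_{n \to \infty} S(n) = \frac{T_{sw1}}{T_{hw1}}
```

For large n the last-chunk terms become negligible, so the total speed-up approaches the speed-up of a not_last chunk.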


  12. SW/HW System Design: System High-Level. • This module serves as the input/output of the system, starting and finishing the process. • Manages the MD5 hardware interface. • Serial communication manager between the PC and the M310 board. • Step 4 implementation. • SW reference module for comparison.

  13. SW/HW System Design: SW/HW algorithm flow.

  14. SW/HW System Design: HW Accelerator insights. Basic structure of the hardware module after the initial "on paper" design.

  15. SW/HW System Design: Processing Unit. Detailed explanation of one process cycle. Problem: which result is relevant for a given 'i'? The process cycle is run 16 times per 512-bit input (32 bit × 16 = 512 bit).
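
One way to picture the "which result is relevant" problem is to evaluate all four RFC 1321 round functions side by side and let an index choose one, as a multiplexer would. The sketch below assumes `step` is the RFC's step number 0..63. How the slide's 'i' maps onto that number is not stated in the transcript, and the names `candidates` and `select` are illustrative.

```python
MASK = 0xFFFFFFFF  # keep every intermediate value within 32 bits


def candidates(b: int, c: int, d: int):
    """Evaluate all four MD5 round functions at once, as parallel logic would."""
    return (
        (b & c) | (~b & d & MASK),       # F, round 1
        (b & d) | (c & ~d & MASK),       # G, round 2
        b ^ c ^ d,                       # H, round 3
        (c ^ (b | (~d & MASK))) & MASK,  # I, round 4
    )


def select(step: int, b: int, c: int, d: int) -> int:
    """Multiplexer: keep only the result that belongs to the current round."""
    return candidates(b, c, d)[step // 16]
```

In software the unused three results are wasted work. In hardware they cost area rather than time, which is what makes this trade attractive.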

  16. SW/HW System Design: Function Masking.

  17. SW/HW System Design: T-Table Access (1). Every process cycle we need to fetch 32 × 4 = 128 bits from the T-table. a. Problem: ASC supports only 32-bit-wide memories. b. Using a 2-port BRAM still takes 2 clock cycles.
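
The width problem can be illustrated in software. The sketch below builds the T-table from its RFC 1321 definition and spreads it over four 32-bit banks, so that one address yields 128 bits. The banked layout and the assumption that each process cycle consumes four consecutive entries are hypothetical. The transcript does not preserve the authors' actual solution, and ASC itself is not modelled here.

```python
import math

# T[i] = floor(2^32 * |sin(i + 1)|), 64 entries of 32 bits each (RFC 1321).
T = [int(abs(math.sin(i + 1)) * 2**32) & 0xFFFFFFFF for i in range(64)]


def split_into_banks(table, n_banks=4):
    """Distribute consecutive words round-robin over n_banks 32-bit memories."""
    return [table[k::n_banks] for k in range(n_banks)]


def fetch(banks, cycle):
    """Present one address to every bank: 4 banks x 32 bits = 128 bits."""
    return [bank[cycle] for bank in banks]


banks = split_into_banks(T)
print([hex(word) for word in fetch(banks, 0)])
```

With a single dual-port 32-bit memory, only two of these four words could be read per clock, which matches the two-cycle cost noted on the slide.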

  18. SW/HW System Design: T-Table Access (2).


  20. Performance Evaluation: HW Module Performance. Processing one 512-bit data block takes 680 ns (@ clock_freq = 100 MHz). S_CYCLE = 4 clock cycles. S_LOOP = 16 + 1.
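
The 680 ns figure is consistent with S_CYCLE × S_LOOP clock cycles at a 10 ns clock period. The sketch below checks that arithmetic under this assumed relation and derives a raw throughput figure. The throughput value is not stated in the presentation and ignores any I/O or software overhead.

```python
CLOCK_HZ = 100_000_000  # clock_freq = 100 MHz
S_CYCLE = 4             # clock cycles per process cycle
S_LOOP = 16 + 1         # loop iterations per 512-bit block


def block_time_ns(clock_hz=CLOCK_HZ, s_cycle=S_CYCLE, s_loop=S_LOOP):
    """Time to process one 512-bit block, assuming s_cycle * s_loop clocks."""
    cycles = s_cycle * s_loop
    return cycles * 1_000_000_000 / clock_hz


def throughput_mbit_s(clock_hz=CLOCK_HZ, s_cycle=S_CYCLE, s_loop=S_LOOP):
    """Raw hardware throughput in Mbit/s (512 bits per block)."""
    return 512 / block_time_ns(clock_hz, s_cycle, s_loop) * 1000


print(block_time_ns())                # 680.0
print(round(throughput_mbit_s(), 1))  # 752.9
```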

  21. Performance Evaluation: Measurements (1). All times are in µs. Finish_SW = appendBits_SW + Process_SW + Output_SW. Finish_HW = appendBits_SW + Process_HW + Output_SW. Average HW-SW speed-up = 1.34998 times.
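
With these definitions, the speed-up of a single measurement is presumably the ratio of the two finish times. The sketch below writes it out, together with the bound obtained if Process_HW took no time at all. It is an illustration built from the slide's own terms, not a formula quoted from the presentation.

```latex
\text{speed-up}
  = \frac{\mathrm{Finish\_SW}}{\mathrm{Finish\_HW}}
  = \frac{\mathrm{appendBits\_SW} + \mathrm{Process\_SW} + \mathrm{Output\_SW}}
         {\mathrm{appendBits\_SW} + \mathrm{Process\_HW} + \mathrm{Output\_SW}}
  \;\le\;
  \frac{\mathrm{appendBits\_SW} + \mathrm{Process\_SW} + \mathrm{Output\_SW}}
       {\mathrm{appendBits\_SW} + \mathrm{Output\_SW}}
```

The parts that remain in software (appendBits_SW and Output_SW) cap the achievable gain, however fast the hardware process step becomes.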

  22. Performance Evaluation: Measurements (2). All times are in µs.


  24. Conclusions & Summary: Conclusions (1). • ×1.35 speed-up with the HW implementation (worst case). • The expected speed-up in the ideal case for one chunk is: [formula missing from transcript]. • A theoretical speed-up larger than 1.35 can be achieved for large inputs (many chunks), when append_bits is evaluated only for the last chunk. In that case an ideal speed-up of 2.83 is expected, but in practice a speed-up of ~2.75 is reached in the measurements (graph on the next slide). • The ASC tool proved able to implement complicated hardware modules with only a few software commands, and its code is easy to read.

  25. Conclusions (2): Speed-Up Prediction. [formula missing from transcript], where: • T1s, T1h are the average times of not_last chunk execution • T2s, T2h are the average times of the last chunk execution • su2 is the speed-up for a not_last chunk • su1 is the speed-up for the last chunk • n is the number of chunks.
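
A prediction of this kind can be sketched from the symbols listed on the slide, assuming n - 1 not_last chunks followed by one last chunk, with T1h = T1s / su2 and T2h = T2s / su1. The function name and the sample timings are hypothetical. The timings are chosen only so that the two per-chunk ratios match the 1.35 and ~2.75 figures quoted in the conclusions.

```python
def predicted_speedup(n, t1s, t2s, su2, su1):
    """Predicted overall speed-up for a message of n chunks.

    t1s: average SW time of a not_last chunk
    t2s: average SW time of the last chunk
    su2: speed-up of a not_last chunk
    su1: speed-up of the last chunk
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    sw_time = (n - 1) * t1s + t2s
    hw_time = (n - 1) * t1s / su2 + t2s / su1
    return sw_time / hw_time


# Hypothetical SW timings (arbitrary units), not measured values.
for n in (1, 10, 1000):
    print(n, predicted_speedup(n, t1s=10.0, t2s=20.0, su2=2.75, su1=1.35))
```

For n = 1 the prediction equals su1. As n grows it climbs towards su2, because the last chunk contributes an ever smaller share of the total time.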

  26. Conclusions & Summary: Summary. • We learned ASC: its design approach, debugging and synthesis process. • We showed the feasibility of an MD5 implementation with ASC. • Implementation design of the algorithm from pseudo code to hardware: • Masking mechanism • Parallel processing and mux-ing of the appropriate result • Overcoming hardware limitations with a creative approach (memory implementation) • Flow control • Project goals were partially achieved: • The File version was not implemented.

  27. Conclusions & Summary: Further Work. • Further acceleration can be reached using a pipelined architecture. • Further development of the File version.

  28. The End. Thank you for your time.
