
CS343 Project Presentation: Secure Hash Acceleration

CS343 Project Presentation: Secure Hash Acceleration. Winnie Cheng, Alvin Cheung, Paul Hartke. June 4, 2003. Project Overview: accelerate secure hash cryptography algorithms in common use versus a standalone processor; focus on MD5 and SHA-1; utilize two different implementation methodologies.


Presentation Transcript


  1. CS343 Project Presentation: Secure Hash Acceleration • Winnie Cheng, Alvin Cheung, Paul Hartke • June 4, 2003

  2. Project Overview • Accelerate secure hash algorithms in common use, compared with running them on a standalone processor • Focus on MD5 and SHA-1 • Use two different implementation methodologies • Tensilica Xtensa processor and SystemC • Integrate the implementations into a real application • Open-source OpenSSL package selected as the target application • Uses a number of encryption algorithms • Integrate into an operational system rather than relying on synthetic benchmarks

  3. MD5 Basic Round
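The diagram for this slide did not survive in the transcript. As a stand-in, here is a minimal Python sketch of one MD5 step as defined in RFC 1321, wrapped in just enough padding and state handling to produce a digest. The function names (`md5_step`, `md5`, `rotl32`) are illustrative choices, not taken from the presentation, and this is the plain software form of the round, not the authors' Tensilica or SystemC implementation.

```python
import math
import struct

# Per-step rotation amounts and additive constants from RFC 1321.
S = [7, 12, 17, 22] * 4 + [5, 9, 14, 20] * 4 + [4, 11, 16, 23] * 4 + [6, 10, 15, 21] * 4
T = [int(abs(math.sin(i + 1)) * 2**32) & 0xFFFFFFFF for i in range(64)]


def rotl32(x, n):
    """Rotate a 32-bit value left by n bits."""
    return ((x << n) | (x >> (32 - n))) & 0xFFFFFFFF


def md5_step(i, a, b, c, d, m):
    """One MD5 step: a logical function, a four-input add, a rotate, and an add."""
    if i < 16:
        f, g = (b & c) | (~b & d), i
    elif i < 32:
        f, g = (d & b) | (~d & c), (5 * i + 1) % 16
    elif i < 48:
        f, g = b ^ c ^ d, (3 * i + 5) % 16
    else:
        f, g = c ^ (b | (~d & 0xFFFFFFFF)), (7 * i) % 16
    f = (f + a + T[i] + m[g]) & 0xFFFFFFFF
    # The four state words rotate positions; only the new b is computed.
    return d, (b + rotl32(f, S[i])) & 0xFFFFFFFF, b, c


def md5(message):
    """Pad the message, then run 64 steps over each 64-byte block."""
    state = (0x67452301, 0xEFCDAB89, 0x98BADCFE, 0x10325476)
    padded = message + b"\x80"
    padded += b"\x00" * ((56 - len(padded)) % 64)
    padded += struct.pack("<Q", (len(message) * 8) & 0xFFFFFFFFFFFFFFFF)
    for off in range(0, len(padded), 64):
        m = struct.unpack("<16I", padded[off:off + 64])
        a, b, c, d = state
        for i in range(64):
            a, b, c, d = md5_step(i, a, b, c, d, m)
        state = tuple((s + v) & 0xFFFFFFFF for s, v in zip(state, (a, b, c, d)))
    return struct.pack("<4I", *state)
```

Each step is a short chain of logical operations, additions, and one rotate, which is the kind of work a compound instruction can collapse into fewer cycles.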

  4. SHA-1 Basic Round repeated 80 times
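The round diagram is missing from the transcript, so the following is a minimal Python sketch of the SHA-1 round as specified in FIPS 180, applied 80 times per 64-byte block. The names `sha1_round`, `sha1_compress`, and `sha1` are illustrative and not from the presentation; this is the reference software form, not the accelerated version the authors built.

```python
import struct


def rotl32(x, n):
    """Rotate a 32-bit value left by n bits."""
    return ((x << n) | (x >> (32 - n))) & 0xFFFFFFFF


def sha1_round(t, a, b, c, d, e, w_t):
    """One SHA-1 round: the logical function and constant depend on the round index t."""
    if t < 20:
        f, k = (b & c) | (~b & d), 0x5A827999
    elif t < 40:
        f, k = b ^ c ^ d, 0x6ED9EBA1
    elif t < 60:
        f, k = (b & c) | (b & d) | (c & d), 0x8F1BBCDC
    else:
        f, k = b ^ c ^ d, 0xCA62C1D6
    temp = (rotl32(a, 5) + f + e + k + w_t) & 0xFFFFFFFF
    return temp, a, rotl32(b, 30), c, d


def sha1_compress(state, block):
    """Expand the 16 message words to 80, then apply the round 80 times."""
    w = list(struct.unpack(">16I", block))
    for t in range(16, 80):
        w.append(rotl32(w[t - 3] ^ w[t - 8] ^ w[t - 14] ^ w[t - 16], 1))
    a, b, c, d, e = state
    for t in range(80):
        a, b, c, d, e = sha1_round(t, a, b, c, d, e, w[t])
    return tuple((s + v) & 0xFFFFFFFF for s, v in zip(state, (a, b, c, d, e)))


def sha1(message):
    """Pad the message and run the compression function over each block."""
    state = (0x67452301, 0xEFCDAB89, 0x98BADCFE, 0x10325476, 0xC3D2E1F0)
    padded = message + b"\x80"
    padded += b"\x00" * ((56 - len(padded)) % 64)
    padded += struct.pack(">Q", len(message) * 8)
    for off in range(0, len(padded), 64):
        state = sha1_compress(state, padded[off:off + 64])
    return struct.pack(">5I", *state)
```

The five-input addition in `sha1_round` is the adder tree that dominates the round, alongside the two fixed rotates.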

  5. Tensilica Processor Extensions • Create compound instructions to perform more of the algorithm per clock cycle • 25 instructions/byte of input data @ 200 MHz clock → 64 Mbps • Reduce to 5 instructions per byte • 5 instructions/byte of input data @ 200 MHz clock → 320 Mbps • The 5 cycles come from the critical path of the operations at a 200 MHz clock
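The throughput figures on this slide can be checked with a short calculation. The sketch below assumes one instruction retires per clock cycle, which the slide does not state explicitly but which is the assumption under which its numbers work out; the function name `throughput_mbps` is illustrative.

```python
def throughput_mbps(clock_hz, instructions_per_byte, instructions_per_cycle=1.0):
    """Hash throughput in megabits per second for a given instruction cost per byte.

    instructions_per_cycle=1.0 is an assumption, not a figure from the slides.
    """
    bytes_per_second = clock_hz * instructions_per_cycle / instructions_per_byte
    return bytes_per_second * 8 / 1e6


baseline = throughput_mbps(200e6, 25)  # unmodified processor
extended = throughput_mbps(200e6, 5)   # with compound instructions
print(f"baseline: {baseline:.0f} Mbps, extended: {extended:.0f} Mbps")
```

At 200 MHz, 25 instructions per byte gives 8 MB/s (64 Mbps) and 5 instructions per byte gives 40 MB/s (320 Mbps), a fivefold improvement.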

  6. Custom Instruction Sharing • Sharing between instructions appears attractive • Both algorithm rounds dominated by adder trees, shifts, and logical functions • However, the overlap of actual specific groups of operations was minimal • Results in separate instructions for each algorithm

  7. Architectural Exploration with SystemC • Objective: take the same high-level MD5/SHA-1 C source code and directly generate a hardware implementation • Then compare against existing hand-written Verilog implementations and the TIE-extended processor

  8. SystemC Design Flow

  9. Design Analysis Capabilities

  10. SystemC Limitations • Original source code not directly usable by SystemC • Pointers are not synthesizable → requires a rewrite of the original source • Minimal architectural transformations performed • No loop fusion • No automatic loop unrolling exploration

  11. Successive Design Iterations • The iterative flow required successive source code transformations to achieve better size and area • Scheduling analysis indicated target areas for improvement • Areas of low utilization • Excessive resource dependencies • In the end, the final source code gave results close to the hand-written Verilog implementation • However, the final code bore very little resemblance to the original C source and instead resembled the hand-written Verilog

  12. SystemC Results

  13. SystemC Implementation Observations • Successive iterations asymptotically approached the area/performance of the hand-coded design • Implementation time is about the same as for an experienced Verilog designer, but no extensive hardware expertise is required • Bus interface and device drivers are still required to interface with the processor • These are included with the TIE implementation “for free”

  14. OpenSSL Integration Methodology • Wrote custom sha1 / md5 routines with Tensilica extensions and compiled them to Xtensa ELF files • Created a wrapper for the Xtensa ISS to run the encryption routines • Statically linked the wrapped ISS into OpenSSL • When OpenSSL calls sha1 or md5, the system traps into an emulated function that in turn executes the operation on the wrapped simulator

  15. OpenSSL Integration Architecture

  16. OpenSSL Integration Challenges • Original approach was to statically link in the custom ISS using the OpenSSL “Engine” hardware accelerator interface • OpenSSL supports dynamic loading of custom encryption engines and lets the user choose which engine to use for a particular encryption routine • But the ISS uses dynamic libraries that cannot be statically linked in • So we kept the ISS as an executable that runs as a separate process outside OpenSSL and returns results via external files • The OpenSSL engine interface is not completely developed and does not fully support SSL functionality • So instead of using the engine interface, we replaced the original OpenSSL sha1 / md5 routines with our implementations that invoke the ISS
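The workaround described on this slide, running the simulator as a separate process and passing data through files, can be sketched roughly as follows. Everything concrete here is an assumption for illustration: the slides do not give the ISS command line or file formats, so a child Python process that hashes a file stands in for the simulator, and the name `hash_via_external_process` is hypothetical. The actual project replaced C routines inside OpenSSL rather than using Python.

```python
import os
import subprocess
import sys
import tempfile

# Stand-in for the ISS executable (assumption): a child process that reads an
# input file, hashes it, and writes the hex digest to an output file.
STAND_IN_SIMULATOR = (
    "import hashlib, sys\n"
    "algo, src, dst = sys.argv[1:4]\n"
    "data = open(src, 'rb').read()\n"
    "open(dst, 'w').write(hashlib.new(algo, data).hexdigest())\n"
)


def hash_via_external_process(algo, data):
    """Replacement hash routine: hand the data to an external process via files."""
    with tempfile.TemporaryDirectory() as workdir:
        src = os.path.join(workdir, "input.bin")
        dst = os.path.join(workdir, "digest.txt")
        with open(src, "wb") as f:
            f.write(data)
        # Launch the stand-in simulator and wait for it to finish.
        subprocess.run(
            [sys.executable, "-c", STAND_IN_SIMULATOR, algo, src, dst],
            check=True,
        )
        # Read the result back from the file the child process wrote.
        with open(dst) as f:
            return f.read().strip()
```

A caller would use `hash_via_external_process("sha1", b"abc")` wherever the original in-process routine was called. Process launch and file I/O add overhead on every call, which is a cost of keeping the simulator outside the host application.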

  17. Conclusions • Neither the Tensilica nor the SystemC flow was a fully automatic tool • However, both led to implementations competitive with a hand-written implementation • Key advantage: designs can be implemented with much less expertise • In particular, much less hardware design expertise
