CS343 Project Presentation: Secure Hash Acceleration
Winnie Cheng, Alvin Cheung, Paul Hartke
June 4, 2003
Project Overview
• Accelerate commonly used secure hash algorithms relative to a standalone processor
• Focus on MD5 and SHA-1
• Use two different implementation methodologies: the Tensilica Xtensa processor and SystemC
• Integrate the implementations into a real application
  • The open-source OpenSSL package was selected as the target application
  • OpenSSL uses a number of encryption algorithms
  • Integrate the design into an operational system rather than relying on synthetic benchmarks
SHA-1 Basic Round (repeated 80 times per 512-bit block)
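For reference, here is a minimal C sketch of the SHA-1 compression step that the figure depicts, following the standard FIPS 180 definition. The function names (`rotl32`, `sha1_compress`, `sha1`) are illustrative and are not taken from the project's source. Each pass through the 80-iteration loop is one basic round: a five-operand 32-bit addition, two fixed rotates, and one of three Boolean functions. These operation groups are the kind that compound instructions target.

```c
#include <stdint.h>
#include <string.h>

/* Rotate a 32-bit word left by n bits (0 < n < 32). */
static uint32_t rotl32(uint32_t x, unsigned n) {
    return (x << n) | (x >> (32 - n));
}

/* Process one 64-byte block: expand to 80 words, then run the basic round 80 times. */
static void sha1_compress(uint32_t h[5], const uint8_t block[64]) {
    uint32_t w[80];
    for (int t = 0; t < 16; t++)
        w[t] = (uint32_t)block[4 * t] << 24 | (uint32_t)block[4 * t + 1] << 16 |
               (uint32_t)block[4 * t + 2] << 8 | (uint32_t)block[4 * t + 3];
    for (int t = 16; t < 80; t++)
        w[t] = rotl32(w[t - 3] ^ w[t - 8] ^ w[t - 14] ^ w[t - 16], 1);

    uint32_t a = h[0], b = h[1], c = h[2], d = h[3], e = h[4];
    for (int t = 0; t < 80; t++) {
        uint32_t f, k;
        if (t < 20)      { f = (b & c) | (~b & d);          k = 0x5A827999; }
        else if (t < 40) { f = b ^ c ^ d;                   k = 0x6ED9EBA1; }
        else if (t < 60) { f = (b & c) | (b & d) | (c & d); k = 0x8F1BBCDC; }
        else             { f = b ^ c ^ d;                   k = 0xCA62C1D6; }
        /* Basic round: five-operand add, rotate by 5 and by 30, one Boolean function. */
        uint32_t tmp = rotl32(a, 5) + f + e + k + w[t];
        e = d; d = c; c = rotl32(b, 30); b = a; a = tmp;
    }
    h[0] += a; h[1] += b; h[2] += c; h[3] += d; h[4] += e;
}

/* One-shot SHA-1 with standard padding; writes a 20-byte digest to out. */
void sha1(const uint8_t *msg, size_t len, uint8_t out[20]) {
    uint32_t h[5] = {0x67452301, 0xEFCDAB89, 0x98BADCFE, 0x10325476, 0xC3D2E1F0};
    size_t full = len / 64;
    for (size_t i = 0; i < full; i++)
        sha1_compress(h, msg + 64 * i);

    /* Padding: leftover bytes, 0x80, zeros, then the 64-bit big-endian bit length. */
    uint8_t tail[128] = {0};
    size_t rem = len % 64;
    memcpy(tail, msg + 64 * full, rem);
    tail[rem] = 0x80;
    size_t tail_len = (rem < 56) ? 64 : 128;
    uint64_t bits = (uint64_t)len * 8;
    for (int i = 0; i < 8; i++)
        tail[tail_len - 1 - i] = (uint8_t)(bits >> (8 * i));
    for (size_t off = 0; off < tail_len; off += 64)
        sha1_compress(h, tail + off);

    /* Serialize the five state words big-endian. */
    for (int i = 0; i < 5; i++) {
        out[4 * i]     = (uint8_t)(h[i] >> 24);
        out[4 * i + 1] = (uint8_t)(h[i] >> 16);
        out[4 * i + 2] = (uint8_t)(h[i] >> 8);
        out[4 * i + 3] = (uint8_t)h[i];
    }
}
```

Calling `sha1((const uint8_t *)"abc", 3, digest)` should produce the standard test vector `a9993e36 4706816a ba3e2571 7850c26c 9cd0d89d`.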
Tensilica Processor Extensions
• Create compound instructions that perform more of the algorithm per clock cycle
• Baseline: 25 instructions/byte of input data at a 200 MHz clock gives 64 Mbps
• With extensions: reduced to 5 instructions/byte, giving 320 Mbps at a 200 MHz clock
  • Both figures assume one instruction per cycle
• The 5-cycle count is derived from the critical path of the operations at a 200 MHz clock
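The throughput figures follow from simple arithmetic: at one instruction per cycle, 200 MHz divided by 25 instructions/byte is 8 MB/s, or 64 Mbit/s; dividing by 5 instead gives 40 MB/s, or 320 Mbit/s. The small C helper below (hypothetical name `hash_throughput_mbps`) makes that single-issue assumption explicit. It ignores stalls, memory effects, and interface overhead.

```c
#include <stdint.h>

/* Hash throughput in Mbit/s, assuming the core retires one instruction per
 * cycle. Stalls, memory latency, and bus overhead are deliberately ignored. */
uint64_t hash_throughput_mbps(uint64_t clock_hz, uint64_t instr_per_byte) {
    uint64_t bytes_per_sec = clock_hz / instr_per_byte; /* one byte per instr_per_byte cycles */
    return bytes_per_sec * 8 / 1000000;                 /* bytes/s -> Mbit/s */
}
```

With the slide's numbers, `hash_throughput_mbps(200000000, 25)` returns 64 and `hash_throughput_mbps(200000000, 5)` returns 320.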
Custom Instruction Sharing
• Sharing hardware between the MD5 and SHA-1 instructions appeared attractive
  • Both algorithms' rounds are dominated by adder trees, shifts, and logical functions
• However, the specific groups of operations in the two rounds overlapped very little
• The result was a separate set of custom instructions for each algorithm
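To see why sharing was limited, the sketch below places the MD5 auxiliary functions (RFC 1321) next to the SHA-1 round functions (FIPS 180), along with a simplified version of one update step from each algorithm. Some Boolean functions coincide (MD5's F equals SHA-1's Ch, and MD5's H equals SHA-1's parity function), but the datapaths around them differ. MD5 adds four operands, rotates by an amount that changes from step to step, and then adds `b`. SHA-1 adds five operands and uses fixed rotates (by 5 here, and by 30 on `b` in the state shuffle). The step helpers use hypothetical names and are illustrations, not the project's TIE instructions.

```c
#include <stdint.h>

static uint32_t rotl32(uint32_t x, unsigned n) { return (x << n) | (x >> (32 - n)); }

/* MD5 auxiliary functions (RFC 1321). */
static uint32_t md5_f(uint32_t x, uint32_t y, uint32_t z) { return (x & y) | (~x & z); }
static uint32_t md5_g(uint32_t x, uint32_t y, uint32_t z) { return (x & z) | (y & ~z); }
static uint32_t md5_h(uint32_t x, uint32_t y, uint32_t z) { return x ^ y ^ z; }
static uint32_t md5_i(uint32_t x, uint32_t y, uint32_t z) { return y ^ (x | ~z); }

/* SHA-1 round functions (FIPS 180). */
static uint32_t sha1_ch(uint32_t x, uint32_t y, uint32_t z)     { return (x & y) | (~x & z); }
static uint32_t sha1_parity(uint32_t x, uint32_t y, uint32_t z) { return x ^ y ^ z; }
static uint32_t sha1_maj(uint32_t x, uint32_t y, uint32_t z)    { return (x & y) | (x & z) | (y & z); }

/* One MD5 step (new a): four-operand add, rotate by a per-step amount s
 * (1..31), then add b. fval is the output of F, G, H, or I. */
uint32_t md5_step(uint32_t a, uint32_t b, uint32_t fval, uint32_t m, uint32_t k, unsigned s) {
    return b + rotl32(a + fval + m + k, s);
}

/* One SHA-1 round (new a): fixed rotate by 5 feeding a five-operand add.
 * fval is the output of Ch, Parity, or Maj. */
uint32_t sha1_step(uint32_t a, uint32_t e, uint32_t fval, uint32_t w, uint32_t k) {
    return rotl32(a, 5) + fval + e + k + w;
}
```

The shared Boolean functions are cheap; the adder trees and rotators, which dominate the hardware, are arranged differently in each step.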
Architectural Exploration with SystemC
• Objective: take the same high-level C source code for MD5/SHA-1 and generate a hardware implementation directly from it
• Then compare the result with the existing hand-written Verilog implementations and with the TIE-extended processor
SystemC Limitations
• The original source code was not directly usable by SystemC
  • Pointers are not synthesizable, so the original source had to be rewritten
• Only minimal architectural transformations are performed automatically
  • No loop fusion
  • No automatic exploration of loop unrolling
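As an illustration of the kind of rewrite that pointer-free synthesis forces, here are two equivalent ways to load a 64-byte block into sixteen big-endian 32-bit words. The first walks pointers in typical software style; the second uses a fixed-size array, a constant loop bound, and explicit indexing. The function names are hypothetical, and which constructs a particular SystemC synthesis tool accepts varies, so treat this as a sketch of the style change rather than tool-verified code.

```c
#include <stdint.h>
#include <stddef.h>

/* Software style: advance byte and word pointers over a caller-chosen length.
 * Pointer arithmetic like this is what the synthesis flow could not handle. */
void load_words_ptr(const uint8_t *p, uint32_t *w, size_t nwords) {
    while (nwords--) {
        *w++ = (uint32_t)p[0] << 24 | (uint32_t)p[1] << 16 | (uint32_t)p[2] << 8 | p[3];
        p += 4;
    }
}

/* Synthesis-friendly style: fixed sizes, constant bounds, and index arithmetic,
 * so a tool can map each access to a known register or memory word.
 * (To a C compiler, block is still a pointer; the sizes document the intent.) */
void load_words_array(const uint8_t block[64], uint32_t w[16]) {
    for (int t = 0; t < 16; t++) {
        w[t] = (uint32_t)block[4 * t] << 24 | (uint32_t)block[4 * t + 1] << 16 |
               (uint32_t)block[4 * t + 2] << 8 | (uint32_t)block[4 * t + 3];
    }
}
```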
Successive Design Iterations
• The iterative flow required successive source code transformations to achieve better performance and area
• Scheduling analysis pointed to target areas for improvement:
  • Areas of low utilization
  • Excessive resource dependencies
• In the end, the final source code gave results close to the hand-written Verilog implementation
  • But the final code bore very little resemblance to the original C source; instead, it resembled the hand-written Verilog
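One example of a source transformation that trades software convenience for hardware efficiency (not necessarily one the authors applied) is rewriting the SHA-1 message schedule. The textbook form stores all 80 expanded words. An equivalent form keeps only a 16-word circular buffer and computes each word on demand, because W[t] depends only on the previous 16 words. The function names below are hypothetical.

```c
#include <stdint.h>

static uint32_t rotl32(uint32_t x, unsigned n) { return (x << n) | (x >> (32 - n)); }

/* Before: the textbook schedule materializes all 80 expanded words. */
void sha1_schedule_full(const uint32_t m[16], uint32_t w[80]) {
    for (int t = 0; t < 16; t++) w[t] = m[t];
    for (int t = 16; t < 80; t++)
        w[t] = rotl32(w[t - 3] ^ w[t - 8] ^ w[t - 14] ^ w[t - 16], 1);
}

/* After: buf starts as the 16 message words and is updated in place.
 * Slot t & 15 holds W[t-16] until it is overwritten with W[t], so storage
 * drops from 80 words to 16. Must be called with t = 0, 1, ..., 79 in order. */
uint32_t sha1_schedule_next(uint32_t buf[16], int t) {
    if (t >= 16) {
        uint32_t x = buf[(t - 3) & 15] ^ buf[(t - 8) & 15] ^
                     buf[(t - 14) & 15] ^ buf[t & 15];
        buf[t & 15] = rotl32(x, 1);
    }
    return buf[t & 15];
}
```

Like the transformations described above, the rewritten loop no longer resembles the specification's formula, even though it computes the same values.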
SystemC Implementation Observations
• Successive iterations asymptotically approached the area/performance of the hand-written code
• Implementation time was about the same as for an experienced Verilog designer, but without requiring extensive hardware expertise
• A bus interface and device drivers are still required to interface with the processor
  • These come “for free” with the TIE implementation
OpenSSL Integration Methodology
• Wrote custom SHA-1/MD5 routines using the Tensilica extensions and compiled them to Xtensa ELF files
• Created a wrapper around the Xtensa ISS (instruction set simulator) to run the hash routines
• Statically linked the ISS wrapper into OpenSSL
• When OpenSSL calls SHA-1 or MD5, the system traps into an emulated function, which in turn executes the operation on the wrapped simulator
OpenSSL Integration Challenges
• The original approach was to statically link in the custom ISS using the OpenSSL “Engine” hardware accelerator interface
  • OpenSSL supports dynamic loading of custom cryptographic engines and lets the user choose which engine to use for a particular algorithm
• But the ISS uses dynamic libraries that cannot be statically linked in
  • So we kept the ISS as an executable, ran it as a separate process outside OpenSSL, and returned results via external files
• The OpenSSL engine interface is not completely developed and does not fully support SSL functionality
  • So instead of using the engine interface, we replaced OpenSSL's original SHA-1/MD5 routines with our implementations, which invoke the ISS
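A minimal sketch of the separate-process approach: write the input to a file, run an external program, and read the digest back from another file. The function `run_external_hash`, the file names, and the command format are all hypothetical. In the project's setup the command would launch the Xtensa ISS on the compiled hash routine, but the slides do not give its exact command line, so it is left as a parameter.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hash data by running an external program as a separate process.
 * cmd_fmt must contain exactly two %s placeholders: input path, then output path.
 * Reads out_len digest bytes back from the output file.
 * Returns 0 on success, -1 on any failure. */
int run_external_hash(const char *cmd_fmt, const unsigned char *data, size_t len,
                      unsigned char *out, size_t out_len) {
    const char *in_path = "hash_in.bin";   /* hypothetical exchange files */
    const char *out_path = "hash_out.bin";
    char cmd[512];

    /* 1. Hand the message to the external process through a file. */
    FILE *f = fopen(in_path, "wb");
    if (!f) return -1;
    if (len > 0 && fwrite(data, 1, len, f) != len) { fclose(f); return -1; }
    if (fclose(f) != 0) return -1;

    /* 2. Build the command line and run it; a nonzero status means failure. */
    if (snprintf(cmd, sizeof cmd, cmd_fmt, in_path, out_path) >= (int)sizeof cmd) return -1;
    if (system(cmd) != 0) return -1;

    /* 3. Read the digest the external process wrote. */
    f = fopen(out_path, "rb");
    if (!f) return -1;
    size_t got = fread(out, 1, out_len, f);
    fclose(f);
    return got == out_len ? 0 : -1;
}
```

A replaced SHA-1 routine could call this with a 20-byte output buffer (16 bytes for MD5). Each call spawns a process and goes through the file system, so the overhead is large compared with the hash computation itself.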
Conclusions
• Neither the Tensilica nor the SystemC flow was a fully automatic tool
• However, both led to implementations competitive with a hand implementation
• The key advantage is that designs can be implemented with much less expertise
  • Especially much less hardware design expertise