An automated pipeline balancing in the SRC Reconfigurable Computer and its application to the RC5 cipher breaking
Hatim Diab (1), Miaoqing Huang (1), Kris Gaj (2), Tarek El-Ghazawi (1), Nikitas Alexandridis (1)
(1) The George Washington University
(2) George Mason University
Objectives
• Implement a pipelined RC5 key breaker on a single chip
• Demonstrate automatic balancing of a pipeline by a compiler (SRC)
• Show the cost of the added pipeline
1011/MAPLD'04
Requirements
• Given:
  • A matching pair of plain text message (M) and cipher text (C)
• Find the corresponding secret key
  • Test the possible secret keys exhaustively
  • Keys are 128 bits long, from all 0's to all 1's
• Requirements:
  • The processing element (PE) must be fed a new secret key (Ki) each cycle
  • Compare C with the output Ci corresponding to Ki
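The exhaustive test described above can be sketched in software as a loop over candidate keys. This is only an illustration: `toy_encrypt` is a hypothetical stand-in for the RC5 processing element (it is not RC5), and `find_key` and the 16-bit key space are assumptions chosen so the loop finishes quickly.

```python
from typing import Callable, Optional


def toy_encrypt(key: int, message: int) -> int:
    """Hypothetical stand-in for the processing element (NOT RC5)."""
    mask = 0xFFFFFFFFFFFFFFFF  # keep results to a 64-bit block
    return ((message ^ key) * 0x9E3779B97F4A7C15 + key) & mask


def find_key(message: int, cipher: int, key_bits: int,
             encrypt: Callable[[int, int], int]) -> Optional[int]:
    """Try every key from all 0's to all 1's; return the first one that maps M to C."""
    for key in range(1 << key_bits):         # one candidate Ki per iteration
        if encrypt(key, message) == cipher:  # compare Ci with C
            return key
    return None


if __name__ == "__main__":
    M = 0x0123456789ABCDEF
    C = toy_encrypt(0xBEEF, M)
    print(hex(find_key(M, C, 16, toy_encrypt)))
```

In the hardware design, the loop body corresponds to one pipeline slot: a new Ki enters every clock instead of waiting for the previous comparison to finish.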
RC5 Algorithm
• Mixing in the secret key
    // S[0..25] is the array to be mixed for RC5 encryption
    // L[0..3] is the array converted from the secret key K[0..15]
    i = j = 0
    A = B = 0
    do 3*max(26,4) times
        A = S[i] = (S[i] + A + B) <<< 3;
        B = L[j] = (L[j] + A + B) <<< (A + B);
        i = (i + 1) mod 26;
        j = (j + 1) mod 4;
    // The output is the array S[0..25], which will be used to encrypt the plain text
• Encryption
    LE = A + S[0];    // A is the upper part of the plain text
    RE = B + S[1];    // B is the lower part of the plain text
    for i = 1 to 12 do
        LE = ((LE ⊕ RE) <<< RE) + S[2*i];
        RE = ((RE ⊕ LE) <<< LE) + S[2*i+1];
    The processed LE is the upper part of the cipher text; the processed RE is the lower part.
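A minimal software sketch of the two phases above, for RC5-32/12/16, may help when checking a hardware result. Several details are assumptions not shown on the slide: the initialization of S with the magic constants P32 and Q32 and the little-endian loading of L follow the published RC5 description, and `decrypt` is added only to allow a round-trip check. The function names are illustrative.

```python
from typing import List, Tuple

W = 32                              # word size in bits
R = 12                              # number of rounds
MASK = 0xFFFFFFFF                   # keep arithmetic modulo 2**32
P32, Q32 = 0xB7E15163, 0x9E3779B9   # RC5 magic constants (assumption: not on the slide)


def rotl(x: int, n: int) -> int:
    """Rotate a 32-bit word left by n mod 32 bits."""
    n %= W
    return ((x << n) | (x >> (W - n))) & MASK


def rotr(x: int, n: int) -> int:
    """Rotate a 32-bit word right by n mod 32 bits."""
    n %= W
    return ((x >> n) | (x << (W - n))) & MASK


def expand_key(key: bytes) -> List[int]:
    """Mix a 16-byte secret key K[0..15] into the array S[0..25]."""
    c = len(key) // 4               # 4 words in L
    t = 2 * (R + 1)                 # 26 words in S
    L = [int.from_bytes(key[4 * i:4 * i + 4], "little") for i in range(c)]
    S = [(P32 + i * Q32) & MASK for i in range(t)]
    A = B = i = j = 0
    for _ in range(3 * max(t, c)):  # 78 mixing steps
        A = S[i] = rotl((S[i] + A + B) & MASK, 3)
        B = L[j] = rotl((L[j] + A + B) & MASK, A + B)
        i = (i + 1) % t
        j = (j + 1) % c
    return S


def encrypt(S: List[int], A: int, B: int) -> Tuple[int, int]:
    """Encrypt one 64-bit block given as two 32-bit words (A, B)."""
    LE = (A + S[0]) & MASK
    RE = (B + S[1]) & MASK
    for i in range(1, R + 1):
        LE = (rotl(LE ^ RE, RE) + S[2 * i]) & MASK
        RE = (rotl(RE ^ LE, LE) + S[2 * i + 1]) & MASK
    return LE, RE


def decrypt(S: List[int], LE: int, RE: int) -> Tuple[int, int]:
    """Inverse of encrypt; included only for a round-trip check."""
    for i in range(R, 0, -1):
        RE = rotr((RE - S[2 * i + 1]) & MASK, LE) ^ LE
        LE = rotr((LE - S[2 * i]) & MASK, RE) ^ RE
    return (LE - S[0]) & MASK, (RE - S[1]) & MASK
```

Each pass of the loop in `expand_key` corresponds to one key generation step, and each pass of the loop in `encrypt` to one encryption round.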
[Figure: Key-Breaking Flowchart]
Condition & Implementation
• RC5-32/12/16
  • Cipher text = 32 * 2 bits = 64 bits
  • 12 rounds
  • Key = 16 * 8 bits = 128 bits
• Implement RC5 encryption using
  • 12 rounds of encryption macros, with a latency of 6 clocks
  • 78 iterations of key generation macros, with a latency of 3 clocks
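The figures on this slide follow from the RC5 parameters, as the short calculation below shows. The final pipeline-depth estimate is an assumption rather than a number from the slides: it treats the quoted latencies as per-macro values, chains all macros strictly in series, and ignores any input or comparison stages.

```python
# RC5-w/r/b parameters from the slide: word size (bits), rounds, key length (bytes)
w, r, b = 32, 12, 16

block_bits = 2 * w           # 64-bit plain text / cipher text block
key_bits = 8 * b             # 128-bit key
t = 2 * (r + 1)              # 26 words in S
c = b // (w // 8)            # 4 words in L
mix_steps = 3 * max(t, c)    # 78 key generation steps

MIX_LATENCY = 3              # clocks, as quoted on the slide
ROUND_LATENCY = 6            # clocks, as quoted on the slide

# Assumption: latencies are per macro and the macros form one serial chain.
estimated_depth = mix_steps * MIX_LATENCY + r * ROUND_LATENCY

print(block_bits, key_bits, t, c, mix_steps, estimated_depth)
```

Under these assumptions a key would emerge from the chain 78 * 3 + 12 * 6 = 306 clocks after it enters, while a new key still enters every clock.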
Design & Bottleneck
• Pipelined design
  • Process one key every clock cycle in a pipelined fashion
• Data dependencies
  • RC5 makes extensive use of data-dependent rotations
  • Each S value is needed again every 26th step
  • Each L value is needed again every 4th step
• Manual HDL-based realization of the pipeline proved to be time-consuming and error-prone
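The two dependency distances can be checked by replaying the index pattern of the 78 mixing steps and recording which earlier step last wrote each S and L word. The sketch below does only that bookkeeping; `last_writers` is an illustrative name, and `None` marks a word that still holds its initial value.

```python
from typing import List, Optional, Tuple


def last_writers(steps: int = 78, t: int = 26,
                 c: int = 4) -> List[Tuple[Optional[int], Optional[int]]]:
    """For each mixing step, return (step that last wrote S[i], step that last wrote L[j])."""
    s_writer: List[Optional[int]] = [None] * t
    l_writer: List[Optional[int]] = [None] * c
    deps = []
    for n in range(steps):
        i, j = n % t, n % c
        deps.append((s_writer[i], l_writer[j]))  # values this step reads
        s_writer[i] = n                          # this step overwrites S[i]
        l_writer[j] = n                          # this step overwrites L[j]
    return deps


if __name__ == "__main__":
    deps = last_writers()
    print(deps[30])  # producers of the S and L words read at step 30
```

In a fully unrolled pipeline, each such pair means a value must be carried forward past the intermediate macros, which is where balancing delays come from.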
[Figure: Data Dependencies in Each Iteration]
Solution
• Implement concurrently on one FPGA chip
  • 78 key initialization macros
  • 12 encryption macros
• Connect the macros in a linear pipeline
• The SRC compiler balances the pipeline by inserting delay channels so that all macros run synchronously
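The idea of balancing by inserting delays can be illustrated with a simple as-soon-as-possible schedule over a graph of macros. This is a sketch of the general technique, not the SRC compiler's actual algorithm: the `balance` function, the macro names, and the example graph are all hypothetical.

```python
from graphlib import TopologicalSorter  # Python 3.9+
from typing import Dict, List, Tuple


def balance(latency: Dict[str, int],
            inputs: Dict[str, List[str]]) -> Dict[Tuple[str, str], int]:
    """Return the number of delay registers needed on each (producer, consumer) edge.

    latency maps each macro to its latency in clocks;
    inputs maps each macro to the macros whose outputs it consumes.
    """
    start: Dict[str, int] = {}
    delays: Dict[Tuple[str, str], int] = {}
    for node in TopologicalSorter(inputs).static_order():
        # Clock at which each input value becomes available
        arrivals = {p: start[p] + latency[p] for p in inputs.get(node, [])}
        # A macro can start only when its latest input has arrived
        start[node] = max(arrivals.values(), default=0)
        for producer, ready in arrivals.items():
            # Earlier inputs wait in a delay channel until the macro starts
            delays[(producer, node)] = start[node] - ready
    return delays


if __name__ == "__main__":
    # Hypothetical three-macro chain where mix2 also reads a value from mix0
    latency = {"mix0": 3, "mix1": 3, "mix2": 3}
    inputs = {"mix1": ["mix0"], "mix2": ["mix1", "mix0"]}
    print(balance(latency, inputs))
```

In this example the value passed directly from `mix0` to `mix2` needs three registers of delay, while the chained connections need none.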
Delay Channels Added by SRC Compiler
[Figure labels: wire; Delay 1 = 1 reg; Delay 2 = 2 reg; Delay 5 = 5 reg]
[Figure: Detailed flow]
Compilation Result
• Device utilization summary:
    Resource                          Used    Available   Utilization
    Number of External IOBs            594      1104         53%
    Number of LOCed External IOBs      594       594        100%
    Number of Slices                 33790     33792         99%
    Number of BUFGMUXs                   1        16          6%
• Maximum Clock Frequency
[Figure: Effectiveness of the Benchmark]
Conclusion
• The objective was realized: a new 128-bit key is pushed into the processing chain every clock cycle
• A speed-up of 1000x over SW and 300x over serial HW implementations was achieved
• Because the parameters of the RC5 algorithm are flexible, different map routines can be designed to fit different area and throughput requirements
• The automated pipeline balancing of the SRC compiler substantially decreased the development time of complex pipelined designs