Design Exploration of 192-bit Elliptic Curve Adder on StarBridge HC-36 System
Gang Quan, Duncan A. Buell, James P. Davis, Siddaveerasharan Devarkal
Elliptic Curve Cryptography
• Emerging as a new generation of public-key cryptosystems
• No known sub-exponential algorithm solves the elliptic curve discrete logarithm problem
• Smallest key size and highest strength per bit compared to other public-key cryptosystems
• Smaller key sizes are well suited to hardware implementation
NIST standards
• NIST has proposed a specific set of elliptic curves for cryptographic purposes
• Elliptic curves are defined over prime fields GF(p) and binary polynomial fields GF(2^m)
  • Prime fields of 192, 224, 256, 384 and 521 bits
  • Binary fields of 163, 233, 283, 409 and 571 bits
• These long bit widths require multi-precision arithmetic
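Multi-precision arithmetic means representing each 192-bit operand as a vector of smaller words ("limbs") and building the full product from limb-by-limb products. The Python sketch below is illustrative only: the 16-bit limb width is an assumption, chosen because the later slides map 16-bit pieces onto the 18x18 built-in multipliers, and the function names are hypothetical. A 192-bit operand becomes 12 limbs, so one schoolbook product needs 12 × 12 = 144 limb multiplications.

```python
# Sketch: schoolbook multi-precision multiplication with 16-bit limbs.
# Assumption: 16-bit limbs, so every limb product fits an 18x18 multiplier.

LIMB_BITS = 16
NUM_LIMBS = 192 // LIMB_BITS  # 12 limbs hold a 192-bit operand
LIMB_MASK = (1 << LIMB_BITS) - 1


def to_limbs(x: int, n: int = NUM_LIMBS) -> list:
    """Split x into n little-endian limbs of LIMB_BITS bits each."""
    return [(x >> (i * LIMB_BITS)) & LIMB_MASK for i in range(n)]


def from_limbs(limbs: list) -> int:
    """Recombine little-endian limbs into a Python integer."""
    return sum(limb << (i * LIMB_BITS) for i, limb in enumerate(limbs))


def mp_mul(a: list, b: list) -> list:
    """Multiply two limb vectors using only limb-by-limb products."""
    res = [0] * (len(a) + len(b))
    for i, ai in enumerate(a):
        carry = 0
        for j, bj in enumerate(b):
            t = res[i + j] + ai * bj + carry  # one 16x16-bit product per step
            res[i + j] = t & LIMB_MASK
            carry = t >> LIMB_BITS
        res[i + len(b)] = carry  # this position is still untouched here
    return res


# Usage with the NIST P-192 prime p = 2^192 - 2^64 - 1 as an operand.
P192 = 2**192 - 2**64 - 1
x, y = P192 - 12345, pow(3, 120, P192)
assert from_limbs(mp_mul(to_limbs(x), to_limbs(y))) == x * y
```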
Elliptic Curve Arithmetic
• For a 192-bit scalar, a naïve M * P operation involves 191 elliptic curve doublings and, on average, 96 elliptic curve additions (performed by the ECC Adder)
• ECC Addition
  • Given P1 = (x1, y1, z1) and P2 = (x2, y2, z2), compute P3 = P1 + P2 = (x3, y3, z3) in projective coordinates
  • 14 high bit-width modular multiplications
  • 42 high bit-width multi-precision multiplications if the Montgomery multiplication method is used
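The slide's formula for P3 did not survive extraction, so the exact formula the authors used is unknown. The Python sketch below uses one standard choice, a homogeneous projective addition formula costing 12 multiplications plus 2 squarings (14 in total). It routes every field multiplication through a Montgomery multiplication that performs three full-width integer multiplications, which reproduces the 14 → 42 count on the slide. The names `Montgomery`, `proj_add` and `to_proj` are hypothetical. The toy prime 97 stands in for a 192-bit prime, and the formula assumes P1 ≠ ±P2.

```python
# Sketch: projective point addition where every field multiplication is a
# Montgomery multiplication. Assumptions (not from the slides): homogeneous
# projective coordinates, a 12M + 2S addition formula, P1 != +/-P2, and a
# toy prime instead of a 192-bit one.


class Montgomery:
    """Montgomery arithmetic modulo an odd prime p with R = 2**k > p."""

    def __init__(self, p, k):
        self.p, self.k = p, k
        self.R = 1 << k
        self.mask = self.R - 1
        self.p_prime = (-pow(p, -1, self.R)) % self.R  # p * p_prime == -1 mod R
        self.int_mults = 0  # full-width integer multiplications done by mul()

    def to_mont(self, x):
        return (x * self.R) % self.p

    def from_mont(self, x):
        return (x * pow(self.R, -1, self.p)) % self.p

    def mul(self, a, b):
        """Return a * b * R^-1 mod p (inputs in [0, p)) with three multiplications."""
        t = a * b                                         # 1st multiplication
        m = ((t & self.mask) * self.p_prime) & self.mask  # 2nd (mod R)
        u = (t + m * self.p) >> self.k                    # 3rd; exact division by R
        self.int_mults += 3
        return u - self.p if u >= self.p else u


def proj_add(F, P1, P2):
    """P1 + P2 in homogeneous projective coordinates (values in Montgomery form).

    Makes exactly 14 calls to F.mul: 12 multiplications and 2 squarings.
    """
    p = F.p
    X1, Y1, Z1 = P1
    X2, Y2, Z2 = P2
    Y2Z1, Y1Z2 = F.mul(Y2, Z1), F.mul(Y1, Z2)
    X2Z1, X1Z2 = F.mul(X2, Z1), F.mul(X1, Z2)
    Z1Z2 = F.mul(Z1, Z2)
    u = (Y2Z1 - Y1Z2) % p          # numerator of the slope
    v = (X2Z1 - X1Z2) % p          # denominator of the slope
    uu, vv = F.mul(u, u), F.mul(v, v)  # the two squarings
    vvv = F.mul(v, vv)
    w = F.mul(vv, X1Z2)
    A = (F.mul(uu, Z1Z2) - vvv - 2 * w) % p
    X3 = F.mul(v, A)
    Y3 = (F.mul(u, (w - A) % p) - F.mul(vvv, Y1Z2)) % p
    Z3 = F.mul(vvv, Z1Z2)
    return X3, Y3, Z3


# Usage: (3, 6) and (0, 10) lie on y^2 = x^3 + 2x + 3 over GF(97).
p = 97
F = Montgomery(p, 8)


def to_proj(P, z):
    """Affine (x, y) -> projective (xz, yz, z), converted to Montgomery form."""
    x, y = P
    return tuple(F.to_mont(c % p) for c in (x * z, y * z, z))


X3, Y3, Z3 = proj_add(F, to_proj((3, 6), 5), to_proj((0, 10), 7))
z_inv = pow(F.from_mont(Z3), -1, p)
print(F.from_mont(X3) * z_inv % p, F.from_mont(Y3) * z_inv % p)  # 85 71
print(F.int_mults)  # 42 = 14 modular multiplications x 3
```

Working in Montgomery form is transparent to the formula: additions and subtractions are unchanged, and only the multiplications are replaced, so the choice of multiplier affects cost but not the point-addition logic.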
StarBridge HC-36 System
• 4 processing elements (PEs), each a Xilinx Virtex-II 6000 FPGA
• 66 MHz PCI bus
• PE-to-PE communication rate: 50 bits/cycle
• Development environment: Viva
Challenges
Find an optimal or near-optimal design that maximizes ECC Adder performance under the resource constraints of the target architecture (SBS HC-36): slices, number of built-in hardware multipliers, communication rate, etc. Even by a conservative estimate, the design space can easily exceed 2^120 points.
Rapid and accurate performance/cost evaluation is key to effective and efficient design space exploration, and the performance/cost of the multipliers is critical to the performance/cost of the ECC Adder.
Evaluation of Timing and Resource Usage of a Multiplier
• Different multiplier implementations
  • Shift-and-add, divide-and-conquer (D&Q), “broadcast” (BC), etc.
  • Performance/cost trade-off
• Hybrid multiplier
  • A multiplier combining different implementation strategies
Divide & Conquer Multiplier
• Karatsuba-Ofman algorithm (1962)
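The Karatsuba-Ofman formula on this slide was an image and is lost. The core idea is to split each N-bit operand into high and low halves and recover the full product from three half-width products instead of four. That is why each D&Q level in the later slides needs three sub-multipliers. In the textbook (additive) form sketched below, the middle product's inputs can be one bit wider than N/2. The slides do not say how the original design handles that carry. The function name and the `base_bits` parameter are illustrative.

```python
import random


def karatsuba(a, b, n, base_bits=16):
    """Multiply non-negative integers a, b < 2**n with the Karatsuba-Ofman split.

    Operands of at most base_bits bits are multiplied directly, standing in for a
    small built-in multiplier (base_bits must be >= 3 so the recursion shrinks).
    """
    if n <= base_bits:
        return a * b
    h = n // 2
    mask = (1 << h) - 1
    a1, a0 = a >> h, a & mask  # a = a1 * 2^h + a0
    b1, b0 = b >> h, b & mask  # b = b1 * 2^h + b0
    z2 = karatsuba(a1, b1, n - h, base_bits)  # high product
    z0 = karatsuba(a0, b0, h, base_bits)      # low product
    # Middle product: each sum may carry into one extra bit.
    zm = karatsuba(a1 + a0, b1 + b0, n - h + 1, base_bits)
    # a1*b0 + a0*b1 == zm - z2 - z0, so only three multiplications are needed.
    return (z2 << (2 * h)) + ((zm - z2 - z0) << h) + z0


a, b = random.getrandbits(192), random.getrandbits(192)
assert karatsuba(a, b, 192) == a * b
```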
“Broadcast” Multiplier
• Algorithm
• Features
  • Shuffles the partial products for a fully pipelined implementation
  • Given k functional units, each “loop body” can be computed in parallel
  • Easy trade-off between resource usage and speed by selecting k
    • k = N: shift-and-add (low degree of parallelism, low speed, low resource usage)
    • k = 1, 2, 3, … (small integer): conventional “block” multiplication (high degree of parallelism, high speed, high resource usage)
  • Good scalability
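The algorithm figure for the broadcast multiplier did not survive extraction. The Python sketch below is one plausible reading of the bullets, not the authors' design. Both operands are cut into k blocks of N/k bits. In each loop iteration, one block of `a` is broadcast to k units, which multiply it by the k blocks of `b` in parallel (modelled here as a list comprehension). The shifted partial products are then accumulated. k = N degenerates to 1-bit blocks (shift-and-add), and k = 1 is a single full-width multiply. Pipelining and partial-product shuffling are not modelled, and `broadcast_multiply` is a hypothetical name.

```python
def broadcast_multiply(a, b, n, k):
    """Multiply a, b < 2**n using k blocks of width n // k (n divisible by k)."""
    if n % k:
        raise ValueError("n must be divisible by k")
    w = n // k
    mask = (1 << w) - 1
    b_blocks = [(b >> (j * w)) & mask for j in range(k)]
    acc = 0
    for i in range(k):                               # k iterations of the "loop body"
        a_i = (a >> (i * w)) & mask                  # block of a broadcast to all k units
        partials = [a_i * b_j for b_j in b_blocks]   # k independent w-by-w products
        for j, pp in enumerate(partials):
            acc += pp << ((i + j) * w)               # align partial product a_i * b_j
    return acc


print(broadcast_multiply(200, 123, 8, 8))  # k = N: shift-and-add, prints 24600
print(broadcast_multiply(2**47 + 5, 3**30, 48, 3) == (2**47 + 5) * 3**30)  # True
```

With n = 48 and k = 3, this matches the third level of the example on the following slides: three 16-bit units, each used once per iteration.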
The Hybrid Multiplier
• A hybrid multiplier is denoted by an integer string, M(N) = {m_1, m_2, …, m_n}
  • m_i: the multiplier scheme at the ith level
  • m_i = 1: D&Q scheme
  • m_i = k (k > 1): BC scheme with k sub-multipliers
• Multiplications narrower than 18 bits use the built-in 18x18 hardware multiplier
An Example of Hybrid Multiplier
• A 192-bit hybrid multiplier M(192) = {1, 1, 3}
  • At the first level, the D&Q scheme is adopted, which requires three 96-bit multipliers
  • For each of the 96-bit multipliers (the 2nd level), the D&Q scheme is adopted again
  • For each of the 48-bit multipliers (the 3rd level), the BC scheme with three 16-bit multipliers is used
  • The built-in 18-bit hardware multipliers of the Virtex-II 6000 are used for the 16-bit multiplications
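A hybrid string can be read as a recipe applied level by level. The Python sketch below illustrates this under assumptions and is not the authors' tool. m_i = 1 performs one D&Q step. Here that step uses the subtractive Karatsuba variant, (a1 - a0)(b1 - b0), so that all three sub-products stay exactly N/2 bits wide, as the slides' widths suggest. m_i = k > 1 performs a BC step with k blocks. Anything narrower than 18 bits is treated as a built-in multiplier. The function name and the `leaf_widths` bookkeeping list are hypothetical.

```python
import random

HW_MULT_BITS = 18  # width of the built-in 18x18 multipliers


def hybrid_multiply(a, b, n, scheme, leaf_widths):
    """Multiply n-bit a and b following a hybrid string such as [1, 1, 3].

    scheme[0] is the scheme at the current level: 1 -> D&Q, k > 1 -> BC with k
    blocks. Widths of multiplications sent to built-in multipliers are recorded.
    """
    if n < HW_MULT_BITS:
        leaf_widths.append(n)
        return a * b  # stands in for a built-in hardware multiplier
    if not scheme:
        raise ValueError(f"hybrid string exhausted at {n} bits")
    m, rest = scheme[0], scheme[1:]
    if m == 1:  # D&Q: three n/2-bit products
        h = n // 2
        mask = (1 << h) - 1
        a1, a0, b1, b0 = a >> h, a & mask, b >> h, b & mask
        z2 = hybrid_multiply(a1, b1, h, rest, leaf_widths)
        z0 = hybrid_multiply(a0, b0, h, rest, leaf_widths)
        # Subtractive middle term: |a1 - a0| and |b1 - b0| stay h bits wide.
        da, db = a1 - a0, b1 - b0
        sign = -1 if (da < 0) != (db < 0) else 1
        zd = sign * hybrid_multiply(abs(da), abs(db), h, rest, leaf_widths)
        # a1*b0 + a0*b1 == z2 + z0 - (a1 - a0)(b1 - b0)
        return (z2 << (2 * h)) + ((z2 + z0 - zd) << h) + z0
    w = n // m  # BC: m blocks of w bits
    mask = (1 << w) - 1
    b_blocks = [(b >> (j * w)) & mask for j in range(m)]
    acc = 0
    for i in range(m):
        a_i = (a >> (i * w)) & mask  # broadcast block of a
        for j, b_j in enumerate(b_blocks):
            acc += hybrid_multiply(a_i, b_j, w, rest, leaf_widths) << ((i + j) * w)
    return acc


leaves = []
a, b = random.getrandbits(192), random.getrandbits(192)
assert hybrid_multiply(a, b, 192, [1, 1, 3], leaves) == a * b
print(len(leaves), set(leaves))  # 81 {16}
```

For M(192) = {1, 1, 3}, the two D&Q levels produce 3 × 3 = 9 multiplications of 48 bits. Each of these BC stages performs 3 × 3 = 9 products of 16 bits, giving 81 leaf multiplications in total.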
The First Level of M(192) = {1, 1, 3}
D&Q is used, which requires three 96-bit multipliers.
The Second Level of M(192) = {1, 1, 3}
D&Q is used again, which requires three 48-bit multipliers.
The Third Level of M(192) = {1, 1, 3}
The BC scheme with three 16-bit multipliers is used.
Analytical Cost Estimation for the Hybrid Multiplier
• Area estimation
  • $S_i(N)$: area cost of an N-bit multiplier
  • $SO_{D\&Q}(N)$: area cost of the overhead in the D&Q implementation of an N-bit multiplier (for control and other units such as adders)
  • $SO_{BC}(N)$: area cost of the overhead in the BC implementation of an N-bit multiplier
It is reasonable to assume that $SO_{D\&Q}(N)$ and $SO_{BC}(N)$ are linear in N. Therefore,
$SO_{D\&Q}(N) = a \cdot N + const_1$
$SO_{BC}(N) = b \cdot N + const_2$
Empirically, a = 15, b = 11, and $const_1 = const_2 = 0$.
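The slides' recurrence for $S_i(N)$ is missing. The Python sketch below assumes the natural one: a D&Q level costs three half-width multipliers plus $SO_{D\&Q}(N)$, and a BC level with k units costs k multipliers of width N/k plus $SO_{BC}(N)$. Under that assumption, the linear overhead model can be evaluated recursively. The recurrence, the function names and the `leaf_area` parameter (slices charged for one built-in 18x18 multiplier, 0 by default) are all assumptions. The printed figure is a model value, not a measured result.

```python
HW_MULT_BITS = 18    # leaves narrower than this map to built-in multipliers
A_DQ, B_BC = 15, 11  # empirical slopes a and b from the slides
CONST1 = CONST2 = 0  # empirical intercepts from the slides


def so_dq(n):
    """Overhead area of an n-bit D&Q stage (linear model)."""
    return A_DQ * n + CONST1


def so_bc(n):
    """Overhead area of an n-bit BC stage (linear model)."""
    return B_BC * n + CONST2


def area(n, scheme, leaf_area=0):
    """Estimated area of an n-bit hybrid multiplier under the assumed recurrence."""
    if n < HW_MULT_BITS:
        return leaf_area  # built-in multiplier
    m, rest = scheme[0], scheme[1:]
    if m == 1:  # D&Q: three n/2-bit sub-multipliers plus overhead
        return 3 * area(n // 2, rest, leaf_area) + so_dq(n)
    return m * area(n // m, rest, leaf_area) + so_bc(n)  # BC with m units


print(area(192, [1, 1, 3]))  # 11952 under these assumptions
```

Because each candidate string costs only a few recursive calls, a model of this shape is cheap enough to score many points of the design space, which is the role the slides give to rapid estimation.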
Analytical Cost Estimation for the Hybrid Multiplier
• Timing estimation
  • $T_i(N)$: timing cost of an N-bit multiplier
  • $T_{add}(N)$: timing cost of an N-bit addition
  • $TO_{D\&Q}(N)$: timing cost of the control in an N-bit D&Q multiplier
  • $TC_{BC}(k)$: timing cost of the control with k base units in the BC implementation
  • $T_{BC}(k)$: timing cost of the “loop” overhead with k base units in the BC implementation
With the given Viva design library, for N < 192 and k > 1, we have
$TO_{D\&Q}(N) = 3, \quad TC_{BC}(k) = 2, \quad T_{BC}(k) = k$
Summary
• Rapid estimation of the design cost of the hybrid multiplier architecture
• With the given Viva library, we can accurately estimate the cycle count of a hybrid multiplier
• The relative error of the area estimation is within 5%
• Future work
  • Estimation of communication cost
  • Investigation of efficient hierarchical allocation/partitioning/mapping/scheduling techniques