A 1.5 GHz AWP Elliptic Curve Crypto Chip • O. Hauck, S. A. Huss, ICSLAB, TU Darmstadt • A. Katoch, Philips Research
Outline • Current AWP projects • GATS-Chip • Elliptic Curve Chip • AWPs compared to sync wave pipes • SRCMOS circuits • Crypto background • Architecture and Implementation • Conclusion
Status of AWP Projects • 2D-DCT: 0.6 µm, being redesigned with self-resetting logic • SRT: currently on schematics only • 64b Giga-Hertz Adder Test Site: 0.6 µm, almost complete, tape-out in May • Crypto chip: 0.35 µm, tape-out targeted for July
Giga-Hertz Adder Test Site • AMS 0.6 µm 3M CMOS • 64b Brent-Kung adder • ~10k devices, ~1.3 mm² • Latency ~2.5 ns • Cycle 1.0 ns • On-chip test circuitry
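The latency and cycle figures on this slide imply that more than one operand wave is inside the adder at a time. A minimal arithmetic sketch, using only the two numbers from the slide (the function names are illustrative, not from the talk):

```python
# Figures taken from the slide: latency ~2.5 ns, cycle 1.0 ns.
LATENCY_NS = 2.5
CYCLE_NS = 1.0


def waves_in_flight(latency_ns: float, cycle_ns: float) -> float:
    """Average number of data waves inside the logic at any moment."""
    return latency_ns / cycle_ns


def throughput_ghz(cycle_ns: float) -> float:
    """Results per nanosecond, i.e. throughput in GHz."""
    return 1.0 / cycle_ns


if __name__ == "__main__":
    print("waves in flight:", waves_in_flight(LATENCY_NS, CYCLE_NS))
    print("throughput (GHz):", throughput_ghz(CYCLE_NS))
```

With these inputs the adder accepts a new operand pair every nanosecond while each individual addition still takes about 2.5 ns to complete.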
General Framework for Pipelines • [Figure: Latch/Reg, Logic, Latch/Reg; signals Data and Clk]
Synchronous Wave Pipeline • [Figure: Latch/Reg, Wave Logic, Latch/Reg; signals Data and Clk] • Discrete, distinct valid frequency ranges (shown on an axis from low to high) • Narrow frequency ranges are not suitable for system design • Promise: higher throughput at reduced latency, clock load, area and power • Drawback: difficult tuning of logic and delay elements
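The "discrete, distinct valid frequency ranges" can be illustrated with the simplified textbook constraint for a clocked wave pipeline: with N waves in flight, the clock period T has to satisfy d_max/N < T < d_min/(N-1). The sketch below assumes that simplified form (setup/hold times, skew and register overhead are ignored) and uses hypothetical delay values; it is not taken from the talk.

```python
import math


def valid_period_ranges(d_max: float, d_min: float, max_waves: int = 16):
    """List (waves, shortest_period, longest_period) for each feasible wave count.

    Simplified two-sided constraint: d_max / N < T < d_min / (N - 1).
    For N = 1 there is no upper bound (ordinary non-wave operation).
    """
    ranges = []
    for n in range(1, max_waves + 1):
        lower = d_max / n
        upper = math.inf if n == 1 else d_min / (n - 1)
        if lower < upper:
            ranges.append((n, lower, upper))
        else:
            break  # once a range is empty, all higher wave counts are empty too
    return ranges


if __name__ == "__main__":
    # Hypothetical logic delays in ns: longest path 2.5, shortest path 2.0.
    for waves, lo, hi in valid_period_ranges(2.5, 2.0):
        print(f"{waves} wave(s): {lo:.3f} ns < T < {hi:.3f} ns")
```

The gaps between consecutive ranges are clock periods at which the pipeline does not work at all, and the ranges shrink as the wave count grows, which is the tuning difficulty the slide refers to.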
Synchronous Pipeline • [Figure: Latch/Reg, Logic, Latch/Reg; signals Data and Clk] • Throughput determined by longest logic path + clock/register overhead • Fine-grain pipelining allows high throughput at the cost of increased clock/register overhead
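The trade-off on this slide can be sketched numerically. The model below assumes the logic splits into perfectly balanced stages and that every stage pays the same fixed clock/register overhead; both the model and the numbers are illustrative assumptions, not figures from the talk.

```python
def cycle_time_ns(total_logic_ns: float, stages: int, overhead_ns: float) -> float:
    """Cycle time = longest stage logic delay + clock/register overhead."""
    return total_logic_ns / stages + overhead_ns


def overhead_fraction(total_logic_ns: float, stages: int, overhead_ns: float) -> float:
    """Share of each cycle spent on clock/register overhead instead of logic."""
    return overhead_ns / cycle_time_ns(total_logic_ns, stages, overhead_ns)


if __name__ == "__main__":
    # Hypothetical values: 2.5 ns of logic, 0.3 ns overhead per register stage.
    for stages in (1, 2, 5, 10):
        t = cycle_time_ns(2.5, stages, 0.3)
        f = overhead_fraction(2.5, stages, 0.3)
        print(f"{stages:2d} stages: cycle {t:.2f} ns, overhead {f:.0%}")
```

Throughput keeps improving with finer pipelining, but an ever larger share of each cycle goes to overhead, and the register count grows with the number of stages.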
Asynchronous Wave Pipeline (AWP) • [Figure: Wave Latch, Wave Logic, Wave Latch; Data path with a matched delay from req_in to req_out] • More than one data item and request propagate coherently • One-sided cycle time constraint • Matched delay must track logic over PTV corners
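One way to read the last two points: the request must never overtake the data it accompanies at any process/temperature/voltage corner, and requests only have to be spaced by at least some minimum interval, with no upper limit. The sketch below checks both conditions; the corner names, delay values and margin are hypothetical and serve only as an illustration.

```python
# Hypothetical delays in ns at three PTV corners.
CORNERS = {
    "fast":    {"logic_max": 1.9, "matched": 2.0},
    "typical": {"logic_max": 2.5, "matched": 2.6},
    "slow":    {"logic_max": 3.4, "matched": 3.5},
}


def delay_tracks_logic(corners: dict, margin_ns: float = 0.05) -> bool:
    """True if the matched delay covers the slowest logic path at every corner."""
    return all(
        c["matched"] >= c["logic_max"] + margin_ns for c in corners.values()
    )


def request_rate_ok(request_interval_ns: float, min_cycle_ns: float) -> bool:
    """One-sided constraint: any interval at or above the minimum is valid."""
    return request_interval_ns >= min_cycle_ns


if __name__ == "__main__":
    print("matched delay tracks logic:", delay_tracks_logic(CORNERS))
    print("5.0 ns spacing ok:", request_rate_ok(5.0, 1.0))
    print("0.8 ns spacing ok:", request_rate_ok(0.8, 1.0))
```

Unlike a clocked wave pipeline, slowing the request rate down never breaks operation in this model; only issuing requests too quickly does.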
Example: 64-b Brent-Kung Parallel Adder • [Figure: adder columns 0 to 4 labeled pg, PG, PG, G, xor] • Buffers provide for same depth on every logic path • All gates in the same column must have the same delay
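For readers unfamiliar with the adder structure, here is a bit-level sketch of a Brent-Kung parallel-prefix adder: bitwise generate/propagate (pg), a prefix network combining group signals (PG), and a final XOR for the sum bits. It models the logic function only; the balancing buffers and gate delays that matter for wave pipelining are not represented, and the function name is illustrative.

```python
def brent_kung_add(a: int, b: int, width: int = 64):
    """Add two width-bit integers with a Brent-Kung prefix network.

    Returns (sum modulo 2**width, carry_out). width must be a power of two.
    """
    assert width > 0 and width & (width - 1) == 0, "width must be a power of two"

    # Column "pg": bitwise generate and propagate.
    g = [(a >> i) & (b >> i) & 1 for i in range(width)]
    p = [((a >> i) ^ (b >> i)) & 1 for i in range(width)]
    G, P = g[:], p[:]

    def combine(i: int, j: int) -> None:
        """Merge the group ending at j (lower bits) into the group ending at i."""
        G[i] = G[i] | (P[i] & G[j])
        P[i] = P[i] & P[j]

    # Up-sweep: build group signals over spans of 2, 4, 8, ... bits.
    d = 1
    while d < width:
        for i in range(2 * d - 1, width, 2 * d):
            combine(i, i - d)
        d *= 2

    # Down-sweep: fill in the remaining prefixes.
    d = width // 4
    while d >= 1:
        for i in range(3 * d - 1, width, 2 * d):
            combine(i, i - d)
        d //= 2

    # Column "xor": sum bit i is p[i] XOR carry into bit i; G[i] is the carry out of bit i.
    total, carry = 0, 0
    for i in range(width):
        total |= (p[i] ^ carry) << i
        carry = G[i]
    return total, carry


if __name__ == "__main__":
    print(brent_kung_add(123, 456))
    print(brent_kung_add(2**64 - 1, 1))
```

Every prefix position passes through a different number of combine steps here, which is why the hardware version needs buffers to equalize path depth.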
Circuits • The logic style used has to minimize delay variation • Earlier work focused on bipolar logic (ECL, CML), but CMOS is mainstream • Static CMOS is not well suited for wave pipelining; fixing the problem results in more power and lower speed • Pass-transistor logic gives sloppy edges, thereby introducing delay variation • Dynamic logic is attractive because only the output-high transition is data-dependent; the output pulldown is done by precharge • What is needed is a dynamic logic family without precharge overhead: SRCMOS
SRCMOS • Distinguishing property of our SRCMOS circuits: precharge feedback is fully local, and NMOS trees are delay-balanced • [Figure: gate schematic labeled inputs, N, output]
CISCO Data Encryption Service Adapter [Cisco Systems]
DES Key Exchange using Public-Key Cryptosystem based on Elliptic Curves
Why is this secure? • Security is based upon the discrete logarithm problem (DLP): in a finite Abelian group we can easily compute Q = kP given k and P • However, k is hard to compute out of P and Q • DLP is extraordinarily hard for the point group of an elliptic curve • The set of solutions of a cubic equation over any field is an Abelian group
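The asymmetry behind the DLP can be shown on a toy group. The sketch below swaps in the multiplicative group modulo a small prime for the elliptic-curve point group, so modular exponentiation plays the role of the scalar multiple on the curve. The modulus, base and secrets are hypothetical toy values and far too small to be secure.

```python
P_MOD = 8191  # small prime (2**13 - 1); hypothetical toy modulus
G = 17        # hypothetical public base element


def public_value(secret: int) -> int:
    """Easy direction: a handful of modular multiplications."""
    return pow(G, secret, P_MOD)


def shared_key(their_public: int, my_secret: int) -> int:
    """Both parties arrive at G**(a*b) mod P_MOD without revealing a or b."""
    return pow(their_public, my_secret, P_MOD)


def brute_force_dlog(target: int) -> int:
    """Hard direction: exhaustive search, with cost growing with the group size."""
    value = 1
    for exponent in range(P_MOD - 1):
        if value == target:
            return exponent
        value = (value * G) % P_MOD
    raise ValueError("target is not a power of G")


if __name__ == "__main__":
    a, b = 1234, 5678  # hypothetical private keys
    key_a = shared_key(public_value(b), a)
    key_b = shared_key(public_value(a), b)
    print("keys agree:", key_a == key_b)
    print("exponent found by search:", brute_force_dlog(public_value(a)))
```

In a key exchange of this kind, the agreed value would then be used to derive the symmetric session key, for example a DES key.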
Elliptic Curve Mathematics and Algorithm • Two types - supersingular and non-supersingular • Non-supersingular have the highest security • EC equation:
Architecture of Multiplier • [Figure: multiplier datapath built from abx cells, 3_Xor stages and wave latches, with delay elements on the request path; circuit styles marked Pseudo NMOS and SRCMOS; signal indices range from 1 to 783]
Dual-rail Circuits • Dual-rail cross-coupled SRCMOS circuit • NMOS trees are designed such that there is only one conducting path to ground
Hierarchy of Control • Double-and-Add: key k (bits 260 to 0) is left-shifted, bit x is shifted out; key generation rate R • EC arithmetic: EC double always (7 MUL), EC add if x=1 (13 MUL); with Hamming weight = 40: 261*7 + 40*13 = 2347, i.e. R * 2347 MUL/s • Finite field arithmetic: ADD (1), MUL (261), LOAD/STORE (1); 2347 * 261 = 612567, i.e. R * 612567 bit/s
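The operation counts on this slide can be reproduced with a short sketch of left-to-right double-and-add. The group operations are passed in as functions, and the demo uses plain integer addition as a stand-in group so the result is easy to verify; that stand-in and the function names are assumptions for illustration, not the chip's EC arithmetic.

```python
FIELD_BITS = 261         # key length from the slide (bits 260 to 0)
HAMMING_WEIGHT = 40      # number of 1 bits in k assumed on the slide
MUL_PER_DOUBLE = 7       # field multiplications per EC double (slide)
MUL_PER_ADD = 13         # field multiplications per EC add (slide)
BIT_STEPS_PER_MUL = 261  # bit steps per field multiplication (slide)


def mults_per_key() -> int:
    return FIELD_BITS * MUL_PER_DOUBLE + HAMMING_WEIGHT * MUL_PER_ADD


def bit_steps_per_key() -> int:
    return mults_per_key() * BIT_STEPS_PER_MUL


def double_and_add(k, point, add, double, identity, bits=FIELD_BITS):
    """Scan k from the most significant bit; always double, add when the bit is 1."""
    result, doubles, adds = identity, 0, 0
    for i in reversed(range(bits)):
        result = double(result)
        doubles += 1
        if (k >> i) & 1:
            result = add(result, point)
            adds += 1
    return result, doubles, adds


if __name__ == "__main__":
    k = (1 << 40) - 1  # hypothetical key with Hamming weight 40
    # Stand-in group: integers under addition, so k * point is the expected result.
    result, doubles, adds = double_and_add(
        k, 5, add=lambda x, y: x + y, double=lambda x: 2 * x, identity=0
    )
    print(result == k * 5, doubles, adds)
    print(mults_per_key(), bit_steps_per_key())
```

The two printed totals match the 2347 MUL and 612567 bit figures that the slide multiplies by the key generation rate R.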
Control Unit Architecture • For static operation • Request signals trigger the state transitions • Autonomous state transitions are triggered by signal X • [Figure: Logic block and two REG blocks; labels IN1, IN2, OUT, X, reset, req1 to reqn, Req_out, AWP]
High Level Control: Double-and-Add • Level-based control • [Figure: state diagram with states 1 to 8; transitions conditioned on X=0/X=1, K=0/K=1 and Stop=1; labels: Start/LoadX, ResetZ; LoadY; ShiftK; ShiftK, Double; K=0, DoubleDone; K=1, DoubleDone/Add; AddDone; Stop=1/KP_Done]
Middle Level Control: EC Point Doubling • Pulse-based control • [Figure: state diagram showing states 0 to 5 and 58 to 63; transitions conditioned on X=0/X=1; actions: Start, OPAX, OPBZ, MULT, MD, OPAA, Shift, OPBA]