
A 1.5 GHz AWP Elliptic Curve Crypto Chip. O. Hauck, S. A. Huss (ICSLAB, TU Darmstadt); A. Katoch (Philips Research).



  1. A 1.5 GHz AWP Elliptic Curve Crypto Chip • O. Hauck, S. A. Huss, ICSLAB, TU Darmstadt • A. Katoch, Philips Research

  2. Outline • Current AWP projects • GATS-Chip • Elliptic Curve Chip • AWPs compared to sync wave pipes • SRCMOS circuits • Crypto background • Architecture and Implementation • Conclusion

  3. Status of AWP Projects • 2D-DCT: 0.6 µm, being re-designed with self-resetting logic • SRT: currently on schematics only • 64b Giga-Hertz Adder Test Site: 0.6 µm, almost complete, tape out in May • Crypto chip: 0.35 µm, tape out targeted for July

  4. Giga-Hertz Adder Test Site • AMS 0.6 µm 3M CMOS • 64b Brent-Kung adder • ~10k devices, ~1.3 mm² • Latency ~2.5 ns • Cycle 1.0 ns • On-chip test circuitry

  5. General Framework for Pipelines [Figure: logic block between two latch/register stages, with Data and Clk signals]

  6. Some Notations...

  7. General Relations

  8. Synchronous Wave Pipeline [Figure: wave logic between two latch/register stages, with Data and Clk signals; additional labels: Low, high] • Discrete, distinct valid frequency ranges • Narrow frequency range not suitable for system design • Promise: higher throughput at reduced latency, clock load, area and power • Drawback: difficult tuning of logic and delay elements

  9. Synchronous Pipeline [Figure: logic block between two latch/register stages, with Data and Clk signals] • Throughput determined by longest logic path + clock/register overhead • Fine-grain pipelining allows high throughput at the cost of increased clock/register overhead

  10. Asynchronous Wave Pipeline (AWP) [Figure: wave logic between two wave latches on the Data path; req_in and req_out linked by a matched delay] • More than one data item and request propagating coherently • One-sided cycle time constraint • Delay must track logic over PTV corners
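The difference between the pipeline styles on slides 8 to 10 can be illustrated with a small timing model. This is only a sketch under stated assumptions: the function names, the single `overhead` term, and the example delays are hypothetical and not taken from the slides, and the wave-pipeline bound uses the common simplification that the cycle time is limited by the spread between the longest and shortest logic path rather than by the longest path itself.

```python
def sync_cycle_time(d_max, overhead):
    """Conventional pipeline: longest logic path plus clock/register overhead."""
    return d_max + overhead


def wave_cycle_time(d_max, d_min, overhead):
    """Wave pipeline (simplified assumption): delay spread plus overhead.

    The smaller the variation between the slowest and fastest path,
    the closer together successive data waves can be launched.
    """
    return (d_max - d_min) + overhead


def waves_in_flight(latency, cycle):
    """Number of data waves that coexist in the logic at one time."""
    return latency / cycle


if __name__ == "__main__":
    # Hypothetical delays in nanoseconds, chosen only for illustration.
    d_max, d_min, overhead = 2.5, 2.0, 0.5
    print("sync cycle:", sync_cycle_time(d_max, overhead))
    print("wave cycle:", wave_cycle_time(d_max, d_min, overhead))
```

The model also shows why the matched delay has to track the logic over PTV corners: any growth in the delay spread feeds directly into the minimum cycle time.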

  11. Example: 64-b Brent-Kung Parallel Adder [Figure: adder network with stage labels 0 to 4 and cell labels pg, PG, G, xor] • Buffers provide for same depth on every logic path • All gates in the same column must have the same delay
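For readers unfamiliar with the Brent-Kung structure, a purely functional model may help. The sketch below is an assumption-laden illustration, not the chip's netlist: it computes bitwise generate/propagate signals, combines them in an up-sweep and a down-sweep prefix tree, and XORs the resulting carries into the sum. It does not model the balancing buffers or gate delays that the slide emphasizes, and the function name and `width` parameter are hypothetical.

```python
def brent_kung_add(a, b, width=64):
    """Add two unsigned integers modulo 2**width with a Brent-Kung prefix tree.

    width must be a power of two.
    """
    # Bitwise generate (g) and propagate (p) signals.
    g = [(a >> i) & (b >> i) & 1 for i in range(width)]
    p = [((a >> i) ^ (b >> i)) & 1 for i in range(width)]
    G, P = g[:], p[:]

    # Up-sweep: combine groups of size d into groups of size 2*d.
    d = 1
    while d < width:
        for i in range(2 * d - 1, width, 2 * d):
            G[i] = G[i] | (P[i] & G[i - d])
            P[i] = P[i] & P[i - d]
        d *= 2

    # Down-sweep: fill in the remaining prefixes.
    d = width // 4
    while d >= 1:
        for i in range(3 * d - 1, width, 2 * d):
            G[i] = G[i] | (P[i] & G[i - d])
            P[i] = P[i] & P[i - d]
        d //= 2

    # G[i] is now the carry out of bit i; the carry into bit 0 is 0.
    carries = [0] + G[:-1]
    total = 0
    for i in range(width):
        total |= (p[i] ^ carries[i]) << i
    return total


if __name__ == "__main__":
    print(hex(brent_kung_add(0x0123456789ABCDEF, 0x1111111111111111)))
```

In hardware, each pass of the inner loops corresponds to one column of cells, which is where the equal-delay requirement from the slide applies.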

  12. Circuits • The logic style used has to minimize delay variation • Earlier work focused on bipolar logic (ECL, CML), but CMOS is mainstream • Static CMOS is not well suited for wave piping; fixing the problem costs more power and reduces speed • Pass-transistor logic gives sloppy edges, thereby introducing delay variation • Dynamic logic is attractive because only the output high transition is data-dependent; the output pulldown is done by precharge • What is needed is a dynamic logic family without precharge overhead: SRCMOS

  13. SRCMOS • Distinguishing property of our SRCMOS circuits: precharge feedback is fully local, and NMOS trees are delay balanced [Figure labels: output, N, inputs]

  14. Operation of a 2-AND

  15. CISCO Data Encryption Service Adapter [Cisco Systems]

  16. DES Key Exchange using Public-Key Cryptosystem based on Elliptic Curves

  17. Why is this secure? • Security is based upon the discrete logarithm problem (DLP): in a finite Abelian group we can easily compute $y = g^x$ given $g$ and $x$ • However, $x$ is hard to compute out of $y$ and $g$ • DLP extraordinarily hard for point group of elliptic curve: • Set of solutions of cubic equation over any field is an Abelian group
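The asymmetry behind the DLP can be seen in a toy example. The sketch below uses the multiplicative group modulo a small prime as a stand-in for the elliptic-curve point group; the symbols `g`, `x`, `y`, `p` and the numeric values are assumptions chosen for illustration. The forward direction is a single call to Python's built-in three-argument `pow`, while the reverse direction here is an exhaustive search whose cost grows with the group size.

```python
def brute_force_dlog(g, y, p):
    """Return the smallest x >= 0 with g**x % p == y, or None if none exists.

    Exhaustive search: practical only for toy-sized groups.
    """
    acc = 1
    for x in range(p - 1):
        if acc == y:
            return x
        acc = (acc * g) % p
    return None


if __name__ == "__main__":
    p, g, x = 101, 2, 57          # toy parameters, far too small to be secure
    y = pow(g, x, p)              # easy direction: modular exponentiation
    print("y =", y)
    print("recovered x =", brute_force_dlog(g, y, p))  # hard direction
```

For groups of cryptographic size the search loop becomes infeasible, which is the property the key exchange relies on.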

  18. Elliptic Curve Mathematics and Algorithm • Two types - supersingular and non-supersingular • Non-supersingular have the highest security • EC equation:

  19. Adding Two Points Over Elliptic Curves

  20. Optimal Normal Basis

  21. Multiplication over ONBs

  22. The Final Formula

  23. Architecture of Multiplier [Figure labels: Pseudo NMOS, SRCMOS, abx, 3_Xor, Wave latch, request, delay; numeric labels 1, 2, 3, 9, 27, 29, 87, 259, 260, 261, 781, 782, 783]

  24. Dual-rail Circuits • Dual-rail cross-coupled SRCMOS circuit • NMOS trees are designed such that there is only one conducting path to ground

  25. Delay Variations at Various Stages

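  26. Hierarchy of Control [Figure: three control levels] • Double-and-Add: key k (bits 260 to 0) shifted left, current bit x; key generation rate R; Hamming weight = 40; always EC double, EC add if x=1 • EC arithmetic: 261*7 + 40*13, i.e. R * 2347 MUL/s (labels 7 and 13 at EC double and EC add) • Finite field arithmetic: 2347 * 261, i.e. R * 612567 bit/s (labels: ADD 1, MUL 261, LOAD/STORE 1)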
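The two rate figures on this slide can be reproduced with a few lines of arithmetic. The sketch reads the slide's numbers as 7 field multiplications per EC double, 13 per EC add, and 261 bit steps per multiplication; that reading, along with all constant and function names, is an assumption and not something the transcript states explicitly.

```python
FIELD_BITS = 261        # key length / field size taken from the slide
HAMMING_WEIGHT = 40     # number of 1 bits assumed in the key
MUL_PER_DOUBLE = 7      # assumed: field multiplications per EC double
MUL_PER_ADD = 13        # assumed: field multiplications per EC add


def mults_per_key(bits=FIELD_BITS, weight=HAMMING_WEIGHT):
    """One EC double per key bit plus one EC add per 1 bit."""
    return bits * MUL_PER_DOUBLE + weight * MUL_PER_ADD


def bit_steps_per_key(bits=FIELD_BITS, weight=HAMMING_WEIGHT):
    """Each field multiplication is counted as `bits` bit steps."""
    return mults_per_key(bits, weight) * bits


def required_rates(key_rate):
    """Return (MUL/s, bit/s) needed for key_rate key generations per second."""
    return key_rate * mults_per_key(), key_rate * bit_steps_per_key()


if __name__ == "__main__":
    print(mults_per_key(), bit_steps_per_key())
```

With the slide's numbers this gives 261*7 + 40*13 = 2347 multiplications and 2347*261 = 612567 bit steps per key, matching the factors shown next to R.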

  27. Control Unit Architecture • For static operation • Request signals trigger the state transitions • Autonomous state transitions are triggered by signal X [Figure labels: X, REG, REG, OUT, IN1, Logic, reset, IN2, req1, Req_out, AWP, reqn]

  28. High Level Control: Double-and-Add [State diagram, level-based control: states 1 to 8 with transitions conditioned on X=0 / X=1; recoverable labels: Start/LoadX, ResetZ • LoadY • Shift K • If K=0 • If K=1 • ShiftK, Double • K=0, DoubleDone • K=1, DoubleDone/Add • AddDone • If Stop=1/KP_Done]
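The double-and-add loop that this controller sequences can be sketched in software. The version below is a generic left-to-right double-and-add over caller-supplied group operations, not the chip's control logic; the function signature is hypothetical, and the usage example substitutes integers under addition modulo n for elliptic-curve points so that the result is easy to check.

```python
def double_and_add(k, point, width, double, add, identity):
    """Compute k * point by scanning the key bits from most to least significant.

    Returns (result, number_of_doubles, number_of_adds).
    """
    z = identity
    doubles = adds = 0
    for i in reversed(range(width)):
        z = double(z)              # always double
        doubles += 1
        if (k >> i) & 1:           # add only when the current key bit is 1
            z = add(z, point)
            adds += 1
    return z, doubles, adds


if __name__ == "__main__":
    n = 1_000_003                  # stand-in group: integers mod n under addition
    result, doubles, adds = double_and_add(
        k=0b1011, point=5, width=8,
        double=lambda z: (2 * z) % n,
        add=lambda z, q: (z + q) % n,
        identity=0,
    )
    print(result, doubles, adds)
```

The operation counts returned here correspond to one double per key bit and one add per 1 bit, which is the basis for the Hamming-weight term in the throughput estimate.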

  29. Middle Level Control: EC Point Doubling [State diagram, pulse-based control: states 0 to 5 and 58 to 63 shown, transitions conditioned on X=0 / X=1; recoverable labels: Start, OPAX, OPBZ, MULT, MD, OPAA, Shift, OPBA]

  30. Various States in a Pulsed Control

  31. Conclusion
