Superscalar Coprocessor for High-speed Curve-based Cryptography

This presentation explores high-speed curve-based cryptography on a superscalar coprocessor. It analyzes how much instruction-level parallelism coprocessor instructions expose and compares performance across three curve-based cryptosystems: ECC, HECC, and ECC over a composite field. The architecture is programmable and scalable in field size, built around a modular arithmetic logic unit (MALU) and evaluated under four HW/SW partitioning types. The results show speed-ups of 1.8x to 2.7x from the proposed hierarchy and the superscalar coprocessor.

Presentation Transcript


  1. Superscalar Coprocessor for High-speed Curve-based Cryptography
     K. Sakiyama, L. Batina, B. Preneel, I. Verbauwhede
     Katholieke Universiteit Leuven / IBBT, Department of Electrical Engineering, ESAT/COSIC

  2. Overview: Introduction, Curve-based Cryptography, HW/SW Partitioning, Superscalar Coprocessor, Results, Conclusions

  3. Introduction: Motivation
     • High-speed curve-based cryptography in HW/SW co-design
       • How much instruction-level parallelism can we obtain from coprocessor instructions?
     • Performance improvement for different operation forms in the datapath
       • AB+C mod P vs. A(B+D)+C mod P, where A, B, C, D, P are polynomials
     • Performance comparison of three different curve-based cryptosystems
       • Which is fastest: ECC, HECC, or ECC over a composite field?
     • Programmability and scalability
       • Programmable in order to support different cryptosystems?
       • Scalable in field sizes?

  4. Introduction: Target Architecture
     • Curve-based cryptography over binary fields
       • Hardware can be smaller and faster than for prime fields
       • ECC over a binary field, e.g. GF(2^163)
       • HECC of genus 2: field length can be shorter by a factor of 2, e.g. GF(2^83)
       • ECC over a composite field: field length can be shorter by a factor of 2, e.g. GF((2^83)^2)
     • The datapath can be shared
       • Programmable coprocessor supporting three curve-based cryptosystems by defining coprocessor instruction(s)
       • (Coprocessor) instruction-level parallelism through a superscalar design
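The factor-of-2 field lengths on this slide can be checked against the group sizes. The following is a rough estimate using the Hasse-Weil bound, a standard result that the slide itself does not state: the Jacobian of a genus-g curve over F_q has about q^g elements.

```latex
\[
(\sqrt{q}-1)^{2g} \;\le\; \#J_C(\mathbb{F}_q) \;\le\; (\sqrt{q}+1)^{2g}
\quad\Longrightarrow\quad \#J_C(\mathbb{F}_q) \approx q^{g}
\]
\[
\begin{aligned}
\text{ECC over } GF(2^{163}) &: \; g=1,\ q=2^{163} &&\Rightarrow\ \approx 2^{163} \\
\text{HECC (genus 2) over } GF(2^{83}) &: \; g=2,\ q=2^{83} &&\Rightarrow\ \approx 2^{166} \\
\text{ECC over } GF((2^{83})^{2}) &: \; g=1,\ q=2^{166} &&\Rightarrow\ \approx 2^{166}
\end{aligned}
\]
```

All three settings therefore give groups of roughly 2^163 to 2^166 elements, while the last two work with 83-bit base-field elements, which is presumably what allows a shared 83-bit datapath.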

  6. Curve-based Cryptography: HW/SW Partitioning (1)
     • General hierarchy in a coprocessor for curve-based cryptography
     [Diagram, top to bottom: Point/Divisor Multiplication (SW or HW controller); Point/Divisor Addition and Point/Divisor Doubling (SW or HW controller); Finite Field Addition, Finite Field Multiplication, and Finite Field Inversion (HW datapath)]
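To make the top two levels of this hierarchy concrete, here is a minimal sketch of point/divisor multiplication built only from addition and doubling. It assumes the textbook left-to-right double-and-add method; the slides do not say which scalar multiplication algorithm the authors use. The names `scalar_mul`, `add`, `double`, and `identity` are hypothetical, and integers modulo 101 stand in for curve points so the example runs without a curve implementation.

```python
from typing import Callable, TypeVar

T = TypeVar("T")


def scalar_mul(k: int, point: T,
               add: Callable[[T, T], T],
               double: Callable[[T], T],
               identity: T) -> T:
    """Compute k*point with left-to-right double-and-add.

    The controller level only sequences `double` and `add`; those in turn
    would be built from finite field operations on the datapath.
    """
    acc = identity
    for bit in bin(k)[2:]:        # scan the scalar from its most significant bit
        acc = double(acc)         # one doubling per scalar bit
        if bit == "1":
            acc = add(acc, point)  # one addition per set bit
    return acc


# Stand-in group (an assumption for illustration): integers modulo 101 under addition.
N = 101
result = scalar_mul(13, 5,
                    add=lambda x, y: (x + y) % N,
                    double=lambda x: (2 * x) % N,
                    identity=0)
```

In a real system the `add` and `double` callbacks would be the point or divisor formulas, each expanding into a sequence of finite field instructions.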

  7. Curve-based Cryptography: Proposed Hierarchy (1)
     • Single instruction for all finite field operations
     • Fixed-cycle execution enables efficient implementation
     [Diagram: the conventional hierarchy (Point/Divisor Multiplication; Point/Divisor Addition and Point/Divisor Doubling; Finite Field Addition, Finite Field Multiplication, and Finite Field Inversion) next to the proposed one (Point/Divisor Multiplication; Point/Divisor Addition, Point/Divisor Doubling, and Finite Field Inversion; a single datapath instruction for the finite field operation, e.g. AB+C mod P)]
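The idea of one instruction covering all finite field operations can be sketched in software. The code below is an illustration, not the authors' datapath: polynomials over GF(2) are stored as Python integers (bit i is the coefficient of z^i), and the function names and the toy field GF(2^4) with P = z^4 + z + 1 are assumptions chosen to keep the example small. Inversion is shown via Fermat's little theorem as one possible way to build it from the same instruction; the slides do not state which inversion method is used.

```python
def clmul(a: int, b: int) -> int:
    """Carry-less (GF(2)[z]) product of two polynomials."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        b >>= 1
    return r


def poly_mod(x: int, p: int) -> int:
    """Reduce polynomial x modulo p over GF(2)."""
    deg_p = p.bit_length() - 1
    while x.bit_length() - 1 >= deg_p:
        x ^= p << (x.bit_length() - 1 - deg_p)
    return x


def ab_plus_c(a: int, b: int, c: int, p: int) -> int:
    """The single operation AB + C mod P."""
    return poly_mod(clmul(a, b) ^ c, p)


def a_bd_plus_c(a: int, b: int, d: int, c: int, p: int) -> int:
    """The extended operation A(B + D) + C mod P."""
    return poly_mod(clmul(a, b ^ d) ^ c, p)


def field_add(a: int, c: int, p: int) -> int:
    return ab_plus_c(a, 1, c, p)   # B = 1 turns AB + C into A + C


def field_mul(a: int, b: int, p: int) -> int:
    return ab_plus_c(a, b, 0, p)   # C = 0 turns AB + C into AB


def field_inv(a: int, p: int) -> int:
    """a^(2^m - 2) mod P, using only ab_plus_c (Fermat's little theorem)."""
    m = p.bit_length() - 1
    result, sq = 1, a
    for _ in range(m - 1):
        sq = ab_plus_c(sq, sq, 0, p)          # a^2, a^4, ..., a^(2^(m-1))
        result = ab_plus_c(result, sq, 0, p)  # accumulate the product of the squares
    return result


P = 0b10011  # z^4 + z + 1, irreducible over GF(2)
```

For example, `field_mul(0b0010, 0b1000, P)` multiplies z by z^3 and reduces z^4 to z + 1. The A(B+D)+C form folds one field addition into the multiply, which is the kind of saving the slides compare against plain AB+C.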

  8. Curve-based Cryptography: Modular Arithmetic Logic Unit (MALU)
     • (a) Building block: regular XOR chains
     • (b) Scalable in digit size (d) and field size (k) by interconnecting several building blocks
     • We use MALU83 (n=83, d=12) as the building block
     • 2xMALU83 can be configured as 1xMALU163
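The role of the digit size d can be illustrated with a digit-serial multiplier model. This is a sketch under assumptions: it uses a generic most-significant-digit-first scheme that consumes d bits of B per step, which may differ from the MALU's internal structure, and the function names are hypothetical. Polynomials over GF(2) are stored as integers with bit i holding the coefficient of z^i.

```python
def clmul(a: int, b: int) -> int:
    """Carry-less (GF(2)[z]) product of two polynomials."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        b >>= 1
    return r


def poly_mod(x: int, p: int) -> int:
    """Reduce polynomial x modulo p over GF(2)."""
    deg_p = p.bit_length() - 1
    while x.bit_length() - 1 >= deg_p:
        x ^= p << (x.bit_length() - 1 - deg_p)
    return x


def digit_steps(k: int, d: int) -> int:
    """Number of d-bit digits needed to cover a k-bit operand: ceil(k / d)."""
    return -(-k // d)


def digit_serial_mul(a: int, b: int, p: int, k: int, d: int) -> int:
    """Compute A*B mod P, consuming d bits of B per step, most significant digit first."""
    acc = 0
    mask = (1 << d) - 1
    for i in reversed(range(digit_steps(k, d))):
        digit = (b >> (i * d)) & mask
        # Horner step: acc = acc * z^d + A * digit (mod P)
        acc = poly_mod((acc << d) ^ clmul(a, digit), p)
    return acc


P = 0b10011  # toy field GF(2^4) with P = z^4 + z + 1 (an assumption for illustration)
product = digit_serial_mul(0b0010, 0b1000, P, k=4, d=3)
```

In this model a larger d means fewer steps per multiplication at the cost of more logic per step. With the slide's parameters, `digit_steps(83, 12)` gives 7 steps; whether that equals the actual MALU83 cycle count is not stated in the slides.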

  10. HW/SW Partitioning: TYPE I, smallest implementation (baseline)
      [Block diagram: Main CPU with SRAM, Program ROM, and memory-mapped I/O; 32-bit instructions and 32-bit data travel over the Instruction Bus and Data Bus to the Coprocessor, which contains the IBC, the DBC, and one MALU83]

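  11. HW/SW Partitioning: TYPE II = TYPE I + m-code RAM
      [Block diagram: Main CPU with SRAM, Program ROM, and memory-mapped I/O; 32-bit instructions and 32-bit data travel over the Instruction Bus and Data Bus to the Coprocessor, which contains the IBC, an FSM with m-code RAM, the DBC, and one MALU83]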

  12. HW/SW Partitioning: TYPE III = TYPE I + Coprocessor Memory
      [Block diagram: Main CPU with SRAM, Program ROM, and memory-mapped I/O; 32-bit instructions and 32-bit data travel over the Instruction Bus and Data Bus to the Coprocessor, which contains the IBC, the DBC, one MALU83, and Coprocessor Memory]

  13. HW/SW Partitioning: TYPE IV = TYPE I + Coprocessor Memory & m-code RAM
      [Block diagram: Main CPU with SRAM, Program ROM, and memory-mapped I/O; 32-bit instructions and 32-bit data travel over the Instruction Bus and Data Bus to the Coprocessor, which contains the IBC, an FSM with m-code RAM, the DBC, one MALU83, and Coprocessor Memory]

  14. HW/SW Partitioning: Co-design flow with GEZEL
      [Flow diagram: C/C++ code for PKCs; partitioning of functions; C/C++ code and H/W behavior blocks with interface. ARM (SW) branch: C/C++ code with physical memory map; cross compile; program code. Coprocessor (HW) branch: GEZEL FDL code; synthesis; VHDL code. Also shown: cycle-true simulation (GEZEL).]

  15. HW/SW Partitioning: Result (Vertical Exploration of System)
      • HECC performance for different HW/SW partitionings (performance metric: point/divisor multiplication)

  17. Superscalar Coprocessor: Proposed Hierarchy (2)
      • Multiple Modular Arithmetic Logic Units (MALUs) in the coprocessor
      [Diagram: with a single MALU, Point/Divisor Multiplication sits above Point/Divisor Addition, Point/Divisor Doubling, and Finite Field Inversion, which feed one finite field operation (e.g. AB+C mod P); with multiple MALUs, the same upper levels feed several finite field operations (e.g. AB+C mod P) side by side]

  18. Superscalar Coprocessor: Parallel Processing Architecture (TYPE IV-based)
      [Block diagram: Main CPU with SRAM, Program ROM, and memory-mapped I/O; 32-bit instructions, 32-bit data, and a Buffer Full signal; the Coprocessor contains the IBC, an FSM with m-code RAM, the DBC, the IQB, four MALU83 units, and Coprocessor Memory, connected via the Instruction Bus and Data Bus]

  19. Superscalar Coprocessor: Horizontal Exploration of System
      • Performance of ECC and HECC

  21. Results: Performance for ECC over GF(2^83)
      • Fastest of the three
      • 1.8x speed-up by 2-way superscaling (ILPDP=6) with A(B+D)+C
      • Still more improvement is possible by adding MALUs
      [Chart series: AB+C and A(B+D)+C]

  22. Results: Performance of HECC over GF(2^83)
      • Faster than ECC over a composite field
      • 2.7x speed-up by 4-way superscaling (ILPDP=5) with A(B+D)+C
      • Diminishing improvement as the number of MALUs increases
      [Chart series: AB+C and A(B+D)+C]

  23. Results: Performance for ECC over GF((2^83)^2)
      • Slowest of the three
      • 2.5x speed-up by 4-way superscaling (ILPDP=6) with A(B+D)+C
      • Diminishing improvement as the number of MALUs increases
      [Chart series: AB+C and A(B+D)+C]

  24. Results: Comparison of ECC/HECC implementations on FPGAs
      [Comparison table; references cited:]
      [11] T. Wollinger, PhD thesis, 2004.
      [13] G. Orlando and C. Paar, CHES 2000.
      [14] N. Gura et al., CHES 2002.
      [29] N. A. Saqib et al., International Journal of Embedded Systems, 2005.

  25. Conclusions
      • Performance improvement / comparison
        • ECC was improved by a factor of 1.8 (2-way)
        • HECC (genus 2) was improved by a factor of 2.7 (4-way)
        • ECC over a composite field was improved by a factor of 2.5 (4-way)
        • A(B+D)+C offers better performance than AB+C
        • ECC is the fastest in this case study
      • Programmability & flexibility
        • Supports three different curve-based cryptosystems over a binary field
        • Arbitrary irreducible polynomial
        • Field size up to 332 bits by using 4xMALU83
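One way to read the reported speed-ups is to relate each to its issue width. The sketch below computes a simple parallel efficiency, defined here as speed-up divided by the number of ways. This metric is an assumption for illustration and is not used in the slides; only the speed-up factors and issue widths come from the Conclusions.

```python
# Speed-up factors and issue widths as reported on the Conclusions slide.
reported = {
    "ECC": (1.8, 2),
    "HECC (genus 2)": (2.7, 4),
    "ECC over a composite field": (2.5, 4),
}


def parallel_efficiency(speedup: float, ways: int) -> float:
    """Fraction of the ideal N-fold speed-up that was actually achieved."""
    return speedup / ways


efficiency = {name: parallel_efficiency(s, n) for name, (s, n) in reported.items()}

for name, eff in efficiency.items():
    print(f"{name}: {eff:.1%} of the ideal speed-up")
```

The 2-way ECC result reaches 90% of the ideal, while the 4-way results reach 67.5% and 62.5%, which is consistent with the slides' remark that improvement diminishes as MALUs are added.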

  26. Thank you!

  27. Parallel issue of instructions: Case of using 4 MALUs
      • IF/D: Instruction Fetch & Decode
      • R_: Read operands (dependent on the type of operation)
      • EX: Execution (dependent on MALU configuration, k & d)
      • W_: Write (dependent on # of instructions issued in parallel)

  28. Parallel issue of instructions: Out-of-order Execution
      • Check RAW (read-after-write) dependencies for in-/out-of-order execution
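A RAW check of the kind named on this slide can be sketched as follows. This is a simplified model under assumptions: instructions are reduced to one destination and a tuple of source registers, grouping is greedy and in-order, and only RAW hazards within an issue group are considered (no WAW/WAR handling, no pipeline timing). The names `Instr`, `has_raw`, and `issue_groups` are hypothetical and do not come from the slides.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Instr:
    dst: str                 # register written by the instruction
    srcs: Tuple[str, ...]    # registers read by the instruction


def has_raw(earlier: Instr, later: Instr) -> bool:
    """True if `later` reads a register that `earlier` writes."""
    return earlier.dst in later.srcs


def issue_groups(program: List[Instr], width: int) -> List[List[Instr]]:
    """Greedy in-order grouping: an instruction joins the current group only if
    the group is not full and it has no RAW dependency on any group member."""
    groups: List[List[Instr]] = []
    current: List[Instr] = []
    for ins in program:
        if len(current) == width or any(has_raw(prev, ins) for prev in current):
            groups.append(current)
            current = []
        current.append(ins)
    if current:
        groups.append(current)
    return groups


# Four AB+C-style instructions; r3 depends on r1 and r2.
program = [
    Instr("r1", ("a", "b", "c")),
    Instr("r2", ("a", "d", "c")),
    Instr("r3", ("r1", "r2", "c")),
    Instr("r4", ("e", "f", "g")),
]
groups = issue_groups(program, width=4)
```

Here the first two instructions can issue together, the third must wait for `r1` and `r2`, and the independent fourth joins it in the second group. An out-of-order scheme could instead hoist the fourth instruction into the first group, since it has no RAW dependency on the earlier ones.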
