
A 3D Data Transformation Processor

Explore a comprehensive 3D data transformation processor for secure real-time trace collection and encryption, improving performance and data protection.


Presentation Transcript


  1. A 3D Data Transformation Processor • Dimitrios Megas, Kleber Pizolato, Timothy Levin, and Ted Huffmire • WESS 2012 • October 11, 2012

  2. Disclaimer • The views presented in this talk are those of the speaker and do not necessarily reflect the views of the United States Department of Defense or the National Science Foundation.

  3. Split Manufacturing • Face-to-Back (F2B) Bonding

  4. Basic Idea • Combine using 3D integration: • Processor • Compression coprocessor • Cryptographic coprocessor

  5. Basic Idea • CPU Layer + Coprocessor Layer

  6. Basic Idea • Real-time trace collection • Compress trace prior to transmission to off-chip storage for offline program analysis • Optional encryption step can protect the compressed data from interception • High-performance stand-alone encryption service • XTRec: Secure Real-time Execution Trace Recording on Commodity Platforms (CMU) • Trusted computing: mitigate glitch attack against TPM (runtime hash of memory, capture sequence of instructions executed)
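The "capture sequence of instructions executed" bullet can be illustrated with a running hash over the trace, in the spirit of a measurement-extend operation. This is only a sketch: the slides do not say how the digest is built. The all-zero starting state, the 8-byte little-endian PC encoding, and the function names are assumptions; SHA-512 is used because it is one of the hash functions listed under Design Choices.

```python
import hashlib

def extend(state: bytes, measurement: bytes) -> bytes:
    """Fold one measurement into the running digest: H(state || H(measurement))."""
    inner = hashlib.sha512(measurement).digest()
    return hashlib.sha512(state + inner).digest()

def trace_digest(pcs):
    """Return a digest that depends on every PC value and on their order."""
    state = bytes(64)  # assumed all-zero starting state
    for pc in pcs:
        state = extend(state, pc.to_bytes(8, "little"))  # assumed encoding
    return state

if __name__ == "__main__":
    print(trace_digest([0x52d70b, 0x543cc6, 0x543cc7]).hex())
```

Because each step hashes the previous state, reordering or dropping an instruction changes the final digest.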

  7. Basic Idea • Real-time trace collection • The amount of data collected depends on the granularity of the collection and the speed of the system • Monitoring and collecting more signals results in a larger data stream
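A back-of-the-envelope estimate makes the dependence on granularity and system speed concrete. The clock rate, record rate, and record size below are illustrative assumptions, not figures from the talk.

```python
def trace_bandwidth_bytes_per_s(clock_hz, records_per_cycle, bytes_per_record):
    """Raw (uncompressed) trace stream size in bytes per second."""
    return clock_hz * records_per_cycle * bytes_per_record

if __name__ == "__main__":
    # Assumed: 1 GHz clock, one record every 4 cycles, 16 bytes per record.
    base = trace_bandwidth_bytes_per_s(1_000_000_000, 0.25, 16)
    # Monitoring twice as many signals doubles the record size and the stream.
    wide = trace_bandwidth_bytes_per_s(1_000_000_000, 0.25, 32)
    print(base, wide)
```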

  8. Outline • Motivation and Background • Design Goals • Design Choices • System Architecture • Conclusions and Future Work

  9. Outline • Motivation and Background • Design Goals • Design Choices • System Architecture • Conclusions and Future Work

  10. Cryptographic Coprocessing • 3D vs. 2D

  11. Medical Image Processing • [Cong 2011]

  12. 3D-MAPS V1 vs V2 • Georgia Tech [Kim et al., ISSCC 2012] * Wide-I/O allows 512 bit/cycle DRAM access

  13. Stack Up Comparison • TSV usage • 3D-MAPS V1: For I/O (204 redundancy) • 3D-MAPS V2: For I/O (204 redundancy) and DRAM access (9 redundancy)

  14. What is 3Dsec? • Economics of High Assurance • High NRE Cost, Low Volume • Gap between DoD and Commercial • Disentangle security from the COTS • Use a separate chip for security • Use 3-D Integration to combine: • Control Plane • Computation Plane • Need to add posts to the COTS chip design • Dual use of computation plane

  15. Pros and Cons • Why not use a co-processor? On-chip? • Pros • High bandwidth and low latency • Controlled lineage • Direct access to internal structures • Cons • Thermal and cooling • Design and testing • Manufacturing yield

  16. Cost • Cost of fabricating systems with 3-D • Fabricating and testing the security layer • Bonding it to the host layer • Fabricating the vias • Testing the joined unit

  17. Circuit-Level Modifications • Passive vs. Active Monitoring • Tapping • Re-routing • Overriding • Disabling

  18. 3-D Application Classes • Enhancement of native functions • Secure alternate service • Isolation and protection • Passive monitoring • Information flow tracking • Runtime correctness checks • Runtime security auditing

  19. Outline • Motivation and Background • Design Goals • Design Choices • System Architecture • Conclusions and Future Work

  20. Design Goals • High Performance • Ability to gather and compress architectural state of a processor at runtime

  21. Outline • Motivation and Background • Design Goals • Design Choices • System Architecture • Conclusions and Future Work

  22. Design Choices • Manufacturing process • Face-to-face (F2F) • Compression algorithm/hw • Two stages: filtering + general-purpose • Crypto algorithm/hw • AES-128, SHA-1, SHA-512 • Interface between planes • 128 F2F vias up, 32 down (direct connection)

  23. Design Choices • Other Issues • Coordination between planes • Control words in special registers • Interface within control plane • Output of compression → input of crypto • Delivery of I/O and power • Use existing capability of computation plane • Computation plane hardware • High-performance general-purpose processor • Clock synchronization • Tree network

  24. Compression Study • Use TCgen to compress a set of trace files generated using Pin • Traces capture memory access behavior of various Linux applications • Vary parameters of TCgen for each field • TCgen is prediction-based compression • Which algorithm is most effective? • Apply general-purpose compression in second stage (gzip)
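The two-stage idea (a prediction-based filter followed by general-purpose compression) can be sketched as follows. This is not TCgen's actual format or predictor set: the simple stride predictor, the one-byte hit/miss encoding, and all function names are assumptions made for illustration. `zlib` stands in for gzip, since both use DEFLATE.

```python
import struct
import zlib

MASK = (1 << 64) - 1  # values are treated as unsigned 64-bit

def stage1_filter(values):
    """Replace each correctly predicted value with a one-byte hit code."""
    out = bytearray()
    last, stride = 0, 0
    for v in values:
        if v == (last + stride) & MASK:
            out.append(0)                 # hit: the decoder can re-predict it
        else:
            out.append(1)                 # miss: flag byte plus the raw value
            out += struct.pack("<Q", v)
        stride = (v - last) & MASK
        last = v
    return bytes(out)

def stage1_restore(data):
    """Invert stage1_filter by replaying the same predictor."""
    values, last, stride, i = [], 0, 0, 0
    while i < len(data):
        if data[i] == 0:
            v = (last + stride) & MASK
            i += 1
        else:
            (v,) = struct.unpack_from("<Q", data, i + 1)
            i += 9
        stride = (v - last) & MASK
        last = v
        values.append(v)
    return values

def compress_field(values):
    """Stage 1 (prediction filter) followed by stage 2 (DEFLATE)."""
    return zlib.compress(stage1_filter(values), 9)

def decompress_field(blob):
    return stage1_restore(zlib.decompress(blob))

if __name__ == "__main__":
    addresses = [0xbff10254 + 4 * i for i in range(1000)]
    blob = compress_field(addresses)
    print(len(addresses) * 8, "raw bytes ->", len(blob), "compressed bytes")
```

The filter stage is lossless because the decoder runs the same predictor and only needs the raw value on a miss.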

  25. Trace Files (generated by Pin)
      Instruction Count   PC         Address      Size
      8                   0x52d70b   0x5913c000   4
      25                  0x543cc6   0xbff10254   4
      25                  0x543cc7   0xbff10258   4
      33                  0x52d6bb   0xbff1025c   4
      33                  0x52d6be   0xbff10260   4
      33                  0x52d6c2   0xbff10264   4
      33                  0x52d6c8   0xbff10268   4
      33                  0x52d6c9   0xbff1026c   4
      37                  0x9bcb44   0xa1a50800   4
      40                  0x6eb126   0xbff10268   4
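Each row of the trace shown on this slide has four whitespace-separated fields: instruction count, PC, data address, and access size. A minimal parser might look like the following; the record type and function names are hypothetical, and the exact on-disk format of the authors' Pin tool is not given in the slides.

```python
from typing import NamedTuple

class TraceRecord(NamedTuple):
    icount: int    # instruction count at the time of the access
    pc: int        # program counter of the accessing instruction
    address: int   # data address touched
    size: int      # access size in bytes

def parse_line(line: str) -> TraceRecord:
    """Parse one 'count pc address size' row; pc and address are hexadecimal."""
    icount, pc, address, size = line.split()
    return TraceRecord(int(icount), int(pc, 16), int(address, 16), int(size))

if __name__ == "__main__":
    for row in ["8 0x52d70b 0x5913c000 4", "25 0x543cc6 0xbff10254 4"]:
        print(parse_line(row))
```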

  26. PC Field • Number of correct predictions (%) for each configuration of TCgen when compressing the PC field (average of all 5 trace files)

  27. Data Address Field • Number of correct predictions (%) for each configuration of TCgen when compressing address field (average of all 5 trace files)
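The "correct predictions (%)" metric can be computed by replaying a predictor over a field and counting hits. The sketch below uses two simple predictors (last-value and stride) on the ten address values from the sample trace on slide 25. It does not reproduce the TCgen configurations or the five trace files behind the plotted results; the function names are hypothetical.

```python
def last_value_hit_rate(values):
    """Percent of values equal to the immediately preceding value."""
    hits, last = 0, 0
    for v in values:
        hits += (v == last)
        last = v
    return 100.0 * hits / len(values)

def stride_hit_rate(values):
    """Percent of values equal to the previous value plus the previous stride."""
    hits, last, stride = 0, 0, 0
    for v in values:
        hits += (v == last + stride)
        stride = v - last
        last = v
    return 100.0 * hits / len(values)

# Address column of the sample trace on slide 25.
ADDRESSES = [
    0x5913c000, 0xbff10254, 0xbff10258, 0xbff1025c, 0xbff10260,
    0xbff10264, 0xbff10268, 0xbff1026c, 0xa1a50800, 0xbff10268,
]

if __name__ == "__main__":
    print("last-value:", last_value_hit_rate(ADDRESSES))
    print("stride:", stride_hit_rate(ADDRESSES))
```

On this short sample the stride predictor benefits from the run of consecutive 4-byte accesses, which a last-value predictor cannot capture.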

  28. PC Field • Compression ratio for the PC field

  29. Data Address Field • Compression ratio for the data address field
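The slides do not define compression ratio; a common convention, assumed here, is uncompressed size divided by compressed size, so larger is better. A small sketch using `zlib` as the compressor:

```python
import struct
import zlib

def compression_ratio(raw: bytes, compressed: bytes) -> float:
    """Uncompressed size divided by compressed size (assumed convention)."""
    if not compressed:
        raise ValueError("compressed data must not be empty")
    return len(raw) / len(compressed)

if __name__ == "__main__":
    # Pack a regular address stream as 32-bit little-endian words.
    addresses = [0xbff10254 + 4 * i for i in range(1000)]
    raw = struct.pack(f"<{len(addresses)}I", *addresses)
    print(compression_ratio(raw, zlib.compress(raw)))
```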

  30. Outline • Motivation and Background • Design Goals • Design Choices • System Architecture • Conclusions and Future Work

  31. Computation Plane • CPU

  32. Control Plane • Compression coprocessor (DFCM + gzip)
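DFCM is usually expanded as "differential finite context method": the predictor hashes the most recent strides (differences between successive values) into a table index, and the table entry holds the stride that followed that context last time. The sketch below is a software illustration only; the order, table size, and hash function are assumptions and say nothing about the hardware design on this slide.

```python
class DFCM:
    """Order-n differential finite context method predictor (illustrative)."""

    def __init__(self, order=2, table_bits=12):
        self.mask = (1 << table_bits) - 1
        self.table = [0] * (1 << table_bits)  # context hash -> next stride
        self.history = [0] * order            # most recent strides, oldest first
        self.last = 0                          # most recent value

    def _index(self):
        h = 0
        for s in self.history:
            h = ((h << 5) ^ (h >> 7) ^ s) & self.mask  # assumed hash function
        return h

    def predict(self):
        """Predicted next value: last value plus the stride stored for this context."""
        return self.last + self.table[self._index()]

    def update(self, value):
        """Record the stride that actually followed the current context."""
        stride = value - self.last
        self.table[self._index()] = stride
        self.history = self.history[1:] + [stride]
        self.last = value

def hit_flags(values, predictor):
    """Return one True/False per value: was it predicted correctly?"""
    flags = []
    for v in values:
        flags.append(predictor.predict() == v)
        predictor.update(v)
    return flags

if __name__ == "__main__":
    # Repeating stride pattern +4, +4, +100 that a plain stride predictor misses.
    values, v = [], 0
    for i in range(60):
        v += 100 if i % 3 == 2 else 4
        values.append(v)
    flags = hit_flags(values, DFCM())
    print(sum(flags), "of", len(flags), "predicted")
```

After a short warm-up the predictor has seen every stride context in the pattern and predicts each following value.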

  33. Control Plane • gzip unit (within compression coprocessor)

  34. Control Plane • AES/SHA

  35. Control Plane • Microprocessor interface unit

  36. Full 3D System • 3D IC

  37. Outline • Motivation and Background • Design Goals • Design Choices • System Architecture • Conclusions and Future Work

  38. Conclusions • Applications: trusted computing, reverse engineering of malicious software, post-mortem analysis of a system that has suffered an attack • Simple preprocessing can decrease bandwidth (also gives power advantages) • There is much to do before making silicon. It is useful to quantify the high-level tradeoffs: • Data to compress • Sampling rate • Number of TSVs • Throughput
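The tradeoff among data collected, sampling rate, and via count can be framed as a simple budget check. The sketch assumes each via carries one bit per cycle and uses the 128 upward vias from the Design Choices slide; the sample widths are illustrative assumptions.

```python
def bits_per_cycle(bits_per_sample, samples_per_cycle):
    """Average number of bits that must cross between planes each cycle."""
    return bits_per_sample * samples_per_cycle

def fits_interface(bits_per_sample, samples_per_cycle, vias=128):
    """True if the stream fits, assuming one bit per via per cycle."""
    return bits_per_cycle(bits_per_sample, samples_per_cycle) <= vias

if __name__ == "__main__":
    print(fits_interface(128, 1))    # e.g. 64-bit PC + 64-bit address every cycle
    print(fits_interface(160, 1))    # adding 32 more bits per sample
    print(fits_interface(160, 0.5))  # same sample, taken every other cycle
```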

  39. Future Work • Independent I/O and power delivery • How to share the I/O of computation plane? • Floor Planning • How much logic/memory can you fit between the TSVs? • It would be helpful for the 3D chip to be pin-compatible with the 2D package. • Use a network/share the TSVs? • Joining dissimilar technology nodes • Use buffers, redundant hardware

  40. Future Work • More types of trace files • General-purpose interface, migration path • Can you test/verify computation plane without knowing what the control plane will be? • Characteristics of a “typical” trace file? • Hierarchy of compression, for power not just for compression ratio? • Lossy compression?! • Trust issues • Who generates the write signal? • How to protect the key? • Can monitored software turn off monitoring? • Hardware implementation • Simulation • FPGA prototype • Tape-out

  41. Split Manufacturing • Discussion Points • Can we trust the result of split manufacturing? • Could this approach harm security? • Is it worth it? When is it worth it? • Why not always use a trusted foundry? • Are trusted foundries a band-aid solution to the offshoring trend? • How to trust a trusted foundry? • Why not use redundancy with majority vote? • Can we do everything from scratch?

  42. Split Manufacturing • Discussion Points • How to raise alarm if network interface is controlled by adversary? • Use challenge-response protocols? • Security architecture • Packaging considerations • Distributed posts, policy state? • If computation plane can perform AES, why perform AES in control plane?
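One possible reading of the challenge-response bullet: a verifier sends a fresh random nonce, and the control plane answers with a keyed MAC over it, so a missing or wrong answer signals tampering on the path. The sketch uses HMAC-SHA-512 from the Python standard library; the shared-key setup, nonce length, and function names are assumptions, and the slides do not propose a specific protocol.

```python
import hashlib
import hmac
import secrets

def make_challenge() -> bytes:
    """Verifier side: a fresh, unpredictable 16-byte nonce."""
    return secrets.token_bytes(16)

def respond(key: bytes, challenge: bytes) -> bytes:
    """Prover side: keyed MAC over the challenge."""
    return hmac.new(key, challenge, hashlib.sha512).digest()

def verify(key: bytes, challenge: bytes, response: bytes) -> bool:
    """Verifier side: constant-time comparison against the expected MAC."""
    return hmac.compare_digest(respond(key, challenge), response)

if __name__ == "__main__":
    key = secrets.token_bytes(32)  # assumed to be shared in advance
    challenge = make_challenge()
    print(verify(key, challenge, respond(key, challenge)))
```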

  43. Questions? • faculty.nps.edu/tdhuffmi
