
High Efficiency Counter Mode Security Architecture via Prediction and Pre-computation


  1. High Efficiency Counter Mode Security Architecture via Prediction and Pre-computation
     Weidong Shi, Hsien-Hsin (Sean) Lee, Mrinmoy Ghosh, Chenghuai Lu, Alexandra Boldyreva
     School of Electrical and Computer Engineering, Georgia Institute of Technology

  2. Counter Mode Encryption
     [Figure labels: Key, VAddr, VAddr+2, Counter, Counter+1, Counter+2, AES Block Cipher, Encryption pad, 16B Cache Line, Encrypted 16B]
     • Each memory line has its own counter.
     • Each time a memory line is updated, its counter is incremented.

  3. Counter Mode Decryption
     [Figure labels: Key, VAddr, VAddr+2, Counter+2, AES Block Cipher, Decryption pad, Encrypted 16B, 16B Cache Line]
     • The counter has to be fetched for a memory line that misses in L2.
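
The two slides above can be sketched in a few lines. This is a minimal illustration, not the authors' hardware design: Python's standard library has no AES, so `make_pad` uses HMAC-SHA256 truncated to 16 bytes as a stand-in for the AES block cipher, and the function names and the 8-byte address/counter encoding are assumptions made for the example.

```python
import hashlib
import hmac

LINE_BYTES = 16  # the slides use 16B cache lines


def make_pad(key: bytes, vaddr: int, counter: int) -> bytes:
    """Stand-in for the AES block cipher in the slides (NOT AES).

    The pad depends only on the key, the line address and the counter,
    never on the data, which is what makes pre-computation possible.
    """
    seed = vaddr.to_bytes(8, "big") + counter.to_bytes(8, "big")
    return hmac.new(key, seed, hashlib.sha256).digest()[:LINE_BYTES]


def xor_line(line: bytes, pad: bytes) -> bytes:
    return bytes(a ^ b for a, b in zip(line, pad))


def encrypt_line(key: bytes, vaddr: int, old_counter: int, plaintext: bytes):
    """On an update: increment the line's counter, then XOR with a fresh pad."""
    counter = old_counter + 1
    return counter, xor_line(plaintext, make_pad(key, vaddr, counter))


def decrypt_line(key: bytes, vaddr: int, counter: int, ciphertext: bytes) -> bytes:
    """On an L2 miss: the counter must be known before the pad can be applied."""
    return xor_line(ciphertext, make_pad(key, vaddr, counter))
```

Because decryption is just an XOR with the pad, the only thing standing between a fetched line and its plaintext is knowing the counter early enough to have the pad ready.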

  4. Related Work
     • Use a dedicated cache (sequence number cache) to reduce the latency overhead of memory decryption (MICRO-36)
     • Prefetch based memory pre-decryption (WASSA 2004)
     • Prediction based memory decryption (this paper)
       • Fully exploits the pre-computation capability enabled by counter mode encryption.
       • Uses otherwise wasted idle crypto engine pipeline stages for prediction and pre-computation.
       • Less area overhead than caching and less memory pressure than prefetch based pre-decryption.

  5. Counter Prediction
     [Figure labels: counter; frequently updated data, infrequently updated data, static data]
     • Counters exhibit both spatial and temporal coherence.
     • To exploit spatial coherence, memory blocks from the same page start counting from the same initial value (the page root counter).

  6. Use Free Idle Pipeline Stages for Prediction
     [Figure: time line of the AES pipeline and the memory pipeline, ending with the decrypted line]
     • Unrolled and pipelined AES decryption logic often stays idle for tens to hundreds of cycles during an L2 miss.

  7. Use Free Idle Pipeline Stages for Prediction
     • Use the idle pipeline stages to generate decryption pads based on predicted counter values (a small window of look-ahead counter values based on the page root counter).
     [Figure: time line of the AES pipeline and the memory pipeline; E(K,G4) is marked as the correct guess, followed by the decrypted line]
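
A rough software analogue of this slide, under stated assumptions: the window is taken to be `depth` consecutive values starting at the page root counter (the slides do not say exactly where the window starts), `make_pad` is an HMAC-based stand-in for the AES engine rather than real AES, and all function names are hypothetical.

```python
import hashlib
import hmac

PREDICTION_DEPTH = 5  # the depth used in the prediction rate study


def make_pad(key: bytes, vaddr: int, counter: int) -> bytes:
    """HMAC-based stand-in for the pipelined AES engine (NOT AES)."""
    seed = vaddr.to_bytes(8, "big") + counter.to_bytes(8, "big")
    return hmac.new(key, seed, hashlib.sha256).digest()[:16]


def speculative_pads(key, vaddr, page_root_counter, depth=PREDICTION_DEPTH):
    """Pads for each guessed counter, computed while the L2 miss is outstanding."""
    guesses = range(page_root_counter, page_root_counter + depth)
    return {guess: make_pad(key, vaddr, guess) for guess in guesses}


def pad_for_fetched_counter(key, vaddr, fetched_counter, pads):
    """Return (pad, hit) once the real counter arrives from memory.

    On a hit the pad is already available; on a miss it is computed now,
    which is the latency the prediction scheme tries to hide.
    """
    if fetched_counter in pads:
        return pads[fetched_counter], True
    return make_pad(key, vaddr, fetched_counter), False
```

For example, with a page root counter of 100 the guesses are 100 through 104, so a fetched counter of 103 is a hit and a fetched counter of 110 is a miss.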

  8. Handle Frequent Updates
     [Figure: TLB extended with per-page fields. Columns: Page Base Addr, Page Root Counter (64 bits), Prediction History Vector (16 bits). Example entry: ..., 0xabcddcba12344321, 0x0000ff00. The TLB feeds the Counter Value Prediction Logic. Example history bits: 0 0 0 1 0 0 1 0 0 1 1 0 0 0 0 0 1]
     • The prediction history vector records each prediction outcome (miss = 1, hit = 0).
     • If total(miss) > threshold, reset the corresponding page root counter to a new number.
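
The reset rule can be sketched as a shift register with a population count. The threshold value, the choice of a random new root, and clearing the history after a reset are all assumptions for illustration; the slide only states the 16-bit vector, the miss = 1 encoding, and the threshold test.

```python
import secrets
from dataclasses import dataclass

HISTORY_BITS = 16    # width of the prediction history vector on the slide
MISS_THRESHOLD = 8   # hypothetical; the slides do not give a value


@dataclass
class PageEntry:
    """Per-page state kept alongside the TLB entry (field names are hypothetical)."""
    page_root_counter: int
    history: int = 0  # newest outcome in bit 0; miss = 1, hit = 0

    def record(self, hit: bool) -> bool:
        """Shift in one outcome; return True if the root counter was reset."""
        mask = (1 << HISTORY_BITS) - 1
        self.history = ((self.history << 1) | (0 if hit else 1)) & mask
        if bin(self.history).count("1") > MISS_THRESHOLD:
            # "Reset to a new number": a random 64-bit value is an assumption.
            self.page_root_counter = secrets.randbits(64)
            self.history = 0
            return True
        return False
```

With this threshold, eight recorded misses leave the entry untouched and the ninth miss within the 16-outcome window triggers the reset.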

  9. Experimental Parameters
     • SimpleScalar 3.0
     • SPEC2000 INT/FP benchmarks with high L2 miss rates
     • Prediction hit rate study (8 billion instructions)
     • IPC performance (400 million instructions in a representative window)

  10. Prediction Rate
      • Prediction hit rate measured over 8 billion instructions
      • No counter cache when using prediction
      • Prediction depth = 5
      • Average prediction hit rate is about 82-83%

  11. IPC
      • IPC is normalized to the scenario without decryption.
      • In general, prediction outperforms a 128K counter cache.
      • On average, it is on par with a 512K counter cache.

  12. Improve Prediction Accuracy
      • Two-level prediction
        • divide the prediction depth into sub-ranges
        • decide the prediction range at the first level
        • then make predictions within that range
      • Context based prediction
        • exploit temporal coherence of accesses to memory locations with coherent update frequency

  13. Two-level Prediction
      [Figure: counter numbers in natural order, divided into four prediction windows selected by the range bits 00, 01, 10, 11]
      • Divide the prediction window into ranges (power of 2).
      • With 2 bits per line, this effectively quadruples the prediction depth.
      • Overhead is about 2KB of on-chip memory for a 64-entry TLB.
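
A minimal sketch of the two levels, assuming each range holds `DEPTH` consecutive counter values and that the range bits are updated to the range containing the last observed counter (the slide does not spell out the update rule). The storage helper reproduces the 2KB figure if a page has 128 lines, which matches the "Memory Page (128 lines)" label on the later "Why Does It Work?" slide.

```python
DEPTH = 5                     # guesses per range, matching the earlier prediction depth
RANGE_BITS = 2                # 2 bits per line, as on the slide
NUM_RANGES = 1 << RANGE_BITS  # four ranges: 00, 01, 10, 11


def two_level_guesses(page_root_counter: int, range_bits: int) -> list:
    """First level picks the range; second level enumerates guesses inside it."""
    start = page_root_counter + range_bits * DEPTH
    return list(range(start, start + DEPTH))


def updated_range_bits(page_root_counter: int, actual_counter: int) -> int:
    """Range containing the observed counter, clamped to the last range (assumed rule)."""
    offset = max(0, actual_counter - page_root_counter)
    return min(offset // DEPTH, NUM_RANGES - 1)


def range_bits_storage_bytes(tlb_entries: int = 64, lines_per_page: int = 128) -> int:
    """On-chip storage for the per-line range bits."""
    return tlb_entries * lines_per_page * RANGE_BITS // 8
```

Here `range_bits_storage_bytes()` gives 64 × 128 × 2 / 8 = 2048 bytes, and four ranges of five guesses cover offsets 0 through 19 from the page root counter instead of 0 through 4.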

  14. Context Based Prediction
      [Figure: counter numbers in natural order, with the prediction window placed according to the context]
      • Store the previous line’s counter depth value in a global register (the context register).
      • Generate new predictions based on the page root counter and the value in the context register.
      • Can be combined with regular and two-level predictions: feed all the predictions into the decryption pipeline.
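
One way to read this slide in code, as a sketch only: the "depth value" is taken to be the offset of the previous line's counter from its page root counter, and the context window is taken to start at that offset. Both readings, and the class and method names, are assumptions.

```python
DEPTH = 5  # guesses per window, matching the earlier prediction depth


class ContextPredictor:
    """Combines the regular window with a window placed by a global context register."""

    def __init__(self) -> None:
        self.context = 0  # offset of the previously accessed line's counter from its root

    def guesses(self, page_root_counter: int) -> list:
        regular = range(page_root_counter, page_root_counter + DEPTH)
        start = page_root_counter + self.context
        contextual = range(start, start + DEPTH)
        # Feed both sets into the pipeline, dropping duplicates but keeping order.
        return list(dict.fromkeys([*regular, *contextual]))

    def observe(self, page_root_counter: int, actual_counter: int) -> None:
        """Update the context register once the real counter is known."""
        self.context = max(0, actual_counter - page_root_counter)
```

If the previous line's counter was 40 above its root, the next access is predicted both near the root and near root + 40, which is what lets a single register track a whole group of lines that are updated at the same rate.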

  15. Why Does It Work?
      [Figure: Memory Page (128 lines)]
      while (1) {
          for all lines of the page
              write to the line;
          for all lines of the page
              read the line;
      }
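
The loop on this slide can be replayed as a toy simulation to see why a context register helps. This is an illustration with made-up parameters, not a reproduction of the reported results: it models one page, ignores the page root counter reset described earlier, and assumes the context window starts at the previous line's offset from the root.

```python
LINES_PER_PAGE = 128  # page size shown on the slide
DEPTH = 5             # guesses per window


def simulate(iterations: int, use_context: bool) -> float:
    """Replay the write-all-then-read-all loop and return the prediction hit rate."""
    root = 0
    counters = [root] * LINES_PER_PAGE
    context = 0  # offset of the previously read line's counter from the root
    hits = reads = 0
    for _ in range(iterations):
        for i in range(LINES_PER_PAGE):   # write pass: every counter increments
            counters[i] += 1
        for i in range(LINES_PER_PAGE):   # read pass: predict, then check
            guesses = set(range(root, root + DEPTH))
            if use_context:
                guesses |= set(range(root + context, root + context + DEPTH))
            hits += counters[i] in guesses
            reads += 1
            context = counters[i] - root
    return hits / reads
```

In this toy model all lines of the page carry the same counter after each write pass, so the previous line's offset is always a good guess for the next one, while the fixed window near the root stops hitting once the counters move past it.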

  16. Prediction Rates
      • 8 billion instruction window
      • Two-level prediction: about 93% prediction hit rate
      • Context based + regular prediction: almost 99% prediction hit rate

  17. IPC
      • IPC is normalized to the scenario without decryption.
      • 1-3% loss of performance using the best prediction scheme

  18. Conclusions
      • Counter value prediction allows pads to be pre-computed speculatively without counter value caching.
      • Spatial and temporal coherence of memory update frequency enables effective counter value prediction.
      • Prediction uses the idle cycles of the pipelined decryption engine.
      • Counter prediction achieves better performance than some of the large cache settings.
      • It is complementary to the caching technique.

  19. Questions
