Secure Cache: Run-Time Detection and Prevention of Buffer Overflow Attacks

Secure Cache: Run-Time Detection and Prevention of Buffer Overflow Attacks Koji Inoue Department of Informatics, Kyushu University Japan Science and Technology Agency

Trusted Program Malicious Program Branch Prediction Selective Activation Pipelining SuperScalar Signal Gating Clock Gating ILP TLP DVS OOO Exe. Value Prediction Resizing Drowsy Operation On-chip Cache MLP Background (1/2) Security •••••• High Performance Low Power/Energy

Did I Lock The Door? Did I Lock the Door? ? ? ? ? ? ? ? ? Very Tired You locked me! Go to your office! Let’s Work! I Don’t Care! Background (2/2) Don't you have such an experience? P

Architectural Support for Branch Prediction Selective Activation Pipelining SuperScalar Signal Gating Clock Gating ILP TLP DVS OOO Exe. Value Prediction Resizing Drowsy Operation On-chip Cache MLP The Goal of This Research Security SCache High Performance Low Power/Energy

Outlline • Introduction • Buffer-Overflow Attack • Secure Cache Architecture • Evaluation • Experimental Set-Up • Security and Energy Consumption • Tradeoff • Performance Overhead • Related Work • Conclusions

R.B.Lee, D.K.Karig, J.P.McGregor, and Z.Shi, “Enlisting Hardware Architecture to Thwart Malicious Code Injection,” Proc. of the Int. Conf. on Security in Pervasive Computing, Mar. 2003. Buffer-Overflow Attack Buffer Overflow • Well-Known vulnerability • Exploited by Blaster@2003 • Caused by unexpected operations • writing an inordinately large amount of data into a buffer • This vulnerability exists in the C standard library (e.g. strcpy) • Lead to a stack smashing • An attack code is inserted • The return address is corrupted • Used to highjack the program execution control

Function Call/Return int f ( ) { … g (s1); … } int g ( char *s1) { char buf [10]; … strcpy(buf, s1); … } Program code • Start f( ) • Call g( ) • Execute strcpy( ) • Return to f( )

Higher Addr. FP s1 Return Address The Next PC of Call g( ) Stack Growth Saved FP Local Variable buf SP Lower Addr. Function Call/Return int f ( ) { … g (s1); … } int g ( char *s1) { char buf [10]; … strcpy(buf, s1); … } Program code Program code • Start f( ) • Call g( ) • Execute strcpy( ) • Return to f( )

String Function Call/Return Higher Addr. int f ( ) { … g (s1); … } int g ( char *s1) { char buf [10]; … strcpy(buf, s1); … } FP s1 Return Address The Next PC of Call g( ) Program code Stack Growth Saved FP Local Variable buf SP Lower Addr. • Start f( ) • Call g( ) • Execute strcpy( ) • Return to f( )

Higher Addr. FP s1 Return Address The Next PC of Call g( ) Stack Growth Saved FP Local Variable buf String SP Lower Addr. Function Call/Return int f ( ) { … g (s1); … } int g ( char *s1) { char buf [10]; … strcpy(buf, s1); … } Program code • Start f( ) • Call g( ) • Execute strcpy( ) • Return to f( )

Higher Addr. Higher Addr. FP FP s1 s1 Return Address The Next PC of Call g( ) The Next PC of Call g( ) Return Address Stack Growth Stack Growth Saved FP Saved FP Local Variable buf Local Variable buf String SP SP Lower Addr. Lower Addr. Stack Smashing int f ( ) { … g (s1); … } int g ( char *s1) { char buf [10]; … strcpy(buf, s1); … } Program code • Start f( ) • Call g( ) • Execute strcpy( ) • Return to f( )

Higher Addr. FP s1 To the Attack Code The Next PC of Call g( ) Return Address Stack Growth Saved FP Attack Code Local Variable buf String SP Lower Addr. Insert the attack code! Corrupt the return address! Stack Smashing Higher Addr. int f ( ) { … g (s1); … } int g ( char *s1) { char buf [10]; … strcpy(buf, s1); … } FP s1 Return Address The Next PC of Call g( ) Program code Stack Growth Saved FP Local Variable buf SP Lower Addr. • Start f( ) • Call g( ) • Execute strcpy( ) • Return to f( )

Higher Addr. FP s1 To the Attack Code The Next PC of Call g( ) Return Address Stack Growth Saved FP Attack Code Local Variable buf String SP Lower Addr. Insert the attack code! Corrupt the return address! Hijack the program execution! Stack Smashing Higher Addr. int f ( ) { … g (s1); … } int g ( char *s1) { char buf [10]; … strcpy(buf, s1); … } FP s1 Return Address The Next PC of Call g( ) Program code Stack Growth Saved FP Local Variable buf SP Lower Addr. • Start f( ) • Call g( ) • Execute strcpy( ) • Return to f( )

Concept • Problem • The return address (RA) in the memory stack can be corrupted • Solution • RA is stored via On-Chip Caches! • Protect RA in the cache! • Implementation • Generate one or more “Replicas” on each RA store • Compare the original with a replica on the corresponding RA load • If they are not the same, we know that the poped RA has been corrupted!

Replica Flag (1b) MUX for replica lines Comparator（32b） Hit Condition Organization

Original Replica Operation: Return-Address Store #of Replica lines（Nrep）=2

Replica Original Operation: Return-Address Load #of Replica lines（Nrep）=2

Run-time detection of return-address corruption If at least a replica line exists Does not affect processor complexity Small impact on cache area and access time Controllable #of replica lines Tradeoff between energy and security Incomplete protection Replica lines may be evicted Degraded cache-hit rates Increase in the average memory access time Increase in the memory access energy Increased cache energy Generating replica lines Reading replica lines (compared to a low-power cache) Summary of SCache Pros Cons

SimpleScalar3.0 16KB 4-way D-cache Line size：32B OOO execution SPEC2000 7 integer programs 4 fp programs Small input 4KB SRAM design 0.18μm CMOS technology One way of the 16KB cache Hspice simulation w/ extracted load capacitances Measure the energy consumed for 1-bit accesses Experimental Set-Up Security/Energy/Performance Energy

SCache Models Security Evaluated SCaches Vulnerability = (Nv-rald / Nrald) * 100 Insecure issued RA load Total #of issued RA load Energy Consumption Etotal = Erd + Ewt + Ewb + Emp read write Replacement (on misses) Writeback to place replica lines *) Only load/store operations issued to the cache are considered

ALL: more than 99.3% of RA load MRU1: more than 98.5%of RA load Results (Vulnerability) 5.4% LRU1 LRU2 MRU1 MRU2 ALL

ALL: 23% of energy overhead MRU1: 9.9% of energy overhead Results (Energy Consumption) LRU1 LRU2 MRU1 MRU2 ALL

2.4% 2.3% EVP E2VP EV2P 164.gzip 176.gcc 188.ammp 164.gzip 176.gcc 188.ammp 164.gzip 176.gcc 188.ammp 175.vpr 197.parser 175.vpr 197.parser 175.vpr 197.parser MRU1: good for Energy-Oriented Applications MRU2: good for Security-Oriented Applications Results (EVP, E2VP, EV2P) Normalized to LRU1 LRU2 MRU1 MRU2 ALL

Outline • Introduction • Buffer-Overflow Attack • Secure Cache Architecture • Evaluation • Experimental Set-Up • Security and Energy Consumption • Tradeoff • Performance Overhead • Related Work • Conclusions

Static： SASI[WNSP99] StakcGuard[USENIX98] →Source Code Analysis →Re-Compilation Dynamic： SW: LibSafe/Verify[USENIX00] →Library Update →Performance Overhead SW: StackGhost[USENIX01] →Only for SPARC architecture HW: SRAS[SPC03] →Inside of the processor core →HW support for LIFO operations SCache: Dynamic+HW • Cache-Level Protection • De-coupled implementation • Random Access • Large Capacity • Reduced Cost Overhead Energy-Security Tradeoff Related Work

Conclusions • Summary • Architectural support for run-time buffer-overflow detection • Evaluation of Energy and Security • Security-Oriented：ALL (or MRU2) model • More than 99.3% of RA load can be protected (9/11 programs) • 23% of energy overhead • Energy-Oriented：MRU1 model • More than 98.5% of RA load can be protected (9/11 programs) • 9.9% of energy overhead • Future Work • Evaluate with vulnerable benchmarks • Consider a good measurement for security • Complete design of the SCache • Develop an optimization technique to adapt to user requirements for security and energy consumption

Back-Up Slides …

SCache Models Evaluated SCaches Security Vulnerability = (Nv-rald / Nrald) * 100 Insecure issued RA load Total #of issued RA load Energy Consumption Etotal = Erd + Ewt + Ewb + Emp read write Replacement (on misses) Writeback to place replica lines *) Only load/store operations issued to the cache are considered

ALL: more than 99.3% of RA load MRU1: more than 98.5%of RA load Results (Vulnerability) 6.1% 5.4% 31.1% 8.7% 4.7% LRU1L LRU1 LRU2 MRU1 MRU2 ALL

ALL: 23% of energy overhead MRU1: 9.9% of energy overhead Results (Energy Consumption) LRU1L LRU1 LRU2 MRU1 MRU2 ALL

Cache Miss Rates IRA: Issued Return Address CA: Cache Access

ALL: 1.1% of overhead MRU1: 0.1% of overhead Results (Performance) LRU1L LRU1 LRU2 MRU1 MRU2 ALL

Results (Energy Breakdown) CONV LRU1L LRU1 LRU2 MRU1 MRU2 ALL Emp Ewb Ewt Erd

WP Cache v.s SCache SCache + WP WP 1cycle Correct prediction + 1cycle Return-Address Load 2cycle Incorrect prediction

Secure Cache: Run-Time Detection and Prevention of Buffer Overflow Attacks