Hardware-Software Integrated Approaches to Defend Against Software Cache-based Side Channel Attacks
Jingfei Kong*, University of Central Florida
Onur Acıiçmez, Samsung Electronics
Jean-Pierre Seifert, TU Berlin & Deutsche Telekom Laboratories
Huiyang Zhou, University of Central Florida
Why Should We Care about Side Channel Attacks? • Cryptographic applications are a critical software component in modern computers (e.g., secure online transactions) • Cryptographic algorithms are designed to impose unreasonable time and resource costs on a successful attack • Breaking a 128-bit symmetric key by brute force means searching 2^128 possibilities: even a device checking 2^60 keys per second would still need around 9.4*2^40 years, about 700 times the age of the universe • By exploiting certain features of modern microprocessors, it may take just a few hours to recover the secret key!
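The brute-force arithmetic above can be checked directly. This is a quick sketch; the 2^60 checks-per-second device and the ~13.8-billion-year age of the universe are the working assumptions:

```python
# Brute-force cost of a 128-bit key, assuming a device testing 2**60 keys/second.
SECONDS_PER_YEAR = 365.25 * 24 * 3600    # ~3.156e7 seconds
UNIVERSE_AGE_YEARS = 13.8e9              # assumed current estimate

keys = 2 ** 128                          # size of the key space
checks_per_second = 2 ** 60              # assumed attacker throughput
years = keys / checks_per_second / SECONDS_PER_YEAR

# Roughly consistent with the slide's "about 700 times the age of the universe"
print(f"{years:.3e} years, ~{years / UNIVERSE_AGE_YEARS:.0f}x the universe's age")
```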
What are Software Cache-based Side Channel Attacks? • Side channel attacks • exploit observable information generated as a byproduct of a cryptosystem implementation, e.g., power traces, electromagnetic radiation • infer secret information, e.g., the secret key • Software cache-based side channel attacks • exploit the latency difference between a cache access and a memory access • the source of information leakage: cache misses on critical data whose addresses depend on the secret information • mainly access-driven attacks and time-driven attacks
An Example: Advanced Encryption Standard (AES) • one of the most popular symmetric key cryptography algorithms • 16-byte input (plaintext), 16-byte output (ciphertext) • 16-byte secret key (for standard 128-bit encryption) • several identical rounds of 16 XOR operations and 16 table lookups in a performance-efficient software implementation
[Figure: a secret key byte is XORed with an input/output byte to form a lookup table index]
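The round structure described above can be sketched in a few lines. This is a toy model of the table-lookup pattern only, not real AES; the table `T` here is a stand-in, not an actual AES T-table:

```python
# Toy sketch of the lookup structure in fast software AES: each round XORs a
# state byte with a key byte and uses the result as a lookup-table index.
T = list(range(256))  # placeholder 256-entry table (NOT the real AES table)

def round_lookups(state, key):
    """One round's worth of 16 XOR operations + 16 table lookups (structure only)."""
    assert len(state) == len(key) == 16
    return [T[s ^ k] for s, k in zip(state, key)]

out = round_lookups([0x32] * 16, [0x2B] * 16)
# Every table index depends on a secret key byte -- exactly the dependence
# that the cache side channel exploits.
```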
Access-driven Attacks
[Figure: a cache shared between a spy process and the victim process, backed by main memory; after the victim runs, the spy's measured access time for cache set b exceeds that of sets a, c, and d (b > a ≈ c ≈ d), revealing which sets the victim touched]
Time-driven Attacks
[Figure: encryption alternates between computation and cache accesses, each of which may hit or miss; the total execution time is affected by the number of cache misses. The indices of the table lookups are derived from secret key bytes XORed with input/output bytes]
Cache-collision Time-driven Attacks on AES
Consider two table lookups i and j with indices Xi ⊕ Ki and Xj ⊕ Kj.
• Case 1: Xj ⊕ Kj ≠ Xi ⊕ Ki — cache access j is a cache miss, assuming the same cache line was not accessed before
• Case 2: Xj ⊕ Kj = Xi ⊕ Ki — cache access j is a cache hit, assuming no conflict miss in between
Xj ⊕ Kj = Xi ⊕ Ki  =>  Ki ⊕ Kj = Xi ⊕ Xj
Statistically, Case 1 takes a longer execution time than Case 2: only when Ki ⊕ Kj = Xi ⊕ Xj does AES encryption exhibit the shortest execution time.
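The algebra behind the collision relation is easy to verify. In this sketch the key bytes are secret and the X values are attacker-observable; the attacker finds the inputs with the shortest encryption time and reads off the key-byte XOR difference:

```python
# Cache-collision relation: if lookups i and j hit the same table entry,
# X_i ^ K_i == X_j ^ K_j, so the attacker learns K_i ^ K_j = X_i ^ X_j
# purely from the observable X values.
Ki, Kj = 0x4F, 0xA2            # secret key bytes (unknown to the attacker)
Xi = 0x10                      # attacker-known byte
Xj = Xi ^ Ki ^ Kj              # chosen so the two lookup indices collide

assert Xi ^ Ki == Xj ^ Kj      # same table index -> cache collision (Case 2)
recovered = Xi ^ Xj            # what the attacker computes from observations
assert recovered == Ki ^ Kj    # the leaked key relation
```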
The Foundation of Cache-Collision Attacks
[Figure: encryption time vs. the number of collisions in the final round of AES, measured on a Pentium 4 processor]
A higher number of collisions means fewer cache misses, and thus a shorter encryption time.
Current Proposed Software/Hardware Countermeasures • Software proposals • easy to deploy with no hardware changes • application specific • substantial performance overhead • data layout and code have to be changed • no security guarantee • Hardware proposals • generic (not application specific) • performance efficient • still have some security issues • require hardware changes • not flexible
Hardware-Software Integrated Approaches • Hardware tackles the source of information leakage: cache misses on critical data • Software offers flexibility, even against future attacks • Three approaches for enhancing the security of various cache designs, with tradeoffs between hardware complexity and performance overhead • preloading to protect PLcache (from ISCA’07) • securing RPcache (from ISCA’07) with informing loads • securing regular caches with informing loads
Informing Loads Approach: Informing Memory Operations • Informing load instructions • work as normal load instructions upon cache hits • generate a user-level exception upon cache misses • originally proposed as lightweight support for memory optimization (ISCA’96) • Leverage the same information exploited by the attacks • Use informing load instructions to read data from the lookup tables • Flexible countermeasures are provided by the software implementation in the exception handler
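The hit/miss semantics of an informing load can be modeled in software. This is a minimal sketch with hypothetical names (`informing_load`, the toy `cache` set); the real mechanism is a hardware instruction, not a Python function:

```python
# Toy model of an informing load: behaves like a normal load on a cache hit,
# but diverts execution to a user-level handler on a cache miss.
cache = set()            # addresses currently cached (toy model)
memory = {0x100: 42}     # backing store

def informing_load(addr, miss_handler):
    if addr not in cache:        # cache miss -> user-level exception
        miss_handler(addr)       # software countermeasure runs here
        cache.add(addr)          # the line is now resident
    return memory[addr]          # the load itself completes normally

def handler(addr):
    print(f"miss at {addr:#x}: run software countermeasure")

value = informing_load(0x100, handler)   # first access misses, handler runs
```

A second load of the same address hits and bypasses the handler entirely, which is why the scheme only pays its overhead on the security-critical events.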
Defend against Access-driven Attacks • Even the very first cache miss is security-critical in access-driven attacks • software random permutation in the AES implementation • randomize the mapping between table indices and cache lines • obfuscate attackers’ observations • A fixed software random permutation is vulnerable • detect cache misses using informing loads and perform a permutation update in the exception handler • every time there is a chance (a cache miss) to leak information, the permutation is changed randomly • balances the tradeoff between security and performance • Overall: a software random permutation scheme with permutation updates only when necessary (on cache misses)
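The randomized mapping idea above can be sketched as follows. This is an illustrative model under assumed names (`perm`, `update_on_miss`), not the paper's implementation; the key point is that the logical-to-physical line mapping stays a valid permutation while being reshuffled on every miss:

```python
import random

# Sketch of software random permutation: an indirection table maps logical
# table-line indices to physical cache lines, and the mapping is re-randomized
# whenever a cache miss could leak information.
NUM_LINES = 16
perm = list(range(NUM_LINES))      # logical line index -> physical line
random.shuffle(perm)               # initial random permutation

def lookup_line(logical_index):
    """The attacker observes only the physical line, not the logical index."""
    return perm[logical_index]

def update_on_miss(missing_logical):
    """Permutation update: swap the missing line with a randomly chosen one."""
    other = random.randrange(NUM_LINES)
    perm[missing_logical], perm[other] = perm[other], perm[missing_logical]
```

Because the mapping changes on each leak opportunity, an observed physical line tells the attacker nothing stable about the logical table index.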
Defend against Time-driven Attacks • Target: the correlation between the secret key and the number of cache misses • detect cache misses using informing loads • load all the critical data into the cache in the exception handler • avoid cache misses on subsequent cache accesses • break the correlation
The Defense Procedure
0. The AES implementation uses the software random permutation version instead of the original one
1. Informing load instructions are used to load the critical data
2. A cache miss on critical data is detected by an informing load instruction, and program execution is redirected to the user-level exception handler
3. Inside the exception handler, all critical data are preloaded into the cache, and a permutation update is performed between the missing cache line and a randomly chosen cache line
[Figure: cache and main memory layout, distinguishing other processes’ data from AES’ data]
The Implementation of Software Random Permutation in AES
[Figure: the original lookup table vs. the converted lookup tables accessed through address pointers]
Countermeasure Implementation in the Exception Handler • preload all table data to defend against time-driven attacks, by prefetching through the address pointers T’[0], T’[1], …, T’[K-1] • perform a permutation update to defend against access-driven attacks, by swapping both the pointers and the data
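The two handler actions can be sketched together. Names and data layout here are assumptions for illustration, not the paper's exact code; what the sketch demonstrates is that swapping both the pointers and the data keeps every lookup `phys[ptrs[k]]` correct while changing the physical placement:

```python
import random

# Sketch of the exception-handler countermeasures. `ptrs` is the indirection
# table of address pointers T'[0..K-1]; `phys` holds the table lines.
K = 16
phys = [[v] * 16 for v in range(K)]   # stand-in physical table lines
ptrs = list(range(K))                 # logical index -> physical line

def preload_all():
    """Touch every table line so later accesses hit (anti time-driven)."""
    for p in ptrs:
        _ = phys[p][0]

def permutation_update(i, j):
    """Swap both the data and the pointers of lines i and j (anti access-driven).
    Lookups phys[ptrs[k]] return the same values before and after."""
    phys[ptrs[i]], phys[ptrs[j]] = phys[ptrs[j]], phys[ptrs[i]]
    ptrs[i], ptrs[j] = ptrs[j], ptrs[i]

def on_miss(missing):
    preload_all()
    permutation_update(missing, random.randrange(K))
```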
Experiments • Experimental Setup • default processor configuration in a MIPS-like SimpleScalar simulator • pipeline bandwidth: 4; 128 ROB, 64 IQ, 64 LSQ • 32KB 2-way I/D L1, 2MB L2, 64B cache blocks • SMT fetch policy: round-robin • AES software implementation (baseline): OpenSSL 0.9.7c • AES performance microbenchmark: OpenSSL speed test program • Security Evaluation • impact of ILP and MLP on cache-collision time-driven attacks • security analysis of our regular-caches-with-informing-loads approach • Performance Evaluation • performance impact on AES • performance impact on an SMT processor
Impact of ILP and MLP on Cache-collision Time-driven Attacks
[Figure: execution time vs. the number of cache collisions in the final round of AES]
• the more ILP and MLP, the less observable the trend • the weaker the correlation between the number of cache collisions and the execution time • the weaker the correlation between the key and the execution time, the more samples are required for a successful attack
Security Evaluation of Regular Caches with Informing Loads • Mitigation against access-driven attacks (see the theoretical proof by Wang et al. at ISCA’07) • Mitigation against cache-collision time-driven attacks
[Figure: the number of cache collisions in the final round of AES]
Performance Impact on AES
[Chart annotations:] • performance takes a hit from cache conflict misses between the lookup table data and other data, which cause frequent exception handling • performance improves with larger caches / higher associativities, as those conflict misses almost disappear • most of the remaining overhead comes from the indirection table introduced for software randomization
Performance Impact on a 2-way SMT Processor • With larger caches / higher associativities, the performance overheads on throughput and fairness from exception handling diminish • The indirection table lookup still imposes some performance overhead on throughput (AES running with SPEC2000 INT)
Conclusions • Software cache-based side channel attacks are an emerging threat • Cache misses are the source of information leakage • We proposed hardware-software integrated approaches that provide stronger security protection across various cache designs • A lightweight hardware support, informing loads, is proposed to protect regular caches with flexible software countermeasures, at some performance overhead • Preloading and informing loads are also proposed to enhance the security of previously proposed secure cache designs
Thank you! Questions?