200 likes | 312 Views
Vulnerabilities on high-end processors. André Seznec IRISA/INRIA CAPS project-team. A paradox. Microarchitectures are more and more complex Timing side channel attacks were presented on versions of AES (Bernstein) and RSA (Açiimez et al.).
E N D
Vulnerabilities on high-end processors André Seznec IRISA/INRIA CAPS project-team
A paradox • Microarchitectures are more and more complex • Timing side channel attacks were presented on versions of AES (Bernstein) and RSA (Açiimez et al.)
Many hardware features only to improve performance • Caches • Pipeline • Superscalar execution • Branch prediction • Thread parallelism
hit miss DTLB hit miss Branch Predictor ITLB hit miss hit miss Execution core D-cache Correct mispredict I-cache L2 Cache hit miss Execution time of a short instruction sequence is a complex function !
Execution time of a short instruction sequence is a complex function (2) • Depends on the precise state of every microarchitecture component: • More than 100 speculative instructions inflight at the same time on a Pentium 4 • Instructions are executed out-of-order. • Strange correlations almost impredictable at compile time (even in the back-end compiler)
Understanding AES cache timing attack on high end microprocessor (follows Bernstein2005) • AES with lookup tables is a 10 round algorithm with the following “vulnerabilties” • The number, the types and the order of the instructions are independent of the key K and the message M to be encrypted. • The exact locations of the data word read and written by the first round only depend on K xor M: • The execution time of the first round depends on K xor M (at least statistically) CAN BE EXPLOITED
Bernstein 2005 (empty cache) • Plaintext attack • Irrealistic hypothesis: • Access to cycle-accurate encryption timing • Cache is flushed between two encryptions • Not explicit in the paper (but see Lauradoux et al.) • Byte by byte determination of the key based on statistically determining the maximum encryption time for each byte of K xor M • works only on Pentium 3, not on Pentium 4
A loaded cache attack (proof of concept codes available) • Plaintext attack: • Timing of large number of encryptions • An irrealistic hypothesis: • Access to cycle-accurate encryption timings On a byte basis of K xor M, determine bit subchains statistically leading to the highest encryption time (+ threshold to get confidence) Depending on microarchitectures: • 0 to 80 bits of the key recovered by this method depending on the model and stepping of Pentium 4 • Suspect exercising banking in the cache
First vulnerability • For given sequence, • Timings are erratic: • Unlikely to get exactly the same timing • But statistically correlated: • cache banking, operation chaining appears in the average
A possible counter measure for AES • Periodically and randomly change the mapping of the look up tables: • 9000 cycles for this change: XOR based permutation: • See Lauradoux et al • HAVEGE can provide the random numbers.
Indirect timing measures ? • Hypothesis: • The attacker has access to user mode on the system (legal or illegal) • The attacker has no access to your data • He/she can run concurently its process with the encryption • On conventional systems, no access to microscopic timing of your application: • Time slice in 1,000,000s cycles
Simultaneous Multithreading (SMT): parallel processing on a single processor • functional units are underused on superscalar processors • SMT: • Sharing the functional units on a superscalar processor between several process • Advantages: • Single process can use all the resources units • dynamic sharing of all structures on parallel/multiprocess workloads Second Vulnerability
SMT Superscalar Issue slots
Indirect timing measures on a SMT processor (principles) SPY wants to get information on CRYPT • SPY and CRYPT runs in parallel • SPY tracks a specific event on CRYPT: • For instance execution of a branch • SPY saturates hardware resources needed for this event by CRYPT for fast execution • SPY records its own execution time (reading the hardware clock counter): • Irregurality in its own execution time signals the event: • CRYPT has try to grab the hardware resource
Indirect timing measures on a SMTproof of concept (derived from SBPA) The skeleton of a naive RSA core For I =1 to N Sequence X // 1,000s of cycles If Key[I]=1 Sequence Y // 1,000s of cycles Endfor Spy this branch B
Indirect timing measures on a SMTproof of concept (2) • Branch instructions are buffered in a BTB: • On Pentium 4, when the branch misses in the BTB, more than 20 cycles penalty • SPY: nearly infinite loop iterating on branching over a set of branches occupying the possible entries for B • Track irregularities in the timing of the loop: • When B is executed, a branch of the SPY is ejected from the BTB, thus creating a timing irregularity: • Iteration is X-type or XY-type Able to reproduce this attack on a toy example
Indirect timing measures on a SMT • Feasible: • On a branch on Pentium4 HT, information is leaking: • I recovered all the bits of 32 bits key in a single run (on a toy example) • Same kind of attack may apply for cache access: memory access sequence could be discovered
Feasible, but difficult • Technically, very difficult: • Lack of documentation on the BTB • Strange indexing, unknown associativity, BTB hierarchy • Requires relatively infrequent events: 1,000s cycles frequency: measure resolution is in the 100s cycles resolution
So what ? • On Pentium 4 HT: • If key bits control branches (or addresses of loads): • Might be recovered by a spy thread
Countermeasures • Just deactivate Hyperthreading. • At present that is a global OS mode (boot time) • Rework implementation: • Introduce randomness in control path at execution ? • Makes attack much more complex