Vulnerabilities on high-end processors

André Seznec, IRISA/INRIA CAPS project-team


Presentation Transcript


  1. Vulnerabilities on high-end processors
     André Seznec, IRISA/INRIA CAPS project-team

  2. A paradox
     • Microarchitectures are increasingly complex
     • Yet timing side-channel attacks have been demonstrated on implementations of AES (Bernstein) and RSA (Acıiçmez et al.)

  3. Many hardware features exist only to improve performance
     • Caches
     • Pipeline
     • Superscalar execution
     • Branch prediction
     • Thread parallelism

  4. [Figure: the execution core surrounded by the I-cache, ITLB, branch predictor, D-cache, DTLB and L2 cache, each annotated with its possible outcomes (hit/miss; correct/mispredict)]
     Execution time of a short instruction sequence is a complex function!

  5. Execution time of a short instruction sequence is a complex function (2)
     • It depends on the precise state of every microarchitectural component:
       • More than 100 speculative instructions in flight at the same time on a Pentium 4
       • Instructions are executed out of order
       • Strange correlations, almost unpredictable at compile time (even by the back-end compiler)

  6. Understanding the AES cache-timing attack on a high-end microprocessor (following Bernstein 2005)
     • AES implemented with lookup tables (128-bit key) is a 10-round algorithm with the following "vulnerabilities":
       • The number, types and order of the instructions are independent of the key K and of the message M to be encrypted.
       • The exact locations of the data words read and written by the first round depend only on K xor M.
     • Hence the execution time of the first round depends on K xor M (at least statistically): this CAN BE EXPLOITED.
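A minimal Python sketch of why the first round leaks only K xor M, assuming a T-table style AES-128 in which the initial AddRoundKey uses the cipher key itself, so every first-round table index is plaintext byte XOR key byte. The function name `first_round_indices` is hypothetical, and it computes only the table indices, not AES.

```python
import secrets

def first_round_indices(key: bytes, plaintext: bytes) -> list[int]:
    """Table indices used by the first-round lookups of a T-table AES-128 (sketch).

    Only AddRoundKey with the cipher key precedes these lookups, so each
    index is plaintext[i] XOR key[i].
    """
    assert len(key) == len(plaintext) == 16
    return [p ^ k for p, k in zip(plaintext, key)]

# Two different (key, message) pairs that share the same K xor M ...
key, msg, mask = secrets.token_bytes(16), secrets.token_bytes(16), secrets.token_bytes(16)
key2 = bytes(k ^ m for k, m in zip(key, mask))
msg2 = bytes(p ^ m for p, m in zip(msg, mask))
# ... touch exactly the same table entries, hence the same cache lines.
assert first_round_indices(key, msg) == first_round_indices(key2, msg2)
```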

  7. Bernstein 2005 (empty cache)
     • Plaintext attack
     • Unrealistic hypotheses:
       • Access to cycle-accurate encryption timings
       • The cache is flushed between two encryptions (not explicit in the paper, but see Lauradoux et al.)
     • Byte-by-byte determination of the key: for each byte of K xor M, statistically determine the value giving the maximum encryption time
     • Works only on the Pentium 3, not on the Pentium 4
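To illustrate the statistical step, here is a toy Python simulation (not Bernstein's code): each measured "encryption" costs a noisy baseline plus a small bump when K xor M equals an assumed profile value `PEAK`, which in the real attack would come from timing a reference machine with a known key. All timing constants and the names `simulated_time` and `recover_key_byte` are made up for illustration.

```python
import random
import statistics

# Assumed profile constant: on the simulated machine, encryption is slowest when
# (key byte XOR plaintext byte) == PEAK. In the real attack this would be learned
# by timing a reference machine whose key is known.
PEAK = 0x5A

def simulated_time(key_byte: int, pt_byte: int, rng: random.Random) -> float:
    """Toy timing model (made-up constants): noisy baseline plus a 3-cycle bump."""
    t = 1000.0 + rng.gauss(0.0, 5.0)
    if (key_byte ^ pt_byte) == PEAK:
        t += 3.0
    return t

def recover_key_byte(key_byte: int, samples_per_value: int, seed: int = 0) -> int:
    """Average the timings for each plaintext byte value and pick the slowest one."""
    rng = random.Random(seed)
    means = []
    for v in range(256):
        times = [simulated_time(key_byte, v, rng) for _ in range(samples_per_value)]
        means.append(statistics.fmean(times))
    slowest = max(range(256), key=means.__getitem__)
    # At the slowest plaintext byte, K xor M == PEAK, hence K == slowest xor PEAK.
    return slowest ^ PEAK
```

In this simulation `recover_key_byte(0x3C, 1000)` should return `0x3C`: the 3-cycle bump is invisible in any single measurement with 5 cycles of noise, but each per-value mean averages 1,000 samples, which shrinks the noise enough for the peak to stand out.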

  8. A loaded-cache attack (proof-of-concept code available)
     • Plaintext attack:
       • Timing of a large number of encryptions
     • An unrealistic hypothesis:
       • Access to cycle-accurate encryption timings
     • For each byte of K xor M, statistically determine the bit substrings leading to the highest encryption time (with a threshold to gain confidence)
     • Results depend on the microarchitecture:
       • 0 to 80 bits of the key recovered by this method, depending on the model and stepping of the Pentium 4
       • Suspected cause: the attack exercises cache banking
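The "bit substrings plus threshold" idea can be sketched the same way. The toy model below assumes, purely for illustration, that only the top `LEAKY_BITS` bits of K xor M influence timing (as if leakage happened at cache-line granularity), with an assumed profiled pattern `PEAK_HI`. A key bit is decided only when the timing difference between plaintexts with that bit set and cleared exceeds `threshold`, so the non-leaking bits stay undetermined; this is one way partial key recovery can arise. All names and constants are hypothetical.

```python
import random
import statistics

LEAKY_BITS = 4                       # assumption: only the top 4 bits of K xor M leak
PEAK_HI = 0b1010                     # assumed slowest value of those bits (profiled beforehand)
PEAK_BYTE = PEAK_HI << (8 - LEAKY_BITS)

def simulated_time(key_byte, pt_byte, rng):
    """Toy timing: noisy baseline plus a bump when the top bits of K xor M equal PEAK_HI."""
    t = 1000.0 + rng.gauss(0.0, 5.0)
    if (key_byte ^ pt_byte) >> (8 - LEAKY_BITS) == PEAK_HI:
        t += 4.0
    return t

def recover_bits(key_byte, n_samples, threshold, seed=1):
    """Return {bit_position: key_bit} for the bits whose timing signal passes `threshold`."""
    rng = random.Random(seed)
    samples = []
    for _ in range(n_samples):
        p = rng.randrange(256)
        samples.append((p, simulated_time(key_byte, p, rng)))
    recovered = {}
    for b in range(8):
        ones = [t for p, t in samples if (p >> b) & 1]
        zeros = [t for p, t in samples if not (p >> b) & 1]
        diff = statistics.fmean(ones) - statistics.fmean(zeros)
        if abs(diff) < threshold:
            continue                 # not confident enough: leave this bit undetermined
        # Slow plaintexts satisfy p_b == k_b ^ peak_b, so diff > 0 means k_b ^ peak_b == 1.
        peak_bit = (PEAK_BYTE >> b) & 1
        recovered[b] = int(diff > 0) ^ peak_bit
    return recovered
```

With 100,000 samples and a threshold of 0.25 cycles, only bits 4 to 7 should be reported; the low four bits of each key byte remain unknown in this model.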

  9. First vulnerability
     • For a given instruction sequence:
       • Timings are erratic: it is unlikely to get exactly the same timing twice
       • But they are statistically correlated: cache banking and operation chaining show up in the average

  10. A possible countermeasure for AES
      • Periodically and randomly change the mapping of the lookup tables:
        • 9,000 cycles for this change, using an XOR-based permutation (see Lauradoux et al.)
      • HAVEGE can provide the random numbers
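The XOR-based remapping can be sketched as a table whose logical entry i is stored at slot i ^ mask: changing the mask moves every entry (and hence its cache line) while lookups keep returning the same values. This Python sketch assumes a power-of-two table size and uses the standard `secrets` module as a stand-in for HAVEGE; the class name `RemappedTable` is hypothetical, and the slide's 9,000-cycle cost refers to the authors' implementation, not to this code.

```python
import secrets

class RemappedTable:
    """Lookup table whose logical entry i is stored at physical slot i ^ mask."""

    def __init__(self, values):
        size = len(values)
        assert size > 0 and size & (size - 1) == 0, "XOR remapping needs a power-of-two size"
        self._slots = list(values)   # initial mask 0: identity layout
        self._mask = 0

    def __getitem__(self, i):
        # Every lookup goes through the current mask, so callers never see the layout.
        return self._slots[i ^ self._mask]

    def remap(self, new_mask=None):
        """Move every entry to a new position chosen by a fresh random mask."""
        size = len(self._slots)
        if new_mask is None:
            new_mask = secrets.randbelow(size)
        new_slots = [None] * size
        for i in range(size):
            # Logical entry i moves from slot i ^ old_mask to slot i ^ new_mask.
            new_slots[i ^ new_mask] = self._slots[i ^ self._mask]
        self._slots, self._mask = new_slots, new_mask
```

An AES routine built on such tables would call `remap()` periodically (the interval is an assumed policy), the idea being that a timing profile collected against one layout no longer matches the next one.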

  11. Indirect timing measurements?
      • Hypotheses:
        • The attacker has user-mode access to the system (legal or illegal)
        • The attacker has no access to your data
        • He/she can run a process concurrently with the encryption
      • On conventional systems, there is no access to the microscopic timing of your application:
        • Time slices last millions of cycles

  12. Simultaneous Multithreading (SMT): parallel processing on a single processor
      • Functional units are underused on superscalar processors
      • SMT:
        • Shares the functional units of a superscalar processor between several processes
        • Advantages:
          • A single process can use all the resources
          • Dynamic sharing of all structures on parallel/multiprocess workloads
      • Second vulnerability

  13. [Figure: issue-slot usage over time, superscalar vs. SMT]

  14. Indirect timing measurements on an SMT processor (principles)
      SPY wants to get information on CRYPT
      • SPY and CRYPT run in parallel
      • SPY tracks a specific event in CRYPT:
        • For instance, the execution of a branch
      • SPY saturates the hardware resources that CRYPT needs to execute this event quickly
      • SPY records its own execution time (by reading the hardware clock counter):
        • An irregularity in its own execution time signals the event: CRYPT has tried to grab the hardware resource
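On the spy side, detecting an "irregularity" can be as simple as comparing each iteration's cycle count with a baseline. This hypothetical helper takes per-iteration timings (which a real spy would read from the hardware time-stamp counter, e.g. with `rdtsc` on x86) and uses the median as the baseline; the threshold and the trace in the usage lines are made-up numbers.

```python
import statistics

def find_irregular_iterations(iteration_cycles, threshold):
    """Return the indices of spy-loop iterations that are slower than usual.

    The baseline is the median iteration time, which is robust to the few slow
    iterations being looked for; an iteration is irregular when it exceeds the
    baseline by more than `threshold` cycles.
    """
    baseline = statistics.median(iteration_cycles)
    return [i for i, cycles in enumerate(iteration_cycles) if cycles - baseline > threshold]

# Made-up trace: two iterations in which CRYPT competed for the resource.
trace = [400, 402, 398, 430, 401, 399, 428, 400]
print(find_irregular_iterations(trace, threshold=15))  # [3, 6]
```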

  15. Indirect timing measurements on SMT: proof of concept (derived from SBPA)
      The skeleton of a naive RSA core:

          For I = 1 to N
              Sequence X            // 1,000s of cycles
              If Key[I] = 1
                  Sequence Y        // 1,000s of cycles
              Endif
          Endfor

      Spy this branch B (the test on Key[I])
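The skeleton matches left-to-right binary (square-and-multiply) modular exponentiation, a common way to write a naive RSA core; reading "Sequence X" as the squaring and "Sequence Y" as the multiplication is an interpretation. In this Python sketch the `if` on each exponent bit plays the role of branch B.

```python
def modexp_left_to_right(base: int, exponent: int, modulus: int) -> int:
    """Naive left-to-right square-and-multiply modular exponentiation."""
    result = 1
    for bit in bin(exponent)[2:]:                 # scan the key bits, most significant first
        result = (result * result) % modulus      # "Sequence X": executed for every bit
        if bit == "1":                            # branch B: outcome is the secret key bit
            result = (result * base) % modulus    # "Sequence Y": executed only for 1 bits
    return result

print(modexp_left_to_right(4, 13, 497))  # 445
```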

  16. Indirect timing measurements on SMT: proof of concept (2)
      • Branch instructions are buffered in a BTB:
        • On the Pentium 4, a branch that misses in the BTB pays a penalty of more than 20 cycles
      • SPY: a nearly infinite loop that branches over a set of branches occupying the possible entries for B
      • Track irregularities in the timing of the loop:
        • When B is executed, one of SPY's branches is evicted from the BTB, creating a timing irregularity
        • Each iteration of CRYPT can thus be classified as X-type or XY-type
      • Attack reproduced on a toy example
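The eviction mechanism can be modeled with one set of a set-associative BTB using LRU replacement. The 4-way associativity and LRU policy are assumptions (slide 18 notes that the real Pentium 4 BTB indexing and associativity are undocumented), and the model serializes the two threads (one CRYPT iteration, then one pass of SPY) instead of running them simultaneously. `BTBSet` and `spy_misses_per_window` are hypothetical names.

```python
from collections import OrderedDict

class BTBSet:
    """One set of a toy set-associative BTB with (assumed) LRU replacement."""

    def __init__(self, ways):
        self.ways = ways
        self.entries = OrderedDict()   # branch address -> None, least recently used first

    def access(self, branch):
        """Look up a branch; return True on hit. A miss allocates, evicting the LRU entry."""
        if branch in self.entries:
            self.entries.move_to_end(branch)
            return True
        if len(self.entries) >= self.ways:
            self.entries.popitem(last=False)
        self.entries[branch] = None
        return False

def spy_misses_per_window(key_bits, ways=4):
    """Count SPY's BTB misses after each CRYPT iteration sharing the set with branch B."""
    btb = BTBSet(ways)
    spy_branches = [f"spy{i}" for i in range(ways)]
    for b in spy_branches:             # warm-up: SPY fills the whole set
        btb.access(b)
    misses = []
    for bit in key_bits:
        if bit:
            btb.access("B")            # CRYPT executes B only when the key bit is 1
        misses.append(sum(not btb.access(b) for b in spy_branches))
    return misses
```

With LRU, the single eviction caused by B cascades: every SPY branch then misses once, so each window shows either 0 or `ways` misses, i.e. an X-type or an XY-type iteration. Thresholding the miss count (in practice, the extra 20+ cycles per miss) yields one key bit per window.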

  17. Indirect timing measurements on SMT
      • Feasible:
        • Through a branch on a Pentium 4 with HT, information leaks
        • I recovered all the bits of a 32-bit key in a single run (on a toy example)
      • The same kind of attack may apply to cache accesses: the memory access sequence could be discovered

  18. Feasible, but difficult
      • Technically very difficult:
        • Lack of documentation on the BTB
        • Strange indexing, unknown associativity, BTB hierarchy
      • Requires relatively infrequent events (one every 1,000s of cycles): the measurement resolution is in the 100s of cycles

  19. So what?
      • On a Pentium 4 with HT:
        • If key bits control branches (or the addresses of loads), they might be recovered by a spy thread

  20. Countermeasures
      • Just deactivate Hyper-Threading:
        • At present, this is a global OS mode (set at boot time)
      • Rework the implementation:
        • Introduce randomness in the control path at execution time?
        • Makes the attack much more complex
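The slide suggests randomizing the control path. A different, widely used rework, named here plainly as a swapped-in technique, is to remove the key-dependent branch altogether: "square-and-multiply-always" computes the multiplication for every bit and lets the bit only select which result is kept. This Python sketch shows the control-flow idea only: Python offers no constant-time guarantees, and the final selection is still a bit-dependent data access, which the slides warn can also leak (addresses of loads). The function name and the `nbits` parameter are hypothetical.

```python
def modexp_multiply_always(base: int, exponent: int, modulus: int, nbits: int) -> int:
    """Square-and-multiply-always: identical operation sequence for every exponent bit."""
    assert 0 <= exponent < (1 << nbits), "exponent must fit in nbits"
    result = 1
    for i in reversed(range(nbits)):             # fixed iteration count, independent of the key value
        bit = (exponent >> i) & 1
        result = (result * result) % modulus
        product = (result * base) % modulus      # computed whether the bit is 0 or 1
        result = (result, product)[bit]          # select by index instead of branching on the bit
    return result
```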
