Execution Replay for Multiprocessor Virtual Machines

Execution Replay for Multiprocessor Virtual Machines. George W. Dunlap, Dominic Lucchetti, Michael A. Fetterman, Peter M. Chen. Big ideas: detection and replay of memory races is possible on commodity hardware; overhead is high for some workloads, but surprisingly low for others.

Presentation Transcript


  1. Execution Replay for Multiprocessor Virtual Machines • George W. Dunlap, Dominic Lucchetti, Michael A. Fetterman, Peter M. Chen

  2. Big ideas • Detection and replay of memory races is possible on commodity hardware • Overhead high for some workloads • …but surprisingly low for other workloads

  3. Execution Replay • [Diagram labels: CPU, Interrupts, Network, Memory, Keyboard, mouse, Disk]

  4. Uses of Execution Replay • Reconstructing state: fault tolerance • Reconstructing execution: debugging, realistic trace generation • Both: intrusion analysis

  5. Single-processor Replay • Basic principles well understood • Log all non-deterministic inputs • Timing of asynchronous events • Minimal overhead (Dunlap02) • 13% worst case • Log for months or years • Available commercially • VMware: Record/Replay

  6. Replay for Multiprocessors • Memory races in multiprocessor VMs • The Ordering Requirement • The CREW Protocol • Implementing with page protections • Relation to the Ordering Requirement • Generating constraints from CREW events • DMA-capable devices and CREW • Performance

  7. The Multiprocessor Challenge • Interleaved reads and writes • Fine-grained non-determinism • Much more difficult • Existing solutions • Hardware modification • Software instrumentation • SMP-ReVirt • Hardware MMU to detect sharing

  8. Multiprocessor Replay • [Diagram: processors P1 and P2 access shared memory; labels: n=5, n=3, if (n<4)]

  9. Ordering Memory Accesses • Preserving order will reproduce execution • a→b: “a happens-before b” • Ordering is transitive: a→b, b→c means a→c • Two instructions must be ordered if: • they both access the same memory, and • one of them is a write
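The ordering rule on slide 9 can be sketched as a small predicate. This is an illustrative sketch only, not code from SMP-ReVirt: the `Access` record and the `must_be_ordered` name are hypothetical, and the check that the two accesses come from different processors is an assumption added here (accesses on one processor are already ordered by program order).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Access:
    """One memory access (hypothetical record, for illustration only)."""
    cpu: int        # processor issuing the access
    addr: int       # memory location touched
    is_write: bool  # True for a write, False for a read


def must_be_ordered(x: Access, y: Access) -> bool:
    """Two accesses conflict if they touch the same memory and one is a write.

    The cpu check is an assumption: same-processor accesses are ordered
    by program order and need no logged constraint.
    """
    return x.cpu != y.cpu and x.addr == y.addr and (x.is_write or y.is_write)
```

Two reads of the same location never need a constraint under this rule, which is why read-mostly sharing is cheap to replay.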

  10. Constraints: Enforcing order • P1 executes a, then b; P2 executes c, then d • To guarantee a→d, any one of a→d, b→d, a→c, b→c suffices • Suppose we need b→c • b→c is necessary • a→d is redundant (overconstrained)
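The redundancy argument on slide 10 follows from transitivity, and it can be checked mechanically. The sketch below is illustrative and assumes that P1 runs a then b and P2 runs c then d; the `implies` helper is a hypothetical name, not part of the presented system. It treats constraints as edges in a graph and asks whether one event is reachable from another.

```python
def implies(edges, src, dst):
    """Return True if src happens-before dst follows from edges by transitivity."""
    graph = {}
    for before, after in edges:
        graph.setdefault(before, []).append(after)
    stack, seen = [src], set()
    while stack:
        node = stack.pop()
        for nxt in graph.get(node, []):
            if nxt == dst:
                return True
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return False


# Assumed program order: P1 runs a then b, P2 runs c then d.
program_order = [("a", "b"), ("c", "d")]
# The one constraint that is actually logged.
logged = [("b", "c")]

# a -> b -> c -> d, so a separate a -> d constraint would be redundant.
a_before_d = implies(program_order + logged, "a", "d")
```

Logging only b→c keeps the log small while still guaranteeing a→d.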

  11. CREW Protocol • Each shared object is in one of two states: • Concurrent-Read: all processors can read, none can write • Exclusive-Write: one processor (the owner) can read and write; others have no access

  12. CREW protocol, cont'd • Enforced with hardware MMU • Read/write • Read-only • None • Change CREW states on demand • Fault, fixup, re-execute • CREW event • Increasing or reducing permission due to CREW state changes
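The fault, fixup, re-execute cycle on slides 11 and 12 can be modeled as a small state machine over per-processor page permissions. This is a toy simulation under stated assumptions, not the Xen implementation: `Perm`, `CrewPage`, and `access` are hypothetical names, pages start in the concurrent-read state, and read permission is granted only to the faulting processor (on demand).

```python
from enum import IntEnum


class Perm(IntEnum):
    NONE = 0
    READ = 1
    READ_WRITE = 2


class CrewPage:
    """Toy model of one page under CREW (illustrative sketch only)."""

    def __init__(self, num_cpus):
        # Assumption: every page starts concurrent-read (all CPUs read-only).
        self.perms = [Perm.READ] * num_cpus
        self.events = []  # CREW events as (cpu, old_perm, new_perm)

    def _set(self, cpu, new):
        old = self.perms[cpu]
        if old != new:
            self.perms[cpu] = new
            self.events.append((cpu, old, new))

    def access(self, cpu, is_write):
        """Simulate one access; return True if it caused a CREW fault."""
        needed = Perm.READ_WRITE if is_write else Perm.READ
        if self.perms[cpu] >= needed:
            return False  # permission already sufficient, no fault
        # Fixup: reduce other CPUs first, then increase the faulting CPU.
        for other in range(len(self.perms)):
            if other == cpu:
                continue
            if is_write:
                self._set(other, Perm.NONE)   # exclusive-write: no access
            elif self.perms[other] == Perm.READ_WRITE:
                self._set(other, Perm.READ)   # demote the old owner
        self._set(cpu, needed)
        return True  # caller would now re-execute the access
```

Performing every reduction before the increase mirrors the requirement that the old holders lose access before the new holder gains it.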

  13. CREW Property • If two instructions on different processors: • access the same page, • and one of them is a write, • there will be a CREW event on each processor between them.

  14. Generating Constraints • State: Concurrent Read • All processors read-only • d*: CREW fault • New state: P2 Exclusive • r: privilege reduction • Read to None • i: privilege increase • Read to Read/write • Log timing of r and i • Constraint: r → i • [Diagram labels: P1, P2, a, d*, r, i, d]
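Slide 14 pairs each privilege reduction r with the privilege increase i that it enables, and logs the constraint r → i. A minimal sketch of that bookkeeping follows. It assumes that the "timing" of an event can be represented as a per-processor counter value; the slides do not say how timing is encoded, so `Point`, `Constraint`, `constraints_for_fault`, and `may_proceed` are all hypothetical.

```python
from typing import NamedTuple


class Point(NamedTuple):
    cpu: int
    count: int  # hypothetical per-CPU progress counter at the event


class Constraint(NamedTuple):
    before: Point  # the privilege reduction r
    after: Point   # the privilege increase i


def constraints_for_fault(reductions, increase):
    """Emit one r -> i constraint per reduction on another processor."""
    return [Constraint(r, increase) for r in reductions if r.cpu != increase.cpu]


def may_proceed(cpu, next_count, constraints, progress):
    """During replay, decide whether cpu may pass the point next_count.

    progress maps each cpu to the counter value it has reached so far.
    """
    for c in constraints:
        if c.after.cpu == cpu and c.after.count == next_count:
            if progress.get(c.before.cpu, 0) < c.before.count:
                return False  # the reducing CPU has not reached r yet
    return True
```

For example, with r at `Point(0, 120)` and i at `Point(1, 75)`, processor 1 would be held at count 75 during replay until processor 0 has reached count 120.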

  15. Direct Memory Access • Device accesses memory directly • Logically another processor • Reads and writes need to be ordered • IOMMU: can’t fault/fixup/re-execute • Observation: Transaction model • Device: non-preemptible actor

  16. Prototype: SMP-ReVirt • Modified Xen hypervisor • Implement logging, CREW protocol • Details in paper

  17. Evaluation questions • What is the overhead? • What affects performance? • In paper • When might I want to use MP? • Log with 1, 2, or N cpus?

  18. Evaluation Workloads • SPLASH2 parallel application suite • FMM, LU, ocean, radix, water-spatial, radiosity • Kernel-build • Dbench

  19. Predicting results • Key changes in sharing attributes • 4096-byte sharing granularity • “Miss” is very expensive • SPLASH2 • Good: high spatial locality / low false sharing • Bad: random access patterns / high false sharing • The Linux kernel • Tuned to 16-byte cacheline • Involving the kernel may be expensive
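The effect of the coarser sharing granularity on slide 19 can be illustrated with simple address arithmetic. The helper below is a sketch with a hypothetical name and made-up example addresses. Two addresses fall in the same sharing unit when integer division by the granularity gives the same result, so data laid out to avoid sharing a cacheline can still share a 4096-byte page.

```python
def same_unit(addr_a, addr_b, granularity):
    """True if both addresses fall in the same granularity-sized unit."""
    return addr_a // granularity == addr_b // granularity


# Two variables 64 bytes apart (example addresses, chosen for illustration).
x_addr, y_addr = 0x1000, 0x1040

shares_cacheline = same_unit(x_addr, y_addr, 16)   # 16-byte unit from the slide
shares_page = same_unit(x_addr, y_addr, 4096)      # page-level CREW unit
```

Here the two variables sit in different 16-byte units but in the same page, so writes to them from different processors would trigger CREW faults even though the data is not logically shared.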

  20. Single-processor Xen guests

  21. Log Growth Rate

  22. 2-processor Xen guests

  23. 2-processor, cont'd

  24. Log Growth Rate

  25. 4-processor Xen guests

  26. Recap • Memory races in multiprocessor VMs • The Ordering Requirement • The CREW Protocol • Implementing with page protections • Relation to the Ordering Requirement • Generating constraints from CREW events • DMA-capable devices and CREW • Performance

  27. Big ideas • Detection and replay of memory races is possible on commodity hardware • Overhead high for some workloads • …but surprisingly low for other workloads

  28. Questions
