PRES: Probabilistic Replay with Execution Sketching on Multiprocessors
Soyeon Park and Yuanyuan Zhou (UCSD); Weiwei Xiong, Zuoning Yin, Rini Kaushik, Kyu H. Lee and Shan Lu (UIUC)
SOSP 2009
LBA Reading Group, 9/15/09
Presented by: Michelle Goodstein
Outline
• Motivation
• PRES Architecture
• Capturing Sketches
• Replaying Intelligently
• Evaluation
• Conclusion
Motivation
• Concurrency bugs are hard to reproduce
• Deterministic replay can help, but it can be expensive (e.g., recording the global order of all shared reads/writes)
• What if we record only partial information?
  • Goal: enough to reproduce the bug, not to reproduce the exact original execution
  • Accept reproducing the bug within a small number (5-50) of replay attempts rather than on the first attempt
PRES
• Probabilistic Replay via Execution Sketching
• Records a partial ordering of events during the production run
• Intelligently explores the space of orderings consistent with that partial ordering
• Uses feedback from failed replay attempts to guide subsequent explorations toward reproducing the bug
PRES Architecture
• Sketch Recorders
• Partial Information based Replayer (PI-Replayer)
• Replay Recorder
• Monitor
• Feedback Generator
PRES Architecture
• Sketch Recorders
  • Run during the production run
  • Capture a partial ordering of events
  • Balance recording efficiency against usefulness during replay
PRES Architecture
• Partial Information based Replayer (PI-Replayer)
  • Runs during the bug reproduction phase
  • Consults the sketch and the feedback from previous attempts to reproduce the bug
    • Sketch specifies an ordering → do what the sketch prescribes
    • Feedback says an ordering did not produce the bug → do something else
    • No information available → execute however desired
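The PI-Replayer's three-way rule can be sketched as a single scheduling choice. This is a toy model under stated assumptions, not PRES's implementation: a "decision point" is a moment where several threads are runnable, `choose_next_thread`, `sketch_next` and `failed_choices` are hypothetical names, and picking the lowest thread ID stands in for "execute however desired".

```python
from typing import Optional


def choose_next_thread(
    runnable: list[int],
    sketch_next: Optional[int],
    failed_choices: set[int],
) -> int:
    """Pick which thread runs next at one scheduling decision point (toy model).

    runnable:       IDs of threads that could run now (non-empty).
    sketch_next:    thread the sketch says runs next here, or None if the
                    sketch says nothing about this point (hypothetical).
    failed_choices: threads that feedback from earlier failed attempts says
                    not to pick here (hypothetical).
    """
    # 1. The sketch prescribes an ordering: follow it.
    if sketch_next is not None:
        if sketch_next not in runnable:
            raise ValueError("sketch thread not runnable: replay is off-sketch")
        return sketch_next
    # 2. Feedback says some choices did not reproduce the bug: try another one.
    alternatives = [t for t in runnable if t not in failed_choices]
    if alternatives:
        return min(alternatives)
    # 3. No usable information left: any runnable thread will do (lowest ID here).
    return min(runnable)
```

Checking the sketch first mirrors the slide's priority order. Because feedback only targets races whose order the sketch does not already imply, the first two cases should not compete in practice.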
PRES Architecture
• Replay Recorder
  • Deterministic replay recorder for each replay attempt
  • Necessary to produce feedback
  • Once the bug is reproduced, its record lets the bug be repeated with 100% probability
PRES Architecture
• Monitor
  • Tracks replays and detects:
    • Deviations from the sketch (a new replay is necessary)
    • Bug reproduced (success!)
PRES Architecture
• Feedback Generator
  • Uses information from the replay recorder to provide feedback for future replay attempts
  • Tries to determine why the bug was not reproduced
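Taken together, the reproduction-phase components form a loop: replay, check, record, learn, repeat. A minimal sketch, assuming two hypothetical callables: `replay_once` bundles one attempt by the PI-Replayer, monitor and replay recorder, and `generate_feedback` plays the feedback generator. The cap `max_attempts=50` is an arbitrary choice echoing the 5-50 replays from the motivation, not a PRES parameter.

```python
from typing import Callable, Optional

# Hypothetical interfaces (not PRES's real API):
#   replay_once(feedback) -> (bug_reproduced, replay_log)
#     One attempt: the PI-Replayer follows the sketch and the feedback so far,
#     the monitor decides whether the bug reproduced, and the replay recorder
#     returns a deterministic log of the attempt.
#   generate_feedback(replay_log) -> one new piece of feedback (e.g. a race to flip)
ReplayOnce = Callable[[list], tuple[bool, list]]
GenerateFeedback = Callable[[list], object]


def reproduce(
    replay_once: ReplayOnce,
    generate_feedback: GenerateFeedback,
    max_attempts: int = 50,  # arbitrary cap, not a PRES parameter
) -> Optional[list]:
    """Replay until the bug reproduces; return that attempt's log, or None."""
    feedback: list = []
    for _ in range(max_attempts):
        bug_reproduced, replay_log = replay_once(list(feedback))
        if bug_reproduced:
            # This deterministic log can now replay the bug every time.
            return replay_log
        # Learn from the failed attempt before trying again.
        feedback.append(generate_feedback(replay_log))
    return None


# Toy usage: a fake replayer that succeeds once it has two pieces of feedback.
log = reproduce(lambda fb: (len(fb) == 2, ["attempt", len(fb)]), lambda rec: rec[1])
print(log)  # → ['attempt', 2]
```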
Sketch Recorders
• Baseline (Base)
  • Everything necessary for deterministic replay on a uniprocessor
• Synchronization recorder (Sync)
  • Above + global order of high-level synchronization operations
• System call recorder (Sys)
  • Above + global order of system calls
• Function call recorder (Func)
  • Global order of all function calls (Michelle: does this also include the above?)
• Nth-basic-block recorder (BB-n)
  • Records the global order of every nth basic block executed (the count is global across threads)
• Basic block recorder (BB)
  • Global order of all basic blocks
• Shared reads/writes (RW)
  • Global order of all shared-memory reads and writes; this is standard deterministic replay
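To make the granularity differences concrete, here is a toy filter that keeps the part of one global event trace each recorder would log. Assumptions: events carry hypothetical kinds (`sync`, `syscall`, `func`, `bb`, `read`, `write`); Base is modeled as recording no cross-thread order; BB-n keeps every nth basic block by global count; and Func keeps only function calls, since the slide leaves open whether it also includes the levels above it.

```python
from typing import NamedTuple


class Event(NamedTuple):
    thread: int
    kind: str   # toy categories: "sync", "syscall", "func", "bb", "read", "write"
    label: str


# Event kinds whose global order each level records (toy model).
LEVEL_KINDS = {
    "Base": set(),                 # no cross-thread order in this model
    "Sync": {"sync"},
    "Sys": {"sync", "syscall"},    # Sync + system calls
    "Func": {"func"},              # slide leaves open whether Sync/Sys are included
    "BB": {"bb"},
    "RW": {"read", "write"},       # every read/write treated as a shared access
}


def sketch(trace: list[Event], level: str, n: int = 2) -> list[Event]:
    """Return the globally ordered subsequence of `trace` that `level` records."""
    if level == "BB-n":
        basic_blocks = [e for e in trace if e.kind == "bb"]
        # The count is global: keep the n-th, 2n-th, ... basic block overall.
        return basic_blocks[n - 1 :: n]
    if level not in LEVEL_KINDS:
        raise ValueError(f"unknown sketch level: {level}")
    return [e for e in trace if e.kind in LEVEL_KINDS[level]]


trace = [
    Event(1, "sync", "lock(m)"),
    Event(1, "bb", "b1"),
    Event(2, "bb", "b2"),
    Event(2, "syscall", "write(fd)"),
    Event(1, "write", "x"),
    Event(2, "bb", "b3"),
    Event(2, "read", "x"),
    Event(1, "bb", "b4"),
]
for level in ["Base", "Sync", "Sys", "Func", "BB-n", "BB", "RW"]:
    print(level, [e.label for e in sketch(trace, level)])
```

Running every level on the same trace shows the trade-off from the slides: finer levels keep more events (more recording work) but pin down more of the interleaving.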
Replaying Intelligently
• Monitor observes the current replay
• Compares the current replay to the sketch to decide when to stop the attempt:
  • Replay is inconsistent with the sketch (off-sketch) → abort
  • Bug reproduced → success
• Operates only on visible events
  • Exceptions, timer signals, outputs
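A toy version of the monitor's check, assuming visible events are stored as strings such as `"output:..."`, `"signal:timer"` or `"exception:SIGSEGV"`, that the production run's visible events are kept alongside the sketch, and that the failure symptom is known. `monitor` and `Verdict` are hypothetical names.

```python
from enum import Enum


class Verdict(Enum):
    CONTINUE = "continue"
    OFF_SKETCH = "off-sketch"          # abort; a new replay attempt is needed
    BUG_REPRODUCED = "bug reproduced"  # success


def monitor(recorded: list[str], observed: list[str], failure: str) -> Verdict:
    """Judge a replay from the visible events it has produced so far.

    recorded: visible events from the production run, in order.
    observed: visible events produced by the current replay so far.
    failure:  the visible failure symptom from the production run.
    """
    # The failure symptom appeared: the bug has been reproduced.
    if observed and observed[-1] == failure:
        return Verdict.BUG_REPRODUCED
    # Any mismatch with the recorded prefix means the replay went off-sketch.
    if observed != recorded[: len(observed)]:
        return Verdict.OFF_SKETCH
    return Verdict.CONTINUE


recorded = ["output:start", "signal:timer", "exception:SIGSEGV"]
print(monitor(recorded, ["output:start"], "exception:SIGSEGV"))
```

Looking only at visible events keeps the check cheap; it cannot notice a divergence in internal state until that divergence shows up as different output, a signal or an exception.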
Replaying Intelligently
• Unsuccessful replays
  • Sketches coarser than RW do not record the order of every shared-memory access, so they miss some data races
  • If such a race is replayed in certain orders, the bug may not manifest
• Idea: use information (feedback) from prior failed attempts to guide the choice of ordering in the next replay attempt
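A toy illustration of why an unrecorded race can hide the bug (not an example from the paper): two threads each increment a shared counter without a lock, as a read followed by a write. With no synchronization operations, a Sync sketch records nothing here, so the replayer may pick any of the six interleavings, and only some of them lose an update.

```python
from itertools import combinations


def interleavings(n_a: int, n_b: int):
    """Yield every schedule of threads A and B that preserves each one's program order."""
    total = n_a + n_b
    for a_slots in combinations(range(total), n_a):
        chosen = set(a_slots)
        yield ["A" if i in chosen else "B" for i in range(total)]


def run(schedule: list[str]) -> int:
    """Each thread runs `tmp = counter; counter = tmp + 1` with no lock."""
    counter = 0
    tmp = {"A": 0, "B": 0}
    pc = {"A": 0, "B": 0}  # which of its two steps each thread runs next
    for t in schedule:
        if pc[t] == 0:
            tmp[t] = counter          # shared read
        else:
            counter = tmp[t] + 1      # shared write
        pc[t] += 1
    return counter


schedules = list(interleavings(2, 2))
buggy = [s for s in schedules if run(s) != 2]  # an update was lost
print(f"{len(buggy)} of {len(schedules)} orderings lose an update")  # → 4 of 6 orderings lose an update
```

A replayer that happens to pick one of the two serial schedules (AABB or BBAA) never sees the bug, which is exactly the situation the feedback step is meant to escape.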
Replaying Intelligently: Generating Feedback
• Each replay attempt must be recorded at full RW granularity
• From the recordings of failed replays, identify data races
• Filter out data races whose ordering the sketch already implies
• Select one data race whose ordering to invert
  • Heuristic: choose a failed replay recording, then the race in it closest to the fault
• On the next replay, execute deterministically until that data race is reached, then flip its order
• After that, the default PI-Replayer behavior takes over
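The race-selection steps can be sketched on a single failed replay's RW log. Every simplification here is an assumption: any two conflicting accesses (same address, different threads, at least one write) count as a race, with no happens-before analysis; `sketch_ordered` is a hypothetical set of access-index pairs whose order the sketch implies; "closest to the fault" is read as "latest in the log", assuming the failure ends the log; and choosing among several failed recordings is omitted.

```python
from typing import NamedTuple, Optional


class Access(NamedTuple):
    index: int      # position in the failed replay's RW log
    thread: int
    addr: str
    is_write: bool


def find_races(log: list[Access]) -> list[tuple[Access, Access]]:
    """All conflicting pairs: same address, different threads, at least one write.

    Simplification: no happens-before analysis, so pairs already ordered by
    synchronization are not excluded.
    """
    races = []
    for i, a in enumerate(log):
        for b in log[i + 1 :]:
            if a.addr == b.addr and a.thread != b.thread and (a.is_write or b.is_write):
                races.append((a, b))
    return races


def pick_race_to_flip(
    log: list[Access], sketch_ordered: set[tuple[int, int]]
) -> Optional[tuple[Access, Access]]:
    """Return (run_first, run_second) for the next replay, or None if nothing to flip."""
    # Filter out races whose order the sketch already implies.
    candidates = [
        (a, b) for a, b in find_races(log) if (a.index, b.index) not in sketch_ordered
    ]
    if not candidates:
        return None
    # Heuristic: the race closest to the fault, taken here as latest in the log.
    first, second = max(candidates, key=lambda r: (r[1].index, r[0].index))
    # Flip: the next replay runs the originally later access first.
    return (second, first)


log = [
    Access(0, 1, "x", True),
    Access(1, 2, "x", False),
    Access(2, 1, "y", True),
    Access(3, 2, "y", True),
]
print(pick_race_to_flip(log, sketch_ordered=set()))
```

The returned pair is the feedback: replay deterministically up to the first access of the race, run the two accesses in the flipped order, then hand control back to the ordinary PI-Replayer.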
Conclusion
• Interesting use of partial orders as a compromise between recording efficiency and ease of replay
• Partial information is often sufficient to recover the buggy ordering
• Similarities to the CHESS paper presented earlier