Karma: Scalable Deterministic Record-Replay Arkaprava Basu Jayaram Bobba Mark D. Hill Work done at University of Wisconsin-Madison
Executive summary • Applications of deterministic record-replay • Debugging • Fault tolerance • Security • Existing hardware record-replayer • Fast record but • Slow replay or • Requires major hardware changes • Karma: Faster Replay with nearly-conventional h/w • Extends Rerun • Records more parallelism
Outline • Background & Motivation • Rerun Overview • Karma Insights • Karma Implementation • Evaluation • Conclusion
Deterministic Record-Replay • Multi-threaded execution non-deterministic • Deterministic record-replay to reincarnate past execution • Record: • Record selective events in a log • Replay: • Use the log to reincarnate past execution • Key Challenge: Memory races
Record-Replay Motivation • Debugging • Ensures bugs faithfully reappear (no heisenbugs) • Fault-Tolerance • Enable hot backup for primary server to shadow primary & take over on failure • Security • Real-time intrusion detection & attack analysis • Replay speed matters
Previous work • Record Dependence • Wisconsin Flight Data Recorder [ISCA’03, etc.]: Too much state • UCSD Strata [ASPLOS’06]: Log size grows rapidly with #cores • Record Independence • UIUC DeLorean [ISCA’08]: Non-conventional BulkSC H/W • Wisconsin Rerun [ISCA’08]: Sequential replay • Intel MRR [MICRO’09]: Only for snoop-based systems • Timetraveler [ISCA’10]: Extends Rerun to lower log size • Our Goal • Retain Rerun’s near-conventional hardware • Enable Faster Replay
Rerun’s Recording • Most code executes without races • Use race-free regions for ordering • Episodes: independent execution regions • Defined per thread [Figure: load (LD) and store (ST) sequences of threads T0, T1, T2 grouped into episodes] Partially adapted from ISCA’08 talk
Rerun’s Recording (Contd.) • Capturing causality: • Timestamp via Lamport scalar clock [Lamport ’78] • Replay in timestamp order • Episodes with same timestamp can be replayed in parallel [Figure: episodes of threads T0, T1, T2 labeled with scalar timestamps]
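The timestamping scheme above can be sketched in a few lines. This is a minimal illustration, not Rerun's actual hardware protocol: the names `Core`, `end_episode`, `conflict` and `replay_order` are hypothetical, and the sketch assumes that a conflict ends the owner's episode and pushes the requester's clock past the ended episode's timestamp. The starting timestamps (60, 43, 22) are chosen only to resemble the slide's figure.

```python
from collections import defaultdict


class Core:
    """Hypothetical per-core state: an id and a Lamport scalar clock."""

    def __init__(self, core_id, timestamp=0):
        self.core_id = core_id
        self.timestamp = timestamp


def end_episode(core, log):
    """Log the finished episode with the core's current timestamp, then tick."""
    log.append((core.timestamp, core.core_id))
    core.timestamp += 1


def conflict(owner, requester, log):
    """Assumed rule: the owner's episode ends, and the requester's clock
    moves past it so the requester's episode is ordered after the owner's."""
    ended_ts = owner.timestamp
    end_episode(owner, log)
    requester.timestamp = max(requester.timestamp, ended_ts + 1)


def replay_order(log):
    """Group episodes by timestamp; groups replay in ascending order and
    episodes inside one group may replay in parallel."""
    groups = defaultdict(list)
    for ts, core_id in log:
        groups[ts].append(core_id)
    return [(ts, sorted(groups[ts])) for ts in sorted(groups)]


cores = [Core(0, 60), Core(1, 43), Core(2, 22)]
log = []
conflict(cores[2], cores[1], log)  # T2's episode must precede T1's
conflict(cores[1], cores[2], log)  # T1's episode must precede T2's next one
for core in cores:
    end_episode(core, log)
print(replay_order(log))
```

Note how T0's episode (timestamp 60) is ordered after everything else even though it conflicts with nothing: a scalar clock imposes a total order on timestamps, which is the limitation Karma targets.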
Rerun’s Replay [Figure: episodes of T0, T1, T2 replayed in timestamp order: TS=22, TS=43, TS=44 (two episodes in parallel), TS=45, TS=60, TS=61]
Karma’s Insight 1: • Capture order with DAG (not scalar clock) • Recording: DAG captured with episode predecessor & successor sets [Figure: episodes of T0, T1, T2 labeled with timestamps and connected by DAG arcs]
Karma’s Insight 1: [Figure: Rerun’s replay vs. Karma’s replay of the same episodes on T0, T1, T2; independent episodes start in parallel under Karma]
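A rough way to see the gap between the two replay schemes is to count replay steps. The sketch below assumes every episode takes one time step and unlimited cores are available; the episode names, timestamps and arcs are a hypothetical example, not data from the talk. Timestamp-ordered replay needs one step per distinct timestamp, while DAG-ordered replay needs only as many steps as the longest dependence chain.

```python
def scalar_replay_steps(timestamps):
    """Timestamp-ordered replay: one step per distinct timestamp
    (episodes sharing a timestamp run in parallel)."""
    return len(set(timestamps.values()))


def dag_replay_steps(preds):
    """DAG-ordered replay: length of the longest chain of dependences."""
    depth = {}

    def chain_length(episode):
        if episode not in depth:
            depth[episode] = 1 + max(
                (chain_length(p) for p in preds[episode]), default=0
            )
        return depth[episode]

    return max(chain_length(e) for e in preds)


# Hypothetical episodes on three threads, labeled thread:timestamp.
timestamps = {
    "T0:60": 60, "T0:61": 61,
    "T1:43": 43, "T1:44": 44, "T1:45": 45,
    "T2:22": 22, "T2:44": 44,
}
# Predecessors: program order within a thread plus two cross-thread arcs.
preds = {
    "T0:60": [], "T0:61": ["T0:60"],
    "T1:43": [], "T1:44": ["T1:43"], "T1:45": ["T1:44", "T2:44"],
    "T2:22": [], "T2:44": ["T2:22", "T1:43"],
}

print(scalar_replay_steps(timestamps), dag_replay_steps(preds))
```

In this example the scalar clock serializes T0's episodes behind T1 and T2 even though T0 never interacts with them; the DAG lets all three threads start at once.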
Karma’s Insight 1: (Contd.) • Naïve approach: DAG arcs point to episodes • Episode represented by integers • Too much log size overhead! • Our approach: DAG arcs point to cores • Recording: Only one “active” episode per core • Replay: Send wakeup message(s) to core(s) of successor episode(s)
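The core-pointing arcs and wakeup messages can be illustrated with a small software simulation. This is a sketch under stated assumptions, not Karma's hardware design: the `replay` function, the per-core log layout `(name, pred_cores, succ_cores)` and the token-counting rule (an episode consumes one wakeup from each predecessor core before it starts) are all hypothetical.

```python
def replay(per_core_logs):
    """Replay per-core episode logs using wakeup tokens.

    per_core_logs[c] lists core c's episodes in program order as
    (name, pred_cores, succ_cores). An episode starts once one wakeup
    from each predecessor core is pending; on completion it sends one
    wakeup to each successor core. Returns one valid replay order.
    """
    n = len(per_core_logs)
    next_idx = [0] * n                      # next episode to replay per core
    tokens = [[0] * n for _ in range(n)]    # tokens[dst][src]: pending wakeups
    order = []
    progress = True
    while progress:
        progress = False
        for core in range(n):
            if next_idx[core] == len(per_core_logs[core]):
                continue  # this core has replayed all of its episodes
            name, pred_cores, succ_cores = per_core_logs[core][next_idx[core]]
            if all(tokens[core][p] > 0 for p in pred_cores):
                for p in pred_cores:
                    tokens[core][p] -= 1    # consume the wakeups
                order.append(name)
                for s in succ_cores:
                    tokens[s][core] += 1    # wake the successor cores
                next_idx[core] += 1
                progress = True
    if any(next_idx[c] < len(per_core_logs[c]) for c in range(n)):
        raise ValueError("inconsistent log: replay cannot make progress")
    return order


# Hypothetical logs for three cores; arcs name cores, not episodes.
logs = [
    [("T0:60", [], []), ("T0:61", [], [])],
    [("T1:43", [], [2]), ("T1:44", [], []), ("T1:45", [2], [])],
    [("T2:22", [], []), ("T2:44", [1], [1])],
]
order = replay(logs)
print(order)
```

Because only one episode per core is active at a time, a core id is enough to identify the target of an arc, which keeps each log entry to a pair of small core sets rather than a list of episode numbers.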
Karma’s Insight 1: Anatomy of a log entry [Figure: episodes of T0, T1, T2 with an example log entry showing the fields 84, 0|0|1, 0|0|1]
Karma’s Insight 2: • Not necessary to end the episode on every conflict: • As long as the episodes can be ordered during replay [Figure: load (LD) and store (ST) sequences of threads T0, T1, T2 with episodes extended past conflicts]
Karma Hardware [Figure: block diagram of the base system. Labels: Core 0 to Core 15, each with Pipeline, L1 I, L1 D and Coherence Controller; Interconnect; L2 0 to L2 15; DRAM; Rerun L2/Memory State: Directory, Coherence Controller, Data Tags] • Karma’s Per-Core State: Address Filter (FLT), Reference (REFS), Predecessor (PRED), Successor (SUCC), Timestamp (TS) • Total State: 148 bytes/core
Evaluation: • Were we able to speed up the replay? • On average, ~4X improvement in replay speed over Rerun
Evaluation: • Did we blow up the log size? • On average, Karma does not increase the log size; allowing larger episodes instead reduces it by as much as 40%
Conclusion • Applications of deterministic replay • Debugging • Fault tolerance • Security • Existing hardware record-replayer • Slow replay or • Requires major hardware changes • Karma: Faster Replay with nearly-conventional h/w • Extends Rerun • Uses DAG instead of scalar clock • Extends episodes past conflicts • Wider application + lower cost: more attractive