Efficient Memory Shadowing for 64-bit Architectures. Qin Zhao (MIT), Derek Bruening (VMware), Saman Amarasinghe (MIT). ISMM 2010, Toronto, Canada, June 6, 2010.
Dynamic Program Analysis • Understand Program Behavior • Optimization • Debugging • Security • Memory management • Shadow Memory Tools • Maintain meta-data for every memory location • Update meta-data on every memory operation
Examples • Memory Error Detection • MemCheck [VEE’07] • Purify [USENIX’92] • Dr. Memory • Dynamic Information Flow Tracking • LIFT [MICRO’06] • TaintTrace [ISCC’06] • Multi-threaded Program Analysis • Eraser [TOCS’97] • Helgrind • Memory Usage Analysis • CETS [ISMM’10] • Staleness
Shadow Memory System • Shadow Memory Manager • Meta-data for application memory • Memory mapping scheme (addrA → addrS) • DMS (Direct Mapping) • SMS (Segmented Mapping) • Instrumentor • Every memory operation • Address calculation • Meta-data update • Expensive • MemCheck (~25x) • ~12x for addrA → addrS [Figure: application memory regions (a.out, heap, libc, stack) and their corresponding shadow memory regions]
Direct Mapping Scheme (DMS) • Single memory region for entire address space • Translation: addrS = addrA + disp • lea [addr] → %r1; add %r1, disp → %r1 • Issue: address conflict between memA and memS [Figure: application and shadow memory regions; chart of slowdown relative to native execution]
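The two-instruction DMS sequence amounts to adding one constant displacement to every application address. A minimal C sketch follows, assuming one shadow byte per application byte; the value of `DMS_DISP` and the helper names are hypothetical and do not come from the slides.

```c
#include <stdint.h>

/* Hypothetical constant displacement between application and shadow memory. */
#define DMS_DISP ((uintptr_t)0x20000000u)

/* Direct mapping: addrS = addrA + disp (mirrors the lea/add pair). */
static inline uintptr_t dms_translate(uintptr_t addr_a)
{
    return addr_a + DMS_DISP;
}

/* The conflict noted on the slide: an application address that already lies
 * inside the shadow region [shadow_lo, shadow_hi) cannot be shadowed this way. */
static inline int dms_conflicts(uintptr_t addr_a, uintptr_t shadow_lo,
                                uintptr_t shadow_hi)
{
    return addr_a >= shadow_lo && addr_a < shadow_hi;
}
```

For instance, `dms_translate(0x1000)` yields `0x20001000` with the displacement assumed here, while an application mapping that happened to start at `0x20000010` would collide with the shadow region.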
Segmented Mapping Scheme (SMS) • Shadow segment per application segment • Translation: • Segment lookup (address indexing) • Address translation • lea [addr] → %r1; mov %r1 → %r2; shr %r2, 16 → %r2; add %r1, disp[%r2] → %r1 [Figure: addrA indexes the segment table, which maps application segments App 1 and App 2 to shadow segments Shd 1 and Shd 2 to produce addrS; chart of slowdown relative to native execution]
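The SMS instruction sequence can be read as a table lookup followed by an add. The sketch below assumes a 32-bit address space and a flat table indexed by the upper 16 address bits, which is what the `shr %r2, 16` step suggests; the names `seg_disp` and `sms_translate` are illustrative only.

```c
#include <stdint.h>

#define SEG_SHIFT 16                        /* index granularity from "shr %r2, 16" */
#define SEG_COUNT (1u << (32 - SEG_SHIFT))  /* 65,536 entries for a 32-bit space */

/* One displacement per application segment, filled in by the shadow manager. */
static int32_t seg_disp[SEG_COUNT];

static uint32_t sms_translate(uint32_t addr_a)
{
    uint32_t idx = addr_a >> SEG_SHIFT;       /* segment lookup (address indexing) */
    return addr_a + (uint32_t)seg_disp[idx];  /* address translation */
}
```

With the same 16-bit shift, a 48-bit address space would need 2^48 / 2^16 = 2^32 table entries, which is where a figure of roughly 4 billion entries for a single-level table comes from.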
Shadow Memory Mapping • Scaling to 64-bit Architecture • DMS • Infeasible due to memory layout [Figure: 64-bit address space layout: user space (a.out through stack) below 2^47, unusable space above it, kernel space up to 2^64 including vsyscall]
Shadow Memory Mapping • Scaling to 64-bit Architecture • DMS • Infeasible due to memory layout • Single-Level SMS • Too big (~4 billion entries)
Shadow Memory Mapping • Scaling to 64-bit Architecture • DMS • Infeasible due to memory layout • Single-Level SMS • Too big (~4 billion entries) • Multi-Level SMS • Even more expensive [Figure: slowdown relative to native execution]
Umbra (CGO’10) • Scaling to 64-bit Architecture • Single-Level SMS is too big but sparse • Umbra (CGO’10) • Eliminate empty entries • Compact table • Walk the table to find the entry
Umbra (CGO’10) • Reference Uni-Cache • Software cache per instruction per thread • Segment tag & displacement • Check uni-cache before table walk • 99.97% hit ratio • tag = addrA & mask; if (cache->tag != tag) { … /* table walk */ } addrS = addrA + cache->disp; [Figure: slowdown relative to native execution]
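The uni-cache check can be sketched in C as follows. This is an illustration under stated assumptions, not Umbra's actual code: tags are assumed to be 4 GB-aligned, the compact table is a plain array searched linearly, and all names (`entry_t`, `table_walk`, `umbra_translate`, `table_walks`) are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define UNIT_MASK (~(uint64_t)0xFFFFFFFFu)   /* assumed: 4 GB-aligned tags */

typedef struct { uint64_t tag; int64_t disp; } entry_t;

static unsigned long table_walks;            /* counts slow-path lookups */

/* Slow path: walk the compact table to find the entry for this tag. */
static const entry_t *table_walk(const entry_t *table, size_t n, uint64_t tag)
{
    table_walks++;
    for (size_t i = 0; i < n; i++)
        if (table[i].tag == tag)
            return &table[i];
    return NULL;
}

/* Fast path: reuse the cached displacement while the tag matches.
 * Returns 0 if the address has no shadow memory (a simplification). */
static uint64_t umbra_translate(entry_t *cache, const entry_t *table,
                                size_t n, uint64_t addr_a)
{
    uint64_t tag = addr_a & UNIT_MASK;
    if (cache->tag != tag) {
        const entry_t *e = table_walk(table, n, tag);
        if (e == NULL)
            return 0;
        *cache = *e;                         /* refill the uni-cache */
    }
    return addr_a + (uint64_t)cache->disp;
}
```

Each instrumented memory reference would own one `entry_t` cache per thread; initializing its tag to a value that can never match (for example 1, which is not 4 GB-aligned) forces a table walk on first use.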
EMS64: Key Idea • Umbra • tag = addrA & mask; if (cache->tag != tag) { … /* table walk (0.03%) */ } addrS = addrA + cache->disp; • EMS64 • Speculatively use a displacement without checking the tag • Smart shadow memory placement • An incorrect displacement is reported by a memory access violation fault
EMS64: Example • 0: Application A0 • 2: Shadow S0 • 6: Shadow S1 • 7: Application A1 • 9: Reserved • 10: Shadow S2 • 11: Application A2 • 12: Unavailable • 13: Unavailable/Reserved • 14: Unavailable • 15: Unavailable/Reserved • Displacement: {-1, 2}
EMS64: Potential Problem • 0: Application A0 • 2: Shadow S0 • 6: Shadow S1 • 7: Application A1 • 9: Reserved • 10: Shadow S2 • 11: Application A2 • 12: Unavailable • 13: Unavailable/Reserved • 14: Unavailable • 15: Unavailable/Reserved • Displacement: {-1, 2}
EMS64: Final Solution • 0: Application A0 • 1: Reserved • 2: Shadow S0 • 4: Reserved • 5: Reserved • 6: Shadow S1 • 7: Application A1 • 8: Reserved • 9: Reserved • 10: Shadow S2 • 11: Application A2 • 12: Unavailable/Reserved • 13: Unavailable/Reserved • 14: Unavailable • 15: Unavailable/Reserved • Displacement: {-1, 2}
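The difference between the example layout and the final layout can be checked mechanically. The sketch below encodes both 16-slot layouts and tests two properties: a wrong displacement applied to an application slot must land on a slot that can never hold data, and any displacement applied to a shadow slot must do the same. It assumes that "Reserved" and "Unavailable" slots can never be allocated, while unlisted slots are empty and could be allocated later; that reading, and all identifiers, are assumptions made for illustration.

```c
#include <stdbool.h>

enum kind { EMPTY, APP, SHADOW, RESERVED, UNAVAILABLE };
typedef struct { enum kind kind; int disp; } slot_t;  /* disp matters for APP only */

#define N 16
static const int DISP[] = { -1, 2 };

/* Layout from the "Example" slide; unlisted slots are EMPTY. */
static const slot_t EXAMPLE[N] = {
    [0] = {APP, 2}, [2] = {SHADOW, 0}, [6] = {SHADOW, 0}, [7] = {APP, -1},
    [9] = {RESERVED, 0}, [10] = {SHADOW, 0}, [11] = {APP, -1},
    [12] = {UNAVAILABLE, 0}, [13] = {UNAVAILABLE, 0},
    [14] = {UNAVAILABLE, 0}, [15] = {UNAVAILABLE, 0},
};

/* Layout from the "Final Solution" slide: slots 1, 4, 5 and 8 are now reserved. */
static const slot_t FINAL[N] = {
    [0] = {APP, 2}, [1] = {RESERVED, 0}, [2] = {SHADOW, 0}, [4] = {RESERVED, 0},
    [5] = {RESERVED, 0}, [6] = {SHADOW, 0}, [7] = {APP, -1}, [8] = {RESERVED, 0},
    [9] = {RESERVED, 0}, [10] = {SHADOW, 0}, [11] = {APP, -1},
    [12] = {UNAVAILABLE, 0}, [13] = {UNAVAILABLE, 0},
    [14] = {UNAVAILABLE, 0}, [15] = {UNAVAILABLE, 0},
};

static int wrap(int i, int n) { return ((i % n) + n) % n; }

static bool blocked(const slot_t *s, int i)
{
    return s[i].kind == RESERVED || s[i].kind == UNAVAILABLE;
}

/* Own displacement must reach a shadow slot; every other one a blocked slot. */
static bool app_rule_holds(const slot_t *s, int n, const int *d, int nd)
{
    for (int i = 0; i < n; i++) {
        if (s[i].kind != APP) continue;
        for (int j = 0; j < nd; j++) {
            int t = wrap(i + d[j], n);
            if (d[j] == s[i].disp ? s[t].kind != SHADOW : !blocked(s, t))
                return false;
        }
    }
    return true;
}

/* Every displacement applied to a shadow slot must reach a blocked slot. */
static bool shadow_rule_holds(const slot_t *s, int n, const int *d, int nd)
{
    for (int i = 0; i < n; i++) {
        if (s[i].kind != SHADOW) continue;
        for (int j = 0; j < nd; j++)
            if (!blocked(s, wrap(i + d[j], n)))
                return false;
    }
    return true;
}
```

Under these assumptions the example layout passes the application-slot rule but not the shadow-slot rule (slot 1, reached from S0 with displacement -1, is still empty), whereas the final layout passes both.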
Slot Finding Problem • Given n slots: • k Application slots (A-slots) • x Empty slots (E-slots) • y Reserved slots (R-slots) • Find k Shadow slots (S-slots) such that: • For each slot Ai, there is one associated slot Si with displacement di, where di = Si - Ai. • For each slot Ai and each existing displacement dj where di ≠ dj, slot ((Ai + dj) mod n) is an R-slot or an E-slot. • For each slot Si and each existing valid displacement dj, slot ((Si + dj) mod n) is an R-slot or an E-slot. • Legend: Ai = Application slot, Si = Shadow slot, Ei = Empty slot, Ri = Reserved slot [Figure: example layout with slots A0, A1, E0 to E4, R0 to R2, S0, S1]
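One way to make the problem statement concrete is a brute-force backtracking search over displacements. The sketch below is not the algorithm from the paper; it is a small exhaustive search (exponential in the worst case) that follows the three rules literally, takes displacements modulo n, and reserves the required empty slots once a placement is found. All type and function names are hypothetical.

```c
#include <stdbool.h>

enum kind { EMPTY, APP, SHADOW, RESERVED };

typedef struct {
    enum kind *k;   /* slot kinds, length n */
    int *disp;      /* displacement per APP slot; 0 means "not assigned yet" */
    int *ds;        /* distinct displacements in use, capacity n */
    int n, nd;
} layout_t;

static int wrap(int i, int n) { return ((i % n) + n) % n; }

static bool in_use(const layout_t *L, int d)
{
    for (int j = 0; j < L->nd; j++)
        if (L->ds[j] == d) return true;
    return false;
}

/* Check the rules for every APP and SHADOW slot against all displacements. */
static bool valid(const layout_t *L)
{
    for (int i = 0; i < L->n; i++) {
        if (L->k[i] != APP && L->k[i] != SHADOW) continue;
        for (int j = 0; j < L->nd; j++) {
            enum kind t = L->k[wrap(i + L->ds[j], L->n)];
            bool own = L->k[i] == APP && L->disp[i] == L->ds[j];
            if (own ? t != SHADOW : (t != EMPTY && t != RESERVED))
                return false;
        }
    }
    return true;
}

/* Place a shadow slot for each APP slot at index >= from, backtracking on failure. */
static bool place(layout_t *L, int from)
{
    int i = from;
    while (i < L->n && L->k[i] != APP) i++;
    if (i == L->n) return true;                 /* all application slots handled */
    int nd0 = L->nd;
    for (int c = 0; c < nd0 + L->n - 1; c++) {  /* existing displacements first */
        bool is_new = c >= nd0;
        int d = is_new ? c - nd0 + 1 : L->ds[c];
        int t = wrap(i + d, L->n);
        if (L->k[t] != EMPTY || (is_new && in_use(L, d))) continue;
        L->k[t] = SHADOW;
        L->disp[i] = d;
        if (is_new) L->ds[L->nd++] = d;
        if (valid(L) && place(L, i + 1)) return true;
        L->k[t] = EMPTY;                        /* undo and try the next candidate */
        L->disp[i] = 0;
        L->nd = nd0;
    }
    return false;
}

static bool find_shadow_slots(layout_t *L)
{
    L->nd = 0;
    for (int i = 0; i < L->n; i++) L->disp[i] = 0;
    if (!place(L, 0)) return false;
    /* Reserve every empty slot that a wrong displacement could reach. */
    for (int i = 0; i < L->n; i++) {
        if (L->k[i] != APP && L->k[i] != SHADOW) continue;
        for (int j = 0; j < L->nd; j++) {
            int t = wrap(i + L->ds[j], L->n);
            if (L->k[t] == EMPTY) L->k[t] = RESERVED;
        }
    }
    return true;
}
```

Run on a 16-slot layout with application slots at 0, 7 and 11 and slots 12 to 15 treated as reserved, the search finds a valid placement; it need not be the same one shown on the EMS64 example slides, since several placements satisfy the rules.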
Slot Finding Problem • Given n slots: • k Application slots • x Empty slots • y Reserved slots • Can we find k S-slots? • Depends on the layout! • Guaranteed to find them, for a 48-bit address space, if • Application memory < 250 GB • Proof • x ≥ 8k² + 2k + 1 • We can always find an Si for Ai if #E-slots > #conflicts
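To get a feel for the bound x ≥ 8k² + 2k + 1, the sketch below computes the largest k for which k application slots plus the required empty slots fit into n slots. It ignores slots that are already reserved, so it gives an optimistic upper limit; the function names are hypothetical.

```c
#include <stdint.h>

/* Number of empty slots that guarantees k shadow slots can be found. */
static uint64_t min_empty_slots(uint64_t k)
{
    return 8 * k * k + 2 * k + 1;
}

/* Largest k such that k + min_empty_slots(k) <= n (reserved slots ignored). */
static uint64_t max_app_slots(uint64_t n)
{
    uint64_t k = 0;
    while (k + 1 + min_empty_slots(k + 1) <= n)
        k++;
    return k;
}
```

As an illustration only: assuming 4 GB slots and a 2^47-byte user address space (neither figure is stated on this slide), n is 2^15 = 32,768 and `max_app_slots(32768)` returns 63, that is 252 GB of application memory, which is close to the 250 GB quoted. The exact parameters behind the slide's number are not given here.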
Implementation & Optimization • Implementation • Shadow memory allocation • Add signal handler • Remove reference uni-cache check • lea [addr] → %r1; add %r1, unicache->disp → %r1 • Optimization • Restore uni-cache checks for instructions that access multiple segments, e.g., references from memcpy • When the number of access violations exceeds 2
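The interplay between speculative translation and the restored check can be simulated without real signals. In the sketch below a "fault" is modeled as a mismatch between the cached displacement and the correct one, on the assumption that the shadow placement makes every wrong displacement fault. `lookup_disp` is a hypothetical stand-in for the table walk, using 4 GB slots where slot 0 has displacement +2 slots and every other slot -1 slot; only the threshold of 2 comes from the slide.

```c
#include <stdbool.h>
#include <stdint.h>

#define SLOT        ((int64_t)1 << 32)  /* assumed 4 GB slot size */
#define FAULT_LIMIT 2                   /* restore the check once faults exceed 2 */

typedef struct {
    uint64_t tag;      /* slot the cached displacement was taken from */
    int64_t  disp;     /* cached displacement */
    unsigned faults;   /* simulated access violations at this instruction */
    bool     checked;  /* true once the inline uni-cache check is restored */
} ref_t;

/* Hypothetical table walk: slot 0 maps +2 slots, all other slots -1 slot. */
static int64_t lookup_disp(uint64_t addr_a)
{
    return (addr_a >> 32) == 0 ? 2 * SLOT : -SLOT;
}

static uint64_t translate(ref_t *r, uint64_t addr_a)
{
    uint64_t tag = addr_a & ~(uint64_t)(SLOT - 1);
    if (r->checked) {
        /* Umbra-style path: compare the tag, walk the table on a mismatch. */
        if (r->tag != tag) {
            r->tag = tag;
            r->disp = lookup_disp(addr_a);
        }
    } else if (r->disp != lookup_disp(addr_a)) {
        /* Speculation failed: what the signal handler would do on a fault. */
        r->tag = tag;
        r->disp = lookup_disp(addr_a);
        if (++r->faults > FAULT_LIMIT)
            r->checked = true;
    }
    return addr_a + (uint64_t)r->disp;
}
```

An instruction that keeps alternating between segments with different displacements, as a memcpy-style reference might, takes three simulated faults here and then falls back to the checked path, so later segment changes cost a tag comparison instead of a fault.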
Experimental Results [Figure: slowdown relative to native execution]
Thank You • Download • http://people.csail.mit.edu/qin_zhao/umbra/ • Q & A