Practical Data Confinement Andrey Ermolinskiy, Sachin Katti, Scott Shenker, Lisa Fowler, Murphy McCauley
Introduction • Controlling the flow of sensitive information is one of the central challenges in managing an organization • Preventing exfiltration (theft) by malicious entities • Enforcing dissemination policies
Why is it so hard to secure sensitive data? • Modern software is rife with security holes that can be exploited for exfiltration • Users must be trusted to remember, understand, and obey dissemination restrictions • In practice, users are careless and often inadvertently allow data to leak • E-mail sensitive documents to the wrong parties • Transfer data to insecure machines and portable devices
Our Goal • Develop a practical data confinement solution • Key requirement: compatibility with existing infrastructure and patterns of use • Support current operating systems, applications, and means of communication • Office productivity apps: word processing, spreadsheets, … • Communication: E-mail, IM, VoIP, FTP, DFS, … • Avoid imposing restrictions on user behavior • Allow access to untrusted Internet sites • Permit users to download and install untrusted applications
Our Assumptions and Threat Model • Users • Benign, do not intentionally exfiltrate data • Make mistakes, inadvertently violate policies • Software platform (productivity applications and OS) • Non-malicious, does not exfiltrate data in pristine state • Vulnerable to attacks if exposed to external threats • Attackers • Malicious external entities seeking to exfiltrate sensitive data • Penetrate security barriers by exploiting vulnerabilities in the software platform
Central Design Decisions • Policy enforcement responsibilities • Cannot rely on human users • The system must track the flow of sensitive information, enforce restrictions when the data is externalized • Granularity of information flow tracking (IFT) • Need fine-grained byte-level tracking and policy enforcement to prevent accidental partial exfiltrations
Central Design Decisions • Placement of functionality • PDC inserts a thin software layer (hypervisor) between the OS and hardware • The hypervisor implements byte-level IFT and policy enforcement • A hypervisor-level solution • Retains compatibility with existing OSes and applications • Has sufficient control over hardware • Resolving tension between safety and user freedom • Partition the application environment into two isolated components: a “Safe world” and a “Free world”
Partitioning the User Environment • Safe virtual machine: access to sensitive data • Unsafe virtual machine: unrestricted communication and execution of untrusted code • Hypervisor: IFT, policy enforcement • Hardware (CPU, Memory, Disk, NIC, USB, Printer, …)
Partitioning the User Environment [Diagram labels: sensitive data; non-sensitive data; trusted code/data; untrusted (potentially malicious) code/data; exposure to the threat of exfiltration]
PDC Use Cases • Logical “air gaps” for high-security environments • VM-level isolation obviates the need for multiple physical networks • Preventing information leakage via e-mail • “Do not disseminate the attached document” • Digital rights management • Keeping track of copies; document self-destruct • Auto-redaction of sensitive content
Talk Outline • Introduction • Requirements and Assumptions • Use Cases • PDC Architecture • Prototype Implementation • Preliminary Performance Evaluation • Current Status and Future Work
PDC Architecture: Hypervisor • PDC uses an augmented hypervisor to • Ensure isolation between the safe and unsafe VMs • Track the propagation of sensitive data in the safe VM • Enforce security policy at exit points • Network I/O, removable storage, printer, etc.
PDC Architecture: Tag Tracking in the Safe VM • PDC associates an opaque 32-bit sensitivity tag with each byte of virtual hardware state • User-accessible CPU registers • Volatile memory • Files on disk
PDC Architecture: Tag Tracking in the Safe VM • These tags are viewed as opaque identifiers • The semantics can be tailored to fit the specific needs of administrators/users • Tags can be used to specify • Security policies • Levels of security clearance • High-level data objects • High-level data types within an object
PDC Architecture: Tag Tracking in the Safe VM • An augmented x86 emulator performs fine-grained instruction-level tag tracking (current implementation is based on QEMU) • PDC tracks explicit data flows (variable assignments, arithmetic operations) • Example: add %eax, %ebx [diagram: tag flow between eax and ebx]
PDC Architecture: Tag Tracking in the Safe VM • An augmented x86 emulator performs fine-grained instruction-level tag tracking (current implementation is based on QEMU) • PDC also tracks flows resulting from pointer dereferencing • Example: mov %eax, (%ebx) [diagram: tag merge involving eax, ebx, and memory]
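The two kinds of flow on the preceding slides can be illustrated with a small sketch. It is not PDC's implementation: the `TagState` class is hypothetical, a location's tag state is modeled as a set of opaque tag IDs, "merge" is assumed to mean set union (the slides do not define it), and registers are tracked as a whole rather than byte by byte. Operands follow the destination-first order used in the slide examples.

```python
from collections import defaultdict


class TagState:
    """Hypothetical shadow state: each location maps to a set of opaque tag IDs."""

    def __init__(self):
        self.reg = defaultdict(frozenset)  # register name -> tags
        self.mem = defaultdict(frozenset)  # byte address  -> tags

    def add(self, dst, src):
        # Explicit flow: dst = dst + src, so dst carries the tags of both operands.
        self.reg[dst] = self.reg[dst] | self.reg[src]

    def load(self, dst, ptr_reg, addr, size=4):
        # Pointer dereference: dst = mem[ptr]. The result merges the tags of the
        # bytes that were read with the tags of the pointer register itself.
        tags = self.reg[ptr_reg]
        for a in range(addr, addr + size):
            tags = tags | self.mem[a]
        self.reg[dst] = tags


state = TagState()
state.reg["ebx"] = frozenset({7})      # the pointer is derived from tagged data
state.mem[0x1000] = frozenset({9})     # one tagged byte in the loaded word
state.load("eax", "ebx", 0x1000)       # eax now carries tags 7 and 9
```

Merging the pointer's tags into the loaded value is what makes table lookups indexed by sensitive data stay tagged; it is also the source of the tag explosion listed under Challenges.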
Challenges • Tag storage overhead in memory and on disk • Naïve implementation would incur a 400% overhead (a 32-bit tag for every byte of data) • Computational overhead of online tag tracking • Tag explosion • Tag tracking across pointer dereferences exacerbates the problem • Tag erosion due to implicit flows • Bridging the semantic gap between application data units and low-level machine state • Impact of VM-level isolation on user experience
Talk Outline • Introduction • Requirements and Assumptions • Use Cases • PDC Architecture • Prototype Implementation • Storing sensitivity tags in memory and on disk • Fine-grained tag tracking in QEMU • “On-demand” emulation • Policy enforcement • Performance Evaluation • Current Status and Future Work
PDC Implementation: The Big Picture [Architecture diagram. Dom0: policy daemon, network daemon, NIC, QEMU / tag tracker, safe VM (emulated), PageTag descriptors, NFS server, VFS, PDC-ext3. Safe VM: App1, App2, NFS client. Links between domains: Xen-RPC, event channel, shared ring buffer. PDC-Xen (ring 0): shadow page tables, safe VM page tables, PageTag mask. Hardware: CPU, CR3]
Storing Tags in Volatile Memory • PDC maintains a 64-bit PageTagSummary for each page of machine memory • Uses a 4-level tree data structure to keep PageNumber → PageTagSummary mappings [Diagram: a 32-bit PageNumber with field boundaries at bits 31, 29, 19, 9, and 0, indexing into an array of 64-bit PageTagSummary structures]
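A lookup in such a tree might look like the sketch below. The field widths (2, 10, 10, and 10 bits) are an assumption read off the bit boundaries in the diagram, and `PageTagTree`, `split_page_number`, and the use of nested dictionaries are hypothetical stand-ins for the real data structure.

```python
FIELD_BITS = (2, 10, 10, 10)  # assumed split of a 32-bit page number, high bits first


def split_page_number(page_number):
    """Split a 32-bit page number into one index per tree level."""
    indices = []
    shift = 32
    for bits in FIELD_BITS:
        shift -= bits
        indices.append((page_number >> shift) & ((1 << bits) - 1))
    return tuple(indices)


class PageTagTree:
    """Hypothetical sparse 4-level tree: PageNumber -> 64-bit PageTagSummary."""

    def __init__(self):
        self.root = {}

    def set(self, page_number, summary):
        *inner, leaf = split_page_number(page_number)
        node = self.root
        for index in inner:
            node = node.setdefault(index, {})  # allocate interior nodes lazily
        node[leaf] = summary

    def get(self, page_number, default=0):
        *inner, leaf = split_page_number(page_number)
        node = self.root
        for index in inner:
            node = node.get(index)
            if node is None:
                return default  # subtree never allocated: page is untagged
        return node.get(leaf, default)
```

Allocating interior nodes only on first use keeps the structure small when most pages carry no tags, which is presumably the common case.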
Storing Tags in Volatile Memory • PageTagSummary holds either • A page-wide tag, for uniformly-tagged pages • A pointer to a PageTagDescriptor otherwise • PageTagDescriptor stores fine-grained (byte-level) tags within a page in one of two formats • Linear array of tags (indexed by page offset) • RLE encoding
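The sketch below shows how the three representations could be chosen for one page of per-byte tags. The selection rule and the size constants (4 bytes per linear entry, 8 bytes per run) are assumptions for illustration; the slides only name the formats. For scale, a linear array of 32-bit tags for a 4096-byte page occupies 16384 bytes, which is the 400% overhead of the naïve scheme.

```python
def rle_encode(tags):
    """Run-length encode a sequence of per-byte tags into (tag, length) runs."""
    runs = []
    for tag in tags:
        if runs and runs[-1][0] == tag:
            runs[-1][1] += 1
        else:
            runs.append([tag, 1])
    return [(tag, length) for tag, length in runs]


def rle_lookup(runs, offset):
    """Return the tag of the byte at the given page offset."""
    for tag, length in runs:
        if offset < length:
            return tag
        offset -= length
    raise IndexError("offset beyond end of page")


def summarize(tags):
    """Pick a representation for one page (hypothetical selection rule)."""
    runs = rle_encode(tags)
    if len(runs) == 1:
        return ("uniform", runs[0][0])   # page-wide tag, no descriptor needed
    if len(runs) * 8 < len(tags) * 4:    # assumed: 8 bytes per run vs 4 per entry
        return ("rle", runs)
    return ("linear", list(tags))


page = [0] * 4000 + [5] * 96             # mostly untagged page with a tagged tail
kind, payload = summarize(page)           # chooses the RLE format here
```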
Storing Tags on Disk • PDC-ext3 provides persistent storage for the safe VM • New i-node field for file-level tags • Leaf indirect blocks store pointers to BlockTagDescriptors • A BlockTagDescriptor stores byte-level tags within a block, as a linear array or in RLE form [Diagram: i-node with FileTag → indirect block → leaf indirect block → data block and its BlockTagDescriptor]
Back to the Big Picture [Architecture diagram. Dom0: policy daemon, network daemon, NIC, QEMU / tag tracker, emulated CPU context, safe VM (emulated), NFS server, VFS, PDC-ext3. Safe VM: App1, App2, NFS client. Links between domains: Xen-RPC, event channel, shared ring buffer. PDC-Xen (ring 0): shadow page tables, safe VM page tables, PageTag mask. Hardware: CPU, CR3]
Fine-Grained Tag Tracking • A modified version of QEMU emulates the safe VM and tracks movement of sensitive data • QEMU relies on runtime binary recompilation to achieve reasonably efficient emulation • We augment the QEMU compiler to generate a tag tracking instruction stream from the input stream of x86 instructions [Diagram labels: guest machine code block (x86); intermediate representation (TCG); stage 1; stage 2; tag tracking code block; host machine code block (x86)]
Fine-Grained Tag Tracking • Tag tracking instructions manipulate the tag status of emulated CPU registers and memory • Basic instruction format: Action {Clear, Set, Merge}, Dest. Operand {Reg, Mem}, Src. Operand {Reg, Mem} • The tag tracking instruction stream executes asynchronously in a separate thread
Fine-Grained Tag Tracking • Problem: some of the instruction arguments are not known at compile time • Example: mov %eax,(%ebx) • Source memory address is not known • The main emulation thread writes the values of these arguments to a temporary log (a circular memory buffer) at runtime • The tag tracker fetches unknown values from this log
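A minimal sketch of how a tracker could consume the Clear/Set/Merge stream together with the argument log follows. It runs in a single thread for clarity, whereas the slides describe a separate tracker thread. The semantics given to the three actions, the set-of-IDs tag model, the `deque` standing in for the circular buffer, and the `TagTracker` name are all assumptions.

```python
from collections import deque

EMPTY = frozenset()


class TagTracker:
    """Hypothetical interpreter for a Clear/Set/Merge tag tracking stream."""

    def __init__(self):
        self.tags = {}          # ("reg", name) or ("mem", addr) -> set of tag IDs
        self.arg_log = deque()  # stand-in for the circular argument log

    def resolve(self, operand):
        kind, value = operand
        if kind == "mem" and value is None:
            # Address unknown at compile time: take the value that the
            # emulation thread wrote to the log at runtime.
            value = self.arg_log.popleft()
        return (kind, value)

    def execute(self, action, dst, src=None):
        dst = self.resolve(dst)
        if action == "clear":
            self.tags[dst] = EMPTY
            return
        src_tags = self.tags.get(self.resolve(src), EMPTY)
        if action == "set":
            self.tags[dst] = src_tags
        elif action == "merge":
            self.tags[dst] = self.tags.get(dst, EMPTY) | src_tags
        else:
            raise ValueError(f"unknown action: {action}")


tracker = TagTracker()
tracker.tags[("mem", 0x2000)] = frozenset({1})
tracker.tags[("reg", "ebx")] = frozenset({2})
tracker.arg_log.append(0x2000)                           # logged by the emulation thread
tracker.execute("set", ("reg", "eax"), ("mem", None))    # eax <- tags of mem[logged addr]
tracker.execute("merge", ("reg", "eax"), ("reg", "ebx")) # plus the pointer's tags
```

Because the log is a FIFO, the tracker can lag behind the emulation thread and still resolve each unknown operand in program order.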
Binary Recompilation (Example) • Input x86 instruction: mov %eax, $123 • Intermediate representation: movi_i32 tmp0,$123; st_i32 tmp0,env,$0x0 • Tag tracking instructions: Clear4 eax • Input x86 instruction: push %ebp • Intermediate representation: ld_i32 tmp0,env,$0x14; ld_i32 tmp2,env,$0x10; movi_i32 tmp14,$0xfffffffc; add_i32 tmp2,tmp2,tmp14; qemu_st_logaddr tmp0,tmp2; st_i32 tmp2,env,$0x10 • Tag tracking instructions: Set4 mem,ebp,0; Merge4 mem,esp,0 • Tag tracking argument log: MachineAddr(%esp)
Binary Recompilation • But things get more complex… • Switching between operating modes (Protected/real/virtual8086, 16/32bit) • Recovering from exceptions in the middle of a translation block • Multiple memory addressing modes • Repeating instructions (e.g., rep movs) • Complex instructions whose semantics are partially determined by the runtime state (e.g., iret) [Diagram: iret stack frame with saved EIP, saved CS, saved EFLAGS, saved ESP, saved SS]
“On-Demand” Emulation • During virtualized execution, PDC-Xen uses the paging hardware to intercept sensitive data access • Maintains shadow page tables, in which all memory pages containing tagged data are marked as not present • Access to a tagged page from the safe VM causes a page fault and transfer of control to the hypervisor [Diagram labels: QEMU / tag tracker; PageTag descriptors; PageTag mask; shadow page tables; safe VM page tables; PDC-Xen (ring 0)]
“On-Demand” Emulation • If the page fault is due to tagged data, PDC-Xen suspends the guest domain and transfers control to the emulator (QEMU) • QEMU initializes the emulated CPU context from the native processor context (saved upon entry to the page fault handler) and resumes the safe VM in emulated mode [Diagram labels: Dom0 with QEMU / tag tracker, emulated safe VM CPU, safe VM memory mappings, Dom0 VCPU, Dom0 memory; safe VM with access to a tagged page, page fault handler, safe VM VCPU, safe VM memory]
“On-Demand” Emulation • Returning from emulated execution • QEMU terminates the main emulation loop, waits for the tag tracker to catch up • QEMU then makes a hypercall to PDC-Xen and provides • Up-to-date processor context for the safe VM VCPU • Up-to-date PageTagMask • The hypercall awakens the safe VM VCPU (blocked in the page fault handler) • The page fault handler • Overwrites the call stack with up-to-date values of CS/EIP, SS/ESP, EFLAGS • Restores other processor registers • Returns control to the safe VM
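The switch between native and emulated execution can be pictured as a small state machine. The sketch below is a toy model under stated assumptions: the `OnDemandVM` class is hypothetical, the PageTagMask is modeled as a set of page numbers, and the decision of when to leave emulated mode (which the slides do not specify) is left to the caller.

```python
class OnDemandVM:
    """Toy model of on-demand emulation for one safe VM (hypothetical)."""

    def __init__(self, tagged_pages):
        self.page_tag_mask = set(tagged_pages)  # pages that contain tagged data
        self.mode = "native"

    def shadow_present(self, page):
        # Shadow page tables mark every page with tagged data as not present.
        return page not in self.page_tag_mask

    def access(self, page):
        if self.mode == "native" and not self.shadow_present(page):
            # Page fault -> hypervisor suspends the guest and hands control to
            # the emulator, which resumes the VM from the saved CPU context.
            self.mode = "emulated"
        return self.mode

    def return_to_native(self, updated_mask):
        # Stands in for the hypercall that delivers the up-to-date
        # PageTagMask (and processor context) and wakes the blocked VCPU.
        self.page_tag_mask = set(updated_mask)
        self.mode = "native"


vm = OnDemandVM(tagged_pages={5})
vm.access(1)   # untagged page: stays in native mode
vm.access(5)   # tagged page: faults and switches to emulated mode
```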
“On-Demand” Emulation - Challenges • Updating PTEs in read-only page table mappings • Solution: QEMU maintains local writable “shadow” copies, synchronizes them in background via hypercalls • Transferring control to the hypervisor during emulated execution (hypercall and fault handlers) • Emulating hypervisor-level code is not an option • Solution: Transient switch to native execution • Resume native execution at the instruction that causes a jump to the hypervisor (e.g., int 0x82 for hypercalls)
“On-Demand” Emulation - Challenges • Delivery of timer interrupts (events) in emulated mode • The hardware clock advances faster in the emulated context (i.e., each instruction consumes more clock cycles) • Xen needs to scale the delivery of timer events accordingly • Use of the clock cycle counter (rdtsc instruction) • Linux timer interrupt/event handler uses the clock cycle counter to estimate timer jitter • After switching from emulated to native execution, the guest kernel observes a sudden jump forward in time
Policy Enforcement • The policy controller module • Resides in dom0 and interposes between the front-end and the back-end device driver • Fetches policies from a central policy server • Looks up the tags associated with the data in shared I/O request buffers and applies policies [Diagram: in Dom0, the policy controller sits in front of the network interface back-end and the block storage back-end; in the safe VM, the network interface front-end and the block storage front-end]
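As an illustration of the lookup-and-apply step, the sketch below checks the per-byte tags of an outgoing buffer against a policy table. The policy format (a map from tag to permitted destinations), the convention that tag 0 means untagged, the default-deny rule for unknown tags, and the `enforce` function itself are all hypothetical; the slides do not describe how policies are expressed.

```python
UNTAGGED = 0  # assumed convention: tag 0 marks non-sensitive bytes


def enforce(buffer_tags, policies, destination):
    """Decide whether an I/O request may leave the safe VM.

    buffer_tags: per-byte tags of the data in the shared I/O request buffer
    policies:    tag -> set of destinations the tagged data may reach
    Returns ("allow", None, None) or ("deny", offset, tag) for the first violation.
    """
    for offset, tag in enumerate(buffer_tags):
        if tag == UNTAGGED:
            continue
        # Tags with no policy entry are denied by default.
        if destination not in policies.get(tag, set()):
            return ("deny", offset, tag)
    return ("allow", None, None)


policies = {42: {"fileserver"}}
tags = [0, 0, 42, 42]
decision = enforce(tags, policies, "internet")   # tag 42 may not go to "internet"
```

Working at byte granularity lets the controller flag a request even when only a fragment of a sensitive document has been pasted into otherwise harmless data.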
Network Communication • PDC annotates outgoing packets with PacketTagDescriptors, carrying the sensitivity tags • Current implementation transfers annotated packets via a TCP/IP tunnel [Diagram: original packet EthHdr | IPHdr | TCPHdr | Payload; after annotation and TCP/IP encapsulation: EthHdr | IPHdr | TCPHdr | Tags | EthHdr | IPHdr | TCPHdr | Payload]
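The tunnel payload in the diagram places the tags ahead of the original frame. One possible encoding is sketched below; the wire layout (a 16-bit run count followed by 32-bit tag and 32-bit length pairs) is invented for illustration and is not the actual PacketTagDescriptor format, and `annotate` and `parse` are hypothetical names.

```python
import struct


def annotate(frame, runs):
    """Prefix an original frame with its tags, given as (tag, length) runs."""
    descriptor = b"".join(struct.pack("!II", tag, length) for tag, length in runs)
    return struct.pack("!H", len(runs)) + descriptor + frame


def parse(blob):
    """Inverse of annotate: recover the tag runs and the original frame."""
    (count,) = struct.unpack_from("!H", blob, 0)
    runs = [struct.unpack_from("!II", blob, 2 + 8 * i) for i in range(count)]
    return runs, blob[2 + 8 * count:]


frame = b"hdr" + b"secret"
blob = annotate(frame, [(0, 3), (42, 6)])   # 3 untagged bytes, then 6 bytes tagged 42
runs, original = parse(blob)
```

The receiving end can then restore the per-byte tags before the payload reaches the destination VM, so tags survive the trip across the network.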
Talk Outline • Introduction • Requirements and Assumptions • Use Cases • PDC Architecture • Prototype Implementation • Preliminary Performance Evaluation • Application-level performance overhead • Filesystem performance overhead • Network bandwidth overhead • Current Status and Future Work