ReVirt: Enabling Intrusion Analysis through Virtual-Machine Logging and Replay Jae Wook Kim Distributed Computing Systems Laboratory 2005.11.28 Additional Content Created by Jeremy Dobler, Michael Joyal, and Michael Pease
Introduction • It is infeasible to prevent all attacks on a system • Most computer systems try to enable attack analysis by logging various events. • Login/logoff events • Mail processing • TCP connection requests • File accesses • Security policy changes • Logs can be analyzed to understand the attack, fix the vulnerability, and repair any damage. • Logs provided by current systems fall short of what is needed in two ways: integrity and completeness
Goals of ReVirt • Integrity • Standard loggers assume the OS kernel is trustworthy • Logs stored in local file system • Attacker's first move is to subvert the logs • Delete or modify, or at least disable • Damage is difficult to assess and repair if logs have been compromised • “It is ironic that current loggers work best when the kernel is not compromised, since the audit logs are intended to be used when the system has been compromised.” • ReVirt assumes the kernel is untrustworthy: • Encapsulates the target system inside a virtual machine and places the logging software beneath this virtual machine. • Running the logger in a different domain than the target system protects the logger from a compromised application or operating system.
Goals of ReVirt • Completeness • Standard loggers do not log sufficient information to recreate or understand all attacks • Only a narrow category of system events is logged • Analysis still requires many educated guesses • Cannot account for non-determinism, which many attacks exploit • e.g., encryption algorithms • ReVirt: • Adapts techniques used in fault-tolerance for primary-backup recovery, such as checkpointing, logging, and roll-forward recovery. • Able to replay the complete, instruction-by-instruction execution of the virtual machine, even if it relies on non-deterministic events
Virtual Machines • The VMM makes a much better trusted computing base than the guest operating system, due to its narrow interface and small size. • The narrow VMM interface restricts the actions of an attacker.
UMLinux • Virtual machine used by ReVirt • The guest OS in UMLinux runs on top of the host OS and uses host services as the interface to peripheral devices. (OS-on-OS) • Guest OS and all applications run within a single host process • Compare to Direct-on-Host • Target applications run directly on the host operating system • VMM is implemented as a loadable module in the host kernel.
UMLinux Address Space • Host kernel occupies [0xc0000000, 0xffffffff] • Host user occupies [0x0, 0xc0000000) • Guest kernel occupies [0x70000000, 0xc0000000) • Current guest application occupies [0x0, 0x70000000)
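The layout above can be sketched as a small address classifier. This is only an illustration: the function name `classify` is hypothetical, and the sketch assumes the upper bounds of the guest regions are exclusive so that the regions do not overlap. Note that both guest regions live inside the host user range, since the whole guest runs as one host process.

```python
# Boundaries are taken from the slide; treating the guest regions as
# half-open intervals is an assumption made for this sketch.
GUEST_KERNEL_START = 0x70000000
HOST_KERNEL_START = 0xC0000000
ADDRESS_SPACE_END = 0x100000000  # one past 0xffffffff


def classify(addr: int) -> str:
    """Return the UMLinux region that a 32-bit virtual address falls in."""
    if not 0 <= addr < ADDRESS_SPACE_END:
        raise ValueError(f"not a 32-bit address: {addr:#x}")
    if addr >= HOST_KERNEL_START:
        return "host kernel"
    if addr >= GUEST_KERNEL_START:
        return "guest kernel"  # still part of the host user range
    return "guest application"  # still part of the host user range
```

For example, `classify(0x70000000)` returns `"guest kernel"`, while `classify(0x6fffffff)` returns `"guest application"`.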
Trusted computing base for UMLinux • The trusted computing base (TCB) is composed of the VMM kernel module and the host OS. • Logging in an OS-on-OS structure is much more difficult to attack than the logging in a direct-on-host structure, because the TCB for an OS-on-OS structure can be much smaller than the complete host operating system.
Attacks against host OS • From above, by causing application processes to invoke the host OS in dangerous ways • Direct-on-Host: Attacker has complete freedom to invoke whatever functionality the host OS makes available to user processes • OS-on-OS: Attacker who has gained control of all application processes can use these same avenues to attack the guest OS • Through the low level of the network protocol stack, by sending dangerous network packets to the host • Direct-on-Host: Packets traverse the entire network stack and are delivered to applications • OS-on-OS: Packets need only traverse a small part of the network stack
Logging and replaying UMLinux • Logging is used widely for recovering state. • Basic concept • Start from a checkpoint of a prior state • Roll forward using the log to reach the desired state • Replaying a process requires logging the non-deterministic events that affect the process’s computation. • Time: the exact point in the execution stream at which an event takes place • External input: data received from a non-logged entity • Output to peripherals is not logged as it will be automatically recreated
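The checkpoint and roll-forward idea can be illustrated with a toy process. This is a minimal sketch and not ReVirt's implementation: the `Counter` class, `run_and_log`, and `roll_forward` are hypothetical names, and "time" is modelled simply as the number of steps executed so far.

```python
import copy


class Counter:
    """Toy deterministic process: its state changes only through apply()."""

    def __init__(self):
        self.total = 0
        self.steps = 0

    def apply(self, external_input: int) -> None:
        self.total += external_input
        self.steps += 1


def run_and_log(process, inputs, log):
    """Live run: record (time, external input) before consuming each input."""
    for value in inputs:
        log.append((process.steps, value))
        process.apply(value)


def roll_forward(checkpoint, log, upto_step):
    """Rebuild the state at `upto_step` from a checkpoint plus the log."""
    process = copy.deepcopy(checkpoint)  # never mutate the checkpoint itself
    for step, value in log:
        if step >= upto_step:
            break
        # The logged time must match the replayed execution point.
        assert process.steps == step
        process.apply(value)
    return process


live = Counter()
checkpoint = copy.deepcopy(live)  # checkpoint of the prior state
log = []
run_and_log(live, [5, -2, 7], log)
midway = roll_forward(checkpoint, log, upto_step=2)
final = roll_forward(checkpoint, log, upto_step=3)
```

Here `midway` reproduces the state after two inputs and `final` matches the live process, using only the checkpoint and the log.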
Logging in ReVirt • Logs • All non-deterministic events that can affect the execution of the virtual-machine process • Asynchronous virtual interrupts • All input from external entities • Keyboard, mouse • NIC • CD-ROM • Does not log input from virtual hard disk (will be recreated during replay) • During replay, ReVirt prevents new asynchronous virtual interrupts from perturbing the replaying virtual machine process. • Create a checkpoint before starting UMLinux by making a copy of its virtual disk
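A toy model of this record-and-replay scheme is sketched below. Everything in it is an assumption made for illustration: the guest is reduced to a single integer `state`, an execution point is a simple step counter, and `run_guest` and `network_card` are hypothetical names. Disk contents are read but never logged, because the replay starts from the same disk checkpoint.

```python
import random

MODULUS = 1_000_003  # arbitrary modulus that keeps the toy state small


def run_guest(disk, steps, interrupt_source=None, log=None, replay_log=None):
    """Run a toy guest for `steps` instructions and return its final state.

    Live mode: `interrupt_source(step)` returns a payload or None, and every
    delivered interrupt is appended to `log` as (step, payload).
    Replay mode: interrupts come only from `replay_log`; no new ones arrive.
    """
    replaying = replay_log is not None
    scheduled = dict(replay_log) if replaying else {}
    state = 0
    for step in range(steps):
        if replaying:
            payload = scheduled.get(step)
        else:
            payload = interrupt_source(step)
            if payload is not None:
                log.append((step, payload))
        if payload is not None:
            state = (state * 31 + payload) % MODULUS
        # Disk input is deterministic given the checkpointed disk: not logged.
        state = (state + disk[step % len(disk)]) % MODULUS
    return state


disk_checkpoint = [3, 1, 4, 1, 5]
rng = random.Random()  # unseeded, so each live run differs


def network_card(step):
    """Hypothetical device: sometimes delivers a byte, sometimes nothing."""
    return rng.randrange(256) if rng.random() < 0.2 else None


log = []
original = run_guest(list(disk_checkpoint), 50, interrupt_source=network_card, log=log)
replayed = run_guest(list(disk_checkpoint), 50, replay_log=log)
```

Although the live run is non-deterministic, `replayed` equals `original` because the replay sees the same inputs at the same execution points.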
Cooperative logging • Of all the sources of non-determinism, only received network messages have the potential to generate enormous quantities of log data. • If the sending computer is being logged via ReVirt, then the receiver need not log the message data because the sender can re-create the sent data via replay. • Can reduce log volume, but complicates replay and requires that cooperating computers trust each other to regenerate the same message data during replay • Not yet implemented in ReVirt
Direct-on-host logging • Host kernel logs and replays all its host processes • Not as secure and much more difficult than a virtual-machine approach • Direct-on-host involves multiple host processes, while an OS-on-OS approach involves only a single host process • Replaying multiple host processes can be done in 2 ways • Replay communication channels between processes • Replaying a shared-memory communication channel requires complex instrumentation of the executing code and adds significant overhead • Replay the scheduling order between host processes • Difficult because a host process can be interrupted while executing in kernel mode • Hard to identify the point where an interrupt occurred.
Using ReVirt to analyze attacks • ReVirt enables an administrator to replay the complete execution of a computer before, during, and after the attack. • Two types of tools to assist the administrator to understand the attack. • Inside the guest virtual machine • ReVirt supports the ability to continue live execution at any point in the replay. • Use this ability to run new guest commands to probe the virtual machine state. • Virtual machine cannot switch back to replaying after being perturbed in this manner. • Can be resolved by creating a checkpoint before doing this, otherwise replay would have to be restarted • Outside the guest virtual machine • Debuggers and disk analyzers • Do not depend on the guest kernel or guest applications.
Experiments • AMD Athlon 1800+ processor, 256 MB of memory, Samsung SV4084 IDE disk • Host & Guest kernel: modified Linux 2.4.18 • 5 workloads • POV-Ray • CPU-intensive ray-tracing program • kernel-build • NFS kernel-build • SPECweb99 • benchmark to measure web server performance • Desktop machine
Virtualization overhead • Time overhead that arises from running all applications in the UMLinux virtual machine • Compare running all applications within UMLinux with running them directly on a host Linux 2.4.18. • Results • Very little overhead for compute-intensive POV-Ray • No overhead for interactive jobs such as e-mail • Others are higher because they issue more guest kernel calls
Validating ReVirt correctness • Verify that the ReVirt system faithfully replays the exact execution of the original run • Add extensive error checking to alert if the replaying run deviates from the original • 2 micro-benchmarks • Runs 2 guest processes that share an mmap’ed memory region • Runs a single process that increments a variable in an infinite loop • 1 macro-benchmark • Boot computer, start the GNOME window manager, open several interactive terminal windows, and concurrently build two applications on a remote NFS server
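One way such error checking could work is to record a digest of selected state at fixed points in the original run and compare it during replay. The sketch below is an assumption about the general technique, not a description of the checks ReVirt actually performs; `state_digest` and `first_divergence` are hypothetical names, and a state is modelled as a small dictionary.

```python
import hashlib


def state_digest(state: dict) -> str:
    """Hash a state snapshot in a key-order-independent way."""
    canonical = repr(sorted(state.items())).encode("utf-8")
    return hashlib.sha256(canonical).hexdigest()


def first_divergence(recorded_digests, replay_states):
    """Return the index of the first mismatching snapshot, or None if none.

    `recorded_digests` come from the original run; `replay_states` are the
    snapshots taken at the same points during replay.
    """
    for index, state in enumerate(replay_states):
        if index >= len(recorded_digests):
            return index  # replay ran longer than the original
        if state_digest(state) != recorded_digests[index]:
            return index
    if len(replay_states) < len(recorded_digests):
        return len(replay_states)  # replay stopped early
    return None
```

Storing digests rather than full snapshots keeps the validation data small, at the cost of only learning where the runs diverged, not how.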
Logging and replaying overhead • Quantify the space and time overhead of logging • Time overhead of logging is small (at most 8%) • Space overhead of logging is small enough to save logs over a long period of time at low cost • 120 GB disk can store the volume of log traffic generated by NFS kernel-build for 3-4 months
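The storage figure above implies a rough log growth rate, which can be checked with simple arithmetic. The 30-day month is an assumption made for this calculation, and the function name is hypothetical; the 120 GB and 3-4 month figures come from the slide.

```python
DISK_GB = 120        # disk size quoted on the slide
DAYS_PER_MONTH = 30  # assumption used only for this estimate


def implied_rate_gb_per_day(disk_gb: float, months: float) -> float:
    """Log growth rate that would fill `disk_gb` in `months` months."""
    return disk_gb / (months * DAYS_PER_MONTH)


low = implied_rate_gb_per_day(DISK_GB, 4)   # disk lasts 4 months
high = implied_rate_gb_per_day(DISK_GB, 3)  # disk lasts 3 months
print(f"implied log growth: {low:.2f} to {high:.2f} GB/day")
```

Under these assumptions the NFS kernel-build workload would generate on the order of 1 GB of log data per day.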
Conclusions • ReVirt applies virtual-machine and fault-tolerance techniques to enable a system administrator to replay the long-term, instruction-by-instruction execution of a computer system. • Because the target operating system and target applications run within a virtual machine, ReVirt can replay the execution before, during, and after the intruder compromises the system. • Because ReVirt can replay instruction-by-instruction sequences, it can provide arbitrarily detailed observations about what transpired on the system.
ReVirt Shortcomings • Cannot replay the fine-grained interleaving order of memory operations in a shared-memory multiprocessor • The TCB of UMLinux, while smaller than the host OS, still runs other processes that are vulnerable to attack, such as the X server
Future Work • Make checkpointing faster (a full copy of the virtual disk is time-intensive) • Enable real-time checkpointing of a running VM • Build high-level analysis tools • Create new security services, such as automatic rollback and recovery.