Live Updating Operating Systems Using Virtualization • Haibo Chen, Rong Chen, Fengzhe Zhang, Binyu Zang (Fudan University) • Pen-Chung Yew (University of Minnesota at Twin-Cities)
Motivation • Operating systems are far from perfect: • Security holes, design flaws, bugs, new features, ... • Result: continuous patches and upgrades are required • Difficulties in applying patches and upgrades: • Disruptive: loss of availability • Irreversible: risk of system crash • A live update capability is highly desirable and very often critical.
What Do Contemporary Operating Systems (COS) Lack? • Requirements to live update an OS: • Define an updatable unit • Difficult, because a COS is monolithic • Apply the patch at a safe point • Some hot spots have no safe point • e.g. root file system, network modules • Consistency • Difficult for an OS to update itself
What is LUCOS? • "Any problem in computer science can be solved with another level of indirection." • David Wheeler, as quoted in Butler Lampson's 1992 ACM Turing Award lecture. • Live Updating Contemporary Operating Systems using virtualization • Uses Virtual Machine Monitors (VMMs) to patch operating systems (e.g. Linux) • Avoids the need for a safe point; allows the old and new versions of data structures to co-exist. • The VMM maintains coherence between the versions and tracks when a live update can finish.
What is LUCOS? • A practical live updating system • Applies a broad range of real-life Linux patches on the fly • Requires no safe points and retains OS transparency • Supports patches that recover tainted state (e.g. a deadlock situation) • Allows rolling back committed patches • Requires minimal update time (< 1 ms) and incurs negligible performance overhead (less than 1%)
Some Existing Efforts • Dynamic Software Update • Focuses on live update of application software • LUCOS: live update of operating systems • K42 (Baumann et al., USENIX '05) • A new operating system designed to support live update • Tightly bound to object-oriented design techniques • A safe point is desirable • LUCOS: transparently supports existing OSes (including non-object-oriented ones), requires no safe point
Two Types of Live Updates • Updates to code only: • Only code is modified. • Updates to code with data changes: • The changed data may be global, single-instance data or multiple-instance data.
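The two update types can be illustrated with a minimal userspace sketch. Everything in it is an assumption made for illustration: LUCOS redirects kernel functions with help from the VMM, whereas here a plain function pointer stands in for that redirection, and the names `checksum`, `counter_v1`, `counter_v2`, and `counter_state_transfer` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical illustration only: a function pointer stands in for the
 * redirection that LUCOS performs on kernel code via the VMM. */

/* --- Type 1: code-only update ---------------------------------------- */
static int checksum_v1(int x) { return x + 1; }   /* original version */
static int checksum_v2(int x) { return x + 2; }   /* patched version  */

/* All callers go through this pointer, so swapping it updates the code. */
static int (*checksum)(int) = checksum_v1;

static void apply_code_only_update(void) { checksum = checksum_v2; }

/* --- Type 2: code plus data changes ---------------------------------- */
struct counter_v1 { int hits; };                 /* old layout */
struct counter_v2 { long hits; long misses; };   /* new layout adds a field */

/* State transfer: derive the new-version data from the old-version data. */
static void counter_state_transfer(const struct counter_v1 *old_c,
                                   struct counter_v2 *new_c)
{
    assert(old_c != NULL && new_c != NULL);
    new_c->hits = old_c->hits;
    new_c->misses = 0;   /* field did not exist before; start from zero */
}
```

A code-only update needs nothing beyond the redirect, while a data-changing update also needs a transfer function so that state held in the old layout is not lost.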
Termination of a Live Update • When all threads leave original functions • Stack inspection (Altekar, Usenix Security’05): • Maintain a list of threads executing in original functions • Remove threads that leave original functions • Terminate live update when the list is empty
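The termination rule above can be sketched as a small set of thread IDs. This is a userspace illustration under stated assumptions, not the LUCOS implementation: the fixed capacity `MAX_THREADS` and the names `pending_threads`, `pending_add`, `pending_remove`, and `update_can_terminate` are hypothetical, and the stack inspection that discovers the threads is not modeled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_THREADS 64   /* hypothetical fixed capacity for the sketch */

/* Set of thread IDs that were still executing an original (unpatched)
 * function when the update was applied. */
struct pending_threads {
    int ids[MAX_THREADS];
    size_t count;
};

static void pending_init(struct pending_threads *p)
{
    assert(p != NULL);
    p->count = 0;
}

/* Record a thread found inside an original function by stack inspection.
 * Returns false if the set is full. */
static bool pending_add(struct pending_threads *p, int tid)
{
    if (p->count == MAX_THREADS)
        return false;
    p->ids[p->count++] = tid;
    return true;
}

/* Called when a thread has left the original functions. */
static void pending_remove(struct pending_threads *p, int tid)
{
    for (size_t i = 0; i < p->count; i++) {
        if (p->ids[i] == tid) {
            p->ids[i] = p->ids[--p->count];  /* swap with last, shrink */
            return;
        }
    }
}

/* The live update can terminate once no thread remains in the old code. */
static bool update_can_terminate(const struct pending_threads *p)
{
    return p->count == 0;
}
```

An unordered array keeps the sketch short; removal is O(n), which is acceptable when only a handful of threads are inside the patched functions at update time.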
Patches for Recovering Tainted State • Vision: • Some bugs could cause a tainted state: • Deadlock situation • Simple patching could not solve the problem

Code 1: a buggy function with a potential for deadlocks.

    spinlock_t demo_lock = SPIN_LOCK_UNLOCKED;
    void foo(void) {
        ...;
        spin_lock(&demo_lock);
        ...;
        if (condition) { return; }
        ...;
        spin_unlock(&demo_lock);
    }

Code 2: a patch function to fix the deadlock problem.

    spinlock_t demo_lock = SPIN_LOCK_UNLOCKED;
    void foo_patch(void) {
        ...;
        spin_lock(&demo_lock);
        ...;
        if (condition) {
            spin_unlock(&demo_lock);
            return;
        }
        ...;
        spin_unlock(&demo_lock);
    }

Code 3: a callback function to recover from a deadlocked situation.

    void state_transfer(void) {
        if (spin_is_locked(&demo_lock))
            spin_unlock(&demo_lock);
    }
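Why the patched function alone is not enough can be seen in a userspace simulation of the three snippets. This is a sketch under assumptions: `toy_lock`, `toy_trylock`, and `toy_unlock` are hypothetical stand-ins and not the Linux spinlock API, and the lock attempt returns `false` where kernel code would spin forever, so the example cannot hang.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for a kernel spinlock; NOT the Linux spinlock API. */
struct toy_lock { bool held; };

static struct toy_lock demo_lock = { false };

/* Returns false instead of spinning, so the sketch cannot hang. */
static bool toy_trylock(struct toy_lock *l)
{
    if (l->held)
        return false;
    l->held = true;
    return true;
}

static void toy_unlock(struct toy_lock *l)
{
    assert(l->held);   /* unlocking a free lock would be a bug */
    l->held = false;
}

/* Buggy version: the early return leaks the lock (mirrors Code 1). */
static bool foo(bool condition)
{
    if (!toy_trylock(&demo_lock))
        return false;            /* the kernel version would deadlock here */
    if (condition)
        return true;             /* BUG: lock is still held */
    toy_unlock(&demo_lock);
    return true;
}

/* Patched version: releases the lock on every path (mirrors Code 2). */
static bool foo_patch(bool condition)
{
    if (!toy_trylock(&demo_lock))
        return false;            /* still blocked by a previously leaked lock */
    if (condition) {
        toy_unlock(&demo_lock);
        return true;
    }
    toy_unlock(&demo_lock);
    return true;
}

/* Callback run at update time: clears the leaked lock (mirrors Code 3). */
static void state_transfer(void)
{
    if (demo_lock.held)
        toy_unlock(&demo_lock);
}
```

If `foo(true)` has already run before the update, `foo_patch` still cannot take the lock; only after `state_transfer()` clears the leaked lock does the patched function make progress.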
Patches for Recovering Tainted State • Solutions: • Allow callbacks in live update • Three types of callbacks in LUCOS: • function callbacks • thread callbacks • data callbacks • Example: use thread callbacks to resolve the deadlock situation
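One possible shape for the three callback kinds is a small registry indexed by kind. This is a sketch under assumptions, not the LUCOS interface: the names `cb_kind`, `cb_registry`, `cb_register`, `cb_fire`, and `release_leaked_lock` are hypothetical, and when each kind of callback is actually invoked during an update is not specified here.

```c
#include <assert.h>
#include <stddef.h>

/* The three callback kinds named on the slide. */
enum cb_kind { CB_FUNCTION, CB_THREAD, CB_DATA, CB_KIND_COUNT };

typedef void (*update_cb)(void *ctx);

/* Hypothetical registry: at most one callback per kind, for brevity. */
struct cb_registry {
    update_cb cb[CB_KIND_COUNT];
    void *ctx[CB_KIND_COUNT];
};

static void cb_register(struct cb_registry *r, enum cb_kind kind,
                        update_cb fn, void *ctx)
{
    assert(kind < CB_KIND_COUNT);
    r->cb[kind] = fn;
    r->ctx[kind] = ctx;
}

/* Invoke the callback of one kind, if any; returns 1 if one ran. */
static int cb_fire(const struct cb_registry *r, enum cb_kind kind)
{
    assert(kind < CB_KIND_COUNT);
    if (r->cb[kind] == NULL)
        return 0;
    r->cb[kind](r->ctx[kind]);
    return 1;
}

/* Example thread callback: release a lock leaked by the buggy code. */
static void release_leaked_lock(void *ctx)
{
    int *lock_held = ctx;
    if (*lock_held)
        *lock_held = 0;
}
```

Registering `release_leaked_lock` under `CB_THREAD` and firing it corresponds to the deadlock example: the callback repairs state that the patched code alone cannot fix.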
Patch Rollback • A special type of patch: • Uses the original code and data to patch the committed ones • Transfers state held in the new data back to the original data • Resource overhead: • The original code and data must be kept in memory
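The idea that a rollback is just another patch, with the saved original as its target, can be sketched as follows. This is a userspace illustration under assumptions: `patch_slot`, `slot_apply`, `slot_rollback`, and the `limit_v1`/`limit_v2` functions are hypothetical, and only code is rolled back here, not data.

```c
#include <assert.h>
#include <stddef.h>

typedef int (*handler_fn)(int);

/* One patchable entry point plus the saved original, kept for rollback. */
struct patch_slot {
    handler_fn active;     /* what callers currently run */
    handler_fn original;   /* retained in memory: the resource overhead */
    int committed;         /* nonzero once a patch is in place */
};

static void slot_init(struct patch_slot *s, handler_fn fn)
{
    s->active = fn;
    s->original = fn;
    s->committed = 0;
}

static void slot_apply(struct patch_slot *s, handler_fn patched)
{
    assert(patched != NULL);
    s->active = patched;
    s->committed = 1;
}

/* Rollback is itself a patch whose "new" version is the saved original.
 * Returns 0 on success, -1 if there is nothing to undo. */
static int slot_rollback(struct patch_slot *s)
{
    if (!s->committed)
        return -1;
    slot_apply(s, s->original);
    s->committed = 0;
    return 0;
}

/* Two versions of a hypothetical function, used only to exercise the slot. */
static int limit_v1(int x) { return x > 10 ? 10 : x; }
static int limit_v2(int x) { return x > 100 ? 100 : x; }
```

Rolling back a data-changing patch would additionally need a reverse state transfer from the new layout to the original one, which this sketch leaves out.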
Experimental Setup • Implemented on Linux 2.6.10 running on Xen 2.0.5. • Systems: • Fedora Core 2 distribution • 3.0 GHz Pentium IV with 1 GB RAM • Intel Pro 100/1000 Ethernet NIC on a 100 Mb/s LAN • A single 250 GB 7200 RPM SATA disk.
Workloads • SPEC INT 2000: • Measures the performance of CPU-intensive workloads • Linux build time: • Measures the overall time to build a Linux 2.6.10 kernel with gcc-3.3.3. • Open Source Database Benchmark suite (OSDB): • Information Retrieval (IR) • Online Transaction Processing (OLTP)
Experience with Real-Life Patches • Five typical patches selected from Linux upgrades: • upgrade of Linux kernel from 2.6.10 to 2.6.11 • upgrade of backend block device drivers in Xen-Linux
Time to Apply and Roll Back Live Updates • Note: OSDB-IR/OLTP run in the background while the patches are applied and rolled back.
Conclusions • Existing operating systems can be live updated • No safe point is required • Patches should recover tainted state • Rollback of a live update is supported • Time overhead to apply a live update is minimal • Performance overhead is negligible
Future Work • Avoid the performance overhead of virtualization • Integrate it with our self-virtualization system • Virtualize operating systems on demand
Questions? • Our contact information: • Parallel Processing Institute, Fudan University, China • Phone: +86-21-51355363 • Fax: +86-21-65646571
Patch File Format in LUCOS • Follows the format of Linux kernel modules, and adds • New declarations of data structures • *Callback functions • *Patch startup and patch cleanup functions • *State transfer
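The additions listed above can be pictured as a descriptor carrying optional hooks. This is a sketch under assumptions, since the actual LUCOS patch layout is not given here: the struct `patch_desc`, the function `patch_run`, the order in which the hooks run, and the demo hooks are all hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical descriptor; the real LUCOS patch layout is not given here. */
struct patch_desc {
    const char *name;
    int  (*startup)(void);         /* patch startup function  */
    void (*cleanup)(void);         /* patch cleanup function  */
    void (*state_transfer)(void);  /* state transfer function */
};

/* Drive the hooks in an assumed order: startup, state transfer, cleanup.
 * Optional hooks may be NULL.
 * Returns 0 on success, or the nonzero error code from startup. */
static int patch_run(const struct patch_desc *p)
{
    assert(p != NULL);
    if (p->startup != NULL) {
        int err = p->startup();
        if (err != 0)
            return err;            /* abort before any other hook runs */
    }
    if (p->state_transfer != NULL)
        p->state_transfer();
    if (p->cleanup != NULL)
        p->cleanup();
    return 0;
}

/* Tiny demo hooks that record the order in which they ran. */
static char hook_log[8];
static size_t hook_log_len;

static void log_hook(char c)
{
    if (hook_log_len < sizeof hook_log - 1)
        hook_log[hook_log_len++] = c;
}

static int  demo_startup(void)  { log_hook('S'); return 0; }
static void demo_transfer(void) { log_hook('T'); }
static void demo_cleanup(void)  { log_hook('C'); }
```

Keeping the hooks as nullable function pointers lets a code-only patch omit the state transfer entirely, while a data-changing patch supplies all three.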
Fine-grained memory protection • Exploiting ECC memory (Qin et al., HPCA '05) • Cache-line granularity • Mondrian memory protection (Witchel et al., ASPLOS-X) • Word-level memory protection
Self-virtualization: architecture • The OS can switch between the three modes quickly and on the fly • Applications are completely unaware of the mode switch • Hosting mode is used to host other OSes. • Migrating mode prepares the OS to self-migrate to another machine.