On-the-Fly Kernel Updates for High-Performance Computing Clusters Kristis Makris <kristis.makris@asu.edu> Arizona State University Kyung Dong Ryu <kryu@us.ibm.com> IBM T.J. Watson Research Center DynAMOS -- SMTPS '06
Motivation
• Updating the kernel in high-performance (HP) clusters requires downtime
• Revenue loss in pay-per-use, time-sharing clusters
• Disruption of long-lived parallel tasks
• Process migration may not be possible
• Postponing updates has its price
• Unpatched kernel security holes
• Missed kernel specialization opportunities
• Adaptive selection of kernel subsystem to use; virtualization cannot help
• Parallel computing needs
• Safe, unobtrusive updates (no system restart)
• Temporary, reversible specialization of some nodes
• Portable updating system (i386 + PowerPC)
Solution: Dynamic Kernel Updates
• Approaches
• Adaptable OS
• Specially crafted, like K42, VINO, Synthetix
• Require OS and application restructuring
• Dynamic code instrumentation
• Zero kernel source modification (KernInst, GILK)
• Basic block code interposition
• Currently limited
• No procedure replacement
• No autonomous kernel adaptability
• No safe, complete subsystem update guarantees
Dynamic Updates Classification
• Updating changes in
• Userspace requirements
• Security fix breaks existing applications that rely on the defect
• Kernel external requirements
• Function signature changes (API changes)
• Kernel internal requirements
• Global variables used by a function group (e.g. enlarge copy buffer used in pipefs)
• Updating needs
• State tracking
• Enlarge copy buffer only for 2 processes
• Must adaptively enlarge the buffer and use newer functions
• State transfer
• Copy data from old buffer to new
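The state-transfer need listed above can be pictured with a small sketch. It is written in Python rather than kernel C, and the name `transfer_state`, its parameters, and the buffer sizes are hypothetical, chosen only for illustration; nothing here is taken from DynAMOS or pipefs.

```python
def transfer_state(old_buffer: bytearray, used: int, new_size: int) -> bytearray:
    """Copy the live bytes of an old buffer into a newly allocated, larger one."""
    if not 0 <= used <= len(old_buffer):
        raise ValueError("used byte count does not fit the old buffer")
    if new_size < used:
        raise ValueError("new buffer cannot hold the existing data")
    new_buffer = bytearray(new_size)        # zero-filled replacement buffer
    new_buffer[:used] = old_buffer[:used]   # state transfer: old -> new
    return new_buffer
```

Newer function versions would then operate on the returned buffer, while processes that were not selected for the update keep using the old one.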
Dynamic Update Types
• No safe update point
• Update read-only global variable (e.g. maximum number of open files)
• Add new variable used only by a single function
• Safe update point
• Update uid of an inode (guarded by a semaphore)
• Add new variable used by function group (must update atomically)
• Non-quiescent resources
• Update kernel scheduler to use different policy
• Datatype updates
• Update functions that use the old datatype to use the new datatype
• Maintain shadow data structure that holds only new fields, and update only functions that use the new fields
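The shadow-data-structure option can be sketched as a side table keyed by object identity. This is an illustrative Python model only: `Inode`, `audit_tag`, and the helper names are hypothetical, and a kernel implementation would key the table by the structure's address and handle locking and object teardown.

```python
class Inode:
    """Stand-in for an existing datatype whose layout must not change."""
    def __init__(self, uid: int):
        self.uid = uid

# Shadow table: object identity -> fields introduced by the update.
shadow_fields = {}

def set_audit_tag(inode: Inode, tag: str) -> None:
    """New function: stores the new field in the shadow table, not in Inode."""
    shadow_fields.setdefault(id(inode), {})["audit_tag"] = tag

def get_audit_tag(inode: Inode, default=None):
    """New function: reads the new field, falling back for untouched objects."""
    return shadow_fields.get(id(inode), {}).get("audit_tag", default)

def get_uid(inode: Inode) -> int:
    """Old function: uses only old fields, so it needs no update."""
    return inode.uid
```

Only functions that touch the new field go through the shadow table; everything that reads the old fields keeps working on the unmodified structure.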
DynAMOS System Architecture
• Currently implemented for i386 uniprocessor Linux kernels 2.2-2.6
• Prepare updates to be applied
• Coordinate safe activation/removal
• Distribute updates to cluster nodes
• Process updating requests from control station with framework
Execution Flow Redirection (1)
• Install trampoline in beginning of original function
• Disable local processor interrupts
• Flush I-cache
• Use an indirect jump (jmp *)
• Don’t modify page permissions
• Divert execution to a redirection handler
• Original function can no longer be directly executed
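To make the trampoline concrete, the sketch below builds the bytes of an i386 absolute-indirect near jump (`FF 25 disp32`, i.e. `jmp *addr`) and writes them over the start of a simulated function image. It is a user-space Python model: the helper names, the sample addresses, and the choice of this particular encoding are assumptions, and the interrupt-disabling and I-cache steps from the slide are not modeled.

```python
import struct

JMP_INDIRECT = b"\xff\x25"  # i386: jmp *disp32 (opcode FF /4, ModRM 0x25)

def make_trampoline(slot_address: int) -> bytes:
    """Encode `jmp *slot_address`; the slot holds the handler's address."""
    return JMP_INDIRECT + struct.pack("<I", slot_address)

def install_trampoline(image: bytearray, offset: int, slot_address: int) -> bytes:
    """Overwrite the function entry and return the bytes that were replaced."""
    trampoline = make_trampoline(slot_address)
    end = offset + len(trampoline)
    saved = bytes(image[offset:end])   # kept so the update can be reversed
    image[offset:end] = trampoline
    return saved
```

Because the jump is indirect, the redirection target can later be changed by rewriting the pointer slot rather than by patching code again.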
Execution Flow Redirection (2)
• Create separate redirection handler for each function
• Customize from template
• Clone and relocate original function image
• Choose between active function versions with adaptation handler
• Can execute different versions of functions in different process contexts
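The relationship between a redirection handler and its adaptation handler can be modeled as a dispatcher that picks a function version per calling process. The class below is a hypothetical Python sketch, not the DynAMOS handler template; the process IDs and buffer sizes are made-up illustration values.

```python
class RedirectionHandler:
    """Per-function dispatcher: asks the adaptation handler which version to run."""

    def __init__(self, original):
        self.versions = {"original": original}
        # Default policy: every process context keeps running the original.
        self.adaptation_handler = lambda pid: "original"

    def add_version(self, name, function):
        self.versions[name] = function

    def __call__(self, pid, *args):
        chosen = self.adaptation_handler(pid)
        return self.versions[chosen](*args)

def pipe_buffer_size_v1():
    return 4096      # hypothetical original buffer size

def pipe_buffer_size_v2():
    return 65536     # hypothetical enlarged buffer size

handler = RedirectionHandler(pipe_buffer_size_v1)
handler.add_version("v2", pipe_buffer_size_v2)

updated_pids = {101, 102}   # only these two processes see the new version
handler.adaptation_handler = lambda pid: "v2" if pid in updated_pids else "original"
```

Calling `handler(101)` runs the new version while `handler(7)` still runs the original, which mirrors executing different versions in different process contexts.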
Function Cloning Benefits
• Unaltered stack when newer function is executed
• No processor state saved on stack
• Autonomous kernel determination of update timeliness
• Using adaptation handler
• Function-level instrumented applications
• Basic blocks can be bypassed
• Modifications developed in functions with original source language
Function Relocation
• Adjust relative branch instructions
• Replace ret instructions with jumps back to redirection handler
• Safely detect
• Backward branches: Point to code overwritten by trampoline
• Outbound branches: Jump to code outside function image
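The arithmetic behind adjusting a relative branch can be sketched as follows. The function assumes a 32-bit relative displacement and, as a simplification, that the clone keeps the original instruction layout (replacing `ret` with a longer jump would shift later offsets, which this sketch ignores). `relocate_rel32` and its parameters are hypothetical names.

```python
def relocate_rel32(insn_addr, insn_len, rel, old_base, size, new_base):
    """Return the displacement a rel32 branch needs after its function moves.

    insn_addr is the branch's address in the ORIGINAL image; the clone is
    assumed to start at new_base with the same layout.
    """
    target = insn_addr + insn_len + rel           # absolute target before the move
    if old_base <= target < old_base + size:
        return rel                                # inbound: target moves with the code
    new_rel = rel - (new_base - old_base)         # outbound: target stays put
    if not -2**31 <= new_rel < 2**31:
        raise OverflowError("target not reachable with a 32-bit displacement")
    return new_rel
```

For example, a 5-byte call at `0x1010` with displacement `0x2000` targets `0x3015`; after cloning the function from `0x1000` to `0x5000` the displacement becomes `-0x2000`, which still resolves to `0x3015`.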
Applying Security Patches
• Openwall hardening changes for Linux 2.4.22
• Permission check when writing in named pipes
• Updated open_namei function
• No safe update point needed
• Permission check when following a symbolic link
• Updated open_namei, vfs_link functions
• Had to update inline function do_follow_link, used by link_path_walk
• No need to update functions atomically
• Confirmed unauthorized access was denied
Applying Unobtrusive Fine-grained Cycle Stealing
• Linger-Longer system for Linux 2.2.19
• Introduces a guest priority
• New scheduling policy
• Updated schedule function in 4-node cluster
• Confirmed guest processes were not consuming CPU time when host processes were active
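The effect of a guest priority can be illustrated with a toy selection routine in which guest tasks run only when no host task is runnable. This is a simplified Python model of the policy described on the slide, not the Linger-Longer or Linux 2.2.19 scheduler; `Task` and `pick_next` are hypothetical names.

```python
from dataclasses import dataclass

@dataclass
class Task:
    name: str
    guest: bool        # True for cycle-stealing guest processes
    runnable: bool
    priority: int = 0

def pick_next(tasks):
    """Pick the highest-priority runnable host; fall back to guests only if none."""
    runnable = [t for t in tasks if t.runnable]
    hosts = [t for t in runnable if not t.guest]
    pool = hosts if hosts else runnable
    return max(pool, key=lambda t: t.priority, default=None)
```

A runnable host task always wins over a guest, regardless of the guest's own priority value.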
Applying Adaptive Memory Paging For Efficient Gang-Scheduling
• Various adaptive memory paging policies for Linux 2.2.19 for 4-node cluster
• Required modifications in kswapd, swap_out, rw_swap_page, swapin_readahead, filemap_nopage
• kswapd is a kernel thread that never exits
• Beginning of function is never called again
• Thread sleeps by calling interruptible_sleep_on
• Insert interruptible_sleep_on_v2 forcing kswapd to exit
• Start kswapd_v2
• Confirmed job switching time was reduced
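The kswapd replacement sequence can be simulated in user space. In the sketch below, generators stand in for kernel threads, a dictionary stands in for the redirected call to `interruptible_sleep_on`, and an exception stands in for forcing the thread to exit; all of these modeling choices, and the `hot_swap` driver, are assumptions made for illustration.

```python
class ThreadExit(Exception):
    """Raised by the replacement sleep function to terminate the old thread."""

def interruptible_sleep_on():
    pass  # stand-in: the real call blocks until the thread is woken

def interruptible_sleep_on_v2():
    raise ThreadExit  # replacement: instead of sleeping, make the caller exit

def kswapd(redirect, log):
    log.append("kswapd: start")            # entry code runs once, never again
    while True:
        log.append("kswapd: scan")
        redirect["interruptible_sleep_on"]()
        yield                              # simulated sleep

def kswapd_v2(log):
    log.append("kswapd_v2: start")
    while True:
        log.append("kswapd_v2: scan")
        interruptible_sleep_on()           # new thread uses the normal sleep
        yield

def hot_swap():
    log = []
    redirect = {"interruptible_sleep_on": interruptible_sleep_on}
    old = kswapd(redirect, log)
    next(old)                              # old thread runs, then sleeps
    redirect["interruptible_sleep_on"] = interruptible_sleep_on_v2
    try:
        next(old)                          # next sleep attempt exits the thread
    except ThreadExit:
        log.append("kswapd: exited")
    new = kswapd_v2(log)
    next(new)                              # start the replacement thread
    return log
```

Updating the entry of `kswapd` alone would have no effect because that code is never reached again, so the sketch redirects the sleep call the loop keeps making.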
Overhead
• 29k footprint
• < 1 ns trampoline installation time
• 20 ns redirection handler overhead
• 2.3 secs update on 2 GHz P4 (adaptive paging)
• 1-8% overhead (due to indirect jump)
Related Work
• Cluster Management Systems
• Do not support dynamic kernel updates
• K42
• Specially designed with hot-swappable capabilities
• Requires quiescence for all updates
• Hicks’ system
• User-level software updates; requires recompilation
• KernInst, GILK, ATOM, EEL
• Do not facilitate adaptive execution
• Do not replace complete subsystems
On-going and Additional Work
• Ensure safe update reversal
• Confirm quiescence in stack and program counter
• Update datatypes
• Maintain shadow data structure of new fields
• Apply EPCKPT kernel-assisted checkpointing
• Adaptively enlarge pipefs buffer
• Apply Superpages support
• Apply Scalable TCP for high-speed WANs
• Automatically produce updates given a patch file
• Apply MOSIX
• Upgrade Linux kernel
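The quiescence check mentioned for safe update reversal can be sketched as a scan over every thread's program counter and saved return addresses. This Python model assumes the return addresses are already known exactly; walking a real kernel stack is considerably harder. `is_quiescent` and the thread representation are hypothetical.

```python
def is_quiescent(func_start, func_end, threads):
    """True if no thread is executing in, or will return into, [func_start, func_end).

    threads is an iterable of (program_counter, return_addresses) pairs.
    """
    def inside(address):
        return func_start <= address < func_end

    for pc, return_addresses in threads:
        if inside(pc):
            return False                   # a thread is currently in the function
        if any(inside(addr) for addr in return_addresses):
            return False                   # a thread will return into the function
    return True
```

Only when the check returns `True` would it be safe to remove the function version occupying that address range.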
Conclusion
• Dynamic Kernel Updates
• Dynamic code instrumentation
• Commodity operating system
• Function cloning for adaptive execution
• Multiple function versions can run concurrently
• Safe updates of non-quiescent subsystems
• Scheduler, kernel threads
• Demonstrated updates
• Adaptive memory paging for efficient gang-scheduling
• Unobtrusive fine-grained cycle stealing
• Public security fixes
• Small memory footprint, 1-8% overhead
Questions?