
On-the-Fly Kernel Updates for High-Performance Computing Clusters




  1. On-the-Fly Kernel Updates for High-Performance Computing Clusters
     Kristis Makris <kristis.makris@asu.edu>, Arizona State University
     Kyung Dong Ryu <kryu@us.ibm.com>, IBM T.J. Watson Research Center
     DynAMOS -- SMTPS '06

  2. Motivation
     • Updating the kernel in HPC clusters requires downtime
       • Revenue loss in pay-per-use, time-sharing clusters
       • Disruption of long-lived parallel tasks
       • Process migration may not be possible
     • Postponing updates has its price
       • Unpatched kernel security holes
       • Missed kernel specialization opportunities
         • Adaptive selection of kernel subsystem to use; virtualization cannot help
     • Parallel computing needs
       • Safe, unobtrusive updates (no system restart)
       • Temporary, reversible specialization of some nodes
       • Portable updating system (i386 + PowerPC)

  3. Solution: Dynamic Kernel Updates
     • Approaches
       • Adaptable OS
         • Specially crafted, like K42, VINO, Synthetix
         • Require OS and application restructuring
       • Dynamic code instrumentation
         • Zero kernel source modification (KernInst, GILK)
         • Basic block code interposition
     • Currently limited
       • No procedure replacement
       • No autonomous kernel adaptability
       • No safe, complete subsystem update guarantees

  4. Dynamic Updates Classification
     • Updating changes in
       • Userspace requirements
         • Security fix breaks existing applications that rely on defect
       • Kernel external requirements
         • Function signature changes (API changes)
       • Kernel internal requirements
         • Global variables used by a function group (e.g., enlarge copy buffer used in pipefs)
     • Updating needs
       • State tracking
         • Enlarge copy buffer only for 2 processes
         • Must adaptively enlarge the buffer and use newer functions
       • State transfer
         • Copy data from old buffer to new
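
The two updating needs on this slide, state tracking and state transfer, can be illustrated with a small user-space model. This is only a sketch under stated assumptions: `PipeBuffer`, `transfer_state`, `pipe_write`, the `tracked` set, and both buffer sizes are hypothetical names and values chosen for illustration, not the pipefs implementation or the DynAMOS interface.

```python
OLD_SIZE = 4096    # illustrative size of the original copy buffer
NEW_SIZE = 16384   # illustrative size of the enlarged copy buffer


class PipeBuffer:
    """Toy copy buffer: fixed capacity, append-only."""

    def __init__(self, capacity: int) -> None:
        self.data = bytearray(capacity)
        self.length = 0

    def write(self, payload: bytes) -> int:
        """Append as much of payload as fits; return the bytes accepted."""
        n = min(len(payload), len(self.data) - self.length)
        self.data[self.length:self.length + n] = payload[:n]
        self.length += n
        return n


def transfer_state(old: PipeBuffer, new_capacity: int) -> PipeBuffer:
    """State transfer: copy the live contents of the old buffer into a new one."""
    if new_capacity < old.length:
        raise ValueError("new buffer cannot hold the existing data")
    new = PipeBuffer(new_capacity)
    new.data[:old.length] = old.data[:old.length]
    new.length = old.length
    return new


# State tracking: only processes listed here get the enlarged buffer.
tracked = set()


def pipe_write(pid: int, pipes: dict, payload: bytes) -> int:
    """Newer write path: enlarge lazily, and only for tracked processes."""
    buf = pipes[pid]
    if pid in tracked and len(buf.data) < NEW_SIZE:
        buf = pipes[pid] = transfer_state(buf, NEW_SIZE)
    return buf.write(payload)
```

In this model an untracked process keeps the original capacity, while a tracked one is migrated on its next write with its earlier data preserved.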

  5. Dynamic Update Types
     • No safe update point
       • Update read-only global variable (e.g., maximum number of open files)
       • Add new variable used only by a single function
     • Safe update point
       • Update uid of an inode (guarded by a semaphore)
       • Add new variable used by function group (must update atomically)
     • Non-quiescent resources
       • Update kernel scheduler to use different policy
     • Datatype updates
       • Update functions that use the old datatype to use the new datatype
       • Maintain shadow data structure that holds only new fields, and update only functions that use the new fields
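
The shadow data structure idea from the last bullet can be sketched as a side table that holds only the new fields, keyed by the existing object. Everything named here is an assumption made for illustration: the `Inode` class, the `access_count` field, the `ShadowTable` helper, and keying by inode number are hypothetical and are not taken from the slides.

```python
from dataclasses import dataclass


@dataclass
class Inode:
    """Old datatype: its layout is left untouched by the update."""
    ino: int
    uid: int


class ShadowTable:
    """Side table holding only the fields added by the update."""

    def __init__(self, defaults: dict) -> None:
        self._defaults = dict(defaults)
        self._rows = {}

    def fields(self, key: int) -> dict:
        """Return the shadow fields for key, creating them on first use."""
        return self._rows.setdefault(key, dict(self._defaults))

    def drop(self, key: int) -> None:
        """Forget the shadow fields, e.g. when the old object is freed."""
        self._rows.pop(key, None)


shadow = ShadowTable({"access_count": 0})  # hypothetical new field


def inode_owner(inode: Inode) -> int:
    """Old function: never touches the new field, so it is not updated."""
    return inode.uid


def inode_touch_v2(inode: Inode) -> int:
    """Updated function: reads and writes the new field via the shadow table."""
    row = shadow.fields(inode.ino)
    row["access_count"] += 1
    return row["access_count"]
```

The design point is that functions which never use the new fields keep running unmodified against the old layout.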

  6. DynAMOS System Architecture
     • Prepare updates to be applied
     • Coordinate safe activation/removal
     • Currently implemented for i386 uniprocessor Linux kernels 2.2-2.6
     • Distribute updates to cluster nodes
     • Process updating requests from control station with framework

  7. Execution Flow Redirection (1)
     • Install trampoline in beginning of original function
       • Disable local processor interrupts
       • Flush I-cache
       • Use an indirect jump (jmp *)
       • Don’t modify page permissions
     • Divert execution to a redirection handler
       • Original function can no longer be directly executed
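
To make the trampoline step concrete, here is a minimal user-space model of overwriting the start of a function image with an indirect jump. The slide only says "jmp *"; the specific i386 encoding used below (`FF 25` followed by a 32-bit address of the slot that holds the handler address) is an assumption about which form is meant, and the function names and addresses are hypothetical. Disabling interrupts and flushing the I-cache cannot be modelled here and appear only as comments.

```python
import struct

JMP_INDIRECT_ABS = b"\xff\x25"  # i386: jmp *disp32 (opcode FF /4, ModRM 0x25)
TRAMPOLINE_LEN = 6              # 2 opcode bytes + 4 address bytes


def build_trampoline(slot_addr: int) -> bytes:
    """Encode `jmp *slot_addr`: jump to the address stored at slot_addr."""
    return JMP_INDIRECT_ABS + struct.pack("<I", slot_addr)


def install_trampoline(image: bytearray, slot_addr: int) -> bytes:
    """Overwrite the first bytes of the image; return the bytes displaced.

    A real kernel would disable local interrupts around the write and
    flush the instruction cache afterwards; this model does neither.
    """
    if len(image) < TRAMPOLINE_LEN:
        raise ValueError("function too short to hold a trampoline")
    saved = bytes(image[:TRAMPOLINE_LEN])
    image[:TRAMPOLINE_LEN] = build_trampoline(slot_addr)
    return saved


def remove_trampoline(image: bytearray, saved: bytes) -> None:
    """Reverse the update by restoring the displaced bytes."""
    image[:len(saved)] = saved
```

Because the jump goes through a memory slot, the destination can later be changed by rewriting the slot rather than the function image again.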

  8. Execution Flow Redirection (2)
     • Create separate redirection handler for each function
       • Customize from template
     • Clone and relocate original function image
     • Choose between active function versions with adaptation handler
       • Can execute different versions of functions in different process contexts
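
The relationship between the redirection handler and the adaptation handler can be modelled as a per-function dispatcher that asks a policy callback which version to run for the current process context. This is a sketch, not the DynAMOS implementation: the `RedirectionHandler` class, the `readahead_pages` functions, their return values, and the `GANG_PIDS` set are all hypothetical.

```python
from typing import Any, Callable, Dict


class RedirectionHandler:
    """Per-function dispatcher reached through the trampoline (model only)."""

    def __init__(self, original: Callable[..., Any]) -> None:
        # The relocated clone of the original stays available as a version.
        self.versions: Dict[str, Callable[..., Any]] = {"original": original}
        # Default adaptation handler: always run the original.
        self.adaptation_handler: Callable[[dict], str] = lambda ctx: "original"

    def add_version(self, name: str, fn: Callable[..., Any]) -> None:
        self.versions[name] = fn

    def __call__(self, ctx: dict, *args: Any) -> Any:
        # The adaptation handler picks a version for this process context.
        version = self.adaptation_handler(ctx)
        return self.versions[version](*args)


def readahead_pages() -> int:
    """Hypothetical original policy."""
    return 8


def readahead_pages_v2() -> int:
    """Hypothetical updated policy."""
    return 32


GANG_PIDS = {42}  # hypothetical set of processes that should see the update

handler = RedirectionHandler(readahead_pages)
handler.add_version("v2", readahead_pages_v2)
handler.adaptation_handler = (
    lambda ctx: "v2" if ctx["pid"] in GANG_PIDS else "original"
)
```

Calling `handler({"pid": 42})` runs the updated version while any other pid still runs the original, which mirrors the slide's point that different versions can execute in different process contexts.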

  9. Function Cloning Benefits
     • Unaltered stack when newer function is executed
       • No processor state saved on stack
     • Autonomous kernel determination of update timeliness
       • Using adaptation handler
     • Function-level instrumented applications
       • Basic blocks can be bypassed
     • Modifications developed in functions with original source language

  10. Function Relocation
      • Adjust relative branch instructions
      • Replace ret instructions with jumps back to redirection handler
      • Safely detect
        • Backward branches: point to code overwritten by trampoline
        • Outbound branches: jump to code outside function image
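
Adjusting relative branches when a function image is cloned to a new address comes down to simple displacement arithmetic, sketched below for the two i386 rel32 forms `call` (`E8`) and `jmp` (`E9`). Several things here are assumptions: the `relocate` function is hypothetical, the offsets of branch instructions are supplied by the caller (a real system would need a disassembler to find them), and replacing `ret` with jumps back to the redirection handler is not modelled.

```python
import struct

REL32_OPCODES = {0xE8: "call", 0xE9: "jmp"}  # opcode + signed 32-bit displacement
INSN_LEN = 5                                  # 1 opcode byte + 4 displacement bytes


def relocate(image: bytes, old_base: int, new_base: int, branch_offsets):
    """Clone image for new_base; return (clone, report).

    report lists (offset, kind, absolute_target) for each branch, where
    kind is "internal" (target inside the image) or "outbound".
    """
    clone = bytearray(image)
    report = []
    for off in branch_offsets:
        if image[off] not in REL32_OPCODES:
            raise ValueError(f"no rel32 branch at offset {off}")
        (disp,) = struct.unpack_from("<i", image, off + 1)
        # rel32 displacements are relative to the end of the instruction.
        target = old_base + off + INSN_LEN + disp
        if old_base <= target < old_base + len(image):
            # Target moves together with the clone: displacement unchanged.
            kind = "internal"
        else:
            # Target stays where it was: recompute from the new location.
            kind = "outbound"
            new_disp = target - (new_base + off + INSN_LEN)
            struct.pack_into("<i", clone, off + 1, new_disp)
        report.append((off, kind, target))
    return bytes(clone), report
```

For an outbound branch the new displacement equals the old one plus `old_base - new_base`, which is why only branches that leave the image need patching.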

  11. Applying Security Patches
      • Openwall hardening changes for Linux 2.4.22
      • Permission check when writing in named pipes
        • Updated open_namei function
        • No safe update point needed
      • Permission check when following a symbolic link
        • Updated open_namei, vfs_link functions
        • Had to update inline function do_follow_link, used by link_path_walk
        • No need to update functions atomically
      • Confirmed unauthorized access was denied

  12. Applying Unobtrusive Fine-grained Cycle Stealing
      • Linger-Longer system for Linux 2.2.19
        • Introduces a guest priority
        • New scheduling policy
      • Updated schedule function in 4-node cluster
      • Confirmed guest processes were not consuming CPU time when host processes were active
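
The behaviour confirmed in the last bullet, that guest processes get no CPU time while host processes are active, can be expressed as a very small selection rule. The sketch below is a deliberate simplification and not the Linger-Longer algorithm or the Linux 2.2.19 `schedule` function: the `Task` class, its fields, and `schedule_v2` are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Task:
    name: str
    priority: int
    guest: bool = False     # True for tasks running at guest priority
    runnable: bool = True


def schedule_v2(tasks: List[Task]) -> Optional[Task]:
    """Pick the next task: guests run only when no host task is runnable."""
    runnable = [t for t in tasks if t.runnable]
    hosts = [t for t in runnable if not t.guest]
    # Fall back to the guests only when the host list is empty.
    pool = hosts or runnable
    if not pool:
        return None
    return max(pool, key=lambda t: t.priority)
```

Under this rule a guest task is never chosen over a runnable host task, regardless of its numeric priority.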

  13. Applying Adaptive Memory Paging for Efficient Gang-Scheduling
      • Various adaptive memory paging policies for Linux 2.2.19 for 4-node cluster
      • Required modifications in kswapd, swap_out, rw_swap_page, swapin_readahead, filemap_nopage
      • kswapd is a kernel thread that never exits
        • Beginning of function is never called again
        • Thread sleeps by calling interruptible_sleep_on
        • Insert interruptible_sleep_on_v2 forcing kswapd to exit
        • Start kswapd_v2
      • Confirmed job switching time was reduced

  14. Overhead
      • 29 KB memory footprint
      • < 1 ns trampoline installation time
      • 20 ns redirection handler overhead
      • 2.3 s update on 2 GHz P4 (adaptive paging)
      • 1-8% overhead (due to indirect jump)

  15. Related Work
      • Cluster Management Systems
        • Do not support dynamic kernel updates
      • K42
        • Specially designed with hot-swappable capabilities
        • Requires quiescence for all updates
      • Hicks’ system
        • User-level software updates; requires recompilation
      • KernInst, GILK, ATOM, EEL
        • Do not facilitate adaptive execution
        • Do not replace complete subsystems

  16. On-going and Additional Work
      • Ensure safe update reversal
        • Confirm quiescence in stack and program counter
      • Update datatypes
        • Maintain shadow data structure of new fields
      • Apply EPCKPT kernel-assisted checkpointing
      • Adaptively enlarge pipefs buffer
      • Apply Superpages support
      • Apply Scalable TCP for high-speed WANs
      • Automatically produce updates given a patch file
      • Apply MOSIX
      • Upgrade Linux kernel
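
Confirming quiescence in stack and program counter, as mentioned under safe update reversal, amounts to checking that no thread is currently executing inside a function version and that no thread will return into it. The model below assumes each thread's stack has already been unwound into a list of return addresses; the `Thread` class, `in_range`, and `is_quiescent` are hypothetical names, and the slides do not describe how the check is actually performed.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Thread:
    pc: int                                   # current program counter
    return_addresses: List[int] = field(default_factory=list)


def in_range(addr: int, start: int, size: int) -> bool:
    """True if addr lies inside the image [start, start + size)."""
    return start <= addr < start + size


def is_quiescent(threads: List[Thread], start: int, size: int) -> bool:
    """True if no thread executes in, or will return into, the image."""
    for t in threads:
        if in_range(t.pc, start, size):
            return False  # a thread is running inside the function right now
        if any(in_range(ra, start, size) for ra in t.return_addresses):
            return False  # a thread will come back into the function later
    return True
```

Only when this check passes for a function version would it be safe, in this model, to remove that version's code.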

  17. Conclusion
      • Dynamic Kernel Updates
        • Dynamic code instrumentation
        • Commodity operating system
      • Function cloning for adaptive execution
        • Multiple function versions can run concurrently
      • Safe updates of non-quiescent subsystems
        • Scheduler, kernel threads
      • Demonstrated updates
        • Adaptive memory paging for efficient gang-scheduling
        • Unobtrusive fine-grain cycle stealing
        • Public security fixes
      • Small memory footprint, 1-8% overhead

  18. Questions?
