Helios: Heterogeneous Multiprocessing with Satellite Kernels. • 김세욱 (ksu1024@gmail.com)
Heterogeneous Multiprocessing. • Homogeneous vs. heterogeneous. • Multicore processor designs are increasingly used to improve system throughput. • Homogeneous: all cores have the same characteristics. • Heterogeneous: cores have different characteristics. • Helios. • An operating system for heterogeneous platforms. • Simplifies the task of writing, deploying, and tuning applications. • Built around satellite kernels.
Motivation (1/2). • Hardware used to be homogeneous. • Equivalent functionality, instruction throughput, and cache coherence across cores.
Motivation (2/2). • Hardware is now heterogeneous. • “Islands of computation.” • Problems. • Operating systems ignore programmable devices. • Device drivers become ever more complicated. • Programming models are fragmented. • Standard OS abstractions are missing.
Helios. • An OS designed to simplify the task of writing, deploying, and tuning applications for heterogeneous platforms. • Built by modifying Singularity (EuroSys 2006). • Supports satellite kernels, remote message passing, and affinity.
Design Goals. • Export a single OS abstraction across different programmable devices. • Require minimal hardware primitives and resources, and avoid unnecessary remote communication. • Manage only a very small number of private resources (memory, CPU cycles). • Transparent IPC. • Via the namespace and remote message passing (RMP). • Simplify deployment and tuning. • Constraints: moving processes, cache coherence, preference for a particular device. • Specify affinity toward other processes. • Encapsulate disparate architectures. • Two-phase compilation strategy.
Implementation. • Based on the Singularity OS. • Satellite kernels, remote message passing, affinity. • XScale programmable I/O card. • 1.2 GHz ARM processor, Gigabit Ethernet, 256 MB memory. • Satellite kernel identical to the x86 kernel. • NUMA architecture. • 2-socket, dual-core AMD machine. • 2 GHz CPUs, 1 GB RAM per domain. • A satellite kernel on each NUMA domain.
Implementation – Satellite Kernel. • Microkernel. • Scheduler, memory management, namespace manager. • Efficiently manages local resources. • Apps are developed for a single system-call interface. (Figure: current OS vs. Helios.)
Implementation – Namespace. • Namespace. • Applications register in the namespace as services. • The namespace is used to connect IPC channels. • The coordinator kernel manages the namespace. • Satellite kernels register in the namespace.
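The namespace slide can be made concrete with a small sketch. It assumes a registry held by the coordinator kernel that maps each service path to the kernel hosting it, and that connecting through the namespace picks local or remote message passing depending on where the two ends live. `Namespace`, `register`, `connect`, and paths such as `/services/netstack` are hypothetical names for illustration, not Helios's actual API.

```python
from dataclasses import dataclass, field


@dataclass
class Namespace:
    """Toy namespace held by the coordinator kernel (hypothetical API).

    Maps a service path to the name of the kernel that registered it.
    """
    entries: dict[str, str] = field(default_factory=dict)

    def register(self, path: str, kernel: str) -> None:
        # Applications (as services) and satellite kernels both publish here.
        if path in self.entries:
            raise ValueError(f"{path} already registered")
        self.entries[path] = kernel

    def connect(self, path: str, client_kernel: str) -> str:
        # Look up the service, then choose the channel type:
        # same kernel -> local message passing, otherwise remote.
        server_kernel = self.entries[path]
        return "LMP" if server_kernel == client_kernel else "RMP"


ns = Namespace()
ns.register("/kernels/xscale0", "xscale0")      # a satellite kernel registers
ns.register("/services/netstack", "xscale0")    # a service on that kernel
print(ns.connect("/services/netstack", "x86-coordinator"))  # RMP
print(ns.connect("/services/netstack", "xscale0"))          # LMP
```

The client only names the service; which kernel hosts it, and therefore which channel type is used, is resolved by the namespace.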
Implementation – Message Passing. • Message-passing channels. • Local Message Passing (LMP): fast, zero-copy message passing. • Remote Message Passing (RMP): transparently marshals messages. • “Shadow endpoints” manage data copying and signaling. • Unmodified apps work across multiple kernels.
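To illustrate why unmodified applications work across kernels, here is a minimal sketch assuming both channel types expose the same `send`/`receive` interface. `Endpoint`, `ShadowEndpoint`, and `app_send` are hypothetical names; `pickle` stands in for Helios's marshalling, and enqueueing on the remote endpoint stands in for the actual DMA or memcpy transfer plus signal.

```python
import pickle
from collections import deque


class Endpoint:
    """Receiving end of a channel; applications only call send/receive."""

    def __init__(self) -> None:
        self.inbox: deque = deque()

    def send(self, msg) -> None:
        # Local message passing: hand over the reference itself (zero-copy).
        self.inbox.append(msg)

    def receive(self):
        return self.inbox.popleft()


class ShadowEndpoint(Endpoint):
    """Sender-side stand-in for an endpoint living on another kernel (sketch)."""

    def __init__(self, remote: Endpoint) -> None:
        super().__init__()
        self.remote = remote

    def send(self, msg) -> None:
        wire = pickle.dumps(msg)               # marshal into bytes
        self.remote.send(pickle.loads(wire))   # "copy" across, then signal remote


def app_send(ep: Endpoint, payload: dict) -> None:
    # Unmodified application code: identical for local and remote channels.
    ep.send(payload)


local = Endpoint()
remote = Endpoint()
shadow = ShadowEndpoint(remote)

msg = {"op": "read", "block": 7}
app_send(local, msg)
app_send(shadow, msg)
print(local.receive() is msg)             # True: same object, no copy
received = remote.receive()
print(received == msg, received is msg)   # True False: equal contents, copied
```

The application calls `send` the same way in both cases; only the endpoint it was handed differs, which is the property the shadow-endpoint design relies on.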
Implementation – 2-phase compilation. • All apps are first compiled to CIL (Common Intermediate Language). • The bytecode of the .NET platform. • At install time, apps are compiled down to the available ISAs. • Namespace, affinity. • Benefits. • Developers need not target a specific platform's ISA. • A single method can have several versions.
Implementation – Affinity (1/2). • Expressed in XML. • Automatically generated when a process is compiled into CIL. • Easily edited by developers, administrators, or users. • Positive affinity. • Tight coupling. • Platform preference. • Negative affinity. • Non-interference. • Isolation from other processes. • Avoiding resource contention. • Self-reference affinity. • Running multiple copies of itself on different devices or NUMA domains. • Scale-out performance.
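The manifest below is an invented example of how an affinity description might look, parsed with Python's standard `xml.etree.ElementTree`. The element and attribute names (`process`, `affinity`, `target`, `value`) and the `platform:` and `self` conventions are assumptions for illustration, not the actual Helios schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical affinity manifest; names and conventions are invented.
MANIFEST = """
<process name="netstack">
  <affinity target="platform:xscale" value="1"/>      <!-- platform preference -->
  <affinity target="/services/nic-driver" value="1"/> <!-- tight coupling -->
  <affinity target="/services/sqlserver" value="-1"/> <!-- non-interference -->
  <affinity target="self" value="-1"/>                <!-- spread own copies -->
</process>
"""


def parse_affinities(xml_text: str) -> dict[str, dict[str, int]]:
    """Group affinity entries into platform, positive, negative, and self buckets."""
    root = ET.fromstring(xml_text.strip())
    groups: dict[str, dict[str, int]] = {
        "platform": {}, "positive": {}, "negative": {}, "self": {},
    }
    for entry in root.iter("affinity"):
        target = entry.get("target")
        value = int(entry.get("value"))
        if target == "self":
            # Self-reference: a negative value asks for copies on different kernels.
            groups["self"][target] = value
        elif target.startswith("platform:"):
            groups["platform"][target.removeprefix("platform:")] = value
        elif value > 0:
            groups["positive"][target] = value
        elif value < 0:
            groups["negative"][target] = value
    return groups


print(parse_affinities(MANIFEST))
```

Because the manifest is plain XML, a developer, administrator, or user could change placement preferences without recompiling the process.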
Implementation – Affinity (2/2). • A priority-based algorithm narrows the set of candidate kernels by, in order: • Platform affinities • Other positive affinities • Negative affinities • CPU utilization • Attempts to balance simplicity and optimality.
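A sketch of the priority-based narrowing, assuming each criterion keeps only the best-scoring candidates and CPU utilization breaks the final tie. The `Kernel` type, the scoring rules, and the kernel and service names are hypothetical; the real Helios algorithm may weigh these criteria differently.

```python
from dataclasses import dataclass


@dataclass
class Kernel:
    name: str
    platform: str            # e.g. "x86" or "xscale"
    services: set[str]       # services currently running on this kernel
    cpu_utilization: float   # 0.0 .. 1.0


def narrow(candidates, score):
    """Keep only the candidates with the best (highest) score."""
    best = max(score(k) for k in candidates)
    return [k for k in candidates if score(k) == best]


def choose_kernel(kernels, platform_pref, positive, negative):
    """Pick a kernel by applying the criteria in priority order (sketch)."""
    candidates = list(kernels)
    # 1. Platform affinity: prefer kernels on the requested platform, if any.
    if platform_pref is not None:
        candidates = narrow(candidates, lambda k: k.platform == platform_pref)
    # 2. Other positive affinities: prefer kernels hosting the most wanted services.
    candidates = narrow(candidates, lambda k: len(k.services & positive))
    # 3. Negative affinities: prefer kernels hosting the fewest unwanted services.
    candidates = narrow(candidates, lambda k: -len(k.services & negative))
    # 4. CPU utilization: break remaining ties with the least-loaded kernel.
    return min(candidates, key=lambda k: k.cpu_utilization)


kernels = [
    Kernel("x86-numa0", "x86", {"sqlserver"}, 0.30),
    Kernel("x86-numa1", "x86", set(), 0.70),
    Kernel("xscale0", "xscale", {"nic-driver"}, 0.10),
]
print(choose_kernel(kernels, "x86", positive=set(), negative={"sqlserver"}).name)  # x86-numa1
print(choose_kernel(kernels, None, positive={"nic-driver"}, negative=set()).name)  # xscale0
```

Narrowing never empties the candidate set, because at least one kernel always attains the best score; a soft preference therefore can never leave a process unplaceable. Note that in the first call the busier `x86-numa1` wins, since negative affinity outranks CPU utilization.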
Implementation – System design. (Figure: processes with affinity values, compiled from intermediate language, communicate via local and remote message passing between satellite kernels on a NUMA domain and an I/O device.)
Evaluation platform. • XScale programmable I/O card. • x86 NUMA architecture.
Evaluation. • Offloading Singularity applications. • Applications are offloaded with very little effort.
Evaluation. • Message-passing microbenchmark. • LMP: no copy. • RMP XScale: ADMA (asynchronous direct memory access) controller. • RMP NUMA: memcpy.
Evaluation. • Offload benchmark. • Netstack subsystem offload. • Because the x86 processor took fewer interrupts, it ran more efficiently and performance improved.
Evaluation. • Scheduling NUMA benchmark. • Without satellite kernels: not NUMA-aware, a single kernel, 16 threads. • With satellite kernels: NUMA-aware, two kernels, 8 threads per kernel. • 68% faster than the single kernel.
Evaluation. • Mail server NUMA benchmark. • Satellite kernels improve performance by 39%. • Satellite kernels ensure that processes always use local memory when accessing kernel code and data structures.
Conclusion. • Simplifies application development, deployment, and tuning. • 4 techniques to manage heterogeneity. • Satellite kernels. • Remote message passing. • Affinity. • 2-phase compilation. • Offloading applications with zero code changes. • Helios code to be released soon.
Backup – Zero copy. (Figure: message passing without zero-copy vs. with zero-copy.)