FastOS: Dynamic Adaptivity in Support of Extreme Scale (DAiSES)
Pat Teller, UTEP
FastOS PI Meeting
Outline
• Collaborators
• Overview
• Progress
• Plans
Partners
• University of Texas at El Paso, Department of Computer Science: Patricia J. Teller (pteller@cs.utep.edu)
• University of Wisconsin-Madison, Computer Sciences Department: Barton P. Miller (bart@cs.wisc.edu)
• International Business Machines, Inc., Linux Technologies Center: Bill Buros (wmb@us.ibm.com)
• Lawrence Berkeley National Laboratory: Leonid Oliker (LOliker@lbl.gov) (new partner)
• U.S. Department of Energy, Office of Science: Fred Johnson (fjohnson@er.doe.gov)
Teams
• UTEP team (Pat Teller)
  • Rodrigo Romero, Ph.D. (post-doc)
  • Seetharami Seelam, Ph.D. candidate in CS
  • Luis Ortiz, Ph.D. candidate in CS
  • Jayaraman Suresh, Master's candidate in CS
  • Brenda Prieto, Master's candidate in ECE
  • Nidia Pedregon, undergraduate in CS
  • Alejandro Castaneda, undergraduate in CS
• U. Wisconsin-Madison team (Bart Miller)
  • Michael Brim, Ph.D. candidate
  • Igor Grobman, Ph.D. candidate
Outline
• Collaborators
• Overview
  • Goal
  • Challenges
  • Deliverables
  • Methodology
• Progress & Direction
Goal: Enhanced Performance
• From generalized to customized resource management
• From fixed to dynamically adaptable OS/runtime services
Challenges
Determining:
• What to adapt
• When to adapt
• How to adapt
• How to measure the effects of adaptation
Deliverables
• Mechanisms to dynamically sense, analyze, and adjust to:
  • performance metrics
  • fluctuating workload situations
  • overall system environment conditions
• Linux prototypes and experiments that demonstrate dynamic self-tuning/provisioning in HPC environments
• Methodology for general-purpose OS adaptation
Methodology
• Characterize workload resource-usage patterns and identify potentially profitable adaptation targets (off line)
• Determine/re-determine feasible adaptation ranges (off line / run time)
• Define/adapt the metrics and heuristics that trigger adaptation
• Generate/adapt monitoring, triggering, and adaptation code, and attach it to the OS (via KernInst)
• Monitor application execution, assessing performance gain and triggering adaptation as necessary
A minimal sketch of the run-time portion of this loop follows below.
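To make the run-time half of the methodology concrete, here is a minimal Python sketch of the sense-analyze-adjust cycle. The helpers collect_metrics, crossed_threshold, and apply_adaptation are hypothetical placeholders standing in for the monitoring, triggering, and adaptation code that the methodology would generate and attach to the kernel (e.g., via KernInst); the metric and threshold are illustrative only.

```python
import time

def collect_metrics():
    # Placeholder: in the real methodology this data would come from kernel
    # instrumentation (e.g., KernInst probes or IOstat counters).
    return {"queued_io_requests": 0}

def crossed_threshold(metrics, threshold=8):
    # Trigger heuristic: adapt only when the observed metric crosses a
    # feasibility threshold determined off line (value is illustrative).
    return metrics["queued_io_requests"] < threshold

def apply_adaptation(metrics):
    # Placeholder: adjust the targeted OS/runtime service (e.g., switch the
    # I/O scheduler or change a process's scheduling class).
    pass

def adaptation_loop(poll_interval_s=1.0):
    # Sense -> analyze -> adjust, repeated for the lifetime of the workload.
    while True:
        metrics = collect_metrics()
        if crossed_threshold(metrics):
            apply_adaptation(metrics)
        time.sleep(poll_interval_s)
```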
KernInst Architecture (IBM eServer pSeries 690)
• Software stack: instrumentation tool client, KernInst daemon, KernInst device, KernInst API, Linux kernel
• KernInst provides dynamic monitoring, instrumentation, and adaptation of the kernel
Outline
• Collaborators
• Overview
• Progress & Direction
  • Tools
  • Infrastructure
  • Collaboration
  • Research
Tools Progress
• KernInst
  • POWER4 port for Linux 2.4 and IA32 port for Linux 2.6
  • Modifications/enhancements for DAiSES research
  • POWER4 port for Linux 2.6 underway
• IOstat (coarse statistics): POWER4, Linux 2.6
• Investigating complementary use of OProfile and Kprobes: POWER4, Linux 2.6
KernInst
• Intel IA32 port for Linux 2.4 and 2.6, for Pentium 3 and Pentium 4 processors
• IBM POWER4 port for Linux 2.4
  • Supports stand-alone kernels and kernels that run under the Hypervisor virtual machine layer
  • The Hypervisor layer is not transparent and requires explicit support
  • Little public documentation on this layer is available
Infrastructure Progress
(Funded by IBM SUR grants, a UTEP Star Award, and UTEP PUF funds)
• Development of experimental platforms at UTEP
  • IBM eServer pSeries 690 (16 processors, 32 GB, 2 TB)
    • Linux 2.4/2.6 partition for KernInst development
    • Linux 2.4 partition for DAiSES research
    • Linux 2.6 partition for DAiSES research
  • DS4300 RAID for DAiSES I/O-related research (1 TB)
  • Xeon workstations: Linux 2.4 and 2.6
  • IBM eServer p590 (24 processors, 64 GB, 2 TB)
  • IBM eServer p550 (4 processors)
• Establishment of the DAiSES Lab at UTEP
Build/Strengthen Collaborations - 1
• April 14-16, 2004: Seetharami Seelam, UTEP, attended Paradyn/Condor Week
• October 24, 2004: Barney Maccabe, UNM, visited UTEP to meet with the DAiSES team and give a talk on his team's FastOS research
• November 10, 2004: Pat Teller, UTEP, participated in the FastOS Birds-of-a-Feather meeting at SC2004, the first public presentation of the DAiSES project
• November 2004: Rodrigo Romero, Seetharami Seelam, and Pat Teller, UTEP, and Michael Brim, Igor Grobman, and Bart Miller, UW-Madison, promoted the DAiSES project at two SC2004 research exhibits, one shared by UTEP, UNM, New Mexico State University, and New Mexico Institute of Technology, and the other hosted by UW-Madison
Build/Strengthen Collaborations - 2
• February 25, 2005: Luis Ortiz, Rodrigo Romero, Seetharami Seelam, and Pat Teller, UTEP, attended a half-day meeting at IBM-Austin with approximately nine members of the Linux Technologies team
• March 3, 2005: Rodrigo Romero, Seetharami Seelam, and Pat Teller, UTEP, attended an all-day meeting at UNM with Barney Maccabe, Patrick Bridges, Kurt Ferreira, and Edgar Leon, UNM/CS; Orran Krieger, IBM; Ron Brightwell and Rolf Riesen, SNL; and Rod Oldehoeft, LANL/ACL
• March 20-24, 2005: Igor Grobman, UW-Madison, visited UTEP and led a workshop on using KernInst and Kperfmon to implement adaptations, in particular in the process scheduler and I/O scheduler; Bill Buros, IBM-Austin, a senior member of the Linux Technologies team, attended for three days; the workshop resulted in modifications/enhancements to KernInst
Build/Strengthen Collaborations - 3
• May 4, 2005: after the IBM Petaflops Tools Strategy Workshop at the IBM T.J. Watson Research Center, Bart Miller, UW-Madison, and Seetharami Seelam (awarded $500 to attend the workshop) and Pat Teller, UTEP, met with Evelyn Duesterwald and Robert Wisniewski about possible collaborations
• Weekly telecons with the IBM-Austin team
• Telecons and Access Grid meetings with the UW-Madison team
• Shared e-notebook to be launched shortly
• Shared data repository with a search tool to be launched shortly
Current Research Thrusts
• Dynamic code optimization
• Low-hanging fruit (i.e., opportunistic targets of adaptation)
• Identification of adaptation targets via self-propelled instrumentation
• Other directions
Dynamic Code Optimization
• Investigation of dynamic code optimization strategies
  • [Tamches and Miller] used dynamic reorganization of basic-block layout in parts of the SPARC Solaris kernel to improve performance via I-cache miss reduction
  • Discussion in the research community asks whether such optimizations are
    • workload dependent and need to be done dynamically, or
    • mostly independent of workload and can be done statically
  • Goal: provide conclusive evidence either way
Low-hanging Fruit
• I/O scheduling: extend work of IBM-Austin
• I/O scheduler parameter selection via neural networks: extend work of IBM-Austin that uses genetic algorithms
• Process scheduling: extend published observations and address daemon control
• Virtual memory management: extend dissertation work
• Page size: extend work of the IBM T.J. Watson Research Center
I/O Scheduling (in progress)
• What to adapt: the I/O scheduler
  • Dynamic selection of the "appropriate" I/O scheduler for the observed "system state"
• When to adapt
  • On a change in "system state" (currently identified via IOstat)
  • When below a threshold related to the number of queued I/O requests
• How to adapt (see the sketch below)
  • Linux 2.6 provides the capability to switch schedulers
  • I/O schedulers have been characterized w.r.t. "best" performance for different "system states" [Pratt and Heger]
• How to measure the effects of adaptation
  • Execution time and throughput in MB/s (for now)
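As a concrete illustration of the "how to adapt" mechanism, the sketch below switches the per-device I/O scheduler through the sysfs interface that Linux 2.6 exposes. The device name (sda), the target scheduler, and the queued-request threshold are illustrative assumptions, not project parameters; the queued-request count would come from IOstat-style observation.

```python
# Minimal sketch: run-time I/O scheduler switching on Linux 2.6 via sysfs.
SCHED_FILE = "/sys/block/sda/queue/scheduler"   # assumed device: sda

def current_scheduler():
    # The active scheduler is shown in brackets,
    # e.g. "noop [anticipatory] deadline cfq".
    with open(SCHED_FILE) as f:
        names = f.read().split()
    return next(n.strip("[]") for n in names if n.startswith("["))

def switch_scheduler(name):
    # Writing the scheduler name selects it at run time (requires root).
    with open(SCHED_FILE, "w") as f:
        f.write(name)

def adapt(queued_requests, threshold=8):
    # Switch only when the queue is short enough that draining it is cheap;
    # the target scheduler here is fixed for illustration.
    if queued_requests < threshold and current_scheduler() != "deadline":
        switch_scheduler("deadline")
```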
I/O Scheduling By-products - 1
"Enhancements to Linux I/O Scheduling," to appear in Proceedings of the Linux Symposium, Ottawa, Canada, July 2005 (S. Seelam, R. Romero, P. Teller, and W. Buros)
• Reviews previous IBM-Austin work that characterizes the workloads best served by each of the four Linux 2.6 I/O schedulers (selectable at boot time or run time)
• Presents cases where the Anticipatory Scheduler (AS) results in process starvation
I/O Scheduling By-products - 1 (cont'd.)
• Presents and demonstrates the performance of UTEP's Cooperative Anticipatory Scheduler (CAS)
  • Extends anticipation to "cooperative" processes that collectively issue synchronous requests to a close set of disk blocks
  • Compares CAS performance to that of the current four schedulers
  • Shows an order-of-magnitude performance improvement in cases where AS performs poorly
I/O Scheduling By-products - 2
In progress: demonstration of heuristically guided dynamic selection of Linux 2.6 I/O schedulers (target: FAST)
• A first step toward making I/O scheduling fully autonomic (Ph.D. dissertation topic: Seetharami Seelam)
• Selection is based on an observed system-behavior metric, specifically the system (workload) I/O request size
I/O Scheduling By-products - 2 (cont'd.)
• Using a priori measurements of disk throughput under the various schedulers and request sizes to generate a function that, at run time, given the current average request size, returns the scheduler that gives the best measured throughput for the specified disk (see the sketch below)
• Identifying the adaptation interval, i.e., when it is not too expensive to switch schedulers, based on the number of queued I/O requests
• Future work: the UW-Madison team will use KernInst to effect the adaptation
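The first bullet describes a run-time selection function built from off-line measurements; below is a minimal sketch of that idea. The throughput numbers and request sizes are made up for illustration and stand in for the a priori measurements taken on the target disk.

```python
# Illustrative table: measured throughput (MB/s) per scheduler and request
# size, as it might be produced by off-line profiling of one disk.
THROUGHPUT_MBPS = {
    # request size (KB): {scheduler: measured throughput}
    4:   {"noop": 10, "deadline": 14, "anticipatory": 12, "cfq": 11},
    64:  {"noop": 30, "deadline": 28, "anticipatory": 35, "cfq": 29},
    256: {"noop": 52, "deadline": 45, "anticipatory": 48, "cfq": 44},
}

def best_scheduler(avg_request_kb):
    # Pick the measured size closest to the observed average request size,
    # then return the scheduler with the highest measured throughput there.
    size = min(THROUGHPUT_MBPS, key=lambda s: abs(s - avg_request_kb))
    table = THROUGHPUT_MBPS[size]
    return max(table, key=table.get)

print(best_scheduler(48))   # "anticipatory" with these illustrative numbers
```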
[Figure slides, charts not reproduced: Overhead of Draining I/O Queue; Microbenchmark Synchronous Reads (with two zoomed views); Microbenchmark Writes; Linux Compilation Disk Accesses]
Dynamic Adaptation
• Uses IOstat information, which is after the fact
• Read/write prediction uses a two-bit saturating counter
• For reads: sizes 1 KB - 32 KB use Deadline; sizes > 32 KB use AS
• For writes: sizes 1 KB - 64 KB use AS; sizes > 64 KB use Noop
A sketch of this prediction and selection logic follows below.
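A small sketch of the prediction and selection logic on this slide, assuming the two-bit saturating counter works like the standard branch-predictor-style counter; this is an illustrative reconstruction using the thresholds above, not the project's actual implementation.

```python
class TwoBitPredictor:
    # Two-bit saturating counter: states 0-1 predict "write", 2-3 predict "read".
    def __init__(self):
        self.state = 2

    def predict_read(self):
        return self.state >= 2

    def update(self, was_read):
        # Saturating increment on reads, decrement on writes.
        self.state = min(3, self.state + 1) if was_read else max(0, self.state - 1)

def choose_scheduler(predict_read, avg_size_kb):
    # Size thresholds taken from the slide above.
    if predict_read:
        return "deadline" if avg_size_kb <= 32 else "anticipatory"
    return "anticipatory" if avg_size_kb <= 64 else "noop"

# Example: after a run of reads, a 16 KB average request size selects Deadline.
p = TwoBitPredictor()
for _ in range(3):
    p.update(was_read=True)
print(choose_scheduler(p.predict_read(), 16))   # -> "deadline"
```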
[Figure slides, charts not reproduced: Linux Compilation Read/Write Access; Performance Comparison of Linux Compilation (gmake -j 16; Xeon 2.78 GHz; source on RAID-0 with 4 IDE drives), preliminary results, inconclusive]
Lessons Learned
• 80 scheduler switches were observed
• Switching is not prohibitively expensive ("not costing our life")
• Switching has to be done at a coarser granularity
  • The "smaller" picture (focus: requests) is captured
  • The "bigger" picture (focus: workloads) still needs to be captured
  • Prediction does not have to be per request
  • Example of the desired prediction: a database workload followed by streaming reads should select Deadline/CFQ followed by AS
Applications/Benchmarks Used for I/O Scheduling Research - 1
• Flexible File System Benchmark (FFSB)
  • Used to generate a profile-driven I/O workload with the characteristic I/O access pattern of a given type of server, e.g., web, file, email, and metadata servers
• Microbenchmarks
  • Streaming writes and chunk reads
  • Streaming reads and chunk reads
  • Chunk reads
Applications/Benchmarks Used for I/O Scheduling Research - 2
• MADbench: I/O intensive [Borrill, et al.]
  • Based on MADCAP, an application for estimating the power spectrum of cosmic microwave background radiation
  • Retains the computational intensity, operational complexity, and system requirements of MADCAP and implements its three main processing steps:
    • Builds signal correlation derivative matrices; requires neither read operations nor communication
    • Builds a subset of a data correlation matrix and then inverts it; does not require writes
    • Reads a subset of the signal correlation matrices built in the first step and performs matrix multiplication against the inverted matrix obtained in step two; does not require writes
Process Scheduling (stalled)
Several references indicate that the Linux 2.4 scheduling policy has an adverse effect on non-real-time, non-interactive applications, a description that fits high-performance applications
Process Scheduling
• What to adapt: the process scheduler
  • Dynamic selection of the "appropriate" process scheduler for the observed "system state"
• When to adapt
  • On a change in "system state"
  • When below a threshold related to time spent in the scheduler or the length of the "runnable" queue
• How to adapt
  • Change the process type to real-time, i.e., change its scheduling policy to round-robin
  • Assumes a priori knowledge of process IDs
• How to measure the effects of adaptation
  • Execution time
Adaptation: Timeshared to Real-time (Round-robin) Scheduling
• Round-robin scheduling with a fixed quantum and a fixed per-process priority for the lifetime of the application
• Trigger: too much time spent in the scheduler
• Action: adapt to round-robin scheduling (a minimal sketch of this adaptation follows below)
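A minimal sketch of the adaptation step itself, assuming the application's process IDs are known a priori as noted on the previous slide. It uses the standard Linux sched_setscheduler interface (exposed in Python as os.sched_setscheduler, available on Linux in Python 3.3+) to move processes to real-time round-robin with a fixed priority; the priority value and the example PIDs are illustrative, and root privileges are required.

```python
import os

def promote_to_round_robin(pids, priority=10):
    # Fixed real-time priority for the lifetime of the application
    # (value is illustrative).
    param = os.sched_param(priority)
    for pid in pids:
        # Switch from the default timeshared policy to real-time round robin.
        os.sched_setscheduler(pid, os.SCHED_RR, param)

# Usage (hypothetical PIDs of the application's processes):
# promote_to_round_robin([1234, 1235])
```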
Learning-based, Heuristic-driven Dynamic Adaptation (longer term)
• I/O scheduler parameter selection
  • IBM-Austin: genetic algorithms
  • UTEP: investigate potentially lower-cost hybrid techniques that combine neural networks, genetic algorithms, and fuzzy logic (possible dissertation topic: Luis Ortiz, UTEP)
  • Master's thesis: a neural-network approach to selecting parameters of the Linux 2.6 Anticipatory Scheduler [Moilanen]
Near-term Research Directions - 1
• Adaptation-target investigation via self-propelled instrumentation to obtain function-level traces from applications and the kernel [Mirgorodskiy and Miller]
• Possible applications:
  • LBMHD (plasma physics), PARATEC (materials science), CACTUS (astrophysics), and GTC (magnetic fusion), which are able to fully utilize the performance of machines comparable to the Earth Simulator and Cray X1 [Oliker, et al.]
  • Sweep3D (being used for the daemon-control investigation)
  • SPECjAppServer2004 (WebSphere, DB2)
Near-term Research Directions - 2
• Study of applications to determine the profitability of
  • Dynamically adapting page size [Cascaval, et al.]
  • Daemon control (started) [D. Bailey and Hoisie, et al.]
  • Virtual memory management (just starting)
  • Dynamically adapting the time quantum to control, e.g., process cache contention
• k-factor analysis to develop mathematical models that guide I/O parameter-set selection
References - 1
• Bailey, D., private communications, 2005.
• Borrill, J., J. Carter, L. Oliker, D. Skinner, and R. Biswas, "Integrated Performance Monitoring of a Cosmology Application on Leading HEC Platforms," Proceedings of the 2005 International Conference on Parallel Processing (ICPP-05), June 2005.
• Cascaval, C., E. Duesterwald, P. Sweeney, and R. Wisniewski, "Multiple Page Size Modeling and Optimization," Proceedings of the Fourteenth International Conference on Parallel Architectures and Compilation Techniques (PACT 2005), September 2005.
• Flexible File System Benchmark (FFSB), http://www.sourceforge.net/projects/ffsb