Enabling Interactive Multi-Experiment Computational Studies through a Permeable Runtime System-Application Interface Ph.D. Thesis Proposal Siu Yau Jun 2006
Outline • Background & Motivation • Computational Studies, Related Work • Computational Systems • Thesis Statement, Observations • Experimental System (SimX) • Components & Preliminary results • Further Work • Project Timeline
Background & Motivation • Computer simulation has become an integral part of the scientific method • Widespread use of Computational Studies in science and engineering • Much work has been done to speed up individual simulations • But Computational Studies can involve 100s, 1000s, or 10000s of simulations…
Computational Studies • Simulation code, run multiple times • Parameter Space: possible inputs • Observation Space: measured metrics • Objective: identify points in the Parameter Space that meet the objectives in the Observation Space • Goal: interactive computational studies
Bridge Design • Simulation: Thin-plate FEM bridge deformation • Parameter: Column placement • Observation: Construction cost, Max deflection • Objectives: Pareto-optimal points, i.e., points not dominated by any other point (no other point is at least as good in both observation metrics and strictly better in one)
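The Pareto-optimality objective can be made concrete with a small filter. This is a minimal sketch, assuming both observation metrics (e.g., construction cost and maximum deflection) are to be minimised; the function names and the sample observations are hypothetical, not part of SimX or data from the bridge study.

```python
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, ...]

def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if observation a is no worse than b in every metric and strictly
    better in at least one (all metrics are minimised)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def pareto_front(results: Dict[Point, Tuple[float, float]]) -> List[Point]:
    """Return the parameter points whose observations no other point dominates."""
    return [p for p, obs in results.items()
            if not any(dominates(other, obs) for other in results.values())]

# Hypothetical (cost, max deflection) observations keyed by a 1-D column position.
results = {
    (0.2,): (10.0, 5.0),
    (0.4,): (12.0, 3.0),
    (0.6,): (15.0, 3.5),  # dominated by (0.4,): costlier and deflects more
    (0.8,): (20.0, 1.0),
}
print(pareto_front(results))  # → [(0.2,), (0.4,), (0.8,)]
```

The all-pairs check is quadratic in the number of experiments, which is acceptable for a sketch; a point never dominates itself because domination requires strict improvement in one metric.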
Defibrillator Design • Simulation: Torso conductivity FEM • Parameter: Electrode placement, shock strength • Observation: Damage, Uniformity, Effectiveness • Objectives: Pareto-optimal points
Response Graph • Simulation: Transient analysis of a 2D frame with a time-periodic boundary condition • Parameter: Frequency • Observation: Amplitude • Objective: Frequency-Response graph
Related Work • Parameter Sweep schedulers • Condor: Distributed batch system • Globus: Toolkit deployed on grid resources for automatic resource discovery and workflow scheduling • Virtual Instrument: Interactively steerable parameter sweep application (interaction limited to parameter space selection) • Nimrod/O: Parameterised simulations on distributed computers with guided search
Related Work • Computational Steering Infrastructures • Falcon: On-line monitoring and steering of large-scale parallel programs • CUMULVS: Infrastructure for steering, monitoring, and checkpointing • SCIRun: Interactive computational steering environment using dataflow model • CSE: Steering and monitoring of computational processes on remote computers
Computational Systems • Existing systems treat each individual experiment as a "black box" • Individual experiment runs are treated separately (as in parameter sweeps) • Interactivity is limited to individual experiments (as in steering infrastructures) • Thesis question: Without the "black box" restriction, can one steer 100s or 1000s of experiments simultaneously?
Thesis Statement • By exposing application-level domain knowledge to the runtime system through a more permeable application-system interface, one can bring multi-experiment computational studies to interactive speed and enable steering of entire computational studies. Notes: This is achieved via application-specific prioritization of resources, quick adaptation to changing objectives, and flexible tradeoff of result accuracy and precision versus resource needs.
Strategies • Reducing runtime of experiments • Checkpointing: use the results of previously run experiment(s) to jump-start a current one • Precision tradeoff: less precise experiments in return for higher throughput • Reducing runtime of a group of experiments • Active Sampling: issuing fewer experiments • Experiment Scheduling: improving resource utilisation
Checkpointing • Example of checkpoint reuse: use a previous result as the first guess for an iterative method, e.g., use the result of a nearby parameter point as the initial guess to an iterative solver • Reduces runtime of individual experiments • [Figure: bridge with Support 1 and Support 2; parameter points A, B, C] • For a given experiment, which checkpoint(s) should be used, if any? Consider overhead Vs potential gains
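To make checkpoint reuse concrete, here is a minimal sketch under stated assumptions: a toy tridiagonal system whose diagonal depends on a scalar parameter p stands in for the FEM solve, plain Jacobi iteration stands in for the iterative solver, and `jacobi_solve` / `nearest_checkpoint` are hypothetical names. Seeding the solve at p = 1.05 with the stored result for p = 1.0 should need fewer iterations than starting from zero.

```python
from typing import Dict, List, Optional, Tuple

def jacobi_solve(p: float, b: List[float], x0: List[float],
                 tol: float = 1e-10, max_iter: int = 10_000) -> Tuple[List[float], int]:
    """Jacobi iteration for the toy system (2 + p) x_i - x_{i-1} - x_{i+1} = b_i.
    Returns the approximate solution and the number of iterations used."""
    n = len(b)
    d = 2.0 + p
    x = list(x0)
    for it in range(1, max_iter + 1):
        # The comprehension reads only the previous iterate x, as Jacobi requires.
        new = [(b[i] + (x[i - 1] if i > 0 else 0.0) + (x[i + 1] if i < n - 1 else 0.0)) / d
               for i in range(n)]
        step = max(abs(u - v) for u, v in zip(new, x))
        x = new
        if step < tol:
            return x, it
    return x, max_iter

def nearest_checkpoint(store: Dict[float, List[float]], p: float) -> Optional[List[float]]:
    """Pick the stored solution whose parameter is closest to p (None if the store is empty)."""
    if not store:
        return None
    return store[min(store, key=lambda q: abs(q - p))]

n = 50
b = [1.0] * n
store: Dict[float, List[float]] = {}

# Run one experiment and keep its result as a checkpoint.
store[1.0], _ = jacobi_solve(1.0, b, [0.0] * n)

# A nearby experiment: cold start vs. warm start from the closest checkpoint.
cold, cold_iters = jacobi_solve(1.05, b, [0.0] * n)
warm, warm_iters = jacobi_solve(1.05, b, nearest_checkpoint(store, 1.05) or [0.0] * n)
print(cold_iters, warm_iters)
```

The sketch ignores the cost of fetching the checkpoint; whether the saved iterations outweigh that cost is the overhead-versus-gain question on the slide.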
Precision Tradeoff • Use a lower-resolution mesh • Use a higher residual tolerance in iterative solvers • Reduces runtime of individual experiments • When is the trade-off permissible? • The user only needs a fuzzy result • The system decides (e.g., for a point far from the Pareto boundary)
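The residual-tolerance lever and the two permissibility conditions can be sketched together. This assumes a toy Gauss-Seidel solve in place of the real solver, a precomputed `dist_to_front` (distance of the point from the current Pareto boundary, computed elsewhere), and hypothetical tolerance and margin values; none of these names come from SimX.

```python
from typing import List, Tuple

FINE_TOL = 1e-10   # hypothetical tolerance for points near the Pareto boundary
COARSE_TOL = 1e-3  # hypothetical tolerance when a fuzzy result is acceptable

def gauss_seidel(p: float, n: int, tol: float, max_iter: int = 10_000) -> Tuple[List[float], int]:
    """Gauss-Seidel on the toy system (2 + p) x_i - x_{i-1} - x_{i+1} = 1,
    stopping when the max-norm residual drops below tol."""
    d = 2.0 + p
    x = [0.0] * n
    for it in range(1, max_iter + 1):
        for i in range(n):  # in-place sweep: uses already-updated neighbours
            left = x[i - 1] if i > 0 else 0.0
            right = x[i + 1] if i < n - 1 else 0.0
            x[i] = (1.0 + left + right) / d
        residual = max(
            abs(1.0 - d * x[i] + (x[i - 1] if i > 0 else 0.0) + (x[i + 1] if i < n - 1 else 0.0))
            for i in range(n))
        if residual < tol:
            return x, it
    return x, max_iter

def choose_tolerance(dist_to_front: float, user_accepts_fuzzy: bool, margin: float = 0.1) -> float:
    """Loosen the tolerance if the user only needs a fuzzy result or the
    point lies far (more than margin) from the current Pareto boundary."""
    if user_accepts_fuzzy or dist_to_front > margin:
        return COARSE_TOL
    return FINE_TOL

_, fine_iters = gauss_seidel(1.0, 50, choose_tolerance(0.01, False))
_, coarse_iters = gauss_seidel(1.0, 50, choose_tolerance(0.5, False))
print(fine_iters, coarse_iters)
```

A coarser mesh would be the other lever named on the slide; it is not modelled here.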
Active Sampling • Running only a subset of experiments in the parameter space • Reduces the number of experiments needed • The appropriate strategy depends on the study, e.g.: • Sweep, Active, Guided search for the Pareto Frontier • Graph plotting for Frequency response
Experiment Scheduling • Effective scheduling depends on accurate time-to-completion estimates • Use dynamically collected data to improve time-to-completion estimates, e.g., number of iterations needed • Incorporate the improved estimates into the experiments' execution schedule
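One way to combine dynamically collected data with scheduling is sketched below. Two assumptions are made: a new experiment's runtime is estimated as the mean runtime of its nearest completed neighbours in parameter space, and jobs are placed with the classic longest-processing-time-first (LPT) heuristic, swapped in here for illustration rather than taken from SimX. All names and numbers are hypothetical.

```python
import heapq
from typing import Dict, List, Sequence, Tuple

def estimate_runtime(param: float, history: Sequence[Tuple[float, float]], k: int = 2) -> float:
    """Mean runtime of the k completed experiments nearest to param."""
    if not history:
        raise ValueError("no completed experiments to learn from")
    nearest = sorted(history, key=lambda h: abs(h[0] - param))[:k]
    return sum(t for _, t in nearest) / len(nearest)

def lpt_schedule(estimates: Dict[str, float], n_workers: int) -> Tuple[Dict[int, List[str]], float]:
    """Longest-processing-time-first: give each job, longest first,
    to the currently least-loaded worker. Returns the plan and the makespan."""
    heap = [(0.0, w) for w in range(n_workers)]  # (accumulated load, worker id)
    heapq.heapify(heap)
    assignment: Dict[int, List[str]] = {w: [] for w in range(n_workers)}
    for job, t in sorted(estimates.items(), key=lambda kv: (-kv[1], kv[0])):
        load, w = heapq.heappop(heap)
        assignment[w].append(job)
        heapq.heappush(heap, (load + t, w))
    makespan = max(load for load, _ in heap)
    return assignment, makespan

# Hypothetical (parameter, measured runtime in seconds) pairs collected so far.
history = [(0.0, 1.0), (1.0, 2.0), (2.0, 4.0)]
print(estimate_runtime(0.9, history))  # mean of the runs at 1.0 and 0.0

plan, makespan = lpt_schedule({"j1": 4.0, "j2": 3.0, "j3": 3.0, "j4": 2.0}, n_workers=2)
print(plan, makespan)
```

As estimates are refreshed with each completed run, the schedule can simply be recomputed for the jobs still queued.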
Experimental System • Parallel system software for Interactive Multi-Experiment Computational Studies (SIMECS, or SimX) • Performs the Bridge study, the Defibrillator study, and the Frequency-response plotting study • Used to evaluate the checkpointing, sampling, and resource allocation techniques
SimX Architecture • Prototype implemented and presented at IPDPS 06 • Designed for small-to-medium-sized (< 1000 processors) clusters • One front-end, multiple "worker" processes • Each worker process can perform any of the tasks in the system: running the shared object layer, running simulations, allocating resources, etc.
SimX: Bridge Experiment • 4 stages • Each experiment requires O(100 ms) • 24 KB per checkpoint • Stage 1: 1735 sims • Stage 2: 4950 sims • Stage 3: 18632 sims • Stage 4: 75351 sims
SimX: Preliminary Results • Experiments on the bridge study • Partitioned Object Space Server: scales to 128 workers • Active Sampler reduces the number of experiments per stage to 1727, 2584, 4243, 4526 (from 1735, 4950, 18632, 75351) • Using checkpoints: 10x improvement in run time, each experiment reduced to O(10 ms)
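The per-stage effect of the Active Sampler can be read off the counts above. The short computation uses only those numbers (no new data) and shows that the fraction of experiments still issued falls from about 99.5% at Stage 1 to about 6.0% at Stage 4, i.e., the savings grow as the refinement deepens.

```python
# Experiments per stage, as listed on the Bridge Experiment and Preliminary Results slides.
full_sweep = [1735, 4950, 18632, 75351]
active = [1727, 2584, 4243, 4526]

# Fraction of the full-sweep experiments that the Active Sampler still issues.
ratios = [a / f for f, a in zip(full_sweep, active)]
for stage, r in enumerate(ratios, start=1):
    print(f"Stage {stage}: {r:.1%}")
```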
SimX: Preliminary Results • [Figure: Runtime of Bridge study on SimX] • Time-to-level = wall-clock time required by a given configuration to refine the Pareto frontier 4 times • All measurements taken from 128-processor runs
Active Sampler • Explore different types of samplers for different applications • 4 types of samplers: • Grid sampler, Active sampler, Guided Search, Graph Plotting • Thesis Goal: Identify strengths and limitations of each sampler type
Sweep Sampling • Issues experiments on a progressively finer grid over the Parameter Space • [Figure: Initial Grid, First Refinement, 2nd Refinement]
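A sweep's cost per refinement level can be seen from a small sketch. It assumes a 2-D parameter space normalised to the unit square and a grid whose spacing halves at each level (the functions are hypothetical); exact rationals are used so that points from different levels compare equal.

```python
from fractions import Fraction
from typing import Set, Tuple

Point = Tuple[Fraction, Fraction]

def grid_level(k: int) -> Set[Point]:
    """All points of the level-k grid on the unit square, spacing 1/2**k."""
    m = 2 ** k
    return {(Fraction(i, m), Fraction(j, m)) for i in range(m + 1) for j in range(m + 1)}

def new_points(k: int) -> Set[Point]:
    """Experiments a sweep issues at level k: the level-k grid minus points already run."""
    return grid_level(k) - (grid_level(k - 1) if k > 0 else set())

for k in range(4):
    print(k, len(new_points(k)))
```

With (2^k + 1)^2 points at level k, each refinement issues roughly three times as many new experiments as the whole previous grid, which is why a plain sweep quickly dominates the study's cost.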
Active Sampling • Only refines near the Pareto Frontier • [Figure: Initial Grid, First Refinement, 2nd Refinement; 1st-, 2nd-, and 3rd-level results]
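The difference from a sweep is that refinement is restricted to the neighbourhood of the current front. A minimal sketch, assuming a unit-square parameter space, both metrics minimised, and a hypothetical toy observation function standing in for the simulation:

```python
from fractions import Fraction
from typing import Dict, Set, Tuple

Point = Tuple[Fraction, Fraction]
Obs = Tuple[Fraction, Fraction]

def dominates(a: Obs, b: Obs) -> bool:
    """a dominates b if it is no worse in every (minimised) metric and strictly better in one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def pareto_points(results: Dict[Point, Obs]) -> Set[Point]:
    return {p for p, o in results.items() if not any(dominates(q, o) for q in results.values())}

def refine_near_front(results: Dict[Point, Obs], h: Fraction) -> Set[Point]:
    """Propose next-level points (spacing h/2) only around current Pareto-optimal points."""
    half = h / 2
    offsets = (-half, Fraction(0), half)
    candidates = set()
    for x, y in pareto_points(results):
        for dx in offsets:
            for dy in offsets:
                q = (x + dx, y + dy)
                if 0 <= q[0] <= 1 and 0 <= q[1] <= 1 and q not in results:
                    candidates.add(q)
    return candidates

def toy_observe(p: Point) -> Obs:
    """Hypothetical stand-in for a simulation: cost grows with x,
    deflection shrinks as x -> 1 and grows with y."""
    x, y = p
    return (x, (x - 1) ** 2 + y ** 2)

h = Fraction(1, 2)
grid = [(Fraction(i, 2), Fraction(j, 2)) for i in range(3) for j in range(3)]
results = {p: toy_observe(p) for p in grid}
candidates = refine_near_front(results, h)
print(len(candidates), "active candidates vs 16 new points in a full refinement")
```

For this toy model only the three y = 0 points are Pareto-optimal, so the next level proposes 7 experiments instead of the 16 a full sweep would add.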
Guided Search • Start from random points, follow the better-performing neighbors
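Guided search can be sketched as Pareto hill-climbing on a discrete parameter grid. The sketch assumes "better-performing" means "dominates in all observation metrics", uses 4-connected grid neighbours, and a hypothetical toy objective; it is one possible reading of the slide, not SimX's implementation.

```python
import random
from typing import Callable, Dict, Iterator, List, Tuple

Point = Tuple[int, int]
Obs = Tuple[float, float]

def dominates(a: Obs, b: Obs) -> bool:
    """a dominates b if it is no worse in every (minimised) metric and strictly better in one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def neighbours(p: Point, n: int) -> Iterator[Point]:
    """4-connected neighbours of p on an n x n parameter grid."""
    x, y = p
    for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        qx, qy = x + dx, y + dy
        if 0 <= qx < n and 0 <= qy < n:
            yield (qx, qy)

def guided_search(observe: Callable[[Point], Obs], n: int, starts: int,
                  rng: random.Random) -> Tuple[List[Point], Dict[Point, Obs]]:
    """From random start points, repeatedly move to a neighbour that dominates the current point."""
    results: Dict[Point, Obs] = {}  # every experiment run so far (memoised)

    def run(p: Point) -> Obs:
        if p not in results:
            results[p] = observe(p)
        return results[p]

    finals = []
    for _ in range(starts):
        p = (rng.randrange(n), rng.randrange(n))
        while True:
            better = [q for q in neighbours(p, n) if dominates(run(q), run(p))]
            if not better:
                break  # no neighbour dominates p: a locally Pareto-optimal point
            p = better[0]  # simplest choice; a real policy might rank the dominating neighbours
        finals.append(p)
    return finals, results

def toy_observe(p: Point) -> Obs:
    """Hypothetical objective pair (cost, deflection), both minimised."""
    x, y = p
    return (float(x), float((x - 5) ** 2 + y ** 2))

finals, results = guided_search(toy_observe, n=10, starts=5, rng=random.Random(0))
print(finals, f"{len(results)} of 100 grid points evaluated")
```

Each move strictly dominates the previous point, so a walk can never revisit a point and always terminates; coverage of the whole front depends on how many random starts are used.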
Graph Plotting • Uniform sampling first, then fill in details
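"Uniform sampling first, then fill in details" can be sketched as recursive midpoint refinement. The sketch assumes a damped single-degree-of-freedom amplitude curve as a hypothetical stand-in for the frame simulation, and refines an interval whenever its midpoint deviates from the straight line between its endpoints by more than a tolerance.

```python
import math
from typing import Callable, List, Tuple

def amplitude(f: float, zeta: float = 0.05) -> float:
    """Hypothetical stand-in: steady-state amplitude ratio of a damped
    single-DOF oscillator at normalised frequency f (peaks near f = 1)."""
    return 1.0 / math.sqrt((1.0 - f * f) ** 2 + (2.0 * zeta * f) ** 2)

def plot_samples(func: Callable[[float], float], lo: float, hi: float,
                 n_initial: int, tol: float, max_depth: int = 12) -> List[Tuple[float, float]]:
    """Uniform samples first, then midpoint refinement where the curve is not yet resolved."""
    xs = [lo + (hi - lo) * i / (n_initial - 1) for i in range(n_initial)]
    samples = {x: func(x) for x in xs}
    stack = [(a, b, 0) for a, b in zip(xs, xs[1:])]
    while stack:
        a, b, depth = stack.pop()
        m = 0.5 * (a + b)
        samples[m] = func(m)  # keep every evaluated point: it cost an experiment
        linear_guess = 0.5 * (samples[a] + samples[b])
        if depth < max_depth and abs(samples[m] - linear_guess) > tol:
            stack.append((a, m, depth + 1))
            stack.append((m, b, depth + 1))
    return sorted(samples.items())

points = plot_samples(amplitude, 0.0, 3.0, n_initial=13, tol=0.01)
near = sum(1 for f, _ in points if 0.8 <= f <= 1.2)
far = sum(1 for f, _ in points if 2.0 <= f <= 2.4)
print(len(points), near, far)
```

Samples concentrate around the resonance peak, where the curve bends sharply, while the smooth tail is left coarse. A midpoint test can still miss a feature narrower than the initial spacing, so the initial uniform pass should be fine enough to hit every peak of interest.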
Resource Allocator • Maps jobs on the task list onto processors: • how many processors • which processors • which checkpoints to use • Resources considered: network bandwidth and processor time • Thesis goal: Quantify the benefits of application-level knowledge in resource scheduling Vs a black-box approach
Resource Allocator • Application-level knowledge used in: • Runtime estimation • dynamically collected data • performance model supplied by the user • combination of empirical and analytical • Network bandwidth estimation: • LogP model, managed by the shared object layer
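A LogP-based estimate can feed directly into the "which checkpoint, if any" decision. In this sketch the LogP parameter values are hypothetical; shipping a 24 KB checkpoint as a train of fixed-size packets is an assumption (LogP itself models short messages and ignores its capacity limit here); the O(100 ms) and O(10 ms) runtimes are the slide's bridge figures. Class and function names are illustrative only.

```python
import math
from dataclasses import dataclass

@dataclass
class LogP:
    """LogP parameters, all times in microseconds."""
    L: float  # network latency
    o: float  # per-message send/receive processor overhead
    g: float  # minimum gap between consecutive message injections
    P: int    # number of processors (unused by point-to-point estimates)

    def point_to_point(self, m: int) -> float:
        """Time for one processor to deliver m short messages to another: the sender
        injects one every max(g, o); the last then costs o (send) + L + o (receive)."""
        return (m - 1) * max(self.g, self.o) + 2 * self.o + self.L

def checkpoint_transfer_time(model: LogP, checkpoint_bytes: int, packet_bytes: int) -> float:
    """Assumes a checkpoint is shipped as ceil(size / packet) short LogP messages."""
    return model.point_to_point(math.ceil(checkpoint_bytes / packet_bytes))

def worth_fetching(transfer: float, cold_runtime: float, warm_runtime: float) -> bool:
    """Fetch a remote checkpoint only if it saves more compute than it costs to move."""
    return transfer < cold_runtime - warm_runtime

net = LogP(L=5.0, o=1.0, g=2.0, P=128)          # hypothetical values in microseconds
t = checkpoint_transfer_time(net, 24 * 1024, 1024)  # 24 KB checkpoint, 1 KB packets
print(t, worth_fetching(t, cold_runtime=100_000.0, warm_runtime=10_000.0))
```

For long messages the LogGP extension, which adds a per-byte gap, would be the more natural model than packetising by hand.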
Resource Allocator • Scheduling Heuristics • Greedy • Fair share • Locality • Current status: FIFO; always uses the single closest checkpoint
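The current FIFO policy and one possible reading of the Locality heuristic can be contrasted in a few lines. Assumptions: checkpoints are keyed by their parameter point, "closest" means Euclidean distance in parameter space, and each worker's cache contents are known; all names are hypothetical.

```python
import math
from typing import Dict, List, Optional, Sequence, Set, Tuple

Param = Tuple[float, ...]
Assignment = Tuple[Param, str, Optional[str]]  # (job parameters, worker, checkpoint id)

def nearest_checkpoint(job: Param, checkpoints: Dict[Param, str]) -> Optional[str]:
    """The single checkpoint whose parameter point is closest (Euclidean) to the job's."""
    if not checkpoints:
        return None
    return checkpoints[min(checkpoints, key=lambda c: math.dist(c, job))]

def fifo_allocate(jobs: Sequence[Param], workers: Sequence[str],
                  checkpoints: Dict[Param, str]) -> List[Assignment]:
    """FIFO: the i-th queued job goes to the i-th free worker with its nearest checkpoint."""
    return [(job, w, nearest_checkpoint(job, checkpoints)) for job, w in zip(jobs, workers)]

def locality_allocate(jobs: Sequence[Param], workers: Sequence[str],
                      checkpoints: Dict[Param, str],
                      cached: Dict[str, Set[str]]) -> List[Assignment]:
    """Locality variant: prefer a free worker that already caches the job's checkpoint."""
    free = list(workers)
    plan = []
    for job in jobs:
        if not free:
            break
        ckpt = nearest_checkpoint(job, checkpoints)
        local = [w for w in free if ckpt in cached.get(w, set())]
        worker = local[0] if local else free[0]
        free.remove(worker)
        plan.append((job, worker, ckpt))
    return plan

checkpoints = {(0.0, 0.0): "c0", (1.0, 1.0): "c1"}
jobs = [(0.1, 0.2), (0.9, 0.8), (0.4, 0.4)]
print(fifo_allocate(jobs, ["w0", "w1"], checkpoints))
print(locality_allocate(jobs, ["w0", "w1"], checkpoints, cached={"w1": {"c0"}}))
```

Here the locality variant routes the first job to `w1`, which already holds `c0`, avoiding a checkpoint transfer that FIFO would incur.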
Shared Object Layer • Implementation options • Server/Client Vs Integrated • Single Server Vs Partitioned Server • Caching Vs no Caching • Client Caching Vs Cooperative Caching • Thesis goal: Investigate implementation options and how they affect the overall performance of computational studies
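Two of the implementation options (partitioned server, client caching) can be sketched together. The class names are hypothetical and not SISOL's interface: objects are spread across server partitions by key hash, and each client keeps a small LRU cache so repeated checkpoint reads avoid a remote fetch. Cooperative caching, where a miss could be served from a peer's cache, is not modelled.

```python
from collections import OrderedDict
from typing import Dict, Hashable, List, Optional

class PartitionedStore:
    """Objects spread over n server partitions by key hash."""
    def __init__(self, n_partitions: int):
        self.partitions: List[Dict[Hashable, str]] = [dict() for _ in range(n_partitions)]
        self.remote_reads = 0  # counts reads that reach a server partition

    def _part(self, key: Hashable) -> Dict[Hashable, str]:
        return self.partitions[hash(key) % len(self.partitions)]

    def put(self, key: Hashable, value: str) -> None:
        self._part(key)[key] = value

    def get(self, key: Hashable) -> Optional[str]:
        self.remote_reads += 1
        return self._part(key).get(key)

class CachingClient:
    """Client-side LRU cache in front of the partitioned store."""
    def __init__(self, store: PartitionedStore, capacity: int):
        self.store = store
        self.capacity = capacity
        self.cache: "OrderedDict[Hashable, str]" = OrderedDict()
        self.hits = 0

    def get(self, key: Hashable) -> Optional[str]:
        if key in self.cache:
            self.cache.move_to_end(key)  # mark as most recently used
            self.hits += 1
            return self.cache[key]
        value = self.store.get(key)
        if value is not None:
            self.cache[key] = value
            if len(self.cache) > self.capacity:
                self.cache.popitem(last=False)  # evict the least recently used entry
        return value

store = PartitionedStore(4)
for i in range(10):
    store.put((i,), f"ckpt{i}")
client = CachingClient(store, capacity=2)
for key in [(1,), (1,), (2,), (3,), (1,)]:
    client.get(key)
print(store.remote_reads, client.hits)
```

With a capacity of two, the third distinct key evicts `(1,)`, so its second access after that goes back to the server; capacity and eviction policy are exactly the knobs the caching options on the slide would vary.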
Project Timeline • End of summer 06 • Expand the application base of the SimX system: • Port SimX into SCIRun (as components) to run the Defibrillator application • Evaluate the performance of SimX on the Defibrillator application • Add error control and a space-partitioned SISOL server as needed
Project Timeline • End of fall 06 • Evaluate SimX capability to handle multiple parallel simulations • Implement parallel bridge and defibrillator simulations • Multi-server SISOL implementation • End of spring 07 • Implement and evaluate infrastructure for performance prediction
Project Timeline • End of fall 07 • Evaluate sampler policies (Grid Vs Active Vs Guided search Vs graph plotter) on bridge, defibrillator, and graph plotting applications • Evaluate performance of resource allocation heuristics (FIFO Vs Greedy Vs Fairshare Vs Locality)