
Ph.D. Thesis Proposal Siu Yau Jun 2006


Presentation Transcript


  1. Enabling Interactive Multi-Experiment Computational Studies through a Permeable Runtime System-Application Interface Ph.D. Thesis Proposal Siu Yau Jun 2006

  2. Outline • Background & Motivation • Computational Studies, Related Work • Computational Systems • Thesis Statement, Observations • Experimental System (SimX) • Components & Preliminary results • Further Work • Project Timeline

  3. Background & Motivation • Computer Simulation has become an integral part of the scientific method • Widespread use of Computational Studies in science and engineering • Much work done to speed up individual simulations • But Computational Studies can involve 100s, 1000s, or 10000s of simulations...

  4. Computational Studies • Computational Studies • Simulation code, run multiple times • Parameter Space: possible inputs • Observation Space: measured metrics • Objectives • Identify points in the Parameter Space that meet objectives in the Observation Space • Goal: Interactive computational studies

  5. Bridge Design • Simulation: Thin plate FEM bridge deformation • Parameter: Column placement • Observation: Construction cost, Max deflection • Objectives: Pareto-optimal points, i.e., points for which no other point is better in both observation metrics
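
The Pareto-optimality objective above can be illustrated with a minimal sketch. It assumes both metrics (construction cost and max deflection) are minimised and uses the standard dominance test; the design values are made up for illustration and are not taken from the study.

```python
from typing import List, Tuple

# (construction cost, max deflection); both assumed to be minimised.
Point = Tuple[float, float]

def dominates(a: Point, b: Point) -> bool:
    """True if a is no worse than b in every metric and strictly better in one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def pareto_front(points: List[Point]) -> List[Point]:
    """Keep the points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]

# Hypothetical observation-space results for five column placements.
designs = [(10.0, 5.0), (12.0, 3.0), (11.0, 6.0), (15.0, 3.5), (9.0, 8.0)]
front = pareto_front(designs)
```

This brute-force check is quadratic in the number of points, which is adequate for a sketch but would likely need replacing for studies with tens of thousands of experiments.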

  6. Defibrillator Design • Simulation: Torso conductivity FEM • Parameter: Electrode placement, shock strength • Observation: Damage, Uniformity, Effectiveness • Objectives: Pareto-optimal points

  7. Response Graph • Simulation: Transient analysis on 2D frame with time-periodic Boundary Condition • Parameter: Frequency • Observation: Amplitude • Objective: Frequency-Response graph

  8. Related Work • Parameter Sweep schedulers • Condor: Distributed batch system • Globus: Toolkit deployed on grid resources for automatic resource discovery and workflow scheduling • Virtual Instrument: Interactively steerable parameter sweep application (interaction limited to parameter space selection) • Nimrod/O: Parameterised Simulations on distributed computers with guided search

  9. Related Work • Computational Steering Infrastructures • Falcon: On-line monitoring and steering of large-scale parallel programs • CUMULVS: Infrastructure for steering, monitoring, and checkpointing • SCIRun: Interactive computational steering environment using dataflow model • CSE: Steering and monitoring of computational processes on remote computers

  10. Outline • Background & Motivation • Computational Studies, Related Work • Computational Systems • Thesis Statement, Observations • Experimental System (SimX) • Components & Preliminary results • Further Work • Project Timeline

  11. Computational Systems • Treating each individual experiment as “black box” • Individual experiment runs treated separately (as in parameter sweeps) • Interactivity limited to individual experiments (as in steering infrastructures) • Thesis question: Without the “black box” restriction, can one steer 100s or 1000s of experiments simultaneously?

  12. Thesis Statement • By exposing application level domain knowledge to the runtime system through a more permeable application-system interface, one can bring multi-experiment computational studies to interactive speed and enable steering of entire computational studies. Notes: This is really long… via application-specific prioritization of resources, quick adaptation of changing objectives, and flexible tradeoff of result accuracy and precision versus resource needs.

  13. Strategies • Reducing runtime of experiments • Checkpointing: use the results of previously-run experiment(s) to jump-start a current one • Precision tradeoff: less precise experiments in return for higher throughput • Reducing runtime of group of experiments • Active Sampling: issuing fewer experiments • Experiment Scheduling: improve resource utilisation

  14. Checkpointing • E.g. of checkpoint reuse: use the previous result as the first guess to an iterative method • Reduces runtime of individual experiments • E.g., use the result of a nearby parameter point as the initial guess to an iterative solver • For a given experiment, which checkpoint(s) to use, if any? Consider overhead vs. potential gains [Figure labels: A, B, C, Support 1, Support 2]
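
Checkpoint reuse as described above can be sketched with a toy iterative solver. The example below is an illustration under stated assumptions, not the SimX implementation: the system matrix is a fixed tridiagonal (-1, 4, -1), the "parameter" is a load scale factor, and the solver is plain Jacobi iteration. It solves one load case, then solves a nearby one both from a zero guess and from the earlier solution.

```python
def jacobi(b, x0, tol=1e-8, max_iter=10_000):
    """Jacobi iteration for tridiag(-1, 4, -1) x = b; returns (x, iterations)."""
    n = len(b)
    x = list(x0)
    for it in range(1, max_iter + 1):
        new = [
            (b[i] + (x[i - 1] if i > 0 else 0.0) + (x[i + 1] if i < n - 1 else 0.0)) / 4.0
            for i in range(n)
        ]
        change = max(abs(u - v) for u, v in zip(new, x))
        x = new
        if change < tol:
            return x, it
    return x, max_iter

n = 50
load = [1.0] * n                        # hypothetical parameter point
nearby_load = [1.05 * v for v in load]  # hypothetical neighbouring point

checkpoint, _ = jacobi(load, [0.0] * n)               # previously run experiment
x_cold, cold_iters = jacobi(nearby_load, [0.0] * n)   # no checkpoint
x_warm, warm_iters = jacobi(nearby_load, checkpoint)  # reuse checkpoint

print(f"cold start: {cold_iters} iterations, warm start: {warm_iters} iterations")
```

The saving grows as the two parameter points get closer, which is why the choice of checkpoint, and the cost of fetching it, matters.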

  15. Precision Tradeoff • Use lower resolution mesh • Use higher residual tolerance in iterative solvers • Reduces runtime of individual experiments • When is the trade-off permissible? • The user only needs a fuzzy result • The system decides (e.g., a point far from Pareto boundary)
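
One way the system-decided case might work is to loosen the solver tolerance as a point moves away from the current Pareto boundary. The policy below is purely hypothetical: the function name, the distance measure, the cutoff, and the tolerance bounds are all assumptions made for illustration.

```python
import math

def choose_tolerance(distance_to_front, tight=1e-8, loose=1e-3, cutoff=1.0):
    """Interpolate the residual tolerance on a log scale.

    distance 0 gives the tight tolerance; distance >= cutoff gives the loose one.
    """
    frac = min(max(distance_to_front / cutoff, 0.0), 1.0)
    exponent = math.log10(tight) + frac * (math.log10(loose) - math.log10(tight))
    return 10.0 ** exponent

for d in (0.0, 0.25, 0.5, 2.0):
    print(d, choose_tolerance(d))
```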

  16. Active Sampling • Running only a subset of experiments in the parameter space • Reduces number of experiments needed • Which strategy depends on the study, e.g.: • Sweep, Active, Guided search for Pareto Frontier • Graph plotting for Frequency response

  17. Experiment Scheduling • Effective scheduling depends on accurate time-to-completion estimates • Use dynamically collected data to improve time-to-completion estimates, e.g., no. of iterations needed • Incorporate improved estimates to generate experiments’ execution schedule
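
The two steps above, refining estimates from collected data and feeding them into a schedule, can be sketched as follows. This is an assumed design, not the one used in SimX: the estimate is the mean observed iteration count times a per-iteration cost, and the schedule uses longest-estimate-first list scheduling. All names and numbers are hypothetical.

```python
import heapq

def estimate_runtime(iter_history, secs_per_iter, default_iters=100):
    """Predict runtime as mean observed iteration count times cost per iteration."""
    iters = sum(iter_history) / len(iter_history) if iter_history else default_iters
    return iters * secs_per_iter

def schedule(estimates, n_workers):
    """Assign the longest-estimated experiments first to the least-loaded worker."""
    heap = [(0.0, w) for w in range(n_workers)]
    heapq.heapify(heap)
    assignment = {w: [] for w in range(n_workers)}
    for exp_id, est in sorted(estimates.items(), key=lambda kv: -kv[1]):
        load, w = heapq.heappop(heap)
        assignment[w].append(exp_id)
        heapq.heappush(heap, (load + est, w))
    makespan = max(load for load, _ in heap)
    return assignment, makespan

# Hypothetical estimated runtimes (seconds) for four experiments on two workers.
plan, makespan = schedule({"a": 4.0, "b": 3.0, "c": 2.0, "d": 1.0}, n_workers=2)
```

Poor estimates degrade the schedule directly, which is the motivation for updating them from data gathered while the study runs.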

  18. Outline • Background & Motivation • Computational Studies, Related Work • Computational Systems • Thesis Statement, Observations • Experimental System (SimX) • Components & Preliminary results • Further Work • Project Timeline

  19. Experimental System • Parallel System software for Interactive Multi-Experiment Computational Studies (SIMECS, or SimX) • Performs the Bridge study, Defibrillator study, and the Frequency-response plotting study • To evaluate the checkpointing, sampling, and resource allocation techniques

  20. SimX Architecture • Prototype implemented and presented at IPDPS 06 • Designed for small-to-medium sized (< 1000) processor clusters • One front-end, multiple “worker” processes • Each worker process could perform any of the tasks in the system: running the shared object layer, running simulations, allocating resources, etc.

  21. SimX: Bridge Experiment • 4 stages • Each experiment requires O(100ms) • 24KB per checkpoint Stage 1: 1735 sims Stage 2: 4950 sims Stage 3: 18632 sims Stage 4: 75351 sims

  22. SimX: Preliminary Results • Experiments on bridge study • Partitioned Object Space Server: Scales to 128 workers • Active Sampler reduces # of experiments to 1727, 2584, 4243, 4526 (from 1735, 4950, 18632, 75351) • Using checkpoint: shows 10x improvement in run time, each experiment reduced to O(10ms)
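
The reduction reported for the Active Sampler can be worked out from the counts on the slide. The short calculation below assumes the four numbers are per-stage counts that can be summed; the slide does not state this explicitly.

```python
# Experiment counts per stage, as listed on the slide.
sweep = [1735, 4950, 18632, 75351]
active = [1727, 2584, 4243, 4526]

# Reduction factor at each stage (sweep count divided by active count).
per_stage = [s / a for s, a in zip(sweep, active)]

total_sweep = sum(sweep)    # 100668
total_active = sum(active)  # 13080
overall = total_sweep / total_active

print([round(r, 2) for r in per_stage], round(overall, 1))
```

Under that assumption the saving is negligible at the first stage and grows with each refinement, giving roughly a 7.7x reduction in experiments overall, separate from the 10x per-experiment gain attributed to checkpoints.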

  23. SimX: Preliminary Results • Runtime of Bridge study on SimX • Time-to-level = wall-clock time required by a given configuration to refine the Pareto frontier 4 times • All measurements taken from 128-processor runs

  24. Outline • Background & Motivation • Computational Studies, Related Work • Computational Systems • Thesis Statement, Observations • Experimental System (SimX) • Components & Preliminary results • Further Work • Project Timeline

  25. SimX Architecture • Designed for small-to-medium sized (< 1000) processor clusters • One front-end, multiple “worker” processes • Each worker process could perform any of the tasks in the system: running the shared object layer, running simulations, allocating resources, etc.

  26. Active Sampler • Explore different types of samplers for different applications • 4 types of samplers: • Grid sampler, Active sampler, Guided Search, Graph Plotting • Thesis Goal: Identify strengths and limitations of each active sampler type

  27. Sweep Sampling • Issues experiments on a progressively finer grid in the Parameter Space [Figure panels: Initial Grid, First Refinement, 2nd Refinement]
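
A minimal sketch of sweep sampling, assuming a unit-square parameter space and a grid that halves its spacing at every level (neither detail is given on the slide). `new_points` returns only the experiments a level adds on top of the coarser grid.

```python
from itertools import product

def sweep_grid(level, dims=2):
    """All points of a uniform grid with 2**level + 1 samples per axis on [0, 1]."""
    n = 2 ** level
    axis = [i / n for i in range(n + 1)]
    return set(product(axis, repeat=dims))

def new_points(level, dims=2):
    """Points introduced at this level that were not on the coarser grid."""
    if level == 0:
        return sweep_grid(0, dims)
    return sweep_grid(level, dims) - sweep_grid(level - 1, dims)

for level in range(4):
    print(level, len(new_points(level)))
```

The number of new experiments roughly quadruples per level in two dimensions, which matches the rapid growth in sweep experiment counts across refinement stages.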

  28. Active Sampling • Only refines on the Pareto Frontier [Figure panels: Initial Grid, First Refinement, 2nd Refinement; 1st level results, 2nd level results, 3rd level results]
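
Refining only around the Pareto frontier can be sketched in one dimension. Everything here is an assumption for illustration: `evaluate` is a stand-in for a simulation with two competing metrics (both minimised), the initial grid has five points, and each level halves the spacing and samples only the neighbours of the current frontier points.

```python
def evaluate(x):
    # Hypothetical stand-in for a simulation: two competing metrics.
    return ((x - 0.3) ** 2, (x - 0.6) ** 2)

def dominates(a, b):
    """True if a is no worse than b everywhere and strictly better somewhere."""
    return all(p <= q for p, q in zip(a, b)) and any(p < q for p, q in zip(a, b))

def pareto_points(results):
    """Parameter values whose observations are not dominated by any other."""
    return [x for x, fx in results.items()
            if not any(dominates(fy, fx) for fy in results.values())]

def active_sample(levels, initial=5):
    """Start from a coarse grid on [0, 1], then refine only next to the frontier."""
    step = 1.0 / (initial - 1)
    results = {i * step: evaluate(i * step) for i in range(initial)}
    for _ in range(levels):
        step /= 2
        for x in pareto_points(results):  # frontier computed before adding points
            for nb in (x - step, x + step):
                if 0.0 <= nb <= 1.0 and nb not in results:
                    results[nb] = evaluate(nb)
    return results

sampled = active_sample(levels=2)
print(len(sampled), "experiments instead of", 2 ** 4 + 1, "for a full sweep")
```

The samples concentrate around the interval between the two optima, where the trade-off between the metrics actually occurs.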

  29. Guided Search • Start from random points, follow the better-performing neighbors
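
Guided search reads like a multi-start local search. The sketch below assumes a single scalar score to minimise on a discrete 2D grid, with four-way neighbours; the actual SimX sampler may use a different neighbourhood or a multi-objective comparison. Results are cached so no parameter point is simulated twice.

```python
import random

def guided_search(score, size, starts, seed=0):
    """Multi-start descent on a size x size grid; returns (best point, evaluations)."""
    rng = random.Random(seed)
    evaluated = {}

    def f(p):
        if p not in evaluated:  # each experiment runs at most once
            evaluated[p] = score(p)
        return evaluated[p]

    best = None
    for _ in range(starts):
        cur = (rng.randrange(size), rng.randrange(size))
        while True:
            x, y = cur
            nbrs = [(x + dx, y + dy)
                    for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1))
                    if 0 <= x + dx < size and 0 <= y + dy < size]
            nxt = min(nbrs, key=f)
            if f(nxt) >= f(cur):  # no better neighbour: local optimum reached
                break
            cur = nxt
        if best is None or f(cur) < f(best):
            best = cur
    return best, len(evaluated)

# Hypothetical score with a single optimum at (7, 3).
best, n_evals = guided_search(lambda p: (p[0] - 7) ** 2 + (p[1] - 3) ** 2,
                              size=16, starts=3)
```

On a score with several local optima the result depends on the starting points, which is why more than one random start is used.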

  30. Graph Plotting • Uniform sampling first, then fill in details
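
A possible reading of "uniform sampling first, then fill in details" is adaptive bisection: sample coarsely, then keep splitting intervals where the midpoint deviates from a straight-line interpolation. The sketch below assumes that reading and uses the amplitude of a damped oscillator as a hypothetical stand-in for the frequency-response simulation.

```python
import math

def amplitude(w, w0=1.0, zeta=0.05):
    """Steady-state amplitude of a damped oscillator (hypothetical simulation)."""
    return 1.0 / math.sqrt((w0 ** 2 - w ** 2) ** 2 + (2 * zeta * w0 * w) ** 2)

def adaptive_plot(f, lo, hi, coarse=9, tol=0.05, max_depth=8):
    """Uniform pass, then bisect intervals the linear interpolant fits poorly."""
    xs = [lo + (hi - lo) * i / (coarse - 1) for i in range(coarse)]
    pts = {x: f(x) for x in xs}

    def refine(a, b, depth):
        mid = 0.5 * (a + b)
        pts[mid] = f(mid)
        if depth < max_depth and abs(pts[mid] - 0.5 * (pts[a] + pts[b])) > tol:
            refine(a, mid, depth + 1)
            refine(mid, b, depth + 1)

    for a, b in zip(xs, xs[1:]):
        refine(a, b, 1)
    return sorted(pts.items())

curve = adaptive_plot(amplitude, 0.0, 2.0)
near_peak = sum(1 for w, _ in curve if 0.75 <= w <= 1.25)
flat_part = sum(1 for w, _ in curve if w <= 0.5)
print(near_peak, "samples near resonance vs", flat_part, "in the flat region")
```

Most of the experiments end up near the resonance peak, where the curve changes quickly, rather than being spread evenly over the frequency range.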

  31. SimX Architecture • Designed for small-to-medium sized (< 1000) processor clusters • One front-end, multiple “worker” processes • Each worker process could perform any of the tasks in the system: running the shared object layer, running simulations, allocating resources, etc.

  32. Resource Allocator • Maps jobs on the task list onto processors • how many processors • which processors • which checkpoints to use • Resources considered: Network bandwidth and processor time • Thesis goal: Quantify benefits of application-level knowledge in resource scheduling Vs black-box approach

  33. Resource Allocator • Application level knowledge used in: • Runtime estimation • dynamically collected data • performance model supplied by user • Combination of empirical and analytical • Network bandwidth estimation: • LogP model, managed by shared object layer
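
The LogP model describes a machine by latency L, per-message overhead o, gap g between consecutive messages, and processor count P. A minimal sketch of how it could be used to estimate a checkpoint transfer is shown below; the packet size and all parameter values are hypothetical, and the slide does not say how SimX applies the model.

```python
import math
from dataclasses import dataclass

@dataclass
class LogP:
    L: float  # network latency (seconds)
    o: float  # send/receive overhead per message (seconds)
    g: float  # minimum gap between consecutive messages (seconds)
    P: int    # number of processors

    def transfer_time(self, n_messages):
        """Time until the last of n pipelined messages is received."""
        if n_messages <= 0:
            return 0.0
        return self.o + (n_messages - 1) * max(self.g, self.o) + self.L + self.o

def checkpoint_transfer_time(model, size_bytes, packet_bytes=1024):
    """Estimate transfer time for a checkpoint split into fixed-size packets."""
    return model.transfer_time(math.ceil(size_bytes / packet_bytes))

# Hypothetical cluster parameters; 24 KB matches the bridge checkpoint size.
model = LogP(L=10e-6, o=2e-6, g=4e-6, P=128)
estimate = checkpoint_transfer_time(model, 24 * 1024)
```

An estimate of this kind lets the allocator weigh the cost of fetching a checkpoint against the runtime it is expected to save.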

  34. Resource Allocator • Scheduling Heuristics • Greedy • Fair share • Locality • Current status: FIFO; always use one closest checkpoint
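
The current status described above, FIFO ordering with the single closest checkpoint, can be sketched as follows. The sketch assumes "closest" means Euclidean distance in parameter space and that every finished experiment becomes a checkpoint; both are assumptions, as are the function names.

```python
import math
from collections import deque

def closest_checkpoint(params, checkpoints):
    """Return the stored parameter point nearest to params, or None if empty."""
    if not checkpoints:
        return None
    return min(checkpoints, key=lambda c: math.dist(params, c))

def run_fifo(pending, checkpoints):
    """Process experiments in arrival order, pairing each with one checkpoint."""
    queue = deque(pending)
    plan = []
    while queue:
        params = queue.popleft()
        plan.append((params, closest_checkpoint(params, checkpoints)))
        checkpoints.append(params)  # assumed: finished run becomes a checkpoint
    return plan

plan = run_fifo([(0.0, 0.0), (1.0, 1.0), (0.1, 0.0), (0.9, 1.0)], [])
```

A greedy or locality-aware heuristic would reorder the queue so that experiments run soon after, and near, the checkpoints they benefit from most.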

  35. SimX Architecture • Designed for small-to-medium sized (< 1000) processor clusters • One front-end, multiple “worker” processes • Each worker process could perform any of the tasks in the system: running the shared object layer, running simulations, allocating resources, etc.

  36. Shared Object Layer • Implementation options • Server/Client Vs Integrated • Single Server Vs Partitioned Server • Caching Vs no Caching • Client Caching Vs Cooperative Caching • Thesis goal: Investigate implementation options and how they affect the overall performance of computational studies
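
Two of the options above, a partitioned server and client-side caching, can be sketched in a few lines. This is a single-process illustration with hypothetical class names, not the SISOL design: keys are parameter tuples, partitions are plain dictionaries, and checkpoints are assumed immutable once written so the cache needs no invalidation.

```python
class PartitionedStore:
    """Hypothetical checkpoint store split across several server partitions."""

    def __init__(self, n_servers):
        self.partitions = [{} for _ in range(n_servers)]

    def _home(self, key):
        # Hash partitioning: each key lives on exactly one partition.
        return hash(key) % len(self.partitions)

    def put(self, key, value):
        self.partitions[self._home(key)][key] = value

    def get(self, key):
        return self.partitions[self._home(key)].get(key)


class CachingClient:
    """Worker-side cache in front of the store; counts remote reads."""

    def __init__(self, store):
        self.store = store
        self.cache = {}
        self.remote_reads = 0

    def get(self, key):
        if key not in self.cache:
            self.remote_reads += 1
            self.cache[key] = self.store.get(key)
        return self.cache[key]


store = PartitionedStore(n_servers=4)
store.put((0.5, 0.25), b"c" * 24)  # hypothetical checkpoint payload
client = CachingClient(store)
first = client.get((0.5, 0.25))
second = client.get((0.5, 0.25))   # served from the local cache
```

Cooperative caching would extend this by letting a worker consult its peers' caches before going to the home partition.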

  37. Outline • Background & Motivation • Computational Studies, Related Work • Computational Systems • Thesis Statement, Observations • Experimental System (SimX) • Components & Preliminary results • Further Work • Project Timeline

  38. Project Timeline • End of summer 06 • Expand the application base of SimX system: • Port SimX as part of SCIRun components to run the Defibrillator application • Evaluate the performance of SimX on Defibrillator application • Add error control, and space-partitioned SISOL server as needed

  39. Project Timeline • End of fall 06 • Evaluate SimX capability to handle multiple parallel simulations • Implement parallel bridge and defibrillator simulations • Multi-server SISOL implementation • End of spring 07 • Implement and evaluate infrastructure for performance prediction

  40. Project Timeline • End of fall 07 • Evaluate sampler policies (Grid Vs Active Vs Guided search Vs graph plotter) on bridge, defibrillator, and graph plotting applications • Evaluate performance of resource allocation heuristics (FIFO Vs Greedy Vs Fairshare Vs Locality)
