Young Suk Moon Dynamic Fault Tolerant Grid Workflow in the Water Threat Management Project
Urban Water Distribution Systems • Supplying water • Pipe networks • Redundant flow paths • Millions of pipes • http://www.crwr.utexas.edu/gis/gishydro03/Classroom/trmproj/Garcia-Fresca/UrbanRecharge.htm
Water Threat Management Project • Analyzing water contamination in water distribution systems (WDSs) • EPANET simulation (developed at the Environmental Protection Agency) • Find the optimal solution • Find the contaminant source • [Figure: Sensor Data, Optimization Engine, Simulation Engine (MPI), Middleware, multiple EPANET instances, Grid Resources] • From my presentation slides in the project/thesis seminar class
Project Requirements for WTM • Changing the MPI system to a loosely coupled system • Parallel execution of EPANET • Large number of evaluations • Integrate fault tolerance
Fault Tolerant Strategies • Replication • Run the same job on multiple machines concurrently • Fast, less reliable, needs enough resources • Checkpoint-restart • Store current computing states periodically • Slow (checkpoint overhead), more reliable
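The checkpoint-restart idea can be illustrated with a minimal Python sketch. It is not the project's implementation: the toy workload (summing integers), the JSON checkpoint file, and the names `run_with_checkpoints`, `every`, and `fail_at` are all assumptions made for illustration. It shows state being stored periodically and a restarted run resuming from the last stored state, at the cost of extra writes.

```python
import json
import os
import tempfile


def run_with_checkpoints(total_steps, checkpoint_path, every=10, fail_at=None):
    """Sum the integers 0..total_steps-1, saving state every `every` steps.

    `fail_at` is a hypothetical knob that simulates a node failure at that step.
    """
    # Resume from the last checkpoint if one exists, otherwise start fresh.
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path) as fh:
            state = json.load(fh)
    else:
        state = {"step": 0, "total": 0}

    while state["step"] < total_steps:
        if fail_at is not None and state["step"] == fail_at:
            raise RuntimeError("simulated node failure")
        state["total"] += state["step"]
        state["step"] += 1
        if state["step"] % every == 0:
            # Write to a temporary file, then rename, so a crash during the
            # write cannot leave a half-written checkpoint behind.
            tmp_path = checkpoint_path + ".tmp"
            with open(tmp_path, "w") as fh:
                json.dump(state, fh)
            os.replace(tmp_path, checkpoint_path)
    return state["total"]
```

A run that fails at step 57 with `every=10` leaves a checkpoint at step 50, so the restarted run redoes only steps 50 onward. A smaller `every` loses less work on failure but pays more checkpoint overhead.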
Fault Tolerant Strategies in the Project • Replication • A job is submitted to multiple nodes so that replicas of the same job run concurrently • Multiple queuing • A set of jobs is submitted to multiple nodes in different job orders so that different jobs run concurrently • [Figure: jobs j1, j2, j3 placed on machines under each strategy]
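The difference between the two strategies can be sketched as submission plans. This is a hedged illustration only: the function names are hypothetical, and rotating the job list is just one assumed way to give each node a different job order; the slides do not say how the orders are chosen.

```python
def replication_plan(job, nodes):
    """Replication: every node receives the same single job."""
    return {node: [job] for node in nodes}


def multiple_queuing_plan(jobs, nodes):
    """Multiple queuing: every node receives the full job set, rotated so
    that each node starts with a different job (rotation is an assumption)."""
    n = len(jobs)
    plan = {}
    for i, node in enumerate(nodes):
        shift = i % n if n else 0
        plan[node] = jobs[shift:] + jobs[:shift]
    return plan


if __name__ == "__main__":
    machines = ["m1", "m2", "m3"]
    print(replication_plan("j1", machines))
    print(multiple_queuing_plan(["j1", "j2", "j3"], machines))
```

With three jobs on three machines, the machines start on j1, j2, and j3 respectively, so distinct jobs make progress at once while every job is still queued on every machine as a fallback.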
System Design • Figure from my pre-proposal
Model Description • Jobs: $J = \{j_1, j_2, j_3, \dots, j_n\}$ • Queues: $Q = \{q_1, q_2, q_3, \dots, q_m\}$ • Resources: $R = \{r_1, r_2, r_3, \dots, r_l\}$
Model Description
• Mapping jobs to queues
• $f_t : J \to Q$
• $f_t(j) = \{\, j \in J \mid \forall q,\ \exists j,\ f_t(j) = q \in Q \,\}$
• Mapping queues to available resources
• $g_t : Q \to \mathcal{P}(R)$
• $g_t(q) = \{\, q \in Q \mid \forall q,\ \exists R_a,\ g_t(q) = R_a \in \mathcal{P}(R) \,\}$
• Mapping a queue to a resource
• $h_t : (Q, R_a) \to Q \times R_a$
• $h_t(q, r) = \{\, q \in Q,\ r \in R_a^t \mid (q, r) \in Q \times R_a^t \,\}$
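The three mappings can be read as ordinary functions over small collections. The Python sketch below is one possible reading, not the model's definition: round-robin job assignment, the `is_up` availability predicate, and the first-free-resource rule in `bind` are all assumptions, and every function name is hypothetical.

```python
from itertools import cycle


def map_jobs_to_queues(jobs, queues):
    """f_t: assign each job to exactly one queue (round-robin is an assumption)."""
    return {job: queue for job, queue in zip(jobs, cycle(queues))}


def available_resources(queues, resources, is_up):
    """g_t: map each queue to R_a, the subset of R that is currently available."""
    r_a = frozenset(r for r in resources if is_up(r))
    return {queue: r_a for queue in queues}


def bind(queue, candidates, used):
    """h_t: pair a queue with one resource from its candidate set.

    Returns a (queue, resource) pair, or None if every candidate is taken.
    """
    for resource in sorted(candidates):
        if resource not in used:
            return (queue, resource)
    return None


if __name__ == "__main__":
    f = map_jobs_to_queues(["j1", "j2", "j3", "j4"], ["q1", "q2"])
    g = available_resources(["q1", "q2"], ["r1", "r2", "r3"], lambda r: r != "r2")
    print(f, g, bind("q1", g["q1"], set()))
```

Round-robin assignment reaches every queue whenever there are at least as many jobs as queues, which matches the "for every q there exists a j" condition in the slide's definition of $f_t$.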
An Example of a Dynamic Fault Tolerance Selection Algorithm
• n_a: number of nodes that are available
• n_r: number of jobs that can be run in parallel

while (there is any job remaining)
    n_a ← check resource availability
    n_r ← check job parallelism
    if n_r < n_a < 2n_r then
        do partial replication and partial queuing
    else if n_a ≥ 2n_r then
        do full replication
    else
        do queuing
end while
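The decision inside the loop can be written as a small Python function. This is a sketch of the branch logic on the slide only; the function name and the returned labels are hypothetical, and the resource and parallelism checks that produce `n_available` and `n_parallel` are left out.

```python
def select_strategy(n_available, n_parallel):
    """Pick a fault tolerance strategy from the node and job counts.

    n_available: number of nodes currently available (n_a on the slide)
    n_parallel:  number of jobs that can run in parallel (n_r on the slide)
    """
    if n_available >= 2 * n_parallel:
        # Enough nodes to run at least two copies of every parallel job.
        return "full replication"
    if n_parallel < n_available < 2 * n_parallel:
        # Spare nodes exist, but not enough to replicate every job.
        return "partial replication and partial queuing"
    # n_available <= n_parallel: no spare nodes at all.
    return "queuing"


if __name__ == "__main__":
    for n_a, n_r in [(10, 4), (6, 4), (4, 4), (3, 4)]:
        print(n_a, n_r, select_strategy(n_a, n_r))
```

In the algorithm this selection is re-evaluated on every pass of the loop, so the strategy can change as nodes fail or become available.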
Resource Selection • A number of ways to choose resources • Minimization functions related to • Performance of resources • Temperature of resources • Laplace’s equation
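One way to picture a minimization-based choice is a weighted cost over per-resource measurements. The sketch below is an assumption throughout: the slides do not give the cost function, so the linear combination, the weights `w_perf` and `w_temp`, and the input layout (estimated runtime and temperature per resource) are hypothetical, and the sketch does not model Laplace’s equation.

```python
def select_resources(resources, k, w_perf=1.0, w_temp=1.0):
    """Return the names of the k resources with the lowest weighted cost.

    resources: dict mapping a resource name to (estimated_runtime, temperature).
    The cost w_perf * runtime + w_temp * temperature is an assumed form; the
    two terms are not normalized, so the weights must absorb the units.
    """
    def cost(item):
        _name, (runtime, temperature) = item
        return w_perf * runtime + w_temp * temperature

    ranked = sorted(resources.items(), key=cost)
    return [name for name, _ in ranked[:k]]


if __name__ == "__main__":
    pool = {"r1": (10.0, 60.0), "r2": (12.0, 40.0), "r3": (8.0, 90.0)}
    print(select_resources(pool, k=2))
    print(select_resources(pool, k=2, w_temp=0.0))
```

Setting `w_temp=0.0` reduces the choice to performance alone, which makes it easy to see how much the temperature term changes the ranking.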
References
• G. von Laszewski, K. Mahinthakumar, R. Ranjithan, D. Brill, J. Uber, K. Harrison, S. Sreepathi, and E. Zechman, “An Adaptive Cyberinfrastructure for Threat Management in Urban Water Distribution Systems,” in Proceedings of ICCS 2006, vol. 3993, 2006, pp. 401–.
• S. Sreepathi, “Cyberinfrastructure for Contamination Source Characterization in Water Distribution Systems,” Master’s thesis, North Carolina State University, 2006.
• L. Ramakrishnan and D. A. Reed, “Performability Modeling for Scheduling and Fault Tolerance Strategies for Scientific Workflows,” in Proceedings of the 17th International Symposium on High Performance Distributed Computing, Boston, MA, USA: ACM, June 2008, pp. 23–34.
• G. Wrzesińska, R. V. van Nieuwpoort, J. Maassen, T. Kielmann, and H. E. Bal, “Fault-tolerant Scheduling of Fine-grained Tasks in Grid Environments,” International Journal of High Performance Computing Applications, vol. 20, no. 1, February 2006, pp. 103–114.
• “Laplace’s Equation,” http://mathworld.wolfram.com/LaplacesEquation.html