Extending the Galaxy portal with parallel and distributed execution capability
Ketan Maheshwari, Alex Rodriguez, David Kelly, Ravi Madduri, Justin Wozniak, Michael Wilde, Ian Foster
Argonne National Laboratory & University of Chicago

Overview
• Introduce the Galaxy and Swift systems
• Couple the Swift and Galaxy gateway frameworks
• Combine the features offered by Galaxy and Swift into an integrated platform
• Different integration schemes based on user requirements and application patterns
• Data management schemes
• Example use-case
• A demo screencast (if time permits)
• Summary and future work
swift-lang.org

Overview of the Galaxy Workflow System*
[Figure: Galaxy web interface, with labeled regions: tools panel, workspace, monitor/history panel]
* Slide courtesy: Center for Genomic Regulation, Barcelona, Spain

Overview of Swift Parallel Scripting Framework
• Simulation of super-cooled glass materials
• Protein folding using homology-free approaches
• Climate model analysis and decision making in energy policy
• Simulation of RNA-protein interaction
• Multiscale subsurface flow modeling
• Modeling of power grid applications
All have published science results obtained using Swift.
[Figure: Protein loop modeling, T0623, 25 res., 8.2 Å to 6.3 Å (excluding tail); panels: initial, predicted, native. Courtesy A. Adhikari]

Motivation: Swift and Galaxy are complementary in many ways
• Galaxy (galaxyproject.org) offers a simple, user-friendly web-based interface for composing, executing, and monitoring workflows
• Galaxy results are sharable, reproducible and reusable
• Galaxy is widely used and well supported by its user community, e.g. the Next Generation Sequencing (NGS) community
• Swift provides a sophisticated interface to parallel and distributed platforms
• Swift scripts are structured expressions of complex application flows that are readily executable on multiple, diverse and independent remote resources

Swift-Galaxy Integration Overview
• Approaches enabling integration in different ways:
    • At tool level
    • At workflow level
    • At language/expression level
[Diagram labels: user computer, Galaxy web-console, Galaxy server, Galaxy-tool, Swift app libraries; execution resources: clouds, clusters, supercomputers, grids]

Computational Infrastructure
• Galaxy offers limited support for distributed and parallel resources
    • Needs additional ad hoc configuration to interface with them
    • Constrained in some ways, e.g. needs a shared file system*
• Swift is robustly interfaced to a wider range of resource managers, with finer control over job submission parameters:
    • Supports PBS/Torque, SGE, SLURM, Condor
    • Supports bag-of-workstations setups: clouds, workstation clusters
    • Supports distributed file systems and multiple execution sites simultaneously
* To the best of our knowledge

Interfacing with heterogeneous parallel systems is a challenge

    ## SLURM
    #!/bin/bash
    #SBATCH -J ...
    #SBATCH -o ...
    #SBATCH -e ...
    #SBATCH -p ...
    #SBATCH -N ...
    ibrun ./my_execargs

    ## CONDOR
    Executable=e
    Universe=std
    Error=err.$
    Input=in.$
    Output=out.$
    Log=foo.log
    Queue

    ## TORQUE/PBS
    #!/bin/bash
    #PBS -q ccs_short
    #PBS -N my_serial_job
    #PBS -l walltime=01:00:00
    #PBS -l nodes=1:noib:ppn=1
    #PBS -m e
    ./a.out

    ## SGE
    #!/bin/bash
    #$ -cwd
    #$ -j y
    #$ -S /bin/bash
    pwd
    ./my_execargs

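
To make the translation problem concrete, here is a minimal Python sketch that renders one job description into submit-script headers for three of the schedulers shown above. It is an illustration only, not how Swift or Galaxy generate submit files; the names `JobSpec` and `render_submit_script` are hypothetical, and the SGE branch omits the node count because SGE parallel environments are site-specific.

```python
from dataclasses import dataclass


@dataclass
class JobSpec:
    """Scheduler-neutral job description (hypothetical, for illustration)."""
    name: str
    queue: str
    nodes: int
    walltime: str  # HH:MM:SS
    command: str


def render_submit_script(job: JobSpec, scheduler: str) -> str:
    """Render the same job as a submit script for the named scheduler."""
    if scheduler == "slurm":
        directives = [
            f"#SBATCH -J {job.name}",
            f"#SBATCH -p {job.queue}",
            f"#SBATCH -N {job.nodes}",
            f"#SBATCH -t {job.walltime}",
        ]
    elif scheduler == "pbs":
        directives = [
            f"#PBS -N {job.name}",
            f"#PBS -q {job.queue}",
            f"#PBS -l nodes={job.nodes}:ppn=1",
            f"#PBS -l walltime={job.walltime}",
        ]
    elif scheduler == "sge":
        # Node count is left out: SGE parallel environments vary per site.
        directives = [
            f"#$ -N {job.name}",
            f"#$ -q {job.queue}",
            "#$ -cwd",
            f"#$ -l h_rt={job.walltime}",
        ]
    else:
        raise ValueError(f"unsupported scheduler: {scheduler}")
    return "\n".join(["#!/bin/bash", *directives, job.command]) + "\n"


if __name__ == "__main__":
    demo = JobSpec("demo", "short", 2, "01:00:00", "./a.out")
    for kind in ("slurm", "pbs", "sge"):
        print(render_submit_script(demo, kind))
```

Even this small case needs a separate branch per scheduler, which is the burden a common execution layer is meant to take off the user.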
Scheme 1: Wrap Swift around Galaxy Tools
[Diagram labels: swift-tool A, swift-tool B, ..., swift-tool N, other Galaxy tools; execution history]

Scheme 2: Interoperability between expressions
• Internally, both Swift and Galaxy code is represented in XML dialects
• An automated transformation converts one form into the other
• Currently under development
[Diagram labels: Swift script, XML transformation, Galaxy workflow]

Scheme 3: Harness Data Parallelism using foreach

    foreach protein, idx in proteinList {
        runBlast(protein);
        tracef("The index is: %i\n", idx);
    }

    foreach idx in [begin:end:step] {
        runmyapp(idx);
    }

[Diagram labels: in-data, (split), swift-foreach wrapper around Galaxy-tool instances, (merge), out-data]

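
The split / foreach / merge pattern in this scheme can be sketched in Python as follows. This is an analogy under stated assumptions, not the Swift-Galaxy wrapper itself: `split`, `run_tool`, `merge`, and `parallel_foreach` are hypothetical names, `run_tool` is a stand-in for a real Galaxy tool invocation, and a local thread pool stands in for remote execution sites.

```python
from concurrent.futures import ThreadPoolExecutor
from math import ceil


def split(records, n_chunks):
    """Cut the input into at most n_chunks contiguous pieces."""
    size = max(1, ceil(len(records) / n_chunks))
    return [records[i:i + size] for i in range(0, len(records), size)]


def run_tool(chunk):
    """Placeholder for one tool invocation on one chunk (hypothetical)."""
    return [record.upper() for record in chunk]


def merge(partial_results):
    """Concatenate per-chunk outputs back into one output, in order."""
    return [item for part in partial_results for item in part]


def parallel_foreach(records, n_chunks=4):
    """Run run_tool over all chunks concurrently, then merge the results."""
    chunks = split(records, n_chunks)
    with ThreadPoolExecutor(max_workers=n_chunks) as pool:
        # pool.map returns results in input order, so merge stays ordered.
        partial_results = list(pool.map(run_tool, chunks))
    return merge(partial_results)


if __name__ == "__main__":
    print(parallel_foreach(["acgt", "ttga", "ccat", "gatc", "tacg"], n_chunks=2))
```

Each chunk is processed independently, which is the property that lets a foreach body run on many resources at once.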
Cloud Interfaces
• Galaxy instances running on cloud nodes are already taking advantage of cloud-based resources
• Swift's coasters mechanism can farm resources and combine multiple cloud and non-cloud resources in a single application run

Data Management
• Both Galaxy and Swift offer various data management capabilities
• Galaxy offers remote data uploading and viewing capabilities
• Swift allows disk-resident data to be operated upon as program variables
• Swift's data providers are interfaced with various data management protocols and can manage data movement at runtime

Evaluation Application: Inference analysis for power prices
[Diagram labels: generate sample, samples, candidate solution, batches, batch size, lower bound, upper bound, variance & mean]

Swift Script for Inference Analysis

    import "mappings";
    import "apps";

    type file;

    int nS[] = [10, 100, 1000, 10000, 100000];

    foreach S, idxs in nS {
        sample0 = gensample(S, wind_data);
        obj[idxs] = ampl(sample0);
        foreach B, idxb in [10:40:10] {
            foreach k in [0:B] {
                sample1 = gensample(S, wind_data);
                obj_l[idxs][idxb][k] = ampl_L(sample1);
                sample2 = gensample(S, wind_data);
                obj_u[idxs][idxb][k] = ampl_U(sample2, obj[idxs]);
            }
        }
    }

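
To get a feel for how much work the nested foreach loops expose, the Python sketch below counts the application invocations the script would issue. It assumes Swift ranges `[start:end:step]` include both endpoints; `swift_range` and `count_app_calls` are hypothetical helper names, and the counts are only as good as that assumption.

```python
def swift_range(start, end, step=1):
    """Assumed semantics of Swift's [start:end:step]: both ends inclusive."""
    return range(start, end + 1, step)


def count_app_calls(sample_sizes, batch_sizes):
    """Count app invocations issued by the loop nest in the script above."""
    calls = {"gensample": 0, "ampl": 0, "ampl_L": 0, "ampl_U": 0}
    for _s in sample_sizes:
        # One candidate-solution sample and one ampl run per sample size.
        calls["gensample"] += 1
        calls["ampl"] += 1
        for b in batch_sizes:
            for _k in swift_range(0, b):
                # Two fresh samples per inner iteration, one for each bound.
                calls["gensample"] += 2
                calls["ampl_L"] += 1
                calls["ampl_U"] += 1
    return calls


if __name__ == "__main__":
    calls = count_app_calls([10, 100, 1000, 10000, 100000],
                            swift_range(10, 40, 10))
    print(calls, "total:", sum(calls.values()))
```

Under the inclusive-range assumption this comes to 2090 invocations. The `ampl_U` calls each take `obj[idxs]` as input, so they wait on the `ampl` result for their sample size, while the `ampl_L` calls do not.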
Summary
• Swift-Galaxy integration improves science gateways:
    • User control
    • Structured distributed computing
    • Simple
    • Interactive
• Commonalities in the basic execution models of Galaxy and Swift lead to many avenues for integration
• Broadly, Swift acts as the backend manager while Galaxy serves as the frontend for operations
• An example of combining command-line and GUI-based frameworks

Future Work
• A generic approach for each of the integration schemes
• Wider application adaptation
• Finer and broader exposure of configuration options to users
• Interactive monitoring features
• Authentication features, Globus-based identity management

Acknowledgements
• This work was supported in part by the NIH through the NHLBI grant The Cardiovascular Research Grid (R24HL085343) and by the U.S. Department of Energy under contract DE-AC02-06CH11357.
• We are grateful to Amazon, Inc., for an award of Amazon Web Services time that facilitated early experiments.
• We thank our colleagues in the Swift and Globus groups at the MCS Division, Argonne National Laboratory.

Thank you! Visit swift-lang.org for more information about the Swift parallel scripting framework.