Extending the Galaxy portal with parallel and distributed execution capability

Ketan Maheshwari, Alex Rodriguez, David Kelly, Ravi Madduri, Justin Wozniak, Michael Wilde, Ian Foster. Argonne National Laboratory & University of Chicago.


Presentation Transcript


  1. Extending the Galaxy portal with parallel and distributed execution capability
     Ketan Maheshwari, Alex Rodriguez, David Kelly, Ravi Madduri, Justin Wozniak, Michael Wilde, Ian Foster
     Argonne National Laboratory & University of Chicago

  2. Overview
     • Introduce the Galaxy and Swift systems
     • Couple the Swift and Galaxy gateway frameworks
     • Combine the features offered by Galaxy and Swift into an integrated platform
     • Different integration schemes based on user requirements and application patterns
     • Data management schemes
     • Example use case
     • A demo screencast (if time permits)
     • Summary and future work
     swift-lang.org

  3. Overview of the Galaxy Workflow System*
     [Screenshot of the Galaxy interface: tools panel, workspace, monitor/history panel]
     * Slide courtesy: Center for Genomic Regulation, Barcelona, Spain

  4. Overview of Swift Parallel Scripting Framework
     Example applications, all with published science results obtained using Swift:
     • Simulation of super-cooled glass materials
     • Protein folding using homology-free approaches
     • Climate model analysis and decision making in energy policy
     • Simulation of RNA-protein interaction
     • Multiscale subsurface flow modeling
     • Modeling of power grid applications
     [Figure: protein loop modeling, showing initial, predicted, and native structures for T0623, 25 res., 8.2Å to 6.3Å (excluding tail). Courtesy A. Adhikari]

  5. Motivation: Swift and Galaxy are complementary in many ways
     • Galaxy (galaxyproject.org) offers a simple, user-friendly, web-based interface for composing, executing, and monitoring workflows
     • Galaxy results are sharable, reproducible, and reusable
     • Galaxy is widely used and well supported by its user community, e.g. the Next Generation Sequencing (NGS) community
     • Swift provides a sophisticated interface to parallel and distributed platforms
     • Swift scripts are structured expressions of complex application flows that are readily executable on multiple, diverse, and independent remote resources

  6. Swift-Galaxy Integration Overview
     • Approaches enabling integration in different ways:
         • At tool level
         • At workflow level
         • At language/expression level
     [Diagram labels: user computer, Galaxy web-console, Galaxy server, Galaxy-tool, Swift app libraries; target resources: clouds, clusters, supercomputers, grids]

  7. Computational Infrastructure
     • Galaxy offers limited support for distributed and parallel resources
         • Needs additional ad hoc configuration to interface with them
         • Constrained in some ways, e.g. needs a shared file system*
     • Swift is robustly interfaced to a wider range of resource managers, with finer control over job submission parameters:
         • Supports PBS/Torque, SGE, SLURM, Condor
         • Supports bags of workstations: clouds, workstation clusters
         • Supports distributed file systems and multiple execution sites simultaneously
     * To the best of our knowledge

  8. Interface with heterogeneous parallel systems is a challenge

     SLURM:
         #!/bin/bash
         #SBATCH -J ...
         #SBATCH -oe ...
         #SBATCH -p ...
         #SBATCH -N ...
         ibrun ./my_exec args

     Condor:
         Executable=e
         Universe=std
         Error=err.$
         Input=in.$
         Output=out.$
         Log=foo.log
         Queue

     Torque/PBS:
         #!/bin/bash
         #PBS -q ccs_short
         #PBS -N my_serial_job
         #PBS -l walltime=01:00:00
         #PBS -l nodes=1:noib:ppn=1
         #PBS -m e
         ./a.out

     SGE:
         #!/bin/bash
         #$ -cwd
         #$ -j y
         #$ -S /bin/bash
         pwd
         ./my_exec args
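The four job scripts on this slide carry much the same facts (job name, queue, node count, command) in four different dialects. The Python sketch below illustrates that point only: the `Job` record and the `render` function are hypothetical names invented for this example and are not part of Swift or Galaxy. It uses a small set of common directives for each scheduler and leaves out site-specific options.

```python
from dataclasses import dataclass


@dataclass
class Job:
    """Hypothetical scheduler-neutral job description (illustration only)."""
    name: str
    queue: str
    nodes: int
    command: str


def render(job: Job, scheduler: str) -> str:
    """Return a submit script for `job` in the named scheduler's dialect."""
    if scheduler == "slurm":
        lines = ["#!/bin/bash",
                 f"#SBATCH -J {job.name}",
                 f"#SBATCH -p {job.queue}",
                 f"#SBATCH -N {job.nodes}",
                 job.command]
    elif scheduler == "pbs":
        lines = ["#!/bin/bash",
                 f"#PBS -N {job.name}",
                 f"#PBS -q {job.queue}",
                 f"#PBS -l nodes={job.nodes}",
                 job.command]
    elif scheduler == "sge":
        # Node counts in SGE go through site-specific parallel environments,
        # so `nodes` is deliberately not rendered in this sketch.
        lines = ["#!/bin/bash",
                 f"#$ -N {job.name}",
                 f"#$ -q {job.queue}",
                 "#$ -cwd",
                 job.command]
    elif scheduler == "condor":
        # HTCondor takes a key = value submit description, not a shell script.
        exe, _, args = job.command.partition(" ")
        lines = [f"executable = {exe}",
                 f"arguments = {args}",
                 "universe = vanilla",
                 f"log = {job.name}.log",
                 "queue"]
    else:
        raise ValueError(f"unknown scheduler: {scheduler}")
    return "\n".join(lines) + "\n"
```

Calling `render(Job("demo", "short", 2, "./a.out in.dat"), "pbs")` yields a PBS script, and changing only the second argument yields the other dialects. Even this toy version needs a special case per scheduler, which is the kind of detail the slides propose to hide behind Swift.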

  9. Scheme 1: Wrap Swift around Galaxy Tools
     [Diagram: swift-tool A, swift-tool B, ..., swift-tool N, and other Galaxy tools, shown alongside their execution histories]

  10. Scheme 2: Interoperability between expressions
      • Internally, both Swift and Galaxy codes are represented in XML dialects
      • Automated transformation to convert from one form into another
      • Currently under development
      [Diagram: Swift script, XML transformation, Galaxy workflow]

  11. Scheme 3: Harness Data Parallelism using foreach

          foreach protein, idx in proteinList {
              runBlast(protein);
              tracef("The index is: %i\n", idx);
          }

          foreach idx in [begin:end:step] {
              runmyapp(idx);
          }

      [Diagram: a swift-foreach wrapper splits in-data across several Galaxy-tool instances and merges their results into out-data]
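The split, parallel run, and merge pattern on this slide can be sketched outside Swift as well. The Python example below is a rough analogue under stated assumptions, not the Swift-Galaxy wrapper itself: `run_tool` is a hypothetical stand-in for one Galaxy tool invocation, and a local thread pool stands in for the remote resources Swift would use.

```python
from concurrent.futures import ThreadPoolExecutor


def run_tool(chunk):
    """Hypothetical stand-in for one Galaxy tool invocation on one chunk."""
    return [item.upper() for item in chunk]


def split(data, n_chunks):
    """Split `data` into at most `n_chunks` contiguous, near-equal chunks."""
    size, extra = divmod(len(data), n_chunks)
    chunks, start = [], 0
    for i in range(n_chunks):
        end = start + size + (1 if i < extra else 0)
        if end > start:  # skip empty chunks when data is short
            chunks.append(data[start:end])
        start = end
    return chunks


def foreach_parallel(data, n_chunks=4):
    """Run `run_tool` on every chunk concurrently, then merge in input order."""
    chunks = split(data, n_chunks)
    with ThreadPoolExecutor(max_workers=n_chunks) as pool:
        results = list(pool.map(run_tool, chunks))  # map preserves chunk order
    return [item for part in results for item in part]
```

Because `pool.map` returns results in submission order, the merge step is a plain concatenation. A real wrapper would also have to deal with per-chunk files and failures, which this sketch ignores.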

  12. Cloud Interfaces
      • Galaxy instances running on cloud nodes are already taking advantage of cloud-based resources
      • Swift's coasters mechanism can farm resources and combine multiple cloud and non-cloud resources in a single application run

  13. Data Management
      • Both Galaxy and Swift offer various data management capabilities
      • Galaxy offers remote data uploading and viewing capabilities
      • Swift allows disk-resident data to be operated upon as program variables
      • Swift's data providers are interfaced with various data management protocols and can manage data movement at runtime

  14. Evaluation Application: Inference analysis for power prices
      [Diagram labels: generate sample, samples, batches, batch size, candidate solution, lower bound, upper bound, variance & mean]

  15. Swift Script for Inference Analysis

          import "mappings";
          import "apps";

          type file;

          int nS[] = [10, 100, 1000, 10000, 100000];

          // One independent branch per sample size S
          foreach S, idxs in nS {
              sample0 = gensample(S, wind_data);
              obj[idxs] = ampl(sample0);
              // Batch counts 10, 20, 30, 40
              foreach B, idxb in [10:40:10] {
                  foreach k in [0:B] {
                      sample1 = gensample(S, wind_data);
                      obj_l[idxs][idxb][k] = ampl_L(sample1);
                      sample2 = gensample(S, wind_data);
                      obj_u[idxs][idxb][k] = ampl_U(sample2, obj[idxs]);
                  }
              }
          }
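To get a feel for how much independent work the nested `foreach` loops in this script expose, the Python sketch below enumerates one entry per app invocation. It assumes that Swift's `[begin:end:step]` ranges include both endpoints; `inclusive_range` and `enumerate_tasks` are hypothetical helpers written for this illustration, and no sampling or optimization is actually performed.

```python
def inclusive_range(begin, end, step=1):
    """Mimic Swift's [begin:end:step], assumed here to include `end`."""
    return range(begin, end + 1, step)


def enumerate_tasks(sample_sizes, batch_counts):
    """Yield one (app, S, B, k) tuple per app invocation in the script."""
    for S in sample_sizes:
        yield ("gensample", S, None, None)   # sample0
        yield ("ampl", S, None, None)        # obj[idxs]
        for B in batch_counts:
            for k in inclusive_range(0, B):
                yield ("gensample", S, B, k)  # sample1
                yield ("ampl_L", S, B, k)     # obj_l[idxs][idxb][k]
                yield ("gensample", S, B, k)  # sample2
                yield ("ampl_U", S, B, k)     # obj_u[idxs][idxb][k]


nS = [10, 100, 1000, 10000, 100000]
tasks = list(enumerate_tasks(nS, inclusive_range(10, 40, 10)))
print(len(tasks))
```

Under the inclusive-range assumption this enumerates 2090 invocations. In the script, each `ampl_U` call takes `obj[idxs]` as an input and so waits for the `ampl` result of the same sample size, while the `ampl_L` calls depend only on their own samples.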

  16. Summary
      • Swift-Galaxy integration improves science gateways:
          • User control
          • Structured distributed computing
          • Simple
          • Interactive
      • Commonalities in the basic execution models of Galaxy and Swift lead to many possible integration schemes
      • Broadly, Swift acts as the back-end manager while Galaxy is the front end for operations
      • An example of combining command-line and GUI-based frameworks

  17. Future Work
      • A generic approach for each of the integration schemes
      • Wider application adaptation
      • Finer and broader exposure of configuration options to users
      • Interactive monitoring features
      • Authentication features, Globus-based identity management

  18. Acknowledgements
      • This work was supported in part by the NIH through the NHLBI grant The Cardiovascular Research Grid (R24HL085343) and by the U.S. Department of Energy under contract DE-AC02-06CH11357.
      • We are grateful to Amazon, Inc., for an award of Amazon Web Services time that facilitated early experiments.
      • Colleagues in the Swift and Globus groups at the MCS Division, Argonne National Laboratory

  19. Thank you! Visit swift-lang.org for more information about the Swift parallel scripting framework.
