
Design Principles



Presentation Transcript


  1. Design Principles
     • Separation between components into a modular system:
       • Independent standalone modules that are also runnable programs
         • A collaborator wants to run srf2FastQ at home, without a MetaDB
         • A researcher tries custom parameters but still tracks the run in the MetaDB
       • XML workflows that define jobs and data dependencies
         • Parameterized to reuse workflows on different experiments
         • Based on the DAX standard
       • Execution engine uses the open-source Pegasus project
         • Wraps standard executables, so no modification to your code
         • Supports submission to multiple clusters, including clusters living on EC2 and other clouds
         • Uses Globus to support SGE, PBS, Torque, Condor, LSF
         • Stages data and binaries to the appropriate cluster from whichever cluster has them
         • Manages temporary space and processing environment
           • Creating temp directories, moving input files in, staging and running your program, copying results out
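The temporary-space handling in the last bullet (create a temp directory, move inputs in, run the program, copy results out) can be sketched locally in plain Java. This is only an illustration of the pattern, not Pegasus or SeqWare code: the class `StagingSketch`, the `Step` interface, and the `runStaged` method are hypothetical names, and the "program" is a Java callback rather than a cluster job.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical local illustration of the stage-in / run / stage-out pattern.
class StagingSketch {

    /** Stand-in for "your program": it reads and writes only inside workDir. */
    interface Step {
        void run(Path workDir) throws Exception;
    }

    /**
     * Copies the input into a fresh temp directory, runs the step there,
     * copies the named output into resultsDir, and removes the temp directory.
     */
    static Path runStaged(Path input, String outputName, Path resultsDir, Step step)
            throws Exception {
        Path workDir = Files.createTempDirectory("staging-sketch-");
        try {
            // Stage in: move the input file into the working area.
            Files.copy(input, workDir.resolve(input.getFileName()));
            // Run the program inside the working area.
            step.run(workDir);
            // Stage out: copy the result to its final location.
            Files.createDirectories(resultsDir);
            return Files.copy(workDir.resolve(outputName),
                              resultsDir.resolve(outputName),
                              StandardCopyOption.REPLACE_EXISTING);
        } finally {
            // Clean up the temporary space whether or not the step succeeded.
            deleteRecursively(workDir);
        }
    }

    static void deleteRecursively(Path dir) throws IOException {
        List<Path> paths;
        try (Stream<Path> walk = Files.walk(dir)) {
            // Reverse order so files are deleted before their parent directories.
            paths = walk.sorted(Comparator.reverseOrder()).collect(Collectors.toList());
        }
        for (Path p : paths) {
            Files.delete(p);
        }
    }
}
```

Because the step only ever sees its working directory, the wrapped program needs no knowledge of where its inputs came from or where its results end up, which is the point the slide makes about running unmodified executables.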

  2. Application Wrapper Interface
     • Applications conform to a standard interface
       • Developers and users do not have to understand the rest of the pipeline
     • Forces developers to adhere to best practices
       • Syntax, --help option
       • Required test harness
       • Verification of input, output, parameters

     Java API:

       public interface WrapperInterface {
         int init(); // Optional
         int get_syntax();
         int do_test();
         int do_verify_input();
         int do_verify_parameters();
         int do_run();
         int do_verify_output();
         int clean_up(); // Optional
       }

     Local Execution:

       $ java SeqWareRunner bpostprocess --help
         → Reports get_syntax()
       $ java SeqWareRunner bpostprocess input
         → Run bpostprocess on the command line
       $ java SeqWareRunner bpostprocess --db input
         → Same as above, but without MetaDB feedback
       $ java SeqWareRunner bpostprocess --db input --config=config.txt
       $ java SeqWareRunner bpostprocess --db input -A 0 -n 8
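To make the wrapper interface concrete, here is a minimal sketch of a module that implements it and a driver that calls the steps in sequence. Several things here are assumptions, not taken from the slides: the call order, the convention that 0 means success, and the names `EchoModule`, `LifecycleSketch`, and `runModule`. The real SeqWareRunner may behave differently.

```java
import java.util.ArrayList;
import java.util.List;

// The interface from the slide, declared package-private so the sketch fits in one file.
interface WrapperInterface {
    int init();                  // Optional
    int get_syntax();
    int do_test();
    int do_verify_input();
    int do_verify_parameters();
    int do_run();
    int do_verify_output();
    int clean_up();              // Optional
}

// Hypothetical module (not part of SeqWare): records each call and
// rejects an empty input name during input verification.
class EchoModule implements WrapperInterface {
    final List<String> calls = new ArrayList<>();
    private final String input;

    EchoModule(String input) { this.input = input; }

    private int step(String name, boolean ok) {
        calls.add(name);
        return ok ? 0 : 1;       // assumed convention: 0 = success
    }

    public int init()                 { return step("init", true); }
    public int get_syntax()           { System.out.println("usage: echo <input>"); return step("get_syntax", true); }
    public int do_test()              { return step("do_test", true); }
    public int do_verify_input()      { return step("do_verify_input", input != null && !input.isEmpty()); }
    public int do_verify_parameters() { return step("do_verify_parameters", true); }
    public int do_run()               { System.out.println("echo: " + input); return step("do_run", true); }
    public int do_verify_output()     { return step("do_verify_output", true); }
    public int clean_up()             { return step("clean_up", true); }
}

class LifecycleSketch {
    // Assumed call order: stop at the first non-zero status, but always attempt clean_up.
    static int runModule(WrapperInterface module) {
        int status = module.init();
        if (status == 0) status = module.do_verify_parameters();
        if (status == 0) status = module.do_verify_input();
        if (status == 0) status = module.do_run();
        if (status == 0) status = module.do_verify_output();
        int cleanup = module.clean_up();
        return status != 0 ? status : cleanup;
    }

    public static void main(String[] args) {
        EchoModule module = new EchoModule(args.length > 0 ? args[0] : "reads.fastq");
        System.out.println("exit status: " + runModule(module));
    }
}
```

Since the driver only depends on `WrapperInterface`, any module that implements the eight methods can be run the same way, with or without the rest of the pipeline.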

  3. XML Workflow
     • Follows the DAX standard, which is the input to Pegasus
     • Defines jobs, arguments, configuration, and data dependencies
     • Defines dependencies between jobs
     • Uses Java FreeMarker to populate the XML template for each experiment

       <?xml version="1.0" encoding="UTF-8"?>
       <adag xmlns="http://pegasus.isi.edu/schema/DAX"
             xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
             xsi:schemaLocation="http://pegasus.isi.edu/schema/DAX http://pegasus.isi.edu/schema/dax-2.1.xsd"
             version="2.1" count="1" index="0" name="bfast"
             jobCount="3" fileCount="0" childCount="2">
         <!-- jobs -->
         <job id="ID0000001" namespace="seqware" name="runner" version="0.0.1">
           <argument>bfast matches %{reference_file} %{experiment}.fastq...</argument>
           <profile namespace="globus" key="max_memory">24576</profile>
           <profile namespace="globus" key="count">8</profile>
           <uses file="%{experiment}.fastq" link="input"/>
           <uses file="%{experiment}.bmf" link="output" transfer="false" register="false"/>
         </job>
         <job id="ID0000002" namespace="seqware" name="runner" version="0.0.1">
           <argument>bfast localalign ...</argument>
           <uses file="%{experiment}.bmf" link="input"/>
           <uses file="%{experiment}.baf" link="output" transfer="false" register="false"/>
         </job>
         <job id="ID0000003" namespace="seqware" name="runner" version="0.0.1">
           <argument>bfast postprocess ...</argument>
           <uses file="%{experiment}.bmf" link="input"/>
           <uses file="%{experiment}.bam" link="output" transfer="true" register="true"/>
         </job>
         .....
         <!-- Dependencies -->
         <child ref="ID0000002">
           <parent ref="ID0000001"/>
         </child>
         <child ref="ID0000003">
           <parent ref="ID0000001"/>
           <parent ref="ID0000002"/>
         </child>
       </adag>
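The child/parent entries in the workflow describe a dependency graph, and a job can only start once all of its parents have finished. As a rough illustration of how such a graph yields a run order, the sketch below applies a standard topological sort (Kahn's algorithm) to the three job IDs from the slide. It is not how Pegasus plans or schedules jobs; the class `DaxOrderSketch` and its methods are hypothetical, and the dependency map is typed in by hand rather than parsed from the XML.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration: derive a valid run order from child -> parents entries.
class DaxOrderSketch {

    /**
     * parentsOf maps every job id to the ids it depends on (an empty list for
     * jobs without parents). Every job must appear as a key.
     */
    static List<String> executionOrder(Map<String, List<String>> parentsOf) {
        Map<String, Integer> pending = new LinkedHashMap<>();        // unfinished parents per job
        Map<String, List<String>> childrenOf = new LinkedHashMap<>();
        for (Map.Entry<String, List<String>> e : parentsOf.entrySet()) {
            pending.put(e.getKey(), e.getValue().size());
            for (String parent : e.getValue()) {
                childrenOf.computeIfAbsent(parent, k -> new ArrayList<>()).add(e.getKey());
            }
        }

        // Jobs with no parents are ready immediately.
        Deque<String> ready = new ArrayDeque<>();
        for (Map.Entry<String, Integer> e : pending.entrySet()) {
            if (e.getValue() == 0) ready.add(e.getKey());
        }

        List<String> order = new ArrayList<>();
        while (!ready.isEmpty()) {
            String job = ready.remove();
            order.add(job);
            // Finishing a job releases any child whose last pending parent it was.
            for (String child : childrenOf.getOrDefault(job, List.of())) {
                if (pending.merge(child, -1, Integer::sum) == 0) ready.add(child);
            }
        }
        if (order.size() != pending.size()) {
            throw new IllegalStateException("cycle or unknown parent in job dependencies");
        }
        return order;
    }

    // The dependencies from the slide: ID0000002 needs ID0000001; ID0000003 needs both.
    static Map<String, List<String>> slideExample() {
        Map<String, List<String>> parentsOf = new LinkedHashMap<>();
        parentsOf.put("ID0000003", List.of("ID0000001", "ID0000002"));
        parentsOf.put("ID0000002", List.of("ID0000001"));
        parentsOf.put("ID0000001", List.of());
        return parentsOf;
    }

    public static void main(String[] args) {
        System.out.println(executionOrder(slideExample()));
    }
}
```

For the slide's three jobs there is only one valid order, since each job depends on all earlier ones; with wider graphs, independent jobs become ready at the same time and could be submitted in parallel.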

  4. Pegasus
