Bioinformatics Workflows Chris Wroe (based on material from the myGrid team & May Tassabehji / Hannah Tipney Medical Genetics, St Marys)

Bioinformatics pipelines on the web
• Copying and pasting from one web-based application to another; annotation by hand
• Advantages: quick, easy access to distributed resources
• Disadvantages: time consuming, error prone; the procedure is tacit, so both protocol and results are difficult to share
[Diagram: RepeatMasker, BLASTn, Twinscan]

Automating pipelines
• Using Perl/Matlab scripts to implement a pipeline
• Advantages: automation, quick to write, significant community resources (e.g. BioPerl)
• Disadvantages: hard to explain, hard to relocate, hard to tinker with

Workflows
[Diagram: Sequence in → RepeatMasker web service → BLASTn web service → Twinscan web service → Predicted genes out]
• Simple scripting language aims to specify how the steps of a pipeline link together
• High-level picture of the pipeline separated from any low-level fiddling
• Application logic and low-level fiddling encapsulated in remote web services
• Advantages: automation, quick to write, easier to explain, share, relocate, and record provenance of results in a standard way
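
The separation described on this slide can be sketched in a few lines of Python. This is only an illustration, not Scufl or Taverna: the three functions are hypothetical local stand-ins for the remote RepeatMasker, BLASTn and Twinscan web services, and the names `SERVICES`, `WORKFLOW` and `run_workflow` are assumptions made for this example. The point is that the workflow itself is just an ordered list of step names, while the application logic lives behind each service, and provenance is recorded uniformly for every step.

```python
from typing import Callable, Dict, List, Tuple

Service = Callable[[str], str]

# Hypothetical stand-ins for remote web services (not the real tools).
def mask_repeats(sequence: str) -> str:
    # Pretend lowercase bases are repeats and mask them with N.
    return "".join("N" if base.islower() else base for base in sequence)

def similarity_search(sequence: str) -> str:
    # Placeholder: passes the sequence through unchanged.
    return sequence

def predict_genes(sequence: str) -> str:
    # Placeholder: reports the longest unmasked stretch as the "gene".
    stretches = [s for s in sequence.split("N") if s]
    return max(stretches, key=len) if stretches else ""

# Low-level detail: which implementation sits behind each step name.
SERVICES: Dict[str, Service] = {
    "RepeatMasker": mask_repeats,
    "BLASTn": similarity_search,
    "Twinscan": predict_genes,
}

# High-level picture: only the order in which the steps link together.
WORKFLOW: List[str] = ["RepeatMasker", "BLASTn", "Twinscan"]

def run_workflow(
    steps: List[str], services: Dict[str, Service], data: str
) -> Tuple[str, List[Tuple[str, str, str]]]:
    """Run each step in order and record (step, input, output) as provenance."""
    provenance: List[Tuple[str, str, str]] = []
    for name in steps:
        result = services[name](data)
        provenance.append((name, data, result))
        data = result
    return data, provenance

if __name__ == "__main__":
    genes, log = run_workflow(WORKFLOW, SERVICES, "ACGTacgtGGCCA")
    print(genes)
    for step, step_input, step_output in log:
        print(f"{step}: {step_input} -> {step_output}")
```

Relocating or sharing such a pipeline then means passing on the step list; swapping a service implementation does not change the high-level description.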

Workflow components in myGrid
• Scufl – Simple Conceptual Unified Flow Language
  • Developed by myGrid members at EBI
  • Designed to be as simple as possible, with just enough features to support bioinformatics workflows
• Taverna – a tool for writing and running workflows and examining results (http://taverna.sourceforge.net)
• FreeFluo – workflow engine to run workflows (http://freefluo.sourceforge.net)

Workflow use
• Newcastle University (Anil Wipat, Peter Li)
  • Affymetrix Microarray Analysis Workflow
  • Gene annotation workflow
• Manchester University (May Tassabehji and PhD student Hannah Tipney, Medical Genetics, St Marys; Wellcome Trust funded)
  • Gene alerting service workflow (GAS)
  • Gene and protein annotation workflow
• And others

Workflow experience: positives
• Easy to get started with Taverna (1-2 hour tutorial)
• Sharing does happen
• Cuts the time taken to perform one pipeline from 2 weeks to 2 hours

Workflow experience: outstanding issues
• Early days: web services are rare; significant time is taken to wrap applications as web services (licensing, installation, maintenance)
  • Soaplab and Gowlab try to help (http://industry.ebi.ac.uk/soaplab)
• Fiddly bits don't go away: many 'shim' services are needed to ensure the output of one step fits the expected input of another
• Automation produces many results in a short amount of time, raising issues of result management and display
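
As a rough illustration of the 'shim' services mentioned on this slide, the sketch below assumes a case where one step emits FASTA-formatted text and the next expects a bare sequence string (and the reverse). The function names and the choice of FASTA are assumptions made for this example, not services from myGrid.

```python
def fasta_to_plain_sequence(fasta_text: str) -> str:
    """Shim: drop FASTA header lines and join the sequence lines."""
    sequence_lines = []
    for line in fasta_text.splitlines():
        line = line.strip()
        if not line or line.startswith(">"):
            continue  # skip blank lines and '>' header lines
        sequence_lines.append(line)
    return "".join(sequence_lines)


def plain_sequence_to_fasta(identifier: str, sequence: str, width: int = 60) -> str:
    """Shim: wrap a bare sequence as a single FASTA record."""
    chunks = [sequence[i:i + width] for i in range(0, len(sequence), width)]
    return "\n".join([f">{identifier}"] + chunks) + "\n"


if __name__ == "__main__":
    record = ">seq1 example\nACGT\nGGCC\n"
    plain = fasta_to_plain_sequence(record)
    print(plain)
    print(plain_sequence_to_fasta("seq1", plain, width=4))
```

Each shim is trivial on its own; the cost noted on the slide comes from how many of them a realistic workflow needs between steps.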

Other workflow systems
• Commercial bioinformatics – drug discovery
  • Incogen VIBE
  • TurboWorx
  • Pipeline Pilot
• eScience
  • DiscoveryNet (bioinformatics – proprietary)
  • Kepler (US, ecology)
  • Triana (UK, physics: astronomy, signal processing)

Workflow standards
• Can't have enough of them! All currently come from the e-Business community rather than the science community
  • BPEL – Business Process Execution Language
  • WS – Orchestration
  • XML Process Definition Language (XPDL)
  • Business Process Markup Language (BPML)