Workflow Based in silico Experiments: The myGrid Platform, a tame techy’s view. Carole Goble, The University of Manchester, UK. http://www.mygrid.org.uk Acknowledgements to all the members of the myGrid consortium. UK eScience Program Pilot.
Workflow Based in silico Experiments: The myGrid Platform, a tame techy’s view
Carole Goble, The University of Manchester, UK
http://www.mygrid.org.uk
Acknowledgements to all the members of the myGrid consortium
NIBHI launch 6 June 2005
UK eScience Program Pilot: £3.2 million, Oct 2001-June 2005, plus follow-on projects. Plus other contributors to Taverna: http://taverna.sf.net
Global Bioinformatics
• Global science = global publication + global sharing + global interoperation
• 100s of applications, globally distributed and heterogeneous, often without APIs
• Bioinformatics experiments chain applications and resources together
Williams-Beuren Syndrome
[Figure labels: ~1.5 Mb; 7q11.23; Patient deletions; WBS; SVAS; CTA-315H11; CTB-51J22; Physical Map; ‘Gap’; Chr 7; ~155 Mb]
• Contiguous sporadic gene deletion disorder
• 1/20,000 live births, caused by unequal crossover (homologous recombination) during meiosis
• Haploinsufficiency of the region results in the phenotype
12181 acatttctac caacagtgga tgaggttgtt ggtctatgtt ctcaccaaat ttggtgttgt
12241 cagtctttta aattttaacc tttagagaag agtcatacag tcaatagcct tttttagctt
12301 gaccatccta atagatacac agtggtgtct cactgtgatt ttaatttgca ttttcctgct
12361 gactaattat gttgagcttg ttaccattta gacaacttca ttagagaagt gtctaatatt
12421 taggtgactt gcctgttttt ttttaattgg gatcttaatt tttttaaatt attgatttgt
12481 aggagctatt tatatattct ggatacaagt tctttatcag atacacagtt tgtgactatt
12541 ttcttataag tctgtggttt ttatattaat gtttttattg atgactgttt tttacaattg
12601 tggttaagta tacatgacat aaaacggatt atcttaacca ttttaaaatg taaaattcga
12661 tggcattaag tacatccaca atattgtgca actatcacca ctatcatact ccaaaagggc
12721 atccaatacc cattaagctg tcactcccca atctcccatt ttcccacccc tgacaatcaa
12781 taacccattt tctgtctcta tggatttgcc tgttctggat attcatatta atagaatcaa
• Identify new, overlapping sequence of interest
• Characterise the new sequence at nucleotide and amino acid level
Cutting and pasting between numerous web-based services, e.g. BLAST, InterProScan etc.
Bioinformatics pipelines on the web
• Copying and pasting from one web-based application to another, with annotation by hand
• Advantages: quick, easy access to distributed resources
• Disadvantages: time consuming, error prone, tacit procedure so difficult to share both protocol and results
[Figure labels: RepeatMasker; BLASTn; Twinscan]
Workflows
[Figure: Sequence in → RepeatMasker Web Service → BLASTn Web Service → Twinscan Web Service → Predicted genes out]
• Simple scripting language specifies how steps of a pipeline link together
• High level picture of the pipeline separated from low level fiddling
• Application logic and low level fiddling encapsulated in remote web services
• Advantages: automation, quick to write, easier to explain, share, relocate, and record provenance of results in a standard way
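The idea of linking pipeline steps and recording provenance can be sketched in a few lines of Python. This is an illustration only, not Scufl or Taverna: the three step functions are hypothetical local stand-ins for the remote web services, and the provenance record layout is an assumption.

```python
from typing import Callable, Dict, List, Tuple

# A workflow is an ordered list of (step name, service) pairs.
Step = Tuple[str, Callable[[str], str]]


def repeat_masker(seq: str) -> str:
    # Placeholder only: a real step would call a remote RepeatMasker service.
    return seq.replace("tttt", "NNNN")


def blastn(seq: str) -> str:
    # Placeholder only: passes the sequence through unchanged.
    return seq


def twinscan(seq: str) -> str:
    # Placeholder only: reports the input length instead of predicted genes.
    return f"predicted-genes(len={len(seq)})"


def run_workflow(steps: List[Step], data: str) -> Tuple[str, List[Dict[str, str]]]:
    """Run each step on the previous step's output and log what happened."""
    provenance: List[Dict[str, str]] = []
    for name, service in steps:
        output = service(data)
        provenance.append({"step": name, "input": data, "output": output})
        data = output
    return data, provenance


# The high-level picture: which steps exist and how they link together.
WORKFLOW: List[Step] = [
    ("RepeatMasker", repeat_masker),
    ("BLASTn", blastn),
    ("Twinscan", twinscan),
]

result, log = run_workflow(WORKFLOW, "acgttttacg")
for entry in log:
    print(entry["step"], "->", entry["output"])
```

The list of steps is kept apart from the code inside each step, which mirrors the separation of the high level picture from the low level fiddling.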
Workflows: http://taverna.sourceforge.net/
• Scufl: Simple Conceptual Unified Flow Language
• Taverna: writing, running workflows & examining results
• Freefluo: workflow engine to run workflows
• SOAPLAB: makes applications available
[Figure labels: SOAPLAB Web Service, Any Application; Web Service, e.g. DDBJ BLAST; SeqHound Service, Special processor]
[Figure labels: Soaplab Service; WSDL Web Service; BioMOBY Service; Local Java Service]
Components designed to work together
[Figure labels: Scufl Workflows + Taverna Workflow Workbench; LSID; OGSA-Distributed Query Processing; mIR; Results management; Portal & Application tools; Metadata & provenance management using semantics; e-Science process patterns; e-Science mediator; KAVE; e-Science coordination; e-Science events; Text Mining Services; Publication and Discovery using semantics; myGrid information model; Feta; Service management; Ontology; Notification service; Termino; Pedro]
Remarks
• Tools for the individual scientist
• Cuts down the time taken to perform one WBS pipeline from 2 weeks to 2 hours
• Systematic records of result provenance and experimental methods
• Faster, automated, more systematic and doesn’t get bored
• Reuse of shared workflows & services
• Data intensive -> compute intensive
Remarks
• Deal with many legacy remote services
• Incorporating my service
• Jam today: immediate benefits for bioinformaticians and service providers, but added value with added effort
• Easy to get started (1-2 hours tutorial)
• Change of work practice
• Careful integration & coordination and presentation
Graves’ Disease
Autoimmune disease of the thyroid in which the immune system of an individual attacks cells in the thyroid gland, resulting in hyperthyroidism
• Gene annotation pipelines
• Affymetrix microarray analysis pipelines
• Find differentially expressed genes, e.g. NF-kappa B inhibitor protein
Value-added protein identification; Genome-focused protein identification
[Figure labels: GO services; DQP web service; Pepmapper: Web service for protein identification; Auxiliary services; xml format → fasta format]
Query shown: select p.Name, p.Seq from p in db_proteinSequences where p.OS='HomoSapiens';
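The "xml format → fasta format" auxiliary step can be sketched as follows. This is a minimal illustration under stated assumptions: the XML element names (proteins, protein, name, seq) are hypothetical, chosen to echo the p.Name and p.Seq fields in the query, and are not the actual output schema of the DQP web service.

```python
import xml.etree.ElementTree as ET


def xml_to_fasta(xml_text: str, width: int = 60) -> str:
    """Convert <protein><name/><seq/></protein> records to FASTA text.

    The element names are assumptions made for this sketch.
    """
    root = ET.fromstring(xml_text)
    lines = []
    for protein in root.iter("protein"):
        name = protein.findtext("name", default="unnamed").strip()
        # Remove any whitespace embedded in the sequence text.
        seq = "".join(protein.findtext("seq", default="").split())
        lines.append(f">{name}")
        # FASTA bodies are conventionally wrapped at a fixed line width.
        lines.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(lines) + "\n"


# Hypothetical sample input with a single made-up record.
SAMPLE_XML = (
    "<proteins>"
    "<protein><name>P1</name><seq>MKTAYIAK</seq></protein>"
    "</proteins>"
)

print(xml_to_fasta(SAMPLE_XML, width=4), end="")
```

Small format converters like this are what let the output of one service feed the input of the next without manual editing.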
Pancreatic cancer
Examination of genes overexpressed in pancreatic cancer, by Mark Fortner, Lexicon Genetics
[Figure with numbered callouts 1-6]
Integrating across scales: Orchestrating models
• MIAS-Grid
• PsyGrid
• Computational steerage of heart simulation codes
Open… …for Systems Biology
• Source
• Domain services and resources
• Community
• Application
• Data types
• Architecture based in web services
We observe that
• Workflows support informational science and aid in the generation of the knowledge ecology…
• …by gathering and coordinating remote and local services and applications…
• …and recording and sharing know-how…but
• …there are serious technical challenges
  • gathering and coordinating results and metadata
  • security and authorisation
  • user interaction
  • different kinds of data and processes
  • easy deployment …
• Users, service providers and IT providers in partnership
  • activation energy, legacy and relevance…
Partnerships
• Biologists, Bioinformaticians, Clinicians
• Service Providers
• Middleware developers