
Workflow Based in silico Experiments: The myGrid Platform, a tame techy’s view

Carole Goble, The University of Manchester, UK. http://www.mygrid.org.uk Acknowledgements to all the members of the myGrid consortium. UK eScience Program Pilot.

Presentation Transcript


  1. Workflow Based in silico Experiments: The myGrid Platform, a tame techy’s view
     Carole Goble, The University of Manchester, UK
     http://www.mygrid.org.uk
     Acknowledgements to all the members of the myGrid consortium
     NIBHI launch, 6 June 2005

  2. UK eScience Program Pilot
     £3.2 million, Oct 2001 to June 2005, plus follow-on projects
     Plus other contributors to Taverna: http://taverna.sf.net

  3. Global Bioinformatics
     • Global science = global publication + global sharing + global interoperation
     • Hundreds of applications, globally distributed and heterogeneous, often without APIs
     • Bioinformatics experiments chain applications and resources together

  4. Williams-Beuren Syndrome
     [Figure: physical map of chromosome 7 (~155 Mb) with the ~1.5 Mb region at 7q11.23; labels: patient deletions, WBS, SVAS, CTA-315H11, CTB-51J22, ‘Gap’]
     • Contiguous sporadic gene deletion disorder
     • 1/20,000 live births, caused by unequal crossover (homologous recombination) during meiosis
     • Haploinsufficiency of the region results in the phenotype

  5. 12181 acatttctac caacagtgga tgaggttgtt ggtctatgtt ctcaccaaat ttggtgttgt
     12241 cagtctttta aattttaacc tttagagaag agtcatacag tcaatagcct tttttagctt
     12301 gaccatccta atagatacac agtggtgtct cactgtgatt ttaatttgca ttttcctgct
     12361 gactaattat gttgagcttg ttaccattta gacaacttca ttagagaagt gtctaatatt
     12421 taggtgactt gcctgttttt ttttaattgg gatcttaatt tttttaaatt attgatttgt
     12481 aggagctatt tatatattct ggatacaagt tctttatcag atacacagtt tgtgactatt
     12541 ttcttataag tctgtggttt ttatattaat gtttttattg atgactgttt tttacaattg
     12601 tggttaagta tacatgacat aaaacggatt atcttaacca ttttaaaatg taaaattcga
     12661 tggcattaag tacatccaca atattgtgca actatcacca ctatcatact ccaaaagggc
     12721 atccaatacc cattaagctg tcactcccca atctcccatt ttcccacccc tgacaatcaa
     12781 taacccattt tctgtctcta tggatttgcc tgttctggat attcatatta atagaatcaa
     • Identify new, overlapping sequence of interest
     • Characterise the new sequence at nucleotide and amino acid level
     Cutting and pasting between numerous web-based services, e.g. BLAST, InterProScan, etc.

  6. Bioinformatics pipelines on the web
     [Figure labels: RepeatMasker, BLASTn, Twinscan]
     • Copying and pasting from one web-based application to another, with annotation by hand
     • Advantages: quick, easy access to distributed resources
     • Disadvantages: time-consuming and error-prone; the procedure is tacit, so both protocol and results are difficult to share

  7. Workflows
     [Figure: sequence in → RepeatMasker web service → BLASTn web service → Twinscan web service → predicted genes out]
     • Simple scripting language specifies how steps of a pipeline link together
     • High-level picture of the pipeline separated from low-level fiddling
     • Application logic and low-level fiddling encapsulated in remote web services
     • Advantages: automation, quick to write, easier to explain, share, relocate, and record provenance of results in a standard way
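
The separation described on this slide, a high-level description of how steps link together kept apart from the logic inside each step, can be illustrated with a minimal Python sketch. This is not Scufl and does not call any real service: `mask_repeats`, `blast_search`, `predict_genes` and `run_workflow` are hypothetical stand-ins invented here, and their behaviour is a toy placeholder for the RepeatMasker, BLASTn and Twinscan web services.

```python
from typing import Callable, List, Tuple

# A workflow step is a name plus a function from input text to output text.
Step = Tuple[str, Callable[[str], str]]


def mask_repeats(seq: str) -> str:
    # Hypothetical stand-in for RepeatMasker: masks a literal "TTTT" run.
    return seq.replace("TTTT", "NNNN")


def blast_search(seq: str) -> str:
    # Hypothetical stand-in for BLASTn: attaches a made-up hit label.
    return f"{seq}|hit=example"


def predict_genes(annotated: str) -> str:
    # Hypothetical stand-in for Twinscan: wraps the input in a marker.
    return f"genes({annotated})"


def run_workflow(steps: List[Step], data: str) -> Tuple[str, List[str]]:
    """Run the steps in order and record which steps ran (toy provenance)."""
    log: List[str] = []
    for name, step in steps:
        data = step(data)
        log.append(name)
    return data, log


# The high-level picture: only the order of the steps, no step internals.
WORKFLOW: List[Step] = [
    ("RepeatMasker", mask_repeats),
    ("BLASTn", blast_search),
    ("Twinscan", predict_genes),
]

result, provenance = run_workflow(WORKFLOW, "ACGTTTTACG")
print(result)
print(provenance)
```

In this sketch, swapping or relocating a step only touches the `WORKFLOW` list, which is the property the slide attributes to workflows; a real engine would invoke remote services and keep much richer provenance.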

  8. Workflows: http://taverna.sourceforge.net/
     • Scufl: Simple Conceptual Unified Flow Language
     • Taverna: writing, running workflows & examining results
     • Freefluo: workflow engine to run workflows
     • SOAPLAB: makes applications available
     [Figure labels: SOAPLAB Web Service (any application); Web Service (e.g. DDBJ BLAST); SeqHound Service (special processor)]

  9. [Figure labels: Soaplab Service; WSDL Web Service; BioMOBY Service; Local Java Service]

  10. Components designed to work together
      [Figure labels: Scufl Workflows + Taverna Workflow Workbench; LSID; OGSA-Distributed Query Processing; mIR; Results management; Portal & Application tools; Metadata & provenance management using semantics; e-Science process patterns; e-Science mediator; KAVE; e-Science coordination; e-Science events; Text Mining Services; Publication and Discovery using semantics; myGrid information model; Feta; Service management; Ontology; Notification service; Termino; Pedro]

  11. Remarks
      • Tools for the individual scientist
      • Cuts down the time taken to perform one WBS pipeline from 2 weeks to 2 hours
      • Systematic records of result provenance and experimental methods
      • Faster, automated, more systematic and doesn’t get bored
      • Reuse of shared workflows & services
      • Data intensive → compute intensive

  12. Remarks
      • Deal with many legacy remote services
      • Incorporating my service
      • Jam today: immediate benefits for bioinformaticians and service providers, but added value with added effort
      • Easy to get started (1-2 hour tutorial)
      • Change of work practice
      • Careful integration, coordination and presentation

  13. Graves’ Disease
      Autoimmune disease of the thyroid in which the immune system of an individual attacks cells in the thyroid gland, resulting in hyperthyroidism
      • Gene annotation pipelines
      • Affymetrix microarray analysis pipelines
      • Find differentially expressed genes, e.g. NF-kappa B inhibitor protein

  14. [Figure labels]
      • GO services
      • select p.Name, p.Seq from p in db_proteinSequences where p.OS='HomoSapiens';
      • Pepmapper: web service for protein identification
      • Auxiliary services
      • DQP web service
      • xml format → fasta format
      • Value-added protein identification
      • Genome-focused protein identification
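
The "xml format → fasta format" step on this slide is the kind of small conversion that sits between services. The slide does not show the XML schema, so the sketch below assumes a hypothetical layout: a `<proteins>` root holding `<protein>` elements with `<Name>` and `<Seq>` children, named after the fields in the query above. The function `xml_to_fasta` and the sample data are illustrative only, not the actual myGrid auxiliary service.

```python
import xml.etree.ElementTree as ET


def xml_to_fasta(xml_text: str, width: int = 60) -> str:
    """Convert protein records in the assumed XML layout to FASTA text."""
    root = ET.fromstring(xml_text.strip())
    lines = []
    for protein in root.iter("protein"):
        name = protein.findtext("Name", default="unnamed").strip()
        # Drop any whitespace inside the sequence before re-wrapping it.
        seq = "".join(protein.findtext("Seq", default="").split())
        lines.append(f">{name}")
        for start in range(0, len(seq), width):
            lines.append(seq[start:start + width])
    return "".join(line + "\n" for line in lines)


# Made-up sample records in the assumed layout (not real query results).
EXAMPLE_XML = """
<proteins>
  <protein><Name>P1</Name><Seq>MKTAYIAKQR</Seq></protein>
  <protein><Name>P2</Name><Seq>GSHML EDP</Seq></protein>
</proteins>
"""

print(xml_to_fasta(EXAMPLE_XML, width=5))
```

Each record becomes a `>` header line followed by the sequence wrapped at `width` characters, which is the shape most sequence tools expect as FASTA input.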

  15. Pancreatic cancer
      Examination of genes overexpressed in pancreatic cancer, by Mark Fortner, Lexicon Genetics
      [Figure: diagram with numbered callouts 1 to 6]

  16. Integrating across scales, orchestrating models
      • MIAS-Grid
      • PsyGrid
      • Computational steerage of heart simulation codes

  17. Open…
      • Source
      • Domain services and resources
      • Community
      • Application
      • Data types
      • Architecture based on web services
      …for Systems Biology

  18. We observe that
      • Workflows support informational science and aid in the generation of the knowledge ecology…
      • …by gathering and coordinating remote and local services and applications…
      • …and recording and sharing know-how…but
      • …there are serious technical challenges
          • gathering and coordinating results and metadata
          • security and authorisation
          • user interaction
          • different kinds of data and processes
          • easy deployment …
      • Users, service providers and IT providers in partnership
          • activation energy, legacy and relevance…

  19. Partnerships
      • Biologists, Bioinformaticians, Clinicians
      • Service Providers
      • Middleware developers
