1 / 36

Bioinformatics Ontology for Automatic Workflow Generation on Web/Grid Services

This paper discusses the role of ontology in automatic workflow generation for bioinformatics on web/grid services, with lessons from our first experience. It explores the use of ontology for representing tacit and explicit knowledge, and the advantages of web services in liberating biological databases and tools. The paper also presents a very simple workflow using web services and proposes the automatic generation of bioinformatics workflows from given input and output data specifications and tasks.

lferguson
Download Presentation

Bioinformatics Ontology for Automatic Workflow Generation on Web/Grid Services

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Bioinformatics Ontology for Automatic Workflow Generation on Web/Grid Services Konagaya Akihiko Project Director Advanced Genome Information Technology Research Group RIKEN Genomic Sciences Center

  2. Contents • Role of Ontology • Web Services for Bioinformatics • Automatics Workflow Generation • Lessons from our First Experience

  3. Role of Ontology

  4. Tacit and Explicit Knowledge We should start from the fact that 'we can know more than we can tell'. Michael Polanyi, “The Tacit Dimension” 1967 Michael Polanyi (1891-1976)    

  5. Rainbow Color How many colors can you see in rainbow?

  6. Ontology for Rainbow Colors RGB Value From 360 nm~400 nm Purple #800080 Indigo #000080 Blue All the colors you can see with your own eyes! #0000FF Green #008000 Yellow #FFFF00 Orange #FF8000 Red to 760 nm~830 nm #FF0000

  7. Which are Purple? #800050 #700080 #600080 #800080 #800060 #800070 #500080

  8. #800050 Blue Element #800060 #800070 Red Element #800080 #700080 #600080 Purple Purple #500080 Representation by Elements and Constructor

  9. Web Services for Bioinformatics

  10. Output Input Task A Task B Task D Task C DB Z DB X DB Y computing computing computing computing Advantages of Web Services • Liberating from the maintenance of biological databases and tools • Scalability of computational resources • High-level application programming interface Web Services Web Services

  11. Very Simple Work Flow UniProt BLAST Search Sequence Hittable UniProt GetEntry Sequences CLUSTAL W Multiple Alignment Tree View Phylogenetic tree

  12. Manual Workflow on Web Apps

  13. Web Service Programming #!/usr/bin/perl use SOAP::Lite; # SOAP API # specify WSDL my $service = SOAP::Lite-> service('http://xml.nig.ac.jp/wsdl/GetEntry.wsdl'); # call web service $result = $service->getXML_DDBJEntry("AB000003"); # print result print $result; http://www.xml.nig.ac.jp/perl.txt

  14. Why don’t we use workflow tools? http://www.cyclonic.org/Taverna_and_myGrid.ppt

  15. Needs Automatic Workflow Generate Tool from Very High Level Specification apply Blastp to UniProt GetEntry from UniProt apply CLUSTALW apply TreeView Workflow for Bioinformatics Web Services Automatics Generation ?

  16. Automatic Generation of Bioinformatics Workflow

  17. Task as Atomic Componentof Workflow Application sample {blastp DAD} Output DataSpecificationsample {ddbjentry,flatfile} Input DataSpecificationsample {aa_sequence,fasta}

  18. Workflow as a Sequence of Tasks Output Input

  19. Input Output Task B Task A Task C Task D Automatic Generation of Workflowfrom Given Input and Output Data Specification and Tasks • Path Finding using Meta Information

  20. Meta Information to Specifythe Functionality of Task Meta Data for Database samples {uniprot} {nt} TASK Meta Information for Command and Options {blastn}{getentry} Meta Data for Input samples {na_sequence,fasta} {aa_sequence,fast} Meta Data for Output sample {ddbjentry,flatfile} {aablastentry,hittable}

  21. V V V id id id V I O Abstract Task N A S S S N N A I O Concrete Task id I : Input Type O: Output Type id Task Hierarchy (is_a) S Homology search BLAST FASTA SSEARCH S : Sequence or Sequence Name V : Various Type N : Nucleoside Sequence A : Amino acid Sequence id : Accession ID E : Database Entry blastn fasta blastx ・・・ rfasta blastp

  22. id S E E id N S Well Known/user defined Task Genome Annotation [glimmer2,blastn,getEntry] glimmer2 blastn getEntry S Task Hierarchy (has_a)

  23. Prototype for ‘Proof of Concept’ • LanguagetuProlog • Java to Prolog • Prolog to Java • Web Service Interface through JAVA API • Task Database • Prolog Clause Database • Optimal Path Finding • Bidirectional Breadth First Search Algorithm

  24. User System Overview Workflow System Server UI Prolog Engine tuProlog Web Service Library (Java) User Specification DDBJ SPBIO Workflow Workflow Library (prolog) Workflow Execution Knowledge Base Result Task Database (1697) Web Service Information (1596) User Data

  25. Screen Snapshot(Workflow Generation Phase)

  26. Screen Snapshot (Workflow Execution Phase)

  27. A.Konagaya ASTRENA-APBioNet Joint Meeting at 22nd APAN Singapore 18 July 2006 P61982[Mus musculus] P61981[Homo sapiens] Q5RC20[Pongo pygmaeus] P61983[Rattus norvegicus] Q5F3W6[Gallus gallus] P68252[Bos taurus] Q6PCG0[Xenopus laevis] Q6NRY9[Xenopus laevis] Q04917[Homo sapiens] P68509[Bos taurus] P68511[Rattus norvegicus] P68510[Mus musculus] Q6UFZ2[Oncorhynchus mykiss] Q6PC29[Brachydanio rerio] Q6UFZ3[Oncorhynchus mykiss] Obtained Phylogenetic Treeby a generated workflowwhen applying to a Human Insulin Sequence

  28. Lessons from our First Experience

  29. Task Database (prototype) Web Service Call DDBJ Blast DDBJ SRS DDBJ GetEntry DDBJ ClustalW SPBIO Blast 453 638 38 62 405 56 45 Format Transformation Data Selection 1697 In Total

  30. Test Set of Specification

  31. Differences of Generated Workflow Meta Data Input Database Output Full Cmds No input Database No output Full Cmds Input No DB Output Partial Cmds X? input No DB No output Partial Cmds X?

  32. blastp blastp Output Input Input Amino Acid Sequence Amino Acid Sequence HitTable UNIPROT DAD Lack of Interoperability Between the Web Services Output HitTable Why Failed?

  33. Very Similar but not the Same Format Blastp for UniProt Blastp for DAD sp|Q8HXV2|INS_PONPY Insulin precursor [Contains: Insulin B chain... 171 4e-43 L15440-1|AAA59179.1| 107|Homo sapiens insulin protein. 177 1e-43

  34. Conclusion • Web Services have great potential to share Bioinformatics Data ant Tools in all over the world • Needs Automatic Workflow Generation Tools to make full use of Web Services • Bioinformatics Ontology is a key to establish Interoperability among Bioinformatics Web Services

  35. Acknowledgement • Daisuke Shinbara Tokyo Institute of Technology (Hitachi, ltd.) • Sumi Yoshikawa RIKEN GSC, TITECH References Akihiko Konagaya: “Bioinformatics Ontology: Towards the Automatics Generation of Bioinformatics Workflow for Web Services,” in Proc. of Distributed Applications, Web Services, Tools and GRID Infrastructures for Bioinformatics (NETTAB2006), S. Margherita di Pula, Italy (http://www.nettab.org/2006/), pp.75-82 (2006) Akihiko Konagaya: “OBIGrid: Towards the 'Ba' for Sharing Resources, Services and Knowledge for Bioinformatics”, in Proc. of Fourth International Workshop on Biomedical Computations on the Grid (BioGrid), Singapore (CCGRID 2006), 37 (2006)

  36. ご静聴ありがとうございました。 Thank You for Listening

More Related