
BioMashups: The New World of Exploratory Bioinformatics?

BioMashups: The New World of Exploratory Bioinformatics? Jiro Sumitomo, James M. Hogan, Felicity Newell, Paul Roe. Microsoft QUT eResearch Centre. j.hogan@qut.edu.au. An Agenda: Bioinformatics; Tools, Data, and linking them together; Exploration vs. Routine Workflow





Presentation Transcript


  1. BioMashups: The New World of Exploratory Bioinformatics? Jiro Sumitomo, James M. Hogan, Felicity Newell, Paul Roe Microsoft QUT eResearch Centre j.hogan@qut.edu.au

  2. An Agenda • Bioinformatics • Tools, Data, and linking them together • Exploration vs. Routine Workflow • Mashups and BioMashups • Some basics and some canonical examples • BioMashups and their limitations • Predictin’ the future

  3. Bioinformatics • Abundance of tools and data sources • Traditional standalone applications • Interactive web sites • (More recently) web service hooks • Usually purpose-specific tools • Link together to solve complex problems

  4. Linking tools together • The workflow trade-off: sophistication vs. development effort • Keep it simple, and keep the scientist involved • Make it complex, and make the scientist a client • Bench scientists usually aren’t software engineers • But they can chain operations together if they have the right primitives and the right glue

  5. Extremes of Scientific Workflow • The manual data management system, also known as cut-and-paste from Excel: cannot scale, but it presents no barriers… • Robust workflow systems (Taverna, Kepler et al.): essential for high-end instrumentation, well-engineered, with support for provenance • But significant set-up and familiarisation…

  6. The Middle Ground… • Scripting in Perl, Python et al. • Significant programming skills needed • Useful for well-defined processes, but exploratory work is time-consuming • Accessing remote data and linking web services is beyond most scientists • [A niche for BioMashups?]

  7. Mashups • Mashups are web-based applications that combine data sources and services • The earliest mashups used JavaScript to link exposed service and data APIs, and to wrap existing tools • The same issues as Perl scripting, plus the need to organise hosting • Little incentive to standardise or share

  8. Mashup Frameworks • Development environments, hosting and publication • Common interface structure • Building a community? • Scripting for scientists? • Overcoming the programming barrier depends on the libraries and primitive operations • And there is (usually) JavaScript under the hood

  9. Some of the players…

  10. Mashups & Data • Mashups are limited by data exchange • Good at passing an index to the data (think latitude & longitude) • Bad at passing massive data sets around • Client mashup architecture
[Diagram: client mashup architecture, labelled Third Party Services, Mashup Server, Mashup and Client web browser, with Facebook and Virtual Earth given as examples]
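The "index to the data" point can be made concrete with a small sketch. In it, blocks pass only a short key, and each service resolves that key on its own. The two services are hypothetical in-memory stand-ins for remote web APIs. The keys (`ACC1`, `ACC2`) and values are placeholders, not real records, and the helper names `lookup` and `mashup` are illustrative rather than taken from any mashup framework.

```python
# Sketch: blocks exchange only a short key (an "index to the data"), and each
# service resolves that key itself. The services are hypothetical in-memory
# stand-ins for remote web APIs; keys and values are placeholders, not data.
from typing import Callable, Dict, List, Mapping

NAME_SERVICE: Dict[str, str] = {"ACC1": "protein A", "ACC2": "protein B"}
LENGTH_SERVICE: Dict[str, int] = {"ACC1": 120, "ACC2": 348}


def lookup(service: Mapping[str, object]) -> Callable[[str], object]:
    """Wrap a keyed service as a block that resolves one key on demand."""
    def block(key: str) -> object:
        return service.get(key)  # None if the service has no record for this key
    return block


def mashup(keys: List[str], blocks: Dict[str, Callable[[str], object]]) -> List[dict]:
    """Join the outputs of several blocks using only the shared key."""
    rows = []
    for key in keys:
        row = {"key": key}
        for field, block in blocks.items():
            row[field] = block(key)
        rows.append(row)
    return rows


rows = mashup(["ACC1", "ACC2"],
              {"name": lookup(NAME_SERVICE), "length": lookup(LENGTH_SERVICE)})
print(rows[0])
```

Only the key crosses block boundaries. A block that needs bulk data must fetch that data itself, and this is where the slide says mashups struggle.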

  11. BioMashups • Middle ground between cut-and-paste and full workflow management systems • Corresponds best to Perl scripting • Ideal when user intervention is needed • May be seen as a prototype for a workflow • Helps to mask complex data access and search tools, which frustrate experts and drive students to exasperation…

  12. SDLM1 • Perform a blastx search on the sequence. • Obtain the best hit(s) by inspecting the BLAST output page. • Retrieve the GenBank record of the best hit by clicking its link in the output page. • Determine the known regions by inspection (in this case, an ANF_receptor region). • Perform an Entrez search on this region.

  13. The New UG Biology: SDLM1 • Perform a blastx search on the sequence (NCBI Blast block). • Obtain the best hit(s) by inspecting the BLAST output page (NCBI Blast result parser block). • Retrieve the GenBank record of the best hit by clicking its link in the output page (RDF block, pointing to Bio2Rdf). • Determine the known regions by inspection, in this case an ANF_receptor region (the mashup parses the RDF document instead: Bio2Rdf block). • Perform an Entrez search on this region (NCBI Entrez block).
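The step "obtain the best hit by inspection" is the one that a parser block automates. The sketch below shows one way such a block could work, with a substitution: it reads BLAST+ tabular output (`-outfmt 6`, default 12 columns) rather than the NCBI web result page that the original block parsed. It also assumes a ranking by bitscore and then E-value. The function names and the sample hits are made up for illustration.

```python
# Sketch of a "BLAST result parser block": pick the best hit(s) from BLAST+
# tabular output (-outfmt 6, default 12 columns). Parsing tabular output is a
# swapped-in assumption; the original block parsed NCBI's web result page.
from dataclasses import dataclass
from typing import List

FIELDS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
          "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


@dataclass
class Hit:
    query: str
    subject: str
    pident: float
    evalue: float
    bitscore: float


def parse_tabular(text: str) -> List[Hit]:
    """Turn tab-separated BLAST lines into Hit records."""
    hits = []
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines (as in -outfmt 7)
        cols = line.split("\t")
        if len(cols) < len(FIELDS):
            raise ValueError(f"expected {len(FIELDS)} columns, got {len(cols)}")
        rec = dict(zip(FIELDS, cols))
        hits.append(Hit(rec["qseqid"], rec["sseqid"], float(rec["pident"]),
                        float(rec["evalue"]), float(rec["bitscore"])))
    return hits


def best_hits(hits: List[Hit], n: int = 1) -> List[Hit]:
    """Stand-in for 'inspection': highest bitscore first, ties broken by E-value."""
    return sorted(hits, key=lambda h: (-h.bitscore, h.evalue))[:n]


# Made-up sample lines for illustration only (not real BLAST results).
sample = "\n".join([
    "query1\tsubjA\t45.2\t300\t150\t5\t1\t900\t10\t310\t1e-40\t150.0",
    "query1\tsubjB\t62.8\t310\t110\t3\t1\t930\t5\t315\t1e-80\t290.5",
])
top = best_hits(parse_tabular(sample))
print(top[0].subject)
```

The selected subject ID would then be passed to the next block, which retrieves the record. This matches the index-passing style that mashups handle well.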

  14. Case Study: Analysing Proteins • Protein characteristics: name, sequence; journal articles, cross-references • Protein prediction: molecular weight, isoelectric point; secondary structure, post-translational modifications
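Molecular weight is the simplest of the prediction properties listed on this slide. The mashup obtained its predictions from web services. As a local stand-in, the sketch below sums average residue masses and adds one water molecule for the free termini. The mass table uses commonly tabulated average values and should be treated as an assumption; tools differ slightly in the values they use.

```python
# Sketch: average molecular weight of a protein from its sequence.
# Residue masses (Da) are commonly tabulated average values (an assumption);
# a peptide's mass is the sum of its residue masses plus one water.
AVERAGE_RESIDUE_MASS = {
    "A": 71.0788, "R": 156.1875, "N": 114.1038, "D": 115.0886,
    "C": 103.1388, "E": 129.1155, "Q": 128.1307, "G": 57.0519,
    "H": 137.1411, "I": 113.1594, "L": 113.1594, "K": 128.1741,
    "M": 131.1926, "F": 147.1766, "P": 97.1167, "S": 87.0782,
    "T": 101.1051, "W": 186.2132, "Y": 163.1760, "V": 99.1326,
}
WATER = 18.01524


def molecular_weight(seq: str) -> float:
    """Return the average molecular weight (Da) of a protein sequence."""
    seq = "".join(seq.split()).upper()  # drop whitespace and newlines
    if not seq:
        raise ValueError("empty sequence")
    unknown = set(seq) - set(AVERAGE_RESIDUE_MASS)
    if unknown:
        raise ValueError(f"unsupported residue codes: {sorted(unknown)}")
    return sum(AVERAGE_RESIDUE_MASS[aa] for aa in seq) + WATER


print(round(molecular_weight("GG"), 2))  # diglycine
```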

  15. Data & Services

  16. Mashup Architecture • 13 custom blocks in three groups: (1) input and output; (2) processing: protein characteristics; (3) processing: protein prediction
[Diagram labels: Input, Protein Characteristics, Protein Prediction, Combine, Output]
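The block groups on this slide suggest a simple dataflow. The sketch below assumes that the input feeds two parallel branches (characteristics and prediction), whose results are merged by a combine step before output. This wiring is inferred from the diagram labels, not confirmed by the slide. Every block is a hypothetical placeholder for the framework's custom blocks.

```python
# Sketch of an assumed Input -> {Characteristics, Prediction} -> Combine -> Output
# flow. All blocks are hypothetical placeholders, not the original custom blocks.
from typing import Callable, Dict, List

Data = Dict[str, object]
Block = Callable[[Data], Data]


def input_block(raw: str) -> Data:
    """Normalise user input, assumed here to be a bare protein sequence."""
    return {"sequence": "".join(raw.split()).upper()}


def characteristics_block(data: Data) -> Data:
    # Placeholder for blocks that would look up name, articles, cross-references.
    return {"name": data.get("name", "unknown")}


def prediction_block(data: Data) -> Data:
    # Placeholder for blocks that would call prediction services (MW, pI, ...).
    return {"length": len(str(data["sequence"]))}


def combine_block(data: Data, parts: List[Data]) -> Data:
    """Merge branch outputs into the input record; later parts win on clashes."""
    merged = dict(data)
    for part in parts:
        merged.update(part)
    return merged


def output_block(result: Data) -> str:
    """Render the combined record as sorted 'key: value' lines."""
    return "\n".join(f"{k}: {v}" for k, v in sorted(result.items()))


def run(raw: str, branches: List[Block]) -> str:
    data = input_block(raw)
    return output_block(combine_block(data, [branch(data) for branch in branches]))


report = run("mkt ayi", [characteristics_block, prediction_block])
print(report)
```

Each branch sees the same input record and knows nothing about the others. That independence is what lets a user add or swap blocks without touching the rest of the mashup.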

  17. BioMashups for Proteins • Given its UniProt ID, how much can we find out about a particular protein?
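The first step from a UniProt ID is to get the record and pull out its basic characteristics. The sketch makes two assumptions. First, the record has already been retrieved in FASTA format; the URL pattern shown is for the current UniProt REST service, which post-dates this talk, and nothing is fetched here. Second, it expects the standard UniProtKB header layout (`db|accession|entry_name description OS=... OX=...`). The sample record is hypothetical.

```python
# Sketch: extract basic characteristics from a UniProtKB FASTA record.
# The REST URL pattern and the sample record are assumptions for illustration;
# no network access is performed here.
import re


def uniprot_fasta_url(accession: str) -> str:
    """Build a FASTA URL for the (current) UniProt REST service."""
    return f"https://rest.uniprot.org/uniprotkb/{accession}.fasta"


def parse_uniprot_fasta(text: str) -> dict:
    """Split a single-record UniProtKB FASTA entry into named fields."""
    lines = [ln.strip() for ln in text.strip().splitlines() if ln.strip()]
    if not lines or not lines[0].startswith(">"):
        raise ValueError("not a FASTA record")
    ids, _, description = lines[0][1:].partition(" ")
    db, accession, entry_name = ids.split("|")
    organism = re.search(r"OS=(.*?) OX=", description)
    return {
        "db": db,                                   # "sp" (Swiss-Prot) or "tr" (TrEMBL)
        "accession": accession,
        "entry_name": entry_name,
        "name": description.split(" OS=")[0],       # text before the OS= tag
        "organism": organism.group(1) if organism else None,
        "sequence": "".join(lines[1:]),
    }


# Hypothetical record in UniProtKB header style (not a real entry).
record = parse_uniprot_fasta(
    ">sp|P00000|EXMP_HUMAN Example protein OS=Homo sapiens OX=9606 GN=EXMP PE=1 SV=1\n"
    "MKTAYIAK\nQRQISFVK\n"
)
print(record["name"], len(record["sequence"]))
```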

  18. BioMashups for Proteins • Given its sequence, what properties can we readily obtain from web-based prediction services?
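Isoelectric point shows what a sequence-only prediction service computes behind its interface. The sketch finds the pH at which the Henderson-Hasselbalch net charge of the sequence is zero, using bisection. Because net charge decreases monotonically with pH, bisection is guaranteed to converge. The pKa table is one commonly used set and is an assumption here. Different services use different sets, so their pI predictions differ slightly.

```python
# Sketch: isoelectric point by bisection on Henderson-Hasselbalch net charge.
# The pKa values are one commonly used set (an assumption); tools differ.
POSITIVE_PKA = {"Nterm": 8.6, "K": 10.8, "R": 12.5, "H": 6.5}
NEGATIVE_PKA = {"Cterm": 3.6, "D": 3.9, "E": 4.1, "C": 8.5, "Y": 10.1}


def net_charge(seq: str, ph: float) -> float:
    """Expected net charge of a linear peptide at the given pH."""
    seq = seq.upper()
    counts = {aa: seq.count(aa) for aa in "KRHDECY"}
    counts["Nterm"] = counts["Cterm"] = 1  # one free amino and carboxyl terminus
    positive = sum(counts[g] / (1 + 10 ** (ph - pka)) for g, pka in POSITIVE_PKA.items())
    negative = sum(counts[g] / (1 + 10 ** (pka - ph)) for g, pka in NEGATIVE_PKA.items())
    return positive - negative


def isoelectric_point(seq: str, tol: float = 1e-4) -> float:
    """Bisect on [0, 14]; net charge decreases monotonically with pH."""
    lo, hi = 0.0, 14.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if net_charge(seq, mid) > 0:
            lo = mid  # still positive: the pI lies at a higher pH
        else:
            hi = mid
    return (lo + hi) / 2


print(round(isoelectric_point("ACDEK"), 2))
```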

  19. Predictin’ is difficult… but • Frameworks can and will support: ad hoc exploratory bioinformatics; index-based routine computation; building (enclave) communities • Varying levels of success in allowing: scientist- (and student-) driven mashups; sharing and re-use of components

  20. Predictin’ is difficult… but • It will be a long time before mashup frameworks: are used to process data from high-throughput sequencing machines; process large-scale collections; beat Taverna & Kepler at provenance

  21. Predictin’ is difficult… BUT

  22. Overcoming the barriers… • Building a general BioMashups community: cross-over between frameworks; seeding the community with ‘re-usable’ components and reaching critical mass; the myExperiment BioMashups group • Bringing BioMashups to the curriculum: the new undergraduate biology

  23. Links • MQUTeR Bio & BioMashups: http://www.mquter.qut.edu.au/bio/ and http://www.mquter.qut.edu.au/bio/biomashups.aspx • myExperiment BioMashups Group: http://www.myexperiment.org/groups/99 • Protein Mashups: http://www.mquter.qut.edu.au/bio/ProteinMashupsb[1].wmv and http://www.popfly.com/users/fsn/Protein%20Biomashups%20Summary%20page

  24. Acknowledgements

  25. Questions?
