Software for the Data-Driven Researcher of the Future



  1. Software for the Data-Driven Researcher of the Future Dr. Paul Fisher Paul.Fisher@manchester.ac.uk http://www.cs.man.ac.uk/~fisherp

  2. What is myGrid? • An e-Science Collaboration Since 2001 • Numerous partners involved: • Manchester • Southampton • Oxford • EMBL-EBI • It provides sustainable and production quality software • Supported by OMII-UK, EPSRC and BBSRC • Mixture of developers, bioinformaticians and researchers Software | Services | Content | Skills | Community

  3. Taverna myGrid Open Suite of Tools Client User Interfaces Workflow Repository Workflow GUI Workbench and 3rd party plug-ins Web Portals Service Catalogue Provenance Store Workflow Server Programming and APIs Activity and Service Plug-in Manager Open Provenance Model Secure Service Access, and Programming APIs

  4. Huge amounts of data Microarray 1000+ Genes Next Gen Sequencing QTL regions 100+ Genes 10,000+ Genes How do I look at ALL the genes systematically?

  5. Issues with current approaches • Scale of analysis task overwhelms researchers – lots of data • User bias and premature filtering of datasets – cherry picking • Hypothesis-Driven approach to data analysis • Constant changes in data - problems with re-analysis of data • Implicit methodologies (hyper-linking through web pages) • Error proliferation from any of the listed issues – notably human error • Solution: Automate

  6. Web Services • Technology and standard for exposing code and data resources by a means that can be consumed remotely by a third party • Describes how to interact with it, e.g. service parameters • Workflows • General technique for describing and executing a process • Describes what you want to do, including the services to use
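The idea of a workflow as a described sequence of service calls can be illustrated with a minimal Python sketch. This is not Taverna's engine or API: the function names (`run_workflow`, `normalise`, `gc_content`) are hypothetical, and the two steps are local stand-ins for what would normally be remote Web Service invocations.

```python
from typing import Callable, Dict, List

# A step reads the shared data dictionary and returns new named outputs.
Step = Callable[[Dict[str, object]], Dict[str, object]]


def run_workflow(steps: List[Step], inputs: Dict[str, object]) -> Dict[str, object]:
    """Run each step in order, merging its outputs into the shared data."""
    data = dict(inputs)
    for step in steps:
        data.update(step(data))
    return data


def normalise(data: Dict[str, object]) -> Dict[str, object]:
    # Hypothetical stand-in for a remote service that cleans a sequence.
    return {"sequence": str(data["raw_sequence"]).strip().upper()}


def gc_content(data: Dict[str, object]) -> Dict[str, object]:
    # Hypothetical stand-in for a remote analysis service.
    seq = str(data["sequence"])
    return {"gc": (seq.count("G") + seq.count("C")) / len(seq)}


result = run_workflow([normalise, gc_content], {"raw_sequence": " atggcc "})
print(result["sequence"], round(result["gc"], 3))
```

The list of steps is the explicit, shareable record of the method, which is the point the slide makes about workflows replacing implicit, hand-driven analysis.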

  7. What kind of Services? WSDL Web Services REST BioMart R-processor BioMoby SoapLab Grid Services Local Java services Beanshell Workflows

  8. Who Provides the Services? • Open domain services and resources • Taverna accesses 3500+ services (11,874 operations) • Third party – we don’t own them – we didn’t build them • All the major providers • NCBI, DDBJ, EBI … • Enforce NO common data model. Can include your own services and resources too !!!

  9. Where can I find these services?

  10. A public centralised and curated registry of Life Science Web Services ‘Web 2.0’-style website and API Allow anyone to register, discover and curate Web Services Community oriented with expert guidance Open content, open source, open platform www.BioCatalogue.org

  11. Workflow diagram Available services Workflow Explorer http://www.taverna.org.uk

  12. What are Workflows used for?

  13. Taverna Taverna first released 2004 Current version Taverna 2.2 Currently 1500+ users per month, 350+ organizations, ~40 countries, 80,000+ downloads across versions Freely available, open source LGPL Windows, Mac OS, and Linux http://www.taverna.org.uk User and developer workshops Documentation Public Mailing list and direct email support

  14. Trypanosomiasis in Africa Steve Kemp Andy Brass + many Others http://www.genomics.liv.ac.uk/tryps/trypsindex.html

  15. Reuse, Recycle, Repurpose Workflows Identify biological pathways implicated in resistance to Trypanosomiasis in cattle using mouse as a model organism. Dr Paul Fisher Dr Jo Pennock Identify the biological pathways involved in colitis and helminth infections in the mouse model DOI: 10.1002/ibd.21326 | PMID: 20687192

  16. Where can I find workflows?

  17. Recycling, Reuse, Repurposing • Share • Search • Re-use • Re-purpose • Execute • Communicate • Record http://www.myexperiment.org/

  18. Taverna Plug-in Bringing myExperiment to the Taverna user

  19. Take a breath….. • myGrid • Taverna • Workflows good for automation • Reduce errors • BioCatalogue • Publicly curated repository of Web Services • myExperiment • Web 2.0 repository supporting Workflow discovery and re-use

  20. Taverna and the ‘Cloud’ + Analysing Next Generation Sequencing Data

  21. Analysing African Cattle with Taverna 2.2 • Different breeds of African Cattle • 10,000 years separation • African Livestock adaptations: • More productive • Increased disease resistance • Potential outcomes: • Food security • Understanding resistance • Understanding environmental • Understanding diversity • http://www.bbc.co.uk/news/10403254

  22. The study • Lots of sites involved in Study: • University of Liverpool • University of Manchester • ILRI (Nairobi)…… • Genetic variation in cattle species • African breeds: N’Dama, Boran and Sahiwal • Resistance to African trypanosomiasis infection (sleeping sickness) • Genetic differences to make one species more resistant? • Potential consequences of those genetic differences? • Which pathways are affected by those changes?

  23. The Analysis Problem • Sequenced DNA from 3 cattle breeds using SOLiD / Illumina • 22 million SNPs for Sahiwal alone • N’Dama, Boran ~ 11 million SNPs each • Large data • Comparing new data with reference genomes • Identifying interesting differences • e.g. non-synonymous SNPs, stop lost, stop gained, splicing regions etc
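The coding consequence categories named above (non-synonymous, stop gained, stop lost) can be sketched by comparing the reference and variant codons under the standard genetic code. This is an illustrative simplification, not the tool used in the study: `classify_snp` is a hypothetical name, and real annotation must also handle strand, transcript structure and splice regions.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order; "*" marks a stop.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): AMINO[i] for i, codon in enumerate(product(BASES, repeat=3))
}


def classify_snp(ref_codon: str, alt_codon: str) -> str:
    """Classify a coding SNP by the amino acids of its two codons."""
    ref_aa = CODON_TABLE[ref_codon.upper()]
    alt_aa = CODON_TABLE[alt_codon.upper()]
    if ref_aa == alt_aa:
        return "synonymous"
    if alt_aa == "*":
        return "stop_gained"
    if ref_aa == "*":
        return "stop_lost"
    return "non_synonymous"


for ref, alt in [("GAA", "GAG"), ("GAA", "GTA"), ("TGG", "TGA"), ("TAA", "CAA")]:
    print(ref, alt, classify_snp(ref, alt))
```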

  24. MAP FILTER ANALYSIS The Analysis Pipeline (in Perl) Input SNP data from sequencer Map between Genome Builds (Liftover) Filter for SNPs in Exons SNP consequences Identifying damaging SNPs (Polyphen) Harry Noyes – University of Liverpool
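The FILTER stage, keeping only SNPs that fall inside exons, can be sketched as an interval lookup. The original pipeline is in Perl and its code is not shown, so this Python version is only an assumption about how such a filter might look: `build_exon_index` and `snps_in_exons` are hypothetical names, coordinates are taken to be 1-based and inclusive, and exons on a chromosome are assumed not to overlap.

```python
from bisect import bisect_right
from typing import Dict, Iterable, List, Tuple

Exon = Tuple[str, int, int]  # (chromosome, start, end), inclusive
Snp = Tuple[str, int]        # (chromosome, position)


def build_exon_index(exons: Iterable[Exon]) -> Dict[str, List[Tuple[int, int]]]:
    """Group exon intervals by chromosome and sort them by start position."""
    index: Dict[str, List[Tuple[int, int]]] = {}
    for chrom, start, end in exons:
        index.setdefault(chrom, []).append((start, end))
    for intervals in index.values():
        intervals.sort()
    return index


def snps_in_exons(snps: Iterable[Snp], index: Dict[str, List[Tuple[int, int]]]) -> List[Snp]:
    """Keep SNPs whose position lies inside an exon (non-overlapping exons assumed)."""
    kept = []
    for chrom, pos in snps:
        intervals = index.get(chrom, [])
        # Locate the last interval whose start is <= pos.
        i = bisect_right(intervals, (pos, float("inf"))) - 1
        if i >= 0 and intervals[i][0] <= pos <= intervals[i][1]:
            kept.append((chrom, pos))
    return kept


index = build_exon_index([("chr1", 100, 200), ("chr1", 300, 400)])
print(snps_in_exons([("chr1", 150), ("chr1", 250), ("chr1", 300), ("chr2", 150)], index))
```

Binary search over sorted intervals keeps each lookup logarithmic in the number of exons, which matters at the scale of millions of SNPs per breed.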

  25. Workflow and phases MSc Student - Mohammad Khodadadi Input SNP file Populate DB with start SNPs and resource version numbers Lift-over: maps between UMD3 and BTA4 cow assemblies Exon positions from Ensembl Find SNPs in Exon regions PolyPhen to mark “dangerous” SNPs The result can be either a MySQL database or TSV / CSV download

  26. Taverna and the ‘Cloud’ +

  27. What we will demonstrate • Uploading Next Generation Sequencing SNP data to the cloud • Creating a new experiment • Running a workflow on multiple cloud instances • Showing result output, including links to annotated SNPs

  28. Demo

  29. Managing and Processing Data

  30. Accessing Taverna on the Cloud

  31. Loading inputs Experiment Metadata Input Provenance Jobs Status Input data summary

  32. Summary of Workflow Output 11 million SNPs for N’Dama Non-synonymous coding SNPs PolyPhen predictions: probably damaging N.B. Numbers vary due to the workflow and PolyPhen filtering process

  33. New Developments in myGrid

  34. Taverna • Taverna 2.2 execution engine • Large data processing • Pausing, resuming and cancelling workflows • Retry and parallelisation layer • Taverna 2.2 server • Remote workflow execution • Workflows launched from web pages • Workflows executed on the cloud Essential for cloud
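What a retry and parallelisation layer does can be sketched generically in Python. This is not Taverna's implementation or configuration: `with_retry` and `run_parallel` are hypothetical helpers that re-invoke a failing call a fixed number of times and fan items out over a thread pool, which suits calls that mostly wait on remote services.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, List, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def with_retry(func: Callable[[T], R], retries: int = 3, delay: float = 0.0) -> Callable[[T], R]:
    """Wrap func so a failing call is attempted up to `retries` times (retries >= 1)."""
    def wrapper(item: T) -> R:
        for attempt in range(1, retries + 1):
            try:
                return func(item)
            except Exception:
                if attempt == retries:
                    raise  # Out of attempts: surface the last error.
                time.sleep(delay)
        raise ValueError("retries must be at least 1")
    return wrapper


def run_parallel(func: Callable[[T], R], items: Iterable[T],
                 workers: int = 4, retries: int = 3) -> List[R]:
    """Apply func to every item concurrently, retrying failures; results keep input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(with_retry(func, retries), items))


attempts = {}


def flaky_service(x: int) -> int:
    # Hypothetical service that fails on the first call for each input.
    attempts[x] = attempts.get(x, 0) + 1
    if attempts[x] < 2:
        raise RuntimeError("transient failure")
    return x * 2


print(run_parallel(flaky_service, [1, 2, 3]))
```

Retrying matters for the third-party services described earlier, since their availability is outside the workflow author's control.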

  35. Other New features • Validation reporting • Loading and sharing service sets • Support for offline editing • New provenance features

  36. BioCatalogue Plug-in ISMB 10

  37. Training Tutorials and Training 58+ tutorials to >900 people. >20 universities, Life Science Institutes, and networks. Major Bio conferences Summer schools in Biology and Middleware Developer and User Days Annotation Jamborees Undergraduate and Postgraduate Bioinformatics in > 30 universities.

  38. More Information myGrid http://www.mygrid.org.uk Taverna http://www.taverna.org.uk myExperiment http://www.myexperiment.org BioCatalogue http://www.biocatalogue.org

  39. Visit us at the myGrid Silver Sponsor Stand

  40. FIN
