
Improving Transparency and Reproducibility of Biomedical Research Using Semantic Technologies

Mark Wilkinson. World Research & Innovation Congress, Brussels, 2013. Isaac Peral Senior Researcher in Biological Informatics, Centro de Biotecnología y Genómica de Plantas, UPM, Madrid, Spain.


Presentation Transcript


  1. Improving Transparency and Reproducibility of Biomedical Research Using Semantic Technologies. Mark Wilkinson. World Research & Innovation Congress, Brussels, 2013. Isaac Peral Senior Researcher in Biological Informatics, Centro de Biotecnología y Genómica de Plantas, UPM, Madrid, Spain. Adjunct Professor of Medical Genetics, University of British Columbia, Vancouver, BC, Canada.

  2. Making the Web a biomedical research platform, from hypothesis through to publication

  3. Motivation: 3 intersecting trends in the Life Sciences that are now, or soon will be, extremely problematic

  4. TREND #1 Non-reproducible science & the failure of peer review

  5. Trend #1 Multiple recent surveys of high-throughput biology reveal that upwards of 50% of published studies are not reproducible - Baggerly, 2009 - Ioannidis, 2009

  6. Trend #1 Similar (if not worse!) in clinical studies - Begley & Ellis, Nature, 2012 - Booth, Forbes, 2012 - Huang & Gottardo, Briefings in Bioinformatics, 2012

  7. Trend #1 “the most common errors are simple, the most simple errors are common” At least partially because the analytical methodology was inappropriate and/or not sufficiently described - Baggerly, 2009

  8. Trend #1 These errors pass peer review. The researcher is (sometimes) unaware of the error. The process that led to the error is not recorded. Therefore it cannot be detected during peer review.

  9. Agencies have Noticed! In March 2012, the US Institute of Medicine ~said “Enough is enough!”

  10. Agencies have Noticed! Institute of Medicine Recommendations for Conduct of High-Throughput Research: Rigorously-described, -annotated, and -followed data management and manipulation procedures; “Lock down” the computational analysis pipeline once it has been selected; Publish the analytical workflow in a formal manner, together with the full starting and result datasets. Source: Evolution of Translational Omics: Lessons Learned and the Path Forward. The Institute of Medicine of the National Academies, Report Brief, March 2012.
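One simple way to approximate the "lock down" recommendation is to record a checksum for the pipeline script and each dataset, then re-check those checksums before publishing. The following is a minimal sketch under that assumption, not something prescribed by the Institute of Medicine report; the file names (`pipeline.py`, `input.csv`) and the helper names (`sha256_of`, `build_manifest`, `verify`) are hypothetical.

```python
import hashlib
import os
import tempfile


def sha256_of(path):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(paths):
    """Map each file name to its digest; this is the 'locked' state."""
    return {os.path.basename(p): sha256_of(p) for p in sorted(paths)}


def verify(paths, manifest):
    """True only if every file still matches the recorded digest."""
    return build_manifest(paths) == manifest


# Demonstration with throwaway files (hypothetical names and contents).
with tempfile.TemporaryDirectory() as workdir:
    script = os.path.join(workdir, "pipeline.py")
    data = os.path.join(workdir, "input.csv")
    with open(script, "w") as f:
        f.write("print('analysis')\n")
    with open(data, "w") as f:
        f.write("gene,value\nING1,1.0\n")

    manifest = build_manifest([script, data])   # lock down
    unchanged = verify([script, data], manifest)

    with open(script, "a") as f:                # edit after lock-down
        f.write("# tweaked after selection\n")
    changed_detected = not verify([script, data], manifest)
```

A manifest like this only detects that something changed; it does not describe the workflow formally, which is the part the later slides address with semantic models.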

  11. TREND #2 Bigger, cheaper data

  12. Trend #2 High-throughput technologies are becoming cheaper and easier to use

  13. Trend #2 High-throughput technologies are becoming cheaper and easier to use. But there are still very few experts trained in statistical analysis of high-throughput data

  14. Trend #2 Therefore, even small, moderately-funded laboratories can now afford to produce more data than they can manage or interpret

  15. TREND #3 “The singularity”

  16. The Healthcare Singularity and the Age of Semantic Medicine, Michael Gillam, et al, The Fourth Paradigm: Data-Intensive Scientific Discovery Tony Hey (Editor), 2009 Slide adapted with permission from Joanne Luciano, Presentation at Health Web Science Workshop 2012, Evanston IL, USA June 22, 2012.

  17. “The Singularity” The X-intercept is where, the moment a discovery is made, it is immediately put into practice The Healthcare Singularity and the Age of Semantic Medicine, Michael Gillam, et al, The Fourth Paradigm: Data-Intensive Scientific Discovery Tony Hey (Editor), 2009 Slide Borrowed with Permission from Joanne Luciano, Presentation at Health Web Science Workshop 2012, Evanston IL, USA June 22, 2012.

  18. You Are Here Scientific research would have to be conducted within a medium that immediately interpreted and disseminated the results...

  19. You Are Here ...in a form that immediately (actively!) affected the results of other researchers...

  20. You Are Here ...without requiring them to be awareof these new discoveries.

  21. 3 intersecting and problematic trends: Non-reproducible science that passes peer review; Cheaper production of larger and more complex datasets that require specialized expertise to analyze properly; Need to more rapidly disseminate and use new discoveries

  22. We Want More!

  23. I don’t just want to reproduce your experiment...

  24. I want to re-use your experiment

  25. In my own laboratory... On MY DATA!

  26. When I do my analysis I want to draw on the knowledge of global domain-experts like statisticians and pathologists... ...as if they were mentors sitting in the chair beside me.

  27. Please don’t make me find all of the data and knowledge that I require to do my experiment ...it simply isn’t possible anymore... Image from: Mark Smiciklas Intersection Consulting, cc-nca

  28. I want to support peer review(ers) so that I do better science. Image from AJ Cann, cc-by-a license

  29. How do we get there from here?

  30. To overcome these intersecting problems and to achieve the goals of transparent, reproducible research

  31. We must learn how to do research IN the Web Not OVER the Web

  32. How we use The Web today

  33. The Web is not a pigeon!

  34. Semantic Web Technologies

  35. Design Pattern for Publishing Analytical Tools on the Semantic Web

  36. Application that uses SADI to interpret globally-distributed expert knowledge in order to discover and execute the right tool, at the right time, for the right analysis

  37. CHALLENGE: Reproduce a peer-reviewed scientific publication by semantically modelling the problem

  38. The Publication: Discovering Protein Partners of a Human Tumor Suppressor Protein

  39. Original Study, Simplified: Using what is known about protein interactions in fly & yeast, predict new interactions with this Human Tumor Suppressor

  40. Semantic Model of the Experiment: OWL. The Web Ontology Language (OWL) is the language approved by the W3C for representing knowledge in the Web

  41. Semantic Model of the Experiment. Note that every word in this diagram is, in reality, a URL (it’s a Semantic Web model), i.e. it refers to the expertise of other researchers, distributed around the world on the Web (i.e. NanoPubs***) ***remember this word!! It will be important later!!

  42. Set-up the Experimental Conditions. In a local data-file, provide the protein we are interested in and the two species we wish to use in our comparison:
      taxon:9606 a i:OrganismOfInterest . # human
      uniprot:Q9UK53 a i:ProteinOfInterest . # ING1
      taxon:4932 a i:ModelOrganism1 . # yeast
      taxon:7227 a i:ModelOrganism2 . # fly

  43. Run the Experiment
      SELECT ?protein
      FROM <file:/local/workflow.input.n3>
      WHERE { ?protein a i:ProbableInteractor . }

  44. Run the Experiment
      SELECT ?protein
      FROM <file:/local/workflow.input.n3>
      WHERE { ?protein a i:ProbableInteractor . }
      This is the URL that leads our computer to the Semantic model of the problem
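To see why this query is interesting, it helps to evaluate it by hand against the four statements in the input file. The sketch below is an illustration only: it stores prefixed names as plain strings rather than expanding them to URLs, and the `match` helper is a hypothetical stand-in for a SPARQL engine, not part of SADI or SHARE.

```python
RDF_TYPE = "a"  # Turtle shorthand for rdf:type, kept as a plain string here

# The four statements from the local input file on slide 42.
input_triples = {
    ("taxon:9606", RDF_TYPE, "i:OrganismOfInterest"),    # human
    ("uniprot:Q9UK53", RDF_TYPE, "i:ProteinOfInterest"), # ING1
    ("taxon:4932", RDF_TYPE, "i:ModelOrganism1"),        # yeast
    ("taxon:7227", RDF_TYPE, "i:ModelOrganism2"),        # fly
}


def match(triples, pattern):
    """Return variable bindings for a single triple pattern.

    Terms that start with '?' are variables; everything else must
    match the triple exactly.
    """
    results = []
    for triple in triples:
        binding = {}
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                if binding.get(term, value) != value:
                    break
                binding[term] = value
            elif term != value:
                break
        else:
            results.append(binding)
    return sorted(results, key=lambda b: sorted(b.items()))


# The pattern from the WHERE clause of the query on slides 43 and 44.
answers = match(input_triples, ("?protein", RDF_TYPE, "i:ProbableInteractor"))

# For comparison, a pattern that the input file does satisfy.
proteins_of_interest = match(
    input_triples, ("?protein", RDF_TYPE, "i:ProteinOfInterest")
)
```

Against the input file alone, `answers` is empty, because nothing in the file is declared to be a `i:ProbableInteractor`. Any results therefore have to come from data generated while the query is being answered, which is the role the following slides give to SHARE.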

  45. SHARE examines the semantic model of Probable Interactors. Retrieves third-party expertise from the Web. Discusses with SADI what analytical tools are necessary. Chooses the right tools for the problem. Solves the problem!
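The steps on this slide can be sketched as a small loop: read what properties a class requires, look up a tool that can generate each missing property, run it, and then assert class membership. This is a loose illustration under stated assumptions, not the actual SHARE or SADI implementation; every name here (`CLASS_DEFINITIONS`, `SERVICE_REGISTRY`, `resolve`, the `ex:` terms and the placeholder values returned by the stub services) is hypothetical.

```python
# Hypothetical class definition: the properties an individual must have
# before it can be considered a member of the class.
CLASS_DEFINITIONS = {
    "ex:AnnotatedProtein": ["ex:hasOrtholog", "ex:hasInteraction"],
}


def find_orthologs(subject):
    # Stand-in for a remote analytical service; the value is a placeholder.
    return [(subject, "ex:hasOrtholog", "ex:placeholderOrtholog")]


def find_interactions(subject):
    # Stand-in for a second remote service; the value is a placeholder.
    return [(subject, "ex:hasInteraction", "ex:placeholderPartner")]


# Hypothetical registry, indexed by the property each service can generate.
SERVICE_REGISTRY = {
    "ex:hasOrtholog": find_orthologs,
    "ex:hasInteraction": find_interactions,
}


def resolve(subject, target_class, triples):
    """Invoke services until the subject has every required property.

    Returns the list of services that were run and whether membership
    in target_class could be established.
    """
    plan = []
    for prop in CLASS_DEFINITIONS[target_class]:
        already_known = any(s == subject and p == prop for s, p, _ in triples)
        if already_known:
            continue  # nothing to compute for this property
        service = SERVICE_REGISTRY.get(prop)
        if service is None:
            return plan, False  # no tool can supply this property
        triples.update(service(subject))
        plan.append(service.__name__)
    triples.add((subject, "a", target_class))
    return plan, True


knowledge = set()
plan, solved = resolve("uniprot:Q9UK53", "ex:AnnotatedProtein", knowledge)
```

In this sketch the "workflow" is simply the `plan` list: it is derived from the class definition rather than written by the user, which is the behaviour the next slide attributes to SHARE.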

  46. SHARE derives (and executes) the following analysis automatically

  47. SHARE is aware of the context of the specific question being asked

  48. There are four very cool things about what you just saw...
