Improving Transparency and Reproducibility of Biomedical Research Using Semantic Technologies Mark Wilkinson World Research & Innovation Congress, Brussels, 2013. Isaac Peral Senior Researcher in Biological Informatics Centro de Biotecnología y Genómica de Plantas, UPM, Madrid, Spain.
Improving Transparency and Reproducibility of Biomedical Research Using Semantic Technologies Mark Wilkinson World Research & Innovation Congress, Brussels, 2013 Isaac Peral Senior Researcher in Biological Informatics, Centro de Biotecnología y Genómica de Plantas, UPM, Madrid, Spain Adjunct Professor of Medical Genetics, University of British Columbia, Vancouver, BC, Canada.
Making the Web a biomedical research platform from hypothesis through to publication
Motivation: 3 intersecting trends in the Life Sciences that are now, or soon will be, extremely problematic
TREND #1 Non-reproducible science & the failure of peer review
Trend #1 Multiple recent surveys of high-throughput biology reveal that upwards of 50% of published studies are not reproducible - Baggerly, 2009 - Ioannidis, 2009
Trend #1 Similar (if not worse!) in clinical studies - Begley & Ellis, Nature, 2012 - Booth, Forbes, 2012 - Huang & Gottardo, Briefings in Bioinformatics, 2012
Trend #1 “The most common errors are simple, the most simple errors are common” At least partially because the analytical methodology was inappropriate and/or not sufficiently described - Baggerly, 2009
Trend #1 These errors pass peer review. The researcher is (sometimes) unaware of the error. The process that led to the error is not recorded. Therefore it cannot be detected during peer review.
Agencies have Noticed! In March, 2012, the US Institute of Medicine ~said “Enough is enough!”
Agencies have Noticed! Institute of Medicine Recommendations for Conduct of High-Throughput Research: (1) Rigorously-described, -annotated, and -followed data management and manipulation procedures. (2) “Lock down” the computational analysis pipeline once it has been selected. (3) Publish the analytical workflow in a formal manner, together with the full starting and result datasets. Source: Evolution of Translational Omics: Lessons Learned and the Path Forward. The Institute of Medicine of the National Academies, Report Brief, March 2012.
TREND #2 Bigger, cheaper data
Trend #2 High-throughput technologies are becoming cheaper and easier to use
Trend #2 But there are still very few experts trained in statistical analysis of high-throughput data
Trend #2 Therefore, even small, moderately-funded laboratories can now afford to produce more data than they can manage or interpret
TREND #3 “The singularity”
The Healthcare Singularity and the Age of Semantic Medicine, Michael Gillam, et al, The Fourth Paradigm: Data-Intensive Scientific Discovery Tony Hey (Editor), 2009 Slide adapted with permission from Joanne Luciano, Presentation at Health Web Science Workshop 2012, Evanston IL, USA June 22, 2012.
“The Singularity” The X-intercept is the point at which a discovery is put into practice the moment it is made
You Are Here Scientific research would have to be conducted within a medium that immediately interpreted and disseminated the results...
You Are Here ...in a form that immediately (actively!) affected the results of other researchers...
You Are Here ...without requiring them to be aware of these new discoveries.
3 intersecting and problematic trends: (1) Non-reproducible science that passes peer review. (2) Cheaper production of larger and more complex datasets that require specialized expertise to analyze properly. (3) The need to disseminate and use new discoveries more rapidly.
When I do my analysis I want to draw on the knowledge of global domain-experts like statisticians and pathologists... ...as if they were mentors sitting in the chair beside me.
Please don’t make me find all of the data and knowledge that I require to do my experiment ...it simply isn’t possible anymore... Image from: Mark Smiciklas Intersection Consulting, cc-nca
I want to support peer review(ers) so that I do better science. Image from AJ Cann, cc-by-a license
To overcome these intersecting problems and to achieve the goals of transparent, reproducible research
We must learn how to do research IN the Web Not OVER the Web
How we use The Web today
Design Pattern for Publishing Analytical Tools on the Semantic Web
Application that uses SADI to interpret globally-distributed expert knowledge in order to discover and execute the right tool, at the right time, for the right analysis
CHALLENGE: Reproduce a peer-reviewed scientific publication by semantically modelling the problem
The Publication: Discovering Protein Partners of a Human Tumor Suppressor Protein
Original Study, Simplified: Using what is known about protein interactions in fly & yeast, predict new interactions with this Human Tumor Suppressor
Semantic Model of the Experiment: The Web Ontology Language (OWL) is the language approved by the W3C for representing knowledge in the Web
Semantic Model of the Experiment: Note that every word in this diagram is, in reality, a URL (it’s a Semantic Web model), i.e. it refers to the expertise of other researchers, distributed around the world on the Web (i.e. NanoPubs***) ***remember this word!! It will be important later!!
Set up the Experimental Conditions: In a local data-file, provide the protein we are interested in and the two species we wish to use in our comparison
taxon:9606 a i:OrganismOfInterest . # human
uniprot:Q9UK53 a i:ProteinOfInterest . # ING1
taxon:4932 a i:ModelOrganism1 . # yeast
taxon:7227 a i:ModelOrganism2 . # fly
Run the Experiment
SELECT ?protein
FROM <file:/local/workflow.input.n3>
WHERE {
    ?protein a i:ProbableInteractor .
}
Run the Experiment: In this query, i:ProbableInteractor is the URL that leads our computer to the semantic model of the problem
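Why the query needs more than the local file can be illustrated with a small sketch. It assumes nothing beyond the four input triples shown on the set-up slide; the tuple representation and the helper name `instances_of` are illustrative only and are not part of SHARE or SADI. A plain lookup, with no reasoning and no Web services, finds no i:ProbableInteractor at all, because the input file states only the protein of interest and the organisms.

```python
# The four input triples from the set-up slide, as (subject, predicate, object).
# "a" is the N3/Turtle shorthand for rdf:type.
INPUT_TRIPLES = [
    ("taxon:9606", "a", "i:OrganismOfInterest"),     # human
    ("uniprot:Q9UK53", "a", "i:ProteinOfInterest"),  # ING1
    ("taxon:4932", "a", "i:ModelOrganism1"),         # yeast
    ("taxon:7227", "a", "i:ModelOrganism2"),         # fly
]


def instances_of(triples, class_name):
    """Return subjects explicitly typed as class_name (no inference at all)."""
    return [s for (s, p, o) in triples if p == "a" and o == class_name]


# What a naive engine would return for the query on the slide.
explicit_hits = instances_of(INPUT_TRIPLES, "i:ProbableInteractor")

# What the file does state explicitly.
proteins_of_interest = instances_of(INPUT_TRIPLES, "i:ProteinOfInterest")

print(explicit_hits)
print(proteins_of_interest)
```

The empty first result is the point: every answer to the query has to be derived from the semantic model and from data fetched on the Web, not read from the local file.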
SHARE examines the semantic model of Probable Interactors. Retrieves third-party expertise from the Web. Discusses with SADI what analytical tools are necessary. Chooses the right tools for the problem. Solves the problem!
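The steps above can be sketched, very loosely, as a lookup from the properties a class definition requires to the services able to supply them. This is a toy illustration under stated assumptions, not SHARE's or SADI's actual implementation or API: the property names, service names, and the `plan_services` helper are all hypothetical, and a real class definition would be an OWL expression retrieved from the Web rather than a Python list.

```python
# Hypothetical class definition: the properties an individual must have
# to be classified as a ProbableInteractor. Names are illustrative only.
CLASS_DEFINITIONS = {
    "i:ProbableInteractor": ["ex:hasOrtholog", "ex:interactsWith"],
}

# Hypothetical registry: which service can attach which property to its input.
SERVICE_REGISTRY = {
    "ex:hasOrtholog": "ex:OrthologLookupService",
    "ex:interactsWith": "ex:InteractionLookupService",
    "ex:hasSequence": "ex:SequenceLookupService",
}


def plan_services(class_name, definitions, registry):
    """Pick one service per property required by the class definition."""
    plan = []
    for prop in definitions[class_name]:
        if prop not in registry:
            raise LookupError(f"no registered service provides {prop}")
        plan.append(registry[prop])
    return plan


plan = plan_services("i:ProbableInteractor", CLASS_DEFINITIONS, SERVICE_REGISTRY)
print(plan)
```

In this sketch, only the services relevant to the class definition are selected; the sequence service is registered but never chosen, which mirrors the idea of picking the right tools for the specific problem.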
SHARE derives (and executes) the following analysis automatically
SHARE is aware of the context of the specific question being asked