Methods and Tools to Enhance Rigor and Reproducibility of Biomedical Research S88: Panel Halil Kılıçoğlu, PhD Moderator U.S. National Library of Medicine
Disclosure • I and my partner have no relevant relationships with commercial interests to disclose. AMIA 2018 | amia.org
Learning Objectives • After participating in this session, the learner should be better able to: • Articulate the causes of rigor and reproducibility problems and their consequences for biomedical research • Appreciate the ongoing efforts in standardization and guideline development to enhance rigor and reproducibility • Understand the ways in which informatics-based tools and methods can complement standards and guidelines AMIA 2018 | amia.org
“Reproducibility Crisis” • Causes • Poor experimental design and oversight • Publication bias for positive, statistically significant results • Novelty over reproducibility • “Publish or perish” culture • Stakeholders • Scientists, journals, reviewers, academic institutions, funding agencies, policymakers • Some uncertainty inherent in empirical science • However, failures are too frequent AMIA 2018 | amia.org
Enhancing Rigor and Reproducibility • NIH Rigor and Reproducibility Guidelines • ICMJE recommendations for the conduct and publication of scholarly work • EQUATOR Network [Altman et al., 2008] • Reporting guidelines for health research • TOP Guidelines [Nosek et al., 2015] • Transparency standards for data, code, citation, etc. • FAIR Principles [Wilkinson et al., 2016] • Data sharing and stewardship • Conferences (WCRI), journals (RIPR), centers (COS, METRIC) AMIA 2018 | amia.org
Disentangling the Terminology AMIA 2018 | amia.org
Informatics for Rigor and Reproducibility • Tools and resources to assist the stakeholders in biomedical research • Scrutinize and reproduce conducted research more systematically • Manage published research more efficiently to design rigorous and reproducible studies • Complement standardization and guideline development efforts • Infrastructure for sharing and replication • DataMed [Chen et al., 2018], MIMIC Code Repository [Johnson et al., 2018] • NLP/TM to identify rigorous studies, extract study characteristics • ExaCT [Kiritchenko et al., 2010], RobotReviewer [Marshall et al., 2015] • Semantic models to support replication and verification • ProvCaRe [Valdez et al., 2017], Micropublications [Clark et al., 2014] AMIA 2018 | amia.org
Panelists • Aurélie Névéol, PhD, Université Paris-Saclay • MIROR project (“Methods in Research on Research”) and CLEF eHealth initiatives • Tim Clark, PhD, University of Virginia • FAIR Data and Software Citation • Hua Xu, PhD, University of Texas Health Science Center • DataMed • Neil R. Smalheiser, MD, PhD, University of Illinois at Chicago • Retractions and Reproducibility AMIA 2018 | amia.org
Findings from the “Methods in Research on Research” and CLEF eHealth initiatives Aurélie Névéol, PhD LIMSI, CNRS, Université Paris-Saclay
Disclosure • I and my spouse/partner have no relevant relationships with commercial interests to disclose. AMIA 2018 | amia.org
Explaining the lack of reproducibility Hypothesis: no malicious intent • Research material is often unavailable • Medical corpora and other data due to confidentiality • Software due to commercial strategy • Seemingly insignificant details are left out of protocols • Reporting bias • Lack of hindsight • Space limitation in papers • Novelty is valued more than reproducibility AMIA 2018 | amia.org
Learning from reproducibility (or lack thereof) The tale of the Zigglebottom tagger Variability lies in… • Pre-processing (what is being pre-processed?) • Tokenization • Stop-word lists • “Data cleaning”, e.g. normalization of case, diacritics • Software versions, system variations, e.g. ties, random seeds • Parameters, including training/test split Pedersen T. 2008. Empiricism is not a matter of faith. Computational Linguistics 34(3):465-470. Fokkens A, Van Erp M, Postma M, Pedersen T, Vossen P, Freire N. 2013. Offspring from Reproduction Problems: What Replication Failure Teaches Us. Proc ACL:1691-1701.
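To make the point concrete, here is a minimal Python sketch (not the Zigglebottom tagger; scikit-learn is used purely as an illustrative assumption) in which each of these variability sources becomes an explicit, reportable parameter rather than a silent default:

    import random

    import numpy as np
    from sklearn.model_selection import train_test_split

    SEED = 42                        # random seed: fix it and report it
    random.seed(SEED)
    np.random.seed(SEED)

    def preprocess(text, lowercase=True):
        """Every 'insignificant' choice is an explicit parameter: case
        normalization, diacritics, stop-word list, tokenizer version..."""
        if lowercase:
            text = text.lower()
        return text.split()          # naive whitespace tokenization, documented

    corpus = ["The cat sat.", "A dog barked!"] * 50    # toy stand-in data
    labels = [0, 1] * 50

    # The training/test split is itself a parameter: fix and report the
    # ratio and the seed that produced it.
    train_X, test_X, train_y, test_y = train_test_split(
        corpus, labels, test_size=0.2, random_state=SEED)
    print(len(train_X), len(test_X))                   # 80 20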
Variability on corpus: GRACE [figure: counting “words”; counting “sentences”]
Improving reproducibility • Raising community awareness • This panel, Pedersen CL 2008, Cohen et al. AMIA 2017, … • Survey at http://qcm.paris-sorbonne.fr/index.php?sid=84947&lang=en • Research material is often unavailable • Shared tasks • Shared datasets fostering reproducibility (e.g., Norman et al. S39) • Reporting bias • Reporting guidelines AMIA 2018 | amia.org
The Shared Task Model Primary goal is to provide a forum for direct comparison of approaches • Availability of shared material • Specific definition of a “task” • Corpora and annotations, split into training, development and test sets • Evaluation metrics and scripts (a sketch follows) • “Working Notes” papers describing participants’ approaches
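Shared evaluation scripts of the following kind let every team score runs identically. This is only a sketch; the one-label-per-line file format is an assumption, not any specific task's actual format:

    import sys

    def load_labels(path):
        """Read one gold or predicted label per line."""
        with open(path) as f:
            return [line.strip() for line in f]

    def accuracy(gold, pred):
        assert len(gold) == len(pred), "gold/pred length mismatch"
        return sum(g == p for g, p in zip(gold, pred)) / len(gold)

    if __name__ == "__main__":
        gold, pred = load_labels(sys.argv[1]), load_labels(sys.argv[2])
        print(f"accuracy: {accuracy(gold, pred):.4f}")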
Reproducing shared task results Reproducibility track at CLEF eHealth • An automatic coding task • 4 analysts aim to reproduce participants’ runs, and a baseline • Hypothesis: analysts use their usual work environment (vs. a controlled environment) Overall, results can be reproduced, but… • Replication is not easy, even for a baseline method! • No single analyst was able to replicate every run • Documentation shortcomings reported Névéol A, Cohen KB, Grouin C, Robert A. Replicability of Research in Biomedical Natural Language Processing: a pilot evaluation for a coding task. Proc. of the Seventh International Workshop on Health Text Mining and Information Analysis, LOUHI. 2016.
Levels of reproducibility Reproducibility of a value • Some experiments are not deterministic, e.g. using neural models (image source: Tourille et al. LOUHI 2018) Reproducibility of a finding • Different values obtained during iterations of an experiment may lead to the same finding, e.g. A > B Reproducibility of a conclusion • Conclusions are inferred from findings, thus subject to interpretation Cohen KB, Xia JB, Zweigenbaum P, Callahan T, Hargraves O, Goss F, Ide N, Névéol A, Grouin C, Hunter LE. Three Dimensions of Reproducibility in Natural Language Processing. Language Resources and Evaluation Conference, LREC 2018. 2018:156-165.
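A toy sketch of the value/finding distinction (all numbers invented): two non-deterministic systems are each run with several seeds; no individual score is exactly reproduced, yet the finding A > B can be:

    import random
    import statistics

    def run_system(base_score, seed):
        """Stand-in for a non-deterministic experiment, e.g. neural training."""
        rng = random.Random(seed)
        return base_score + rng.gauss(0, 0.01)    # seed-dependent noise

    seeds = [1, 2, 3, 4, 5]
    scores_a = [run_system(0.85, s) for s in seeds]        # system A
    scores_b = [run_system(0.82, s + 100) for s in seeds]  # system B

    print(f"A: {statistics.mean(scores_a):.3f} +/- {statistics.stdev(scores_a):.3f}")
    print(f"B: {statistics.mean(scores_b):.3f} +/- {statistics.stdev(scores_b):.3f}")
    # The *values* differ run to run; the *finding* A > B may still hold.
    print("Finding A > B:", statistics.mean(scores_a) > statistics.mean(scores_b))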
The PRIMAD model: which attributes can we “prime”? Defining types of reproducibility • Platform • Research Objective • Implementation • Method • Actors • Data (parameters, input data) What do we gain by priming one or the other? Juliana Freire, Norbert Fuhr, and Andreas Rauber. Reproducibility of Data-Oriented Experiments in e-Science. Dagstuhl Reports, 6(1), 2016.
Use of Reporting Guidelines in Health Research David Blanco: Assessing interventions to improve adherence to reporting guidelines Reporting guidelines are recent tools • Most have not been assessed for whether they actually improve reporting • CONSORT has been shown to improve completeness of reporting • A systematic review reports that overall adherence to guidelines is suboptimal Impact of reporting guidelines • Before/after conducting a study • Training, Understanding, Implementing, Monitoring, Collaborating Blanco D, Kirkham JJ, Altman DG, Moher D, Boutron I, Cobo E. Interventions to improve adherence to reporting guidelines in health research: a scoping review protocol. BMJ Open. 2017 Nov 16;7(11):e017551.
Natural Language Processing and Reporting Guidelines NLP can facilitate adherence to reporting guidelines • Automatically assess guideline compliance • Match guideline item with its implementation in the manuscript Guidelines for reporting (bio)NLP research? • Study of 29 articles in the proceedings of BioNLP 2016 • 48% of papers provided pointers to data, 61% provided pointers to code, 21% provided pointers to both • Inter-rater agreement was .57 for data, .63 for code Cohen KB, Névéol A, Xia J, Hailu N, Hunter L, Zweigenbaum P. Reproducibility in Biomedical Natural Language Processing. Proc AMIA Annu Symp. 2017.
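A deliberately crude sketch of the first idea: flag whether a manuscript contains pointers to data and code. The regular expressions are illustrative assumptions, nowhere near the manual assessment reported in Cohen et al. 2017:

    import re

    DATA_PAT = re.compile(
        r"(data (is|are) available|corpus .*download|https?://\S*(data|corpus))",
        re.IGNORECASE)
    CODE_PAT = re.compile(
        r"(source code|github\.com|code is available)", re.IGNORECASE)

    def check_pointers(manuscript_text):
        """Return rough data/code availability flags for a manuscript."""
        return {"data": bool(DATA_PAT.search(manuscript_text)),
                "code": bool(CODE_PAT.search(manuscript_text))}

    print(check_pointers("Our code is available at https://github.com/example/tagger."))
    # -> {'data': False, 'code': True}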
Take Home Message: Reproducibility is hard to achieve! Aim at achieving reproducibility • Re-run, ask others to re-run • (Re-implement, port to different platforms) • Test on different data, vary parameters (and report!) If something is not reproducible → investigate! (you might be onto something) Aim for better procedures and documentation • Plan your research procedure: design a protocol, a data management plan • Document, document, document: the research process, environment, interim results, … Working reproducibly is good for science… and good for you! Markowetz F. Five selfish reasons to work reproducibly. Genome Biol. 2015 Dec 8;16:274.
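As a minimal sketch of the documentation habit (all field names illustrative), the environment and parameters can be written out next to every result so a later re-run can be compared like for like:

    import json
    import platform
    import sys
    from datetime import datetime, timezone

    run_record = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "python": sys.version,                        # interpreter version
        "platform": platform.platform(),              # OS / hardware string
        "parameters": {"seed": 42, "test_size": 0.2}, # whatever was used
        "results": {"accuracy": 0.85},                # interim results too
    }
    with open("run_record.json", "w") as f:
        json.dump(run_record, f, indent=2)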
Thank you! Email me at: neveol@limsi.fr Horizon 2020 research and innovation programme: Marie Skłodowska-Curie grant agreement No 676207 CABeRneT ANR-13-JS02-0009-01 CLEF initiative
Tim Clark University of Virginia School of Medicine & Data Science Institute AMIA Annual Symposium San Francisco, November 3-7, 2018 FAIR Data and Software Citation for Protected Health Information
Summary • FAIR Data • Levels of FAIRness • Data citation → FAIR data • FAIR Protected Health Information • Dataset Search • Software citation
[figure, adapted from AOASG 2017: open vs. free vs. FAIR data; FAIR data is machine readable & discoverable] https://aoasg.org.au/response-to-innovation-and-science-australias-2030-strategic-plan-issues-paper/
Data citation → FAIR data • Data citation has been widely endorsed and is growing in acceptance. • Archive & cite your FAIR data to justify your claims. No more “data available upon request from author”. • Creates FAIR data at the point of its production. • Endorsed by 119 academic & funding organizations.
Data citation principles TL;DR • Treat data as a first-class scholarly object. • Archive and cite data to justify and validate scientific claims. • If a claim relies directly on your interpretation of a dataset - e.g. it’s your own data, or you are reusing data - cite the archived data. • If you rely primarily on another author’s published interpretations, cite the article, as you do currently. • Data citations must include globally unique, machine-actionable persistent identifiers accepted by your scholarly community. • E.g. DataCite DOIs, Compact Identifiers, … but not just plain URLs. summarized and extracted from Data Citation Synthesis Group: Joint Declaration of Data Citation Principles. Martone M. (ed.) San Diego CA: FORCE11; 2014 [https://www.force11.org/group/joint-declaration-data-citation-principles-final].
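A sketch of what “machine-actionable” buys in practice: a DataCite DOI can be dereferenced with standard HTTP content negotiation to obtain structured citation metadata rather than a human landing page. The `requests` package and the TOPMed DOI from a later slide are used for illustration:

    import requests

    doi_url = "https://doi.org/10.23725/1n7m-1e24"   # TOPMed dataset DOI
    resp = requests.get(
        doi_url,
        headers={"Accept": "application/vnd.citationstyles.csl+json"},
        timeout=30)
    resp.raise_for_status()
    meta = resp.json()                # CSL JSON citation metadata
    print(meta.get("title"), meta.get("publisher"))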
Data citation resolution path Data citation resolution structure (ideal workflow). Articles (1) link to datasets in appropriate repositories, on which their conclusions are based, through citation to a dataset (a), whose unique persistent identifier (PID) resolves (b) to a landing page (2) in a well-supported data repository. The data landing page contains human- and machine-readable metadata, to support search and to resolve (c) back to the citing article, and (d) a link to the data itself (3). from Cousijn et al. A data citation roadmap for scientific publishers. Scientific Data (accepted)
Protected Health Information • Protected Health Information (PHI) analysis is essential for doing Precision Medicine. • Problem: We want to know about PHI datasets even if we do not yet have permission to access them. • Similar to how publications work behind paywalls, where a Data Use Agreement (DUA) is analogous to a journal subscription. • Get the DUA in place once you determine the need for the data.
Increasing Levels of FAIR Data adapted from Mons et al. (2017). Information Services & Use 37(1), pp. 49–56. DOI 10.3233/ISU-170824
Access Control: FAIR data use case for Protected Health Information [diagram: PIDs, metadata, provenance, data]
Persistent Identifiers (PIDs) • Globally unique on the Web • Linked to “landing page” & object endpoints by a resolver, e.g. http://doi.org, http://n2t.net, http://identifiers.org • Machine & human readable metadata
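A small sketch of resolver behaviour: handing a compact identifier to identifiers.org redirects to the object's landing page, so following the redirect chain shows the PID-to-landing-page step. The PubMed ID is an arbitrary illustration:

    import requests

    compact_id = "pubmed:22140590"    # illustrative compact identifier
    resp = requests.get(f"https://identifiers.org/{compact_id}",
                        allow_redirects=True, timeout=30)
    print(resp.url)                   # final landing-page URL after resolution
    print(resp.status_code)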
Object Metadata • Description - author, publisher, date, version, size • Format - content negotiation • Verification - checksum • Object location - URL(s) • Context - in data catalog • Extended descriptive term set • Machine & human readable Resolving PIDs to landing pages separate from the data allows indexing, access control, multi-cloud resolution, and indefinite persistence of object descriptions.
schema.org metadata for a TOPMed dataset
{
  "@context": "http://schema.org",
  "@id": "https://doi.org/10.23725/1n7m-1e24",
  "@type": "Dataset",
  "additionalType": "CRAM file",
  "author": {"name": "TOPMed"},
  "datePublished": "2017-11-30",
  "description": "TOPMed: NWD245901.b38.irc.v1.cram \u003cbr\u003eFile: CRAM file",
  "funding": {
    "@id": "https://doi.org/10.13039/100000050",
    "@type": "Organization",
    "name": "National Heart, Lung, and Blood Institute (NHLBI)"
  },
  "identifier": [
    {"@type": "PropertyValue", "propertyID": "doi", "value": "https://doi.org/10.23725/1n7m-1e24"},
    {"@type": "PropertyValue", "propertyID": "dataguid", "value": "dg.4503/0a11bd96-6a23-4d6a-901e-c83c868b213e"},
    {"@type": "PropertyValue", "propertyID": "md5", "value": "0e3560f6c789bc6704f134c57f7adc23"}
  ],
  "keywords": "topmed, whole genome sequencing",
  "name": "NWD245901.b38.irc.v1.cram",
  "publisher": {"@type": "Organization", "name": "TOPMed"},
  "schemaVersion": "http://datacite.org/schema/kernel-4"
}
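One way such a record becomes actionable (a sketch; both file names are placeholders): pull the md5 checksum from the identifier list and verify a downloaded copy of the file against it.

    import hashlib
    import json

    with open("topmed_dataset.jsonld") as f:      # the record shown above
        metadata = json.load(f)
    md5_expected = next(i["value"] for i in metadata["identifier"]
                        if i["propertyID"] == "md5")

    h = hashlib.md5()
    with open("NWD245901.b38.irc.v1.cram", "rb") as f:   # downloaded data file
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)

    print("checksum OK" if h.hexdigest() == md5_expected else "checksum MISMATCH")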
Software citation • To validate, re-use, or adapt the datasets underlying scientific assertions, we need access to both • the Data, and • the Software used to analyze it.
NIH Data Commons • Pilot project for cross-NIH interoperable cloud computing. • Phase 1 proof-of-concept just completed; supports the data citation methodology with multiple identifier types. • Phase 2 plans include support for software citation.
Acknowledgements • Merce Crosas • Paolo Ciccarese • Patricia Cruse • Sudeshna Das • Martin Fenner • Julian Gauteri • Carole Goble • Richard Hallett • Brad Hyman • Greg Janée • Rafael Jimenez • Nick Juty • John Kunze • Max Levinson • Manuel Bernal Llinares • Sarala Wimalaratne