A Perspective Paul Price Dow Chemical Company pprice@dow.com
Publications are changing • Leather-bound journals and dedicated libraries, the format of the scientific paper, weird abbreviations (Tox. & App. Pharm.) • Recent email on the need for packing materials • Dump the filing cabinets - PDF/HTML replaces paper (free color!) • Paper journals are evolving into curated web sites • Upsetting the status quo – • No technical reason for not sharing detailed technical findings
Sharing data • Ethical issues for not sharing • Privacy of individuals • Economic reasons for not sharing • Intellectual property rights • Charging for access: the economics of journals and data owners • Academics: My career depends on mining my data on my schedule • Internet-based expectations • I expect to see everything from home using my web browser
Social contracts • Permission to sell is contingent on demonstrating safety • Credence for findings is less contingent on peer review and more contingent on sharing relevant data • Science that supports regulatory decisions needs to be in the sunlight
Parting thought • When I share data I am asking the world “can someone do a better job than me in understanding the data?” • When I withhold data I am saying “no one can do a better job than me in understanding the data” • Therefore journals should require the sharing of raw data as a condition of publication
Data Access: Issues and Opportunities Alan F. Karr National Institute of Statistical Sciences karr@niss.org February 13, 2012
Points for Discussion • The problem is hard • Players are responding rationally to incentives • Not “one size fits all” • “The data” is ill-defined • “Availability” is vague: what about • Cost • Liability • Tech support • Co-authorship • Data subjects • Reproducibility (data + code) vs. replicability (data only?) • There are effective mechanisms for access, based on statistical disclosure limitation
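As an illustration of the last point, one of the simplest statistical disclosure limitation techniques is small-cell suppression: publish a frequency table, but mask any cell with too few data subjects to be safely released. The sketch below is only an example of that general idea, not necessarily the mechanism the speaker had in mind; the function name `suppressed_counts` and the threshold of 5 are assumptions chosen for illustration.

```python
from collections import Counter

def suppressed_counts(records, keys, min_cell=5):
    """Tabulate records by the given keys, masking small cells.

    records  : iterable of dicts, one per data subject
    keys     : field names that define a table cell
    min_cell : hypothetical release threshold; cells with fewer
               subjects are reported as None instead of a count
    """
    counts = Counter(tuple(r[k] for k in keys) for r in records)
    return {cell: (n if n >= min_cell else None) for cell, n in counts.items()}


# Toy data: six subjects in one cell, two in another.
records = (
    [{"site": "A", "exposed": "yes"}] * 6
    + [{"site": "B", "exposed": "no"}] * 2
)
table = suppressed_counts(records, keys=("site", "exposed"))
# The six-subject cell keeps its count; the two-subject cell is masked.
```

Masking small cells alone may not be enough when row or column totals are also released, because a masked value can sometimes be recovered by subtraction; that is one reason "availability" is harder than it first looks.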
Should Journals Require the Release of Supporting Data as a Condition of Publication? Jane C. Schroeder, DVM PhD Science Editor, Environmental Health Perspectives schroederjc@niehs.nih.gov
Why is access to raw data desirable? • To advance scientific knowledge Is it a given that access to raw data will advance knowledge?
How would access advance knowledge? 1. Identify unintentional errors • Data entry errors, transcribing, labeling • Errors in coding, misconstrued variables • Copy editing errors • Some can be identified by a careful review of reported results • Avoid via documentation, data management, internal review • Some would require truly raw data
How would access advance knowledge? 2. Identify scientific misconduct • If the perpetrator is competent, unlikely to be evident • If not competent, likely to be multiple cues • Plagiarism, inconsistent logic, incredible findings • If access to raw data is the only way to prevent fraud, we are in trouble
How would access advance knowledge? 3. Identify “errors” in decision-making • Such “errors” may represent legitimate differences • There is no single “best way” to analyze data • However, decision-making should be completely transparent
How would access advance knowledge? 4. Reduce the time from data collection to full dissemination • Investigators must be able to recoup their investment of time and effort • Lose jobs, no data for anyone • Confidentiality, informed consent agreements
What should journals do? • Careful & detailed reviews, including requests for code, data when appropriate • Require complete methods • Rationale/criteria for decisions • Information on data management, QA/QC • Require information to assess study quality • Missing data, participation, drop-out, numbers of observations
What should journals do? • Require full reporting of all results used to support key analytic decisions and conclusions • Essential when interpretation is subjective or criteria are not widely accepted • Null findings as well as positive ones • Sensitivity analyses of assumptions, alternate approaches • Supplemental material, external archiving • Review and update policies when it is in the best interest of science communication to do so
What should the community do? • Discipline-appropriate standards for data management, QA/QC, and reporting • Bona fide internal reviews before publication • Support for costs of data sharing • Encourage and reward analyses of combined data from multiple studies • Avoid regulations that may ultimately impede scientific advancement by serving some members of the community at the expense of others
Introducing the Dryad Digital Repository Society of Toxicology webinar February 2013 Peggy Schaeffer
Many journals require data sharing upon request • Psychology • Requested data from 141 articles • “6 months later, after … 400 emails, [sending] detailed descriptions of our study aims, approvals of our ethical committee, signed assurances not to share data with others, and even our full resumes…” data was obtained from 27% of articles. • Wicherts et al. (2006). Am. Psych. 61:726-728 • Genetics • 47% of respondents denied a request for data or materials w/in 3 yrs • 28% unable to confirm others’ published research as a result. • #1 reason for data withholding (80%): effort required to share it. • Campbell et al. (2002) JAMA (4):473-80. datadryad.org
Data archiving has many benefits Modified from Beagrie et al. (2009) Keeping Research Data Safe 2
Joint Data Archiving Policy [Journal] requires, as a condition for publication, that data supporting the results in the paper should be archived in an appropriate public archive, such as [list of approved archives here]. Data are important products of the scientific enterprise, and they should be preserved and usable for decades in the future. Authors may elect to have the data publicly available at time of publication, or, if the technology of the archive allows, may opt to embargo access to the data for a period up to a year after publication. Exceptions may be granted at the discretion of the editor, especially for sensitive information such as human subject data or the location of endangered species.
Why use Dryad rather than Supplementary Online Materials? * A few publisher SOM sites are exceptions to the general rule. ** Practices differ among publishers, see Smit (2011), doi:10.1045/january2011-smit
Researchers are using Dryad for data archiving… As of 7 Feb 2013, Dryad contains 7306 data files associated with 2662 publications from 191 different journals
and using the data for research…
Over 25 integrated journals ... and 20 more on the way
Trustworthy repository infrastructure • Making data available is the primary mission of the organization • No pay-walls or restrictive licenses (all released under CCZero) • The same data may be hosted by other services (non-exclusivity) • Built on the DSpace repository platform • An open source framework used by hundreds of institutional repositories • Multiple machine and human interfaces for discovery and access • Dublin Core metadata harvestable through OAI-PMH • DOIs registered through DataCite • Curation-enhanced metadata to enhance keyword searching • Indexed by Web of Science and other bibliographic services • Assurance of data integrity and permanent availability • Service mirroring and backup • File migration and bit-level integrity assurance • Organizational failover through DataONE and (soon) CLOCKSS
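To make the "machine interfaces" point concrete, here is a minimal sketch of what harvesting Dublin Core metadata over OAI-PMH can look like. It uses only the standard `ListRecords` verb and the `oai_dc` metadata prefix defined by the protocol. The endpoint `https://repository.example.org/oai`, the helper names, and the sample record are placeholders invented for illustration; the slides do not give Dryad's actual endpoint, and no network call is made here.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# XML namespaces defined by OAI-PMH 2.0 and Dublin Core.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

def list_records_url(base_url, metadata_prefix="oai_dc", from_date=None):
    """Build a ListRecords request URL for an OAI-PMH endpoint."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if from_date:
        params["from"] = from_date  # optional selective-harvesting date
    return base_url + "?" + urlencode(params)

def parse_records(xml_text):
    """Extract the OAI identifier, titles, and dc:identifiers of each record."""
    root = ET.fromstring(xml_text)
    records = []
    for rec in root.iterfind(".//oai:record", NS):
        records.append({
            "oai_identifier": rec.findtext(
                "oai:header/oai:identifier", default="", namespaces=NS),
            "titles": [e.text for e in rec.iterfind(".//dc:title", NS)],
            "identifiers": [e.text for e in rec.iterfind(".//dc:identifier", NS)],
        })
    return records

# Made-up response fragment standing in for a real harvest.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:example.org:1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Example data package</dc:title>
          <dc:identifier>doi:10.0000/example</dc:identifier>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""

url = list_records_url("https://repository.example.org/oai")
harvested = parse_records(SAMPLE)
```

A real harvester would also need to follow the protocol's `resumptionToken` to page through large result sets, which this sketch leaves out.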
Governance • Not-for-profit organization • Incorporated in North Carolina (USA) • Membership is open to a diversity of stakeholder organizations • Scientific societies, publishers, funding agencies, universities, libraries, etc. • Members need not publish a partner journal • Governed by a rotating 12-member Board of Directors, nominated and elected by the membership
Sustainability • Long-term preservation requires an organization with a viable business model • Not dependent on the vagaries of grant funding • Or the largesse of an institution that may have other priorities • Revenue will be primarily from deposit fees • This enables Dryad to make access to the data free in perpetuity • The time of deposit is when the majority of costs are incurred • Revenue scales with costs (i.e. volume of deposits) • The costs are distributed both fairly and widely • Additional revenue • Membership fees ($1000/yr) will cover costs of annual Membership meetings • Project grants will supplement the operational budget for R&D activities • With research and development activities funded by grants at various institutions (e.g. Duke University, Univ. of North Carolina at Chapel Hill)
Payment plans 1 Up to a fixed deposit size (currently 10GB). Additional charges for larger deposits. 2 Data package = all the data associated with an article.
The value proposition • For researchers, Dryad… • increases the impact of, and citations to, published research • preserves and makes available others’ data • frees researchers from the burden of data preservation and access • For societies, journals, and publishers Dryad… • offers more visibility for research outputs • promotes prestige for the discipline • supports a wide range of journal policies on data sharing • frees journals from the burden of maintaining supplemental data • For libraries and institutions, Dryad… • makes data available at no cost, under clear terms of use • helps fulfill their research data management mandates • For funders, Dryad… • provides a cost-effective mechanism to make research more accessible
To learn more • Repository home: http://datadryad.org • News: http://blog.datadryad.org • Project documentation: http://wiki.datadryad.org • Twitter: @datadryad • Facebook: www.facebook.com/DataDryad Contact us: • Todd Vision, Project Director, tjv@bio.unc.edu • Laura Wendell, Executive Director, lwendell@datadryad.org • Peggy Schaeffer, Communications Coordinator, pschaeffer@datadryad.org