This article discusses the validation process for chemical data on Wikipedia, including recent developments and future plans for improving data accuracy. It also addresses common misconceptions about Wikipedia's reliability as a resource for chemistry information.
Validation of chemical data on Wikipedia Martin A. Walker Dept. of Chemistry, SUNY Potsdam Member of the Wikipedia Chemistry Project
Overview • Introduction • Raising general quality in Wikipedia • Validating chemical data in Wikipedia • Recent developments in Wikipedia Chemistry • The future? • Questions?
What is Wikipedia – and what is it not? Introduction
Wikipedia is… • An encyclopedia • A useful resource for chemistry • Written by volunteers • Editable by anyone • Free to be copied, re-used • Free as in “no cost”
Wikipedia is not… • A database • A place to publish original research • An authoritative resource for chemistry • Written mainly by kids, or by paid professionals • Free to re-use without attribution • Run by a corporation
Types of chemistry article WIKIPROJECT CHEMISTRY • Chemical concepts • Chemical reactions & processes • Chemists WIKIPROJECT ELEMENTS • Chemical elements WIKIPROJECT CHEMICALS • Chemical substances WIKIPROJECT PHARMACOLOGY • Pharmaceuticals WIKIPROJECT CELL & MOLECULAR BIOLOGY • Molecular biology
General chemistry content Reactions & processes, concepts, chemists’ biographies, etc.
WikiProject Chemicals • ~60 members (~20 active) • Collaborates on writing quality articles and standards for: • developing data boxes for articles • chemical naming, structure drawing • article assessment • Data validation • Collaboration with CAS Wim Van Dorst, a Dutch member of WP:Chem since March 2005.
Most articles have a Chembox Chembox is designed to be machine readable and “database friendly”
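To illustrate what “machine readable” can mean in practice, here is a minimal Python sketch that pulls key/value pairs out of a flat infobox-style template. The sample wikitext and the field names in it are simplified illustrations, not a real article's Chembox; real Chemboxes nest sub-templates, which this sketch does not handle.

```python
import re

# Hypothetical, simplified Chembox-style wikitext (assumption for illustration).
SAMPLE = """{{Chembox
| Name = Water
| CASNo = 7732-18-5
| BoilingPtC = 100
}}"""


def parse_flat_template(wikitext: str) -> dict:
    """Return the '| key = value' pairs from a flat (non-nested) template."""
    fields = {}
    for line in wikitext.splitlines():
        # Match lines of the form "| key = value", trimming surrounding spaces.
        match = re.match(r"^\|\s*([^=|]+?)\s*=\s*(.*?)\s*$", line)
        if match:
            fields[match.group(1)] = match.group(2)
    return fields


fields = parse_flat_template(SAMPLE)
print(fields["CASNo"])
```

Because each datum sits in a named field, a script (or a bot) can read and compare individual values without interpreting the article prose.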
WMF: Long term strategy Expand the “virtuous circle” Diagram by User:Randomran – Creative Commons license
Pending changes (flagged revisions) “Articles under PC protection are open for editing, but changes will be visible to readers who are not logged in only after being checked for obvious vandalism and clear errors.”
WikiTrust • Downloadable as an extension to Firefox, this adds a tab above the article:
How I use the key terms Validation = “How can I be sure the data are correct?” Curation = fixing errors
Content validation • In 2008 a data validation drive was initiated for basic chemical identifiers • Led to a collaboration with CAS, to ensure Wikipedia CAS registry nos. are correct • Now around 3500 substances have been validated against CAS Common Chemistry, as having correct name, structure & CAS RN • Other fields now being validated • Validated content indicated with a check mark
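As a side note on CAS registry numbers: the last digit of a CAS RN is a check digit, so a mistyped number can often be caught before any lookup. The Python sketch below shows only that format check; it is not the validation procedure described on this slide, which compares entries against CAS Common Chemistry. The function name is an assumption made for illustration.

```python
import re


def cas_check_digit_ok(cas_rn: str) -> bool:
    """Check the format and check digit of a CAS RN such as '7732-18-5'."""
    match = re.fullmatch(r"(\d{2,7})-(\d{2})-(\d)", cas_rn)
    if not match:
        return False
    body = match.group(1) + match.group(2)  # all digits except the check digit
    check = int(match.group(3))
    # Weight the digits 1, 2, 3, ... starting from the rightmost body digit.
    total = sum(int(d) * w for w, d in enumerate(reversed(body), start=1))
    return total % 10 == check


print(cas_check_digit_ok("7732-18-5"))  # water
print(cas_check_digit_ok("7732-18-4"))  # same number with a wrong check digit
```

A passing check digit only shows that the number is well formed; whether it belongs to the named substance still has to be confirmed against an authoritative source.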
CommonChemistry • Launched in April 2009 • Came about as a result of a collaboration between CAS & Wikipedia • Offered as a free service for CAS RNs for members of the public.
Organized by WP:Chemicals • Moderate participation from members of WP:Pharmacology
The approach to validation • Every old version of an article (identified by its RevID) is preserved for posterity and remains visible to everyone, so it can serve as a permanent record of a validated version.
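A specific revision can be addressed through the `oldid` parameter of a MediaWiki URL, which is what makes a RevID usable as a permanent record. The Python sketch below builds such a link; the helper name and the revision number used in the example are placeholders, not a real validated revision.

```python
from urllib.parse import urlencode


def permalink(title: str, rev_id: int, host: str = "en.wikipedia.org") -> str:
    """Build a link to one fixed revision of an article."""
    # MediaWiki page titles use underscores in place of spaces.
    query = urlencode({"title": title.replace(" ", "_"), "oldid": rev_id})
    return f"https://{host}/w/index.php?{query}"


# 123456 is a placeholder revision number, used only for illustration.
print(permalink("Water", 123456))
```

Storing the RevID alongside a validated value means a reader can always return to the exact text that was checked, whatever happens to the live article afterwards.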
Protecting validated fields PROBLEM: This is “the encyclopedia anyone can edit” – so anyone can change the BP of water to 200 °C. SOLUTION: A bot patrols the pages and watches for edits to key fields. Any dubious edits are flagged with a red X (next to the data) and logged. System developed by Dirk Beetstra (Eindhoven University of Technology). It is the only such tool on Wikipedia.
Validation protected by bot • If anyone tries to vandalize a validated field, this will be flagged by a bot soon afterwards. • This example received a red X 11 minutes after it was vandalized.
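The core idea of watching validated fields can be sketched in a few lines of Python: keep the validated value of each key field and compare it with the value in the current revision. This is an illustrative sketch only, not the actual bot; the field names, stored values, and flag labels are assumptions.

```python
# Validated values for the watched fields (illustrative data).
VALIDATED = {"CASNo": "7732-18-5", "BoilingPtC": "100"}


def flag_fields(validated: dict, current: dict) -> dict:
    """Mark each watched field 'ok' if unchanged, otherwise 'changed'."""
    flags = {}
    for name, good_value in validated.items():
        # A missing field counts as changed, since current.get() returns None.
        flags[name] = "ok" if current.get(name) == good_value else "changed"
    return flags


# Someone has edited the boiling point of water to 200.
edited = {"CASNo": "7732-18-5", "BoilingPtC": "200"}
flags = flag_fields(VALIDATED, edited)
print(flags)
```

A field marked "changed" corresponds to the red X in the slide: the edit is not reverted automatically, but it is made visible and can be logged for a human to review.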
Checking structures • In 2008–2010, around 3000 chemical structures were informally checked against CAS Common Chemistry • PROBLEM: Structures are loaded from an external file on Wikimedia Commons, which can be “invisibly” changed
Since fall 2010 • The bot has been modified to watch for changes to the RevID of the Wikimedia Commons structure image • A few hundred images have now been validated
Drugboxes Drugboxes are patrolled by the bot, but at present WP:PHARM is not active in formal validation. Most work is done by Dirk Beetstra, using official lists from data sources (e.g., ChEBI).
Validation of melting points • Physical properties are much harder – require human validation • Collaboration beginning with JC Bradley (Drexel) & A Lang (Oral Roberts) on MPs.
Supplementary data pages can host MP validation sources These pages have room to list all sources with linked refs – providing a “paper trail” to original sources
Other future developments • New formats for content: books, for cellphones (Kiwix, Wikipock, Okawix) • Offline versions that use quality checks and vandalism checks, for use in schools, developing countries, etc. • More validated data fields, with “paper trails” and real-time checks • Mashups with other sites • Integration with lab instrumentation, lab notebooks, etc.?
Acknowledgements • Antony Williams (RSC ChemSpider) • Dirk Beetstra (Tech Univ Eindhoven) • User:Physchim62 and many other Wikipedians • JC Bradley and Andrew Lang
Thank you for your attention Any questions?