
Validation of chemical data on Wikipedia

This article discusses the validation process for chemical data on Wikipedia, including recent developments and future plans for improving data accuracy. It also addresses common misconceptions about Wikipedia's reliability as a resource for chemistry information.



Presentation Transcript


  1. Validation of chemical data on Wikipedia • Martin A. Walker, Dept. of Chemistry, SUNY Potsdam • Member of the Wikipedia Chemistry Project

  2. Overview • Introduction • Raising general quality in Wikipedia • Validating chemical data in Wikipedia • Recent developments in Wikipedia Chemistry • The future? • Questions?

  3. Introduction: What is Wikipedia – and what is it not?

  4. Wikipedia is… • An encyclopedia • A useful resource for chemistry • Written by volunteers • Editable by anyone • Free to be copied, re-used • Free as in “no cost” Wikipedia is not… • A database • A place to publish original research • An authoritative resource for chemistry • Written mainly by kids, or by paid professionals • Free to re-use without attribution • Run by a corporation

  5. Types of chemistry article WIKIPROJECT CHEMISTRY • Chemical concepts • Chemical reactions & processes • Chemists WIKIPROJECT ELEMENTS • Chemical elements WIKIPROJECT CHEMICALS • Chemical substances WIKIPROJECT PHARMACOLOGY • Pharmaceuticals WIKIPROJECT CELL & MOLECULAR BIOLOGY • Molecular biology

  6. WikiProject Chemistry

  7. General chemistry content Reactions & processes, concepts, chemists’ biographies, etc.

  8. WikiProject Chemicals • ~60 members (~20 active) • Collaborates on writing quality articles and on standards for: developing data boxes for articles; chemical naming and structure drawing; article assessment • Data validation • Collaboration with CAS • Wim Van Dorst, a Dutch member of WP:Chem since March 2005.

  9. Most articles have a Chembox Chembox is designed to be machine readable and “database friendly”
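As a rough illustration of what "machine readable" can mean for a Chembox, the sketch below pulls `| field = value` pairs out of a flat, simplified template. The sample wikitext and the `parse_infobox_fields` helper are hypothetical and written only for this example; real Chemboxes nest sub-templates and would need a proper wikitext parser.

```python
import re

# Hypothetical, simplified Chembox-style wikitext (real Chemboxes nest sub-templates).
SAMPLE = """{{Chembox
| Name = Water
| CASNo = 7732-18-5
| BoilingPtC = 100
}}"""

def parse_infobox_fields(wikitext: str) -> dict[str, str]:
    """Collect '| key = value' lines from a flat template into a dict."""
    fields = {}
    for line in wikitext.splitlines():
        # A field line starts with a pipe, then a name, an equals sign, and the value.
        match = re.match(r"\s*\|\s*([^=|]+?)\s*=\s*(.*?)\s*$", line)
        if match:
            fields[match.group(1)] = match.group(2)
    return fields

fields = parse_infobox_fields(SAMPLE)
print(fields["CASNo"])
```

Because each value sits under a named field, a script or bot can read one property (such as the CAS RN) without interpreting the surrounding article prose.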

  10. WikiProject Pharmacology

  11. Most articles have a Drugbox

  12. Traffic can be very high….

  13. Even for specialized topics

  14. Raising General QUALITY in Wikipedia

  15. WMF: Long term strategy Expand the “virtuous circle” Diagram by User:Randomran – Creative Commons license

  16. Article assessment – by editors

  17. Assessment guides article improvement priorities

  18. Article ratings – by users

  19. Pending changes (flagged revisions) “Articles under PC protection are open for editing, but changes will be visible to readers who are not logged in only after being checked for obvious vandalism and clear errors.”

  20. WikiTrust • Downloadable as an extension to Firefox, this adds a tab above the article:

  21. Validation of Wikipedia chemical data

  22. How I use the key terms • Validation = “How can I be sure the data are correct?” • Curation = fixing errors

  23. Content validation • In 2008 a data validation drive was initiated for basic chemical identifiers • Led to a collaboration with CAS, to ensure Wikipedia CAS registry nos. are correct • Now around 3500 substances have been validated against CAS Common Chemistry, as having correct name, structure & CAS RN • Other fields now being validated • Validated content indicated with a check mark
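One cheap sanity check that can precede any comparison against an authoritative source is the check digit built into every CAS RN: the final digit equals the weighted sum of the preceding digits (weights 1, 2, 3, ... counted from the right) modulo 10. The sketch below is not the validation procedure described in the slides; it only catches malformed or mistyped numbers, and a number can pass this test while still being assigned to the wrong substance. The function name `cas_check_digit_ok` is made up for this example.

```python
import re

def cas_check_digit_ok(cas_rn: str) -> bool:
    """Return True if a CAS RN such as '7732-18-5' has a consistent check digit."""
    match = re.fullmatch(r"(\d{2,7})-(\d{2})-(\d)", cas_rn)
    if not match:
        return False
    digits = match.group(1) + match.group(2)
    # Weight the digits 1, 2, 3, ... starting from the one just left of the check digit.
    total = sum(weight * int(d) for weight, d in enumerate(reversed(digits), start=1))
    return total % 10 == int(match.group(3))

print(cas_check_digit_ok("7732-18-5"))  # water
print(cas_check_digit_ok("7732-18-4"))  # same number with an altered check digit
```

For water, the weighted sum is 8·1 + 1·2 + 2·3 + 3·4 + 7·5 + 7·6 = 105, and 105 mod 10 = 5, which matches the final digit.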

  24. CommonChemistry • Launched in April 2009 • Came about as a result of a collaboration between CAS & Wikipedia • Offered as a free service for CAS RNs for members of the public.

  25. Organized by WP:Chemicals • Moderate participation from members of WP:Pharmacology

  26. The approach to validation • Every old version of an article (identified by its RevID) is preserved for posterity, and can potentially serve as a permanent record of a validated version.
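To make the idea of a RevID as a permanent record concrete, here is a minimal sketch that builds a link to one specific revision using MediaWiki's `index.php?title=…&oldid=…` URL form. The `permalink` helper and the RevID value are placeholders invented for illustration, not an actual validated revision.

```python
from urllib.parse import urlencode

def permalink(title: str, rev_id: int) -> str:
    """Build a URL pointing at one specific revision of an English Wikipedia article."""
    query = urlencode({"title": title, "oldid": rev_id})
    return f"https://en.wikipedia.org/w/index.php?{query}"

# The RevID below is a made-up placeholder, not a real validated revision.
print(permalink("Water", 123456789))
```

A link of this kind keeps pointing at the same text however many times the live article is edited afterwards, which is what lets a stored RevID act as a reference point for validation.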

  27. Protecting validated fields PROBLEM: This is “the encyclopedia anyone can edit” – so anyone can change the BP of water to 200 °C. SOLUTION: A bot patrols the pages and watches for edits to key fields. Any dubious edits are flagged with a red X (next to the data) and logged. System developed by Dirk Beetstra (Eindhoven University of Technology). It is the only such tool on Wikipedia.

  28. Validation protected by bot • If anyone tries to vandalize a validated field, this will be flagged by a bot soon afterwards. • This example received a red X 11 minutes after it was vandalized.
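The slides do not describe how the bot is implemented. As one possible reading of "watching key fields", the sketch below compares a stored set of validated values with the values currently on a page and marks any mismatch. The `VALIDATED` record, the field names, and `flag_changed_fields` are all hypothetical.

```python
# Hypothetical record of validated values; the slides do not describe the bot's real data model.
VALIDATED = {"CASNo": "7732-18-5", "BoilingPtC": "100"}

def flag_changed_fields(validated: dict[str, str], current: dict[str, str]) -> dict[str, str]:
    """Mark each validated field 'ok' or 'flagged' depending on whether it still matches."""
    return {
        name: "ok" if current.get(name) == value else "flagged"
        for name, value in validated.items()
    }

# Simulated page state after someone changes the boiling point of water.
current_page = {"CASNo": "7732-18-5", "BoilingPtC": "200"}
print(flag_changed_fields(VALIDATED, current_page))
```

In this sketch a field that has been deleted from the page is also flagged, since a missing value no longer matches the validated one.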

  29. Validated revisionIDs

  30. Checking structures • In 2008–2010, around 3000 chemical structures were informally checked against CAS Common Chemistry • PROBLEM: Structures are loaded from an external file on Wikimedia Commons, which can be “invisibly” changed

  31. Since fall 2010 • The bot has been modified to watch for changes to the RevID of the Wikimedia Commons structure image • A few hundred images have now been validated

  32. Drugboxes • Drugboxes are patrolled by the bot, but at present WP:PHARM is not active in formal validation. • Most work is done by Dirk Beetstra, using official lists from data sources (e.g., ChEBI).

  33. The future?

  34. Validation of melting points • Physical properties are much harder – require human validation • Collaboration beginning with JC Bradley (Drexel) & A Lang (Oral Roberts) on MPs.

  35. Supplementary data pages

  36. Supplementary data pages can host MP validation sources • These pages have room to list all sources with linked refs – providing a “paper trail” to original sources

  37. Other future developments • New formats for content – books, for cellphones (Kiwix, Wikipock, Okawix) • Offline versions that use quality checks and vandalism checks, for use in schools, developing countries, etc. • More validated data fields, with “paper trails” and real-time checks • Mashups with other sites • Integration with lab instrumentation, lab notebooks, etc.?

  38. Acknowledgements • Antony Williams (RSC ChemSpider) • Dirk Beetstra (Tech Univ Eindhoven) • User:Physchim62 and many other Wikipedians • JC Bradley and Andrew Lang

  39. Thank you for your attention Any questions?
