
WARNING: Chemistry is Dangerous

ChemSpider as a Platform for Crowd Participation in Curating Chemistry. Antony Williams, IDCC, Chicago, December 2010.

Presentation Transcript


  1. ChemSpider as a Platform for Crowd Participation in Curating Chemistry. Antony Williams, IDCC, Chicago, December 2010

  2. WARNING: Chemistry is Dangerous

  3. Di-Hydrogen Monoxide

  4. Di-Hydrogen Monoxide 2H

  5. Di-Hydrogen Monoxide 2H + 1O

  6. Di-Hydrogen Monoxide H2O

  7. Di-Hydrogen Monoxide H2O Water

  8. It’s all on Wikipedia…

  9. Chemistry on the Internet – Not All Bad • 100s of websites hosting chemistry-related data • Chemistry information is generally “compound-based” • Chemical “structures” • Identifiers, names and synonyms • Properties • Analytical data • How to synthesize • Articles, patents, safety information • Chemistry “language and dialects”

  10. Dialects describing chemicals

  11. A Pragmatic Vision: “Build a Structure-Centric Community” • Integrate chemistry across the internet based on “chemical structure” • A “structure-based hub” to information and data • Let chemists contribute their own data • Allow the community to curate & annotate data
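The "structure-based hub" idea has a simple core: many names from many sources resolve to one structure, so every dialect leads to the same record. Below is a minimal Python sketch of that idea. It is not ChemSpider's implementation. The class name, the structure keys (e.g. "KEY-WATER"), and the source labels are hypothetical placeholders. A real hub would key on a computed structure identifier.

```python
from collections import defaultdict


class StructureHub:
    """Hypothetical structure-centric index: every name points at a structure key."""

    def __init__(self) -> None:
        # structure key -> set of (name, source) pairs deposited for it
        self._names: dict[str, set[tuple[str, str]]] = defaultdict(set)
        # normalized name -> set of structure keys that some source links to it
        self._by_name: dict[str, set[str]] = defaultdict(set)

    @staticmethod
    def _normalize(name: str) -> str:
        # Case- and whitespace-insensitive matching for lookups
        return " ".join(name.lower().split())

    def deposit(self, structure_key: str, name: str, source: str) -> None:
        """Record that `source` associates `name` with `structure_key`."""
        self._names[structure_key].add((name, source))
        self._by_name[self._normalize(name)].add(structure_key)

    def lookup(self, name: str) -> set[str]:
        """Return every structure key that any source links to this name."""
        return set(self._by_name.get(self._normalize(name), set()))

    def names_for(self, structure_key: str) -> set[str]:
        """Return all names (from any source) attached to a structure key."""
        return {name for name, _ in self._names.get(structure_key, set())}


hub = StructureHub()
hub.deposit("KEY-WATER", "Water", "source-A")
hub.deposit("KEY-WATER", "Di-Hydrogen Monoxide", "source-B")
print(hub.lookup("di-hydrogen  monoxide"))
```

With this design, a name search is simply a lookup into the structure index. Adding a new dialect for a compound means adding one more name pointing at the same key.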

  12. www.chemspider.com

  13. Answering Questions for Chemists • Questions a chemist might ask… • What is the melting point of n-heptanol? • What is the chemical structure of Xanax? • Chemically, what is phenolphthalein? • What are the stereocenters of cholesterol? • Where can I find publications about xylene? • What are the different trade names for Aspirin? • What is the NMR spectrum of benzoic acid? • What are the safety handling issues for toluene?

  14. Search for a Chemical… by Name

  15. Available Information… • Linked to chemical vendors, safety data, toxicity, metabolism…

  16. Available Information…

  17. ChemSpider Today • Almost 25 million unique chemicals • Over 400 data sources • Grows daily – community and RSC depositions • Community annotation and curation • We curate, edit, change, enhance data daily

  18. Three Years of Experience • Internet-based chemistry is a mess! • Public compound databases are contaminated • The annotation/curation of data online is difficult • Most database hosts are non-responsive to feedback – “We are a host/repository of data” • Who cares?

  19. Linked Data on the Web

  20. Where is chemistry online? • Encyclopedic articles (Wikipedia) • Chemical vendor databases • Metabolic pathway databases • Property databases • Patents with chemical structures • Drug Discovery data • Scientific publications • Compound aggregators • Blogs/Wikis and Open Notebook Science

  21. What is the Structure of Vitamin K?

  22. MeSH – Medical Subject Headings • Several forms of vitamin K have been identified: VITAMIN K 1 (phytomenadione) derived from plants, VITAMIN K 2 (menaquinone) from bacteria, and synthetic naphthoquinone provitamins, VITAMIN K 3 (menadione).

  23. What is the Structure of Vitamin K1?

  24. What is the Structure of Vitamin K1?

  25. Chemical Abstracts “Common Chemistry” Database

  26. Wikipedia WRONG

  27. WRONG

  28. Incorrect Structures WRONG

  29. Lack of Stereochemistry WRONG

  30. Does stereochemistry matter? • Distaval, Talimol, Nibrol, Sedimide, Quietoplex, Contergan, Neurosedyn, Softenon, Thalidomide

  31. WRONG

  32. PubChem

  33. WRONG

  34. WRONG

  35. What’s Methane?

  36. What’s Methane?

  37. What ELSE is Methane???

  38. Internet-Based Chemistry is a Mess • Algorithms can only get you so far • Human curation is necessary • Only the crowds can help with big data… ChemSpider is approaching 25 million compounds
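The "What ELSE is Methane???" problem arises when one name is attached to several different structures. An algorithm can flag this kind of inconsistency but cannot resolve it. Below is a minimal Python sketch that surfaces such names for a human curator. It assumes deposits arrive as hypothetical (name, structure key, source) triples. The function name and data layout are illustrative only.

```python
from collections import defaultdict


def flag_ambiguous_names(
    deposits: list[tuple[str, str, str]],
) -> dict[str, set[str]]:
    """Return names that deposits attach to more than one structure.

    Each deposit is a hypothetical (name, structure_key, source) triple.
    The result maps each ambiguous (normalized) name to its competing keys.
    """
    structures_by_name: dict[str, set[str]] = defaultdict(set)
    for name, structure_key, _source in deposits:
        # Normalize case/whitespace so "Methane" and " methane" collide
        structures_by_name[" ".join(name.lower().split())].add(structure_key)
    # Only names claimed by two or more distinct structures need a human
    return {
        name: keys for name, keys in structures_by_name.items() if len(keys) > 1
    }


deposits = [
    ("Methane", "KEY-CH4", "source-A"),
    ("methane", "KEY-CH4", "source-B"),
    ("Methane", "KEY-OTHER", "source-C"),
    ("Water", "KEY-H2O", "source-A"),
]
print(flag_ambiguous_names(deposits))
```

The check only narrows the work: it finds "methane" pointing at two structures, but deciding which one is wrong is the human curation step.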

  39. Search “Vitamin H”

  40. Search “Vitamin H”

  41. “Curate” Identifiers

  42. “Curate” Identifiers

  43. “Curate” Identifiers

  44. Crowd-sourcing Chemistry Curation • Crowd-sourced curation: identify/tag errors, edit names, synonyms, identify records to deprecate

  45. “Curate” Identifiers • General curation activities • Remove incorrect names • Correct spellings • Add multilingual names • Add alternative names • In 3 years over 1 million structure-identifier relationships have been validated – robotically and manually • 130 people have participated in validation or annotation. “Crowds” can be quite small!
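The curation activities listed above can be sketched as operations on a single compound record. This is an illustration under assumptions, not ChemSpider's data model. The record layout, the `validated`/`deprecated` flags, the curator labels, and the choice to deprecate a wrong name instead of deleting it are all hypothetical. The example reuses "Vitamin H" from the earlier slides, with "Bioton" as an invented misspelling.

```python
from dataclasses import dataclass, field


@dataclass
class NameLink:
    """One structure-identifier relationship (hypothetical record layout)."""
    name: str
    validated: bool = False
    deprecated: bool = False
    history: list[str] = field(default_factory=list)


@dataclass
class CompoundRecord:
    structure_key: str  # hypothetical structure identifier
    links: dict[str, NameLink] = field(default_factory=dict)

    def add_name(self, name: str, curator: str) -> None:
        """Add an alternative or multilingual name; unvalidated until checked."""
        if name not in self.links:
            self.links[name] = NameLink(name)
        self.links[name].history.append(f"{curator}: added")

    def validate(self, name: str, curator: str) -> None:
        """Mark a structure-identifier relationship as checked."""
        link = self.links[name]
        link.validated = True
        link.history.append(f"{curator}: validated")

    def remove_incorrect(self, name: str, curator: str) -> None:
        # Assumption: deprecate rather than delete, so every edit stays auditable
        link = self.links[name]
        link.deprecated = True
        link.validated = False
        link.history.append(f"{curator}: deprecated")

    def correct_spelling(self, wrong: str, right: str, curator: str) -> None:
        """Deprecate the misspelled name and add the corrected one."""
        self.remove_incorrect(wrong, curator)
        self.add_name(right, curator)

    def current_names(self) -> list[str]:
        """Names that have not been deprecated, in sorted order."""
        return sorted(n for n, link in self.links.items() if not link.deprecated)


rec = CompoundRecord("KEY-BIOTIN")
rec.add_name("Vitamin H", "alice")
rec.add_name("Bioton", "bob")
rec.correct_spelling("Bioton", "Biotin", "alice")
rec.validate("Biotin", "carol")
print(rec.current_names())
```

Each curator's action is recorded in the link's history. This is one way that curators at different levels could check each other's work.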

  46. Crowdsourcing Works • The “crowd” has deposited data (structures, spectra, etc.) and participated in data curation • Curators at different levels check each other’s work • Wikipedia is the modern primary example • Some curators are “madmen”…

  47. Crowdsourcing Works • The “crowd” has deposited data (structures, spectra, etc.) and participated in data curation • Curators at different levels check each other’s work • Wikipedia is the modern primary example • Some curators are “madmen”… • The Oxford English Dictionary

  48. Vancomycin – Curate This!!!

  49. Vancomycin on ChemSpider 1 compound – 3 days

  50. Crowdsourced “Annotations” • Users can add • Descriptions/Syntheses/Commentaries • Links to articles • Spectral data • Photos • MP3 files • Videos
