Learn how ChemSpider is changing the landscape of chemistry curation by integrating chemical structures, allowing chemists to contribute and curate data online. Discover the benefits of crowd-sourced chemistry curation and access valuable chemical information. Explore the potential of human curation in handling vast amounts of chemical data.
ChemSpider as a Platform for Crowd Participation in Curating Chemistry • Antony Williams • IDCC, Chicago, December 2010
Di-Hydrogen Monoxide 2H + 1O
Di-Hydrogen Monoxide H2O Water
Chemistry on the Internet – Not All Bad • 100s of websites hosting chemistry-related data • Chemistry information is generally “compound-based” • Chemical “structures” • Identifiers, names and synonyms • Properties • Analytical data • How to synthesize • Articles, patents, safety information • Chemistry “language and dialects”
A Pragmatic Vision “Build a Structure Centric Community” • Integrate chemistry across the internet based on “chemical structure” • A “structure-based hub” to information and data • Let chemists contribute their own data • Allow the community to curate & annotate data
Answering Questions for Chemists • Questions a chemist might ask… • What is the melting point of n-heptanol? • What is the chemical structure of Xanax? • Chemically, what is phenolphthalein? • What are the stereocenters of cholesterol? • Where can I find publications about xylene? • What are the different trade names for Aspirin? • What is the NMR spectrum of Benzoic Acid? • What are the safety handling issues for toluene?
Available Information… • Linked to chemical vendors, safety data, toxicity, metabolism…
ChemSpider Today • Almost 25 million unique chemicals • Over 400 data sources • Grows daily – community and RSC depositions • Community annotation and curation • We curate, edit, change, enhance data daily
Three Years of Experience • Internet-based chemistry is a mess! • Public compound databases are contaminated • The annotation/curation of data online is difficult • Most database hosts are non-responsive to feedback – “We are a host/repository of data” • Who cares?
Where is chemistry online? • Encyclopedic articles (Wikipedia) • Chemical vendor databases • Metabolic pathway databases • Property databases • Patents with chemical structures • Drug Discovery data • Scientific publications • Compound aggregators • Blogs/Wikis and Open Notebook Science
MeSH – Medical Subject Headings • Several forms of vitamin K have been identified: VITAMIN K 1 (phytomenadione) derived from plants, VITAMIN K 2 (menaquinone) from bacteria, and synthetic naphthoquinone provitamins, VITAMIN K 3 (menadione).
Wikipedia WRONG
Incorrect Structures WRONG
Lack of Stereochemistry WRONG
Does stereochemistry matter? • Distaval, Talimol, Nibrol, Sedimide, Quietoplex, Contergan, Neurosedyn, Softenon, Thalidomide
Internet-Based Chemistry is a Mess • Algorithms can only get you so far • Human curation is necessary • Only the crowds can help with big data… ChemSpider is approaching 25 million compounds
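To illustrate the point that algorithms help but cannot replace human curation, here is a minimal sketch of an automated check that flags records whose InChI string carries no stereochemistry layers (`/b`, `/t`, `/m`, `/s`). The function names and the choice of alanine as the example are assumptions made for illustration; this is not how ChemSpider performs its checks. Note the limitation: a compound with no stereocenters at all would be flagged too, so a person still has to decide whether stereochemistry is actually missing.

```python
from __future__ import annotations

# Prefixes of the InChI layers that encode stereochemistry:
# b = double-bond, t = tetrahedral, m = inversion flag, s = stereo type.
STEREO_PREFIXES = {"b", "t", "m", "s"}


def stereo_layers(inchi: str) -> list[str]:
    """Return the stereo layers present in an InChI string."""
    # Skip the "InChI=1S" prefix and the molecular-formula layer.
    layers = inchi.split("/")[2:]
    return [layer for layer in layers if layer[:1] in STEREO_PREFIXES]


def needs_human_review(inchi: str) -> bool:
    """Flag a record that has no stereo information (hypothetical rule)."""
    return not stereo_layers(inchi)


# L-alanine with defined stereochemistry, and the same skeleton without it.
L_ALANINE = "InChI=1S/C3H7NO2/c1-2(4)3(5)6/h2H,4H2,1H3,(H,5,6)/t2-/m0/s1"
FLAT_ALANINE = "InChI=1S/C3H7NO2/c1-2(4)3(5)6/h2H,4H2,1H3,(H,5,6)"

for label, inchi in (("L-alanine", L_ALANINE), ("alanine, flat", FLAT_ALANINE)):
    print(label, stereo_layers(inchi), needs_human_review(inchi))
```

The check only narrows down the list of suspect records; confirming that a flat structure is an error remains a curation task.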
Crowd-sourcing Chemistry Curation • Crowd-sourced curation: identify/tag errors, edit names, synonyms, identify records to deprecate
“Curate” Identifiers • General curation activities • Remove incorrect names • Correct spellings • Add multilingual names • Add alternative names • In 3 years over 1 million structure-identifier relationships have been validated – robotically and manually • 130 people have participated in validation or annotation. “Crowds” can be quite small!
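The curation activities listed above can be sketched as operations on a structure-identifier record. This is a hypothetical data model, not ChemSpider's schema or API: the class names, fields, curator labels, and example names are all assumptions chosen for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Synonym:
    name: str
    language: str = "en"
    validated: bool = False


@dataclass
class CompoundRecord:
    """One chemical structure and the names attached to it (hypothetical model)."""

    structure: str
    synonyms: dict[str, Synonym] = field(default_factory=dict)
    history: list[tuple[str, str, str]] = field(default_factory=list)

    def add_name(self, curator: str, name: str, language: str = "en") -> None:
        self.synonyms[name] = Synonym(name, language)
        self.history.append((curator, "add", name))

    def remove_name(self, curator: str, name: str) -> None:
        del self.synonyms[name]
        self.history.append((curator, "remove", name))

    def correct_spelling(self, curator: str, wrong: str, right: str) -> None:
        old = self.synonyms.pop(wrong)
        self.synonyms[right] = Synonym(right, old.language)
        self.history.append((curator, "correct", f"{wrong} -> {right}"))

    def validate(self, curator: str, name: str) -> None:
        self.synonyms[name].validated = True
        self.history.append((curator, "validate", name))


# A deposited record with one misspelled and one incorrect name.
water = CompoundRecord(structure="H2O")
for name in ("Water", "Dihydrogen monxide", "Hydrogen peroxide"):
    water.add_name("depositor", name)

water.remove_name("curator_a", "Hydrogen peroxide")           # remove incorrect name
water.correct_spelling("curator_a", "Dihydrogen monxide",
                       "Dihydrogen monoxide")                 # correct spelling
water.add_name("curator_b", "Wasser", language="de")          # add multilingual name
water.validate("curator_b", "Water")                          # validate relationship

print(sorted(water.synonyms))
```

Keeping a history of who changed what is one way to let curators at different levels check each other's work.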
Crowdsourcing Works • The “crowd” has deposited data (structures, spectra, etc.) and participated in data curation • Curators at different levels check each other's work • Wikipedia is the modern primary example • Some curators are “madmen”… • The Oxford English Dictionary
Crowdsourced “Annotations” • Users can add • Descriptions/Syntheses/Commentaries • Links to articles • Spectral data • Photos • MP3 files • Videos