* Ref: PLoS ONE 4(5): e5440. doi:10.1371/journal.pone.0005440

How Large Is the Metabolome? A Critical Analysis of Data Exchange Practices in Chemistry*
Tobias Kind, Martin Scholz, Oliver Fiehn
University of California Davis, Genome Center – Metabolomics, Davis, California, USA
* Ref: PLoS ONE 4(5): e5440. doi:10.1371/journal.pone.0005440
Presented by: Bipin Singh

Introduction

The process of building pathway and metabolite databases

Challenges
• Most metabolites in complex biological materials such as plant tissues are not annotated
• Many metabolites remain unidentified because experimental databases are lacking and the chemistry is complex
• An organism's metabolome changes over time
• Metabolites cannot be sequenced like proteins or polynucleotides

Databases searched
• The CAS Chemical Abstracts database: covers most of the chemical and patent literature since 1907
• The CRC Press Dictionary of Natural Products: largest curated database of natural products
• The PubChem database: free database of chemical structures of small organic molecules and information on their biological activities
• The Beilstein database: database of experimental data in organic, inorganic and organometallic chemistry
• Dr. Duke's Phytochemical and Ethnobotanical Database
• SetupX: study design database for metabolomic projects
• The RiceCyc database: catalog of known and predicted biochemical compounds and pathways from rice (Oryza sativa)
• The KEGG database: pathway and metabolic network database
• The KNApSAcK database: species-metabolite relationship database
• The Reactome database: open-source, manually curated pathway database

It was not possible to automatically compute a single, combined knowledge repository of all small molecules of rice (Oryza sativa). Major obstacles hindered this approach:
(1) Databases did not allow or did not enable batch downloads and storage of compound lists and structures.
(2) Databases did not cross-reference compound identifiers.
(3) Databases did not export structures in machine-readable formats, so the overlap between hit lists could not be analyzed.
(4) Many databases did not distinguish metabolites produced by the biochemical machinery of rice from xenobiotic molecules.
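To make obstacles (2) and (3) concrete, the following minimal sketch shows the kind of overlap analysis that becomes trivial once every database exports its hit list under a shared structure identifier. The database names, the keys and the helper functions `pairwise_overlaps` and `combined_repository` are all hypothetical placeholders for illustration; they are not taken from the paper or from any of the databases listed.

```python
from itertools import combinations

# Hypothetical exports: database name -> set of shared structure keys.
# The keys are placeholders, not real InChIKeys or database identifiers.
hit_lists = {
    "DatabaseA": {"KEY-001", "KEY-002", "KEY-003"},
    "DatabaseB": {"KEY-002", "KEY-003", "KEY-004"},
    "DatabaseC": {"KEY-003", "KEY-005"},
}


def pairwise_overlaps(lists):
    """Return {(db1, db2): shared keys} for every pair of databases."""
    return {
        (a, b): lists[a] & lists[b]
        for a, b in combinations(sorted(lists), 2)
    }


def combined_repository(lists):
    """Merge all hit lists: map each key to the databases that report it."""
    merged = {}
    for db, keys in lists.items():
        for key in keys:
            merged.setdefault(key, []).append(db)
    return {key: sorted(dbs) for key, dbs in merged.items()}


overlaps = pairwise_overlaps(hit_lists)
repository = combined_repository(hit_lists)

for pair, shared in overlaps.items():
    print(pair, sorted(shared))
```

Without a common key, the set intersection in `pairwise_overlaps` has nothing to match on, which is why missing cross-references and non-machine-readable exports blocked the combined repository.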

Suggested solutions
• Small molecule reports need to be annotated by PubChem CIDs and structures
• Small molecule reports need to be public and machine readable
• Small molecule reports need to disclose sample metadata and absolute concentrations
• Differentiation between endogenous and exogenous metabolites
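As a rough illustration of what a report meeting these suggestions might look like, here is a minimal sketch of a machine-readable record serialized to JSON. The class `MetaboliteReport` and all of its field names are assumptions made for this example, not a format proposed in the paper. The compound is caffeine (PubChem CID 2519); the tissue, concentration and unit are invented values, not measurements.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class MetaboliteReport:
    """Hypothetical report record; field names are illustrative only."""
    pubchem_cid: int          # PubChem compound identifier
    inchikey: str             # structure key for cross-database queries
    species: str              # sample metadata
    tissue: str               # sample metadata
    concentration: float      # absolute concentration
    concentration_unit: str
    origin: str               # "endogenous" or "exogenous"

    def __post_init__(self):
        # Force every record to state where the compound comes from.
        if self.origin not in ("endogenous", "exogenous"):
            raise ValueError(f"unknown origin: {self.origin!r}")


# Example record with invented sample values.
report = MetaboliteReport(
    pubchem_cid=2519,
    inchikey="RYYVLZVUVIJVGH-UHFFFAOYSA-N",
    species="Oryza sativa",
    tissue="leaf",
    concentration=1.5,
    concentration_unit="nmol/g",
    origin="exogenous",
)

encoded = json.dumps(asdict(report), indent=2)     # machine-readable export
decoded = MetaboliteReport(**json.loads(encoded))  # lossless round trip
print(encoded)
```

Making `origin` a required, validated field is one simple way to keep the endogenous/exogenous distinction from being silently omitted.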

Conclusions
• Present databases are not capable of comprehensively retrieving all known metabolites
• Providers of (bio)chemical databases should enrich their database identifiers with PubChem IDs and InChIKeys to enable cross-database queries
• Peer-reviewed journal repositories need to mandate submission of structures and spectra in machine-readable format
• Additional database annotation is needed to differentiate xenobiotic molecules
