ChEBI Kirill Degtyarenko, EMBL-EBI / EPO
The team • Rafael Alcántara • Michael Ashburner * • Volker Ast * • Michael Darsow * • Paula de Matos • Marcus Ennis • Janna Hastings • Alan McNaught * • Inma Spiteri • Christoph Steinbeck • Martin Zbinden *
ChEBI: What is it? Chemical Entities of Biological Interest – an EBI database/dictionary of ‘biochemical compounds’
What are the ‘biochemical compounds’? Can be defined as consisting of “molecules not directly encoded by the genome ... that are either the products of nature or are synthetic products used ... to intervene in the processes of living organisms” [Michael Ashburner]
Molecular entity “Any constitutionally or isotopically distinct atom, molecule, ion, ion pair, radical, radical ion, complex, conformer etc., identifiable as a separately distinguishable entity” [IUPAC “Gold Book”]
In fact, ChEBI contains • Molecular entities • trans-vaccenic acid • Groups • trans-vaccenoyl group • Classes • fatty acids
‘Small molecules’? Yes, but big molecules as well! • alumina • amylose • metaborate • poly(vinyl alcohol)
1-D ChEBI • Numeric ID • Carefully checked terminology • Unambiguous ChEBI name • IUPAC names • Cross-references to free resources
Unambiguous ChEBI name CHEBI:28918 L-adrenaline, not just ‘adrenaline’
Systematic Name (IUPAC) 2-{[3-(trifluoromethyl)phenyl]amino}benzoic acid
Common Name • flufenamic acid (INN English) • acide flufénamique (INN French) • ácido flufenámico (INN Spanish) • acidum flufenamicum (INN Latin) • Flufenaminsäure (German)
The Unpronounceables CHEBI:48935 (E)-roxithromycin IUPAC name: (3R,4S,5S,6R,7R,9R,10E,11S,12R,13S,14R)-4-(2,6-dideoxy-3-C-methyl-3-O-methyl-α-L-ribo-hexopyranosyloxy)-14-ethyl-7,12,13-trihydroxy-10-{[(2-methoxyethoxy)methoxy]imino}-6-[3,4,6-trideoxy-3-(dimethylamino)-β-D-xylo-hexopyranosyloxy]-3,5,7,9,11,13-hexamethyloxacyclotetradecan-2-one
CHEBI:48935 (E)-roxithromycin INN: roxithromycin CHEBI:32109 (Z)-roxithromycin What is the common name of roxithromycin?
CHEBI:48844 roxithromycin Roxithromycin (2) (Z)-roxithromycin (E)-roxithromycin
CHEBI:18385 thiamine(1+) aka thiamine CHEBI:33283 thiamine(1+) chloride INN: thiamine CHEBI:49105 thiamine(2+) dichloride aka thiamine chloride hydrochloride aka thiamine hydrochloride What is thiamine?
Need for 2-D • “Better to see the face than to hear the name” (Zen proverb) • Structures and identifiers based on structures offer new ways of crosslinking to other databases • Structure search
Connection table
ChEBI
9 10 0 0 0 0 999 V2000
11.8219 -7.2713 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
11.8219 -8.0922 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
12.6074 -7.0165 0.0000 N 0 0 0 0 0 0 0 0 0 0 0 0
11.1072 -6.8574 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
12.6039 -8.3505 0.0000 N 0 0 0 0 0 0 0 0 0 0 0 0
11.1072 -8.5027 0.0000 N 0 0 0 0 0 0 0 0 0 0 0 0
13.0886 -7.6818 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
10.3923 -7.2713 0.0000 N 0 0 0 0 0 0 0 0 0 0 0 0
10.3888 -8.0922 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
1 2 2 0 0 0 0
1 3 1 0 0 0 0
1 4 1 0 0 0 0
2 5 1 0 0 0 0
2 6 1 0 0 0 0
3 7 1 0 0 0 0
4 8 2 0 0 0 0
6 9 2 0 0 0 0
5 7 2 0 0 0 0
8 9 1 0 0 0 0
M END
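To make the connection table on this slide more concrete, here is a minimal Python sketch that reads it and derives the heavy-atom composition and ring count. It assumes the whitespace-separated form shown in this transcript; real V2000 molfiles use fixed-width columns and three header lines, so a production parser would differ. The names `CTAB` and `parse_ctab` are illustrative only, not part of any ChEBI tool.

```python
from collections import Counter

# Connection table as shown on the slide (counts line, atom block, bond block).
CTAB = """\
9 10 0 0 0 0 999 V2000
11.8219 -7.2713 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
11.8219 -8.0922 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
12.6074 -7.0165 0.0000 N 0 0 0 0 0 0 0 0 0 0 0 0
11.1072 -6.8574 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
12.6039 -8.3505 0.0000 N 0 0 0 0 0 0 0 0 0 0 0 0
11.1072 -8.5027 0.0000 N 0 0 0 0 0 0 0 0 0 0 0 0
13.0886 -7.6818 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
10.3923 -7.2713 0.0000 N 0 0 0 0 0 0 0 0 0 0 0 0
10.3888 -8.0922 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
1 2 2 0 0 0 0
1 3 1 0 0 0 0
1 4 1 0 0 0 0
2 5 1 0 0 0 0
2 6 1 0 0 0 0
3 7 1 0 0 0 0
4 8 2 0 0 0 0
6 9 2 0 0 0 0
5 7 2 0 0 0 0
8 9 1 0 0 0 0
M END
"""


def parse_ctab(text):
    """Return (atoms, bonds) from a whitespace-separated V2000-style block.

    Assumption: fields are split on whitespace, which works for the table
    above but is not how fixed-width molfiles are defined.
    """
    rows = [line.split() for line in text.strip().splitlines()]
    n_atoms, n_bonds = int(rows[0][0]), int(rows[0][1])
    atom_rows = rows[1:1 + n_atoms]
    bond_rows = rows[1 + n_atoms:1 + n_atoms + n_bonds]
    # Atom row: x, y, z, element symbol, then property flags (ignored here).
    atoms = [(float(x), float(y), float(z), sym) for x, y, z, sym, *_ in atom_rows]
    # Bond row: first atom index, second atom index, bond order, then flags.
    bonds = [(int(a), int(b), int(order)) for a, b, order, *_ in bond_rows]
    return atoms, bonds


atoms, bonds = parse_ctab(CTAB)
composition = Counter(sym for *_, sym in atoms)
# For a single connected molecule, rings = bonds - atoms + 1.
ring_count = len(bonds) - len(atoms) + 1
double_bonds = sum(1 for _, _, order in bonds if order == 2)

print(dict(composition), ring_count, double_bonds)
```

Hydrogens are implicit in this table, so only the heavy atoms are counted.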
2-D ChEBI • One or more 2-D (or 3-D) connection tables • One is default • Autogenerated images (PNG) • Default diagrams should be unambiguous
Fused systems (R)-camphor ambiguous unambiguous
Square planar geometry cisplatin transplatin
From 2-D back to 1-D • SMILES • InChI
SMILES (1) • Simplified Molecular Input Line Entry System • Developed by David Weininger in 1988 • Extended by others (e.g. Daylight) • String of standard ASCII characters • A number of valid SMILES can be produced for the same molecule
SMILES (2) • N1C=NC2=C1C=NC=N2 • c1ncc2ncnc2n1 • C=1N\C=N/C\2=N/C=N\C=1/2 • c1ncnc2/N=C\Nc12 • n1cc2c(nc1)ncn2 • [H]c1nc([H])c2n([H])c([H])nc2n1
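The six strings on this slide look quite different, yet they are presented as SMILES for one molecule. A minimal sketch of a sanity check follows: it counts heavy atoms in each string and confirms that all six share the same composition. This is a necessary condition only, not proof of identity; deciding whether two SMILES denote the same structure needs a cheminformatics toolkit with canonicalisation. The regular expression and the helper `heavy_atoms` are simplifications invented for this illustration and cover only the organic-subset symbols and simple bracket atoms.

```python
import re
from collections import Counter

# The six SMILES from the slide. Raw strings keep the backslashes intact.
SMILES = [
    "N1C=NC2=C1C=NC=N2",
    "c1ncc2ncnc2n1",
    r"C=1N\C=N/C\2=N/C=N\C=1/2",
    r"c1ncnc2/N=C\Nc12",
    "n1cc2c(nc1)ncn2",
    "[H]c1nc([H])c2n([H])c([H])nc2n1",
]

# Either a bracket atom such as [H], or a bare organic-subset atom
# (two-letter symbols first, then aliphatic, then aromatic lower case).
ATOM = re.compile(r"\[([A-Za-z][a-z]?)[^\]]*\]|(Cl|Br|[BCNOPSFI]|[bcnops])")


def heavy_atoms(smiles):
    """Count non-hydrogen atoms, treating aromatic 'c'/'n' as C/N."""
    counts = Counter()
    for bracket, bare in ATOM.findall(smiles):
        symbol = (bracket or bare).capitalize()
        if symbol != "H":
            counts[symbol] += 1
    return counts


compositions = [heavy_atoms(s) for s in SMILES]
for smiles, comp in zip(SMILES, compositions):
    print(f"{smiles:35s} {dict(comp)}")
```

Every string yields five carbons and four nitrogens, which agrees with the formula layer C5H4N4 in the InChI shown later in the deck.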
InChI (1) • IUPAC International Chemical Identifier or InChI • Open source • Developed by Stein, Heller, Tchekhovskoi and McNaught • Used by NIST, PubChem, CML… and ChEBI
InChI (2) InChI=1/C5H4N4/c1-4-5(8-2-6-1)9-3-7-4/h1-3H,(H,6,7,8,9)/f/h7H InChIKey=KDCGOANMDULRCW-QDQILVOLCG
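An InChI string is built from slash-separated layers, which makes it easy to take apart. The sketch below splits the identifier from this slide into its version, formula and tagged layers, and turns the formula into element counts. The function names `split_inchi` and `parse_formula` are hypothetical helpers, and the formula parser assumes a single-component formula (it would not handle dot-separated formulas such as the one in the cisplatin InChI).

```python
import re

INCHI = "InChI=1/C5H4N4/c1-4-5(8-2-6-1)9-3-7-4/h1-3H,(H,6,7,8,9)/f/h7H"


def split_inchi(inchi):
    """Split an InChI into (version, formula, [(tag, body), ...])."""
    prefix, _, body = inchi.partition("=")
    if prefix != "InChI":
        raise ValueError("not an InChI string")
    version, formula, *layers = body.split("/")
    # Each remaining layer starts with a one-letter tag (c, h, f, q, p, ...).
    tagged = [(layer[:1], layer[1:]) for layer in layers]
    return version, formula, tagged


def parse_formula(formula):
    """Turn a simple formula such as 'C5H4N4' into {'C': 5, 'H': 4, 'N': 4}."""
    return {el: int(n or 1) for el, n in re.findall(r"([A-Z][a-z]?)(\d*)", formula)}


version, formula, layers = split_inchi(INCHI)
elements = parse_formula(formula)

print(version, elements)
for tag, body in layers:
    print(tag, body)
```

The `h` tag appears twice because the second occurrence follows the `f` layer, which in this older, non-standard InChI fixes the mobile hydrogen on atom 7.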
Limitations (1) • Stereochemistry other than sp3 tetrahedral and sp2 trigonal planar • Polymers • Conformers • Radicals/different spin state • Topological isomers • Mixtures • Markush structures
Limitations (2) cisplatin transplatin InChI=1/2ClH.2H3N.Pt/h2*1H;2*1H3;/q;;;;+2/p-2
3-D ChEBI cisplatin
Uncertainty and ambiguity in chemistry • Compositional uncertainty • Positional uncertainty • Configurational uncertainty • Conformational uncertainty
Compositional uncertainty Examples • an alkali metal cation • vanadate(V) anion • [2H]ethanol
Positional uncertainty Examples • L-bromohistidine residue • pteroic acid (several tautomers)
Configurational uncertainty Examples • androstane • rel-(2R,3R)-2-amino-3-methylpentanoic acid • tetradec-11-enoic acid
Conformational uncertainty Examples • cyclohexane: chair, boat, twist • protein secondary structure: α, β, …
ChEBI ontology • Molecular structure ontology • Subatomic particle ontology • Role ontology • Biological role • Application
L-adrenaline Molecular structure ontology • catecholamines Biological role • hormone Application • antiglaucoma • bronchodilator • cardiostimulant
The family relations L-cystein-S-yl L-cysteine(•) L-cysteine zwitterion cysteine D-cysteine L-cysteino L-cysteine L-cysteinium L-cysteinyl L-cysteinate(1–) L-cysteine residue L-cysteinate(2–) L-cysteinate residue
Is A relationship ∆ L-cysteine is a cysteine
∆ ∆ Is Enantiomer Of L-cysteine is enantiomer of D-cysteine
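Has Part ⋄ L-cysteine hydrochloride has part L-cysteinium (L-cysteinium is part of L-cysteine hydrochloride)
Is Conjugate Acid Of ♯ • L-cysteinium is conjugate acid of L-cysteine • L-cysteine is conjugate acid of L-cysteinate(1–) • L-cysteinate(1–) is conjugate acid of L-cysteinate(2–)
Is Conjugate Base Of ♭ • L-cysteinate(2–) is conjugate base of L-cysteinate(1–) • L-cysteinate(1–) is conjugate base of L-cysteine • L-cysteine is conjugate base of L-cysteinium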
Acid/base relationships • L-cysteinium ♯/♭ L-cysteine ♯/♭ L-cysteinate(1–) ♯/♭ L-cysteinate(2–) • ♯ (is conjugate acid of) reads left to right, ♭ (is conjugate base of) reads right to left
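Because "is conjugate acid of" and "is conjugate base of" are inverse relationships, only one direction has to be stored; the other can be generated. The sketch below illustrates that idea with the L-cysteine family from these slides, using plain (subject, relationship, object) triples. The data layout and the function `acid_base_triples` are assumptions made for illustration and do not reflect how ChEBI stores its ontology.

```python
# Each pair is (acid, base): the first entity is the conjugate acid of the second.
CONJUGATE_ACID_OF = [
    ("L-cysteinium", "L-cysteine"),
    ("L-cysteine", "L-cysteinate(1–)"),
    ("L-cysteinate(1–)", "L-cysteinate(2–)"),
]


def acid_base_triples(pairs):
    """Expand acid/base pairs into both directions of the relationship."""
    triples = []
    for acid, base in pairs:
        triples.append((acid, "is conjugate acid of", base))
        triples.append((base, "is conjugate base of", acid))
    return triples


def related(entity, relationship, triples):
    """Return every object linked to `entity` by `relationship`."""
    return [obj for subj, rel, obj in triples if subj == entity and rel == relationship]


triples = acid_base_triples(CONJUGATE_ACID_OF)
for subj, rel, obj in triples:
    print(subj, rel, obj)
```

L-cysteine sits in the middle of the chain, so it appears both as a conjugate acid (of L-cysteinate(1–)) and as a conjugate base (of L-cysteinium).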
Is Tautomer Of L-cysteine is tautomer of L-cysteine zwitterion
Is Tautomer Of • 1H-pyrrole • 2H-pyrrole • 3H-pyrrole
Has Parent Hydride ℋ salutaridinol has parent hydride morphinan (morphinan is parent hydride of salutaridinol)