ChEBI: an EBI chemistry reference
Overview
• Introduction to ChEBI
• Searching and browsing
• Understanding the ontology
• Downloads and programmatic access
ChEBI – Chemical Entities of Biological Interest
Introduction to ChEBI Block 1
Small Molecules within Bioinformatics (diagram of data types: genomes, literature, expression data, nucleotide sequences, protein sequences, protein domains and families, enzymes, 3D structures, small molecules, pathways, systems)
Signaling: γ-aminobutyric acid (GABA)
• GABA is the chief inhibitory neurotransmitter in the mammalian central nervous system.
• In humans, it also regulates muscle tone.
• It is synthesized by neurons.
• It is found mostly as a zwitterion, that is, with the carboxyl group deprotonated and the amino group protonated.
• Its conformational flexibility is important for its biological function, as it binds to different receptors in different conformations.
• GABA deficiency has been linked to:
  • anxiety disorder, depression, alcoholism
  • multiple sclerosis, action tremors, tardive dyskinesia
Metabolism: adenosine 5'-triphosphate (ATP)
• ATP is the "molecular unit of currency" of intracellular energy transfer.
• It is synthesized using energy released by catabolic processes such as cellular respiration, and hydrolysed to release energy that drives energy-requiring processes.
• Many proteins that bind ATP do so through a characteristic protein fold known as the Rossmann fold, a general nucleotide-binding structural domain that can also bind the cofactor NAD.
Enzymes
• Enzyme inhibitors are molecules that bind to enzymes and decrease their activity.
• Many drugs are enzyme inhibitors; enzyme inhibitors are also used as herbicides and pesticides.
• Enzyme activators bind to enzymes and increase their activity.
• Enzyme activators are often involved in the allosteric regulation of enzymes in the control of metabolism.
Example: clavulanic acid acts as a suicide inhibitor of bacterial β-lactamase enzymes.
Pathways http://www.genome.jp/kegg-bin/highlight_pathway?scale=1.0&map=map00231&keyword=tryptophan
Systems biology
• BioModels: quantitative models of biochemical and cellular systems
Tryptophan: D-enantiomer tastes sweet, L-enantiomer tastes bitter.
Drug design
• Ligand-based: relies on knowledge of other molecules that bind to the biological target of interest.
• Structure-based: relies on knowledge of the 3D structure of the biological target.
• A suitable target requires:
  • evidence that modulating it will have therapeutic value, e.g. disease-linkage studies showing associations between mutations in the target and certain disease states;
  • evidence that it is druggable, i.e. capable of binding a small molecule that modulates its activity.
• The target is cloned and expressed, and libraries of potential drug compounds are then screened against it using screening assays.
Drug types, 2003-2009 (chart: 'small molecules' shown in various shades of blue; source: http://chembl.blogspot.com/)
Small molecule annotations
• Often appear as free text in biological databases, where they are not the core data.
• Are frequently referred to by common names, which may be chemically ambiguous: e.g. does "adrenaline" mean (S)-adrenaline or (R)-adrenaline?
• May be referred to by several different names, e.g. paracetamol, acetaminophen, 4-acetamidophenol, N-(4-hydroxyphenyl)acetamide, …
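One common remedy for multiple names is to map every synonym to a single canonical entry. The sketch below assumes a small hand-built table (hypothetical; a curated resource would supply the real synonym list) using the paracetamol names above, and normalizes case and whitespace before lookup. Note that a table like this cannot fix chemically ambiguous names such as "adrenaline"; that needs stereochemistry in the underlying record.

```python
from typing import Optional

# Hypothetical synonym table: every known name maps to one canonical label.
# Keys are stored lower-case so lookups can be case-insensitive.
CANONICAL = "N-(4-hydroxyphenyl)acetamide"
SYNONYMS = {
    "paracetamol": CANONICAL,
    "acetaminophen": CANONICAL,
    "4-acetamidophenol": CANONICAL,
    "n-(4-hydroxyphenyl)acetamide": CANONICAL,
}


def resolve(name: str) -> Optional[str]:
    """Return the canonical name for a synonym, or None if it is unknown."""
    # Normalise surrounding whitespace and case before the lookup.
    return SYNONYMS.get(name.strip().lower())


print(resolve("Acetaminophen"))  # N-(4-hydroxyphenyl)acetamide
```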
Getting the chemistry right
• Thalidomide is a non-barbiturate hypnotic.
• Thalidomide displays immunosuppressive and anti-angiogenic activity. It inhibits the release of tumor necrosis factor-alpha from monocytes and modulates the action of other cytokines.
• Thalidomide is racemic: it contains the left- and right-handed isomers in equal amounts. One enantiomer is effective against morning sickness; the other is teratogenic.
• The enantiomers interconvert in vivo: if a human is given D-thalidomide or L-thalidomide, both isomers can be found in the serum. Hence administering only one enantiomer does not prevent the teratogenic effect in humans.
Source: http://www.drugbank.ca/drugs/DB01041
Small molecule data sources
• PubChem (http://pubchem.ncbi.nlm.nih.gov/): deposition-driven, publicly available compound repository containing more than 25 million unique structures.
• ChemSpider (http://www.chemspider.com/): automatic aggregation of publicly available chemistry data with crowdsourced annotation.
• ChEMBL (http://www.ebi.ac.uk/chembldb/): small molecules and bioactivity.
• ChEBI (http://www.ebi.ac.uk/chebi/): manually annotated database and ontology.
Chemicals in ChEBI: the caffeine entry
• Nomenclature: caffeine; synonyms 1,3,7-trimethylxanthine, methyltheobromine
• Ontology: trimethylxanthines; metabolite; CNS stimulant
• Chemical data: Formula C8H10N4O2; Charge 0; Mass 194.19
• Database cross-references: MSDchem: CFF; KEGG DRUG: D00528
• Chemical informatics: InChI=1/C8H10N4O2/c1-10-4-9-6-5(10)7(13)12(3)8(14)11(6)2/h4H,1-3H3; SMILES CN1C(=O)N(C)c2ncn(C)c2C1=O
• Visualisation: structure image
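The mass shown for caffeine (194.19) is the average molecular mass implied by its formula, C8H10N4O2. A minimal sketch of that arithmetic, assuming a simple formula syntax (element symbols followed by optional counts, no brackets or charges) and rounded standard atomic weights for just the four elements involved:

```python
import re

# Rounded standard atomic weights for the elements in this example only.
ATOMIC_WEIGHTS = {"C": 12.011, "H": 1.008, "N": 14.007, "O": 15.999}


def parse_formula(formula: str) -> dict:
    """Count atoms in a simple formula such as 'C8H10N4O2'."""
    counts = {}
    # Each match is an element symbol (capital + optional lower-case letter)
    # followed by an optional count; a missing count means 1.
    for symbol, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[symbol] = counts.get(symbol, 0) + (int(number) if number else 1)
    return counts


def average_mass(formula: str) -> float:
    """Sum the atomic weights of every atom in the formula."""
    return sum(ATOMIC_WEIGHTS[el] * n for el, n in parse_formula(formula).items())


print(round(average_mass("C8H10N4O2"), 2))  # 194.19
```

Term by term: 8 × 12.011 + 10 × 1.008 + 4 × 14.007 + 2 × 15.999 = 194.194.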
What is ChEBI?
Chemical Entities of Biological Interest
• Freely available
• Focused on ‘small’ chemical entities (no proteins or nucleic acids)
• Illustrated dictionary of chemical nomenclature
• High-quality, manual annotation
• Provides a chemical ontology
Access ChEBI at http://www.ebi.ac.uk/chebi/
ChEBI home page: http://www.ebi.ac.uk/chebi
How is ChEBI maintained?
• Automatic loading of preliminary data
• Automatic loading of 2-star annotated data
• Manual annotation
• User requests via the Submission Tool
• Public release: first Wednesday of every month
ChEBI entries contain
• A unique, unambiguous, recommended ChEBI name and an associated stable unique identifier
• An illustration where appropriate (compounds and groups, but generally not classes)
• A definition where appropriate (mostly classes)
• A collection of synonyms, including the IUPAC recommended name for the entity where appropriate
• A collection of cross-references to other databases
• Links to the ChEBI ontology
ChEBI entry view
Automatic Cross-references
Chemical Structures
• The chemical structure may be explored interactively using the MarvinView applet
• Available formats:
  • Image
  • Molfile
  • InChI and InChIKey
  • SMILES
Molfile format
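A molfile stores a structure as plain text: three header lines, a counts line, an atom block and a bond block. The sketch below assumes the V2000 fixed-column layout and reads only atom symbols and bond pairs; the water record is hand-written for illustration (not exported from ChEBI), and a real parser would also handle charges, the properties block and V3000 files.

```python
# Hand-written V2000 molfile for water (illustrative coordinates).
WATER_MOLFILE = "\n".join([
    "water",
    "  hand-written example",
    "",
    "  3  2  0  0  0  0  0  0  0  0999 V2000",
    "    0.0000    0.0000    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0",
    "    0.9572    0.0000    0.0000 H   0  0  0  0  0  0  0  0  0  0  0  0",
    "   -0.2400    0.9266    0.0000 H   0  0  0  0  0  0  0  0  0  0  0  0",
    "  1  2  1  0  0  0  0",
    "  1  3  1  0  0  0  0",
    "M  END",
])


def parse_molfile(text):
    """Return (atom symbols, bonds as (atom1, atom2, order)) from a V2000 molfile."""
    lines = text.splitlines()
    counts = lines[3]  # line 4: the counts line
    if "V2000" not in counts:
        raise ValueError("only V2000 molfiles are handled in this sketch")
    # Fixed columns: atom count in characters 1-3, bond count in 4-6.
    n_atoms, n_bonds = int(counts[0:3]), int(counts[3:6])
    atom_lines = lines[4:4 + n_atoms]
    bond_lines = lines[4 + n_atoms:4 + n_atoms + n_bonds]
    # Atom lines: x, y, z in 10-character fields, then the symbol in columns 32-34.
    atoms = [line[31:34].strip() for line in atom_lines]
    # Bond lines: first atom, second atom, bond order in 3-character fields (1-based).
    bonds = [(int(l[0:3]), int(l[3:6]), int(l[6:9])) for l in bond_lines]
    return atoms, bonds


print(parse_molfile(WATER_MOLFILE))  # (['O', 'H', 'H'], [(1, 2, 1), (1, 3, 1)])
```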
Searching and browsing ChEBI Block 2
Simple text search: enter any text; * acts as a wildcard
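How ChEBI implements the * wildcard internally is not described here. As a rough local analogue, Python's standard `fnmatch` module applies the same shell-style rule (* matches any run of characters, including none) to a list of names; the name list is illustrative, and note that `fnmatch` also gives ? and [ ] special meanings.

```python
from fnmatch import fnmatchcase

# Illustrative list of entity names to search.
NAMES = ["caffeine", "theobromine", "theophylline", "paracetamol"]


def wildcard_search(pattern, names):
    """Return names matching a shell-style pattern, ignoring case."""
    # Lower-case both sides so the match is case-insensitive.
    pattern = pattern.lower()
    return [n for n in names if fnmatchcase(n.lower(), pattern)]


print(wildcard_search("theo*", NAMES))  # ['theobromine', 'theophylline']
```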
Simple Text Search
Advanced Search
Advanced text search: narrow the search to a category; combine terms with AND, OR and BUT NOT
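AND, OR and BUT NOT correspond to set intersection, union and difference over the entries that each term matches. A minimal sketch with hypothetical result sets (the identifiers are placeholders, not real ChEBI IDs):

```python
# Hypothetical sets of entry IDs matched by each individual search term.
matches = {
    "xanthine": {"ID:1", "ID:2", "ID:3"},
    "stimulant": {"ID:2", "ID:3", "ID:4"},
    "metabolite": {"ID:3", "ID:5"},
}

# Map each boolean operator onto the equivalent set operation.
OPERATORS = {
    "AND": set.intersection,  # entries matched by both terms
    "OR": set.union,          # entries matched by either term
    "BUT NOT": set.difference,  # entries matched by the left term only
}


def boolean_search(left, op, right):
    """Combine two result sets with AND, OR or BUT NOT."""
    return OPERATORS[op](left, right)


print(sorted(boolean_search(matches["xanthine"], "BUT NOT", matches["metabolite"])))
# ['ID:1', 'ID:2']
```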
Structure search: structure drawing tools and search options
Search Results
• Download your search results
• Hover over a structure for a zoomed-in image
• Click to go to the entry page
Fingerprints
• Chemical substructure searching is computationally expensive (it is an instance of the subgraph isomorphism problem)…
Fingerprints [2]
• … so heuristics must be used to decrease the number of search candidates. For example, a query with formula C8H9NO2 cannot be a substructure of an entity that does not have at least 8 carbon atoms, 9 hydrogen atoms, …
• Fingerprints are a generalized, abstract encoding of structural features that can be used as an effective screening device
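The formula screen described above can be sketched as a per-element count comparison: a candidate survives only if it has at least as many atoms of every element as the query. This assumes the simple formula syntax used on these slides (no brackets or charges), and the candidate list is illustrative. Passing the screen is necessary but not sufficient: caffeine passes for the paracetamol query C8H9NO2 but contains no benzene ring, so the full substructure search would still reject it.

```python
import re
from collections import Counter


def parse_formula(formula):
    """Count atoms in a simple formula such as 'C8H9NO2'."""
    counts = Counter()
    for symbol, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[symbol] += int(number) if number else 1
    return counts


def may_contain(candidate, query):
    """True if the candidate has at least the query's count of every element."""
    cand, q = parse_formula(candidate), parse_formula(query)
    # A Counter returns 0 for missing elements, so absent elements fail the test.
    return all(cand[el] >= n for el, n in q.items())


query = "C8H9NO2"
for name, formula in [("caffeine", "C8H10N4O2"), ("water", "H2O"), ("ethanol", "C2H6O")]:
    print(name, may_contain(formula, query))
# caffeine True
# water False
# ethanol False
```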
Fingerprints [3]
• Encoding of structural patterns, e.g. water (HOH):
  • 0-bond paths: H, O, H
  • 1-bond paths: HO, OH
  • 2-bond paths: HOH
• Each pattern is hashed to set bits in a bit string, and the bit strings are combined with a bitwise OR to give the final fingerprint
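A minimal sketch of the path-enumeration idea for water: enumerate every simple path of 0 to 2 bonds, hash each path string to one bit position, and OR the bits together. The 10-bit length and the use of SHA-256 as the hash are assumptions for illustration; real fingerprinting systems use much longer bit strings and their own hashing schemes, so the particular bits set here are arbitrary.

```python
import hashlib

# Water as a graph: atom symbols by index, and bonds as index pairs.
ATOMS = ["O", "H", "H"]
BONDS = [(0, 1), (0, 2)]
N_BITS = 10  # assumed toy length


def neighbours(atom, bonds):
    """Indices of atoms bonded to `atom`."""
    return [b if a == atom else a for a, b in bonds if atom in (a, b)]


def linear_paths(atoms, bonds, max_bonds):
    """Element strings of all simple paths with 0..max_bonds bonds."""
    found = set()

    def extend(path):
        found.add("".join(atoms[i] for i in path))
        if len(path) - 1 == max_bonds:
            return
        for nxt in neighbours(path[-1], bonds):
            if nxt not in path:  # simple paths only: never revisit an atom
                extend(path + [nxt])

    for start in range(len(atoms)):
        extend([start])
    return found


def fingerprint(paths, n_bits=N_BITS):
    """Hash each path to one bit and OR the bits into a bit string."""
    fp = 0
    for p in paths:
        digest = hashlib.sha256(p.encode()).digest()
        fp |= 1 << (int.from_bytes(digest[:4], "big") % n_bits)
    return format(fp, f"0{n_bits}b")


paths = linear_paths(ATOMS, BONDS, 2)
print(sorted(paths))  # ['H', 'HO', 'HOH', 'O', 'OH']
print(fingerprint(paths))
```

Because identical path strings hash to the same bit, repeated patterns (the two H atoms, the two HOH traversals) contribute only once.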
Types of structure search
• Identity: based on InChI (e.g. water, InChI=1/H2O/h1H2)
• Substructure: uses fingerprints to narrow the search range, then runs the full substructure-search algorithm
• Similarity: based on the Tanimoto coefficient calculated between the fingerprints
Example, where a and b are the numbers of bits set in each fingerprint and c is the number of bits set in both:
a = 0010110010 (4 bits set), b = 1010110111 (7 bits set), c = 4
$$\mathrm{Tanimoto}(a,b) = \frac{c}{a + b - c} = \frac{4}{4 + 7 - 4} = \frac{4}{7} \approx 0.57$$
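The Tanimoto calculation for the two example fingerprints can be reproduced directly from bit strings. A minimal sketch; returning 0.0 when neither fingerprint has any bits set is an assumed convention, since the ratio is undefined in that case.

```python
def tanimoto(fp_a: str, fp_b: str) -> float:
    """Tanimoto coefficient c / (a + b - c) for two equal-length bit strings."""
    if len(fp_a) != len(fp_b):
        raise ValueError("fingerprints must have the same length")
    a = fp_a.count("1")  # bits set in the first fingerprint
    b = fp_b.count("1")  # bits set in the second fingerprint
    c = sum(1 for x, y in zip(fp_a, fp_b) if x == y == "1")  # bits set in both
    union = a + b - c
    if union == 0:
        return 0.0  # assumed convention for two all-zero fingerprints
    return c / union


print(round(tanimoto("0010110010", "1010110111"), 2))  # 0.57
```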
Browse via the periodic table: molecular entities / elements
Navigate via links in the ontology: click to follow links
Understanding the ChEBI ontology Block 3
Annotation of bioinformatics data
• Essential for capturing the understanding and knowledge associated with core data
• Often captured in free text, which is easier to read and better for conveying understanding to a human audience, but:
  • difficult for computers to parse
  • quality varies from database to database
  • terminology varies from annotator to annotator
• Towards annotation using standard vocabularies: ontologies within bioinformatics
The ChEBI ontology
Organised into three sub-ontologies:
• Molecular structure ontology
• Subatomic particle ontology
• Role ontology
(Example shown: (R)-adrenaline)
Molecular structure ontology
Role ontology
ChEBI ontology relationships
• Generic ontology relationships
• Chemistry-specific relationships
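Because every link in the ontology is a typed relationship, a query can follow one relationship type at a time. A minimal sketch using the caffeine terms from the entry shown earlier, stored as (subject, relation, object) triples. The fragment is simplified and partly hypothetical: the edge from trimethylxanthine to "methylxanthine" is an assumed parent added only so the is_a walk has more than one step.

```python
from collections import deque

# Simplified, partly hypothetical ontology fragment as typed triples.
EDGES = [
    ("caffeine", "is_a", "trimethylxanthine"),
    ("trimethylxanthine", "is_a", "methylxanthine"),  # assumed parent, illustration only
    ("caffeine", "has_role", "CNS stimulant"),
    ("caffeine", "has_role", "metabolite"),
]


def is_a_ancestors(term, edges):
    """All classes reachable from `term` by following is_a edges (breadth-first)."""
    seen, queue = [], deque([term])
    while queue:
        current = queue.popleft()
        for subj, rel, obj in edges:
            if subj == current and rel == "is_a" and obj not in seen:
                seen.append(obj)
                queue.append(obj)
    return seen


def roles(term, edges):
    """Direct has_role targets of `term` (no transitive walk)."""
    return [obj for subj, rel, obj in edges if subj == term and rel == "has_role"]


print(is_a_ancestors("caffeine", EDGES))  # ['trimethylxanthine', 'methylxanthine']
print(roles("caffeine", EDGES))  # ['CNS stimulant', 'metabolite']
```

The two functions differ on purpose: is_a is transitive, so it is walked to the top of the fragment, whereas has_role is read only as direct edges here.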
Viewing ChEBI ontology
Viewing ChEBI ontology [2]: tree view
Browsing the ChEBI ontology in the Ontology Lookup Service (OLS): http://www.ebi.ac.uk/ontology-lookup/
Ontology Lookup Service
• Provides a centralised query interface for ontology and controlled vocabulary lookup
• Can integrate any ontology available in OBO (Open Biomedical Ontologies) format
• At last release, 58 ontologies integrated, including:
  • GO
  • ChEBI
  • Molecular interaction (PSI MI)
  • Pathway ontology (PW)
  • Human disease (DOID)
  • and many more…
• Provides search and browse facilities, and displays a graph of terms and relationships
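OBO is a line-oriented text format in which each term is a [Term] stanza of "tag: value" lines, with optional "! comment" text after is_a targets. A minimal sketch that reads the id, name and is_a tags from such stanzas; the stanza text is illustrative and uses made-up EX: identifiers, and a real parser would handle many more tags, escapes and qualifiers.

```python
# Illustrative OBO text with made-up identifiers.
OBO_TEXT = """format-version: 1.2

[Term]
id: EX:0000001
name: trimethylxanthine

[Term]
id: EX:0000002
name: caffeine
is_a: EX:0000001 ! trimethylxanthine
"""


def parse_obo_terms(text):
    """Collect id, name and is_a parents from each [Term] stanza."""
    terms, current = [], None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("["):
            # A new stanza starts; only [Term] stanzas are collected.
            current = {"is_a": []} if line == "[Term]" else None
            if current is not None:
                terms.append(current)
        elif current is not None and ":" in line:
            tag, value = line.split(":", 1)
            tag, value = tag.strip(), value.strip()
            if tag == "is_a":
                # Drop the trailing "! comment" that names the parent.
                current["is_a"].append(value.split("!", 1)[0].strip())
            elif tag in ("id", "name"):
                current[tag] = value
    return terms


for term in parse_obo_terms(OBO_TEXT):
    print(term["id"], term["name"], term["is_a"])
# EX:0000001 trimethylxanthine []
# EX:0000002 caffeine ['EX:0000001']
```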