Construction of a Virtual Library of Potential Endocrine Disruptors for in silico Target Fishing
Christian Laggner, PhD
Computer Aided Molecular Design Group, Pharm. Chem. Dept., University of Innsbruck, Austria
Overview
• What are Endocrine Disruptors?
• Need for computational screening methods
• Construction of the compound library
• Applicability of publicly available compound collections – problems, needs
Endocrine Disruptors
• Exogenous substances that interfere with the endocrine system of humans or animals
  • Mimic endogenous hormones
  • Block the effects of hormones
  • Change the levels of hormones: stimulate or inhibit production, transport, or degradation
• Disturb regulation of development, growth, reproduction, and behavior
• Some common targets:
  • nuclear hormone receptors (ER, AR, PR, AhR, PPAR, RXR, TR, …)
  • oxidoreductases (Aro, 11β-HSD, …)
Some Examples
ED chemicals come from various sources:
• Pesticides
  • Insecticides
  • Bactericides, fungicides
• Additives in polymers
• Drugs
  • Side effects
  • Release in wildlife (wastewater)
• Phytoestrogens
• Produced from precursor substances
  • Incomplete combustion
  • Wastewater
ED Screening Programs
• US: Endocrine Disruptor Screening Program (EDSP) http://www.epa.gov/scipoly/oscpendo/index.htm
• EU: REACH program (Registration, Evaluation and Authorisation of Chemicals), Endocrine Disrupters Website http://ec.europa.eu/environment/endocrine/index_en.htm
Tens to hundreds of thousands of compounds from various sources are to be screened against multiple targets
• prioritize a small subset for initial screening
Virtual High-Throughput Screening
• Collection of pharmacophore models for over 300 unique targets, including ED targets
• Fast screening of x compounds against y targets -> activity profiles
  • Find new candidates
  • Find new targets
More on pharmacophore-based parallel screening in Thierry's talk at 3:15 pm…
But What Shall We Screen?
• Endocrine Disruption Priority Setting Database v.2 http://www.ergweb.com/endocrine/
  • For selecting chemicals for Tier 1 Screening
  • Pesticides, commercial chemicals, cosmetic ingredients, food additives, nutritional supplements, mixtures, …
  • 142,975 entries
  • No structures, but compound names and CAS numbers
• Merge with structures from a public substance library (PubChem)
The PubChem Project
• Part of NIH's Molecular Libraries Roadmap Initiative
• Collects structures and information about molecules from various databases
  • DB sources: substance vendors, biological properties, toxicology, metabolic pathways, …
  • Links to the original database
• Mixed bag of goodies: different information for various molecules
The PubChem Project
• Data organized into 3 sub-databases:
  • PCSubstance: more than 19 million substance records (= original database entries)
  • PCCompound: more than 10 million compound records (= unique structures)
  • PCBioAssay: almost 600 bioassays with data for selected compounds
• Data publicly accessible via
  • web browser: http://pubchem.ncbi.nlm.nih.gov/
  • ftp client: ftp://ftp.ncbi.nlm.nih.gov/pubchem/
  • a programmatic XML interface (PUG): http://pubchem.ncbi.nlm.nih.gov/pug/pug.cgi
Pipeline Pilot
• Graphically compose data processing networks (“protocols”)
• Configurable components for each step
Library Generation Overview
[Flowchart: Name list / CAS nr. list -> Search PC Substances -> Merge by same name / CAS nr. (merged hits, unmerged hits) -> Merge by structure (merged hits, unmerged hits) -> Unique structures -> Filter, 3D conversion -> 3D database]
Initial Searches
Name list:
• Names exist for 65.1% of the initial 143.0k list entries
• Filtered out:
  • No CAS number ("Roofing paper", "Putrescent whole egg solids", "Red pepper", "Paint", …)
  • Name contains "polymer", "derivative", or "analogue"
  • Name shorter than 4 characters
• String length distribution peaks: truncated names
• 62,305 (43.6%) unique names remaining
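The name filters listed above can be sketched in a few lines of Python. This is only an illustration of the stated rules, not the Pipeline Pilot protocol used in the talk; the function names, the case-insensitive de-duplication, and the sample entries are assumptions made for the example.

```python
# Minimal sketch of the name-list filters (assumed helper names, illustrative data).
EXCLUDED_WORDS = ("polymer", "derivative", "analogue")
MIN_NAME_LENGTH = 4


def keep_entry(name, cas_number):
    """Return True if a list entry passes the three filters from the slide."""
    if not cas_number:                      # entries without CAS number are dropped
        return False
    cleaned = (name or "").strip()
    if len(cleaned) < MIN_NAME_LENGTH:      # names shorter than 4 characters are dropped
        return False
    lowered = cleaned.lower()
    return not any(word in lowered for word in EXCLUDED_WORDS)


def unique_names(entries):
    """Filter (name, cas_number) pairs; return unique names in first-seen order.

    Assumption: names that differ only in letter case count as duplicates.
    """
    seen = set()
    result = []
    for name, cas in entries:
        if not keep_entry(name, cas):
            continue
        key = name.strip().lower()
        if key not in seen:
            seen.add(key)
            result.append(name.strip())
    return result
```

The surviving names would then be used as search terms against PubChem Substances.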
Initial Searches
• Search with name list in PubChem Substances, July 2007 (17.8 million entries):
  • 85,000 hits
  • 46.6% of list entries found
  • Takes 11.5 h on a Pentium 4, 3.0 GHz
CAS number list:
• 97.0% had a unique number
• Search in PubChem Substances:
  • 179,000 hits
  • 83.5% of list entries found
  • Takes 46 min
• Only 3060 entries found by name and not by CAS number
Merge Hits for Same Search Terms
Have molecular structure, not isotope-labeled, no R-groups. Correct protonation states
Merge name hits by name / CAS hits by CAS number
How to check whether different structures describe the same molecule?
• Stereochemistry not always fully described
  • Solution: remove stereochemistry and compare SMILES strings
• Different tautomers for the same compound give different SMILES strings
  • Solution (not for all cases): InChI
InChI
• IUPAC International Chemical Identifier
• Describes chemical structures in layers and sublayers: chemical formula, connectivity, charges, protonation states, stereochemistry, isotopes, tautomerism
• Different layers make it possible to adjust the level of similarity/identity
but
• tautomerism detection does not include keto-enol and ring-chain tautomerism (sugars…)
InChI=1/C6H8O6/c7-1-2(8)5-3(9)4(10)6(11)12-5/h2,5,7-10H,1H2/t2-,5+/m0/s1
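To illustrate how the layers allow the level of identity to be adjusted, the following Python sketch drops the stereo layers from an InChI string by plain string handling. It assumes the stereo layers are the ones prefixed /b, /t, /m and /s, and it does not treat a fixed-H layer specially; the function name is made up for this example, and a real workflow would normally regenerate the InChI with the InChI software rather than edit the string.

```python
# Minimal sketch: remove stereo layers from an InChI string (string handling only).
STEREO_PREFIXES = {"b", "t", "m", "s"}  # double-bond, tetrahedral, and related stereo layers


def strip_stereo_layers(inchi):
    """Return the InChI without its /b, /t, /m and /s layers."""
    head, *layers = inchi.split("/")
    if not layers:
        return inchi
    kept = [layers[0]]                      # first layer is the formula (no prefix letter)
    for layer in layers[1:]:
        if layer[:1] not in STEREO_PREFIXES:
            kept.append(layer)
    return "/".join([head] + kept)


example = ("InChI=1/C6H8O6/c7-1-2(8)5-3(9)4(10)6(11)12-5"
           "/h2,5,7-10H,1H2/t2-,5+/m0/s1")
print(strip_stereo_layers(example))
# InChI=1/C6H8O6/c7-1-2(8)5-3(9)4(10)6(11)12-5/h2,5,7-10H,1H2
```

Two records whose stripped strings are equal then share formula, connectivity and hydrogen layers, regardless of how completely their stereochemistry was drawn.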
Merge Hits for Same Search Terms
Merge multiple structures per entry by InChI without stereo and tautomerism layers
Prefer the structure with the most stereo information (longest SMILES string)
4.5% (by name) and 7.9% (by CAS number) still had different structures: errors, additional components
Check whether we can find a preferred structure
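The merge step described above can be sketched as a grouping operation. In this Python sketch each hit is a tuple of search term, a comparison key (for example an InChI with the stereo layers removed) and a SMILES string; the tuple layout, the function names and the placeholder keys in the usage note are assumptions for illustration, not the original protocol.

```python
# Minimal sketch: merge hits per search term, keeping the longest SMILES per structure key.
from collections import defaultdict


def merge_hits(hits):
    """hits: iterable of (search_term, structure_key, smiles).

    Returns {search_term: {structure_key: preferred_smiles}}, where the
    preferred SMILES is the longest one, used here as a proxy for the
    structure carrying the most stereo information.
    """
    merged = defaultdict(dict)
    for term, key, smiles in hits:
        current = merged[term].get(key)
        if current is None or len(smiles) > len(current):
            merged[term][key] = smiles
    return {term: dict(structures) for term, structures in merged.items()}


def unresolved_terms(merged):
    """Search terms that still map to more than one distinct structure."""
    return sorted(term for term, structures in merged.items() if len(structures) > 1)
```

For instance, two hits for "alanine" with the same key and the SMILES `CC(N)C(=O)O` and `C[C@H](N)C(=O)O` collapse to the second, longer string, while a term with two different keys is reported by `unresolved_terms` for further checking.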
Looking for the Preferred Structure
"ferric ammonium citrate" gives two results:
• 4 hits (all from one database), 7 carbon atoms: WRONG
• 3 hits (from three different databases), 6 carbon atoms
Two-step check:
• Preferred structure among hits from one database
• Preferred structure among all hits
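One possible reading of the two-step check above is a vote in two rounds: first pick the most frequent structure within each source database, then let every database cast one vote. The Python sketch below follows that reading; the majority rule, the handling of ties (no preference) and all names are assumptions, since the slide does not spell out the exact criterion.

```python
# Minimal sketch: two-round vote for a preferred structure (assumed majority rule).
from collections import Counter, defaultdict


def most_common_unique(counter):
    """Return the single most frequent item, or None on a tie or empty input."""
    ranked = counter.most_common(2)
    if not ranked:
        return None
    if len(ranked) == 2 and ranked[0][1] == ranked[1][1]:
        return None
    return ranked[0][0]


def preferred_structure(hits):
    """hits: iterable of (database, structure_key) for one search term."""
    per_database = defaultdict(Counter)
    for database, key in hits:
        per_database[database][key] += 1
    votes = Counter()
    for counts in per_database.values():
        winner = most_common_unique(counts)   # round 1: preference within one database
        if winner is not None:
            votes[winner] += 1
    return most_common_unique(votes)          # round 2: preference among all databases
```

With four hits for a 7-carbon structure from one database and one hit each for a 6-carbon structure from three other databases, the 6-carbon structure wins three votes to one, which matches the outcome shown for ferric ammonium citrate.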
Merge Salts and Mixtures
Remove small counterions, mixture components, neutralise compounds: preference for one structure for >80% of problematic hits
Check for wrong valences
Merge all compounds with same structure
• prioritize names of unmodified compounds
• save names, CAS, … of the others
Example:
• Citric acid
• 1,2,3-Propanetricarboxylic acid, 2-hydroxy-, manganese salt
• Magnesium citrate
• Citric acid monohydrate
• 1,2,3-Propanetricarboxylic acid, 2-hydroxy-, lead(2+) salt (2:3)
• Ferrous citrate
• 1,2,3-Propanetricarboxylic acid, 2-hydroxy-, iron salt
• 1,2,3-Propanetricarboxylic acid, 2-hydroxy-, iron(3+) salt (1:1)
• Ferric citrate
• Ferric ammonium citrate
• Sodium citrate
• Sodium citrate dihydrate
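The first part of this step, removing small components and merging entries that share a parent structure, can be sketched on SMILES strings. The sketch below is deliberately crude: it assumes components are separated by "." and written identically across records, it uses a short hypothetical list of small components, and it does not neutralise charges, which would need a cheminformatics toolkit.

```python
# Minimal sketch: strip listed small components and merge names by parent structure.
# SMALL_COMPONENTS is an illustrative, incomplete list (assumption).
SMALL_COMPONENTS = {"O", "[Na+]", "[K+]", "[Cl-]", "[Br-]", "[NH4+]"}


def strip_components(smiles):
    """Return (parent_smiles, was_modified) after dropping listed fragments."""
    fragments = smiles.split(".")
    kept = [f for f in fragments if f not in SMALL_COMPONENTS]
    if not kept:                              # only small components: leave unchanged
        return smiles, False
    parent = ".".join(dict.fromkeys(kept))    # drop repeated fragments, keep order
    return parent, parent != smiles


def merge_by_parent(entries):
    """entries: iterable of (name, smiles).

    Returns {parent_smiles: [names]} with names of unmodified compounds first.
    """
    groups = {}
    for name, smiles in entries:
        parent, modified = strip_components(smiles)
        unmodified, others = groups.setdefault(parent, ([], []))
        (others if modified else unmodified).append(name)
    return {parent: unmodified + others
            for parent, (unmodified, others) in groups.items()}
```

Given citric acid as `OC(CC(O)=O)(CC(O)=O)C(O)=O` and its monohydrate as the same string followed by `.O`, both names end up under one parent with "Citric acid" listed first.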
Checking for Right Valences
Pentavalent carbon atoms are not as rare as you might think…
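A valence check of this kind can be sketched on a simple atom and bond table. The Python sketch below sums the bond orders at each atom and reports carbons above four; it assumes integer bond orders (no aromatic bond type), ignores formal charges, and counts only bonds that are explicitly present, so it can miss problems but should not flag a correct carbon. The data layout and function name are assumptions for illustration.

```python
# Minimal sketch: flag carbon atoms whose summed bond orders exceed 4.
def overvalent_carbons(atoms, bonds, max_valence=4):
    """atoms: list of element symbols; bonds: iterable of (i, j, order).

    Returns the indices of carbon atoms with more than max_valence bonds.
    """
    valence = [0] * len(atoms)
    for i, j, order in bonds:
        valence[i] += order
        valence[j] += order
    return [index for index, (element, total) in enumerate(zip(atoms, valence))
            if element == "C" and total > max_valence]


# Carbon 0 carries two double bonds and one single bond: valence 5.
atoms = ["C", "O", "O", "C"]
bonds = [(0, 1, 2), (0, 2, 2), (0, 3, 1)]
print(overvalent_carbons(atoms, bonds))  # [0]
```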
Final Filtering
Keep only compounds suitable for pharmacophore screening:
• Only selected elements: H, C, N, O, S, P, F, Cl, Br, I, B, Al, Si, Ge, As, Se, Sn, Sb, Te, Pb
• Must have at least one C atom
• 70 ≤ MW ≤ 1000
76,754 compounds, 63.9% of search list
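The three filter criteria above can be sketched for compounds given as plain molecular formulas. This Python sketch assumes simple formulas without brackets, charges or isotopes (for example "C6H8O6") and uses rounded standard atomic masses; the function names are made up, and the original work applied the filter to full structures rather than to formula strings.

```python
# Minimal sketch: element, carbon and molecular-weight filter on molecular formulas.
import re

# Rounded standard atomic masses for the allowed elements.
ATOMIC_MASS = {
    "H": 1.008, "C": 12.011, "N": 14.007, "O": 15.999, "S": 32.06,
    "P": 30.974, "F": 18.998, "Cl": 35.45, "Br": 79.904, "I": 126.904,
    "B": 10.81, "Al": 26.982, "Si": 28.085, "Ge": 72.630, "As": 74.922,
    "Se": 78.971, "Sn": 118.710, "Sb": 121.760, "Te": 127.60, "Pb": 207.2,
}


def parse_formula(formula):
    """Parse a simple formula such as 'C6H8O6' into {element: count}."""
    if not re.fullmatch(r"(?:[A-Z][a-z]?\d*)+", formula):
        raise ValueError(f"unsupported formula: {formula!r}")
    counts = {}
    for element, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] = counts.get(element, 0) + int(number or 1)
    return counts


def passes_final_filter(formula, min_mw=70.0, max_mw=1000.0):
    """Apply the element, carbon and molecular-weight criteria."""
    counts = parse_formula(formula)
    if any(element not in ATOMIC_MASS for element in counts):
        return False                         # contains an element outside the list
    if counts.get("C", 0) < 1:
        return False                         # no carbon atom
    weight = sum(ATOMIC_MASS[el] * n for el, n in counts.items())
    return min_mw <= weight <= max_mw
```

For example, `passes_final_filter("C6H8O6")` accepts ascorbic acid (MW about 176), while "NaCl", "H2O" and "CH4" are rejected for a disallowed element, missing carbon and low molecular weight respectively.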
Construction of the 3D Database
• Prepare 3D start conformation: add H atoms, generate 3D coordinates, minimize
• Generate 3D database with Catalyst catDB (FAST, MaxConfs = 255): 76,677 successfully converted (99.9%)
Analysis of the Database
• Derwent WDI 2005 (67,050 entries): filtered, desalted, merged in the same way; 57,667 entries remaining
• Overlap: 8513 entries (14.8% WDI, 9.0% EDPSD)
• Oral bioavailability (Lipinski's Rule of 5):
  • WDI 64.0%
  • EDPSD 79.2%
• Druglikeness (Ghose et al. 1999):
  • WDI 39.7%
  • EDPSD 18.2%
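A Rule-of-5 pass rate like the one reported above can be computed once the four descriptors are available for each compound. The Python sketch below uses the usual thresholds (MW ≤ 500, logP ≤ 5, at most 5 H-bond donors, at most 10 H-bond acceptors) and assumes that one violation is tolerated; the slide does not state which pass criterion was used, and the descriptor values themselves would have to come from a separate calculation.

```python
# Minimal sketch: fraction of a library passing Lipinski's Rule of 5.
from dataclasses import dataclass


@dataclass
class Descriptors:
    mol_weight: float
    logp: float
    h_bond_donors: int
    h_bond_acceptors: int


def lipinski_violations(d):
    """Count how many of the four Rule-of-5 thresholds are exceeded."""
    return sum([
        d.mol_weight > 500,
        d.logp > 5,
        d.h_bond_donors > 5,
        d.h_bond_acceptors > 10,
    ])


def fraction_passing(library, max_violations=1):
    """Fraction of compounds with at most max_violations violations (assumed criterion)."""
    if not library:
        return 0.0
    passing = sum(lipinski_violations(d) <= max_violations for d in library)
    return passing / len(library)
```

Running `fraction_passing` separately on the WDI and EDPSD descriptor tables would give the two percentages to compare.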
Analysis of Results
[Plots; legend: red = WDI, blue = EDPSD]
Harvesting Structures from Public DBs
• Many common chemicals can be retrieved by comparing public compound lists
• Searching via a registry number (CAS, SID, CID, EINECS/ELINCS, …) is much faster than via name
• Names are split between PCSubstances and PCCompounds
• Often wrong CAS number given (salts, hydrates, mixtures, …)
PCS:
• PUBCHEM_EXT_DATASOURCE_REGID: 408148
• PUBCHEM_SUBSTANCE_SYNONYM:
  • 1H-Benzimidazol-5-amine, 2-(4-aminophenyl)-
  • 2-(4-Aminophenyl)-5-aminobenzimidazole
  • 7621-86-5
  • NSC408148
PCC:
• PUBCHEM_IUPAC_OPENEYE_NAME: 2-(4-aminophenyl)-3H-benzimidazol-5-amine
• PUBCHEM_IUPAC_CAS_NAME: 2-(4-aminophenyl)-3H-benzimidazol-5-amine
• PUBCHEM_IUPAC_NAME: 2-(4-aminophenyl)-3H-benzimidazol-5-amine
• PUBCHEM_IUPAC_SYSTEMATIC_NAME: 2-(4-aminophenyl)-3H-benzimidazol-5-amine
• PUBCHEM_IUPAC_TRADITIONAL_NAME: [2-(4-aminophenyl)-3H-benzimidazol-5-yl]amine
Harvesting Structures from Public DBs
• Chirality information is often missing or unclearly defined
  • 2D structures: wedged bonds or pseudo-3D
  • 3D structures: atom stereo parity set, or take it from the 3D structure
• Tautomerism: partially solved by InChI
  • No keto-enol tautomerism
  • No ring-chain tautomerism
  • Workaround: connectivity? (together with MW, MF)
Conclusions
• Public databases and compound lists are useful for in silico reprofiling of known compounds
• Different sources - different levels of information
  • Need standards for treating stereo information
  • Problem of tautomerism
• There are always some errors…
  • Comparison of different data sources may help us find some of them
  • How can we give feedback about wrong structures and avoid further spreading of errors?
Acknowledgements
• Simona Distinto • Johannes Kirchmair • Thierry Langer • Patrick Markt • Daniela Schuster • Gudrun Spitzer • Theodora Steindl • Lyubomir G. Nashev • Alex Odermatt • Fabian Bendix • Martin Biely • Alois Dornhofer • Robert Kosara • Judith Rollinger • Gerhard Wolber • Rémy D. Hoffmann • Nicolas Triballeau
NIH / PubChem Project
EPA / Endocrine Disruptor Screening Program
Finally… Thank you for your attention!