Current trends & hot topics in Chemoinformatics

Presentation Transcript


  1. Current trends & hot topics in Chemoinformatics

  2. Traditional areas of application • Pharmaceutical & life science industry • particularly in early stage drug design • Databases of available chemicals • Electronic publishing • including searchable chemical structure information in journals, etc. • Government and patent databases

  3. The theory so far (1960s to present) … • How do you represent 2D and 3D chemical structures? • Not just a pretty picture • How do you search databases of chemical structures? • Google doesn’t help (much, but it might do soon…) • How do you organize large amounts of chemical information? • How do you visualize chemical structures & proteins? • Can computers predict how chemicals are going to behave • … in the test tube? • … in the body?

  4. Current trends & hot topics • The move of chemical informatics into the public domain (PubChem, MLI, eScience, open source) • Service-oriented architectures • Packaging & processing large volumes of complex information for human consumption • Integration with other –ics (bioinformatics, genomics, proteomics, systems biology)

  5. What does it mean for the bench chemist? • An increasing number of web tools and databases available which can aid in compound acquisition, synthesis, and biological profiling • A trend towards more (and more effective) use of computers in the lab - not just for email • A need for most synthetic chemists (and all medicinal chemists) to be aware of computational techniques and how they can assist in the compound synthesis and drug discovery processes • An opportunity to combine an interest in chemistry with an interest in computers

  6. Chemoinformatics software vendors • Accelrys - Large chemoinformatics company • ACD/Labs - analytical informatics & predictions • Digital Chemistry - 2D fingerprinting, clustering toolkits & software • CambridgeSoft - 2D drawing tools & E-notebooks • CAS - produce SciFinder Scholar searching software • ChemAxon - Java based toolkits and software • Daylight - 2D representation & searching software • Leadscope - 2D structure and property tools • Lion Bioscience - produce LeadNavigator • MDL - Large chemoinformatics company • Mesa Analytics and Computing - Educational & Statistical tools • OpenEye - Fast 3D docking, structure generation, toolkits • Quantum Pharmaceuticals - prediction, docking, screening • Sage Informatics - ChemTK 2D analysis software • Tripos - Large chemoinformatics company

  7. Main academic sites • “Pure” Chemoinformatics • University of Sheffield, UK (Willett / Gillet) • http://www.shef.ac.uk/uni/academic/I-M/is/research/cirg.html • Erlangen, Germany (Gasteiger) • http://www2.chemie.uni-erlangen.de/ • Cambridge Unilever Center • http://www-ucc.ch.cam.ac.uk/ • Indiana University School of Informatics • http://www.informatics.indiana.edu/

  8. Main academic sites • Related (computational chemistry, etc.) • UCSF (Kuntz) • http://mdi.ucsf.edu/ • University of Texas (Pearlman) • http://www.utexas.edu/pharmacy/divisions/pharmaceutics/faculty/pearlman.html • Yale (Jorgensen) • http://zarbi.chem.yale.edu/ • University of Michigan (Crippen) • http://www.umich.edu/~pharmacy/MedChem/faculty/crippen/

  9. “Traditional” Journals • Journal of Chemical Information & Modeling (formerly JCICS) • http://pubs.acs.org/journals/jcisd8/index.html • Journal of Computer-Aided Molecular Design • http://www.kluweronline.com/issn/0920-654X • Journal of Molecular Graphics and Modeling • http://www.elsevier.com/inca/publications/store/5/2/5/0/1/2/ • Journal of Computational Chemistry • http://www3.interscience.wiley.com/cgi-bin/jhome/33822 • Journal of Chemical Theory and Computation • http://pubs.acs.org/journals/jctcce/ • Journal of Medicinal Chemistry • http://pubs.acs.org/journals/jmcmar/

  10. “Informal” publications • Network Science (online) • http://www.netsci.org/Science/index.html • Chemical & Engineering News • http://pubs.acs.org/cen/ • Drug Discovery Today • http://www.drugdiscoverytoday.com/ • Scientific Computing World • http://www.scientific-computing.com/ • Bio-IT World • http://www.bio-itworld.com/

  11. Yahoo! Chemoinformatics Discussion List • For job postings, ideas exchange, questions, and industry-student connections • To join, go to http://groups.yahoo.com/group/chemoinf or send an email to chemoinf-subscribe@yahoogroups.com

  12. Impacting Industry

  13. Example 1: High-Throughput Screening • Testing perhaps millions of compounds in a corporate collection to see if any show activity against a certain disease protein

  14. High-Throughput Screening • Traditionally, small numbers of compounds were tested for a particular project or therapeutic area • About 10 years ago, technology developed that enabled large numbers of compounds to be assayed quickly • High-throughput screening can now test 100,000 compounds a day for activity against a protein target • Maybe tens of thousands of these compounds will show some activity for the protein • The chemist needs to intelligently select the 2 - 3 classes of compounds that show the most promise as drugs and are worth following up

  15. Informatics Implications • Need to be able to store chemical structure and biological data for millions of data points • Computational representation of 2D structure • Need to be able to organize thousands of active compounds into meaningful groups • Group similar structures together and relate to activity • Need to learn as much information as possible (data mining) • Apply statistical methods to the structures and related information
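One common way to "group similar structures together" is to compare 2D fingerprints with the Tanimoto coefficient. The sketch below is only an illustration under stated assumptions: the fingerprints are made-up sets of "on" bit positions, the compound names are hypothetical, and the 0.6 threshold and single-linkage grouping are arbitrary choices rather than anything prescribed by the slides.

```python
from itertools import combinations


def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient of two fingerprints given as sets of 'on' bits."""
    if not fp_a and not fp_b:
        return 0.0
    shared = len(fp_a & fp_b)
    return shared / (len(fp_a) + len(fp_b) - shared)


def group_by_similarity(fingerprints, threshold=0.6):
    """Single-linkage grouping: two compounds share a group when a chain of
    pairs with similarity >= threshold connects them (simple union-find)."""
    parent = {name: name for name in fingerprints}

    def find(name):
        while parent[name] != name:
            parent[name] = parent[parent[name]]  # path compression
            name = parent[name]
        return name

    for a, b in combinations(fingerprints, 2):
        if tanimoto(fingerprints[a], fingerprints[b]) >= threshold:
            parent[find(a)] = find(b)

    groups = {}
    for name in fingerprints:
        groups.setdefault(find(name), []).append(name)
    return sorted(sorted(members) for members in groups.values())


# Hypothetical screening hits with invented fingerprint bits.
hits = {
    "cmpd_A": {1, 4, 7, 9, 12},
    "cmpd_B": {1, 4, 7, 9, 15},
    "cmpd_C": {2, 3, 20, 21},
    "cmpd_D": {2, 3, 20, 21, 22},
}
print(group_by_similarity(hits))
# → [['cmpd_A', 'cmpd_B'], ['cmpd_C', 'cmpd_D']]
```

Each resulting group could then be related to the measured activity of its members, which is the step the slide calls relating structure to activity.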

  16. Tools for mining the data Tripos Benchware HTS Dataminer (formerly SAR Navigator), www.tripos.com

  17. Example 2: 3D Visualization & Docking • 3D Visualization of interactions between compounds and proteins • “Docking” compounds into proteins computationally

  18. 3D Visualization • X-ray crystallography and NMR spectroscopy can reveal the 3D structure of a protein and its bound compounds • Visualization of these “complexes” of proteins and potential drugs can help scientists understand the mechanism of action of the drug and improve its design • Visualization uses computational “ball and stick” models of atoms and bonds, as well as surfaces • Stereoscopic visualization available

  19. Accelrys Discovery Studio

  20. Docking algorithms • Require 3D atomic structure for protein, and 3D structure for compound (“ligand”) • May require initial rough positioning for the ligand • Use an optimization method to search for the rotation and translation of the ligand in the protein that gives the best binding affinity

  21. Genetic Algorithms • Create a “population” of possible solutions, encoded as “chromosomes” • Use “fitness function” to score solutions • Good solutions are combined together (“crossover”) and altered (“mutation”) to provide new solutions • The process repeats until the population “converges” on a solution
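The loop described above (population, fitness function, crossover, mutation, repeat) can be sketched in a few lines. This is a toy illustration, not the algorithm used by any docking program: the chromosome is just an (x, y, z) ligand translation, `SITE` is a hypothetical binding-site position, and `toy_fitness` is an invented stand-in for a real binding-affinity score. The run stops after a fixed number of generations instead of testing for convergence.

```python
import random

SITE = (1.0, -2.0, 0.5)  # hypothetical binding-site centre (assumption)


def toy_fitness(chromosome):
    """Higher is better: negative squared distance from the toy site."""
    return -sum((gene - target) ** 2 for gene, target in zip(chromosome, SITE))


def run_ga(fitness, n_genes, pop_size=30, generations=60,
           mutation_rate=0.2, seed=0):
    """Minimal genetic algorithm with one-point crossover, Gaussian
    mutation and elitism. Requires n_genes >= 2."""
    rng = random.Random(seed)
    population = [[rng.uniform(-5, 5) for _ in range(n_genes)]
                  for _ in range(pop_size)]
    history = []  # best fitness seen in each generation
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        history.append(fitness(population[0]))
        parents = population[: pop_size // 2]   # keep the fitter half
        children = [population[0][:]]           # elitism: copy the best
        while len(children) < pop_size:
            mum, dad = rng.sample(parents, 2)
            cut = rng.randint(1, n_genes - 1)   # one-point crossover
            child = mum[:cut] + dad[cut:]
            child = [g + rng.gauss(0, 0.3) if rng.random() < mutation_rate else g
                     for g in child]            # mutation
            children.append(child)
        population = children
    best = max(population, key=fitness)
    history.append(fitness(best))
    return best, history


best, history = run_ga(toy_fitness, n_genes=3)
print("best translation:", [round(g, 2) for g in best])
print("first and last best fitness:", history[0], history[-1])
```

Because the best chromosome is copied unchanged into every new generation, the best fitness never gets worse from one generation to the next. A real docking run would also encode ligand rotation (and usually torsion angles) in the chromosome.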

  22. Sample GOLD output: GMP docked into RNase T1

  23. Something fun… Screensaver that docks molecules while your computer is idle at http://www.grid.org/projects/cancer/

  24. Representing 2D structures with SMILES

  25. Historical ways of representing chemicals • Trivial name, e.g. Baking Soda, Aspirin, Citric Acid, etc. Identifies the compound, but gives no (or little) information about what it consists of • Chemical formula, e.g. C6H12O6. Specifies the type and quantity of the atoms in the compound, but not its structure (i.e. how the atoms are connected by bonds) • Systematic name, e.g. 1,2-dibromo-3-chloropropane. Identifies the atoms present and how they are connected by bonds.

  26. Trivial and Systematic Names • Trivial name: tyrosine • Systematic names: β-(p-hydroxyphenyl)alanine, α-amino-p-hydroxyhydrocinnamic acid

  27. Historical ways of representing chemicals • 2D structure diagram, shows atoms present and how they are connected by bonds • 3D structure diagram, shows how atoms are related to each other in 3D space. Can take a variety of forms. Accurate models only really possible since X-ray crystallography and computers… but ball and stick models have been around a long time! David Wild – Research Overview July 2006. Page 27

  28. Early computer representations • How do we communicate structural information between humans and the computer? • Line notations, e.g. Wiswesser Line Notation (and later SMILES) • How do we represent the atoms and bonds in a molecule internally in a computer? • Atom lookup and connection tables
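As a rough illustration of an atom lookup table plus connection table, the sketch below stores acetic acid as a list of atoms and a list of bonds. The layout, the names `ATOMS`, `BONDS` and `TYPICAL_VALENCE`, and the helper functions are illustrative assumptions, not the format of any particular file type or toolkit.

```python
# Atom lookup table: list index -> element symbol (heavy atoms only).
ATOMS = ["C", "C", "O", "O"]          # acetic acid, CH3-C(=O)-OH

# Connection table: (atom index, atom index, bond order).
BONDS = [(0, 1, 1), (1, 2, 2), (1, 3, 1)]

# Typical valences used to fill in implicit hydrogens (simplified).
TYPICAL_VALENCE = {"C": 4, "N": 3, "O": 2}


def neighbours(atom_index, bonds):
    """Return (neighbour index, bond order) pairs for one atom."""
    result = []
    for i, j, order in bonds:
        if i == atom_index:
            result.append((j, order))
        elif j == atom_index:
            result.append((i, order))
    return result


def implicit_hydrogens(atom_index, atoms, bonds):
    """Hydrogens needed to satisfy the atom's typical valence."""
    used = sum(order for _, order in neighbours(atom_index, bonds))
    return TYPICAL_VALENCE[atoms[atom_index]] - used


for index, element in enumerate(ATOMS):
    print(index, element, neighbours(index, BONDS),
          "H:", implicit_hydrogens(index, ATOMS, BONDS))
```

Storing only heavy atoms and deriving hydrogens from valence mirrors the "hydrogens usually implicit" convention that SMILES also uses.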

  29. Linear notations • Represent the atoms, bonds and connectivity of a molecule in a linear text string • Concise representation • Originally designed for manual command line entry into text-only systems • Now an excellent format for file and database storage (e.g. can be held in a spreadsheet cell, on one line of a text file, or in an Oracle database text field)

  30. Wiswesser Line Notation (obsolete) • WLN for this structure (tyrosine) is QVYZ1R DQ • Uses text symbols to represent functional groups, e.g.: • Q = OH, V = -CO-, Z = -NH2, R = benzene • Other symbols represent branching, e.g. Y

  31. SMILES Dave Weininger, Daylight www.daylight.com • (One possible) SMILES for this structure (tyrosine) is OC(=O)C(N)CC1=CC=C(O)C=C1 • Can identify any chemical structure • There can be several ways of writing the same structure in SMILES (although a system for generating canonical SMILES exists)

  32. SMILES – Atoms & Bonds • Atoms represented by their chemical symbol (C, N, S, O, Br, etc). Uppercase for aliphatic, lowercase for aromatic • Adjacent atoms implicitly single bonded, or = for double bond, or # for triple bond • Hydrogens usually implicit • Propane • CCC

  33. SMILES – Atoms & Bonds • Atoms represented by their chemical symbol (C, N, S, O, Br, etc). Uppercase for aliphatic, lowercase for aromatic • Adjacent atoms implicitly single bonded, or = for double bond, or # for triple bond • Hydrogens usually implicit • 1-Propanol • CCCO • Or OCCC !

  34. SMILES – Atoms & Bonds • Atoms represented by their chemical symbol (C, N, S, O, Br, etc). Uppercase for aliphatic, lowercase for aromatic • Adjacent atoms implicitly single bonded, or = for double bond, or # for triple bond • Hydrogens usually implicit • Propene • C=CC • Or CC=C !
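The atom and bond rules above are enough to read unbranched, ring-free SMILES. The sketch below is a deliberately tiny reader, not a full SMILES parser: it accepts only C, N, O, S, Cl, Br and the bond symbols -, = and #, and it fills in implicit hydrogens from the simplified valence table `VALENCE` (an assumption of this example). It shows that CCCO and OCCC describe the same molecule.

```python
import re
from collections import Counter

VALENCE = {"C": 4, "N": 3, "O": 2, "S": 2, "Cl": 1, "Br": 1}  # simplified
BOND_ORDER = {"-": 1, "=": 2, "#": 3}
TOKEN = re.compile(r"Cl|Br|[CNOS]|[-=#]")


def linear_smiles_formula(smiles):
    """Element counts (including implicit H) for a chain-only SMILES."""
    tokens = TOKEN.findall(smiles)
    if "".join(tokens) != smiles:
        raise ValueError("unsupported SMILES for this sketch: " + smiles)
    elements = []   # element symbol of each atom, in order
    bond_sum = []   # total bond order used by each atom
    pending = 1     # adjacent atoms are single bonded unless stated
    for token in tokens:
        if token in BOND_ORDER:
            pending = BOND_ORDER[token]
            continue
        elements.append(token)
        bond_sum.append(0)
        if len(elements) > 1:          # bond to the previous atom
            bond_sum[-1] += pending
            bond_sum[-2] += pending
        pending = 1
    counts = Counter(elements)
    counts["H"] = sum(VALENCE[e] - used for e, used in zip(elements, bond_sum))
    return counts


print(linear_smiles_formula("CCC"))    # propane
print(linear_smiles_formula("C=CC"))   # propene
print(linear_smiles_formula("CCCO") == linear_smiles_formula("OCCC"))
# → True
```

Comparing formulas only shows that the two strings have the same composition; showing they are the same structure in general requires canonical SMILES or graph comparison.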

  35. SMILES – Branching & Rings • Parentheses represent branching • Ring closures represented by using numbers to signify attachment points • 2-Propanol • CC(O)C

  36. SMILES – Branching & Rings • Parentheses represent branching • Ring closures represented by using numbers to signify attachment points • Cyclohexane • C1CCCCC1

  37. SMILES – Branching & Rings • Parentheses represent branching • Ring closures represented by using numbers to signify attachment points • Benzene • c1ccccc1

  38. SMILES – Branching & Rings • Parentheses represent branching • Ring closures represented by using numbers to signify attachment points • Bromobenzene • c1cc(Br)ccc1

  39. SMILES – Acetaminophen (Tylenol) • Acetaminophen • c1c(O)ccc(NC(=O)C)c1

  40. SMILES – multiple ring structure • Indole • c1ccc2[nH]ccc2c1
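Branches and ring closures can be handled with a stack and a small dictionary. The following is a minimal sketch rather than a complete SMILES parser: it supports single-digit ring closures only, keeps bracket atoms such as [nH] as opaque tokens, ignores stereochemistry, and records the default bond as "-" (meaning single, or aromatic between two lowercase atoms). The function name and output layout are assumptions of this example.

```python
import re

TOKEN = re.compile(r"\[[^\]]+\]|Cl|Br|[BCNOPSFI]|[bcnops]|[-=#:]|[()]|\d")


def smiles_to_connection_table(smiles):
    """Return (atoms, bonds) where bonds are (i, j, bond symbol) tuples."""
    tokens = TOKEN.findall(smiles)
    if "".join(tokens) != smiles:
        raise ValueError("unsupported SMILES for this sketch: " + smiles)
    atoms, bonds = [], []
    previous = None      # atom the next atom will bond to
    branch_stack = []    # atoms to return to when a branch closes
    open_rings = {}      # ring digit -> (atom index, bond symbol or None)
    pending = None       # explicit bond symbol waiting for its second atom
    for token in tokens:
        if token == "(":
            branch_stack.append(previous)
        elif token == ")":
            if not branch_stack:
                raise ValueError("unbalanced parentheses")
            previous = branch_stack.pop()
        elif token in "-=#:":
            pending = token
        elif token.isdigit():
            if token in open_rings:                  # close the ring
                start, symbol = open_rings.pop(token)
                bonds.append((start, previous, pending or symbol or "-"))
            else:                                    # open a ring
                open_rings[token] = (previous, pending)
            pending = None
        else:                                        # an atom
            atoms.append(token)
            if previous is not None:
                bonds.append((previous, len(atoms) - 1, pending or "-"))
            previous = len(atoms) - 1
            pending = None
    if branch_stack or open_rings:
        raise ValueError("unclosed branch or ring")
    return atoms, bonds


for name, smiles in [("2-propanol", "CC(O)C"),
                     ("cyclohexane", "C1CCCCC1"),
                     ("acetaminophen", "c1c(O)ccc(NC(=O)C)c1"),
                     ("indole", "c1ccc2[nH]ccc2c1")]:
    atoms, bonds = smiles_to_connection_table(smiles)
    print(name, len(atoms), "atoms,", len(bonds), "bonds")
```

The output is the same kind of atom list and connection table that a program would hold internally, which is why line notations work well as a storage and exchange format.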

  41. Other SMILES notes • All hydrogen atoms are implicit unless declared otherwise • Atoms outside the organic subset (B, C, N, O, P, S, F, Cl, Br, I), explicit hydrogens and modified atoms need to be placed in square brackets, e.g. [Pb], [Xe] • Charged species indicated by a + or - (and square brackets), e.g. [Na+], [N+], [O-], [Ca++] • Unknown atoms can be represented by a * (but watch out for confusion with SMARTS!) • Stereochemistry at tetrahedral centres can be indicated using @ and @@ • “Canonical SMILES” can be created
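The bracket-atom conventions above can be read with a single regular expression. The pattern below is a simplified sketch covering only the cases mentioned on the slide (optional isotope, element symbol or *, @ or @@, hydrogen count, charge); it is not the full SMILES grammar, and the returned dictionary layout is an assumption of this example.

```python
import re

BRACKET = re.compile(
    r"\[(?P<isotope>\d+)?"
    r"(?P<symbol>[A-Z][a-z]?|[a-z]|\*)"
    r"(?P<chiral>@{1,2})?"
    r"(?P<hcount>H\d*)?"
    r"(?P<charge>\++|-+|[+-]\d+)?\]"
)


def parse_bracket_atom(text):
    """Split a bracket atom such as [Ca++] or [nH] into its parts."""
    match = BRACKET.fullmatch(text)
    if match is None:
        raise ValueError("not a bracket atom this sketch understands: " + text)
    hcount = match["hcount"]
    hydrogens = 0 if hcount is None else int(hcount[1:] or 1)
    charge_text = match["charge"] or ""
    if charge_text[1:].isdigit():        # forms like +2 or -1
        charge = int(charge_text)
    else:                                # forms like +, ++, -, --
        charge = charge_text.count("+") - charge_text.count("-")
    return {
        "symbol": match["symbol"],
        "chiral": match["chiral"],
        "hydrogens": hydrogens,
        "charge": charge,
    }


for example in ["[Pb]", "[Na+]", "[O-]", "[Ca++]", "[nH]", "[C@@H]", "[*]"]:
    print(example, parse_bracket_atom(example))
```

A lowercase symbol such as the n in [nH] marks an aromatic atom, consistent with the uppercase/lowercase rule for atoms outside brackets.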

  42. SMILES Homepage http://www.daylight.com/smiles/ Official Syntax Guide • Tutorial • Examples • Resources

  43. Other Line Notations • ROSDAL (Representation Of Structure Diagram Arranged Linearly) - Beilstein: 1O-2=3O,2-4-5N,4-6-7=-12-7,10-13O • Sybyl Line Notation (SLN) - Tripos: OHC(=O)CH(NH2)CH2C[1]=CHCH=C(OH)CH=CH@1

  44. Example free online web resources For more links, see http://www.chemoinf.com/

  45. Pubchem http://pubchem.ncbi.nlm.nih.gov/

  46. MolInspiration Property Calculations http://www.molinspiration.com/cgi-bin/properties
