Current trends & hot topics in Chemoinformatics
Traditional areas of application • Pharmaceutical & life science industry • particularly in early stage drug design • Databases of available chemicals • Electronic publishing • including searchable chemical structure information in journals, etc. • Government and patent databases
The theory so far (1960s to present) … • How do you represent 2D and 3D chemical structures? • Not just a pretty picture • How do you search databases of chemical structures? • Google doesn’t help (much, but it might do soon…) • How do you organize large amounts of chemical information? • How do you visualize chemical structures & proteins? • Can computers predict how chemicals are going to behave … • … in the test tube? • … in the body?
Current trends & hot topics • The move of chemical informatics into the public domain (PubChem, MLI, eScience, open source) • Service-oriented architectures • Packaging & processing large volumes of complex information for human consumption • Integration with other –ics (bioinformatics, genomics, proteomics, systems biology)
What does it mean for the bench chemist? • An increasing number of web tools and databases available which can aid in compound acquisition, synthesis, and biological profiling • A trend towards more (and more effective) use of computers in the lab - not just for email • A need for most synthetic chemists (and all medicinal chemists) to be aware of computational techniques and how they can assist in the compound synthesis and drug discovery processes • An opportunity to combine an interest in chemistry with an interest in computers
Chemoinformatics software vendors • Accelrys - large chemoinformatics company • ACD/Labs - analytical informatics & predictions • Digital Chemistry - 2D fingerprinting, clustering toolkits & software • CambridgeSoft - 2D drawing tools & E-notebooks • CAS - produces SciFinder Scholar searching software • ChemAxon - Java based toolkits and software • Daylight - 2D representation & searching software • Leadscope - 2D structure and property tools • Lion Bioscience - produces LeadNavigator • MDL - large chemoinformatics company • Mesa Analytics and Computing - educational & statistical tools • OpenEye - fast 3D docking, structure generation, toolkits • Quantum Pharmaceuticals - prediction, docking, screening • Sage Informatics - ChemTK 2D analysis software • Tripos - large chemoinformatics company
Main academic sites • “Pure” Chemoinformatics • University of Sheffield, UK (Willett / Gillet) • http://www.shef.ac.uk/uni/academic/I-M/is/research/cirg.html • Erlangen, Germany (Gasteiger) • http://www2.chemie.uni-erlangen.de/ • Cambridge Unilever Center • http://www-ucc.ch.cam.ac.uk/ • Indiana University School of Informatics • http://www.informatics.indiana.edu/
Main academic sites • Related (computational chemistry, etc.) • UCSF (Kuntz) • http://mdi.ucsf.edu/ • University of Texas (Pearlman) • http://www.utexas.edu/pharmacy/divisions/pharmaceutics/faculty/pearlman.html • Yale (Jorgensen) • http://zarbi.chem.yale.edu/ • University of Michigan (Crippen) • http://www.umich.edu/~pharmacy/MedChem/faculty/crippen/
“Traditional” Journals • Journal of Chemical Information & Modeling (formerly JCICS) • http://pubs.acs.org/journals/jcisd8/index.html • Journal of Computer-Aided Molecular Design • http://www.kluweronline.com/issn/0920-654X • Journal of Molecular Graphics and Modeling • http://www.elsevier.com/inca/publications/store/5/2/5/0/1/2/ • Journal of Computational Chemistry • http://www3.interscience.wiley.com/cgi-bin/jhome/33822 • Journal of Chemical Theory and Computation • http://pubs.acs.org/journals/jctcce/ • Journal of Medicinal Chemistry • http://pubs.acs.org/journals/jmcmar/
“Informal” publications • Network Science (online) • http://www.netsci.org/Science/index.html • Chemical & Engineering News • http://pubs.acs.org/cen/ • Drug Discovery Today • http://www.drugdiscoverytoday.com/ • Scientific Computing World • http://www.scientific-computing.com/ • Bio-IT World • http://www.bio-itworld.com/
Yahoo! Chemoinformatics Discussion List • For • Job postings • Ideas exchange • Questions • Industry – Student connections To join, go to http://groups.yahoo.com/group/chemoinf Or send an email to chemoinf-subscribe@yahoogroups.com
Example 1: High-Throughput Screening • Testing perhaps millions of compounds in a corporate collection to see if any show activity against a certain disease protein
High-Throughput Screening • Traditionally, small numbers of compounds were tested for a particular project or therapeutic area • About 10 years ago, technology developed that enabled large numbers of compounds to be assayed quickly • High-throughput screening can now test 100,000 compounds a day for activity against a protein target • Maybe tens of thousands of these compounds will show some activity against the protein • The chemist needs to intelligently select the 2 - 3 classes of compounds that show the most promise as drugs to follow up
Informatics Implications • Need to be able to store chemical structure and biological data for millions of data points • Computational representation of 2D structure • Need to be able to organize thousands of active compounds into meaningful groups • Group similar structures together and relate to activity • Need to learn as much information as possible (data mining) • Apply statistical methods to the structures and related information
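The "group similar structures together" step can be sketched with the Tanimoto coefficient, a common similarity measure for 2D fingerprints. This is only an illustration under stated assumptions: the fingerprints are taken as already computed sets of "on" bit positions, the compound names, bit values and the 0.5 threshold are invented for the example, and the single-pass "leader" grouping is just one simple choice among many clustering methods.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient between two fingerprints given as sets of on-bits."""
    if not fp_a and not fp_b:
        return 1.0
    shared = len(fp_a & fp_b)
    return shared / (len(fp_a) + len(fp_b) - shared)


def leader_cluster(fingerprints, threshold=0.5):
    """Put each compound in the first cluster whose leader is similar enough."""
    clusters = []  # list of (leader_name, [member_names])
    for name, fp in fingerprints.items():
        for leader, members in clusters:
            if tanimoto(fp, fingerprints[leader]) >= threshold:
                members.append(name)
                break
        else:
            # No existing leader is similar enough: start a new cluster.
            clusters.append((name, [name]))
    return clusters


# Hypothetical fingerprints (sets of on-bit positions), invented for illustration.
fingerprints = {
    "cmpd_A": {1, 4, 7, 9, 12},
    "cmpd_B": {1, 4, 7, 9, 13},
    "cmpd_C": {2, 3, 20, 21},
    "cmpd_D": {2, 3, 20, 22},
}

clusters = leader_cluster(fingerprints)
for leader, members in clusters:
    print(leader, members)
```

Once compounds are grouped this way, the activity data for each group can be compared to see which structural classes look most promising.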
Tools for mining the data Tripos Benchware HTS Dataminer (formerly SAR Navigator), www.tripos.com
Example 2: 3D Visualization & Docking • 3D Visualization of interactions between compounds and proteins • “Docking” compounds into proteins computationally
3D Visualization • X-ray crystallography and NMR spectroscopy can reveal the 3D structure of a protein and its bound compounds • Visualization of these “complexes” of proteins and potential drugs can help scientists understand the mechanism of action of the drug and improve its design • Visualization uses computational “ball and stick” models of atoms and bonds, as well as surfaces • Stereoscopic visualization available
Docking algorithms • Require 3D atomic structure for protein, and 3D structure for compound (“ligand”) • May require initial rough positioning for the ligand • Will use an optimization method to try to find the rotation and translation of the ligand in the protein that give the best predicted binding affinity
Genetic Algorithms • Create a “population” of possible solutions, encoded as “chromosomes” • Use “fitness function” to score solutions • Good solutions are combined together (“crossover”) and altered (“mutation”) to provide new solutions • The process repeats until the population “converges” on a solution
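The loop described above can be sketched in a few lines. This is a toy illustration, not a docking program: the "chromosome" is just an (x, y) position, the fitness function is an invented score that peaks at the hypothetical point (3, -2) rather than a real binding-affinity estimate, and a fixed number of generations stands in for a proper convergence test.

```python
import random


def fitness(chromosome):
    """Toy score: higher is better, maximal (0) at the hypothetical optimum (3, -2)."""
    x, y = chromosome
    return -((x - 3.0) ** 2 + (y + 2.0) ** 2)


def crossover(parent_a, parent_b, rng):
    """Build a child by taking each gene from one parent or the other."""
    return tuple(rng.choice(pair) for pair in zip(parent_a, parent_b))


def mutate(chromosome, rng, scale=0.3):
    """Perturb every gene with small Gaussian noise."""
    return tuple(gene + rng.gauss(0.0, scale) for gene in chromosome)


def run_ga(pop_size=30, generations=60, seed=0):
    rng = random.Random(seed)
    # Initial population of random candidate solutions.
    population = [(rng.uniform(-10, 10), rng.uniform(-10, 10)) for _ in range(pop_size)]
    initial_best = max(population, key=fitness)
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        survivors = population[: pop_size // 2]  # keep the fitter half unchanged
        children = []
        while len(survivors) + len(children) < pop_size:
            a, b = rng.sample(survivors, 2)
            children.append(mutate(crossover(a, b, rng), rng))
        population = survivors + children
    return initial_best, max(population, key=fitness)


initial_best, best = run_ga()
print("best of initial population:", initial_best, fitness(initial_best))
print("best after evolution:      ", best, fitness(best))
```

Because the fitter half of each generation is carried over unchanged, the best score can never get worse from one generation to the next. In a docking setting the chromosome would instead encode the ligand's rotation, translation and possibly its flexible torsions.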
Sample GOLD output GMP into RNaseT1
Something fun… Screensaver that docks molecules while your computer is idle at http://www.grid.org/projects/cancer/
Historical ways of representing chemicals • Trivial name, e.g. Baking Soda, Aspirin, Citric Acid, etc. Identifies the compound, but gives no (or little) information about what it consists of • Chemical formula, e.g. C6H12O6. Specifies the type and quantity of the atoms in the compound, but not its structure (i.e. how the atoms are connected by bonds) • Systematic name, e.g. 1,2-dibromo-3-chloropropane. Identifies the atoms present and how they are connected by bonds.
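The point that a chemical formula gives the type and quantity of atoms but not their connectivity can be illustrated with a short sketch. It assumes simple formulas without parentheses or charges; the function names and the small table of approximate atomic masses are choices made for this example.

```python
import re
from collections import Counter

# Approximate standard atomic masses (g/mol) for a few common elements.
ATOMIC_MASS = {"H": 1.008, "C": 12.011, "N": 14.007, "O": 15.999,
               "Cl": 35.45, "Br": 79.904}


def parse_formula(formula):
    """Count atoms in a simple formula such as 'C6H12O6'."""
    counts = Counter()
    # Each match is an element symbol followed by an optional count.
    for symbol, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[symbol] += int(number) if number else 1
    return dict(counts)


def molecular_weight(formula):
    """Sum the atomic masses of all atoms in the formula."""
    return sum(ATOMIC_MASS[symbol] * n for symbol, n in parse_formula(formula).items())


print(parse_formula("C6H12O6"))                  # atom counts only
print(round(molecular_weight("C6H12O6"), 3))     # about 180.156
print(parse_formula("C3H5Br2Cl"))                # 1,2-dibromo-3-chloropropane
```

Nothing in these counts says which atoms are bonded to which, so different compounds with the same formula cannot be told apart at this level.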
Trivial and Systematic Names • Trivial name: • tyrosine • Systematic names: • β-(p-hydroxyphenyl)alanine • α-amino-p-hydroxyhydrocinnamic acid
Historical ways of representing chemicals • 2D structure diagram, shows atoms present and how they are connected by bonds • 3D structure diagram, shows how atoms are related to each other in 3D space. Can take a variety of forms. Accurate models only really possible since X-ray crystallography and computers… but ball and stick models have been around a long time! David Wild – Research Overview July 2006.
Early computer representations • How do we communicate structural information between humans and the computer? • Line notations, e.g. Wiswesser Line Notation (and later SMILES) • How do we represent the atoms and bonds in a molecule internally in a computer? • Atom lookup and connection tables
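As a rough sketch of the internal representation, an atom lookup table and a connection table for 2-propanol (three carbons with a hydroxyl group on the middle one) might look like the following. The atom numbering, the tuple layout and the small valence table are assumptions made for illustration and do not follow any particular file format.

```python
# Atom lookup table for 2-propanol: atom number -> element symbol.
# Hydrogens are left implicit.
atoms = {1: "C", 2: "C", 3: "O", 4: "C"}

# Connection table: one row per bond as (atom, atom, bond order).
bonds = [(1, 2, 1), (2, 3, 1), (2, 4, 1)]

# Usual valences of neutral atoms, used to infer implicit hydrogens.
VALENCE = {"C": 4, "N": 3, "O": 2}


def neighbours(atom):
    """Atoms directly bonded to the given atom, in ascending order."""
    linked = [b if a == atom else a for a, b, _ in bonds if atom in (a, b)]
    return sorted(linked)


def implicit_hydrogens(atom):
    """Hydrogens needed to fill the atom's usual valence."""
    used = sum(order for a, b, order in bonds if atom in (a, b))
    return VALENCE[atoms[atom]] - used


for number, symbol in atoms.items():
    print(number, symbol, neighbours(number), implicit_hydrogens(number))

total_hydrogens = sum(implicit_hydrogens(n) for n in atoms)
print("implicit hydrogens:", total_hydrogens)  # 8, consistent with C3H8O
```

Keeping atoms and bonds in separate tables makes it easy to answer questions such as "what is attached to atom 2?" without redrawing the structure.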
Linear notations • Represent the atoms, bonds and connectivity of a molecule in a linear text string • Concise representation • Originally designed for manual command line entry into text-only systems • Now an excellent format for file and database storage (e.g. can be held in a spreadsheet cell, on one line of a text file, or in an Oracle database text field)
Wiswesser Line Notation (obsolete) • WLN for this structure is QVYZ1R DQ • Uses symbolic text representation of functional groups, e.g.: • Q = OH, V = -CO-, Z = -NH2, R = benzene • Other symbols represent branching, e.g. Y
SMILES • Dave Weininger, Daylight, www.daylight.com • (one possible) SMILES for this structure is OC(=O)C(N)CC1=CC=C(O)C=C1 • Can identify any chemical structure • There can be several ways of writing the same structure in SMILES (although a system for generating canonical SMILES exists)
SMILES – Atoms & Bonds • Atoms represented by their chemical symbol (C, N, S, O, Br, etc). Uppercase for aliphatic, lowercase for aromatic • Adjacent atoms implicitly single bonded, or = for double bond, or # for triple bond • Hydrogens usually implicit • Propane • CCC
SMILES – Atoms & Bonds • Atoms represented by their chemical symbol (C, N, S, O, Br, etc). Uppercase for aliphatic, lowercase for aromatic • Adjacent atoms implicitly single bonded, or = for double bond, or # for triple bond • Hydrogens usually implicit • 1-Propanol • CCCO • Or OCCC !
SMILES – Atoms & Bonds • Atoms represented by their chemical symbol (C, N, S, O, Br, etc). Uppercase for aliphatic, lowercase for aromatic • Adjacent atoms implicitly single bonded, or = for double bond, or # for triple bond • Hydrogens usually implicit • Propene • C=CC • Or CC=C !
SMILES – Branching & Rings • Parentheses represent branching • Ring enclosures represented by using numbers to signify attachment points • 2-Propanol • CC(O)C
SMILES – Branching & Rings • Parentheses represent branching • Ring enclosures represented by using numbers to signify attachment points • Cyclohexane • C1CCCCC1
SMILES – Branching & Rings • Parentheses represent branching • Ring enclosures represented by using numbers to signify attachment points • Benzene • c1ccccc1
SMILES – Branching & Rings • Parentheses represent branching • Ring enclosures represented by using numbers to signify attachment points • Bromobenzene • c1cc(Br)ccc1
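To show how the rules for atoms, bonds, branches and ring closures fit together, here is a minimal sketch of a parser that turns a SMILES string into a list of atoms and a list of bonds. It is an assumption-laden toy, not the Daylight implementation: it handles only unbracketed atoms, = and # bonds, parentheses and single-digit ring closures, and it records bonds between aromatic (lowercase) atoms with order 1, leaving aromaticity on the atom symbol.

```python
import re

# Two-letter halogens first so "Cl" is not read as "C" followed by "l".
TOKEN = re.compile(r"Cl|Br|[BCNOPSFI]|[bcnops]|[=#]|[()]|\d")


def parse_smiles(smiles):
    """Return (atoms, bonds) for a small SMILES subset.

    atoms is a list of symbols (lowercase = aromatic);
    bonds is a list of (atom_i, atom_j, order) with 0-based indices.
    """
    atoms, bonds = [], []
    previous = None      # index of the atom the next atom bonds to
    pending_order = 1    # bond order set by '=' or '#'
    branch_stack = []    # atoms to return to when ')' closes a branch
    open_rings = {}      # ring digit -> (atom index, bond order)
    position = 0
    while position < len(smiles):
        match = TOKEN.match(smiles, position)
        if match is None:
            raise ValueError(f"unsupported character at position {position}")
        token = match.group()
        position = match.end()
        if token == "(":
            branch_stack.append(previous)
        elif token == ")":
            previous = branch_stack.pop()
        elif token == "=":
            pending_order = 2
        elif token == "#":
            pending_order = 3
        elif token.isdigit():
            if token in open_rings:
                # Second occurrence of the digit: close the ring.
                partner, order = open_rings.pop(token)
                bonds.append((partner, previous, max(order, pending_order)))
            else:
                open_rings[token] = (previous, pending_order)
            pending_order = 1
        else:
            # An atom symbol: add it and bond it to the previous atom.
            atoms.append(token)
            index = len(atoms) - 1
            if previous is not None:
                bonds.append((previous, index, pending_order))
            previous = index
            pending_order = 1
    return atoms, bonds


for example in ("CCC", "C=CC", "CC(O)C", "C1CCCCC1", "c1ccccc1"):
    print(example, parse_smiles(example))
```

For CC(O)C the oxygen and the last carbon are both bonded to atom 1, which is the branching that the parentheses express; for C1CCCCC1 the final bond joins atom 5 back to atom 0, closing the ring.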
SMILES – Acetaminophen (Tylenol) • Acetaminophen • c1c(O)ccc(NC(=O)C)c1
SMILES – multiple ring structure • Indole • c1ccc2[nH]ccc2c1
Other SMILES notes • Hydrogen atoms are implicit unless declared otherwise • Atoms outside the organic subset (B, C, N, O, P, S, F, Cl, Br, I), explicit hydrogens and modified atoms need to be placed in square brackets, e.g. [Pb], [Xe] • Charged species indicated by a + or – (and square brackets), e.g. [Na+], [N+], [O-], [Ca++] • Unknown atoms can be represented by a * (but watch out for confusion with SMARTS!) • Stereochemistry at tetrahedral centres can be indicated using @ and @@ • “Canonical SMILES” can be created
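The square-bracket rules above can be made concrete with a small sketch that pulls the symbol, hydrogen count, charge and chirality mark out of a single bracket atom. The regular expression and the function name are assumptions for this example and cover only part of the bracket syntax (for instance, no atom classes and no two-letter aromatic symbols).

```python
import re

BRACKET_ATOM = re.compile(
    r"\[(?P<isotope>\d+)?"                 # optional isotope, e.g. [13C]
    r"(?P<symbol>[A-Z][a-z]?|[a-z]|\*)"    # element, aromatic atom, or wildcard
    r"(?P<chirality>@@?)?"                 # @ or @@
    r"(?P<hydrogens>H\d*)?"                # H, H2, H3, ...
    r"(?P<charge>\++|-+|[+-]\d+)?\]"       # +, ++, -, +2, -2, ...
)


def parse_bracket_atom(text):
    """Parse one bracket atom such as '[Na+]' or '[nH]' into a dict."""
    match = BRACKET_ATOM.fullmatch(text)
    if match is None:
        raise ValueError(f"not a supported bracket atom: {text!r}")
    hydrogens = match.group("hydrogens")
    charge = match.group("charge")
    if hydrogens is None:
        h_count = 0
    else:
        h_count = int(hydrogens[1:]) if len(hydrogens) > 1 else 1
    if charge is None:
        net_charge = 0
    elif charge[-1].isdigit():
        net_charge = int(charge)           # forms like "+2" or "-2"
    else:
        net_charge = len(charge) * (1 if charge[0] == "+" else -1)
    return {
        "symbol": match.group("symbol"),
        "hydrogens": h_count,
        "charge": net_charge,
        "chirality": match.group("chirality"),
    }


for example in ("[Pb]", "[Na+]", "[O-]", "[Ca++]", "[nH]", "[C@@H]"):
    print(example, parse_bracket_atom(example))
```

Inside brackets nothing is implicit, which is why the pyrrole-type nitrogen in indole has to be written [nH] rather than n.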
SMILES Homepage http://www.daylight.com/smiles/ Official Syntax Guide • Tutorial • Examples • Resources
Other Line Notations • ROSDAL (Representation Of Structure Diagram Arranged Linearly) - Beilstein • 1O-2=3O,2-4-5N,4-6-7=-12-7,10-13O • Sybyl Line Notation (SLN) - Tripos • OHC(=O)CH(NH2)CH2C[1]=CHCH=C(OH)CH=CH@1
Example free online web resources For more links, see http://www.chemoinf.com/
PubChem http://pubchem.ncbi.nlm.nih.gov/
MolInspiration Property Calculations http://www.molinspiration.com/cgi-bin/properties