
A Fast Tour of Predictive Cheminformatics. Curt Breneman, Perspectives in Chemistry, October 19, 2011


Presentation Transcript


  1. A Fast Tour of Predictive Cheminformatics Curt Breneman Perspectives in Chemistry October 19, 2011

  2. The Problem Domain…

  3. RECCR Cheminformatics Center • Data Management Core: Curation and Standardization ***, Data Collection, Database Management, Model Implementation • Descriptor Core: HTS Descriptors ***, Structure-based design, Ligand-based design, Molecular Similarity, Descriptor benchmarks • Modeling Core: Alternate Model Fusion ***, Task-targeted modeling, Multi-objective Learning, Applicability Domains, Model benchmarks • Software Engineering and Dissemination: Algorithm Implementation ***, Computing, Visualization, Database Development, User Interface, Support and Documentation

  4. The Evolution of Informatics ~4000 BC Experiments, observations, records

  5. The Evolution of Informatics ~4000 BC Experiment ~1700 AD Theory, formalism, publication

  6. The Evolution of Informatics ~4000 BC Experiment ~1700 AD Theory ~1950+ Computation – first steps

  7. The Evolution of Informatics ~4000 BC Experiment ~1700 AD Theory ~1950+ Computation ~1970+ Simulation

  8. The Evolution of Informatics ~4000 BC Experiment ~1700 AD Theory ~1950+ Computation ~1970+ Simulation ~1990+ Cheminformatics & Data Mining

  9. The Data Mining Process WISDOM UNDERSTANDING KNOWLEDGE INFORMATION DATA

  10. Data Mining a Data-Rich Environment Experiment No Prior Hypothesis

  11. Intersection of Chemistry and Biology 6627 small molecules 151 diverse assays

  12. Mapping Chemistry to Biology • Alignment-free Molecular Property Descriptors • Multi-Latent Analysis Modeling Tools • Multi-Objective Learning • Non-linear Model Building and Validation Methods [Figure: molecules listed per pH condition: pH 4 (Mol 1 to Mol 14), pH 5 (Mol 1 to Mol 17), pH 6 (Mol 1 to Mol 17), pH 7 (Mol 1 to Mol 15), pH 8 (Mol 1 to Mol 14)]

  13. Predictive Cheminformatics Workflow MOLECULAR STRUCTURE DATASET NECCR, PubChem MLSCN, MLI, PDB, corporate partners ACTIVITY MODEL MOLECULAR DESCRIPTORS Synthesis, Assay MOLECULAR ENVIRONMENT PREDICTED ACTIVITY

  14. Knowledge Discovery and Data Fusion Domain expert molecular understanding FUSED DATA Database #1 Database #n

  15. Data Fusion: A Combination of Strengths

  16. Structural Descriptors Physicochemical Descriptors Topological Descriptors Geometrical Descriptors + Activity Modeling Bioactivity = Molecular Structures Descriptors Model Activity

  17. What is “Molecular Structure”?

  18. Molecular Structures Model Activity Representing Molecular Structure AAACCTCATAGGAAGCATACCAGGAATTACATCA…

  19. Structural Descriptors Physicochemical Descriptors Topological Descriptors Geometrical Descriptors Constitutional Descriptors Electrostatic Descriptors Quantum-chemical Descriptors Thermodynamic Descriptors Descriptor Types Molecular Structures Descriptors Model Activity

  20. Descriptor Selection • What features of a molecule are related to my activity? • What descriptors can capture that information? Molecular Structures Descriptors Model Activity
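One simple way to approach the first question on this slide is to rank candidate descriptors by how strongly each one correlates with the measured activity. The sketch below is only an illustration and is not the selection method used in the talk: the descriptor names and values are made up, and `pearson` and `rank_descriptors` are hypothetical helpers.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient; returns 0.0 if either input is constant."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)

def rank_descriptors(table, activity):
    """Sort descriptor names by |correlation with activity|, strongest first."""
    scores = {name: pearson(vals, activity) for name, vals in table.items()}
    return sorted(scores.items(), key=lambda kv: abs(kv[1]), reverse=True)

# Hypothetical toy data: three descriptors measured for five molecules.
descriptors = {
    "logP": [1.0, 2.0, 3.0, 4.0, 5.0],
    "n_rings": [2, 2, 2, 2, 2],  # constant, so it carries no information
    "polar_area": [90.0, 60.0, 75.0, 40.0, 35.0],
}
activity = [1.1, 1.9, 3.2, 3.9, 5.1]

for name, r in rank_descriptors(descriptors, activity):
    print(f"{name}: r = {r:+.3f}")
```

A univariate ranking like this ignores interactions between descriptors, so it is best read as a first filter before model building rather than a complete selection procedure.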

  21. Surface Property Distribution Histograms Molecular surface property distributions can be represented as RECON/TAE histogram bin descriptors (RECON/TAE) Descriptors
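To make the idea of histogram bin descriptors concrete, the sketch below bins a list of surface property values into equal-width bins and returns the fraction of surface points in each bin as a fixed-length vector. This is a generic histogram, not the RECON/TAE algorithm itself; the bin range, the bin count, and the sample values are assumptions chosen for illustration.

```python
def histogram_descriptor(values, lo, hi, n_bins):
    """Return the fraction of values falling in each of n_bins equal-width bins.

    Values outside [lo, hi] are clamped into the first or last bin so that
    every molecule yields a vector of the same length on the same scale.
    """
    if n_bins < 1 or hi <= lo:
        raise ValueError("need n_bins >= 1 and hi > lo")
    counts = [0] * n_bins
    width = (hi - lo) / n_bins
    for v in values:
        idx = int((v - lo) / width)
        idx = min(max(idx, 0), n_bins - 1)  # clamp out-of-range values
        counts[idx] += 1
    total = len(values)
    return [c / total for c in counts] if total else [0.0] * n_bins

# Hypothetical property values sampled at points on a molecular surface.
surface_values = [-0.8, -0.3, -0.1, 0.0, 0.2, 0.2, 0.5, 0.9]
vec = histogram_descriptor(surface_values, lo=-1.0, hi=1.0, n_bins=4)
print(vec)
```

Because the bin edges are fixed in advance, molecules of different sizes map to vectors of the same length, which is what lets such histograms be used directly as model inputs.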

  22. PROLICSS: Protein-Ligand Complementary Surface Scoring [Figure: EP and MLP surface property histograms for 6CPA]

  23. PMF expansion-based hydration patterns • Developing an efficient alternative to full simulations by means of a potentials-of-mean-force expansion • employing a library of lower-order correlation functions derived from explicit simulations to predict the average equilibrium density and the orientation profile of water in the space surrounding biomolecules or ligands. Water density values in space surrounding an alpha-helix (left) and a protein X (right) predicted using the PMF expansion (cyan) and obtained from exact simulation (magenta)

  24. Predictive Cheminformatics Learning from the past to predict the future…

  25. Challenges in Predictive Modeling • “First there are the known knowns” • These are the things that we know we know • “Then there are the known unknowns” • These are the things that we now know we do not know • “Finally there are also the unknown unknowns” • These are the things that we do not yet know we do not know • “And each day brings us a few more unknown unknowns” • Donald Rumsfeld, 2003

  26. Prediction Pitfalls… • “Who wants to hear actors talk?” – H.M. Warner, 1927 • “Forget it – no civil war picture ever made a nickel” – MGM executive, in 1937, advising against production of “Gone with the Wind” • “I think there might be a market for maybe five computers” – Thomas Watson, IBM, 1943 • “Computers in the future may weigh no more than 1.5 tons” – Popular Mechanics, 1949 • “There is no reason anyone would want a computer in their home” – Ken Olsen, founder of Digital Equipment Corporation, 1977

  27. Machine Learning Methods and Statistical Modeling “If your experiment needs statistics, you ought to have done a better experiment” - Ernest Rutherford “But what if you haven’t done the experiment yet?” - Curt Breneman

  28. Model Building and Validation DATASET Training set Test set Y-scrambling model validation! Bootstrap sample k Predictive Model Training Validation Learning Model Tuning / Prediction Prediction
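The Y-scrambling check named on this slide can be sketched as follows: refit the model many times on randomly shuffled activities and compare the fit quality with that of the real model. A model that has learned a genuine structure-activity relationship should score far better than its scrambled counterparts. The single-descriptor least-squares line, the synthetic data, and the number of rounds below are all assumptions made to keep the example small.

```python
import random

def fit_line(xs, ys):
    """Ordinary least-squares fit y = a*x + b for a single descriptor."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return a, my - a * mx

def r_squared(xs, ys, model):
    """Coefficient of determination of the fitted line on (xs, ys)."""
    a, b = model
    my = sum(ys) / len(ys)
    ss_res = sum((y - (a * x + b)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - my) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot

def y_scramble(xs, ys, n_rounds=200, seed=0):
    """Return R^2 of the real fit and the R^2 values from shuffled activities."""
    rng = random.Random(seed)
    real = r_squared(xs, ys, fit_line(xs, ys))
    scrambled = []
    for _ in range(n_rounds):
        shuffled = ys[:]
        rng.shuffle(shuffled)  # break the link between descriptor and activity
        scrambled.append(r_squared(xs, shuffled, fit_line(xs, shuffled)))
    return real, scrambled

# Hypothetical training data: one descriptor, noisy linear activity.
data_rng = random.Random(42)
x_train = [float(i) for i in range(20)]
y_train = [0.5 * x + data_rng.gauss(0.0, 0.5) for x in x_train]

real_r2, scrambled_r2 = y_scramble(x_train, y_train)
mean_scrambled = sum(scrambled_r2) / len(scrambled_r2)
print(f"real R^2 = {real_r2:.3f}, mean scrambled R^2 = {mean_scrambled:.3f}")
```

The same loop structure applies to any learner: only `fit_line` and `r_squared` would change, and in practice the comparison is usually made on cross-validated or test-set scores rather than on the training fit shown here.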

  29. Model Applicability Domains

  30. Predictive QSAR Workflow example ~ 760 kNN QSAR models 10 Best models Acceptance criteria 48 anticonvulsants * Mining DBs using Probes Ca. 255,000 chemicals in DBs 50 consensus (common) hits Similarity Cutoff 4334 hits Predictions with 10 QSAR models using applicability domain 9 compounds selected based on synthetic considerations 22 compounds submitted to chemists 7 compounds active NIH testing *Shen, M., et al. J. Med. Chem., 2002, 45, 2811-2823; Shen, M., et al. 2004, 47, 2356-2364.
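The workflow above only accepts predictions for compounds that fall inside a model's applicability domain. A minimal sketch of that idea, assuming a distance-based domain: the cutoff is the mean nearest-neighbour distance within the training set plus `z` standard deviations, and a query whose closest training compound is farther away than the cutoff gets no prediction. The slide does not state the exact definition used in the cited work, so the cutoff formula, the value of `z`, the choice of `k`, and the data are all assumptions.

```python
import math
import statistics

def euclidean(a, b):
    """Euclidean distance between two descriptor vectors."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))

def domain_threshold(train_x, z=0.5):
    """Cutoff: mean + z * stdev of nearest-neighbour distances in the training set."""
    nn = []
    for i, a in enumerate(train_x):
        nn.append(min(euclidean(a, b) for j, b in enumerate(train_x) if j != i))
    return statistics.mean(nn) + z * statistics.pstdev(nn)

def knn_predict(train_x, train_y, query, k=3, z=0.5):
    """Return the mean activity of the k nearest training compounds,
    or None when the query lies outside the applicability domain."""
    ranked = sorted(zip((euclidean(x, query) for x in train_x), train_y))
    if ranked[0][0] > domain_threshold(train_x, z):
        return None  # too dissimilar to anything the model has seen
    nearest = ranked[:k]
    return sum(y for _, y in nearest) / len(nearest)

# Hypothetical two-descriptor training set with measured activities.
train_x = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (2.0, 1.0)]
train_y = [1.0, 2.0, 2.0, 3.0, 4.0]

inside = knn_predict(train_x, train_y, (0.9, 0.9))     # close to training data
outside = knn_predict(train_x, train_y, (10.0, 10.0))  # far from training data
print(inside, outside)
```

Returning `None` rather than an extrapolated number makes the "no reliable prediction" case explicit, which is what allows a consensus step to count only the models that consider a compound in-domain.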

  31. RECCR Interactive Applications Data Preparation, Descriptor Generation and Modeling

  32. ROMS Predictions

  33. Software developed at RECCR Mfold (Mike Zuker) • RNA, DNA secondary structure prediction Analyze (Mark Embrechts) • Fast KPLS test set mode with low memory footprint RECON • Transferable Atom Equivalent descriptors RECON for MOE • Drop-in interactive for MOE 2007 PROTEIN RECON for protein characterization • Property moment descriptors COLIBRI (with Alex Tropsha) • Binding site/ligand scoring using Universal Descriptor Space DIXEL • DNA Characterization and bioinformatics PEST • Compatible with Gaussian or Jaguar Software employing TAE descriptors

  34. The RECCR Community http://reccr.chem.rpi.edu

  35. ACKNOWLEDGMENTS • Current and Former members of the DDASSL group • Breneman Research Group (RPI Chemistry) • N. Sukumar • M. Sundling • Min Li • Long Han • Jed Zaretski • Theresa Hepburn • Mike Krein • Steve Mulick • Shiina Akasaka • Hongmei Zhang • C. Whitehead (Pfizer Global Research) • L. Shen (BNPI) • L. Lockwood (Syracuse Research Corporation) • M. Song (Synta Pharmaceuticals) • D. Zhuang (Simulations Plus) • W. Katt (Yale University chemistry graduate program) • Q. Luo (J & J) • Embrechts Research Group (RPI DSES) • Tropsha Research Group (UNC Chapel Hill) • Bennett Research Group (RPI Mathematics) • Jinbo Bi • Collaborators: • Lawrence Research Group (NYS Wadsworth Labs) • Inna Vitol • Cramer Research Group (RPI Chemical Engineering) • Funding • NIH (GM047372-07) • NIH (1P20HG003899-01) • NSF (BES-0214183, BES-0079436, IIS-9979860) • GE Corporate R&D Center • Millennium Pharmaceuticals • Concurrent Pharmaceuticals • Pfizer Pharmaceuticals • ICAGEN Pharmaceuticals • Eastman Kodak Company • Chemical Computing Group (CCG)
