UKY Seminar, Weifan Zheng, Ph.D.
Cheminformatics in Drug Discovery and Chemical Genomics Research
Weifan Zheng, Ph.D.
Associate Professor, Department of Pharmaceutical Sciences, BRITE Institute, NC Central University
Adjunct Associate Professor, Department of Medicinal Chemistry, University of North Carolina at Chapel Hill
Topics to Be Covered
• Biotech/Pharma
• Orphan Disease
• Chemical Genomics
• Computational Needs
• Compound Collection
• Docking and Scoring
• Data Analytics
• CECCR Cheminformatics Center
Drug Discovery & Development Pipeline
Phases and Costs of Drug Discovery
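Drug Discovery Process and the Roles of CADD
• GR: Genetic Research; DR: Discovery Research; DD: Drug Discovery
• CADD: computer-assisted drug discovery
• ADMET: absorption, distribution, metabolism, elimination, toxicity
[Figure: pipeline running GR, DR, DD, Preclin, IND, and clinical trials (Phases I, II, III); the discovery stages T2H (target to hit), H2L (hit to lead), and LO (lead optimization) link the milestones T, H, L, and C (target, hit, lead, candidate); CADD is marked along the pipeline]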
Human Genome Project Success
“Genome announcement 'technological triumph'
Milestone in genetics ushers in new era of discovery, responsibility”
CNN, June 26, 2000
Chemogenomics/Chemical Genomics
[Photos: F. Collins; Chris Austin]
Chemical Genomics: Google hit counts (Oct. 16, 2006)
• Chemogenomics: 69,000
• Chemical genomics: 113,000
• Chemical biology: 4,210,000
• Chemical genetics: 104,000
Chemical genetics is a research method that uses small molecules to change the way proteins work—directly in real time rather than indirectly by manipulating their genes. It is used to identify which proteins regulate different biological processes, to understand in molecular detail how proteins perform their biological functions, and to identify small molecules that may be of medical value.
to create a national resource in chemical probe development. The center uses the latest industrial-scale technologies to collect data useful for defining the intersection of chemical space and biological activity, on a genomic scale.
NIH Molecular Libraries Initiative (MLI)
• Chemical Synthesis Centers: 4 centers + DPI; combinatorial chemistry, parallel synthesis, DOS (diversity-oriented synthesis); 100K–1M compounds; SAR matrix compounds
• MLSCN (Molecular Libraries Screening Centers Network) (9+1): 9 centers + 1 NIH intramural center; 20 × 10 = 200 assays
• ECCR (6): Exploratory Centers for Cheminformatics Research
• PubChem (NLM)
Biological Assay Data
• Biochemical assays
• Cell-based functional assays
• Phenotypic assays
• Databases
  • PubChem (http://pubchem.ncbi.nlm.nih.gov/)
  • ChemBank (http://chembank.broad.harvard.edu/)
  • WOMBAT (http://sunsetmolecular.com/index.php)
  • Jubilant (http://www.jubilantbiosys.com/)
  • Gvk/Bio (http://www.gvkbio.com/)
High Throughput Chemistry and Screening: Informatics
[Diagram labels: Rules; Virtual Libraries; Diverse Library Design; Targeted Library Design; Combinatorial Synthesis; Real Libraries; HTS; SAR Data; KDD (QSAR, P.R.); Scientific Logistics; Drug Discovery; Chemical Genomics]
Challenges in Combinatorial Chemistry
[Figure: a core with three variable positions, R1, R2, and R3, each drawn from 3,000 building blocks]
3,000³ compounds / 1,000 compounds per week ≈ 0.5 million years
• Library design: rational selection of a subset of building blocks to obtain the maximum amount of information
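The slide's back-of-the-envelope estimate can be reproduced in a few lines. The 3,000 building blocks per position and the 1,000 compounds per week come from the slide; the 52-week year is an assumption added here.

```python
# Size of a full three-position combinatorial library (R1 x R2 x R3)
# and the time needed to make it at the slide's synthesis throughput.
BLOCKS_PER_POSITION = 3_000
POSITIONS = 3
COMPOUNDS_PER_WEEK = 1_000
WEEKS_PER_YEAR = 52  # assumption: synthesis runs every week of the year

library_size = BLOCKS_PER_POSITION ** POSITIONS   # 3,000^3 compounds
weeks = library_size / COMPOUNDS_PER_WEEK
years = weeks / WEEKS_PER_YEAR

print(f"Library size: {library_size:,} compounds")
print(f"Time to synthesize: {years:,.0f} years")
```

The result, about 519,000 years, is the "~0.5 million years" on the slide, which is why only a rationally chosen subset of building blocks is ever made.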
Design for Activity: Similarity
• If we know a compound is active and want to design a set of compounds that may be active against the same target, we may select a set of compounds that are similar to the active compound
• The similarity principle: similar compounds should have similar biological activity
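Similarity-based selection amounts to ranking candidates by their similarity to the known active and keeping the top few. This is a minimal sketch that assumes binary fingerprints stored as sets of "on" bit positions and the Tanimoto coefficient as the similarity measure; the slide names neither, and the compound names and bits are hypothetical.

```python
def tanimoto(a: set[int], b: set[int]) -> float:
    """Tanimoto coefficient of two binary fingerprints: |A & B| / |A | B|."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0

def select_similar(active: set[int], library: dict[str, set[int]],
                   k: int) -> list[tuple[str, float]]:
    """Return the k library members most similar to the active compound."""
    scored = [(name, tanimoto(active, fp)) for name, fp in library.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]

# Hypothetical fingerprints (sets of bit positions that are switched on).
active = {1, 4, 7, 9, 12}
library = {
    "cpd_A": {1, 4, 7, 9, 13},
    "cpd_B": {2, 3, 5},
    "cpd_C": {1, 4, 7, 9, 12, 15},
    "cpd_D": {4, 9, 20, 21},
}
for name, sim in select_similar(active, library, k=2):
    print(f"{name}: {sim:.2f}")
```

Here cpd_C (0.83) and cpd_A (0.67) are selected; any other similarity metric can be dropped into `tanimoto`'s place.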
Molecular Identity and Molecular Similarity
[Table: descriptor matrix, structures × descriptors]
            X1   X2   X3   …   X20
Str. 1       2    5    1   …    4
Str. 2       4    7    9   …    7
Str. 3       1    6    8   …    6
…
Str. 100     0    3    5   …    1
[Figure: structures 1, 2, and 3 plotted as points in the X1–X2 descriptor plane]
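Treating each row of the descriptor matrix as a point makes "similarity" a distance. This sketch assumes Euclidean distance on unscaled descriptors (the slide does not specify a metric) and uses only the four columns visible in the table, X1, X2, X3, and X20.

```python
import math

# Each structure is a point in descriptor space. Only the four visible
# columns (X1, X2, X3, X20) are used; a real calculation would use all 20.
descriptors = {
    "Str. 1":   [2, 5, 1, 4],
    "Str. 2":   [4, 7, 9, 7],
    "Str. 3":   [1, 6, 8, 6],
    "Str. 100": [0, 3, 5, 1],
}

def euclidean(u: list[float], v: list[float]) -> float:
    """Euclidean distance: the smaller it is, the more similar the structures."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

names = list(descriptors)
for i, a in enumerate(names):
    for b in names[i + 1:]:
        print(f"{a} vs {b}: {euclidean(descriptors[a], descriptors[b]):.2f}")
```

On these partial descriptors, Str. 2 and Str. 3 (distance 3.46) are much closer than Str. 1 and Str. 2 (distance 9.00).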
Design for General Application: Diversity
Similarity and Diversity
• MaxMin
• Minimize $\sum_{i<j} 1/D_{ij}^2$
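The two diversity criteria on the slide can be sketched together: a greedy MaxMin picker and the pairwise inverse-square-distance sum used as an objective to minimize. This is one common form of MaxMin (random first pick, Euclidean distance), offered as an illustration rather than the talk's exact procedure; the 5 × 5 grid of "compounds" is hypothetical.

```python
import math
import random

Point = tuple[float, ...]

def maxmin_select(pool: list[Point], n: int, seed: int = 0) -> list[Point]:
    """Greedy MaxMin: start from a random compound, then repeatedly add the
    candidate whose nearest already-selected neighbor is farthest away."""
    rng = random.Random(seed)
    remaining = list(pool)
    selected = [remaining.pop(rng.randrange(len(remaining)))]
    while len(selected) < n and remaining:
        best = max(remaining, key=lambda p: min(math.dist(p, s) for s in selected))
        remaining.remove(best)
        selected.append(best)
    return selected

def inverse_square_sum(points: list[Point]) -> float:
    """Diversity objective to minimize: sum over pairs i < j of 1 / D_ij**2.
    Assumes all points are distinct (D_ij > 0)."""
    return sum(
        1.0 / math.dist(points[i], points[j]) ** 2
        for i in range(len(points))
        for j in range(i + 1, len(points))
    )

# Hypothetical 2D descriptor space: a 5 x 5 grid of compounds.
pool = [(float(x), float(y)) for x in range(5) for y in range(5)]
picked = maxmin_select(pool, 4)
print(picked)
print(f"sum 1/D^2, MaxMin subset: {inverse_square_sum(picked):.3f}")
print(f"sum 1/D^2, first four:    {inverse_square_sum(pool[:4]):.3f}")
```

The MaxMin subset spreads out across the grid, so its $\sum 1/D_{ij}^2$ is far smaller than that of four adjacent grid points.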
Cluster Hits Obtained by SAGE and Random Sampling
Drug Discovery & Development Failures
[Figure: pie chart with segments of 6%, 21%, 39%, and 29%]
Venkatesh & Lipper, J. Pharm. Sci. 89, 145-154 (2000)
Multi-Factorial Design
Total Score is the Weighted Sum of Individual Terms
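[Figure: iterative library design. An initial library (an R1 × R2 grid of building blocks) is improved over successive iterations into a better library and then an optimal library, guided by penalty scores for activity, diversity, Lipinski properties, and P450]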
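The weighted-sum idea can be sketched for a two-position (R1 × R2) library. Everything concrete here is hypothetical: the building-block properties, the additive MW/clogP model, the two penalty terms (activity and Lipinski; diversity and P450 terms would plug into the same dictionary), and the weights. The optimizer is a plain greedy swap search standing in for whatever stochastic optimizer the original work used.

```python
import random

# Hypothetical building blocks: (MW contribution, clogP contribution,
# predicted-activity penalty). None of these values come from the talk.
_rng = random.Random(42)
R1_POOL = [(_rng.uniform(100, 300), _rng.uniform(-1, 3), _rng.random()) for _ in range(20)]
R2_POOL = [(_rng.uniform(100, 300), _rng.uniform(-1, 3), _rng.random()) for _ in range(20)]

WEIGHTS = {"activity": 1.0, "lipinski": 2.0}  # hypothetical term weights

def penalties(r1: list[int], r2: list[int]) -> dict[str, float]:
    """Per-term penalties of the R1 x R2 sub-library (lower is better)."""
    products = [(R1_POOL[i], R2_POOL[j]) for i in r1 for j in r2]
    # Toy additive properties checked against Lipinski's MW <= 500, clogP <= 5.
    violations = sum(1 for a, b in products if a[0] + b[0] > 500 or a[1] + b[1] > 5)
    activity = (sum(R1_POOL[i][2] for i in r1) / len(r1)
                + sum(R2_POOL[j][2] for j in r2) / len(r2))
    return {"activity": activity, "lipinski": violations / len(products)}

def total_score(p: dict[str, float]) -> float:
    """Total score = weighted sum of the individual penalty terms."""
    return sum(WEIGHTS[term] * value for term, value in p.items())

def optimize(n1: int, n2: int, iterations: int = 2000, seed: int = 0):
    """Greedy swap search: replace one building block at a time and keep
    the change only if the total score does not get worse."""
    rng = random.Random(seed)
    r1 = rng.sample(range(len(R1_POOL)), n1)
    r2 = rng.sample(range(len(R2_POOL)), n2)
    start = best = total_score(penalties(r1, r2))
    for _ in range(iterations):
        chosen, size = (r1, len(R1_POOL)) if rng.random() < 0.5 else (r2, len(R2_POOL))
        unused = [k for k in range(size) if k not in chosen]
        pos = rng.randrange(len(chosen))
        old = chosen[pos]
        chosen[pos] = rng.choice(unused)  # mutates r1 or r2 in place
        score = total_score(penalties(r1, r2))
        if score <= best:
            best = score
        else:
            chosen[pos] = old  # revert a worsening move
    return r1, r2, start, best

r1, r2, start, best = optimize(5, 5)
print(f"initial score {start:.3f} -> optimized score {best:.3f}")
```

Because every term is folded into one number, the relative weights decide the trade-off, for example how much predicted activity is worth giving up to remove a Lipinski violation.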
Designed Library Has a Better MW-clogP Distribution
[Figure: MW vs. clogP plots comparing the initial ten solutions (undesigned) with the final ten solutions (well designed)]
SPE (Stochastic Proximity Embedding) Algorithm (Agrafiotis)
• Iterative random sampling of pairs of points a, b
• D(a,b): distance in the original space; D'(a,b): distance in the 2D embedding space
• If D' > D, move a and b closer together
• If D' < D, move a and b apart
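A compact sketch of the pairwise SPE update described above, assuming a linearly decreasing learning rate and no neighborhood cutoff (variants of SPE restrict updates to nearby pairs; that refinement is omitted). The 30 random 3D points are hypothetical data; stress is used only to check that the embedding improves.

```python
import math
import random

def spe(distances: list[list[float]], dim: int = 2, cycles: int = 50,
        steps: int = 1000, lam0: float = 1.0, lam_min: float = 0.01,
        eps: float = 1e-10, seed: int = 0) -> list[list[float]]:
    """Stochastic proximity embedding of an n x n distance matrix into `dim` dimensions."""
    rng = random.Random(seed)
    n = len(distances)
    coords = [[rng.random() for _ in range(dim)] for _ in range(n)]  # random start
    lam = lam0
    for _ in range(cycles):
        for _ in range(steps):
            a, b = rng.sample(range(n), 2)           # random pair of points
            d_orig = distances[a][b]                 # D(a, b), original space
            d_emb = math.dist(coords[a], coords[b])  # D'(a, b), embedding space
            # factor > 0 when D' < D (push apart); factor < 0 when D' > D (pull together)
            factor = lam * 0.5 * (d_orig - d_emb) / (d_emb + eps)
            for k in range(dim):
                delta = factor * (coords[a][k] - coords[b][k])
                coords[a][k] += delta
                coords[b][k] -= delta
        lam -= (lam0 - lam_min) / cycles             # anneal the learning rate
    return coords

def stress(distances, coords) -> float:
    """Normalized stress: 0 means embedded distances match the originals exactly."""
    num = den = 0.0
    n = len(distances)
    for i in range(n):
        for j in range(i + 1, n):
            num += (math.dist(coords[i], coords[j]) - distances[i][j]) ** 2
            den += distances[i][j] ** 2
    return math.sqrt(num / den)

# Hypothetical data: 30 random points in 3D, embedded into 2D.
rng = random.Random(1)
points = [[rng.uniform(0, 10) for _ in range(3)] for _ in range(30)]
D = [[math.dist(p, q) for q in points] for p in points]
before = stress(D, spe(D, cycles=0))  # cycles=0 returns the random start
after = stress(D, spe(D))
print(f"stress before: {before:.3f}, after: {after:.3f}")
```

Each update touches only two points, which is what lets SPE scale to large compound collections compared with methods that need the full distance matrix at every step.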
Chemical Space - Compound Collection Comparison
SPE Embedding of ChemSpace
Quantitative Structure-Activity Relationship (QSAR)
[Figure: predicted vs. actual activity plots, q² = 0.8 and R² = 0.75]
• Methods: multiple linear regression (MLR); partial least squares (PLS); artificial neural networks; k-nearest neighbor (kNN)
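For reference, one common pair of definitions behind the two statistics (the slide reports only their values, so these are assumptions): q² is the leave-one-out cross-validated coefficient on the training set, where $\hat{y}_i^{(-i)}$ is the prediction for compound $i$ from a model built without it, and R² is the coefficient of determination on the external test set. Some QSAR work instead reports R² as the squared correlation between predicted and actual values.

```latex
q^2 = 1 - \frac{\sum_{i \in \text{train}} \left(y_i - \hat{y}_i^{(-i)}\right)^2}
               {\sum_{i \in \text{train}} \left(y_i - \bar{y}_{\text{train}}\right)^2},
\qquad
R^2 = 1 - \frac{\sum_{j \in \text{test}} \left(y_j - \hat{y}_j\right)^2}
               {\sum_{j \in \text{test}} \left(y_j - \bar{y}_{\text{test}}\right)^2}
```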
Basic Assumptions of the KNN-QSAR Method
• Structurally similar compounds should have similar biological activities
• Biological similarities are often due to similarities of substructures (pharmacophores)
• Biological activities can be estimated from molecular similarities, which are calculated with pharmacophore-specific descriptors
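The third assumption translates directly into a predictor: estimate a compound's activity from the activities of its nearest neighbors in descriptor space. This minimal sketch assumes Euclidean distance and 1/(1 + d) neighbor weights (both hypothetical choices), skips the descriptor-selection step a full kNN-QSAR model would include, and scores the model with leave-one-out q². The one-descriptor series is toy data.

```python
import math

def knn_predict(query, train_x, train_y, k=3) -> float:
    """Predicted activity = weighted mean activity of the k nearest training
    compounds in descriptor space (weights 1 / (1 + distance))."""
    neighbors = sorted(
        (math.dist(query, x), y) for x, y in zip(train_x, train_y)
    )[:k]
    weights = [1.0 / (1.0 + d) for d, _ in neighbors]
    return sum(w * y for w, (_, y) in zip(weights, neighbors)) / sum(weights)

def loo_q2(xs, ys, k=3) -> float:
    """Leave-one-out cross-validated q2 = 1 - PRESS / SS_total."""
    mean_y = sum(ys) / len(ys)
    press = ss_total = 0.0
    for i in range(len(xs)):
        # Predict compound i with a model that never sees it.
        pred = knn_predict(xs[i], xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:], k)
        press += (ys[i] - pred) ** 2
        ss_total += (ys[i] - mean_y) ** 2
    return 1.0 - press / ss_total

# Hypothetical one-descriptor series whose activity rises with the descriptor.
xs = [[float(i)] for i in range(10)]
ys = [2.0 * i for i in range(10)]
print(f"LOO q2 = {loo_q2(xs, ys):.3f}")
```

The end compounds are predicted worst because all their neighbors lie on one side, which is the usual extrapolation weakness of neighbor-based models.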
Comparison of CoMFA, GA-PLS, and KNN-QSAR
QSAR-Based Virtual Screening for GPCR Ligand Design
Docking and Scoring
• In the early 1980s, I. D. Kuntz developed the first computerized molecular docking program, DOCK
• Other docking programs: GOLD, FRED, Glide, FlexX, AutoDock, ICM
[Figure: X-ray structure]
Our Approach to Derive DT-SCORE
1. Use Delaunay tessellation to derive geometrical chemical descriptors of the protein-ligand interface
2. Establish a correlation between the geometrical chemical descriptors and protein-ligand binding affinity using the perceptron learning algorithm
Flowchart to Derive DT-SCORE
Receptor-ligand complexes → tessellation of the receptor-ligand interface → descriptor generation → perceptron learning algorithm (with binding constants) → model generation & prediction → DT-SCORE
Delaunay Tessellation in 2D
• Delaunay tessellation gives a rigorous definition of nearest neighbors in 2D and 3D space
• Nearest neighbors are unambiguously defined in sets of three (in 2D) and in sets of four (in 3D)
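What makes the Delaunay definition of "nearest neighbors" rigorous is the empty-circumcircle rule: three points form a Delaunay triangle exactly when no other point lies strictly inside their circumcircle (for points in general position). The brute-force O(n⁴) sketch below applies that rule directly in 2D for illustration only; for real 3D receptor-ligand tessellation a library routine such as `scipy.spatial.Delaunay` is the practical choice.

```python
from itertools import combinations

Point = tuple[float, float]

def in_circumcircle(a: Point, b: Point, c: Point, p: Point) -> bool:
    """True if p lies strictly inside the circumcircle of triangle abc."""
    # Orient abc counter-clockwise so the determinant's sign is meaningful.
    orient = (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])
    if orient < 0:
        b, c = c, b
    ax, ay = a[0] - p[0], a[1] - p[1]
    bx, by = b[0] - p[0], b[1] - p[1]
    cx, cy = c[0] - p[0], c[1] - p[1]
    det = ((ax * ax + ay * ay) * (bx * cy - cx * by)
           - (bx * bx + by * by) * (ax * cy - cx * ay)
           + (cx * cx + cy * cy) * (ax * by - bx * ay))
    return det > 0

def delaunay_triangles(points: list[Point]) -> list[tuple[int, int, int]]:
    """Brute-force 2D Delaunay triangulation; assumes general position."""
    triangles = []
    for i, j, k in combinations(range(len(points)), 3):
        a, b, c = points[i], points[j], points[k]
        if (b[0] - a[0]) * (c[1] - a[1]) == (b[1] - a[1]) * (c[0] - a[0]):
            continue  # collinear triple: not a triangle
        if not any(in_circumcircle(a, b, c, points[m])
                   for m in range(len(points)) if m not in (i, j, k)):
            triangles.append((i, j, k))
    return triangles

# Four corners of a square plus its center point.
pts = [(0.0, 0.0), (2.0, 0.0), (2.0, 2.0), (0.0, 2.0), (1.0, 1.0)]
print(delaunay_triangles(pts))
```

With the center point present, every corner-only triangle has the center inside its circumcircle, so the triangulation consists of the four triangles that share the center.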
Delaunay Tessellation of the Receptor-Ligand Interface
A Detailed View of Active Site Tessellation
[Figure: tessellated active site with receptor (R) and ligand (L) atoms labeled]
• An atom is shared by several tetrahedra
Three Types of Tetrahedra at the Receptor-Ligand Interface
• RLLL: formed by 1 receptor atom and 3 ligand atoms
• RRLL: formed by 2 receptor atoms and 2 ligand atoms
• RRRL: formed by 3 receptor atoms and 1 ligand atom
• Each of these tetrahedron types is further discriminated by the atom types at its vertices
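Geometrical Descriptors According to Tetrahedron Types
[Table: descriptor columns grouped by tetrahedron type (RRLL, RRRL, RLLL); each column is an atom-type quadruplet (e.g., NOCS, COSC, CNOO, NCNO, OSXN, ONOS) and each entry is a count (e.g., 4, 0, 2, 8, 5, 3)]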
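Turning interface tetrahedra into descriptor counts is a classify-and-count step. In this sketch each tetrahedron is four hypothetical (origin, element) vertices; the key format `KIND:receptor-atoms|ligand-atoms` is an illustrative choice, not necessarily the encoding used for the slide's quadruplet labels. Sorting the atom types makes the key independent of vertex order.

```python
from collections import Counter
from typing import Optional

Vertex = tuple[str, str]  # (origin, element); origin is "R" (receptor) or "L" (ligand)

def tetra_key(vertices: list[Vertex]) -> Optional[str]:
    """Return a descriptor key such as 'RRLL:NO|CS', or None if the tetrahedron
    is not at the interface (all receptor or all ligand atoms)."""
    receptor = sorted(el for origin, el in vertices if origin == "R")
    ligand = sorted(el for origin, el in vertices if origin == "L")
    if not receptor or not ligand:
        return None
    kind = "R" * len(receptor) + "L" * len(ligand)  # RRRL, RRLL or RLLL
    return f"{kind}:{''.join(receptor)}|{''.join(ligand)}"

def interface_descriptors(tetrahedra: list[list[Vertex]]) -> Counter:
    """Count how often each tetrahedron type / atom-type combination occurs."""
    return Counter(key for key in map(tetra_key, tetrahedra) if key is not None)

# Hypothetical tetrahedra from one complex.
tets = [
    [("R", "N"), ("R", "O"), ("L", "C"), ("L", "S")],
    [("R", "O"), ("R", "N"), ("L", "S"), ("L", "C")],  # same type, other vertex order
    [("R", "C"), ("L", "O"), ("L", "O"), ("L", "N")],
    [("R", "C"), ("R", "C"), ("R", "O"), ("R", "N")],  # receptor-only: ignored
]
desc = interface_descriptors(tets)
print(dict(desc))
```

One such count vector per complex gives one row of the "QSAR" input table, with binding affinity as the response.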
“QSAR” Input Table (R·L Interaction Pattern – Binding Affinity Relationship Table)
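Single-Layer Perceptron Network
[Figure: input-layer neurons 1 … N with inputs x1 … xN, connected by weights w1 … wN to a single output-layer neuron with output y]
$y = f_n\left(\sum_{i=1}^{N} w_i x_i\right)$
• x_i = input of neuron i
• w_i = weight associated with the input x_i
• f_n(·) = activation function of the output neuron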
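Since binding affinity is a continuous target, this sketch assumes an identity activation for f_n and trains the weights with the delta (Widrow-Hoff) rule, w_i ← w_i + η (t − y) x_i, a swapped-in choice rather than the exact DT-SCORE training procedure. No bias term is used, matching the network drawn on the slide. The data are hypothetical: two descriptor counts per complex and a target generated from known weights so the fit can be checked.

```python
import random

def predict(w: list[float], x: list[float]) -> float:
    """Output neuron: y = f_n(sum_i w_i * x_i), with f_n taken as the identity."""
    return sum(wi * xi for wi, xi in zip(w, x))

def train_perceptron(xs, ys, epochs=500, lr=0.01, seed=0) -> list[float]:
    """Delta-rule training: after each example, w_i += lr * (target - y) * x_i."""
    rng = random.Random(seed)
    w = [rng.uniform(-0.1, 0.1) for _ in range(len(xs[0]))]
    order = list(range(len(xs)))
    for _ in range(epochs):
        rng.shuffle(order)  # present the training examples in a new random order
        for idx in order:
            err = ys[idx] - predict(w, xs[idx])
            for i, xi in enumerate(xs[idx]):
                w[i] += lr * err * xi
    return w

# Hypothetical descriptor counts and "affinities" built from weights 0.5 and 0.2.
xs = [[float(a), float(b)] for a in range(1, 5) for b in range(1, 5)]
ys = [0.5 * a + 0.2 * b for a, b in xs]
w = train_perceptron(xs, ys)
print([round(v, 3) for v in w])
```

The learned weights recover 0.5 and 0.2; the learning rate must stay small relative to the squared size of the inputs, or the updates overshoot and diverge.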
Training vs. Test Set Selection and Validation
• Entire dataset: 264 complexes
• Training set: ~80% (214 complexes), used for model development (q²)
• Test set: ~20% (50 complexes), used to assess prediction (R²)
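The slide does not say how the 50 test complexes were chosen, so this sketch assumes a simple seeded random split; the complex IDs are hypothetical placeholders. Fixing the test-set size (rather than a fraction) reproduces the 214/50 partition exactly.

```python
import random

def train_test_split(items: list, test_size: int, seed: int = 0) -> tuple[list, list]:
    """Randomly partition items into a training set and a test set of fixed size."""
    rng = random.Random(seed)
    shuffled = items[:]  # copy so the caller's list is not reordered
    rng.shuffle(shuffled)
    return shuffled[test_size:], shuffled[:test_size]

complexes = [f"complex_{i:03d}" for i in range(264)]  # hypothetical IDs
train, test = train_test_split(complexes, test_size=50)
print(len(train), len(test))  # 214 50
```

The test complexes never enter model development, so R² on them is an honest estimate of predictive power, unlike q², which is computed within the training set.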
Model Stability
• Average value from multiple (ca. 80) models