
John Liebeschuetz, Peter Carlqvist, Simon Bowden Cambridge Crystallographic Data Centre

CCDC Tools for Mining Structural Databases, or: Building Solid Foundations for a Structure-Based Design Campaign. John Liebeschuetz, Peter Carlqvist, Simon Bowden, Cambridge Crystallographic Data Centre, 12 Union Rd., Cambridge, UK.


Presentation Transcript


  1. CCDC Tools for Mining Structural Databases, or: Building Solid Foundations for a Structure-Based Design Campaign • John Liebeschuetz, Peter Carlqvist, Simon Bowden • Cambridge Crystallographic Data Centre, 12 Union Rd., Cambridge, UK

  2. Assessment and Comparison of Ligand-Protein Structural Models • For the Crystallographer • What is wrong with my model? • What interesting features or differences with related structures can I highlight in my publication? • For the Molecular Modeller • What is wrong with the Crystallographer’s model? • What interesting features or differences with related structures can I use to inform my structure-based drug design campaign? • Are there non-homologous structures with similar features that I need to watch out for?

  3. Why can’t I take a structure from the PDB and just use it? • Validation of ligand structures bound to proteins • 15% of 100 recent PDB entries have ligand geometries that are almost certainly in significant error (in-house analysis using Relibase+/Mogul) • [Chart: 2006 vs. pre-2000 entries]

  4. How much ligand strain is accommodated by the protein? • Accepted view: Many ligands adopt strained conformations when bound to proteins; some (60%) do not bind even in a local minimum conformation. (Perola & Charifson, J. Med. Chem. 2004, 47, 2499-2510) • Alternative view: Ligands usually (but not always) bind in a local minimum. Many ‘strained’ structures found in the PDB are imperfectly refined. (OpenEye, B. Kelley and G. Warren, EuroCYP)

  5. CCDC Tools that can help you • Relibase/Relibase+: Web-based database system for searching, retrieving and analysing 3D structures of protein-ligand complexes in the Brookhaven Protein Data Bank (PDB) • Relibase is freely available for academics • Relibase+ has extra features (some of these will be used in this workshop) • The Cambridge Structural Database System: Database of > 400,000 small molecule crystallographic structures, and associated query software • Mogul and IsoStar knowledge bases of molecular geometry and intermolecular interactions • Directly linked access from Relibase+

  6. The Workshop Part 1: Validation of models and structural analysis • Analysing a protein structure for errors and interesting features • Comparing a structure with structures related by homology or by functionality Part 2: Probing the Protein-Ligand Interface • Substructure searching in Relibase/Relibase+ • Comparing the interactions of different ligands with the same target • Validating an unusual interaction using substructure searching in Relibase+

  7. Relibase+ • Web-based database system for searching, retrieving and analysing 3D structures of protein-ligand complexes in the Brookhaven Protein Data Bank (PDB) • Successor to ReLiBase, developed by Manfred Hendlich et al. (Merck, Marburg U.): M. Hendlich, Acta Cryst. D54, 1178-1182, 1998 • Relibase: free on WWW for academics • http://relibase.ccdc.cam.ac.uk/ • http://relibase.rutgers.edu/

  8. Relibase+ Basic Functionality • Keyword searching • FASTA protein sequence searching • 2D substructure searching • 3D protein-ligand interaction searching • Protein-protein interaction searching • Similarity searching for ligands • SMILES substructure matching • Automatic superposition of related binding sites to compare ligand binding modes, water positions, etc. • 3D visualisation with AstexViewer and ReliView (Hermes)

  9. Relibase+ Advanced Functionality • Functionality for generation and search of proprietary databases of protein-ligand complexes alongside the PDB • Links to the Mogul and IsoStar modules of the CSDS for geometry validation • Additional modules: Crystal packing, WaterBase, CavBase • Detailed analysis of superimposed binding sites • Enhanced treatment of hitlists • Reliscript: Command-line access via a Python-based toolkit • Coming Soon: SecBase including Turn Classification

  10. CavBase • Detects unexpected similarities amongst protein cavities (e.g. active sites) that share little or no sequence homology • Similarity judged by matching 3D property descriptors (pseudocentres) that encode the shape and chemical characteristics of each cavity • No sequence information used; can detect similar cavities even if they have no obvious secondary-structure relationship • Developed by S. Schmitt et al., J. Mol. Biol. (2002)

  11. Cambridge Structural Database • Repository for the world’s small organic and metal-organic crystal structures (up to 500 non-H atoms) • Experimentally determined 3D structures via X-ray and neutron diffraction methods • 2007 release contains 423,798 entries • Approximately 32,000 entries added per year • Derived from around 1200 published sources • Official depository for >80 major journals • Majority of data directly deposited electronically (CIF) • Increasing number of Private Communications

  12. How much Data is Available? • [Chart: Growth of the CSD, 1970-2006, with predicted growth to 2010] • 419,768 entries in June 2007 • >500,000 entries predicted during 2009
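
The 2009 prediction can be roughly checked against the figures quoted on the slides. The sketch below assumes a constant growth rate of 32,000 entries per year, which is a simplification introduced here; the function name is hypothetical and the slide's own prediction method is not stated.

```python
def years_to_reach(target, current, per_year):
    """Years needed to grow from current to target at a constant yearly rate."""
    return (target - current) / per_year

# Figures quoted on the slides: 419,768 entries (June 2007), ~32,000 added per year.
# A constant rate is an assumption made only for this estimate.
years = years_to_reach(500_000, 419_768, 32_000)
print(f"{years:.1f} years after June 2007")
```

This gives about two and a half years, i.e. around the end of 2009, which is in line with the slide's ">500,000 entries during 2009".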

  13. CSD Information content • Crystal structure data: atomic coordinates, unit cell, space-group symmetry (fully validated)

  14. CSD Information content: Bibliographic and Chemical Information • Bibliographic and chemical text and properties (all searchable) • Example entry: 4-Oxonicotinamide-1-(1’-beta-D-2’,3’,5’-tri-O-acetyl-ribofuranoside); Source: Rothmannia longiflora; Colour: pale yellow; Habit: acicular; Polymorph: Form IV; Formula: C17 H20 N2 O9; G. Bringmann, M. Ochse, K. Wolf, J. Kraus, K. Peters, E-M. Peters, M. Herderich, L. Ake, F. Tayman, Phytochemistry 51 (1999), p271; R-factor: .0506 • Chemical diagram and chemical connectivity to enable 2D and 3D searching for substructures, pharmacophores and intermolecular interactions • Cross-referencing between entries

  15. Cambridge Structural Database System • Cambridge Structural Database • PreQuest: database production • ConQuest: database search • Mercury: graphical display, packing analysis • VISTA: statistical analysis • Knowledge bases: IsoStar (library of intermolecular interactions), Mogul (library of molecular geometry)

  16. Mogul: A Knowledge Base of Molecular Geometries • Bruno et al., J. Chem. Inf. Comput. Sci., 44, 2133-2144, 2004

  17. Mogul: Rapid access to CSD information • Incorporates pre-computed libraries of bond lengths, valence angles and torsion angles, derived entirely from the CSD • Sketch or import molecule, then click on feature of interest to view distribution, mean values and statistics • Very fast search speeds, with hyperlinks to the CSD to view specific structures • Complete geometry: retrieve distributions for all bonds, angles and torsions in the molecule
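
To illustrate how a distribution of observed geometry can be used to judge one value in a model, here is a minimal sketch that compares a bond length with a reference sample. It is not Mogul's own interface or algorithm: the function names, the reference values and the threshold of 2 standard deviations are all assumptions made for illustration.

```python
from statistics import mean, stdev

def z_score(observed, reference_values):
    """Number of sample standard deviations between observed and the reference mean."""
    mu = mean(reference_values)
    sigma = stdev(reference_values)
    return (observed - mu) / sigma

def is_unusual(observed, reference_values, threshold=2.0):
    """Flag a value lying more than `threshold` standard deviations from the mean.

    The threshold is an illustrative choice, not a recommended cut-off.
    """
    return abs(z_score(observed, reference_values)) > threshold

# Hypothetical C-C bond lengths in angstroms, standing in for a library distribution.
reference_cc = [1.52, 1.53, 1.54, 1.53, 1.52, 1.54, 1.53, 1.55, 1.51, 1.53]

print(is_unusual(1.53, reference_cc))  # value at the centre of the sample
print(is_unusual(1.40, reference_cc))  # value far outside the sample
```

A z-score only makes sense for roughly unimodal distributions such as bond lengths and valence angles; torsion angles are often multimodal and would need a different comparison.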

  18. IsoStar: A Knowledge Base of Intermolecular Interactions • Experimental data from: • Cambridge Structural Database • Protein Data Bank (protein-ligand complexes only) • Theoretical potential energy minima (DMA, IMPT) • Interaction distributions displayed immediately as scatterplots or contour surfaces • >20,000 CSD scatterplots, >5,500 PDB scatterplots, 1,500 energy minima

  19. IsoStar Methodology • Example: central group -CONH2, contact group NH • Search CSD or PDB for structures containing the desired contact • Superimpose hits and display as scatterplots

  20. Density Maps • Distributions can also be represented as density maps

  21. The Workshop Part 1: Validation of models and structural analysis • Analysing a protein structure for errors and interesting features • Comparing a structure with structures related by homology or by functionality Part 2: Probing the Protein-Ligand Interface • Substructure searching in Relibase/Relibase+ • Comparing the interactions of different ligands with the same target • Validating an unusual interaction using substructure searching in Relibase+

  22. How to access the workshop • Webpage: http://relibase.ccdc.cam.ac.uk/ • Email address: demo@ccdc.cam.ac.uk • Password: s1mple

  23. Cavity Detection • Based on the LIGSITE program: M. Hendlich et al., J. Mol. Graph. (1997) • [Figure: protein]
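
The general idea behind grid-based cavity detection of this kind can be sketched as follows: place the protein on a grid, then score each solvent cell by the number of scan directions along which it is enclosed by protein on both sides. This is a simplified illustration on a toy integer grid, not the LIGSITE or CavBase implementation; the function names and the toy "protein" (a hollow box) are assumptions.

```python
from itertools import product

# Scan directions: the 3 axes plus the 4 cubic diagonals.
DIRECTIONS = [(1, 0, 0), (0, 1, 0), (0, 0, 1),
              (1, 1, 1), (1, 1, -1), (1, -1, 1), (-1, 1, 1)]

def hits_protein(cell, step, protein, size):
    """Walk from cell along step; True if a protein cell is met before leaving the grid."""
    x, y, z = cell
    dx, dy, dz = step
    while True:
        x, y, z = x + dx, y + dy, z + dz
        if not (0 <= x < size and 0 <= y < size and 0 <= z < size):
            return False
        if (x, y, z) in protein:
            return True

def buriedness(protein, size):
    """Map each solvent cell to the number of directions enclosed by protein on both sides."""
    scores = {}
    for cell in product(range(size), repeat=3):
        if cell in protein:
            continue
        count = 0
        for d in DIRECTIONS:
            opposite = (-d[0], -d[1], -d[2])
            if hits_protein(cell, d, protein, size) and hits_protein(cell, opposite, protein, size):
                count += 1
        scores[cell] = count
    return scores

def hollow_box(lo, hi):
    """Toy 'protein': the shell of a cube spanning lo..hi on every axis."""
    return {c for c in product(range(lo, hi + 1), repeat=3)
            if any(v in (lo, hi) for v in c)}

scores = buriedness(hollow_box(1, 5), size=7)
print(scores[(3, 3, 3)], scores[(0, 0, 0)])  # enclosed centre vs. exposed corner
```

Cells with a high score are candidate cavity points; a real implementation would map atoms onto the grid using their radii and choose a score cut-off, both of which are left out here.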

  24. The pseudo-centre concept: Coding Molecular Recognition into Simple Descriptors • Pseudo-centre types: acceptor, donor, aliphatic, pi/aromatic

  25. 3D Property Description • [Figure labels: Cavity, Protein]

  26. Similarity Search

  27. Similarity Search • Clique detection (Bron-Kerbosch)

  28. Similarity Search • Clique detection (Bron-Kerbosch)
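
Clique detection for comparing two sets of typed pseudo-centres can be sketched as below. Each node of an association graph pairs a centre from one cavity with a same-type centre from the other; two nodes are connected when the corresponding inter-centre distances agree within a tolerance; the largest clique is then the largest mutually consistent match. This is a minimal sketch using the basic Bron-Kerbosch recursion without pivoting. The data layout, function names and the 0.5 tolerance are assumptions, not CavBase's actual implementation.

```python
from itertools import combinations
from math import dist

def bron_kerbosch(r, p, x, adj, out):
    """Append every maximal clique of the graph described by adj to out."""
    if not p and not x:
        out.append(r)
        return
    for v in list(p):
        bron_kerbosch(r | {v}, p & adj[v], x & adj[v], adj, out)
        p = p - {v}
        x = x | {v}

def association_graph(site_a, site_b, tol=0.5):
    """Nodes pair same-type centres; edges join pairs whose inter-centre distances agree."""
    nodes = [(i, j) for i, (ta, _) in enumerate(site_a)
             for j, (tb, _) in enumerate(site_b) if ta == tb]
    adj = {n: set() for n in nodes}
    for (i, j), (k, l) in combinations(nodes, 2):
        if i == k or j == l:
            continue  # a centre may be matched only once
        da = dist(site_a[i][1], site_a[k][1])
        db = dist(site_b[j][1], site_b[l][1])
        if abs(da - db) <= tol:
            adj[(i, j)].add((k, l))
            adj[(k, l)].add((i, j))
    return adj

def best_match(site_a, site_b, tol=0.5):
    """Return the largest clique as a set of (index_in_a, index_in_b) pairs."""
    adj = association_graph(site_a, site_b, tol)
    cliques = []
    bron_kerbosch(set(), set(adj), set(), adj, cliques)
    return max(cliques, key=len, default=set())

# Hypothetical pseudo-centres: (type, (x, y, z)). site_b is site_a shifted, plus one extra donor.
site_a = [("donor", (0, 0, 0)), ("acceptor", (3, 0, 0)), ("aromatic", (0, 4, 0))]
site_b = [("donor", (10, 10, 10)), ("acceptor", (13, 10, 10)),
          ("aromatic", (10, 14, 10)), ("donor", (20, 20, 20))]
print(sorted(best_match(site_a, site_b)))
```

Because only distances are compared, the match is independent of how the two cavities are positioned or oriented. Scoring of surface patches is not covered by this sketch.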

  29. Similarity Analysis • Scoring based on matching pseudo-centres and the associated surface patches

  30. An Example 1OXO/1F2D • Overlay of PLP ligands • Matching pseudo-centres and surface patches shown

  31. Crystal Packing • Important, e.g., when docking ligands • [Figure: binding site in Relibase+, Concanavalin A (1cjp)]

  32. 1mtw reference ligand, no packing • Reference in green, first-rank solution atom-coloured

  33. 1mtw, Packing Included reference ligand, no packing GOLD’s first-rank solution including neighbouring chains
