Increasing the Value of Crystallographic Databases • Derived knowledge bases • Knowledge-based applications programs • Data mining tools for protein-ligand complexes
Mogul • Knowledge base of molecular geometry information taken from CSD • Bond length, valence angle and torsion angle distributions • Aim: click on a molecular parameter of interest and get observed distribution with no intervening steps
Mogul - Search Setup • User loads a molecule, then specifies a bond length, bond angle or torsion angle of interest
Mogul - Results Substructure
Mogul - Search Algorithm • Fragment diagram: atoms labelled A, B, C, D • Substructures stored in a hierarchical tree: • Properties of A-B & C-D bonds • Properties of B, C • Properties of atoms bound to B and C
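The hierarchical storage described on this slide can be pictured as a tree whose levels narrow the match step by step. The sketch below is only an illustration of that idea, not Mogul's actual data structure: the class `FragmentTree`, the helper `torsion_key`, the string labels used for bond and atom properties, and the stored angle values are all hypothetical.

```python
class FragmentTree:
    """Toy hierarchical index: each key level narrows the set of matches."""

    def __init__(self):
        self.children = {}   # key level -> child FragmentTree
        self.values = []     # geometry values stored at the final level

    def insert(self, key_levels, value):
        node = self
        for level in key_levels:
            node = node.children.setdefault(level, FragmentTree())
        node.values.append(value)

    def lookup(self, key_levels):
        node = self
        for level in key_levels:
            node = node.children.get(level)
            if node is None:
                return []    # no stored fragment matches this key
        return list(node.values)


def torsion_key(bond_ab, bond_cd, atom_b, atom_c, subs_b, subs_c):
    """Build the three key levels named on the slide (encoding is made up)."""
    return [
        (bond_ab, bond_cd),                               # A-B and C-D bonds
        (atom_b, atom_c),                                 # atoms B and C
        (tuple(sorted(subs_b)), tuple(sorted(subs_c))),   # atoms bound to B, C
    ]


# Made-up entries, for illustration only.
tree = FragmentTree()
ethane_like = torsion_key("single", "single", "C.sp3", "C.sp3", ["H", "H"], ["H", "H"])
tree.insert(ethane_like, 60.0)
tree.insert(ethane_like, 180.0)
tree.insert(torsion_key("single", "double", "C.sp3", "C.sp2", ["H", "H"], ["O"]), 0.0)

hits = tree.lookup(ethane_like)
```

Because every query walks one path from the root, retrieving the distribution for a fragment takes as many steps as there are key levels, regardless of how many fragments are stored.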
Mogul - Getting More Hits • Allow certain atoms to be more general • Generification rules
Mogul - Generic Search Results • Substructures sorted by 2D similarity with original query
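Sorting generic hits by 2D similarity can be sketched as below. The slides do not say which similarity measure Mogul uses; the Tanimoto coefficient over feature sets is an assumption chosen for illustration, and the function names and feature labels are hypothetical.

```python
def tanimoto(features_a, features_b):
    """Tanimoto coefficient between two feature sets (assumed measure)."""
    a, b = set(features_a), set(features_b)
    union = a | b
    if not union:
        return 1.0  # two empty sets are treated as identical
    return len(a & b) / len(union)


def rank_by_similarity(query_features, candidates):
    """Return (name, features) pairs, most similar to the query first."""
    return sorted(
        candidates,
        key=lambda item: tanimoto(query_features, item[1]),
        reverse=True,
    )


# Made-up feature sets, for illustration only.
query = {"C.ar", "N.am", "O.co2"}
candidates = [
    ("frag1", {"C.ar"}),
    ("frag2", {"C.ar", "N.am", "O.co2"}),
    ("frag3", {"C.ar", "N.am"}),
]
ranked_names = [name for name, _ in rank_by_similarity(query, candidates)]
```

With this ordering, hits found only after generalising atoms appear below hits that still resemble the original query closely.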
IsoStar and SuperStar • IsoStar - knowledge base of information about intermolecular interactions • SuperStar - program for predicting binding points in an enzyme active site • SuperStar predictions based solely on IsoStar data
CSD vs. PDB scatterplots • Similarity index distribution for 72 comparisons
Scaling of IsoStar Surfaces • Densities of grid point i are converted to propensities by: • Average density is the density of contacts expected by random chance:
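The scaling step can be sketched as follows. The formulas on the slide are not reproduced here, so this sketch assumes the simplest reading of the text: the propensity at grid point i is its density divided by the average density, and the average density is the total number of contacts divided by the volume sampled. Both function names and the example numbers are hypothetical.

```python
def average_density(total_contacts, total_volume):
    """Density expected by random chance (assumed: contacts per unit volume)."""
    if total_volume <= 0:
        raise ValueError("total_volume must be positive")
    return total_contacts / total_volume


def propensities(densities, avg_density):
    """Scale per-grid-point densities by the average density (assumed form)."""
    if avg_density <= 0:
        raise ValueError("avg_density must be positive")
    return [d / avg_density for d in densities]


# Made-up numbers, for illustration only.
avg = average_density(total_contacts=100, total_volume=200.0)
scaled = propensities([0.0, 0.5, 2.0], avg)
```

Under this reading, a propensity above 1 marks a grid point where contacts occur more often than chance would predict, and a value below 1 marks one where they occur less often.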
SuperStar • Calculate binding positions for specific probe atoms in protein active sites • Identify functional groups in binding-site • Look up relevant IsoStar scatterplots and overlay on functional groups • Contour - combining by taking products
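The final step, combining overlaid maps by taking products, can be sketched as below. This is a minimal illustration and not SuperStar's implementation: maps are represented as plain dictionaries from grid point to propensity, and a grid point that a given map does not cover is simply left unchanged by that map, which is an assumption.

```python
def combine_maps(maps):
    """Multiply propensities from several maps at each shared grid point."""
    combined = {}
    for propensity_map in maps:
        for point, propensity in propensity_map.items():
            # Start from 1.0 so the first map seen at a point sets its value.
            combined[point] = combined.get(point, 1.0) * propensity
    return combined


# Made-up maps for two functional groups, for illustration only.
map_group_1 = {(0, 0, 0): 2.0, (1, 0, 0): 0.5}
map_group_2 = {(0, 0, 0): 3.0}
combined = combine_maps([map_group_1, map_group_2])
```

A product rewards points that every nearby functional group favours and penalises points that any one of them disfavours.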
SuperStar Features • Cavity detection • Surface or pharmacophore point display • Metal coordination • Hyperlinking to IsoStar scatterplots • Choice of CSD- or PDB-based maps • Gaussian fits
SuperStar Validation • 265 PDB complexes • Generate four maps (Me, C=O, NH, OH) • See whether maps discriminate correctly, e.g. does Me have highest propensity where a ligand Me group is observed? • Compute percentage success rate: CSD 74%, PDB 75%, Gaussian CSD 70 - 74% • PDB maps fuzzier, fewer probes possible • Gaussian 4-5 times faster
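The success-rate calculation described here can be sketched as below. The data layout is an assumption: each observation pairs the ligand group actually seen at a position with the propensity that each of the four maps gives at that position. The function name and the numbers are made up and do not reproduce the reported results.

```python
def success_rate(observations):
    """Percentage of positions where the observed group's map scores highest.

    observations: list of (observed_probe, {probe_name: propensity}).
    On a tie, max() keeps the first probe in dictionary order.
    """
    if not observations:
        raise ValueError("no observations supplied")
    correct = sum(
        1
        for observed, scores in observations
        if max(scores, key=scores.get) == observed
    )
    return 100.0 * correct / len(observations)


# Made-up propensities at five ligand atom positions, for illustration only.
observations = [
    ("Me",  {"Me": 3.0, "C=O": 1.0, "NH": 0.5, "OH": 0.8}),
    ("C=O", {"Me": 0.4, "C=O": 2.5, "NH": 1.1, "OH": 1.0}),
    ("NH",  {"Me": 0.2, "C=O": 0.9, "NH": 4.0, "OH": 2.0}),
    ("OH",  {"Me": 0.3, "C=O": 0.7, "NH": 1.5, "OH": 3.5}),
    ("Me",  {"Me": 1.0, "C=O": 2.0, "NH": 0.5, "OH": 0.6}),  # misclassified
]
rate = success_rate(observations)
```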
Relibase+ • Protein-ligand database system • Based on original software developed by Manfred Hendlich and colleagues at Merck and Marburg University • Enables searching of PDB and of in-house proprietary databases
Some Relibase+ Options • Text searching • Sequence searching • 2D substructure and similarity searching • 3D substructure searching • Logical combination of hit lists • Searching for intermolecular interactions • Auto-superposition of similar binding sites • Scripting facility based on Python
Analysis of 3D Queries • Benzamidine-Carboxylate Interactions • Plots: Distance Distribution, Torsion Distribution
Example Python Script

    # Find all benzamidines
    # and check contacts to ASP under 3Å
    relibase.load('dbase1')
    ba = relibase.Hitlist({'smiles': 'c1ccccc1C(=N)N'})
    new = relibase.Hitlist()
    for ligand in ba:
        for chain in ligand.contacts():
            for residue in chain.residues():
                if residue.name() == 'ASP':
                    ligatoms = ligand.atoms()
                    resatoms = residue.atoms()
                    d = mindist(ligatoms, resatoms)
                    if d < 3.0:
                        new.append(ligand)
    new.saveas('contact')
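The script calls `mindist` without defining it, so it is presumably supplied by the scripting environment. The stand-in below shows what such a function might compute, assuming atoms are given as plain (x, y, z) coordinate tuples in Å. It is not the Relibase+ function, whose real signature and atom types are not shown in the slides.

```python
import math


def mindist(atoms_a, atoms_b):
    """Smallest pairwise distance between two lists of (x, y, z) tuples."""
    if not atoms_a or not atoms_b:
        raise ValueError("both atom lists must be non-empty")
    return min(math.dist(a, b) for a in atoms_a for b in atoms_b)


# Made-up coordinates, for illustration only.
ligand_atoms = [(0.0, 0.0, 0.0), (5.0, 0.0, 0.0)]
residue_atoms = [(1.0, 0.0, 0.0), (9.0, 0.0, 0.0)]
closest = mindist(ligand_atoms, residue_atoms)
```

`math.dist` requires Python 3.8 or later.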
Acknowledgements • Manfred Hendlich • Gerhard Klebe • Ingo Dramburg • Andreas Bergner • Ian Bruno • Jason Cole • Paul Edgington • Magnus Kessler • Jie Luo • Clare Macrae • Patrick McCabe • Willem Nissink • Jon Pearson • Scott Rowland • Barry Smith • Marcel Verdonk