3D Molecular Structures. C371 Fall 2004. Morgan Algorithm (Leach & Gillet, p. 8). Bioisosteres (Leach & Gillet, p. 31). Milestones In Chemical Information: IV (PW).
3D Molecular Structures C371 Fall 2004
Milestones In Chemical Information: IV (PW) • Structure diagrams are planar but molecules are not, so existing 2D screening and graph-search methods had to be extended to allow 3D substructure searching (Pfizer and Lederle, 1986-87) • Sources of 3D structural data • Experimental data (Cambridge Structural Database) • Computational chemistry (quantum mechanics, molecular mechanics, molecular dynamics) • Structure-generation methods for databases of molecules • CONCORD (Texas, 1987) • CORINA (Munich/Erlangen, 1990) • Further extensions to allow flexible searching (ICI, MDL and Tripos, 1991-94)
Milestones In Molecular Modelling: IV (PW) • Use of 3D information in QSAR to facilitate structure-based approaches to drug discovery • Comparative Molecular Field Analysis (CoMFA; Tripos, 1988) and related approaches • Calculate energies at points on a 3D grid surrounding a molecule • Statistical correlation with activity to identify important positions in space • Need for alignment
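The grid-energy step above can be sketched as follows. This is a minimal illustration, not the Tripos implementation: the probe (a unit positive charge with hypothetical Lennard-Jones parameters `epsilon` and `r_min`), the 2 Å grid spacing, the 4 Å margin, the 30 kcal/mol cap and the use of a dielectric constant of 1 are all assumptions. Each grid point receives a steric and an electrostatic value; for a set of aligned molecules, these values form the columns that a statistical method would then correlate with activity.

```python
import itertools
import math

COULOMB_K = 332.06  # kcal*Angstrom/(mol*e^2); assumes a dielectric constant of 1

def grid_points(coords, spacing=2.0, margin=4.0):
    """Regular 3D lattice enclosing all atoms plus a margin (assumed values)."""
    axes = []
    for d in range(3):
        lo = min(c[d] for c in coords) - margin
        hi = max(c[d] for c in coords) + margin
        n = int(math.floor((hi - lo) / spacing)) + 1
        axes.append([lo + i * spacing for i in range(n)])
    return list(itertools.product(*axes))

def probe_energies(point, atoms, probe_charge=1.0, epsilon=0.1, r_min=3.5, cap=30.0):
    """Steric (12-6 Lennard-Jones) and electrostatic (Coulomb) energies of a probe at `point`.

    `atoms` is a list of ((x, y, z), partial_charge); all parameters are hypothetical.
    """
    steric = 0.0
    electrostatic = 0.0
    for xyz, charge in atoms:
        r = max(math.dist(point, xyz), 0.5)  # avoid division by zero on top of an atom
        ratio = r_min / r
        steric += epsilon * (ratio ** 12 - 2.0 * ratio ** 6)
        electrostatic += COULOMB_K * charge * probe_charge / r
    # Truncate very large values (the cap value here is an assumption)
    return min(steric, cap), max(min(electrostatic, cap), -cap)

# Hypothetical two-atom fragment with opposite partial charges
atoms = [((0.0, 0.0, 0.0), -0.4), ((1.2, 0.0, 0.0), 0.4)]
points = grid_points([xyz for xyz, _ in atoms])
fields = [probe_energies(p, atoms) for p in points]
print(len(points))  # 125
```

In a real study every molecule would first be aligned in a common frame, so that column *k* refers to the same point in space for all molecules; that is the alignment requirement noted on the slide.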
Current Activities: Virtual Screening (PW) • Need to prioritise the many molecules that could be tested • Increasingly sophisticated level of filtering to maximise the numbers of potential leads • “Drugability” considerations • Similarity searching (both 2D and 3D) using initial weak leads • 3D substructure searching once possible pharmacophoric patterns have been identified • Docking once the 3D structure of the biological target is available
Cambridge Structural Database • X-ray crystal structures of more than 250,000 compounds (organic and organometallic) • Established in 1965 • Textual queries • Structural queries • Specific 3D constraints (conformation or distance variables)
Protein Data Bank • More than 25,000 X-ray and NMR structures of proteins and protein-ligand complexes • Some nucleic acid and carbohydrate structures • Founded in 1971 at Brookhaven National Laboratory; now run by a consortium • Retrieval by textual queries or, in some interfaces, by amino acid sequence
Uses of the CSD and PDB • Data mining for conformational properties (CSD & PDB) • Data mining for information about intermolecular interactions (CSD & PDB) • Further understanding of the nature of protein structure and its relationship to amino acid sequence (PDB) • Homology modeling (comparative modeling) (PDB)
3D Pharmacophores • Definition: a set of features together with their relative spatial orientation that are thought to be capable of interaction with a particular biological target • Hydrogen bond donors and acceptors • Positively and negatively charged groups • Hydrophobic regions and aromatic rings • Depends on atomic properties rather than element types • Does not depend on specific chemical connectivity
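The definition above can be made concrete with a small data structure. This is a sketch under stated assumptions: the `Feature` class, the feature-type names and the coordinates are hypothetical. A pharmacophore is reduced to typed points and the distances between them; no element symbols or bonds appear, which is why two molecules with different connectivity can present the same pharmacophore.

```python
import math
from dataclasses import dataclass
from itertools import combinations

@dataclass(frozen=True)
class Feature:
    kind: str   # e.g. "donor", "acceptor", "positive", "negative", "hydrophobic", "aromatic"
    xyz: tuple  # 3D coordinates in Angstroms

def pharmacophore_triples(features):
    """Describe a pharmacophore by (kind_a, kind_b, distance) for every feature pair.

    Only feature types and inter-feature distances appear: no elements, no bonds.
    """
    triples = []
    for a, b in combinations(features, 2):
        kind_a, kind_b = sorted((a.kind, b.kind))
        triples.append((kind_a, kind_b, round(math.dist(a.xyz, b.xyz), 2)))
    return sorted(triples)

# Hypothetical three-point pharmacophore
query = [
    Feature("donor", (0.0, 0.0, 0.0)),
    Feature("acceptor", (3.0, 0.0, 0.0)),
    Feature("aromatic", (0.0, 4.0, 0.0)),
]
print(pharmacophore_triples(query))
# [('acceptor', 'aromatic', 5.0), ('acceptor', 'donor', 3.0), ('aromatic', 'donor', 4.0)]
```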
Lipinski Rule of Five • Poor absorption or permeation is more likely when a molecule has: • More than five hydrogen bond donors • More than ten hydrogen bond acceptors • LogP greater than five • Molecular weight greater than 500
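As a "drugability" filter, the four thresholds above reduce to simple comparisons. In this sketch the descriptor values are inputs (computing donor and acceptor counts, logP and molecular weight needs a cheminformatics toolkit), and the example values are hypothetical, not taken from real compounds.

```python
def rule_of_five_violations(h_donors, h_acceptors, log_p, mol_weight):
    """Return the Rule-of-Five thresholds that a molecule exceeds."""
    violations = []
    if h_donors > 5:
        violations.append("more than 5 H-bond donors")
    if h_acceptors > 10:
        violations.append("more than 10 H-bond acceptors")
    if log_p > 5:
        violations.append("logP greater than 5")
    if mol_weight > 500:
        violations.append("molecular weight greater than 500")
    return violations

# Hypothetical descriptor values for two molecules
small = rule_of_five_violations(h_donors=2, h_acceptors=4, log_p=1.5, mol_weight=250.0)
large = rule_of_five_violations(h_donors=6, h_acceptors=12, log_p=6.2, mol_weight=720.0)
print(small)       # []
print(len(large))  # 4
```

Returning the list of exceeded thresholds, rather than a single pass/fail flag, leaves the choice of how many violations to tolerate to the screening protocol.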
3D Database Searching • As with 2D searching, usually involves a 2-stage process • Rapid screen to eliminate molecules that cannot match the query • Graph matching to identify matches • Interatomic distances between pairs of atoms are important
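The two-stage process can be sketched with typed 3D points. This is an illustration under assumptions: the 1 Å distance bins, the tolerance and the data are hypothetical, and the second stage is a plain backtracking search, simpler than the subgraph-isomorphism algorithms used in production systems. The screen never rejects a molecule that the second stage would match, because any matching distance must fall in one of the bins it checks.

```python
import math
from itertools import combinations

BIN_WIDTH = 1.0  # Angstroms per distance bin (hypothetical screen design)

def screen_keys(points):
    """Stage-1 keys stored for each database molecule: (type_a, type_b, distance bin)."""
    keys = set()
    for (type_a, xyz_a), (type_b, xyz_b) in combinations(points, 2):
        a, b = sorted((type_a, type_b))
        keys.add((a, b, int(math.dist(xyz_a, xyz_b) // BIN_WIDTH)))
    return keys

def passes_screen(query, mol_keys, tol):
    """Reject a molecule unless every query pair has a key in a compatible bin."""
    for (type_a, xyz_a), (type_b, xyz_b) in combinations(query, 2):
        a, b = sorted((type_a, type_b))
        d = math.dist(xyz_a, xyz_b)
        first_bin = int(max(d - tol, 0.0) // BIN_WIDTH)
        last_bin = int((d + tol) // BIN_WIDTH)
        if not any((a, b, k) in mol_keys for k in range(first_bin, last_bin + 1)):
            return False
    return True

def find_match(query, mol, tol):
    """Stage 2: map each query point to a distinct molecule point of the same type
    so that all pairwise distances agree within `tol` (simple backtracking)."""
    mapping = []

    def extend(i):
        if i == len(query):
            return True
        q_type, q_xyz = query[i]
        for j, (m_type, m_xyz) in enumerate(mol):
            if m_type != q_type or j in mapping:
                continue
            if all(abs(math.dist(q_xyz, query[k][1]) - math.dist(m_xyz, mol[mapping[k]][1])) <= tol
                   for k in range(i)):
                mapping.append(j)
                if extend(i + 1):
                    return True
                mapping.pop()
        return False

    return list(mapping) if extend(0) else None

# Hypothetical query and database molecule (typed points only)
query = [("donor", (0.0, 0.0, 0.0)), ("acceptor", (3.0, 0.0, 0.0)), ("aromatic", (0.0, 4.0, 0.0))]
mol = [("aromatic", (10.0, 4.0, 0.0)), ("donor", (10.0, 0.0, 0.0)),
       ("acceptor", (13.1, 0.0, 0.0)), ("donor", (20.0, 20.0, 20.0))]
if passes_screen(query, screen_keys(mol), tol=0.2):
    print(find_match(query, mol, tol=0.2))  # [1, 2, 0]
```

Storing the keys once per database molecule is what makes the first stage fast: at search time it is only set lookups, and the more expensive matching runs on the survivors.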
Structure Generation Programs • CONCORD (Coordinates found in the CAS Registry File) • CORINA (COoRdINAtes) • About CORINA • Generating 3D structures with CORINA
Conformational Search and Analysis; Systematic Conformational Search • Goal of Conformational Analysis: identify all accessible minimum-energy structures of a molecule • Global minimum-energy conformation: the minimum with the lowest energy • Systematic searches assign values to the torsion angles of the rotatable bonds in the molecule
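A systematic search can be sketched as a nested loop over a fixed grid of torsion values. With a step of 30° each rotatable bond takes 12 values, so N bonds give 12^N conformations, which is why systematic searches become impractical for flexible molecules. The energy function below is a hypothetical toy (a threefold torsional term plus a term favouring 180°), not a real force field.

```python
import itertools
import math

def toy_torsion_energy(torsions_deg):
    """Hypothetical energy: a threefold term plus a term favouring 180 degrees."""
    total = 0.0
    for phi in map(math.radians, torsions_deg):
        total += (1.0 + math.cos(3.0 * phi)) + 0.5 * (1.0 + math.cos(phi))
    return total

def systematic_search(n_rotatable, step_deg, energy):
    """Evaluate every combination of torsion values on a fixed grid; return the lowest."""
    values = [i * step_deg for i in range(int(360 // step_deg))]
    best_energy, best_torsions = math.inf, None
    n_evaluated = 0
    for torsions in itertools.product(values, repeat=n_rotatable):
        n_evaluated += 1
        e = energy(torsions)
        if e < best_energy:
            best_energy, best_torsions = e, torsions
    return best_energy, best_torsions, n_evaluated

e, torsions, n = systematic_search(n_rotatable=2, step_deg=30, energy=toy_torsion_energy)
print(n, torsions)  # 144 (180, 180)
```

A real implementation would usually minimise each grid conformation and discard those with steric clashes; both steps are omitted here.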
Random Conformational Search • Simulated annealing: the temperature is gradually reduced from a high value to a low one
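Simulated annealing over torsion angles can be sketched with the Metropolis acceptance rule. Everything here is an assumption for illustration: the toy energy function, the geometric cooling schedule, the move size and the temperatures (expressed in energy units, with Boltzmann's constant set to 1). At high temperature uphill moves are often accepted, so the search can cross barriers; as the temperature falls it settles into a low-energy region.

```python
import math
import random

def toy_torsion_energy(torsions_deg):
    """Hypothetical energy: a threefold term plus a term favouring 180 degrees."""
    return sum((1.0 + math.cos(3.0 * p)) + 0.5 * (1.0 + math.cos(p))
               for p in map(math.radians, torsions_deg))

def simulated_annealing(energy, start, t_start=10.0, t_end=0.01, cooling=0.95,
                        moves_per_t=50, max_step=30.0, seed=0):
    """Random torsion moves accepted by the Metropolis criterion while T is lowered."""
    rng = random.Random(seed)
    current = list(start)
    e_current = energy(current)
    best, e_best = list(current), e_current
    t = t_start
    while t > t_end:
        for _ in range(moves_per_t):
            trial = list(current)
            i = rng.randrange(len(trial))
            trial[i] = (trial[i] + rng.uniform(-max_step, max_step)) % 360.0
            e_trial = energy(trial)
            delta = e_trial - e_current
            # Always accept downhill moves; accept uphill ones with probability exp(-delta/T)
            if delta <= 0 or rng.random() < math.exp(-delta / t):
                current, e_current = trial, e_trial
                if e_current < e_best:
                    best, e_best = list(current), e_current
        t *= cooling  # geometric cooling schedule (an assumption)
    return e_best, best

e_best, best = simulated_annealing(toy_torsion_energy, start=[60.0, 60.0])
print(round(e_best, 3), [round(x) for x in best])
```

Tracking the best conformation seen, not just the final one, costs nothing and guards against ending the run on an uphill move.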
Other Conformational Searches • Distance geometry • Molecular dynamics
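One early step of distance geometry, tightening interatomic distance bounds with the triangle inequality and then choosing trial distances within them, can be sketched as follows. This shows only that step; the embedding of trial distances into 3D coordinates (usually via a metric matrix and its eigenvectors) and the subsequent refinement are omitted. The three-atom example and the bound values are hypothetical.

```python
import random

def triangle_smooth(lower, upper):
    """Tighten distance bounds with the triangle inequality (in place).

    lower/upper are symmetric n x n lists of bounds in Angstroms with zero diagonals.
    """
    n = len(upper)
    for k in range(n):
        for i in range(n):
            for j in range(i + 1, n):
                # Upper bound: i-j can be no longer than the path i-k-j
                if upper[i][j] > upper[i][k] + upper[k][j]:
                    upper[i][j] = upper[j][i] = upper[i][k] + upper[k][j]
                # Lower bound: i-j can be no shorter than a lower bound on one leg
                # minus the upper bound on the other leg
                new_lower = max(lower[i][k] - upper[k][j], lower[j][k] - upper[k][i])
                if new_lower > lower[i][j]:
                    lower[i][j] = lower[j][i] = new_lower
                if lower[i][j] > upper[i][j]:
                    raise ValueError(f"inconsistent bounds for pair ({i}, {j})")
    return lower, upper

def random_distances(lower, upper, rng):
    """Pick one trial distance per pair uniformly within its (smoothed) bounds."""
    n = len(upper)
    d = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d[i][j] = d[j][i] = rng.uniform(lower[i][j], upper[i][j])
    return d

# Hypothetical 3-atom chain: bonds 0-1 and 1-2 fixed at 1.5 A, 0-2 initially unconstrained
BIG = 100.0
lower = [[0.0, 1.5, 0.0], [1.5, 0.0, 1.5], [0.0, 1.5, 0.0]]
upper = [[0.0, 1.5, BIG], [1.5, 0.0, 1.5], [BIG, 1.5, 0.0]]
triangle_smooth(lower, upper)
print(upper[0][2])  # 3.0
trial = random_distances(lower, upper, random.Random(0))
```

Repeating the random selection and embedding many times gives a set of different conformations, which is how distance geometry is used for conformational search.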
Deriving 3D Pharmacophores • Pharmacophore mapping: the process of deriving a 3D pharmacophore • Conformational flexibility • Different combinations of pharmacophoric groups in the molecule • Genetic algorithms: a class of optimization method based on computational models of Darwinian evolution
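A generic genetic algorithm can be sketched with tournament selection, one-point crossover, mutation and elitism. As an assumption for illustration, each chromosome here encodes a choice of torsion angles and the fitness is the negative of a hypothetical toy energy. A real pharmacophore-mapping GA would encode more (for example, which features of each molecule correspond) and would score the overlay of several molecules; none of that is shown.

```python
import math
import random

def toy_torsion_energy(torsions_deg):
    """Hypothetical energy: a threefold term plus a term favouring 180 degrees."""
    return sum((1.0 + math.cos(3.0 * p)) + 0.5 * (1.0 + math.cos(p))
               for p in map(math.radians, torsions_deg))

def genetic_algorithm(fitness, n_genes, alleles, pop_size=30, generations=60,
                      mutation_rate=0.05, seed=0):
    """Maximise `fitness` over fixed-length chromosomes drawn from `alleles` (n_genes >= 2)."""
    rng = random.Random(seed)
    population = [[rng.choice(alleles) for _ in range(n_genes)] for _ in range(pop_size)]

    def tournament():
        # Binary tournament: the fitter of two random individuals becomes a parent
        a, b = rng.sample(population, 2)
        return a if fitness(a) >= fitness(b) else b

    for _ in range(generations):
        children = [list(max(population, key=fitness))]  # elitism: keep the best unchanged
        while len(children) < pop_size:
            parent_a, parent_b = tournament(), tournament()
            cut = rng.randrange(1, n_genes)  # one-point crossover
            child = parent_a[:cut] + parent_b[cut:]
            # Mutation: occasionally replace a gene with a random allele
            child = [rng.choice(alleles) if rng.random() < mutation_rate else g for g in child]
            children.append(child)
        population = children
    return max(population, key=fitness)

angles = list(range(0, 360, 30))
best = genetic_algorithm(lambda c: -toy_torsion_energy(c), n_genes=3, alleles=angles)
print(best, round(toy_torsion_energy(best), 3))
```

Elitism means the best fitness in the population never decreases from one generation to the next; the price is some loss of diversity.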
Applications: Structural Genomics • Definitions (Goals) • Characterization of all protein structures in a given genome • Provide sufficient coverage of fold space to facilitate accurate homology modeling of the majority of proteins of biological interest • PDB Target Database (http://targetdb.rcsb.org/)
Searching 3D Protein Structures (PW) • Searching protein sequences is well established, but how can the 3D structures in the Protein Data Bank (PDB) be searched? • Extensive collaboration between Information Studies and Molecular Biology and Biotechnology to develop graph representations of proteins that can be searched with isomorphism algorithms analogous to those used for chemical structures • Focus here on folding motifs (secondary structure elements) in proteins, but other structures have also been represented: • Protein amino acid side chains • Carbohydrates • Nucleic acids
Representation Of Protein Folding Motifs: I (PW) • The helix and strand secondary structure elements (SSEs) are both approximately linear, repeating structures, which can therefore be represented by vectors drawn along their major axes • The nodes of the graph are these vectors, and the edges comprise: • The angle between a pair of vectors • The distance of closest approach of the two vectors • The distance between the vectors' midpoints • PROTEP compares such representations using a maximal common subgraph isomorphism algorithm to identify common folds
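The three edge attributes listed above can be computed from the start and end points of two SSE axis vectors. This is a sketch under an assumption: the closest approach is computed between the infinite lines through the two axes, whereas PROTEP's exact definition (for example, whether it uses the finite segments) may differ. Fitting the axis vectors to the helix or strand atoms is not shown.

```python
import math

def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def _norm(a):
    return math.sqrt(_dot(a, a))

def sse_edge(start1, end1, start2, end2):
    """Edge attributes between two SSE axis vectors:
    (angle in degrees, closest approach of the axis lines, midpoint distance)."""
    u = _sub(end1, start1)
    v = _sub(end2, start2)
    cos_angle = _dot(u, v) / (_norm(u) * _norm(v))
    angle = math.degrees(math.acos(max(-1.0, min(1.0, cos_angle))))
    w = _sub(start2, start1)
    n = _cross(u, v)
    if _norm(n) < 1e-9:
        # Parallel axes: perpendicular distance from start2 to the first line
        closest = _norm(_cross(w, u)) / _norm(u)
    else:
        # Non-parallel lines: length of the projection of w onto the common normal
        closest = abs(_dot(w, n)) / _norm(n)
    mid1 = tuple((a + b) / 2 for a, b in zip(start1, end1))
    mid2 = tuple((a + b) / 2 for a, b in zip(start2, end2))
    return angle, closest, math.dist(mid1, mid2)

# Hypothetical perpendicular, non-intersecting axes
print(tuple(round(x, 6) for x in sse_edge((0, 0, 0), (10, 0, 0), (5, -5, 3), (5, 5, 3))))
# (90.0, 3.0, 3.0)
```

Because the vectors have a direction (N- to C-terminus), the angle ranges from 0° to 180°, so parallel and antiparallel pairs are distinguished.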
Structural Relationship Between Leucine Aminopeptidase And Carboxypeptidase A (PW) • Use of 1LAP as the target for a PROTEP search requiring structures with at least 7 SSEs in common with the target • The four carboxypeptidase structures in the PDB at that time had a fold containing five helices and eight strands in a sheet in common with 1LAP • The matched SSEs (in 5CPA) contain 86 residues with an alpha-carbon RMSD of 1.77 Angstroms, but only 7% sequence homology for the equivalenced residues
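The two figures quoted for the 1LAP/5CPA match can be illustrated with small functions. Two assumptions apply: the coordinates are taken to be already superimposed (the least-squares fitting step, for example by the Kabsch method, is not shown), and the sequence figure is read as the percentage of equivalenced positions carrying identical residues. The coordinates and sequences are hypothetical.

```python
import math

def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between paired, already-superimposed coordinates."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    total = sum(math.dist(a, b) ** 2 for a, b in zip(coords_a, coords_b))
    return math.sqrt(total / len(coords_a))

def percent_identity(residues_a, residues_b):
    """Percentage of equivalenced positions carrying the same amino acid."""
    if len(residues_a) != len(residues_b) or not residues_a:
        raise ValueError("need two non-empty residue lists of equal length")
    same = sum(1 for x, y in zip(residues_a, residues_b) if x == y)
    return 100.0 * same / len(residues_a)

# Hypothetical equivalenced alpha-carbons and one-letter residue codes
ca_a = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
ca_b = [(0.0, 0.0, 1.0), (3.8, 0.0, 1.0), (7.6, 0.0, 1.0)]
print(rmsd(ca_a, ca_b))                  # 1.0
print(percent_identity("ACDE", "ACFG"))  # 50.0
```

A low RMSD combined with low sequence identity, as in the slide's example, is the signature of a structural relationship that sequence searching alone would miss.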