Databases and Resources on 3D Structures of Biological Macromolecules
Inter-University DEA/DES Bioinformatics 2000-2001
Shoshana J. Wodak, SCMBB-ULB
The different types of macromolecular structure databases
- Major public repositories for (primary) structural data
- Databases of derived information:
  - Classifications of protein domains & folds
  - etc.
- On-line servers for analysing protein structures:
  - defining structural domains
  - assigning secondary structure
  - calculating surfaces and volumes
  - etc.
Overview of the different databases and underlying methods
Repositories for data on 3D structures of Biological Macromolecules
- PDB: Protein Data Bank, 3D structures of biological macromolecules [http://www.rcsb.org/pdb/]
- MMDB: Entrez (NCBI) structure database (no models) [http://www.ncbi.nlm.nih.gov:80/Structure/MMDB/mmdb.shtml]
- BioMagResBank: data on 3D structures determined by NMR [http://www.bmrb.wisc.edu/]
- CSD: Cambridge Structural Database (small molecules) [http://www.ccdc.cam.ac.uk/]
- Further links: http://www.expasy.ch/alinks.html#Proteins
Protein Data Bank (PDB)
Holdings as of Dec. 5, 2000:
- 13861 coordinate entries
- 4406 structure factor files
- 904 NMR restraints files
[Figure: total available structures and structures deposited per year; new folds vs. old folds]
Methods for determining 3D structures of biological macromolecules
- Methods yielding models at atomic resolution:
  - X-ray diffraction
  - Neutron diffraction
  - High-resolution Nuclear Magnetic Resonance (NMR)
- Methods yielding low-resolution models, or information on structure:
  - Low-angle X-ray diffraction
  - Electron microscopy; electron diffraction
  - CD and infra-red spectroscopy
The 3D structure of biological macromolecules by X-ray diffraction
[Figure: an X-ray source sends an incident X-ray beam (wavelength λ) onto the crystal; the diffracted X-ray beams (wavelength λ) leave in discrete directions and form the diffraction pattern]
From the diffraction pattern to the 3D atomic model
Diffraction pattern → atomic model:
- derive phases
- compute ρ(x) = FT(F(s)), where ρ(x) is the electron density and F(s) are the structure factors
- build and refine model
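
The step ρ(x) = FT(F(s)), and the reason phases must be derived first, can be illustrated with a toy calculation. The sketch below is not a crystallographic program: it assumes a one-dimensional "unit cell" of 16 grid points with three point atoms, and the function names (`dft`, `structure_factors`, `density_from`) are made up for this illustration. It shows that the density is recovered when amplitudes and phases are both available, and is lost when only the measured amplitudes are used.

```python
import cmath

def dft(values, sign):
    """Plain discrete Fourier transform (unnormalised); sign=-1 forward, +1 inverse."""
    n = len(values)
    return [
        sum(v * cmath.exp(sign * 2j * cmath.pi * k * x / n)
            for x, v in enumerate(values))
        for k in range(n)
    ]

def structure_factors(density):
    """Complex structure factors F(s) of a sampled 1D density."""
    return dft(density, -1)

def density_from(factors):
    """Fourier synthesis: density rho(x) from complex structure factors."""
    n = len(factors)
    return [(c / n).real for c in dft(factors, +1)]

# Toy unit cell (assumption): 16 grid points, three atoms with 6, 8 and 7 electrons.
density = [0.0] * 16
density[2], density[7], density[11] = 6.0, 8.0, 7.0

F = structure_factors(density)
amplitudes = [abs(f) for f in F]  # the experiment measures only these

with_phases = density_from(F)
# Setting every phase to zero mimics ignoring the phase problem.
without_phases = density_from([complex(a, 0.0) for a in amplitudes])

recovered = max(abs(a - b) for a, b in zip(with_phases, density)) < 1e-9
lost = max(abs(a - b) for a, b in zip(without_phases, density)) > 1.0
print("density recovered with phases:", recovered)
print("density lost without phases:", lost)
```

In a real structure determination the phases come from experimental or computational phasing methods, and the synthesis is three-dimensional; the toy only shows why the amplitudes alone are not enough.
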
The 3D structure of biological macromolecules by NMR spectroscopy
- Data used: (NOE) distance constraints; chemical shifts, J couplings
[Figure: 2D proton NMR spectrum of a β-hairpin; atomic model of a β-sheet from NMR data]
- Insulin gene enhancer protein, mammalian (rat): NMR structure, 50 conformations
- Engrailed homeodomain, Drosophila melanogaster: crystal structure, resolution 2.1 Å
From atomic model to physical properties and molecular function
[Figure: ribbon drawing (Rasmol/MolMol); electrostatic potential displayed on the molecular surface]
Classifications of protein structures (domains)
- CATH: structural classification of proteins [http://www.biochem.ucl.ac.uk/bsm/cath/]
- SCOP: structural classification of proteins [http://scop.mrc-lmb.cam.ac.uk/scop/]
- FSSP: fold classification based on structure alignments [http://www.sander.ebi.ac.uk/fssp/]
- HSSP: homology-derived secondary structure assignments [http://www.sander.ebi.ac.uk/hssp/]
- DALI: classification of protein domains [http://www.ebi.ac.uk/dali/domain/]
- VAST: structural neighbours by direct 3D structure comparison [http://www.ncbi.nlm.nih.gov:80/Structure/VAST/vast.shtml]
- CE: structure comparisons by Combinatorial Extension [http://cl.sdsc.edu/ce.html]
Some aspects of methodology
- Secondary structure assignments
- Structure comparisons; structure-structure alignments
- Defining structural domains from atomic coordinates
- Calculation of the molecular and solvent accessible surface (will be dealt with subsequently)
- Homology modelling (will be dealt with subsequently)
Secondary structure assignments
- By the crystallographer
  - visual inspection; modelling programs ('O')
- By completely automatic procedures
  - DSSP (Kabsch & Sander, 1983): computes φ,ψ angles, H-bonds, solvent accessibilities
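
The φ,ψ angles mentioned above are backbone torsion angles, and computing one from four atom positions is a short calculation. The following is a minimal sketch, not DSSP's code: the helper names and the argument layout of `phi_psi` are assumptions made for this example. It uses the standard definitions φ = C(i-1)-N(i)-CA(i)-C(i) and ψ = N(i)-CA(i)-C(i)-N(i+1).

```python
import math

def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dihedral(p0, p1, p2, p3):
    """Signed torsion angle p0-p1-p2-p3 in degrees (cis = 0, trans = 180)."""
    b1, b2, b3 = _sub(p1, p0), _sub(p2, p1), _sub(p3, p2)
    n1, n2 = _cross(b1, b2), _cross(b2, b3)  # normals of the two planes
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))

def phi_psi(prev_c, n, ca, c, next_n):
    """Backbone torsions of residue i from five backbone atom positions."""
    phi = dihedral(prev_c, n, ca, c)   # C(i-1), N(i), CA(i), C(i)
    psi = dihedral(n, ca, c, next_n)   # N(i), CA(i), C(i), N(i+1)
    return phi, psi

# Four points chosen so the torsion is a quarter turn.
quarter_turn = dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1))
print(round(quarter_turn, 6))
```

Applied along a chain, `phi_psi` gives one (φ, ψ) pair per residue except at the termini, where the previous C or the next N does not exist.
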
Structure comparisons and structure-structure alignments
Q: Is structure A similar to structure B?
A: From structure alignments (see accompanying transparencies)
[Figure: Structure A; Structure B]
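
Once an alignment has paired up equivalent residues, similarity can be scored numerically. One simple measure, shown here as a sketch and not as the method of any particular server listed in these slides, is the distance RMSD: it compares the intra-molecular distance matrices of the two structures, so no superposition is needed. The function name and the toy coordinates are assumptions for the example.

```python
import math
from itertools import combinations

def distance_rmsd(coords_a, coords_b):
    """RMS difference between corresponding intra-molecular distances.

    coords_a[i] and coords_b[i] must be equivalent (already aligned) positions.
    """
    if len(coords_a) != len(coords_b):
        raise ValueError("aligned coordinate lists must have equal length")
    if len(coords_a) < 2:
        raise ValueError("need at least two aligned positions")
    pairs = list(combinations(range(len(coords_a)), 2))
    squared = sum(
        (math.dist(coords_a[i], coords_a[j]) - math.dist(coords_b[i], coords_b[j])) ** 2
        for i, j in pairs
    )
    return math.sqrt(squared / len(pairs))

# Toy example: B is A rotated by 90 degrees about z and translated.
structure_a = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (3.8, 3.8, 0.0), (0.0, 3.8, 2.0)]
structure_b = [(-y + 5.0, x - 1.0, z + 2.0) for x, y, z in structure_a]
# C differs from A in the position of its last point.
structure_c = structure_a[:3] + [(0.0, 7.6, 2.0)]

print(distance_rmsd(structure_a, structure_b) < 1e-9)
print(distance_rmsd(structure_a, structure_c) > 0.5)
```

Because only distances are compared, this score is unchanged by rotation and translation, but it also cannot tell a structure from its mirror image; scores based on explicit superposition do not have that limitation.
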
Defining Domains: What for?
- Identify regions of the polypeptide chain that fold independently and are stable on their own → folding units; initiation sites for folding
- Identify gene fusion or gene insertion events from analysis of the 3D structure → relate to evolutionary history
- Allow for meaningful structural classification of proteins → SCOP; CATH classifications
Defining Domains: What for? (continued)
- Link domain structure to function
  - Enzyme active sites are often at domain interfaces; domain movements play a functional role
  - Different structural domains can be associated with different functions
[Figure: DNA methyltransferase; cathepsin D]
Methods for Identifying Domains
• Underlying principles:
  • Interactions between residues within domains are more extensive than between domains (Wetlaufer (1973); Richardson (1981))
  • Interactions are modelled by counting inter-atomic contacts or computing buried surface area
[Figure: two domains, D1 and D2]
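
The contact-counting idea can be made concrete with a few lines of code. This is a minimal sketch under stated assumptions: one representative point per residue, a 7 Å distance cutoff, and a hypothetical function name `count_contacts`; none of these choices comes from the slides. For a proposed domain assignment, it counts contacts within domains and contacts between domains.

```python
import math
from itertools import combinations

def count_contacts(coords, labels, cutoff=7.0):
    """Count residue pairs closer than `cutoff`, split by domain membership.

    coords: one (x, y, z) point per residue; labels: one domain label per residue.
    Returns (intra_domain_contacts, inter_domain_contacts).
    """
    intra = inter = 0
    for i, j in combinations(range(len(coords)), 2):
        if math.dist(coords[i], coords[j]) < cutoff:
            if labels[i] == labels[j]:
                intra += 1
            else:
                inter += 1
    return intra, inter

# Toy "protein": two compact groups of four residues, 10 A apart along x.
group_1 = [(0, 0, 0), (4, 0, 0), (0, 4, 0), (4, 4, 0)]
group_2 = [(x + 10, y, z) for x, y, z in group_1]
coords = group_1 + group_2

good_split = ["D1"] * 4 + ["D2"] * 4
bad_split = ["D1", "D2"] * 4

print(count_contacts(coords, good_split))
print(count_contacts(coords, bad_split))
```

The assignment that follows the two compact groups has many more contacts inside domains than between them, while the interleaved assignment does not, which is the property domain-detection methods look for.
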
Methods for Identifying Domains
Underlying principles:
• Domain limits are defined by identifying groups of residues such that the number of contacts between groups is minimized.
[Figure: chain topologies (N to C terminus) requiring 1 cut, 2 cuts and 4 cuts]
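
For the simplest case, a single cut of the chain, the minimisation can be done by scanning every cut position. The sketch below assumes the contacts are already given as residue index pairs, and it adds a minimum segment size (`min_size`) so that trivial cuts near the termini are excluded; both the function name and that parameter are assumptions of this example, not part of any published method.

```python
def best_single_cut(n_residues, contacts, min_size=2):
    """Find the chain cut with the fewest contacts across it.

    Residues [0, cut) form one group and [cut, n_residues) the other.
    Returns (cut, number_of_crossing_contacts).
    """
    best_cut, best_crossing = None, None
    for cut in range(min_size, n_residues - min_size + 1):
        # A contact crosses the cut when its two residues lie on opposite sides.
        crossing = sum(1 for i, j in contacts if min(i, j) < cut <= max(i, j))
        if best_crossing is None or crossing < best_crossing:
            best_cut, best_crossing = cut, crossing
    return best_cut, best_crossing

# Toy contact list: residues 0-3 all touch each other, as do residues 4-7,
# and two contacts bridge the groups.
contacts = (
    [(i, j) for i in range(4) for j in range(i + 1, 4)]
    + [(i, j) for i in range(4, 8) for j in range(i + 1, 8)]
    + [(1, 4), (3, 6)]
)

cut, crossing = best_single_cut(8, contacts)
print(cut, crossing)
```

Domains built from several chain segments need two or more cuts, which means searching over combinations of cut positions rather than a single scan.
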
Methods for Identifying Domains
Visual inspection
- Phillips (1966): hen lysozyme
- Porter (1959; 1973): immunoglobulin light chains
- Drenth et al. (1968): protease papain
- Wetlaufer (1973) & Richardson (1981): several proteins
Systematic surveys
- Rossmann & Liljas (1974): domains from distance maps
- Crippen (1978): cluster segments/contact density
- Rose (1979): iterative splitting of contiguous segments
- Wodak & Janin (1981) & Rashin (1981): buried surface area & globularity index
- Sander (1981): domain limits from Cα contacts
[Figure: lactate dehydrogenase, domains from the contact map]
[Figure panels; proteins shown: lactate dehydrogenase, concanavalin A]
- Hierarchic splitting of Rose (1979)
- Hierarchic assembly of segments, Crippen (1978)
- Hierarchic splits based on SA scans, Wodak & Janin (1981)
Methods for Identifying Domains
Systematic surveys (continued)
- Kikuchi et al. (1988): domains from distance maps
- Holm & Sander (1994): cluster residues based on contacts
- Islam et al. (1994): cluster segments with minimum inter-domain contacts
- Siddiqui & Barton (1995): successive splits of contiguous segments
- Sowdhamini & Blundell (1995): cluster secondary structure elements
- Swindells (1995a,b): search for hydrophobic cores
- Jones et al. (1998): consensus of 3 methods + manual
- Wernisch et al. (1999): graph heuristic/Voronoi cells
- Taylor (1999): heuristic based on Ising model
STRUctural Domain Limits (STRUDL)
Wernisch, Hunting & Wodak (1999)
New generation of procedures
• Uses interface area as contact measure
• Is based on a graph heuristic
  - partitions into arbitrary sets of residues, with no reference to chain connectivity or secondary structure
  - approximates closely the exact solution (branch and bound, B&B)
• Generated partitions are accepted or rejected on the basis of a set of optimised criteria
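
To show what partitioning residues "with no reference to chain connectivity" means in practice, here is a small sketch. It is not the STRUDL algorithm: it assumes a weighted residue graph whose edge weights stand for interface area in arbitrary units, uses a simple greedy local search as a stand-in for a graph heuristic, and uses brute-force enumeration as a stand-in for an exact method such as branch and bound. All names and the toy weights are hypothetical.

```python
from itertools import combinations

def cut_weight(weights, group):
    """Total weight of edges joining `group` to the remaining residues."""
    return sum(w for (i, j), w in weights.items() if (i in group) != (j in group))

def greedy_bipartition(n, weights, start, min_size=2):
    """Local search: move single residues across the cut while it gets lighter."""
    group = set(start)
    best = cut_weight(weights, group)
    improved = True
    while improved:
        improved = False
        for node in range(n):
            trial = group ^ {node}  # move `node` to the other side
            if min_size <= len(trial) <= n - min_size:
                w = cut_weight(weights, trial)
                if w < best:
                    group, best, improved = trial, w, True
    return group, best

def exact_bipartition(n, weights, min_size=2):
    """Exhaustive search over all bipartitions (feasible only for tiny graphs)."""
    best_group, best = None, float("inf")
    for size in range(min_size, n // 2 + 1):
        for combo in combinations(range(n), size):
            w = cut_weight(weights, set(combo))
            if w < best:
                best_group, best = set(combo), w
    return best_group, best

# Toy graph: residues 3-5 form a domain inserted into the domain {0, 1, 2, 6, 7}.
weights = {
    (0, 1): 5, (1, 2): 5, (0, 2): 4, (2, 6): 4, (6, 7): 5, (1, 7): 4, (0, 6): 3,
    (3, 4): 5, (4, 5): 5, (3, 5): 4,
    (2, 3): 1, (5, 6): 1,
}

heuristic_group, heuristic_cut = greedy_bipartition(8, weights, start={0, 1, 2, 3})
exact_group, exact_cut = exact_bipartition(8, weights)
print(sorted(heuristic_group), heuristic_cut)
print(sorted(exact_group), exact_cut)
```

On this toy graph the local search, started from a contiguous split in the middle of the chain, ends with the discontinuous domain {0, 1, 2, 6, 7}, which is the complement of the set found by exhaustive search. A greedy search of this kind can get trapped in local minima on harder cases, so it should not be read as evidence about how closely any published heuristic approaches the exact solution.
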
Domain assignments with minor differences
[Figure: STRUDL and CATH domain assignments shown side by side for 1gph1 and for 1pgd]
Other databases or servers of derived structural data
- HSSP: Homology-derived secondary structure of proteins db [http://www.sander.ebi.ac.uk/hssp/]
- Mol_R_Us: generate images for structures in PDB [http://molbio.info.nih.gov/cgi-bin/pdb]
- TOPS: Protein topology atlas [http://tops.ebi.ac.uk/tops/html/1tph1.html]
- BMM & DEE: servers computing domain limits from coordinates [http://jura.ebi.ac.uk:8080/3Dee/help/help_intro.html]
- ReLiBase: Receptor/ligand complexes db [http://relibase.ebi.ac.uk/reli-cgi/rll?/reli-cgi/general_layout.pl+home]
- SWISS-MODEL: Automatically generated protein models db (for links see Expasy server)
- ModBase: Db of comparative protein structure models (for links see Expasy server)