Secondary Structure & Solvent accessible surface Calculation. Lecture 6 Structural Bioinformatics Dr. Avraham Samson 81-871. DSSP. Dictionary of protein secondary structure: Pattern recognition of hydrogen-bonded and geometrical features Wolfgang Kabsch, Christian Sander
Secondary Structure & Solvent accessible surface Calculation Lecture 6 Structural Bioinformatics Dr. Avraham Samson 81-871
DSSP Dictionary of protein secondary structure: Pattern recognition of hydrogen-bonded and geometrical features Wolfgang Kabsch, Christian Sander Biopolymers, Volume 22, Issue 12, pages 2577–2637, December 1983 Avraham Samson - Faculty of Medicine - Bar Ilan University
Solvent Accessibility Secondary Structure Amino Acids
Hydrogen bond donors and acceptors • the amide nitrogen: main-chain hydrogen bond donor • the carbonyl oxygen: main-chain hydrogen bond acceptor • there are also side-chain acceptors and donors
Hydrogen bonded turns
Hydrogen bonded bridges
Bend
Chirality
Dihedral angle calculation
The book "Crystal Structure Analysis for Chemists and Biologists" by Jenny P. Glusker gives four different ways of calculating the dihedral angle (pp. 465-469). Probably the most direct is the following. Consider the four-atom chain 1 - 2 - 3 - 4. The distance between atoms i and j is denoted $d_{ij}$; for example, $d_{13}$ is the distance between atoms 1 and 3. Since you already have Cartesian coordinates, this is easily calculated as
$$d_{13} = \sqrt{(x_3-x_1)^2 + (y_3-y_1)^2 + (z_3-z_1)^2}$$
The dihedral angle is defined as follows:
$$\cos(\text{angle}) = \frac{P}{\sqrt{Q}}$$
where
$$P = d_{12}^2\,(d_{23}^2 + d_{34}^2 - d_{24}^2) + d_{23}^2\,(-d_{23}^2 + d_{34}^2 + d_{24}^2) + d_{13}^2\,(d_{23}^2 - d_{34}^2 + d_{24}^2) - 2\,d_{23}^2\,d_{14}^2$$
and
$$Q = (d_{12} + d_{23} + d_{13})(d_{12} + d_{23} - d_{13})(d_{12} - d_{23} + d_{13})(-d_{12} + d_{23} + d_{13})(d_{23} + d_{34} + d_{24})(d_{23} + d_{34} - d_{24})(d_{23} - d_{34} + d_{24})(-d_{23} + d_{34} + d_{24})$$
A test case: $d_{12} = 2.38$, $d_{23} = 1.48$, $d_{34} = 1.48$, $d_{13} = 3.56$, $d_{14} = 3.61$, $d_{24} = 2.40$ gives $P = 20.83$, $\sqrt{Q} = 21.40$, angle = 13.3 degrees.
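The distance-based formula above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function name `dihedral_from_distances` is made up for this example, and because only distances are used, the result is the unsigned angle (0 to 180 degrees); the sign of the dihedral cannot be recovered from distances alone.

```python
import math

def dihedral_from_distances(d12, d23, d34, d13, d14, d24):
    """Unsigned dihedral angle (degrees) of the chain 1-2-3-4 from its six distances."""
    sq = lambda v: v * v
    p = (sq(d12) * (sq(d23) + sq(d34) - sq(d24))
         + sq(d23) * (-sq(d23) + sq(d34) + sq(d24))
         + sq(d13) * (sq(d23) - sq(d34) + sq(d24))
         - 2.0 * sq(d23) * sq(d14))
    q = ((d12 + d23 + d13) * (d12 + d23 - d13) * (d12 - d23 + d13) * (-d12 + d23 + d13)
         * (d23 + d34 + d24) * (d23 + d34 - d24) * (d23 - d34 + d24) * (-d23 + d34 + d24))
    if q <= 0.0:
        # Q vanishes when atoms 1-2-3 or 2-3-4 are collinear; the angle is undefined.
        raise ValueError("degenerate geometry: three consecutive atoms are collinear")
    # Clamp to [-1, 1] to absorb floating-point round-off before acos.
    cos_angle = max(-1.0, min(1.0, p / math.sqrt(q)))
    return math.degrees(math.acos(cos_angle))

# The slide's test case (distances rounded to two decimals).
angle = dihedral_from_distances(2.38, 1.48, 1.48, 3.56, 3.61, 2.40)
```

With the two-decimal distances of the test case the result is sensitive to rounding, so expect a value in the neighborhood of the quoted 13.3 degrees rather than an exact match.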
Helices
Ladders and sheets
More details • SS-bonds • Chain breaks • Handedness (chirality) • PyMOL and MOLMOL use DSSP to assign secondary structure
Amyloid-like fibril (left) of the peptide GNNQNNY from the yeast prion protein Sup35, and its atomic structure (right) • Because of the repetitive nature of secondary structures, and particularly beta-sheets, proteins can form fibrillar structures and aggregates. • In the case of this fibril the side chains also hydrogen bond to each other. • Figure labels: fibril axis, amide stacks. • Nelson et al. (Eisenberg lab), Nature 435:773 (2005). For background on “polar zippers”: Perutz et al., PNAS 91:5355 (1991). • These types of fibrils are important in Huntington’s disease etc.
Fibrillar helical structures: the leucine zipper • The GCN4 dimer is formed through hydrophobic interactions between leucines (red, labeled Leu) in the two polypeptide chains. • Figure: GCN4 “leucine zipper” (green) bound as a dimer (two copies of the polypeptide) to target DNA.
DSSP Code: • H = alpha helix • G = 3-helix (3/10 helix) • I = 5-helix (pi helix) • B = residue in isolated beta-bridge • E = extended strand, participates in beta ladder • T = hydrogen bonded turn • S = bend • Blank = loop
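As a small illustration of working with these one-letter codes, the sketch below tallies the states in a per-residue DSSP string. The dictionary simply restates the code table above; the helper name `summarize_dssp` and the example string are made up for this illustration.

```python
from collections import Counter

# One-letter DSSP codes as listed on the slide (blank means loop).
DSSP_CODES = {
    "H": "alpha helix",
    "G": "3-10 helix",
    "I": "pi helix",
    "B": "isolated beta-bridge",
    "E": "extended strand",
    "T": "hydrogen bonded turn",
    "S": "bend",
    " ": "loop",
}

def summarize_dssp(ss_string):
    """Count residues per secondary-structure state in a DSSP code string."""
    counts = Counter(ss_string)
    unknown = set(counts) - set(DSSP_CODES)
    if unknown:
        raise ValueError(f"unrecognized DSSP codes: {sorted(unknown)}")
    return {DSSP_CODES[code]: n for code, n in counts.items()}

# Hypothetical assignment string for a nine-residue stretch.
summary = summarize_dssp("HHHH EE T")
```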
Question: How would you assign structural neighbors (< 5 Å) from a PDB file? • Answer: Parse the ATOM records of the PDB file, compute the distances between atoms, and keep the pairs that are less than 5 Å apart.
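One way to sketch this answer in Python is shown below. It assumes standard fixed-column PDB ATOM records (coordinates in columns 31-54) and uses a plain all-pairs loop, which is fine for small structures but scales quadratically. The helper names `parse_atoms` and `neighbors_within` are hypothetical, and HETATM records are deliberately skipped.

```python
import math

def parse_atoms(pdb_lines):
    """Extract atom name, residue info and coordinates from PDB ATOM records."""
    atoms = []
    for line in pdb_lines:
        if line.startswith("ATOM"):
            atoms.append({
                "name": line[12:16].strip(),      # columns 13-16
                "resname": line[17:20].strip(),   # columns 18-20
                "chain": line[21],                # column 22
                "resseq": int(line[22:26]),       # columns 23-26
                "xyz": (float(line[30:38]),       # x: columns 31-38
                        float(line[38:46]),       # y: columns 39-46
                        float(line[46:54])),      # z: columns 47-54
            })
    return atoms

def neighbors_within(atoms, cutoff=5.0):
    """Return index pairs (i, j), i < j, of atoms closer than `cutoff` Angstroms."""
    pairs = []
    for i in range(len(atoms)):
        for j in range(i + 1, len(atoms)):
            if math.dist(atoms[i]["xyz"], atoms[j]["xyz"]) < cutoff:
                pairs.append((i, j))
    return pairs
```

A typical call would be `neighbors_within(parse_atoms(open(path)))`, where `path` points to a local PDB file. To get residue-level neighbors, the atom pairs can be reduced to their `(chain, resseq)` identifiers.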
Contact maps of protein structures • both axes are the sequence of the protein • map of Cα-Cα distances < 6 Å • near the diagonal: local contacts in the sequence • off-diagonal: long-range (nonlocal) contacts • rainbow ribbon diagram, blue to red: N to C • example: 1avg, the structure of triabin
Contact maps of protein structures • both axes are the sequence of the protein • map of Cα-Cα distances < 6 Å • rainbow ribbon diagram, blue to red: N to C • example: structure of N15 Cro
Contact maps of protein structures • both axes are the sequence of the protein • map of all heavy atom distances < 6 Å (includes side chains) • rainbow ribbon diagram, blue to red: N to C • example: structure of N15 Cro
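A Cα contact map of this kind can be sketched as a symmetric boolean matrix. The code below is a minimal illustration that assumes the Cα coordinates have already been extracted in sequence order; the function names and the `min_separation` parameter (used to pick out off-diagonal, nonlocal contacts) are choices made for this example rather than anything prescribed by the slides.

```python
import math

def contact_map(ca_coords, cutoff=6.0):
    """n x n matrix: True where two C-alpha atoms are closer than `cutoff` Angstroms."""
    n = len(ca_coords)
    return [[math.dist(ca_coords[i], ca_coords[j]) < cutoff for j in range(n)]
            for i in range(n)]

def nonlocal_contacts(cmap, min_separation=3):
    """Contacts between residues at least `min_separation` apart in sequence."""
    n = len(cmap)
    return [(i, j) for i in range(n)
            for j in range(i + min_separation, n) if cmap[i][j]]

# Toy hairpin: six points, 3.8 A apart, folding back on themselves.
hairpin = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0),
           (7.6, 3.8, 0.0), (3.8, 3.8, 0.0), (0.0, 3.8, 0.0)]
cmap = contact_map(hairpin)
long_range = nonlocal_contacts(cmap)
```

For the heavy-atom variant, the same idea applies with the minimum distance over all heavy-atom pairs of two residues in place of the Cα-Cα distance.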
Surface and interior of globular proteins • solvent accessible surface • molecular surface • residue fractional accessibility • pockets and cavities • “hydrophobic core” • ordered waters in protein structures
“Accessible Surface” • represent atoms as spheres w/ appropriate radii and eliminate overlapping parts... • mathematically roll a sphere all around that surface... • the sphere’s center traces out a surface as it rolls... • Lee & Richards, 1971; Shrake & Rupley, 1973
Now look at a cross-section (slice) of a protein structure: The inner surfaces here are van der Waals surfaces. The outer surface is that traced out by the center of the sphere as it rolls around the van der Waals surface. If any part of the arc around a given atom is traced out, that atom is accessible to solvent. The solvent accessible surface of the atom is defined as the sum of the arcs traced around that atom. Note that there is not much solvent accessible surface in the middle. Figure (from Lee & Richards, 1971) labels: van der Waals surface, solvent accessible surface, arc traced around atom.
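The rolling-sphere definition can be approximated numerically by sampling test points, in the spirit of the Shrake & Rupley approach cited earlier. The sketch below is a simplified illustration, not the published algorithm in detail: it places points on each atom's expanded sphere (van der Waals radius plus probe radius) and counts those not inside any other expanded sphere. The 1.4 Å probe radius, the golden-spiral point placement, and the function names are assumptions of this example, and atomic radii must be supplied by the caller.

```python
import math

def sphere_points(n):
    """Roughly uniform points on a unit sphere (golden-spiral placement)."""
    golden_angle = math.pi * (3.0 - math.sqrt(5.0))
    points = []
    for k in range(n):
        z = 1.0 - (2.0 * k + 1.0) / n          # evenly spaced in z, never exactly +/-1
        r = math.sqrt(1.0 - z * z)
        points.append((r * math.cos(golden_angle * k),
                       r * math.sin(golden_angle * k), z))
    return points

def accessible_surface(atoms, probe=1.4, n_points=200):
    """atoms: list of ((x, y, z), vdw_radius). Returns per-atom accessible area in A^2."""
    unit = sphere_points(n_points)
    areas = []
    for i, (center, radius) in enumerate(atoms):
        big_r = radius + probe                  # surface traced by the probe center
        exposed = 0
        for ux, uy, uz in unit:
            p = (center[0] + big_r * ux, center[1] + big_r * uy, center[2] + big_r * uz)
            # A test point is buried if it lies inside another atom's expanded sphere.
            buried = any(math.dist(p, other) < other_r + probe
                         for j, (other, other_r) in enumerate(atoms) if j != i)
            if not buried:
                exposed += 1
        areas.append(4.0 * math.pi * big_r * big_r * exposed / n_points)
    return areas
```

Summing the per-atom values over a residue gives the residue's accessible surface area. This all-pairs version is quadratic in the number of atoms; practical implementations restrict the inner loop to nearby atoms.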
“Accessible surface”/“Molecular surface” note: these are alternative ways of representing the same reality: the surface which is essentially in contact with solvent
• molecular and accessible surfaces are both useful representations, but the molecular surface is more closely related to the actual atomic surfaces. This makes it somewhat better for visualizing the texture of the outer surface, as well as for assessing the shape and volume of any internal cavities. • you will hear the term Connolly surface used often, after Michael Connolly. A Connolly surface is a particular way of calculating the molecular surface. The accessible surface is also occasionally called the Richards surface, after Fred Richards.
Molecular surface of proteins • depiction of heavy atoms (O, N, C, S) in a protein as van der Waals spheres • depiction of the corresponding “molecular surface”: the volume contained by this surface is the vdW volume plus the “interstitial volume” (the spaces in between)
The irregular surface of proteins: pockets and cavities • a pocket is an empty concavity on a protein surface which is accessible to solvent from the outside. • a cavity or void in a protein is a pocket which has no opening to the outside. It is an interior empty space inside the protein. Pockets and cavities can be critical features of proteins in terms of their binding behavior, and identifying them is usually a first step in structure-based ligand design etc.
Fractional accessibility • calculate total solvent accessible surface of protein structure (also can calculate solvent accessible surface for individual residues/sidechains within the protein) • can also model the accessible surface area in a disordered or unfolded protein using accessible surface area calculations on model tripeptides such as Ala-X-Ala or Gly-X-Gly. • from these we can calculate what fraction of the surface is buried (inaccessible to solvent) by virtue of being within the folded, native structure of the protein. • this is done by dividing the accessible surface area in the native protein structure by the accessible surface in the modelled unfolded protein. That’s the fractional accessibility. The residue fractional accessibility and side chain fractional accessibility refer to the same thing calculated for individual residues/sidechains within the structure.
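The division described above is simple enough to write down directly. In this sketch the function names are hypothetical, and the numbers in the example are illustrative placeholders only, not measured accessibilities; real reference values would come from Gly-X-Gly or Ala-X-Ala tripeptide calculations.

```python
def fractional_accessibility(native_asa, unfolded_asa):
    """ASA in the native structure divided by ASA in the unfolded-state model."""
    if unfolded_asa <= 0:
        raise ValueError("unfolded-state ASA must be positive")
    return native_asa / unfolded_asa

def fraction_buried(native_asa, unfolded_asa):
    """Fraction of the unfolded-state surface that is inaccessible in the native fold."""
    return 1.0 - fractional_accessibility(native_asa, unfolded_asa)

# Illustrative placeholder values (A^2): a residue exposing 12.0 in the native
# structure versus 160.0 in the tripeptide model of the unfolded state.
example = fractional_accessibility(12.0, 160.0)
```

The same functions work for a whole protein, a single residue, or a side chain, as long as both areas refer to the same set of atoms.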
Accessible surface area in globular protein structures • Accessible surface area $A_s$ in native states of proteins is a non-linear function of molecular weight (Miller, Janin, Lesk & Chothia, 1987): $A_s = 6.3\,M_r^{0.73}$, where $M_r$ is the molecular weight. • This is an empirical correlation, but it comes close to the expected two-thirds power law relating surface area to volume or mass for a set of bodies of similar shape and density.
How much surface area is buried when a protein adopts its native structure in solution? • estimate the total accessible surface area of the extended/disordered polypeptide chain using the accessible surface areas in Gly-X-Gly or Ala-X-Ala models. This is a linear function of molecular weight: $A_t = 1.48\,M_r + 21$ • the total fractional accessibility is $A_s/A_t$, and the fraction of surface area buried is $1 - A_s/A_t$ • What is the total fractional surface area buried for a protein of molecular weight 10,000? 20,000? Is the fraction higher for small proteins or large?
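The question can be worked through numerically with the two empirical relations, native-state area As = 6.3 Mr^0.73 and unfolded-state area At = 1.48 Mr + 21. The function names below are made up for this sketch.

```python
def native_asa(mr):
    """Empirical native-state accessible surface area (A^2) for molecular weight mr."""
    return 6.3 * mr ** 0.73

def unfolded_asa(mr):
    """Empirical extended-chain accessible surface area (A^2) for molecular weight mr."""
    return 1.48 * mr + 21.0

def buried_fraction_for_mass(mr):
    """Fraction of the extended-chain surface buried on folding: 1 - As/At."""
    return 1.0 - native_asa(mr) / unfolded_asa(mr)

for mr in (10_000, 20_000):
    print(mr, round(buried_fraction_for_mass(mr), 3))
```

Under these relations the buried fraction comes out at roughly 0.65 for a molecular weight of 10,000 and roughly 0.71 for 20,000, so larger proteins bury a larger share of their surface. This follows from At growing linearly with mass while As grows more slowly.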
Distribution of residue fractional accessibilities • note that a sizeable group are completely buried (hatched) or nearly completely buried • note the broad distribution among non-buried residues, and the mean fractional accessibility for non-buried residues of around 0.5 • note that few residues are completely exposed to solvent, but that a fractional accessibility of > 1 is possible • from Miller et al., 1987
Buried residues in proteins • the fraction of buried residues (defined by 0% or 5% ASA cutoffs) increases as a function of molecular weight. For your average protein around 25% of the residues will be buried. These form the core.
size class   mean Mr   fraction of buried residues
                       0% ASA    5% ASA
small          8000    0.070     0.154
medium        16000    0.107     0.240
large         25000    0.139     0.309
XL            34000    0.155     0.324
all                    0.118     0.257
Residue fractional accessibility correlates with free energies of transfer for amino acids between water and organic solvents • (Miller, Janin, Lesk & Chothia, 1987) • (Fauchere & Pliska, 1983) • the interior of a protein is akin to a nonpolar solvent in which the nonpolar sidechains are buried. Polar sidechains, on the other hand, are usually on the surface. However, some polar side chains do get buried, and it must also be remembered that the backbone for every residue is polar, including those with nonpolar side chains. So a lot of polar moieties do get buried in proteins.
The hydrophobic core of a small protein: N15 Cro • 0% ASA: Pro 3, Leu 6, Ala 16, Val 27, Ile 36, Ile 44 • < 5% ASA: Met 1, Ala 17, Val 20, Gln 41, Ser 54 • note that some polar residues are buried • 11 of 66 ordered residues have less than 5% ASA
The outer surface: water in protein structures Structures of water-soluble proteins determined at reasonably high resolution will be decorated on their outer surfaces with water molecules (cyan balls) with relatively well-defined positions, and waters may also occur internally Water is not just surrounding the protein--it is interacting with it
Water interacts with protein surfaces • Most waters visible in crystal structures make hydrogen bonds to each other and/or to the protein, as donor, acceptor, or both • first shell waters: in contact with / hydrogen bonded to the protein • second shell waters: only contact other waters
DSSP Web Service http://mrs.cmbi.ru.nl/hsspsoap/
STRIDE web service http://webclu.bio.wzw.tum.de/cgi-bin/stride/stridecgi.py
REM  --------------- Detailed secondary structure assignment-------------  1L4W
REM                                                                        1L4W
REM  |---Residue---|    |--Structure--|   |-Phi-|   |-Psi-|  |-Area-|      1L4W
ASG  ILE A    1    1    C          Coil    360.00    168.01      69.6      1L4W
ASG  VAL A    2    2    E        Strand    -97.71    163.93      42.5      1L4W
ASG  CYS A    3    3    E        Strand   -164.52    149.74       1.4      1L4W
ASG  HIS A    4    4    E        Strand    -98.82    174.84      39.5      1L4W
ASG  THR A    5    5    E        Strand   -171.97    161.21      25.5      1L4W
ASG  THR A    6    6    E        Strand   -119.23     98.92      13.1      1L4W
ASG  ALA A    7    7    C          Coil   -159.51    -46.53      10.0      1L4W
ASG  THR A    8    8    T          Turn    -76.14   -145.16      41.5      1L4W
ASG  SER A    9    9    T          Turn    -67.19    -64.98      58.7      1L4W
ASG  PRO A   10   10    T          Turn    -98.83   -165.54      75.7      1L4W
ASG  ILE A   11   11    E        Strand    -63.95    136.61      71.6      1L4W
ASG  SER A   12   12    E        Strand    -95.58    151.90       4.8      1L4W
ASG  ALA A   13   13    E        Strand   -149.03    116.85      55.7      1L4W
ASG  VAL A   14   14    E        Strand   -140.58    165.04      77.2      1L4W
ASG  THR A   15   15    E        Strand    -95.72    140.63      82.1      1L4W
ASG  CYS A   16   16    C          Coil    -90.67    106.54      11.5      1L4W
ASG  PRO A   17   17    C          Coil    -62.41    -47.14     122.3      1L4W
ASG  PRO A   18   18    T          Turn    -71.40   -166.42      60.1      1L4W
ASG  GLY A   19   19    T          Turn    -69.07    -28.03      66.1      1L4W
ASG  GLU A   20   20    T          Turn    -76.00     94.17      91.2      1L4W
ASG  ASN A   21   21    T          Turn   -121.17      1.96      35.5      1L4W
ASG  LEU A   22   22    E        Strand    -69.97    133.22      51.5      1L4W
ASG  CYS A   23   23    E        Strand    -99.29    111.44       0.0      1L4W
ASG  TYR A   24   24    E        Strand    -96.27    149.93      62.1      1L4W
ASG  ARG A   25   25    E        Strand   -118.58     83.18      17.2      1L4W
ASG  LYS A   26   26    E        Strand    -78.88    139.08      32.1      1L4W
ASG  MET A   27   27    E        Strand   -156.68    130.00      34.7      1L4W
ASG  TRP A   28   28    E        Strand   -135.36   -157.76      57.9      1L4W
ASG  CYS A   29   29    E        Strand   -110.51    120.76      33.8      1L4W
ASG  ASP A   30   30    E        Strand   -140.95     83.38      68.8      1L4W
ASG  ALA A   31   31    B        Bridge     96.09    -30.41      13.7      1L4W
ASG  PHE A   32   32    T          Turn    -64.73    -31.60     104.7      1L4W
ASG  CYS A   33   33    T          Turn    -76.46    -35.27      97.0      1L4W
ASG  SER A   34   34    T          Turn    -92.60    -74.82     109.8      1L4W
ASG  SER A   35   35    T          Turn   -142.87    -52.13     100.6      1L4W
ASG  ARG A   36   36    C          Coil    -73.80    -90.71     148.5      1L4W
ASG  GLY A   37   37    E        Strand   -161.56   -176.78       0.0      1L4W
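STRIDE output of this kind is easy to read programmatically. The sketch below assumes one ASG record per line with whitespace-separated fields in the order shown in the header (residue name, chain, PDB residue number, ordinal number, one-letter state, state name, phi, psi, area) and a non-blank chain identifier. The `StrideResidue` type and `parse_stride` function are names invented for this example.

```python
from typing import NamedTuple

class StrideResidue(NamedTuple):
    resname: str
    chain: str
    pdb_resnum: str    # kept as text, since PDB numbering may carry insertion codes
    ordinal: int
    ss_code: str
    ss_name: str
    phi: float
    psi: float
    area: float        # solvent accessible area, A^2

def parse_stride(lines):
    """Parse STRIDE 'ASG' records; other record types (REM etc.) are skipped."""
    residues = []
    for line in lines:
        fields = line.split()
        if len(fields) < 10 or fields[0] != "ASG":
            continue
        residues.append(StrideResidue(
            fields[1], fields[2], fields[3], int(fields[4]),
            fields[5], fields[6],
            float(fields[7]), float(fields[8]), float(fields[9]),
        ))
    return residues

sample = [
    "REM  |---Residue---|    |--Structure--|   |-Phi-|   |-Psi-|  |-Area-|      1L4W",
    "ASG  ILE A    1    1    C          Coil    360.00    168.01      69.6      1L4W",
    "ASG  CYS A    3    3    E        Strand   -164.52    149.74       1.4      1L4W",
]
records = parse_stride(sample)
ss_string = "".join(r.ss_code for r in records)
```

From the parsed records, the secondary structure string and the per-residue areas can be combined, for example to list strand residues with small accessible area.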
Structure Analysis • Assign secondary structure for amino acids from 3D structure • Generate solvent accessible area for amino acids from 3D structure • Most widely used tool: DSSP (Dictionary of Protein Secondary Structure: Pattern Recognition of Hydrogen-Bonded and Geometrical Features. Kabsch and Sander, 1983)
2D: Contact Map Prediction • 3D structure → 2D contact map: an n × n matrix whose rows i and columns j both run over residues 1 ... n • Distance threshold = 8 Å • Cheng, Randall, Sweredoski, Baldi. Nucleic Acids Research, 2005
3D Structure Prediction Tools • MULTICOM (http://sysbio.rnet.missouri.edu/multicom_toolbox/index.html ) • I-TASSER (http://zhang.bioinformatics.ku.edu/I-TASSER/) • HHpred (http://protevo.eb.tuebingen.mpg.de/toolkit/index.php?view=hhpred) • Robetta (http://robetta.bakerlab.org/) • 3D-Jury (http://bioinfo.pl/Meta/) • FFAS (http://ffas.ljcrf.edu/ffas-cgi/cgi/ffas.pl) • Pcons (http://pcons.net/) • Sparks (http://phyyz4.med.buffalo.edu/hzhou/anonymous-fold-sp3.html) • FUGUE (http://www-cryst.bioc.cam.ac.uk/%7Efugue/prfsearch.html) • FOLDpro (http://mine5.ics.uci.edu:1026/foldpro.html) • SAM (http://www.cse.ucsc.edu/research/compbio/sam.html) • Phyre (http://www.sbg.bio.ic.ac.uk/~phyre/) • 3D-PSSM (http://www.sbg.bio.ic.ac.uk/3dpssm/) • mGenThreader (http://bioinf.cs.ucl.ac.uk/psipred/psiform.html)