Secondary Structure Assignment from Structure. PHAR 201/Bioinformatics I. Philip E. Bourne, Department of Pharmacology, UCSD. Reading: Chapter 19, Structural Bioinformatics. PHAR 201 Lecture 05, 2012
Agenda • Why secondary structure assignment is important • Hydrogen bonding models • DSSP (Kabsch-Sander) and its impact • Other methods • Conclusions
Reminder - Dihedral Angles • phi - dihedral angle about the N-Calpha bond • psi - dihedral angle about the Calpha-C bond • omega - dihedral angle about the C-N (peptide) bond • From http://www.imb-jena.de
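The three backbone dihedrals can be computed directly from atomic coordinates. Below is a minimal pure-Python sketch of the standard projection formula; the function name `dihedral` and the use of plain (x, y, z) tuples are illustrative assumptions, not part of any particular package. phi uses C(i-1), N(i), Calpha(i), C(i); psi uses N(i), Calpha(i), C(i), N(i+1); omega uses Calpha(i), C(i), N(i+1), Calpha(i+1).

```python
import math

def _sub(a, b):
    return [a[k] - b[k] for k in range(3)]

def _dot(a, b):
    return sum(a[k] * b[k] for k in range(3))

def _cross(a, b):
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]

def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle (degrees) about the p1-p2 bond."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    # Unit vector along the central bond.
    n1 = math.sqrt(_dot(b1, b1))
    b1u = [c / n1 for c in b1]
    # Project the outer bonds onto the plane perpendicular to the central bond.
    v = _sub(b0, [_dot(b0, b1u) * c for c in b1u])
    w = _sub(b2, [_dot(b2, b1u) * c for c in b1u])
    x = _dot(v, w)
    y = _dot(_cross(b1u, v), w)
    return math.degrees(math.atan2(y, x))
```

For a trans peptide bond, omega computed this way comes out near +/-180 deg.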
Reminder - Helices
------------------------------------------------------------------
                           phi (deg)   psi (deg)   H-bond pattern
------------------------------------------------------------------
right-handed alpha-helix     -57.8       -47.0        i+4
pi-helix                     -57.1       -69.7        i+5
3₁₀ helix                    -74.0        -4.0        i+3
------------------------------------------------------------------
(omega is ~180 deg in all cases) From http://www.imb-jena.de
Reminder - Beta Strands
------------------------------------------------------------------
                 phi (deg)   psi (deg)   omega (deg)
------------------------------------------------------------------
beta strand        -120         120          180
------------------------------------------------------------------
From http://broccoli.mfn.ki.se/pps_course_96/
Hydrogen bond patterns in beta sheets. Here a four-stranded beta sheet containing three antiparallel strands and one parallel strand is drawn schematically. Hydrogen bonds are indicated with red lines (antiparallel strands) and green lines (parallel strands) connecting the hydrogen and the acceptor oxygen.
Why is consistent secondary structure assignment from structure important? • It is part of the description of a fold and a domain • It is a useful conceptualization for understanding structure • It influences sequence alignment • It is related to function • It is useful as part of structure prediction – it defines regions on the templates • It serves as a training set for machine learning algorithms • It makes searching consistent – authors' own assignments differ
Example where secondary structure is important
[Figure: alignment, positions 150-400, of Ilk (rows Ilk____PSS and Ilk____Seq) against a known Src kinase structure, 1fmk (rows 1fmk--_Seq and 1fmk--_SS), with secondary structure in H/E/C notation and the catalytic loop (Cat. Loop) marked]
• “Integrin-linked kinase” (Ilk) is a novel protein kinase fold with strong sequence similarity to known structures (Hannigan et al. 1996 Nature 379, 91-96) • It aligns to Src kinases with a BLAST e-value of 10⁻¹⁹ and 27% identity (the alignment shown is to a known Src kinase structure) • Several key residues are conserved, but residues important to catalysis, including the catalytic Asp, are missing • Recent experimental evidence suggests that Ilk lacks kinase activity (Lynch et al. 1999 Oncogene 18, 8024-8032)
History of Assignment • Originally left to the interpretation of the structural biologist – inconsistent • 1983 - the Kabsch-Sander algorithm was written as an aid to secondary structure prediction – the prediction program itself never emerged – what did emerge is perhaps the most consistent and accepted algorithm in all of structural bioinformatics • The assignments are embodied in the DSSP algorithm and the associated database of assignments
Inconsistent Author Assignment
Hydrogen Bonding is Key to Automated Methods • Why? ~90% of backbone donors (NH) and acceptors (C=O) form hydrogen bonds • 62% of these are intra-backbone • Basic definition: • Angle N–H…O greater than 120 degrees • H…O distance less than 2.5 Å • Note: hydrogen positions are not usually determined directly
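A minimal sketch of the basic geometric definition, assuming the hydrogen position is already known (for example, placed by extrapolation, since H atoms are usually not observed). The function names are illustrative, and the default cutoffs simply mirror the 120 degree and 2.5 Å values above.

```python
import math

def angle_deg(a, b, c):
    """Angle a-b-c in degrees, with the vertex at b."""
    ba = [a[k] - b[k] for k in range(3)]
    bc = [c[k] - b[k] for k in range(3)]
    cos_t = sum(x * y for x, y in zip(ba, bc)) / (math.dist(a, b) * math.dist(c, b))
    # Clamp to guard against rounding just outside [-1, 1].
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_t))))

def is_hbond(n, h, o, max_ho=2.5, min_angle=120.0):
    """Geometric test: H...O shorter than max_ho and N-H...O angle above min_angle."""
    return math.dist(h, o) < max_ho and angle_deg(n, h, o) > min_angle
```

A purely geometric cutoff like this is easy to reason about, but it gives a yes/no answer only; the energy-based models that follow grade bond strength instead.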
Hydrogen Bond - Definition
Coulomb Hydrogen Bond Calculation – used by DSSP • f is a constant, 332 Å·kcal/(mol·e²) • δ+ and δ– are the partial charges on the polar atoms, in electrons • The weakest H-bond accepted by DSSP is –0.5 kcal/mol; pairs with higher energy are not counted • H positions are not given and must be extrapolated – this assumes planar geometry for the peptide bond
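A sketch of the electrostatic calculation, assuming the form usually quoted for DSSP: E = q1·q2·f·(1/r(ON) + 1/r(CH) − 1/r(OH) − 1/r(CN)), with q1 = 0.42e on the acceptor C and O, q2 = 0.20e on the donor N and H, and f = 332. The hydrogen is placed 1.0 Å from N along the direction from the preceding residue's O to its C, which is valid only if the peptide unit is planar. Function names are hypothetical, and refinements found in real DSSP implementations (such as a lower cap on the energy) are omitted.

```python
import math

Q1, Q2, F = 0.42, 0.20, 332.0  # partial charges (e) and f in Å·kcal/(mol·e²)
HBOND_CUTOFF = -0.5            # kcal/mol; weaker interactions are not H-bonds

def place_h(n, c_prev, o_prev):
    """Place H 1.0 Å from N, antiparallel to the previous residue's C->O bond."""
    d = [c_prev[k] - o_prev[k] for k in range(3)]
    norm = math.sqrt(sum(x * x for x in d))
    return tuple(n[k] + d[k] / norm for k in range(3))

def dssp_energy(c, o, n, h):
    """Electrostatic energy (kcal/mol) between acceptor C=O and donor N-H."""
    r_on = math.dist(o, n)
    r_ch = math.dist(c, h)
    r_oh = math.dist(o, h)
    r_cn = math.dist(c, n)
    return Q1 * Q2 * F * (1 / r_on + 1 / r_ch - 1 / r_oh - 1 / r_cn)

def is_dssp_hbond(c, o, n, h):
    return dssp_energy(c, o, n, h) < HBOND_CUTOFF
```

For a near-ideal linear bond (N...O about 2.9 Å) this sketch gives roughly -2.9 kcal/mol, comfortably below the -0.5 kcal/mol cutoff.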
DSSP – Dictionary of Secondary Structures of Proteins • Defined solely by the H-bonds found – from the list of bonds and the residues that form them, helix assignments are made as follows: • Alpha helix (H): start i -> i+4; end i-4 -> i • 3₁₀ helix (G): start i -> i+3; end i-3 -> i • Pi helix (I): start i -> i+5; end i-5 -> i
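The helix rules can be sketched as follows, assuming the H-bond list is available as pairs (i, j) meaning "the C=O of residue i accepts an H-bond from the N-H of residue j". In the Kabsch-Sander scheme an n-turn at i is an H-bond from i to i+n, and two consecutive n-turns (at i-1 and i) mark residues i to i+n-1 as helix. This is a simplified illustration with a hypothetical function name: helix ends and the finer rules of real DSSP are not handled. The loop order lets H overwrite G and G overwrite I, following the priority order of the original paper.

```python
def assign_helices(n_res, hbonds):
    """Minimal-helix assignment from a set of (acceptor, donor) residue pairs."""
    ss = ['-'] * n_res
    # Lower-priority types first, so higher-priority types overwrite them.
    for n, code in ((5, 'I'), (3, 'G'), (4, 'H')):
        turn = [(i, i + n) in hbonds for i in range(n_res)]
        for i in range(1, n_res):
            if turn[i - 1] and turn[i]:
                for k in range(i, min(i + n, n_res)):
                    ss[k] = code
    return ''.join(ss)
```

For example, three consecutive i -> i+4 bonds starting at residue 0 in a 10-residue chain yield `-HHHHH----`.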
DSSP – Dictionary of Secondary Structures of Proteins • Similarly for beta sheets: • Strand residues (E) either form 2 H-bonds in the sheet or are flanked by 2 H-bonds • Isolated residues (B) form a single beta bridge (example: 1GCS) • Beta bulges are also assigned E – a bulge may span up to 4 residues on one strand and 1 on the other
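The "two H-bonds in the sheet, or flanked by two H-bonds" rule corresponds to the Kabsch-Sander bridge definitions. A minimal sketch, using the same (i, j) convention (C=O of residue i bonded to N-H of residue j) and a hypothetical function name; grouping bridges into ladders and sheets, and handling bulges, is left out.

```python
def bridge_type(i, j, hbonds):
    """Classify a residue pair (i, j) as a parallel or antiparallel bridge, or None."""
    def hb(a, b):
        return (a, b) in hbonds

    # Parallel: the bonds skip over one of the two residues.
    if (hb(i - 1, j) and hb(j, i + 1)) or (hb(j - 1, i) and hb(i, j + 1)):
        return 'parallel'
    # Antiparallel: i and j bonded to each other, or flanked by a bonded pair.
    if (hb(i, j) and hb(j, i)) or (hb(i - 1, j + 1) and hb(j - 1, i + 1)):
        return 'antiparallel'
    return None
```

A residue in a single bridge with no neighbouring bridge would be labelled B; consecutive bridges form a ladder whose residues are labelled E.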
DSSP Nomenclature • H = alpha helix • G = 3₁₀ helix • I = pi helix • B = isolated beta bridge (single-residue sheet) • E = extended beta strand • T = hydrogen-bonded turn • S = bend • C = coil (written as a blank in DSSP output)
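Many prediction and benchmarking studies collapse the eight DSSP states into three (helix, strand, coil). One common reduction maps H, G and I to H, E and B to E, and everything else to C. Other groupings exist, so treat this mapping and the function name as assumptions.

```python
# One common 8-to-3 state reduction (assumed; other groupings are also used).
EIGHT_TO_THREE = {'H': 'H', 'G': 'H', 'I': 'H', 'E': 'E', 'B': 'E'}

def reduce_to_three_state(dssp):
    """Map a DSSP string to H/E/C; T, S, blanks and anything else become C."""
    return ''.join(EIGHT_TO_THREE.get(s, 'C') for s in dssp)
```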
Converse Situation? • In our discussions of structure comparison and alignment, structure classification and (soon) domain assignment, we learned that there is no single generally accepted method • DSSP, by contrast, has long been a generally accepted method
DSSP as Implemented in the PDB (example: 1ATP)
STRIDE – Empirical Hydrogen Bond Calculation • Parameters derived from small-molecule structures: r_m = 3.0 Å and E_m = –2.8 kcal/mol • Total energy E_hb
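STRIDE's total energy combines a distance-dependent term with angular factors. As a hedged sketch, we assume the distance term is an 8-6 Lennard-Jones-style potential whose minimum, E_m = -2.8 kcal/mol, sits at r_m = 3.0 Å. The functional form is our assumption and should be checked against the original STRIDE paper; the angular factors are omitted entirely, and the function name is hypothetical.

```python
R_M = 3.0   # Å, distance at the energy minimum (value from the slide)
E_M = -2.8  # kcal/mol, energy at the minimum (value from the slide)

def distance_term(r, r_m=R_M, e_m=E_M):
    """Assumed 8-6 potential: equals e_m at r = r_m, rises at shorter and longer r."""
    x = r_m / r
    return e_m * (4 * x ** 6 - 3 * x ** 8)
```

The 8-6 form is steeper on the repulsive side than at long range, so over-short donor-acceptor distances are penalized quickly while longer bonds decay gently toward zero.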
STRIDE – Empirical Hydrogen Bond Calculation • Uses E_hb together with phi-psi torsion angle criteria • Torsion angles define secondary structures according to the regions of the Ramachandran plot in which they fall • E_hb is disregarded if phi and psi are unfavorable
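To illustrate how torsion angles alone can place a residue in a Ramachandran region, here is a toy classifier that picks the nearest ideal (phi, psi) from the helix and beta-strand reminder tables in this lecture. This is not STRIDE's method, whose torsion-angle criteria are more elaborate; the names and the 60 degree cutoff are hypothetical.

```python
import math

# Ideal (phi, psi) values from the reminder tables.
IDEAL = {
    'alpha helix': (-57.8, -47.0),
    'pi helix': (-57.1, -69.7),
    '3-10 helix': (-74.0, -4.0),
    'beta strand': (-120.0, 120.0),
}

def angle_diff(a, b):
    """Smallest absolute difference between two angles in degrees (wraps at 360)."""
    d = (a - b) % 360.0
    return min(d, 360.0 - d)

def nearest_conformation(phi, psi, max_dist=60.0):
    """Return the closest ideal conformation, or 'other' if none is within max_dist."""
    best, best_d = None, float('inf')
    for name, (phi0, psi0) in IDEAL.items():
        d = math.hypot(angle_diff(phi, phi0), angle_diff(psi, psi0))
        if d < best_d:
            best, best_d = name, d
    return best if best_d <= max_dist else 'other'
```

The wrap-around in `angle_diff` matters: phi = 170 and phi = -170 are only 20 degrees apart.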
Comparison DSSP & STRIDE
DSSP vs STRIDE • STRIDE adds terms to the expression for hydrogen bond energy • STRIDE selects the terminal residues of elements using torsion angles • STRIDE favors planar hydrogen bonds while allowing longer bonds
Other Methods • DEFINE – uses distance criteria between Calpha atoms, which vary slightly for each secondary structure type; allows modifications for curvature • P-Curve – analyzes protein curvature and compares it to ideal motifs; an unknown motif is defined by the tilt, roll, etc., between peptide planes
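DEFINE's own ideal-distance tables are not given in this lecture. As a hypothetical illustration of a Calpha-distance criterion, the sketch below builds an ideal alpha helix from approximate textbook parameters (radius about 2.3 Å, rise 1.5 Å and 100 degrees per residue) and compares Calpha-Calpha distance matrices for a window of residues. The function names, the RMS comparison and the 1.0 Å tolerance are all assumptions; windows need at least two residues.

```python
import math

def ideal_helix_ca(n, radius=2.3, rise=1.5, turn_deg=100.0):
    """Calpha coordinates of an ideal alpha helix (approximate geometry)."""
    pts = []
    for k in range(n):
        t = math.radians(turn_deg * k)
        pts.append((radius * math.cos(t), radius * math.sin(t), rise * k))
    return pts

def distance_matrix(pts):
    return [[math.dist(a, b) for b in pts] for a in pts]

def matrix_rms_deviation(obs, ref):
    """RMS difference over the upper triangle of two equal-sized distance matrices."""
    n = len(obs)
    diffs = [(obs[i][j] - ref[i][j]) ** 2 for i in range(n) for j in range(i + 1, n)]
    return math.sqrt(sum(diffs) / len(diffs))

def looks_helical(ca_window, tol=1.0):
    """Hypothetical DEFINE-style test: is the window close to an ideal helix?"""
    ref = distance_matrix(ideal_helix_ca(len(ca_window)))
    return matrix_rms_deviation(distance_matrix(ca_window), ref) < tol
```

Comparing distance matrices rather than coordinates avoids having to superimpose the window on the ideal helix, but, as noted below for DEFINE, distance criteria alone tolerate a good deal of backbone distortion.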
Comparative Notes • The last residues of a sheet or a helix are often still in the same conformation, although they no longer form hydrogen bonds in the structure. This means that the ends (caps) of regular secondary structure segments are not well defined. • Calpha-distance criteria alone (as applied in DEFINE) seem to accommodate considerable distortion of the backbone, giving an excess of secondary structure assignments even after the tolerance ε has been reduced considerably. • DSSP is the only assignment scheme with a large peak for alpha-helices of four residues, many of which constitute single helical turns. • DEFINE assigns more than twice as many sheets of length four as the other methods. • P-Curve tends to assign overly long elements of regular secondary structure.
Amino Acid Propensities Indicate the Role of Side Chains in Defining Secondary Structure – Basis of Prediction Methods – Note that none of the assignment methods uses this information • Alpha helices – rich in ALA, LEU; poor in PRO, GLY • Beta sheets – rich in VAL, ILE; poor in GLY, ASP, PRO • 3₁₀ helices – rich in PRO; poor in ALA, LEU • Beta bridges – poor in VAL, ILE
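Propensities like these are usually computed as a ratio: the frequency of residue type a within state s divided by its overall frequency, so values above 1 indicate enrichment. A minimal sketch, assuming sequences in one-letter code (A for ALA, G for GLY, etc.) paired with secondary structure strings of equal length; the function name is hypothetical.

```python
from collections import Counter

def propensities(pairs):
    """pairs: iterable of (sequence, ss_string) of equal length.

    Returns {(residue, state): P} with P = f(residue | state) / f(residue).
    """
    aa_total, ss_total, joint = Counter(), Counter(), Counter()
    n = 0
    for seq, ss in pairs:
        for a, s in zip(seq, ss):
            aa_total[a] += 1
            ss_total[s] += 1
            joint[(a, s)] += 1
            n += 1
    return {(a, s): (c / ss_total[s]) / (aa_total[a] / n)
            for (a, s), c in joint.items()}
```

Run over a large set of DSSP-assigned structures, values above 1 for (A, H) and below 1 for (P, H) would reproduce the trends listed above.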
Newer Methods: DSSPcont • Uses known alignments from multiple 3D structures or from multiple members of an NMR ensemble (DSSPcont) • Consensus-based approach
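DSSPcont's actual procedure is not described here. As a hypothetical illustration of a consensus-based, continuous assignment, the sketch below takes one assignment string per ensemble member (for example, per NMR model) and reports, for each residue, the fraction of members assigning each state, plus the majority state. Function names are assumptions.

```python
from collections import Counter

def consensus_profile(assignments):
    """Per-residue fraction of ensemble members assigning each state."""
    length = len(assignments[0])
    if any(len(a) != length for a in assignments):
        raise ValueError("all assignment strings must have the same length")
    profile = []
    for i in range(length):
        counts = Counter(a[i] for a in assignments)
        profile.append({s: c / len(assignments) for s, c in counts.items()})
    return profile

def consensus_string(assignments):
    """Majority state at each residue (ties go to the state seen first)."""
    return ''.join(max(p, key=p.get) for p in consensus_profile(assignments))
```

The profile keeps the uncertainty that a single majority string throws away, e.g. a residue that is helical in only two of three models.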
Supersecondary Structures • Meander (http://en.wikipedia.org/wiki/Meander_(art)) • Zinc finger motif (http://en.wikipedia.org/wiki/Zinc_finger)
I-sites (Baker) • I-sites – specific segments with common amino acid propensities • Used by Rosetta to predict structure – perhaps the most successful method thus far • Note: considers only main-chain hydrogen bonds, whereas much of tertiary structure is associated with side-chain interactions
Summary • DSSP remains the first and most popular approach • STRIDE may have been developed as part of the EMBL … • DSSP has been reimplemented from the paper a number of times, often with different results – open-source code helps with this today • DSSP is perhaps the most accepted algorithm in all of structural bioinformatics • It is not always clear whether the secondary structure assignments deposited with a structure come from DSSP or from the authors' own view • Consistent searching requires that DSSP be used for all structures – early structures had no author assignments