Protein Structure, Structure Classification and Prediction Bioinformatics X3 January 2005 P. Johansson, D. Madsen Dept. of Cell & Molecular Biology, Uppsala University
Overview • Introduction to proteins, structure & classification • Protein Folding • Experimental techniques for structure determination • Structure prediction
Proteins • Proteins play a crucial role in virtually all biological processes, with a broad range of functions. • The activity of an enzyme or the function of a protein is governed by its three-dimensional structure
Hydrophilic or hydrophobic..? • Virtually all soluble proteins feature a hydrophobic core surrounded by a hydrophilic surface • But the peptide backbone is inherently polar? • Solution: neutralize potential H-bond donors & acceptors using ordered secondary structure
Secondary Structure: α-helix • 3.6 residues / turn • Axial dipole moment • Not Proline & Glycine • Protein surfaces
Secondary Structure: β-sheets • Parallel or antiparallel • Alternating side-chains • No mixing • Loops often have polar amino acids
Structural classification • Databases • SCOP, ’Structural Classification of Proteins’, manual classification • CATH, ’Class Architecture Topology Homology’, based on the SSAP algorithm • FSSP, ’Family of Structurally Similar Proteins’, based on the DALI algorithm • PClass, ’Protein Classification’ based on the LOCK and 3Dsearch algorithms
Structural classification, CATH • Class, four types: • Mainly α • α/β structures • Mainly β • No secondary structure • Architecture (fold) • Topology (superfamily) • Homology (family)
Structural classification.. • Two types of algorithms • Inter-molecular, 3D, rigid body: structural alignment in a common coordinate system (hard), e.g. the VAST and LOCK algorithms • Intra-molecular, 2D, internal geometry: structural alignment using internal distances and angles, e.g. the DALI, STRUCTURAL and SSAP algorithms
Structural classification, SSAP • SSAP, ‘Sequential Structure Alignment Program’ • Basic idea: the similarity between residue i in molecule A and residue k in molecule B is characterised in terms of their structural surroundings • This similarity can be quantified into a score, S_ik • Based on this similarity score and some specified gap penalty, dynamic programming is used to find the optimal structural alignment
Structural classification, SSAP • The structural neighborhood of residue i in A compared to residue k in B
Structural classification, SSAP.. • Distance between residues i & j in molecule A: d^A_{i,j} • Similarity for two pairs of residues, ij in A & kl in B, with a, b constants • Similarity between residue i in A and residue k in B: S_ik • Idea: S_ik is large if the distances from residue i in A to its 2n nearest neighbours are similar to the corresponding distances around k in B
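The idea behind S_ik can be sketched in a few lines of Python. The formulas themselves are not preserved on this slide, so the functional form used here, a / (|d^A - d^B| + b) summed over the n chain neighbours on each side, is an assumption based on the description above, and the default values of n, a and b are arbitrary placeholders.

```python
import math

def residue_similarity(coords_a, i, coords_b, k, n=2, a=50.0, b=2.0):
    """Score how similar the surroundings of residue i in A and residue k in B are.

    ASSUMED form (not taken from the slides): each of the 2n chain
    neighbours contributes a / (|d_A - d_B| + b), so equal distances
    give the maximum contribution a / b.
    """
    score = 0.0
    for m in range(-n, n + 1):
        if m == 0:
            continue  # a residue is not its own neighbour
        ia, kb = i + m, k + m
        if not (0 <= ia < len(coords_a) and 0 <= kb < len(coords_b)):
            continue  # neighbour falls off the end of one of the chains
        d_a = math.dist(coords_a[i], coords_a[ia])
        d_b = math.dist(coords_b[k], coords_b[kb])
        score += a / (abs(d_a - d_b) + b)
    return score

# Hypothetical toy coordinates: a straight chain and a stretched copy of it.
straight = [(float(x), 0.0, 0.0) for x in range(5)]
stretched = [(2.0 * x, 0.0, 0.0) for x in range(5)]
same = residue_similarity(straight, 2, straight, 2)
different = residue_similarity(straight, 2, stretched, 2)
```

Comparing a structure with itself gives the highest possible score, and the score drops as the local distance patterns diverge.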
Structural classification, SSAP.. • A: HSERAHVFIM.. (i=5) • B: GQ-VMAC-NW.. (k=4) • This works well for small structures and local structural alignments • However, insertions and deletions cause problems: unrelated distances are compared • The real algorithm uses dynamic programming on two levels, first to find which distances to compare for S_ik, then to align the structures using these scores
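The second level, aligning two structures from a matrix of residue scores with a gap penalty, can be illustrated with a standard global dynamic programming alignment. This is a generic Needleman-Wunsch style sketch, not the SSAP implementation; the linear gap penalty and the toy score matrix are assumptions made for the example.

```python
def align(score, gap=1.0):
    """Global alignment that maximises summed pair scores minus gap penalties.

    score[i][k] is the similarity of residue i in A and residue k in B.
    Returns the optimal total score and the list of aligned (i, k) pairs.
    """
    n, m = len(score), len(score[0])
    best = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        best[i][0] = -gap * i
    for k in range(1, m + 1):
        best[0][k] = -gap * k
    for i in range(1, n + 1):
        for k in range(1, m + 1):
            best[i][k] = max(
                best[i - 1][k - 1] + score[i - 1][k - 1],  # align i with k
                best[i - 1][k] - gap,                      # gap in B
                best[i][k - 1] - gap,                      # gap in A
            )
    # Trace back from the corner to recover which residues were paired.
    pairs = []
    i, k = n, m
    while i > 0 and k > 0:
        if best[i][k] == best[i - 1][k - 1] + score[i - 1][k - 1]:
            pairs.append((i - 1, k - 1))
            i, k = i - 1, k - 1
        elif best[i][k] == best[i - 1][k] - gap:
            i -= 1
        else:
            k -= 1
    pairs.reverse()
    return best[n][m], pairs

# Hypothetical scores: A has 2 residues, B has 3; B's middle residue is an insertion.
toy_scores = [[5, 0, 0],
              [0, 0, 5]]
total, pairs = align(toy_scores)
```

Here the best alignment skips the inserted residue in B at the cost of one gap penalty.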
Experimental techniques for structure determination • X-ray Crystallography • Nuclear Magnetic Resonance spectroscopy (NMR) • Electron Microscopy/Diffraction • Free electron lasers ?
X-ray Crystallography.. • From small molecules to viruses • Information about the positions of individual atoms • Limited information about dynamics • Requires crystals
NMR • Limited to molecules up to ~50 kDa (good quality up to 30 kDa) • Distances between pairs of hydrogen atoms • Lots of information about dynamics • Requires soluble, non-aggregating material • Assignment problem
Electron Microscopy/ Diffraction • Low to medium resolution • Limited information about dynamics • Can use very small crystals (nm range) • Can be used for very large molecules and complexes
Structure Prediction ? GPSRYIV…
Protein Folding • Different sequence → different structure • Free energy difference small due to large entropy decrease, ΔG = ΔH - TΔS
Structure Prediction • Why are structure prediction, and especially ab initio calculations, hard..? • Many degrees of freedom / residue • Remote noncovalent interactions • Nature does not go through all conformations • Folding assisted by enzymes & chaperones
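A back-of-the-envelope count shows why exhaustive conformational search is out of reach. The figures used below (3 conformations per residue, 10^13 conformations sampled per second) are conventional illustrative assumptions and do not come from the slides.

```python
SECONDS_PER_YEAR = 3600 * 24 * 365

def exhaustive_search_years(n_residues, states_per_residue=3,
                            samples_per_second=1e13):
    """Estimate the time needed to visit every conformation once.

    Both default parameters are illustrative assumptions, not measurements.
    """
    conformations = states_per_residue ** n_residues
    years = conformations / samples_per_second / SECONDS_PER_YEAR
    return conformations, years

conformations, years = exhaustive_search_years(100)
print(f"{conformations:.2e} conformations, about {years:.1e} years")
```

Even for a 100-residue chain the estimate is far beyond any realistic timescale, which is consistent with the point that folding cannot proceed by trying all conformations.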
Molecular dynamics • Ab initio calculations used for smaller problems: • Calculation of affinity • Enzymatic pathways
Sequence Classification rev. • Class : Secondary structure content • Fold : Major structural similarity. • Superfamily : Probable common evolutionary origin. • Family : Clear evolutionary relationship.
Structure Prediction • Search sequence data banks for homologs • Search methods e.g. BLAST, PSI-BLAST, FASTA… • Homolog in PDB..? IVTY…PGGG HYW…QHG
Structure Prediction Multiple sequence / structure alignment • Contains more information than a single sequence for applications like homology modeling and secondary structure prediction • Gives location of conserved parts and residues likely to be buried in the protein core or exposed to solvent
HFD fingerprint Multiple alignment example
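One simple way to locate conserved positions in a multiple alignment is to compute the Shannon entropy of each column: 0 bits means fully conserved, higher values mean more variable. This is a minimal sketch; the four-column alignment below is a made-up toy example, and ignoring gap characters is a simplifying assumption.

```python
import math
from collections import Counter

def column_entropy(column):
    """Shannon entropy in bits of the residues in one column, ignoring gaps."""
    residues = [c for c in column if c != "-"]
    if not residues:
        return 0.0
    total = len(residues)
    counts = Counter(residues)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())

def conservation_profile(alignment):
    """Entropy for every column of an alignment given as equal-length strings."""
    return [column_entropy(col) for col in zip(*alignment)]

# Hypothetical aligned fragments sharing an invariant three-residue motif.
toy_alignment = ["HFDA", "HFDG", "HFDS"]
profile = conservation_profile(toy_alignment)
```

Columns with entropy 0 are candidates for functionally or structurally important residues, while the last column varies freely.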
Secondary Structure Prediction • Statistical Analysis (old fashioned): • For each amino acid type assign its ‘propensity’ to be in a helix, sheet, or coil. • Limited accuracy ~55-60%. • Random prediction ~38%. MTLLALGINHKTAP... CCEEEEEECCCCCC...
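A propensity can be estimated as the fraction of an amino acid's occurrences found in a given state, divided by the overall fraction of residues in that state. The sketch below applies this to the single sequence/structure pair shown on the slide purely for illustration; real propensity tables are derived from many known structures, and the function name is hypothetical.

```python
from collections import Counter

def propensities(sequences, structures, state):
    """Propensity of each amino acid type for one secondary structure state.

    propensity = (fraction of that amino acid in `state`)
                 / (fraction of all residues in `state`)
    Values above 1 mean the residue type is over-represented in the state.
    """
    aa_total = Counter()
    aa_in_state = Counter()
    for seq, ss in zip(sequences, structures):
        for aa, s in zip(seq, ss):
            aa_total[aa] += 1
            if s == state:
                aa_in_state[aa] += 1
    n_state = sum(aa_in_state.values())
    if n_state == 0:
        return {}
    overall = n_state / sum(aa_total.values())
    return {aa: (aa_in_state[aa] / aa_total[aa]) / overall for aa in aa_total}

# The example pair from the slide (C = coil, E = strand).
strand = propensities(["MTLLALGINHKTAP"], ["CCEEEEEECCCCCC"], "E")
```

With only fourteen residues the numbers are statistically meaningless, but they show the calculation: leucine, which occurs only in the strand here, gets a propensity well above 1.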
The Chou & Fasman Method • Each residue is classified, separately for helix and for strand, as: • H, strong former. • h, weak former. • I, indifferent. • b, weak breaker. • B, strong breaker.
The Chou & Fasman Method.. • Score each residue (once for helix, once for strand): • H/h = 1, I = 0 or ½, B/b = -1. • Helix nucleation: • Score > 4 in a “window” of 6 residues. • Strand nucleation: • Score > 3 in a “window” of 5 residues. • Propagate until score < 1 in a 4-residue “window”.
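The Chou & Fasman Method.. • Worked example for GPSRYIVTLANGK
Residue: G P S R Y I V T L A N G K
Helix scores: -1 -1 0 0 -1 1 1 0 1 1 -1 -1 1
Helix, 6-residue window sums: -2 0 1 2 3 3 1 (no nucleation)
Strand scores: -1 -1 -1 .5 1 1 1 1 1 0 0 -1 -1
Strand, 5-residue window sums: -1.5 .5 2.5 4.5 5 4 3 1 -1 (nucleation)
Propagate: -2.5 -.5 1.5 … 3 1 -1
Result: GPSRYIVTLANGK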
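The window scoring in this example can be reproduced with a short Python sketch. The per-residue scores are the ones given on the slide for GPSRYIVTLANGK; the propagation rule is one possible reading of "propagate until score < 1 in a 4 residue window" and should be treated as an assumption rather than the exact procedure used by the authors.

```python
HELIX = [-1, -1, 0, 0, -1, 1, 1, 0, 1, 1, -1, -1, 1]
STRAND = [-1, -1, -1, 0.5, 1, 1, 1, 1, 1, 0, 0, -1, -1]

def window_sums(scores, width):
    """Sum of scores in every window of `width` consecutive residues."""
    return [sum(scores[i:i + width]) for i in range(len(scores) - width + 1)]

def nucleation_sites(scores, width, threshold):
    """Start indices of windows whose summed score exceeds the threshold."""
    return [i for i, s in enumerate(window_sums(scores, width)) if s > threshold]

def propagate(scores, start, end, width=4, stop_below=1):
    """Extend the half-open region [start, end) in both directions.

    ASSUMED rule: keep adding a residue while the 4-residue window that
    ends (or starts) at the new residue still sums to at least `stop_below`.
    """
    while end < len(scores) and sum(scores[max(0, end - width + 1):end + 1]) >= stop_below:
        end += 1
    while start > 0 and sum(scores[start - 1:start - 1 + width]) >= stop_below:
        start -= 1
    return start, end

helix_sites = nucleation_sites(HELIX, 6, 4)     # no helix nucleation
strand_sites = nucleation_sites(STRAND, 5, 3)   # windows starting at 3, 4, 5
# The nucleating windows together cover residues 3..9 (0-based).
region = propagate(STRAND, strand_sites[0], strand_sites[-1] + 5)
```

Under these assumptions no helix is nucleated, a strand is nucleated around YIVTL, and propagation extends it by one residue on each side.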
Modern methods • Neural networks (e.g. the PHD server): • Input: a number of protein sequences + secondary structure. • Output: a trained network that predicts secondary structure elements with ~70% accuracy. • Use many different methods and compare (e.g. the JPred server)!
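Comparing several methods can be as simple as a per-residue majority vote over their predictions. The sketch below is a generic illustration and is not how the JPred server combines methods; the three input strings and the "?" marker for ties are hypothetical.

```python
from collections import Counter

def consensus(predictions, unresolved="?"):
    """Per-position majority vote over equal-length prediction strings.

    Positions where the top two states tie are marked as unresolved.
    """
    if len({len(p) for p in predictions}) != 1:
        raise ValueError("all predictions must have the same length")
    result = []
    for column in zip(*predictions):
        ranked = Counter(column).most_common()
        if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
            result.append(unresolved)
        else:
            result.append(ranked[0][0])
    return "".join(result)

# Hypothetical output from three different predictors (H, E, C states).
combined = consensus(["CCEEEC", "CHEEEC", "CCEEHC"])
```

Positions where the methods disagree are natural places to treat the prediction with caution.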
Summary • The function of a protein is governed by its structure • Different sequence → different structure • PDB, Protein Data Bank • Secondary structure prediction is hard, tertiary structure prediction is even harder • Use homologs whenever possible or different methods to assess quality