Structural alignment of proteins by a novel TOPOFIT method, as a superposition of common volumes at a topomax point. V.A. Ilyin, A. Arbuzov and C.M. Leslin, Protein Science (2004), 13: 1865-1874.
Structural alignment of proteins by a novel TOPOFIT method, as a superposition of common volumes at a topomax point Presented by Svetlana Kurilova
Outline • Structural features of proteins • Importance of protein comparison • Sequence alignment vs structural alignment • Brief overview of existing algorithms • TOPOFIT method • Main criteria • Voronoi diagrams and Delaunay triangulation • Results
The Different Levels of Protein Structure • Primary: linear amino acid sequence • Secondary: α-helices, β-sheets and loops • Tertiary: the 3D shape of the fully folded polypeptide chain
Beta-Sheet Anti-Parallel Parallel
Non-Repetitive Secondary Structure Loop Beta-Turn
Sequence, Structure and Function AGCWY…… Cell
Complication: "inexact" is not binary (1|0) but relative. Amino acids have different physical and biochemical properties that may or may not be important for function, and these properties influence their probability of being replaced during evolution.
X-Ray Crystallography A protein crystal Mount a crystal Diffractometer Diffraction Protein structure
Protein Structure Determination • X-ray crystallography • any size, accurate (1-3 Angstrom; 1 Angstrom = 10^-10 m), crystals are sometimes hard to grow • Nuclear Magnetic Resonance (NMR) spectroscopy • small to medium size, moderate accuracy, structure in solution
Storage in Protein Data Bank Search database
Predicting 3D Structure • An outstanding, difficult problem • Based on sequence homology • Comparative modeling (homology) • Based on structural homology • Fold recognition (threading)
Basics in sequence comparison • Identity: the extent to which two (nucleotide or amino acid) sequences are invariant (identical). • Similarity: the extent to which (nucleotide or amino acid) sequences are related. The extent of similarity between two sequences can be based on percent sequence identity and/or conservation. In BLAST, similarity refers to a positive matrix score. This is quite flexible (see later examples of DNA polymerases): sequences can be similar across the whole sequence, or similarity can be restricted to domains! • Homology: similarity attributed to descent from a common ancestor.
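A minimal sketch of percent sequence identity for two already-aligned sequences. The choice of denominator (here: columns where neither sequence has a gap) is an assumption; other conventions divide by the full alignment length or by the length of the shorter sequence. The function name and the "-" gap symbol are illustrative.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent of gap-free alignment columns whose residues are identical."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    # Keep only the columns where neither sequence has a gap.
    columns = [(a, b) for a, b in zip(aligned_a, aligned_b)
               if a != gap and b != gap]
    if not columns:
        return 0.0
    matches = sum(1 for a, b in columns if a == b)
    return 100.0 * matches / len(columns)


if __name__ == "__main__":
    print(percent_identity("ASCRKLE", "ASCRKLE"))  # identical sequences
    print(percent_identity("ASCRKLE", "ASC-KIE"))  # one gap, one mismatch
```

Identity only counts exact matches; a similarity score would additionally reward conservative substitutions through a substitution matrix.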
Alignment of Protein Structure • Compare the 3D structure of one protein against the 3D structure of a second protein • Compare positions of atoms in three-dimensional structures • Look for positions of secondary structural elements (helices and strands) within a protein domain • Examine distances between carbon atoms to determine the degree to which structures may be superimposed • Side chain information can be incorporated • Buried; visible • Structural similarity between proteins does not necessarily mean an evolutionary relationship
Structure alignment • Find a transformation T to achieve the best superposition • Simple case: two closely related proteins with the same number of amino acids.
Transformations • Translation • Translation and Rotation • Rigid Motion (Euclidean space)
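As an illustration of a rigid motion, the sketch below rotates a point about the z-axis and then translates it. Restricting the rotation to the z-axis is a simplification for brevity (a general rigid motion uses an arbitrary 3x3 rotation matrix), and the function names are hypothetical. The defining property is that distances between points are preserved.

```python
import math


def rotate_z(point, angle_rad):
    """Rotate a 3D point about the z-axis by angle_rad (counterclockwise)."""
    x, y, z = point
    c, s = math.cos(angle_rad), math.sin(angle_rad)
    return (c * x - s * y, s * x + c * y, z)


def translate(point, shift):
    """Shift a 3D point by the vector shift."""
    return tuple(p + d for p, d in zip(point, shift))


def rigid_motion(point, angle_rad, shift):
    """Rotation followed by translation: T(p) = R p + t."""
    return translate(rotate_z(point, angle_rad), shift)


if __name__ == "__main__":
    a, b = (1.0, 0.0, 0.0), (0.0, 2.0, 0.0)
    angle, shift = math.pi / 2, (5.0, 0.0, 1.0)
    a2, b2 = rigid_motion(a, angle, shift), rigid_motion(b, angle, shift)
    # The distance between the two points is unchanged by the motion.
    print(math.dist(a, b), math.dist(a2, b2))
```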
Types of Structure Comparison • Sequence-dependent vs. sequence-independent structural alignment • Global vs. local structural alignment • Pairwise vs. multiple structural alignment
Sequence-dependent Structure Comparison • Figure: two structures with residues numbered 1 to 7 • Residues 1234567: ASCRKLE aligned position by position with ASCRKLE • Minimize the RMSD of the distances 1-1, ..., 7-7
Sequence-dependent Structure Comparison • Useful for comparing structures of the same protein solved by different methods, in different conformations, or sampled through dynamics. • Evaluating protein structure predictions.
Sequence-independent Structure Comparison • Given two configurations of points in three-dimensional space: find the transformation T which produces the "largest" superimposition of corresponding 3-D points.
Aligning protein structures • Important step for understanding protein functions • Sequencing proteins and determining 3D structures is easy • X-ray crystallography, NMR spectroscopy • Testing functions of proteins is hard • One useful observation • Mutations change sequences • Structures are conserved • Structural similarity => Functional similarity • Good structural alignment algorithm => Predict functions of proteins
3D structure and function Goal of structure prediction • Epstein & Anfinsen, 1961: sequence uniquely determines structure • INPUT: sequence • OUTPUT: B. Rost, 2005
Similar folds usually mean similar function Homeodomain Transcription factors
The same fold can have multiple functions Rossmann 12 functions TIM barrel 31 functions
Structure Comparison (Alignment) • Are the structures of two proteins similar? • Are two structure models of the same protein similar? • Different measures • RMSD, GDT-TS [1] • MaxSub [2] • TM score [3] • [1] Zemla et al., 1999; [2] Siew et al., 2000; [3] Zhang and Skolnick, 2005
Structure Alignments • There are many different algorithms for structural alignment • The outputs of a structural alignment are a superposition of the atomic coordinates and a minimal Root Mean Square Deviation (RMSD) between the structures. The RMSD of two aligned structures indicates their divergence from one another. Low RMSD values mean similar structures.
Sequence Identity and Alignment Quality in Structure Prediction • Superimpose -> RMSD • % Sequence identity: percent of identical residues in the alignment • RMSD: square root of the average squared distance between corresponding atoms of the predicted structure and the native structure
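The RMSD over matched atoms can be written as a short function. This sketch assumes the two coordinate lists are already superimposed and paired by index; it does not search for the optimal superposition. The function name is illustrative.

```python
import math


def rmsd(coords_a, coords_b):
    """Root mean square deviation between index-matched 3D points."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    total_sq = 0.0
    for p, q in zip(coords_a, coords_b):
        # Squared Euclidean distance between the matched pair.
        total_sq += sum((pi - qi) ** 2 for pi, qi in zip(p, q))
    return math.sqrt(total_sq / len(coords_a))


if __name__ == "__main__":
    predicted = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
    native = [(0.0, 0.0, 0.0), (1.0, 2.0, 0.0)]
    # Pair distances are 0 and 2: the mean distance is 1,
    # but the RMSD is sqrt((0 + 4) / 2) = sqrt(2).
    print(rmsd(predicted, native))
```

Because the distances are squared before averaging, a few badly placed atoms raise the RMSD more than they would raise a plain average distance.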
Homology Threshold for Different Alignment Lengths • A sequence alignment between two proteins is considered to imply structural homology if the sequence identity is equal to or above the homology threshold t in a sequence region of a given length L • Figure: homology threshold (t) plotted against alignment length (L) • The threshold values t(L) are derived from the PDB
Structure Analysis • Assign secondary structure for amino acids from 3D structure • Generate solvent accessible area for amino acids from 3D structure • Most widely used tool: DSSP (Dictionary of Protein Secondary Structure: Pattern Recognition of Hydrogen-Bonded and Geometrical Features) Kabsch and Sander, 1983
Dali (Distance mAtrix aLIgnment) • DALI offers pairwise alignments of protein structures. • The algorithm uses the three-dimensional coordinates of each protein to calculate distance matrices comparing residues. Holm L and Sander C (1993) J Mol Biol 233:123-138
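To make the idea of a residue distance matrix concrete, the sketch below builds the matrix of pairwise distances within one structure and compares two such matrices entry by entry. This is only an illustration of the data structure: the comparison function (largest absolute difference) is a hypothetical stand-in and is not DALI's scoring scheme, and the coordinates are made up.

```python
import math


def distance_matrix(coords):
    """Matrix of pairwise distances between residues of one structure."""
    return [[math.dist(p, q) for q in coords] for p in coords]


def max_matrix_difference(matrix_a, matrix_b):
    """Largest absolute difference between two equally sized matrices."""
    return max(abs(x - y)
               for row_a, row_b in zip(matrix_a, matrix_b)
               for x, y in zip(row_a, row_b))


if __name__ == "__main__":
    # Three residues in a straight line vs. the same chain with a 90-degree bend.
    straight = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
    bent = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (3.8, 3.8, 0.0)]
    d_straight, d_bent = distance_matrix(straight), distance_matrix(bent)
    # Neighbouring distances agree; only the 1-3 distance differs.
    print(max_matrix_difference(d_straight, d_bent))
```

Distance matrices do not change when a structure is rotated or translated, so two structures can be compared this way without superimposing them first.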
Fold classification based on structure-structure alignment of proteins (FSSP) • FSSP is based on a comprehensive comparison of PDB proteins (greater than 30 amino acids in length) using DALI • Representative sets exclude sequence homologs sharing > 25% amino acid identity. http://www.ebi.ac.uk/dali/fssp
Comparative Modeling Similar sequence suggests similar structure • Comparative structure prediction produces an all atom model of a sequence, based on its alignment to one or more related protein structures in the database • Similarity particularly high in core • Alpha helices and beta sheets preserved • Even near-identical sequences vary in loops
Protein Folds • A combination of secondary structural units • Forms basic level of classification • Each protein family belongs to a fold • Different sequences can share similar folds
SCOP Structure Classification Of Proteins • Fold classification: • Class: • All alpha • All beta • Alpha/beta • Alpha+beta • Fold • Superfamily • Family
Fold Recognition • Methods of protein fold recognition attempt to detect similarities between protein 3D structures that have no significant sequence similarity. • Search for folds that are compatible with a particular sequence. • These methods "turn the protein folding problem on its head": rather than predicting how a sequence will fold, they predict how well a fold will fit a sequence
Evaluating Structural Alignments • Number of amino acid correspondences created • RMSD of corresponding amino acids • Percent identity in aligned residues • Number of gaps introduced • Size of the two proteins • Conservation of known active site environments • … • No universally agreed upon criteria. It depends on what you are using the alignment for.
TOPOFIT method • Two criteria for similarity: • root mean square deviation (RMSD) • number of equivalent positions (Ne) • Goal: find a proper balance between: • lower RMSD • larger number of aligned positions
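The trade-off between the two criteria can be seen in a small sketch. It assumes the two structures are already superimposed and paired by index, and it counts a pair as equivalent when its distance is within a cutoff. The cutoff values and coordinates are made up for illustration, and this is not the TOPOFIT procedure itself; it only shows why raising Ne tends to raise RMSD.

```python
import math


def equivalent_positions(coords_a, coords_b, cutoff):
    """Return (Ne, RMSD) over index-matched pairs no farther apart than cutoff."""
    kept = [d for d in (math.dist(p, q) for p, q in zip(coords_a, coords_b))
            if d <= cutoff]
    ne = len(kept)
    if ne == 0:
        return 0, 0.0
    return ne, math.sqrt(sum(d * d for d in kept) / ne)


if __name__ == "__main__":
    structure_a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0), (3.0, 0.0, 0.0)]
    # Pair distances are 0.5, 1.0, 2.5 and 4.0.
    structure_b = [(0.0, 0.5, 0.0), (1.0, 1.0, 0.0), (2.0, 2.5, 0.0), (3.0, 4.0, 0.0)]
    for cutoff in (1.5, 3.0, 5.0):  # hypothetical cutoffs
        ne, value = equivalent_positions(structure_a, structure_b, cutoff)
        print(f"cutoff={cutoff}: Ne={ne}, RMSD={value:.3f}")
```

A looser cutoff admits more pairs (larger Ne) but also the worse-fitting ones (higher RMSD), which is the balance the slide refers to.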
Voronoi Diagram • To find the nearest neighbors of points in the plane • Figure: two points p1 and p2 separated by the line L12
The Voronoi diagram problem • The Voronoi diagram for three points • Each Lij is the perpendicular bisector of the line segment connecting the pair of points pi and pj
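A brief sketch of the geometric facts behind the diagram, using three made-up points in the plane. A point lies on the bisector Lij exactly when it is equidistant from pi and pj, and a point belongs to the Voronoi cell of its nearest site. The function names and coordinates are illustrative.

```python
import math


def on_bisector(point, p_i, p_j, tol=1e-9):
    """True if point is equidistant from p_i and p_j (lies on the bisector)."""
    return abs(math.dist(point, p_i) - math.dist(point, p_j)) <= tol


def nearest_site(point, sites):
    """Index of the closest site, i.e. the Voronoi cell that contains point."""
    return min(range(len(sites)), key=lambda i: math.dist(point, sites[i]))


if __name__ == "__main__":
    sites = [(0.0, 0.0), (4.0, 0.0), (0.0, 4.0)]  # p1, p2, p3
    # L12 is the vertical line x = 2: every point on it is equidistant from p1 and p2.
    print(on_bisector((2.0, 7.0), sites[0], sites[1]))
    # The three bisectors meet at (2, 2), which is equidistant from all three sites.
    print([round(math.dist((2.0, 2.0), s), 6) for s in sites])
    # Points on either side of L12 fall into different cells.
    print(nearest_site((1.0, 1.0), sites), nearest_site((3.0, 0.5), sites))
```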
Delaunay triangulation algorithm • Principle: incremental algorithm, one vertex is added at a time. • Initialization: 3 points -> unique triangulation, which is a Delaunay triangulation.
Construction of an optimum triangulation Triangulation: Process to mesh the convex hull of a set of points in the plane with triangles.
Introduction and definitions • Delaunay triangulation: the circumcircle of every triangle is empty, i.e. it contains no other point of the set. It maximizes the minimum angle over all the triangles in the triangulation.
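The empty-circumcircle property can be checked with the standard in-circle determinant. The sketch below tests a given 2D triangulation by brute force; it does not construct a triangulation, and the four points (a flat rhombus) are made up for illustration.

```python
def orientation(a, b, c):
    """Positive if a, b, c are in counterclockwise order, negative if clockwise."""
    return (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])


def in_circumcircle(a, b, c, d):
    """True if d lies strictly inside the circumcircle of triangle a, b, c."""
    rows = []
    for p in (a, b, c):
        dx, dy = p[0] - d[0], p[1] - d[1]
        rows.append((dx, dy, dx * dx + dy * dy))
    (a1, a2, a3), (b1, b2, b3), (c1, c2, c3) = rows
    det = (a1 * (b2 * c3 - b3 * c2)
           - a2 * (b1 * c3 - b3 * c1)
           + a3 * (b1 * c2 - b2 * c1))
    # The sign convention assumes counterclockwise order; flip it otherwise.
    if orientation(a, b, c) < 0:
        det = -det
    return det > 0


def is_delaunay(points, triangles):
    """Check that no point lies inside the circumcircle of any triangle."""
    for i, j, k in triangles:
        for m, p in enumerate(points):
            if m not in (i, j, k) and in_circumcircle(points[i], points[j], points[k], p):
                return False
    return True


if __name__ == "__main__":
    rhombus = [(-3.0, 0.0), (0.0, -1.0), (3.0, 0.0), (0.0, 1.0)]
    short_diagonal = [(0, 1, 3), (1, 2, 3)]  # split along the short diagonal
    long_diagonal = [(0, 1, 2), (0, 2, 3)]   # split along the long diagonal
    print(is_delaunay(rhombus, short_diagonal), is_delaunay(rhombus, long_diagonal))
```

Splitting the rhombus along its long diagonal produces thin triangles whose circumcircles contain the fourth vertex, so only the short-diagonal split satisfies the definition.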
The Voronoi diagram problem • A Delaunay triangulation:
Good and bad triangulation • Planless triangulation: very long triangles with small angles exist • Here: "as equilateral as possible" or "maximize the smallest angle" • Curso Caracas, 2006
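The "maximize the smallest angle" criterion can be illustrated by comparing the two possible triangulations of a four-point set. The points (a flat rhombus) and function names are made up for this sketch; it only measures angles and does not build a triangulation.

```python
import math


def triangle_angles(a, b, c):
    """Interior angles (in degrees) of the 2D triangle a, b, c."""
    def angle_at(p, q, r):
        v1 = (q[0] - p[0], q[1] - p[1])
        v2 = (r[0] - p[0], r[1] - p[1])
        dot = v1[0] * v2[0] + v1[1] * v2[1]
        cross = v1[0] * v2[1] - v1[1] * v2[0]
        return math.degrees(math.atan2(abs(cross), dot))
    return (angle_at(a, b, c), angle_at(b, c, a), angle_at(c, a, b))


def smallest_angle(points, triangles):
    """Smallest interior angle over all triangles of a triangulation."""
    return min(min(triangle_angles(points[i], points[j], points[k]))
               for i, j, k in triangles)


if __name__ == "__main__":
    rhombus = [(-3.0, 0.0), (0.0, -1.0), (3.0, 0.0), (0.0, 1.0)]
    short_diagonal = [(0, 1, 3), (1, 2, 3)]
    long_diagonal = [(0, 1, 2), (0, 2, 3)]
    print(round(smallest_angle(rhombus, short_diagonal), 2))
    print(round(smallest_angle(rhombus, long_diagonal), 2))
```

The split along the short diagonal has the larger smallest angle, so it is the "good" triangulation in the sense of this slide.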
Voronoi diagrams vs Delaunay triangulation? • They are so-called dual graphs in mathematical graph theory. That means both hold the same information content!