PDBe-fold (SSM)
A web-based service for protein structure comparison and structure searches
Gaurav Sahni, Ph.D.
Structure alignment
Structure alignment may be defined as the identification of residues occupying “equivalent” geometrical positions.
• Unlike in sequence alignment, residue type is neglected
• Used for
  • measuring structural similarity
  • protein classification and functional analysis
  • database searches
Sequence and Structure Alignments

Sequence alignment
• Based on residue identity, sometimes with a modified alphabet
• Used for:
  • evolution studies
  • protein function analysis
  • inferring structural similarity
• Algorithms: dynamic programming + heuristics
• Applications: BLAST, FASTA, FLASH and others

Structure alignment
• Based on geometrical equivalence of residue positions; residue type is disregarded
• Used for:
  • protein function analysis
  • some aspects of evolution studies
• Algorithms: dynamic programming, graph theory, Monte Carlo (MC), geometric hashing and others
• Applications: DALI, VAST, CE, MASS, SSM and others

Example of an aligned pair of sequences (gaps shown as "-"):
--AARNEDDDGKMPSTF-L
E-AARNFG-DGK--STFIL
Methods
• Many methods are known:
  • Distance matrix alignment (DALI, Holm & Sander, EBI)
  • Vector alignment (VAST, Bryant et al., NCBI)
  • Depth-first recursive search on SSEs (DEJAVU, Madsen & Kleywegt, Uppsala)
  • Combinatorial extension (CE, Shindyalov & Bourne, SDSC)
  • Dynamic programming on Cα (Gerstein & Levitt)
  • Dynamic programming on SSEs (SSA, Singh & Brutlag, Stanford University)
  • many more …
• SSM employs a 2-step procedure:
  • Initial structure alignment and superposition using SSE graph matching
  • Cα-alignment
Three-dimensional graph matching
• Protein secondary structure elements (SSEs) are natural and convenient objects for building three-dimensional graphs.
• Secondary structure provides most of the functionality and is conserved through evolution.
• Details of a protein fold are expressed in terms of two types of SSE: helices and strands.
Graph representation of SSEs
[Figure: two vertices, Vi and Vj, joined by an edge e; labels r1, r2, α1, α2 and L]
• SSEs are represented by vectors
• Each SSE is used as a graph vertex, labelled (Ti, ρi)
• Any two vertices are connected by an edge, whose label describes the position and orientation of the connected SSEs
• Each edge is labelled with a property vector: the angles α1 and α2 between the edge and each of the two vertices, the torsion angle between the vertices, and the length L of the edge
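The edge label described above can be sketched in a few lines of Python. This is a minimal illustration, not SSM's implementation: it assumes each SSE is given as a (start, end) pair of 3D points, that the edge joins the midpoints of the two SSE vectors, and that the torsion is the signed dihedral angle of the two vectors about the edge. The function name `edge_label` and the dictionary keys are hypothetical.

```python
import math

def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def _norm(a):
    return math.sqrt(_dot(a, a))

def _angle(u, v):
    # Angle between two vectors, clamped against rounding error.
    cos_uv = _dot(u, v) / (_norm(u) * _norm(v))
    return math.acos(max(-1.0, min(1.0, cos_uv)))

def _midpoint(a, b):
    return tuple((x + y) / 2 for x, y in zip(a, b))

def edge_label(sse1, sse2):
    """Return L, alpha1, alpha2 and tau for two SSE vectors.

    Assumption: the edge joins the SSE midpoints, which must be distinct.
    """
    (s1, e1), (s2, e2) = sse1, sse2
    v1, v2 = _sub(e1, s1), _sub(e2, s2)
    edge = _sub(_midpoint(s2, e2), _midpoint(s1, e1))
    length = _norm(edge)
    axis = tuple(x / length for x in edge)
    # Components of each SSE vector perpendicular to the edge.
    p1 = _sub(v1, tuple(_dot(v1, axis) * x for x in axis))
    p2 = _sub(v2, tuple(_dot(v2, axis) * x for x in axis))
    # Signed dihedral angle about the edge.
    tau = math.atan2(_dot(axis, _cross(p1, p2)), _dot(p1, p2))
    return {"L": length,
            "alpha1": _angle(v1, edge),
            "alpha2": _angle(v2, edge),
            "tau": tau}

helix = ((0.0, 0.0, -1.0), (0.0, 0.0, 1.0))
strand = ((5.0, -1.0, 0.0), (5.0, 1.0, 0.0))
mirrored_strand = ((5.0, 1.0, 0.0), (5.0, -1.0, 0.0))  # reflection in the plane y = 0

label = edge_label(helix, strand)
mirror_label = edge_label(helix, mirrored_strand)
```

In this toy case the mirrored pair has the same L, α1 and α2 as the original but a torsion angle of opposite sign, so only the torsion term tells the two arrangements apart.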
• The sets of vertices, edges and their labels provide a full definition of the graph.
• A graph matching algorithm is required: a set of rules for comparing individual vertices and edges, with tolerances chosen empirically.
• Both relative and absolute vertex and edge lengths are used for comparison, which allows larger absolute differences for longer vertices and edges.
• Torsion angles are compared to distinguish mirror-symmetry mates.
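One way to picture such comparison rules is sketched below. The tolerance values, the function names and the exact form of the length test are assumptions made for illustration; the slides say only that the tolerances are chosen empirically and do not give them.

```python
import math

def lengths_match(a, b, abs_tol=2.0, rel_tol=0.3):
    """Hypothetical rule: accept a difference up to abs_tol, or up to
    rel_tol times the longer length, whichever is larger."""
    return abs(a - b) <= max(abs_tol, rel_tol * max(a, b))

def angles_match(a, b, tol):
    """Compare two angles in radians, taking periodicity into account."""
    d = abs(a - b) % (2 * math.pi)
    if d > math.pi:
        d = 2 * math.pi - d
    return d <= tol

def edges_match(e1, e2, angle_tol=0.5, torsion_tol=0.5):
    """Edges are dicts with keys L, alpha1, alpha2, tau (illustrative layout)."""
    return (lengths_match(e1["L"], e2["L"])
            and angles_match(e1["alpha1"], e2["alpha1"], angle_tol)
            and angles_match(e1["alpha2"], e2["alpha2"], angle_tol)
            # The signed torsion is what rejects mirror-image arrangements.
            and angles_match(e1["tau"], e2["tau"], torsion_tol))

edge = {"L": 5.0, "alpha1": 1.5, "alpha2": 1.6, "tau": math.pi / 2}
similar = {"L": 6.0, "alpha1": 1.4, "alpha2": 1.7, "tau": math.pi / 2 - 0.2}
mirror = {"L": 5.0, "alpha1": 1.5, "alpha2": 1.6, "tau": -math.pi / 2}

same_fold = edges_match(edge, similar)
mirror_accepted = edges_match(edge, mirror)
```

With these made-up tolerances, a 10 Å difference is accepted between lengths of 40 and 50 but a 4 Å difference is rejected between lengths of 4 and 8, which is the behaviour the slide describes for longer vertices and edges.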
SSE graph matching
[Figure: SSE graphs of two structures, A and B; vertices labelled H1–H6 and S1–S7]
Matching the SSE graphs yields a correspondence between secondary structure elements, that is, groups of residues. The correspondence may be used as an initial guess for structure superposition and alignment of individual residues.
What next?
• We have considered the three-dimensional arrangement of secondary structure elements (SSEs) regardless of their ordering in the protein chain.
• Connectivity of SSEs is significant (it can be neglected when comparing mutated/engineered proteins).
• In previous methods connectivity was either preserved or neglected.
PDBefold (SSM) approach: a more flexible way
There are three options:
1) Connectivity of SSEs neglected: the SSEs may be connected differently, but the SSE graphs are geometrically identical
2) Soft connectivity: the general order of SSEs along their protein chains is the same in both structures, but any number of missing/unmatched SSEs between matched ones is allowed
3) Strict connectivity: matched SSEs follow the same order along their protein chains and are separated only by an equal number of matched/unmatched SSEs in both structures

To obtain a 3D alignment of individual residues, represent them by their C-alpha (Cα) atoms and use the results of graph matching as a starting point.
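The three connectivity options can be read as constraints on the list of matched SSE pairs. The sketch below is one possible interpretation, not SSM's code: it assumes a one-to-one matching given as (i, j) pairs, where i and j are the positions of the matched SSEs along chains A and B, and the function name and mode strings are hypothetical.

```python
def satisfies_connectivity(pairs, mode):
    """Check a one-to-one SSE matching against a connectivity option.

    pairs: list of (i, j), the positions of matched SSEs along chain A and chain B.
    mode:  "none", "soft" or "strict".
    """
    if mode not in ("none", "soft", "strict"):
        raise ValueError("unknown connectivity mode: " + mode)
    if mode == "none":
        return True  # chain order is ignored entirely
    ordered = sorted(pairs)  # walk along chain A
    for (i1, j1), (i2, j2) in zip(ordered, ordered[1:]):
        if j2 <= j1:
            return False  # order along chain B is reversed
        if mode == "strict" and (i2 - i1) != (j2 - j1):
            return False  # unequal number of SSEs between the matched ones
    return True

same_order_with_gap = [(0, 0), (1, 2), (3, 3)]
swapped = [(0, 2), (1, 0)]
equal_spacing = [(0, 1), (1, 2), (3, 4)]
```

Here `same_order_with_gap` passes the soft check but fails the strict one, because one extra SSE sits between the first two matches in chain B only; `swapped` passes only when connectivity is neglected.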
Cα-alignment
[Figure: chain A and chain B, with matched helices and matched strands]
• SSE-alignment is used as an initial guess for Cα-alignment
• Cα-alignment is an iterative procedure based on the expansion of shortest contacts at best superposition of structures
• Cα-alignment is a compromise between the alignment length Nalign and the r.m.s.d. The longest contacts are unmapped in order to maximise the Q-score.
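The Q-score formula did not survive on the slide. The sketch below uses the form given in the published SSM description, Q = Nalign² / ((1 + (r.m.s.d./R0)²) · N1 · N2) with R0 = 3 Å, where N1 and N2 are the numbers of residues in the two structures; treat it as an assumption about what the slide showed. The function and parameter names are illustrative.

```python
def q_score(n_align, rmsd, n_res1, n_res2, r0=3.0):
    """Q = n_align^2 / ((1 + (rmsd / r0)^2) * n_res1 * n_res2).

    rmsd and r0 are in angstroms; r0 = 3.0 is the assumed published default.
    Q reaches 1.0 only for identical structures aligned over their full length.
    """
    return n_align ** 2 / ((1.0 + (rmsd / r0) ** 2) * n_res1 * n_res2)

# Two candidate alignments of the same pair of 100-residue structures:
long_but_loose = q_score(n_align=100, rmsd=3.0, n_res1=100, n_res2=100)
short_but_tight = q_score(n_align=90, rmsd=1.5, n_res1=100, n_res2=100)
```

In this made-up example, dropping the ten worst-fitting contacts lowers Nalign but lowers the r.m.s.d. enough that Q rises, which is the compromise between alignment length and r.m.s.d. described above.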
Multiple structure alignment
• More than two structures are aligned simultaneously
• Multiple alignment is not equal to the set of all-to-all pairwise alignments
• Helps to identify common structure motifs for a whole family of structures
If you have to ask…
• Are there any structures in the PDB that are similar to mine?
• What SCOP and/or CATH family could my structure belong to?
• Can I get some idea about the possible function of my protein from its structural similarity to others?
• Can I get a multiple alignment of many of my structures?
Use PDBefold. Upload your own PDB file for analysis!
31.10.07 Macromolecular Structure Database
SSM output
• Table of matched Secondary Structure Elements
• Table of matched backbone Cα-atoms with distances between them at best structure superposition
• Rotation-translation matrix of best structure superposition
• Visualisation in Jmol and Rasmol
• r.m.s.d. of Cα-alignment
• Length of Cα-alignment Nalign
• Number of gaps in Cα-alignment
• Quality score Q
• Statistical significance scores P(S), Z
• Sequence identity
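Two of these outputs, the rotation-translation matrix and the r.m.s.d., are easy to reproduce by hand. The sketch below assumes the transformation is supplied as a 3×3 rotation matrix plus a translation vector and is applied as x' = R·x + t; the actual layout of the matrix in the service's output is not given in the slides, and the function names are hypothetical.

```python
import math

def apply_transform(rot, trans, point):
    """Return rot * point + trans for a 3x3 rotation and a 3-vector translation."""
    return tuple(sum(rot[r][c] * point[c] for c in range(3)) + trans[r]
                 for r in range(3))

def rmsd(coords_a, coords_b):
    """Root-mean-square deviation over paired coordinates of equal length."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    squared = sum((a - b) ** 2
                  for p, q in zip(coords_a, coords_b)
                  for a, b in zip(p, q))
    return math.sqrt(squared / len(coords_a))

# A 90-degree rotation about z followed by a shift of 1 along x.
rotation = [[0.0, -1.0, 0.0],
            [1.0, 0.0, 0.0],
            [0.0, 0.0, 1.0]]
translation = (1.0, 0.0, 0.0)

moved = apply_transform(rotation, translation, (1.0, 0.0, 0.0))
deviation = rmsd([(0.0, 0.0, 0.0)], [(3.0, 4.0, 0.0)])
```

Applying the transformation to the Cα coordinates of one structure and then calling `rmsd` on the matched pairs should give a value comparable to the reported r.m.s.d. of the Cα-alignment, provided the matrix convention matches the assumption above.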
Conclusion
• It is quite possible that residue identity plays a much less significant role in protein structure than is often believed
• As a consequence, the role of residue identity in protein function may often be overestimated
• Using sequence identity to assess structural or functional features may give more false negatives than expected
• Physical-chemical properties of residues should be given preference over residue identity in structure and function analysis
• Modern methods for structure alignment are efficient; there is little sense in using sequence alignment in structure-related studies