National Dong Hwa University, Taiwan Graph-theoretic Approach for Protein Similarity Sheng-Lung Peng Graph and Bioinformatics Algorithms Lab.
Outline • Introduction • Biological concept • Protein structural alignment • Methods • Graph-theoretic approach • Protein graph transformation • Results • Conclusion
Why Protein Structures? • Proteins are fundamental components of all living cells, performing a variety of biological tasks. • Proteins reflect millions of years of evolution. • Each protein has a particular 3D structure that determines its function. • Protein structure is more conserved than protein sequence, and more closely related to function.
Structure – Sequence Relationships • Two conserved sequences → similar structures. • Two similar structures → conserved sequences? Not necessarily: there are cases of proteins with the same structure but no clear sequence similarity.
Protein Database • PDB: Protein Data Bank • Holds 3D models of biological macromolecules (protein, RNA, DNA). • All data are available to the public.
Protein Database • PDB: Protein Data Bank • Structures are obtained by X-ray crystallography (84%) or NMR spectroscopy (16%). • Submitted by biologists and biochemists from around the world.
Protein Database • PDB - model • A model defines the 3D positions of atoms in one or more molecules. • There are models of proteins, protein complexes, proteins and DNA, protein segments, etc. • The models also include the positions of ligand molecules, solvent molecules, metal ions, etc.
Protein Database http://www.pdb.org/pdb/home/home.do
Protein Database • The PDB file: text format • Each record lists the atom number, atom identity, residue identity, chain, residue number, and the X, Y, Z coordinates for each residue in the structure. • ATOM: usually protein or DNA • HETATM: usually ligand, ion, water
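The record layout above can be sketched in code. This is a minimal sketch, assuming the standard fixed-width column positions of PDB ATOM/HETATM records; the function name `parse_atom_record` and the example values (a lysine backbone nitrogen with made-up coordinates) are illustrative and not taken from the slides.

```python
def parse_atom_record(line):
    """Parse one fixed-width ATOM/HETATM record into a dict, or return None."""
    record = line[0:6].strip()
    if record not in ("ATOM", "HETATM"):
        return None
    return {
        "record": record,                      # ATOM or HETATM
        "atom_number": int(line[6:11]),        # columns 7-11
        "atom_name": line[12:16].strip(),      # columns 13-16
        "residue_name": line[17:20].strip(),   # columns 18-20
        "chain": line[21],                     # column 22
        "residue_number": int(line[22:26]),    # columns 23-26
        "x": float(line[30:38]),               # columns 31-38
        "y": float(line[38:46]),               # columns 39-46
        "z": float(line[46:54]),               # columns 47-54
    }

# Build an illustrative record field by field so the column widths are explicit.
example = (
    "ATOM  " + "1".rjust(5) + " " + " N  " + " " + "LYS" + " " + "A"
    + "1".rjust(4) + " " * 4
    + "11.104".rjust(8) + "6.134".rjust(8) + "-6.504".rjust(8)
)
atom = parse_atom_record(example)
print(atom["residue_name"], atom["chain"], atom["x"], atom["y"], atom["z"])
```

Slicing by column rather than splitting on whitespace matters here, because adjacent fields in real PDB files can run together without a separating space.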
Outline • Introduction • Biological concept • Protein structural alignment • Methods • Graph-theoretic approach • Protein graph transformation • Results • Conclusion
Structural Alignment • Why structural alignment? • Structural similarity can point to remote evolutionary relationship. • Shared structural motifs among proteins suggest similar biological function. • Getting insight into sequence-structure mapping (e.g., which parts of the protein structure are conserved among related organisms).
Structural Alignment • Intermolecular • Inter – methods which superpose protein structures and measure intermolecular distances. • Compare geometric “properties”, for example residue positions in 3D coordinate space.
Structural Alignment • Intramolecular • Intra – methods which compare intramolecular distances or vectors. • Align protein structures on the basis of information about internal ‘relationships’ within each protein.
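One common internal "relationship" is the matrix of pairwise distances within a single structure. The sketch below is an illustration under that assumption, not necessarily the representation used in this talk; the function names and the sample coordinates are hypothetical. It shows why intramolecular comparison needs no superposition: the distance matrix does not change when the structure is rotated or translated.

```python
import math

def distance_matrix(points):
    """Pairwise Euclidean distances between all points of one structure."""
    return [[math.dist(p, q) for q in points] for p in points]

def max_abs_difference(m1, m2):
    """Largest entry-wise difference between two equally sized matrices."""
    return max(abs(a - b) for r1, r2 in zip(m1, m2) for a, b in zip(r1, r2))

# Hypothetical coordinates, one point per residue.
structure = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (1.5, 2.0, 0.0), (0.0, 2.0, 1.0)]

# The same structure rotated 90 degrees about the z axis and then translated.
moved = [(-y + 5.0, x - 3.0, z + 2.0) for x, y, z in structure]

difference = max_abs_difference(distance_matrix(structure), distance_matrix(moved))
print(difference)  # zero up to floating-point rounding
```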
Structural Alignment • Figure: intermolecular (INTER) versus intramolecular (INTRA) comparison
Structural Alignment • Global versus local alignment Global alignment
Structural Alignment • Global versus local alignment • Local alignment (of a motif)
Structural Alignment • Find an optimal alignment
Structural Alignment What is the best transformation that superimposes the unicorn on the lion?
Structural Alignment • Solution Regard the shapes as sets of points and try to “match” these sets using a transformation.
Structural Alignment This is not a good result
Structural Alignment Good result
Structural Alignment • Kinds of transformations: • Translation • Rotation • Scaling • and more….
Structural Alignment • Translation (figure with X and Y axes)
Structural Alignment • Rotation (figure with X and Y axes)
Structural Alignment • Scaling (figure with X and Y axes)
Structural Alignment • Translation • Scale • Rotation • Example, translation by d along the x axis: (x1, y1, z1), (x2, y2, z2), (x3, y3, z3) → (x1 + d, y1, z1), (x2 + d, y2, z2), (x3 + d, y3, z3)
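The three transformations can be written as small functions on lists of (x, y, z) points. This is a minimal sketch; the function names are illustrative, and rotation is shown about the z axis only to keep the example short.

```python
import math

def translate(points, dx, dy, dz):
    """Shift every point by the same offset."""
    return [(x + dx, y + dy, z + dz) for x, y, z in points]

def scale(points, factor):
    """Multiply every coordinate by the same factor (about the origin)."""
    return [(x * factor, y * factor, z * factor) for x, y, z in points]

def rotate_z(points, angle):
    """Rotate every point about the z axis by `angle` radians."""
    c, s = math.cos(angle), math.sin(angle)
    return [(c * x - s * y, s * x + c * y, z) for x, y, z in points]

points = [(1.0, 0.0, 0.0), (0.0, 2.0, 0.0), (0.0, 0.0, 3.0)]
shifted = translate(points, 1.0, 0.0, 0.0)   # translation by d = 1 along x
doubled = scale(points, 2.0)
turned = rotate_z(points, math.pi / 2)       # quarter turn about z
```

Translation and rotation preserve all distances between points, which is why they are the transformations used when superimposing two proteins; scaling does not preserve distances.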
Structural Alignment • A protein can be represented as a geometric object in 3D space. • The object consists of points represented by coordinates (x, y, z), for example one point per residue: Lys, Thr, Met, Gly, Glu, Ala.
Structural Alignment The problem: Given two proteins, find a transformation that produces a best superimposition of one protein onto the other.
Structural Alignment • Correspondence is Unknown • Given two configurations of points in three-dimensional space.
Structural Alignment • Correspondence is Unknown • Find those rotations and translations of one of the point sets which produce “large” superimpositions of corresponding 3D points.
Structural Alignment The best transformation
Structural Alignment • Simple case • Two closely related proteins with the same number of amino acids.
Structural Alignment • Methods to superimpose structures • Step 1: Translate both proteins to a common position in the coordinate frame of reference. • Step 2: Rotate one protein relative to the other protein, around the three major axes. • Step 3: Measure the distances between equivalent positions in the two proteins.
Structural Alignment • Methods to superimpose structures • Steps 2 and 3 are repeated until there is a convergence on a minimum separation between the superposed structures.
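The procedure above can be sketched as a naive greedy search. This is an illustration under several assumptions, not the method used in this talk: both structures have the same number of points in corresponding order, the "common position" is the centroid at the origin, and the separation is the root-mean-square distance between equivalent positions. All function names are hypothetical. A greedy search like this can stall in a local minimum; closed-form solutions such as the Kabsch algorithm are normally preferred in practice.

```python
import math

def center(points):
    """Step 1: move the centroid of a point set to the origin."""
    n = len(points)
    c = [sum(p[k] for p in points) / n for k in range(3)]
    return [tuple(p[k] - c[k] for k in range(3)) for p in points]

def rotate(points, axis, angle):
    """Step 2: rotate about a coordinate axis (0 = x, 1 = y, 2 = z)."""
    c, s = math.cos(angle), math.sin(angle)
    i, j = [(1, 2), (2, 0), (0, 1)][axis]
    rotated = []
    for p in points:
        q = list(p)
        q[i] = c * p[i] - s * p[j]
        q[j] = s * p[i] + c * p[j]
        rotated.append(tuple(q))
    return rotated

def separation(a, b):
    """Step 3: root-mean-square distance between equivalent positions."""
    return math.sqrt(sum(math.dist(p, q) ** 2 for p, q in zip(a, b)) / len(a))

def superimpose(mobile, target, step=0.5, tol=1e-7):
    """Repeat steps 2 and 3, shrinking the rotation step until it converges."""
    mobile, target = center(mobile), center(target)
    best = separation(mobile, target)
    while step > tol:
        improved = False
        for axis in range(3):
            for angle in (step, -step):
                trial = rotate(mobile, axis, angle)
                score = separation(trial, target)
                if score < best:
                    mobile, best, improved = trial, score, True
        if not improved:
            step /= 2  # no rotation of this size helps, so try finer ones
    return mobile, best

# Hypothetical structure and a rotated, translated copy of it.
reference = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (5.0, 3.5, 0.5),
             (3.0, 5.5, 3.0), (0.5, 4.0, 5.5)]
copy = [(x + 10.0, y - 4.0, z + 2.0)
        for x, y, z in rotate(rotate(reference, 2, 0.4), 0, 0.3)]
fitted, final_separation = superimpose(copy, reference)
print(final_separation)
```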
Structural Alignment • Scoring system to find an optimal alignment • Answer: Root Mean Square Deviation, RMSD = sqrt((1/n) Σ d_i²). • n = number of atoms. • d_i = distance between the i-th pair of corresponding atoms in the two structures.
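With n and d_i as defined above, RMSD is the square root of the mean of the squared distances d_i. A minimal sketch, assuming the two structures are given as equally long lists of (x, y, z) points in corresponding order; the function name `rmsd` and the sample coordinates are illustrative.

```python
import math

def rmsd(structure_a, structure_b):
    """Root mean square deviation between corresponding atoms."""
    if len(structure_a) != len(structure_b) or not structure_a:
        raise ValueError("structures must be non-empty and of equal length")
    squared = [math.dist(p, q) ** 2 for p, q in zip(structure_a, structure_b)]
    return math.sqrt(sum(squared) / len(squared))

a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
b = [(3.0, 4.0, 0.0), (1.0, 0.0, 0.0)]
print(rmsd(a, a))  # identical structures
print(rmsd(a, b))  # one atom displaced by 5 units, the other by 0
```

RMSD only measures how well two structures already overlap; it says nothing about which transformation to apply first, so it is normally evaluated after superposition.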
Structural Alignment • Figure: two structures with corresponding atoms numbered 1 to 5
Structural Alignment • Scoring system to find optimal alignment • Find a 3D transformation T* such that:
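One common way to write this objective, given here as an assumption about what the slide intends: let a_i and b_i (i = 1, ..., n) be the corresponding atoms of the two structures, and let T range over rigid transformations (a rotation followed by a translation). The symbols a_i and b_i are introduced here for illustration only.

```latex
T^{*} \;=\; \arg\min_{T} \; \sqrt{\frac{1}{n} \sum_{i=1}^{n} \bigl\lVert T(a_i) - b_i \bigr\rVert^{2}}
```

The quantity under the arg min is the RMSD with d_i = ||T(a_i) - b_i||, so T* is the transformation that gives the lowest RMSD for the chosen correspondence.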
Structural Alignment • Optimal Alignment • Find the highest number of atoms aligned with the lowest RMSD. • Find a balance between local regions with very good alignments and overall alignment.
Structural Alignment • RMSD is a distance, measured e.g. in ångströms (Å). • Identical structures → RMSD = 0. • Similar structures → RMSD is small (1–3 Å). • Distant structures → RMSD > 3 Å.
Structural Alignment • Pitfalls of RMSD • All atoms are treated equally (e.g., residues on the surface have a higher degree of freedom than those in the core). • Best alignment does not always mean minimal RMSD. • Does not take into account the attributes of the amino acids.
Outline • Introduction • Biological concept • Protein structural alignment • Methods • Graph-theoretic approach • Protein graph transformation • Results • Conclusion
Graph Matching • Graph Theory • Computer Science • Computational Biology
Notions of Graph Matching • Isomorphism: identifying a bijection between the vertices of two graphs that preserves adjacency.
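For small undirected graphs, isomorphism can be checked by trying every bijection between the vertex sets and testing whether it maps the edges of one graph exactly onto the edges of the other. This brute-force sketch only illustrates the definition and is not the matching method of this talk; the function name and the example graphs are hypothetical, and the running time grows factorially with the number of vertices.

```python
from itertools import permutations

def find_isomorphism(nodes_a, edges_a, nodes_b, edges_b):
    """Return a vertex bijection that preserves adjacency, or None."""
    if len(nodes_a) != len(nodes_b):
        return None
    target = {frozenset(edge) for edge in edges_b}
    for candidate in permutations(nodes_b):
        mapping = dict(zip(nodes_a, candidate))
        mapped = {frozenset((mapping[u], mapping[v])) for u, v in edges_a}
        if mapped == target:
            return mapping
    return None

# A path a-b-c and the same path with different labels (3 is the middle vertex).
path_nodes, path_edges = ["a", "b", "c"], [("a", "b"), ("b", "c")]
other_nodes, other_edges = [1, 2, 3], [(1, 3), (3, 2)]
triangle_edges = [(1, 2), (2, 3), (1, 3)]

same_shape = find_isomorphism(path_nodes, path_edges, other_nodes, other_edges)
different = find_isomorphism(path_nodes, path_edges, other_nodes, triangle_edges)
print(same_shape, different)
```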