Topics in bioinformatics CS697, Spring 2011. Class 12 – Mar-22-2011. Molecular distance measurements. Molecular transformations.
Rotations in 3D – Euler angles
* Rotate the XYZ-system about the Z-axis by α. The X-axis now lies on the line of nodes.
* Rotate the XYZ-system again about the now rotated X-axis by β. The Z-axis is now in its final orientation, and the X-axis remains on the line of nodes.
* Rotate the XYZ-system a third time about the new Z-axis by γ.
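The three steps above can be sketched as a product of elementary rotation matrices. This is a minimal sketch that assumes the Z-X-Z convention described on the slide, angles in radians, and active rotation matrices acting on column vectors; the helper names (`rot_z`, `rot_x`, `euler_zxz`, `apply`) are illustrative and not taken from the slides.

```python
import math

def rot_z(t):
    """Rotation by angle t (radians) about the Z-axis."""
    c, s = math.cos(t), math.sin(t)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

def rot_x(t):
    """Rotation by angle t (radians) about the X-axis."""
    c, s = math.cos(t), math.sin(t)
    return [[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]]

def matmul(a, b):
    """Product of two 3x3 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def euler_zxz(alpha, beta, gamma):
    """Z-X-Z Euler rotation. Rotations about the already-rotated axes
    compose by right-multiplication: R = Rz(alpha) Rx(beta) Rz(gamma)."""
    return matmul(matmul(rot_z(alpha), rot_x(beta)), rot_z(gamma))

def apply(r, v):
    """Apply a 3x3 matrix to a 3-vector."""
    return [sum(r[i][k] * v[k] for k in range(3)) for i in range(3)]

# Columns of R are the final X, Y and Z axes expressed in the original frame.
R = euler_zxz(math.pi / 2, math.pi / 2, 0.0)
new_z_axis = apply(R, [0.0, 0.0, 1.0])
```

Other Euler conventions (for example Z-Y-Z) differ only in which elementary rotations are multiplied, so the same structure applies.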
Quaternions
Extension of complex numbers: $h = a + bi + cj + dk$, with a, b, c, d real numbers and i, j, k imaginary components such that:
$i^2 = j^2 = k^2 = -1$
$ij = k,\ jk = i,\ ki = j$
$ij = -ji,\ jk = -kj,\ ki = -ik$
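The multiplication rules above fully determine the product of two quaternions. A minimal sketch, assuming a quaternion is stored as a tuple `(a, b, c, d)` for a + bi + cj + dk (the tuple layout and the name `qmul` are choices made here for illustration):

```python
def qmul(p, q):
    """Hamilton product of two quaternions stored as (a, b, c, d)."""
    a1, b1, c1, d1 = p
    a2, b2, c2, d2 = q
    return (
        a1 * a2 - b1 * b2 - c1 * c2 - d1 * d2,  # real part
        a1 * b2 + b1 * a2 + c1 * d2 - d1 * c2,  # i component
        a1 * c2 - b1 * d2 + c1 * a2 + d1 * b2,  # j component
        a1 * d2 + b1 * c2 - c1 * b2 + d1 * a2,  # k component
    )

# Basis elements
I = (0, 1, 0, 0)
J = (0, 0, 1, 0)
K = (0, 0, 0, 1)

ij = qmul(I, J)   # equals K
ji = qmul(J, I)   # equals -K: the product is not commutative
```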
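Unit quaternions
• Unit quaternions: $h = a + bi + cj + dk$ with $a^2 + b^2 + c^2 + d^2 = 1$.
• They form a group under quaternion multiplication.
• Each unit quaternion can be mapped to a rotation in 3D: $h = \cos(\theta/2) + \sin(\theta/2)\,(u_x i + u_y j + u_z k)$ corresponds to a rotation by $\theta$ about the unit axis $u = (u_x, u_y, u_z)$.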
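As an illustration of the mapping from unit quaternions to rotations, the sketch below builds a unit quaternion from an axis and an angle and converts it to a 3x3 rotation matrix using the standard quaternion-to-matrix formula. The function names and the `(a, b, c, d)` tuple layout are assumptions made for this example.

```python
import math

def quat_from_axis_angle(axis, theta):
    """Unit quaternion (a, b, c, d) for a rotation by theta about `axis`."""
    x, y, z = axis
    n = math.sqrt(x * x + y * y + z * z)
    s = math.sin(theta / 2.0) / n
    return (math.cos(theta / 2.0), x * s, y * s, z * s)

def quat_to_matrix(q):
    """Rotation matrix of a unit quaternion q = a + bi + cj + dk."""
    a, b, c, d = q
    return [
        [a*a + b*b - c*c - d*d, 2 * (b*c - a*d),       2 * (b*d + a*c)],
        [2 * (b*c + a*d),       a*a - b*b + c*c - d*d, 2 * (c*d - a*b)],
        [2 * (b*d - a*c),       2 * (c*d + a*b),       a*a - b*b - c*c + d*d],
    ]

# A quarter turn about the Z-axis sends the X-axis to the Y-axis.
q = quat_from_axis_angle((0.0, 0.0, 1.0), math.pi / 2)
R = quat_to_matrix(q)
```

Note that q and -q give the same matrix, so the mapping from unit quaternions to rotations is two-to-one.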
Some resources http://mathworld.wolfram.com/RotationMatrix.html http://mathworld.wolfram.com/EulerAngles.html http://mathworld.wolfram.com/Quaternion.html
Measuring protein structure similarity
Given two “shapes” or structures A and B, we are interested in defining a distance, or similarity measure, between A and B.
• Visual comparison
• Dihedral angle comparison
• Distance matrix
• RMSD (root mean square deviation)
Is the resulting distance (similarity measure) D a metric? In particular, does it satisfy the triangle inequality D(A,B) ≤ D(A,C) + D(C,B)?
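Comparing dihedral angles
Torsion angles (φ, ψ) are:
- local by nature
- invariant upon rotation and translation of the molecule
- compact (O(n) angles for a protein of n residues)
But… adding 1 degree to all torsion angles changes every angle only slightly, while the small changes can accumulate along the chain into a large change of the overall 3D structure.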
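A torsion angle is defined by four consecutive atoms. The following is a minimal sketch of the usual atan2-based formula; the function name `dihedral` and the convention of returning degrees in (-180, 180] are choices made for this example.

```python
import math

def _sub(u, v):
    return (u[0] - v[0], u[1] - v[1], u[2] - v[2])

def _dot(u, v):
    return u[0] * v[0] + u[1] * v[1] + u[2] * v[2]

def _cross(u, v):
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])

def dihedral(p0, p1, p2, p3):
    """Torsion angle (degrees) about the p1-p2 bond for four points."""
    b1, b2, b3 = _sub(p1, p0), _sub(p2, p1), _sub(p3, p2)
    n1, n2 = _cross(b1, b2), _cross(b2, b3)   # normals of the two planes
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))

# The value depends only on relative positions, so translating (or rotating)
# all four points leaves it unchanged.
angle = dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1))
```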
Internal distance matrix
[Figure: four points labeled 1 to 4, with example pairwise distances 5.9, 8.1 and 6.0]
Internal distance matrix (2)
• Advantages
- invariant with respect to rotation and translation
- can be used to compare proteins
• Disadvantages
- the distance matrix is O(n²) for a protein with n residues
- comparing distance matrices is a hard problem
- insensitive to chirality
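A small sketch of the internal distance matrix, illustrating the chirality point: a structure and its mirror image have identical distance matrices. The coordinates are made up for illustration, and `math.dist` requires Python 3.8 or later.

```python
import math

def distance_matrix(points):
    """All pairwise Euclidean distances: an n x n symmetric matrix."""
    return [[math.dist(p, q) for q in points] for p in points]

# Hypothetical coordinates for four points
points = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (0.0, 2.5, 0.0), (1.0, 1.0, 3.0)]
# Mirror image through the YZ-plane (x -> -x)
mirror = [(-x, y, z) for (x, y, z) in points]

d_orig = distance_matrix(points)
d_mirr = distance_matrix(mirror)

# Largest entry-wise difference between the two matrices (zero up to rounding)
max_diff = max(abs(a - b)
               for row_a, row_b in zip(d_orig, d_mirr)
               for a, b in zip(row_a, row_b))
```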
Quality Assessment through RMSD
• RMSD: Root Mean Squared Deviation
• one of the simplest measures to quantify how different two protein conformations really are
• advantage: simple to compute by representing conformations as 3N vectors (N atoms)
• limitations:
1) limited to conformations of the same protein chain
2) atom-atom correspondence needed on different-length chains
3) not very descriptive if changes are localized
• RMSD: Average atomic distance
• given two conformations of a chain of N atoms
• represent the conformations as two 3N vectors x and y
• RMSD(x,y) is the Euclidean distance between x and y, averaged over the N atoms: $\mathrm{RMSD}(x,y) = \sqrt{\frac{1}{N}\sum_{i=1}^{N} \| x_i - y_i \|^2}$, where $x_i$ and $y_i$ are the positions of atom i
• lRMSD: least RMSD
• the same conformation rigidly transformed in space (translated or rotated) should give an RMSD of 0
• before computing RMSD(x,y) one needs to remove changes due to rigid-body transformations
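A minimal sketch of the plain RMSD, without any superposition, shows why the rigid-body part has to be removed first: a pure translation of an identical conformation already gives a nonzero value. The coordinates are hypothetical.

```python
import math

def rmsd(x, y):
    """RMSD between two conformations given as equal-length lists of
    (x, y, z) atom positions, with atom i of x matched to atom i of y."""
    if len(x) != len(y) or not x:
        raise ValueError("conformations must be non-empty and equally long")
    sq = sum((a - b) ** 2 for p, q in zip(x, y) for a, b in zip(p, q))
    return math.sqrt(sq / len(x))

conf = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (1.5, 1.5, 0.0)]
# Same shape, shifted by 3 units along X
moved = [(px + 3.0, py, pz) for (px, py, pz) in conf]

same = rmsd(conf, conf)      # identical coordinates
shifted = rmsd(conf, moved)  # nonzero although the shape is unchanged
```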
Protein Structure Superposition
A rigid-body transformation T is a combination of a translation t and a rotation R: T(x) = Rx + t
The quantity to be minimized is:
$E = \sum_{i=1}^{N} \| R a_i + t - b_i \|^2$
where $a_i$ and $b_i$ are the corresponding points of the two point sets A and B.
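The translation part
E is minimum with respect to t when:
$\frac{\partial E}{\partial t} = 2 \sum_{i=1}^{N} (R a_i + t - b_i) = 0$
Then:
$t = \bar{b} - R \bar{a}$, with the centroids $\bar{a} = \frac{1}{N}\sum_{i} a_i$ and $\bar{b} = \frac{1}{N}\sum_{i} b_i$
If both data sets A and B have been centered on 0, then t = 0!
Step 1: Translate point sets A and B such that their centroids coincide at the origin of the reference frame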
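The rotation part
Let $\bar{a}$ and $\bar{b}$ be the centroids of A and B, and A' and B' the N×3 matrices containing the coordinates of the points of A and B centered on 0 (row i of A' is $a_i - \bar{a}$, row i of B' is $b_i - \bar{b}$).
Build the covariance matrix:
$C = A'^{T} B'$ (3×N times N×3 gives 3×3)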
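The rotation part
Compute the SVD (Singular Value Decomposition) of C:
$C = U D V^{T}$
U and V are orthogonal matrices, and D is a diagonal matrix containing the singular values. U, V and D are 3×3 matrices.
Define S by:
$S = I$ if $\det(C) > 0$, and $S = \mathrm{diag}(1, 1, -1)$ otherwise
Then, with C built as $A'^{T} B'$ so that the rotation takes A onto B:
$R = V S U^{T}$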
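The algorithm
1. Center the two point sets A and B (giving A' and B', with rows $a'_i$ and $b'_i$)
2. Build covariance matrix: $C = A'^{T} B'$
3. Compute SVD (Singular Value Decomposition) of C: $C = U D V^{T}$
4. Define S: $S = I$ if $\det(C) > 0$, $S = \mathrm{diag}(1, 1, -1)$ otherwise
5. Compute rotation matrix: $R = V S U^{T}$
6. Compute lRMSD: $\mathrm{lRMSD} = \sqrt{\frac{1}{N}\left(\sum_{i} \|a'_i\|^2 + \sum_{i} \|b'_i\|^2 - 2\,(d_1 + d_2 + s\,d_3)\right)}$, where $d_1 \ge d_2 \ge d_3$ are the singular values in D and $s = S_{33}$
O(N) in time!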
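The six steps can be sketched with NumPy as follows. This is a sketch under stated assumptions rather than a reference implementation: points are stored as rows of N×3 arrays, the covariance matrix is taken as the transpose of the centered A times the centered B, and the returned rotation maps the centered A onto the centered B. The function name `lrmsd` is hypothetical; the sign of the correction is taken from det(V Uᵀ), which agrees with the sign of det(C) whenever C is non-singular.

```python
import math
import numpy as np

def lrmsd(a, b):
    """Least RMSD between matched point sets a and b (N x 3 each).
    Returns (lrmsd_value, rotation) with rotation @ a_centered[i] ~ b_centered[i]."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    if a.shape != b.shape or a.ndim != 2 or a.shape[1] != 3:
        raise ValueError("expected two N x 3 arrays of equal shape")
    n = a.shape[0]
    a0 = a - a.mean(axis=0)               # step 1: center both sets
    b0 = b - b.mean(axis=0)
    c = a0.T @ b0                         # step 2: 3x3 covariance matrix
    u, d, vt = np.linalg.svd(c)           # step 3: c = u @ diag(d) @ vt
    s = np.eye(3)                         # step 4: forbid reflections
    if np.linalg.det(vt.T @ u.T) < 0:
        s[2, 2] = -1.0
    rot = vt.T @ s @ u.T                  # step 5: optimal proper rotation
    # step 6: residual from the singular values, no second pass over the rotation
    e = (a0 ** 2).sum() + (b0 ** 2).sum() - 2.0 * (d[0] + d[1] + s[2, 2] * d[2])
    return math.sqrt(max(e, 0.0) / n), rot
```

The `max(e, 0.0)` guard only protects against a slightly negative residual caused by floating-point rounding when the two structures superimpose almost exactly.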
Some reading material: http://cnx.org/content/m11608/latest/#RMSD
lRMSD has Shortcomings • lRMSD cannot capture localized changes: if a small perturbation occurs in a part of the structure, e.g. rotation of a hinge connecting two domains, lRMSD will report a large value • Main reason: lRMSD does not know how to attribute changes to specific atoms of the chain • lRMSD distributes change equally (through the averaging) to all atoms in a protein chain • Measuring conformational similarity is an active research area
Other Quality Assessment: Shape Similarity
• Sometimes assessment of cavities on the surface of a protein is more important than description of the rest of the structure, especially when the goal is prediction of a binding site rather than of the entire structure (which can be thought of as a scaffold)
• Methods that assess surface area and solvent accessible surface area, compute volumes, and detect cavities on proteins are very important in the context of binding and docking
Model each atom as a van der Waals (vdW) sphere; the union of these spheres gives the molecular surface.
Not all of the molecular surface is accessible to solvent. Rolling a solvent ball over the vdW spheres traces out the solvent accessible surface, whose area is the solvent accessible surface area (SASA).
SASA is important for quantifying the interactions of the protein.
Solvent-Accessible Surface Area (SASA)
• Computational geometry methods that use Delaunay triangulations and alpha shapes assess SASA and other geometric descriptors of molecular surfaces, volumes, and cavities
• We will come back to this topic in the context of molecular docking. Further reading about shape computing: http://cnx.org/content/m11616/latest/
[Figure: SASA for a 1.4 Å ball; SASA for a 1.5 Å ball]
Increasing the radius reduces the SASA, because a bulkier ball cannot penetrate as many cavities.
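To make the rolling-ball picture concrete, here is a minimal numerical sketch. It does not use Delaunay triangulations or alpha shapes; it swaps in the simpler Shrake-Rupley style point-sampling approach, which places test points on each atom's probe-expanded sphere and counts those not buried inside a neighbor's expanded sphere. The atom format `(x, y, z, vdw_radius)`, the default probe radius of 1.4 Å and the number of test points are assumptions for this example.

```python
import math

def sphere_points(n):
    """n roughly uniform points on the unit sphere (golden-spiral layout)."""
    golden = math.pi * (3.0 - math.sqrt(5.0))
    pts = []
    for i in range(n):
        z = 1.0 - 2.0 * (i + 0.5) / n
        r = math.sqrt(1.0 - z * z)
        pts.append((r * math.cos(i * golden), r * math.sin(i * golden), z))
    return pts

def sasa(atoms, probe=1.4, n_points=200):
    """Approximate SASA for atoms given as (x, y, z, vdw_radius) tuples.
    O(N^2 * n_points): fine for a sketch, too slow for large proteins."""
    unit = sphere_points(n_points)
    total = 0.0
    for i, (xi, yi, zi, ri) in enumerate(atoms):
        big_r = ri + probe                       # probe-expanded radius
        exposed = 0
        for ux, uy, uz in unit:
            px, py, pz = xi + big_r * ux, yi + big_r * uy, zi + big_r * uz
            buried = any(
                j != i and
                (px - xj) ** 2 + (py - yj) ** 2 + (pz - zj) ** 2 < (rj + probe) ** 2
                for j, (xj, yj, zj, rj) in enumerate(atoms)
            )
            if not buried:
                exposed += 1
        total += 4.0 * math.pi * big_r * big_r * exposed / n_points
    return total

# Two overlapping atoms expose less area than two isolated ones.
single = sasa([(0.0, 0.0, 0.0, 1.7)])
pair = sasa([(0.0, 0.0, 0.0, 1.7), (1.5, 0.0, 0.0, 1.7)])
```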
Ultrafast Shape Recognition (USR) Drug design – Screening a number of potential compounds. Find a set of molecules which closely resemble a lead molecule from a HUGE database. Shape similarity may indicate similar binding properties and similar activity.
Overview Efficient global comparison of molecular shapes. The molecules are represented as feature vectors, representing the relative positions of the atoms. Does not require alignment of the molecules. Suitable for large database search.
Feature vector representation The shape of a molecule is uniquely determined by the relative positions of the atoms. … Which are determined by the inter-atomic distances. The set of distances can be constrained due to forces that hold the atoms together.
Strategic feature points
The molecule is described as 4 sets of atomic distance distributions from feature points:
- Center of mass: ctd
- Point closest to ctd: cst
- Point farthest from ctd: fct
- Point farthest from fct: ftf
The moments of the distributions are calculated and stored as a feature vector.
They give an estimate of the size, compactness and symmetry of the molecule.
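The descriptor construction can be sketched as below. Several details are assumptions made for this example and are not stated on the slides: atoms are given unit mass (so ctd is the plain centroid), three moments are kept per distribution (mean, standard deviation, and a cube-root-scaled third central moment), and two descriptors are compared with an inverse mean-absolute-difference score. All function names are hypothetical.

```python
import math

def centroid(pts):
    n = len(pts)
    return tuple(sum(p[k] for p in pts) / n for k in range(3))

def moments(ds):
    """Mean, standard deviation and signed cube root of the third central moment."""
    n = len(ds)
    mean = sum(ds) / n
    var = sum((d - mean) ** 2 for d in ds) / n
    third = sum((d - mean) ** 3 for d in ds) / n
    return [mean, math.sqrt(var), math.copysign(abs(third) ** (1.0 / 3.0), third)]

def usr_descriptor(pts):
    """12-number shape descriptor: 3 moments for each of 4 reference points."""
    ctd = centroid(pts)
    d_ctd = [math.dist(p, ctd) for p in pts]
    cst = pts[d_ctd.index(min(d_ctd))]          # atom closest to ctd
    fct = pts[d_ctd.index(max(d_ctd))]          # atom farthest from ctd
    d_fct = [math.dist(p, fct) for p in pts]
    ftf = pts[d_fct.index(max(d_fct))]          # atom farthest from fct
    desc = []
    for ref in (ctd, cst, fct, ftf):            # 4N distances in total
        desc += moments([math.dist(p, ref) for p in pts])
    return desc

def usr_similarity(d1, d2):
    """Score in (0, 1]; 1 means identical descriptors."""
    return 1.0 / (1.0 + sum(abs(a - b) for a, b in zip(d1, d2)) / len(d1))

# Hypothetical molecule and a translated copy: no alignment is needed,
# because only inter-atomic distances enter the descriptor.
mol = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (0.0, 2.5, 0.0), (0.0, 0.0, 3.5), (1.0, 1.0, 1.0)]
copy = [(x + 10.0, y - 4.0, z + 2.0) for (x, y, z) in mol]
score = usr_similarity(usr_descriptor(mol), usr_descriptor(copy))
```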
Advantages and disadvantages Extremely fast due to calculation of only 4N distances and distributions. Very sensitive to small changes in the molecule shape. Does not directly account for chemical interactions and atom types.
Quality Assessment through LGA
• Local-Global Alignment (LGA), introduced by Adam Zemla in 2003, is being used in CASP as a more accurate similarity assessment than lRMSD
• LGA generates many different local superpositions to find regions where two conformations are similar: it combines longest continuous segment (LCS) and global distance test (GDT) to find local and global similarities
• LCS superimposes the longest segments that fit under a selected RMSD cutoff
• GDT complements evaluations made with LCS by searching for the largest (not necessarily continuous) set of ‘equivalent’ residues that deviate by no more than a specified distance cutoff
• Further reading: Zemla A., "LGA - a Method for Finding 3D Similarities in Protein Structures", Nucleic Acids Research, 2003, Vol. 31, No. 13, pp. 3370-3374 http://www.pubmedcentral.nih.gov/articlerender.fcgi?tool=pubmed&pubmedid=12824330
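The idea behind cutoff-based measures can be illustrated with a deliberately simplified sketch. It is not the LGA algorithm: it assumes the two conformations are already superimposed and matched residue by residue, whereas LGA searches over many superpositions, and the continuous-run function uses a per-residue distance cutoff instead of a segment RMSD cutoff. Function names and coordinates are hypothetical.

```python
import math

def fraction_within(x, y, cutoff):
    """GDT-style score for one fixed superposition: fraction of matched
    residues whose positions differ by at most `cutoff`."""
    close = sum(1 for p, q in zip(x, y) if math.dist(p, q) <= cutoff)
    return close / len(x)

def longest_run_within(x, y, cutoff):
    """Length of the longest continuous stretch of residues within `cutoff`."""
    best = run = 0
    for p, q in zip(x, y):
        run = run + 1 if math.dist(p, q) <= cutoff else 0
        best = max(best, run)
    return best

# Hypothetical 10-residue chain along X; in the second conformation only
# the last three residues move (by 5 units along Y), a localized change.
ref = [(3.8 * i, 0.0, 0.0) for i in range(10)]
alt = [(px, py + (5.0 if i >= 7 else 0.0), pz) for i, (px, py, pz) in enumerate(ref)]

frac = fraction_within(ref, alt, cutoff=1.0)
run = longest_run_within(ref, alt, cutoff=1.0)
```

Unlike a single averaged RMSD value, these counts show that most of the chain is unchanged and where the agreement ends.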