The Effect of Finite Sampling on the Determination of Orientational Properties • Nick Patrick • Computer Science 260 • Duke University • March 6, 2008
Motivation • Problem: • Determine the Saupe alignment tensor S from a set of residual dipolar couplings (RDCs) and the corresponding interatomic bond vectors (e.g., NH bond vectors) using SVD • Generalized problem: • Determine a second-rank tensor D from interatomic bond vectors and other experimental data • How accurate is the tensor? • Why is this important? • Annila and Permi (2004); Losonczi et al. (1999)
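The SVD step in the problem statement can be sketched as follows. This is a minimal illustration, not the exact procedure of Losonczi et al. (1999): it assumes noise-free RDCs that obey D = Dmax vᵀSv with one scaling constant Dmax for every vector. The traceless condition Szz = −Sxx − Syy leaves five unknowns, which are solved with NumPy's SVD-based pseudoinverse. The names `design_matrix`, `fit_saupe`, and `back_calculate` are hypothetical.

```python
import numpy as np

def design_matrix(vectors):
    """One row per unit bond vector v = (x, y, z). With Szz = -Sxx - Syy,
    v^T S v = Sxx(x^2 - z^2) + Syy(y^2 - z^2) + 2Sxy*xy + 2Sxz*xz + 2Syz*yz."""
    x, y, z = np.asarray(vectors, dtype=float).T
    return np.column_stack([x**2 - z**2, y**2 - z**2, 2*x*y, 2*x*z, 2*y*z])

def fit_saupe(vectors, rdcs, d_max=1.0):
    """Least-squares estimate of the five independent elements of S,
    using np.linalg.pinv (computed internally from an SVD)."""
    a = design_matrix(vectors)
    sxx, syy, sxy, sxz, syz = np.linalg.pinv(a) @ (np.asarray(rdcs, dtype=float) / d_max)
    return np.array([[sxx, sxy, sxz],
                     [sxy, syy, syz],
                     [sxz, syz, -sxx - syy]])

def back_calculate(vectors, S, d_max=1.0):
    """RDC for each unit vector: D = d_max * v^T S v."""
    v = np.asarray(vectors, dtype=float)
    return d_max * np.einsum("ni,ij,nj->n", v, S, v)

rng = np.random.default_rng(0)
vectors = rng.normal(size=(20, 3))
vectors /= np.linalg.norm(vectors, axis=1, keepdims=True)   # unit bond vectors
S_true = np.array([[0.3, 0.1, 0.0],
                   [0.1, -0.5, 0.2],
                   [0.0, 0.2, 0.2]])                         # symmetric, traceless
S_fit = fit_saupe(vectors, back_calculate(vectors, S_true))
print(np.allclose(S_fit, S_true))   # noise-free data: the fit recovers S_true
```

With real, noisy RDCs the fit is only as good as the sampling of S by the vector set, which is the question the rest of the talk addresses.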
Motivation • Develop a mathematical framework to quantify the accuracy of the tensor derived from experimental data • Based on the uniformity of the distribution of bond vectors Fushman, D., Ghose, R., and Cowburn, D. (2000). The effect of finite sampling on the determination of orientational properties: A theoretical treatment with application to interatomic vectors in proteins. J. Am. Chem. Soc. 122: 10640-10649.
Outline • Background • Theory • Results • Applications • Alternate Approaches
Background • “Distribution of bond vectors” • Consider the orientation of NH bond vectors in an α-helix • Suppose the alignment tensor has the illustrated set of principal axes • The principal axis of the alignment tensor along the helix would be well sampled, while the orthogonal axes would be essentially undetermined • What about CαHα bond vectors? • [Figure: PDB 1UBI, residues 23-34]
Background • How well is the orientation space sampled by a distribution of interatomic vectors? • How well does the bond vector distribution sample the various components of the second-rank tensor of interest? • How well can the bond vector distribution completely characterize the tensor?
Background • Which sets of bond vectors sample the orientation space best? (NH, CαHα, etc.) • Survey known structures from the PDB to find out • [Figure from lecture notes]
Background • If an infinite number of uniformly distributed vectors were available, all directions in orientation space would be sampled equally • The quality of the determined tensor would then be independent of the orientation of its principal axes • In real NMR experiments: • The set of interatomic vectors is finite, so orientation space is sampled incompletely • The orientational distribution of the available vectors is not uniform
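The finite-sampling point can be illustrated numerically. This sketch assumes the sampling tensor has the form Ω_ij = (3/2)⟨r_i r_j⟩ − (1/2)δ_ij, which reproduces the limiting cases on the sampling-tensor slides (null tensor for uniform sampling; Ωz = 1, Ωx = Ωy = −1/2 for collinear vectors). It draws N random directions and reports the Frobenius norm of Ω; `sampling_tensor` and `random_unit_vectors` are hypothetical helpers.

```python
import numpy as np

def sampling_tensor(vectors):
    """Assumed form: Omega_ij = (3/2)<r_i r_j> - (1/2) delta_ij over unit vectors r."""
    r = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
    second_moment = r.T @ r / len(r)          # <r_i r_j>, a 3x3 matrix with trace 1
    return 1.5 * second_moment - 0.5 * np.eye(3)

def random_unit_vectors(n, rng):
    """Isotropic Gaussian samples, normalized: uniformly distributed directions."""
    v = rng.normal(size=(n, 3))
    return v / np.linalg.norm(v, axis=1, keepdims=True)

rng = np.random.default_rng(1)
for n in (10, 100, 1000, 100000):
    omega = sampling_tensor(random_unit_vectors(n, rng))
    print(n, round(float(np.linalg.norm(omega)), 4))   # Frobenius norm of Omega
```

Under this assumed definition, the expected squared norm for N uniformly random directions works out to 3/(2N). The norm therefore shrinks roughly as 1/√N but is never exactly zero for a finite set.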
Outline • Background • Theory • Results • Applications • Alternate Approaches
Theory • We want to (a) characterize the distribution of bond vectors, and (b) characterize the accuracy of the tensor • We can derive: • sampling tensor (Ω): Represents the sampling of the bond vectors along three axes of an arbitrary reference frame • generalized sampling parameter (Ξ): Quantifies the degree of uniformity of the distribution of bond vectors • average constant (Dav): Quantifies how well the tensor of interest, D, is sampled • generalized quality factor (Λ): Describes how efficiently a set of bond vectors samples all elements of the tensor of interest, D
Review • The alignment tensor S represents the average substructure alignment in an aligning medium • S can be diagonalized: $S = V \Sigma V^{T}$ • V is a 3×3 rotation matrix defining the principal order frame (the rotation from the molecular frame) • Σ is a 3×3 diagonal, traceless matrix containing the principal values of S (Szz, Syy, Sxx)
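The diagonalization S = VΣVᵀ can be sketched with NumPy's symmetric eigensolver. Two choices here are assumptions, not taken from the slides: principal values are ordered by the common |Szz| ≥ |Syy| ≥ |Sxx| convention, and one eigenvector is flipped if needed so that V is a proper rotation (det V = +1). `diagonalize_saupe` is a hypothetical name.

```python
import numpy as np

def diagonalize_saupe(S):
    """Return (V, values) with S = V @ diag(values) @ V.T, where V is a proper rotation
    (det +1) and values are ordered (Sxx, Syy, Szz) with |Sxx| <= |Syy| <= |Szz|."""
    S = np.asarray(S, dtype=float)
    values, vecs = np.linalg.eigh(S)            # symmetric: real values, orthonormal vectors
    order = np.argsort(np.abs(values))          # assumed |Szz| >= |Syy| >= |Sxx| convention
    values, V = values[order], vecs[:, order]
    if np.linalg.det(V) < 0:                    # flipping one axis keeps S = V diag V^T
        V[:, 0] = -V[:, 0]
    return V, values

S = np.array([[0.3, 0.1, 0.0],
              [0.1, -0.5, 0.2],
              [0.0, 0.2, 0.2]])
V, (sxx, syy, szz) = diagonalize_saupe(S)
print(sxx, syy, szz)
```

Flipping the sign of an eigenvector does not change S, so the sign fix only affects how V is read as a rotation.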
Sampling Tensor (Ω) • Represents the sampling of bond vector orientations along three axes of an arbitrary reference frame: $\Omega_{ij} = \frac{3}{2}\langle r_i r_j \rangle - \frac{1}{2}\delta_{ij}$, where ⟨…⟩ denotes the average over all bond vectors • ri = projection of a unit vector on the axis i • i, j = x’, y’, z’ (an arbitrary reference frame) • Ω can be diagonalized to yield: • the principal axis frame (a rotation R(φ,θ,ψ) from the arbitrary reference frame), corresponding to the direction of best sampling • the principal values Ωi of the sampling tensor
Sampling Tensor (Ω) • Ωi = the principal values of the sampling tensor • fi = the fraction of vectors oriented along each of the three principal directions • Ordering (direction of best sampling): Ωz ≥ Ωy ≥ Ωx, thus fz ≥ fy ≥ fx, since $f_i = \frac{1 + 2\Omega_i}{3}$ • Optimal: If the distribution of vectors is uniform, fx = fy = fz = 1/3, and Ω is the null tensor • Worst case: If all vectors are oriented along the principal z-axis, then fx = fy = 0, fz = 1 and Ωx = Ωy = -1/2, Ωz = 1 • Deviations of fi from 1/3 and of Ωi from 0 reflect non-uniformity
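The principal values Ωi and sampling fractions fi can be computed from a vector set as sketched below. The sketch assumes Ω_ij = (3/2)⟨r_i r_j⟩ − (1/2)δ_ij and fi = (1 + 2Ωi)/3, which reproduce the optimal and worst cases above. `sampling_principal` is a hypothetical helper. The six vectors ±x, ±y, ±z serve as a small set with perfectly uniform second moments.

```python
import numpy as np

def sampling_principal(vectors):
    """Principal values Omega_x <= Omega_y <= Omega_z, fractions f_i = (1 + 2 Omega_i)/3,
    and the principal axes (columns), assuming Omega_ij = (3/2)<r_i r_j> - (1/2) delta_ij."""
    r = np.asarray(vectors, dtype=float)
    r = r / np.linalg.norm(r, axis=1, keepdims=True)
    omega = 1.5 * (r.T @ r) / len(r) - 0.5 * np.eye(3)
    values, axes = np.linalg.eigh(omega)        # ascending order: x, y, z
    fractions = (1.0 + 2.0 * values) / 3.0
    return values, fractions, axes

# Worst case: every vector along z. Uniform second moments: the six vectors ±x, ±y, ±z.
collinear = np.tile([0.0, 0.0, 1.0], (5, 1))
octahedral = np.vstack([np.eye(3), -np.eye(3)])
print(sampling_principal(collinear)[:2])    # expected: Omega = (-1/2, -1/2, 1), f = (0, 0, 1)
print(sampling_principal(octahedral)[:2])   # expected: Omega ≈ 0, f ≈ (1/3, 1/3, 1/3)
```

Because `eigh` returns eigenvalues in ascending order, the last column of `axes` is the direction of best sampling.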
Example • Optimal: If the distribution of vectors is uniform, fx = fy = fz = 1/3, and Ω is the null tensor • Worst case: If all vectors are oriented along the principal z-axis, then fx = fy = 0, fz = 1 and Ωx = Ωy = -1/2, Ωz = 1 • Example: • Consider the NH vectors of an α-helix, aligned along the helical axis a • a is approximately parallel to the principal z-axis of Ω • [Figure: PDB 1UBI, residues 23-34]
Geometric Representation • Sampling fractions for a set of bond vectors can be represented as a vector f = (fx, fy, fz) in {fx, fy, fz}-space • Because fx + fy + fz = 1, these vectors lie in a plane, which can be parameterized by {η, ζ}, the rhombic and axial components • [Figure: {fx, fy, fz}-space]
Generalized Sampling Parameter (Ξ) • Quantifies the degree of uniformity of the distribution of bond vector orientations, on a scale from 0 to 1 • Optimal: fx = fy = fz = 1/3, giving Ξ = 0 • Worst case: fx = fy = 0, fz = 1, giving Ξ = 1
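The slides do not show the formula for Ξ. One form that meets both limits, and also reproduces the Ξ values quoted for the βARK PH domain (PDB 1BAK) in the Applications section, is Ξ = (2/3)ΣΩi² with Ωi = (3fi − 1)/2. The sketch below treats that form as an assumption and checks it against the quoted numbers. `generalized_sampling_parameter` is a hypothetical name.

```python
def generalized_sampling_parameter(f):
    """Assumed form: Xi = (2/3) * sum_i Omega_i**2, with Omega_i = (3 f_i - 1) / 2.
    f holds the three sampling fractions (order does not matter)."""
    omega = [(3.0 * fi - 1.0) / 2.0 for fi in f]
    return (2.0 / 3.0) * sum(w * w for w in omega)

# f and Xi as quoted for the βARK PH domain (PDB 1BAK) in the Applications section
reported = {
    "all residues": ((0.4060, 0.3583, 0.2357), 0.0232),
    "alpha-helical": ((0.9148, 0.0473, 0.0379), 0.7610),
    "beta-strand": ((0.5569, 0.3171, 0.1261), 0.1398),
}
for name, (f, xi_reported) in reported.items():
    print(f"{name}: computed {generalized_sampling_parameter(f):.4f}, reported {xi_reported}")
```

The computed values are 0.0232, 0.7608 and 0.1398. The small α-helix discrepancy is presumably due to the four-digit rounding of the quoted f.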
Average Constant (Dav) • Quantifies how well the tensor of interest, D, is sampled: $D_{av} = D_{iso} + \frac{2}{3}\sum_{i,j}\Omega_{ij}D_{ij}$ • Ωij are the elements of the sampling tensor Ω, and Dij are the elements of D • If all parts of the tensor are sampled equally well (Ω = 0), Dav = (1/3)Tr[D] = Diso
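As a numerical sketch, assume that Dav is the average of rᵀDr over the unit bond vectors. This is an assumption, chosen because it gives Diso for uniform sampling. Under that definition, the Ω-based expression follows from ⟨r_i r_j⟩ = (2Ω_ij + δ_ij)/3. The code computes Dav both ways and checks that they agree. Both function names are hypothetical.

```python
import numpy as np

def unit(v):
    v = np.asarray(v, dtype=float)
    return v / np.linalg.norm(v, axis=1, keepdims=True)

def average_constant(vectors, D):
    """Assumed definition: D_av = <r^T D r>, averaged over the unit bond vectors."""
    r = unit(vectors)
    return float(np.mean(np.einsum("ni,ij,nj->n", r, D, r)))

def average_constant_via_omega(vectors, D):
    """Same quantity written with the sampling tensor:
    D_av = D_iso + (2/3) * sum_ij Omega_ij D_ij."""
    r = unit(vectors)
    omega = 1.5 * (r.T @ r) / len(r) - 0.5 * np.eye(3)
    return float(np.trace(D) / 3.0 + (2.0 / 3.0) * np.sum(omega * D))

D = np.array([[1.0, 0.2, 0.1],
              [0.2, 2.0, -0.3],
              [0.1, -0.3, 4.0]])                # illustrative symmetric tensor
vecs = np.random.default_rng(4).normal(size=(25, 3))
print(average_constant(vecs, D), average_constant_via_omega(vecs, D))
```

For the six vectors ±x, ±y, ±z the two functions return Diso = Tr[D]/3. For vectors all along z they return Dzz, the only element that set probes.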
Average Constant (Dav) • Rewriting in the principal axis frame of the sampling tensor gives $D_{av} = \sum_i \Phi_i D_i$, which quantifies how well each principal component of the tensor is defined by the distribution of vectors • Principal components: Di (i = xd, yd, zd), for example Sxx, Syy, Szz for the alignment tensor • Φi = {Φx, Φy, Φz} is a three-component vector that measures how well each principal component Di is sampled
Average Constant (Dav) • Φi = {Φx, Φy, Φz} is a three-component vector that measures how well each principal component Di is sampled: $\Phi_i = l_i^2 f_x + m_i^2 f_y + n_i^2 f_z$ • (li, mi, ni) are the direction cosines relating the ith principal axis of D (i = xd, yd, zd) to the principal axes of the sampling tensor (x, y, z) • Optimal: For a uniform distribution, Φx = Φy = Φz = 1/3, and all principal components Di are sampled equally • Worst case: All vectors are aligned parallel to some axis a • When the ith principal axis of D is parallel to a, Di is maximally sampled and Φi = 1 • When the ith principal axis of D is orthogonal to a, Di is minimally sampled and Φi = 0
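A sketch of Φ, assuming Φi = li²fx + mi²fy + ni²fz (the form consistent with Dav = ΣΦiDi and with the limits on this slide). D's principal axes are given as the columns of a matrix expressed in the sampling-tensor principal frame. `phi` and `average_constant` are hypothetical names. The example also tilts zd to the magic angle (cos²θ = 1/3) relative to a perfectly collinear vector set.

```python
import numpy as np

def phi(fractions, d_axes):
    """Phi_i = l_i^2 f_x + m_i^2 f_y + n_i^2 f_z for each principal axis i of D.
    fractions: (f_x, f_y, f_z) in the sampling-tensor principal frame.
    d_axes: 3x3 array whose columns are D's principal axes (x_d, y_d, z_d) in that
    frame, so column i holds the direction cosines (l_i, m_i, n_i)."""
    E = np.asarray(d_axes, dtype=float)
    return (E ** 2).T @ np.asarray(fractions, dtype=float)

def average_constant(fractions, d_axes, d_principal):
    """D_av = sum_i Phi_i D_i (assumed form)."""
    return float(phi(fractions, d_axes) @ np.asarray(d_principal, dtype=float))

# All vectors along the sampling z-axis; D's axes rotated about x by the magic angle.
collinear = [0.0, 0.0, 1.0]
theta = np.arccos(1.0 / np.sqrt(3.0))           # about 54.74 degrees
c, s = np.cos(theta), np.sin(theta)
tilted_axes = np.array([[1.0, 0.0, 0.0],
                        [0.0, c, -s],
                        [0.0, s, c]])
print(phi(collinear, np.eye(3)))     # aligned axes: Phi equals f
print(phi(collinear, tilted_axes))   # z_d gets 1/3, y_d gets 2/3, x_d gets 0
```

A principal axis of D inclined at the magic angle to a perfectly collinear vector set receives Φ = 1/3. This is the same value that uniform sampling gives, even though the sampling is maximally non-uniform.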
Example • Worst case: If all vectors are aligned parallel to some axis a • If the ith principal axis of D is parallel to a, Di is maximally sampled and Φi = 1 • If the ith principal axis of D is orthogonal to a, Di is minimally sampled and Φi = 0 • Consider the NH vectors of an α-helix, aligned along the helical axis a, and suppose the tensor of interest, D, has principal axes such that zd ∥ a • Then Φz ≈ 1, Φy ≈ 0, Φx ≈ 0 (Φi measures how well the principal component Di is sampled) • [Figure: PDB 1UBI, residues 23-34]
Generalized Quality Factor (Λ) • Describes how efficiently a set of bond vectors samples all elements of D, on a scale from 0 to 1 • Recall: Dav quantifies how well D is sampled, and Diso is the optimal value of Dav; Λ represents the deviation from this optimal value • Can also calculate Λmin, a lower bound on tensor quality based only on the bond vector distribution • Optimal: Λ = 1; worst case: Λ = 0
Geometric Representation • f = (fx, fy, fz) • Parametrize the plane in terms of {η, ζ} • Rhombic component: η • Axial component: ζ • Intuition: η and ζ describe the degree of “directional asymmetry” of the sampling tensor • Uniform sampling, fx = fy = fz = 1/3, corresponds to η = ζ = 0
Geometric Representation • [Figure: Ξ (generalized sampling parameter) and Λ (generalized quality factor) over the {η, ζ} plane] • The ordering fz ≥ fy ≥ fx bounds η and ζ to an “allowed triangle”
Review • sampling tensor (Ω): Represents the sampling of the bond vectors along three axes of an arbitrary reference frame • generalized sampling parameter (Ξ): Quantifies the degree of uniformity of the distribution of bond vectors • average constant (Dav): Quantifies how well the tensor of interest, D, is sampled • generalized quality factor (Λ): Describes how efficiently a set of bond vectors samples all elements of D; that is, how accurate is the tensor in general? • Questions so far?
Theory • How are the quality factor (Λ) and the actual accuracy of the tensor D related? • Take a correct tensor and introduce random errors into the principal values (εd) and the orientation (εa) of D • Correlate the size of the error with the decrease in the quality factor (Λ) • When the principal axes of Ω and D are related by the “magic angle,” the correlation with Λ breaks down (although Ξ is still accurate) • Useful framework, but “contrived” and subject to errors • Other approaches?
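The error-injection step can be sketched as below. The slide does not specify the perturbation model, so this one is assumed. Each principal value is scaled by (1 + εd·g), with g a standard normal draw, and the principal frame is rotated by εa radians about a random axis using Rodrigues' rotation formula. `perturb_tensor` and `rotation_about_axis` are hypothetical names.

```python
import numpy as np

def rotation_about_axis(axis, angle):
    """Rodrigues' formula: rotation by `angle` radians about `axis`."""
    k = np.asarray(axis, dtype=float)
    k = k / np.linalg.norm(k)
    K = np.array([[0.0, -k[2], k[1]],
                  [k[2], 0.0, -k[0]],
                  [-k[1], k[0], 0.0]])
    return np.eye(3) + np.sin(angle) * K + (1.0 - np.cos(angle)) * (K @ K)

def perturb_tensor(V, values, eps_d, eps_a, rng):
    """Hypothetical error model: relative Gaussian noise of size eps_d on each principal
    value, and a rotation of the principal frame V by eps_a radians about a random axis."""
    noisy_values = np.asarray(values, dtype=float) * (1.0 + eps_d * rng.standard_normal(3))
    V_new = rotation_about_axis(rng.standard_normal(3), eps_a) @ V
    return V_new @ np.diag(noisy_values) @ V_new.T, V_new, noisy_values

rng = np.random.default_rng(2)
V = np.eye(3)                                   # principal frame of the "correct" tensor
values = np.array([1.0, 1.2, 1.6])              # illustrative principal values
D_noisy, V_noisy, _ = perturb_tensor(V, values, eps_d=0.05, eps_a=np.radians(5.0), rng=rng)
print(np.round(D_noisy, 3))
```

A rotation by εa moves each principal axis by at most εa, so εa bounds the orientational error. For a traceless alignment tensor, tracelessness would also have to be restored after scaling the principal values; this sketch does not do that.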
Outline • Background • Theory • Results • Applications • Alternate Approaches
Results • Sampling properties of known structures • Survey of structures from the PDB • 1736 structures (879 single proteins, 857 multi-subunit proteins) • Represents all experimentally determined protein folds • Structural basis for distributions • Each structure (i.e., each set of bond vectors) corresponds to a point in the “allowed triangle” parametrized by {η, ζ}
Results • Ξ: 0 optimal, 1 worst case • Λ: 1 optimal, 0 worst case
Results • [Figure: ideal α-helix and ideal β-sheet; source: Molecular Biology of the Cell, Fifth Edition]
Results • Generalized sampling parameter (Ξ): 0 optimal, 1 worst case • Structural basis for distributions • Different secondary structures have different generalized sampling parameters
Results • [Figure: sampling distribution]
Results • Observations: • NH vectors are the least uniformly distributed • C’O vectors are also non-uniformly distributed and correlated with NH: they are almost antiparallel to the NH vectors and lie in the same peptide plane • This reflects protein folding and secondary structure, e.g., N-H•••O=C hydrogen bonding in α-helices and β-sheets
Results • Generalized sampling parameter (Ξ): 0 optimal, 1 worst case • α-helix, 3₁₀-helix: NH vectors are highly ordered; adding CαHα vectors improves sampling • β-sheet: NH and CαHα vectors are both highly ordered; adding CαHα vectors does not improve sampling
Outline • Background • Theory • Results • Applications • Alternate Approaches
Applications • What does determining these values (sampling parameter, quality factor) allow us to do? • Characterize the accuracy of the tensor derived from experimental data • Therefore, we can optimize experimental design • Which vectors to use? Avoid limitations from vector set • Which aligning medium to use? Optimize sampling
Applications • Example 1: Determine the rotational diffusion tensor from 15N relaxation data using NH vectors • βARK PH domain (PDB: 1BAK) • All residues: Ξ = 0.0232, Λ = 0.9256, f = (0.4060, 0.3583, 0.2357) • α-helical residues: Ξ = 0.7610, f = (0.9148, 0.0473, 0.0379) • β-strand residues: Ξ = 0.1398, f = (0.5569, 0.3171, 0.1261) • Note: NH vectors in β-strands are orthogonal to helical NH vectors
Applications • Conclusion: • With NH vectors, the α-helical residues alone are insufficient to fully characterize the tensor • Solution: • Use more vectors, or additional set(s) of vectors • e.g., CαHα or CαC’ vectors
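A toy check of the proposed solution uses made-up vector sets rather than real protein geometry. A cone of vectors at 15° from z stands in for a nearly collinear helical NH set. A ring in the xy-plane stands in for a second, roughly perpendicular vector type. Ξ is computed with the assumed form Ξ = (2/3)ΣΩi², which is rotation-invariant and can be written without diagonalizing Ω as (2/3)‖Ω‖²_F. `xi` and `cone` are hypothetical helpers.

```python
import numpy as np

def xi(vectors):
    """Assumed: Xi = (2/3) * ||Omega||_F^2 with Omega = (3/2)<r r^T> - (1/2) I."""
    r = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
    omega = 1.5 * (r.T @ r) / len(r) - 0.5 * np.eye(3)
    return (2.0 / 3.0) * float(np.sum(omega ** 2))

def cone(theta_deg, n):
    """n vectors at polar angle theta around z, evenly spaced in azimuth (hypothetical set)."""
    t = np.radians(theta_deg)
    phi = np.linspace(0.0, 2.0 * np.pi, n, endpoint=False)
    return np.column_stack([np.sin(t) * np.cos(phi),
                            np.sin(t) * np.sin(phi),
                            np.full(n, np.cos(t))])

set_a = cone(15, 10)   # stand-in for a nearly collinear (helix-like) NH set
set_b = cone(90, 10)   # stand-in for a second vector type, perpendicular to the first
pooled = np.vstack([set_a, set_b])
print(round(xi(set_a), 3), round(xi(set_b), 3), round(xi(pooled), 3))
```

The three values come out at about 0.81, 0.25 and 0.04. The pooled set is far more uniform than either set alone.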
Applications • Example 2: Determining the Saupe alignment tensor from RDC measurements • Ubiquitin in a liquid-crystalline aligning medium; NH vectors • Ξ = 0.1084, Λ = 0.7724, Λmin = 0.69 • The quality factor (Λ) changes if we change the alignment tensor frame (the sampling tensor frame remains the same) • [Figure (Prestegard et al., 2004): the quality factor (Λ) depends on the orientation of the axes of the alignment tensor with respect to the sampling tensor frame]
Applications • Conclusion: • The orientation of the alignment tensor needs to change • Solution: • Changing the orientation of the alignment will result in better sampling, a higher quality factor (Λ), and a more accurate alignment tensor • Experimentally: dope the aligning medium with ions (circles), use a different orienting medium (triangles), etc. • [Figure (Prestegard et al., 2004): the quality factor (Λ) depends on the orientation of the axes of the alignment tensor with respect to the sampling tensor frame]
Applications • Nuclear Vector Replacement (NVR) • Assignment depends on the accuracy of the alignment tensor • Back-calculate RDCs from bond vectors in a structural model: $D = D_{max}\, v^{T} S v$ • NH RDCs in two media (two alignment tensors) • Minimalistic approach • Saves spectrometer time (no 13C labeling or triple-resonance experiments) • What about the sampling of the tensors? What about the distribution of NH bond vectors? • Langmead and Donald (2004); figure from lecture notes
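The back-calculation step can be sketched as a score matrix. For each medium, the sketch back-calculates RDCs for every residue's NH vector and scores each (peak, residue) pair by the summed squared deviation across media. This is a simplification: NVR itself assigns resonances with an expectation/maximization procedure (Langmead and Donald, 2004), while this sketch only builds the scores. `rdc_cost_matrix` and the synthetic tensors are hypothetical.

```python
import numpy as np

def back_calculate(vectors, S, d_max=1.0):
    """D = d_max * v^T S v for each unit vector v (one row per residue)."""
    return d_max * np.einsum("ni,ij,nj->n", vectors, S, vectors)

def rdc_cost_matrix(measured, vectors, tensors, d_max=1.0):
    """measured: (n_peaks, n_media) RDCs; vectors: (n_residues, 3) unit NH vectors;
    tensors: one 3x3 alignment tensor per medium.
    Returns cost[p, r] = sum over media of (measured[p] - predicted[r])^2."""
    predicted = np.column_stack([back_calculate(vectors, S, d_max) for S in tensors])
    diff = measured[:, None, :] - predicted[None, :, :]
    return np.sum(diff ** 2, axis=2)

rng = np.random.default_rng(3)
nh = rng.normal(size=(8, 3))
nh /= np.linalg.norm(nh, axis=1, keepdims=True)        # synthetic NH vectors
S1 = np.array([[0.4, 0.1, 0.0], [0.1, -0.1, 0.05], [0.0, 0.05, -0.3]])   # medium 1
S2 = np.array([[-0.2, 0.0, 0.1], [0.0, 0.5, 0.0], [0.1, 0.0, -0.3]])     # medium 2
perm = rng.permutation(8)                               # unknown peak-to-residue mapping
measured = np.column_stack([back_calculate(nh[perm], S) for S in (S1, S2)])
cost = rdc_cost_matrix(measured, nh, [S1, S2])
print(np.argmin(cost, axis=1), perm)                    # noise-free: the two agree
```

With noisy data and a poorly sampled tensor, several residues can score similarly for one peak, which is where the sampling questions on this slide come in.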
Applications • My project: • Modify NVR to use NH and CαHα RDCs in one medium • Requires more experiments, and more expensive ones • But allows testing on new systems • Also, increased assignment accuracy? • Grouping NH and CαHα bond vectors gives a more uniform distribution and a higher Λ • One alignment tensor instead of two, and it is better sampled • A more accurate alignment tensor may give more accurate assignments • Better disambiguation of RDCs?
Outline • Background • Theory • Results • Applications • Alternate Approaches
Alternate Approaches • How accurate is a tensor? • This approach: concerned with the distribution of bond vectors used to determine the tensor • Other approach: compare the estimated tensor to the correct tensor (Yan, Langmead, & Donald, 2005) • Also suggested: assume a uniform distribution of bond vectors and compare the distributions of RDC values generated by the two tensors (RMSD)
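The suggested RMSD comparison can be sketched as a Monte Carlo estimate. The sketch draws uniformly distributed unit vectors, back-calculates RDCs with the estimated and the correct tensor (assuming one Dmax for all vectors), and takes the RMSD of the difference. `rdc_rmsd_uniform` is a hypothetical helper, and the sample size and seed are arbitrary choices.

```python
import numpy as np

def back_calculate(vectors, S, d_max=1.0):
    return d_max * np.einsum("ni,ij,nj->n", vectors, S, vectors)

def rdc_rmsd_uniform(S_est, S_true, n=20000, d_max=1.0, seed=0):
    """RMSD between RDCs predicted by two tensors for n random, uniformly distributed
    unit vectors (a Monte Carlo estimate)."""
    rng = np.random.default_rng(seed)
    v = rng.normal(size=(n, 3))
    v /= np.linalg.norm(v, axis=1, keepdims=True)
    diff = back_calculate(v, S_est, d_max) - back_calculate(v, S_true, d_max)
    return float(np.sqrt(np.mean(diff ** 2)))

S_true = np.array([[0.3, 0.1, 0.0],
                   [0.1, -0.5, 0.2],
                   [0.0, 0.2, 0.2]])
S_est = S_true + np.array([[0.02, 0.0, 0.01],
                           [0.0, -0.03, 0.0],
                           [0.01, 0.0, 0.01]])          # illustrative traceless error
print(rdc_rmsd_uniform(S_est, S_true))
```

For a traceless tensor difference ΔS, the expected mean-square difference over uniform directions is (2/15)Tr(ΔS²)·Dmax². This gives an analytic cross-check on the Monte Carlo estimate.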
Alternate Approaches • Compare estimated Saupe matrix to correct Saupe matrix • Upper bound on the probability that a randomly rotated tensor has error smaller than the estimated tensor • Compare eigenvalues (compare axial and rhombic components of the tensor) • Compare angular error between eigenvectors
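The eigenvalue and eigenvector comparisons can be sketched with NumPy's symmetric eigensolver. This covers only those two simple comparisons, not the probability-based measure of Yan et al. (2005). Eigenvectors are paired by sorted eigenvalue, which becomes unreliable when two principal values are nearly equal. Taking |cos| makes the angle independent of eigenvector sign. `compare_tensors` is a hypothetical name.

```python
import numpy as np

def compare_tensors(S_est, S_true):
    """Eigenvalue differences and angles (degrees) between corresponding eigenvectors,
    pairing eigenpairs by sorted eigenvalue."""
    w_est, v_est = np.linalg.eigh(S_est)
    w_true, v_true = np.linalg.eigh(S_true)
    cosines = np.abs(np.sum(v_est * v_true, axis=0))    # column-wise dot products
    angles = np.degrees(np.arccos(np.clip(cosines, 0.0, 1.0)))
    return w_est - w_true, angles

# Illustrative case: the "estimate" is the true tensor rotated by 10 degrees about z.
S_true = np.diag([-0.4, -0.1, 0.5])
c, s = np.cos(np.radians(10.0)), np.sin(np.radians(10.0))
Rz = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
d_values, d_angles = compare_tensors(Rz @ S_true @ Rz.T, S_true)
print(d_values, d_angles)   # eigenvalues unchanged; x and y axes off by 10 degrees, z by 0
```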
Summary • How accurate is a second-rank tensor determined from bond vectors and other experimental data? • How well is orientation space sampled? • How well are components of tensor sampled? • How well is the tensor completely characterized? • Sampling properties of bond vector sets in real proteins have a biological, structural basis • Can use measurements to optimize experimental design
Thanks! Questions?
References:
• Fushman, D., Ghose, R., & Cowburn, D. (2000). The effect of finite sampling on the determination of orientational properties: A theoretical treatment with application to interatomic vectors in proteins. J. Am. Chem. Soc. 122: 10640-10649.
• Annila, A., & Permi, P. (2004). Weakly aligned biological macromolecules in dilute aqueous liquid crystals. Concepts in Magnetic Resonance. 23A(1): 22-37.
• Langmead, C.J., & Donald, B.R. (2004). An expectation/maximization based nuclear vector replacement algorithm for automated NMR resonance assignments. Journal of Biomolecular NMR. 29: 111-138.
• Losonczi, J.A., Andrec, M., Fischer, M.W.F., & Prestegard, J.H. (1999). Order matrix analysis of residual dipolar couplings using singular value decomposition. Journal of Magnetic Resonance. 138: 334-342.
• Yan, A.K., Langmead, C.J., & Donald, B.R. (2005). A probability-based similarity measure for Saupe alignment tensors with applications to residual dipolar couplings in NMR structural biology. The International Journal of Robotics Research. 24(2-3): 165-182.
• Lecture notes from Computer Science 260 at Duke University, Spring 2008. http://www.cs.duke.edu/brd/Teaching/Bio/asmb/current/