Procrustes analysis • Purpose of procrustes analysis • Algorithm • R code • Various modifications
Purpose of procrustes analysis
Different dimension reduction techniques, even closely related ones, generally produce different configurations. For example, metric and non-metric scaling may generate different configurations, and metric scaling with different proximity matrices can also produce different configurations. Most of these techniques produce a configuration that is defined only up to rotation. Since the techniques are applied to the same multivariate observations, each observation in one configuration corresponds to exactly one observation in the other. In these cases it is of interest to compare the results with each other and, if the original data matrix is available, with the original data as well.
There are other situations where configurations need to be compared. For example, in macromolecular biology the 3-dimensional structures of different proteins are derived. One question of interest is whether two proteins are similar and, if they are, how similar. This is the problem of configuration matching.
All these questions can be addressed using procrustes analysis. Procrustes analysis finds the best match between two configurations, the rotation matrix and translation vector needed for the match, and the distance between the configurations.
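Procrustes analysis: problem
Suppose we have two configurations (data matrices) $X = (x_1, x_2, \dots, x_n)^T$ and $Y = (y_1, y_2, \dots, y_n)^T$, where the x-s and y-s are vectors in p-dimensional space, so that each observation is a row of an $n \times p$ matrix. We want to find an orthogonal matrix $A$ and a vector $b$ so that:
$$M^2 = \sum_{i=1}^{n} \|x_i - A^T y_i - b\|^2 \to \min$$
In matrix form the transformed configuration is $YA + 1_n b^T$, which is why the rotation appears transposed when it acts on a single column vector $y_i$.
It can be shown that the translation ($b$) and the rotation matrix ($A$) can be found separately. The translation is found by centring each configuration: if the rotation is already known, the translation follows from it. Let us denote $z_i = A^T y_i + b$, with mean $\bar{z} = A^T \bar{y} + b$. Then we can write:
$$M^2 = \sum_{i=1}^{n} \|x_i - z_i\|^2 = \sum_{i=1}^{n} \|(x_i - \bar{x}) - (z_i - \bar{z})\|^2 + 2\sum_{i=1}^{n} \big((x_i - \bar{x}) - (z_i - \bar{z})\big)^T(\bar{x} - \bar{z}) + n\|\bar{x} - \bar{z}\|^2$$
The first term does not involve $b$ because $z_i - \bar{z} = A^T(y_i - \bar{y})$, and the second term vanishes because deviations from a mean sum to zero. Since only the third term depends on the translation vector, the function is minimised when the third term is equal to 0. Thus:
$$b = \bar{x} - A^T \bar{y}$$
The first step in procrustes analysis is therefore to centre the matrices X and Y and to remember the mean values of their columns.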
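The centring argument can be checked numerically. The R sketch below is only an illustration under stated assumptions: the data are random, the orthogonal matrix `A` is arbitrary (not the optimal rotation), observations are rows so a row is transformed as `y %*% A + b`, and the helper name `m2` is hypothetical. It compares the criterion at the translation xmean - t(A) ymean with the criterion at a perturbed translation.

```r
set.seed(1)
n = 20; p = 3
X = matrix(rnorm(n*p), n, p)              # illustrative target configuration
Y = matrix(rnorm(n*p), n, p)              # illustrative configuration to be moved
A = qr.Q(qr(matrix(rnorm(p*p), p, p)))    # an arbitrary orthogonal matrix (assumption)

# Criterion M^2 = sum_i ||x_i - t(A) y_i - b||^2, with rows as observations
m2 = function(b) sum((X - sweep(Y %*% A, 2, b, "+"))^2)

# Translation from centring: xmean - t(A) ymean
b_opt = colMeans(X) - drop(crossprod(A, colMeans(Y)))

# Moving away from b_opt by delta should add n * ||delta||^2 to the criterion
delta = c(0.1, -0.2, 0.05)
excess = m2(b_opt + delta) - m2(b_opt)
print(c(excess = excess, expected = n * sum(delta^2)))
```

The two printed numbers should agree up to rounding, whatever orthogonal matrix is used, which is the sense in which translation and rotation can be treated separately.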
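Procrustes analysis: matrix
Once the corresponding mean has been subtracted from each column, the remaining problem is to find the orthogonal matrix (a rotation, possibly combined with a reflection). With observations as rows of the centred matrices X and Y, the rotated configuration is $YA$ and we can write:
$$M^2 = \mathrm{tr}\big((X - YA)(X - YA)^T\big) = \mathrm{tr}(X^T X) + \mathrm{tr}(Y^T Y) - 2\,\mathrm{tr}(X^T Y A)$$
Here we used the fact that matrices can be cyclically permuted under the trace operator and that A is an orthogonal matrix:
$$A A^T = I, \qquad \mathrm{tr}(Y A A^T Y^T) = \mathrm{tr}(Y^T Y)$$
Since only the last term in the expression for $M^2$ depends on A, the problem reduces to a constrained maximisation:
$$\mathrm{tr}(X^T Y A) \to \max \quad \text{subject to} \quad A A^T = I$$
This can be done using the technique of Lagrange multipliers.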
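Rotation matrix using SVD
Let us denote the symmetric matrix of Lagrange multipliers for the constraint $A A^T = I$ by $\frac{1}{2}\Lambda$. Then we want to maximise:
$$\mathrm{tr}(X^T Y A) - \tfrac{1}{2}\,\mathrm{tr}\big(\Lambda (A A^T - I)\big)$$
If we take the first derivatives of this expression wrt the matrix A and equate them to 0 then we get:
$$Y^T X = \Lambda A \qquad (1)$$
Here we used the following facts:
$$\frac{\partial\,\mathrm{tr}(C A)}{\partial A} = C^T, \qquad \frac{\partial\,\mathrm{tr}(\Lambda A A^T)}{\partial A} = (\Lambda + \Lambda^T) A$$
and the fact that the matrix of multipliers is symmetric, so that $(\Lambda + \Lambda^T)A = 2\Lambda A$. Thus we have the equations needed to find the required orthogonal matrix. To solve equation (1) let us use the SVD of $Y^T X$:
$$Y^T X = U D V^T$$
U and V are $p \times p$ orthogonal matrices. D is the diagonal matrix of the singular values.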
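Rotation matrix and SVD
Equation (1) reads $Y^T X = \Lambda A$, where $\Lambda$ is the symmetric matrix of Lagrange multipliers and $Y^T X = U D V^T$ is the SVD. If we use the fact that A is orthogonal then we can write:
$$\Lambda^2 = \Lambda A A^T \Lambda = (Y^T X)(Y^T X)^T = U D^2 U^T$$
and, taking the non-negative definite root (the one that corresponds to the maximum),
$$\Lambda = U D U^T$$
Provided all singular values are non-zero, this gives the solution for the rotation (orthogonal) matrix:
$$A = \Lambda^{-1} Y^T X = U D^{-1} U^T\, U D V^T = U V^T$$
Now we can calculate the least-squares distance between the two configurations. Since $\mathrm{tr}(X^T Y A) = \mathrm{tr}(V D U^T U V^T) = \mathrm{tr}(D)$:
$$M^2 = \mathrm{tr}(X^T X) + \mathrm{tr}(Y^T Y) - 2\,\mathrm{tr}(D)$$
Thus we have expressions for the rotation matrix and for the distance between the configurations after matching. It is interesting to note that it is not necessary to rotate one of the configurations in order to find the distance between them. One more useful expression is:
$$\mathrm{tr}(D) = \mathrm{tr}\big((X^T Y Y^T X)^{1/2}\big)$$
This expression shows that it is not even necessary to carry out the SVD to find the distance between the configurations.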
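As a hedged illustration of these expressions, the R sketch below computes the squared distance in three ways for random, centred configurations: by rotating and measuring, from the singular values alone, and from the eigenvalues of the symmetric matrix t(X) Y t(Y) X (one way of evaluating the trace of its square root). The data and all variable names are assumptions made for the example; observations are rows and the rotation is taken as U t(V) from the SVD of t(Y) X.

```r
set.seed(2)
n = 15; p = 3
# Illustrative centred configurations (rows are observations)
Xn = scale(matrix(rnorm(n*p), n, p), center = TRUE, scale = FALSE)
Yn = scale(matrix(rnorm(n*p), n, p), center = TRUE, scale = FALSE)

tr = function(M) sum(diag(M))             # trace of a square matrix

C = crossprod(Yn, Xn)                     # t(Yn) %*% Xn
s = svd(C)                                # C = U D t(V)
A = s$u %*% t(s$v)                        # A = U t(V)

total = tr(crossprod(Xn)) + tr(crossprod(Yn))

d2_direct = sum((Xn - Yn %*% A)^2)        # rotate Yn, then measure
d2_svd    = total - 2 * sum(s$d)          # singular values only, no rotation applied

# tr(D) as the sum of square roots of the eigenvalues of t(C) C = t(Xn) Yn t(Yn) Xn
ev = eigen(crossprod(C), symmetric = TRUE, only.values = TRUE)$values
d2_eigen  = total - 2 * sum(sqrt(pmax(ev, 0)))

print(c(direct = d2_direct, svd = d2_svd, eigen = d2_eigen))
```

The three values should coincide up to rounding error; `pmax(ev, 0)` only guards against tiny negative eigenvalues produced by floating-point arithmetic.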
Algorithm
Problem: Given two configuration matrices X and Y with the same dimensions (rows are observations), find the rotation and translation that bring Y as close as possible to X.
• Find the centroids (mean values of the columns) of X and Y. Call them xmean and ymean.
• Subtract from each column its mean. Call the new matrices Xn and Yn.
• Find Yn^T Xn and its SVD: Yn^T Xn = U D V^T.
• Find the rotation matrix A = U V^T. The rotated configuration is Yn A.
• Find the translation vector b = xmean - A^T ymean, so that each row y of Y is mapped to y A + b.
• Find the distance between the configurations: d^2 = tr(Xn^T Xn) + tr(Yn^T Yn) - 2 tr(D). This is the square of the required distance.
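R code
    # Simple procrustes analysis: match Y to X by rotation and translation.
    # Rows of X and Y are observations.
    # rmmean and tr are the other functions needed; their definitions are not
    # given on the slide, so the two below are assumed minimal versions.
    tr = function(M) sum(diag(M))            # trace of a square matrix
    rmmean = function(M){                    # centre the columns, keep the means
      m = colMeans(M)
      list(matr=sweep(M,2,m), mean=m)
    }
    procrustes = function(X,Y){
      x1 = rmmean(X)
      y1 = rmmean(Y)
      Xn = x1$matr
      Yn = y1$matr
      xmean = x1$mean
      ymean = y1$mean
      rm(x1);rm(y1)
      s = svd(crossprod(Yn,Xn))              # t(Yn) %*% Xn = U D t(V)
      A = s$u%*%t(s$v)                       # rotation: Yn %*% A matches Xn
      # max(.,0) guards against a tiny negative value caused by rounding
      d = sqrt(max(tr(crossprod(Xn,Xn)+crossprod(Yn,Yn))-2*sum(s$d),0))
      b = xmean-drop(crossprod(A,ymean))     # translation: xmean - t(A) ymean
      list(matr=A,trans=b,dist=d)
    }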
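A usage sketch on synthetic data may help to see what the returned values mean. To keep the block self-contained it uses a compact stand-in with the hypothetical name `procrustes_sketch`, written with base R only and following the algorithm steps under the assumption that rows are observations and a row is transformed as `y %*% A + b`. The rotation angle, the shift and the data are invented for the example.

```r
# Compact stand-in for the slide's function (hypothetical name), base R only
procrustes_sketch = function(X, Y){
  xmean = colMeans(X); ymean = colMeans(Y)
  Xn = sweep(X, 2, xmean); Yn = sweep(Y, 2, ymean)   # centre both configurations
  s = svd(crossprod(Yn, Xn))                         # t(Yn) %*% Xn = U D t(V)
  A = s$u %*% t(s$v)                                 # rotation
  b = xmean - drop(crossprod(A, ymean))              # translation
  d2 = sum(Xn^2) + sum(Yn^2) - 2 * sum(s$d)          # squared distance
  list(matr = A, trans = b, dist = sqrt(max(d2, 0)))
}

set.seed(3)
n = 10; p = 2
Y = matrix(rnorm(n*p), n, p)
theta = pi/6
R = matrix(c(cos(theta), sin(theta), -sin(theta), cos(theta)), 2, 2)  # known rotation
shift = c(2, -1)                                                     # known translation
X = sweep(Y %*% R, 2, shift, "+")      # X is an exact rotated and shifted copy of Y

fit = procrustes_sketch(X, Y)
Yfit = sweep(Y %*% fit$matr, 2, fit$trans, "+")    # apply the fitted transformation

print(round(fit$matr - R, 10))         # recovered rotation minus the true one
print(fit$trans - shift)               # recovered translation minus the true one
print(max(abs(Yfit - X)))              # largest residual after matching
```

Because X is constructed as an exact copy of Y, the differences and the residual should be zero up to rounding; with real data the residual is what the distance summarises.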
Some modifications
There are some situations where straightforward use of procrustes analysis may not be appropriate:
1) The dimensions of the configurations can be different. There are two ways of handling this problem. The first way is to pad the low dimensional (k) configuration with 0-s to make it high (p) dimensional. Here we assume that the k-dimensional configuration lies in a k-dimensional subspace of the p-dimensional space, so that the first k dimensions coincide. The second way is to collapse the high dimensional configuration to the low dimensional space. For this we need to project the p-dimensional configuration onto a k-dimensional space.
2) The scales of the configurations can be different. In this case we can add a scale factor $c$ to the function we minimise:
$$M^2 = \sum_{i=1}^{n} \|x_i - c\,A^T y_i - b\|^2$$
If we find the orthogonal matrix as before then, for centred X and Y with $Y^T X = U D V^T$, we can find an expression for the scale factor:
$$c = \frac{\mathrm{tr}(D)}{\mathrm{tr}(Y^T Y)}$$
As a result the distance M between the configurations is no longer symmetric wrt X and Y.
3) Sometimes it is necessary to weight some observations down and others up. In these cases procrustes analysis can be performed using weights. We want to minimise the function:
$$M^2 = \sum_{i=1}^{n} w_i \|x_i - A^T y_i - b\|^2 = \mathrm{tr}\big((X - YA - 1_n b^T)^T W (X - YA - 1_n b^T)\big), \qquad W = \mathrm{diag}(w_1, \dots, w_n)$$
This modification can be taken into account if the configurations are centred at their weighted means and we find the SVD of $Y^T W X$ instead of $Y^T X$.
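The three modifications can be sketched in a few lines of R. This is an illustration under assumptions rather than a definitive implementation: the data are synthetic, rows are observations, the weights are taken to be non-negative weights on observations collected in W = diag(w), and all variable names are invented for the example.

```r
set.seed(4)
n = 12; p = 3
Y = matrix(rnorm(n*p), n, p)
R = qr.Q(qr(matrix(rnorm(p*p), p, p)))     # illustrative orthogonal matrix
X = 2.5 * Y %*% R                          # X is a rotated copy of Y, scaled by 2.5

# Different dimensions: pad a k-dimensional configuration with zero columns
k = 2
Yk = Y[, 1:k]
Ypad = cbind(Yk, matrix(0, n, p - k))      # n x p, lying in a k-dimensional subspace

# Scale factor: minimise sum_i ||x_i - c t(A) y_i - b||^2
Xn = sweep(X, 2, colMeans(X)); Yn = sweep(Y, 2, colMeans(Y))
s = svd(crossprod(Yn, Xn))                 # t(Yn) %*% Xn = U D t(V)
A = s$u %*% t(s$v)
c_hat = sum(s$d) / sum(Yn^2)               # tr(D) / tr(t(Yn) Yn)
print(c_hat)

# Weights on observations: minimise sum_i w_i ||x_i - t(A) y_i - b||^2
w = runif(n)                               # illustrative weights
xw = colSums(w * X) / sum(w)               # weighted centroids
yw = colSums(w * Y) / sum(w)
Xw = sweep(X, 2, xw); Yw = sweep(Y, 2, yw)
sw = svd(crossprod(Yw, w * Xw))            # t(Yw) W Xw, since w * Xw scales row i by w[i]
Aw = sw$u %*% t(sw$v)                      # weighted rotation
bw = xw - drop(crossprod(Aw, yw))          # weighted translation
print(max(abs(Aw - R)))
```

In this constructed example the estimated scale factor should return the value used to build X, and both the unweighted and the weighted rotation should reproduce R, because X is an exact rotated and scaled copy of Y. Swapping the roles of X and Y changes the scale factor to tr(D) / tr(t(Xn) Xn), which is one way to see why the scaled distance is not symmetric.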