Introduction • Given a matrix of distances D (square, symmetric, with zeros on the main diagonal), find variables that could, approximately, generate these distances. • The matrix can also be a similarity matrix: square and symmetric, but with ones on the main diagonal and values between zero and one elsewhere. • Broadly: distance (0 ≤ d ≤ 1) = 1 - similarity.
Principal Coordinates (Metric Multidimensional Scaling) • Given the distance matrix D, can we find a set of variables able to generate it? • That is, can we find a data matrix X able to generate D?
Main idea of the procedure: (1) understand how to obtain D when X is known; (2) then work backwards to build the matrix X given D.
Procedure. Remember that, given a data matrix, we obtain a zero-mean data matrix X by subtracting the column means, that is, by pre-multiplying by the centering matrix P = I - (1/n)11'. With this matrix we can compute two square and symmetric matrices: the first is the covariance matrix S; the second is the matrix Q = XX' of scalar products among the observations. A numerical sketch is given below.
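A minimal NumPy sketch of this step (the data matrix and the divisor n used in S are illustrative assumptions, not taken from the slides):

```python
# Sketch: center a data matrix, then form the covariance matrix S and the
# matrix Q of scalar products among observations, as described above.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(6, 3))             # n = 6 observations, p = 3 variables

n = X.shape[0]
P = np.eye(n) - np.ones((n, n)) / n     # centering matrix P = I - (1/n) 1 1'
Xc = P @ X                              # zero-mean data matrix

S = Xc.T @ Xc / n                       # covariance matrix (p x p); n-1 is the other common divisor
Q = Xc @ Xc.T                           # scalar products among observations (n x n)

print(np.allclose(Xc.mean(axis=0), 0))  # True: columns have zero mean
print(np.allclose(Q.sum(axis=0), 0))    # True: rows/columns of Q sum to zero
```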
The matrix of products Q is closely related to the distance matrix D we are interested in. The relation between D and Q is as follows. Elements of Q: q_ij = x_i'x_j, the scalar product between observations i and j. Elements of D: d_ij^2 = (x_i - x_j)'(x_i - x_j) = q_ii + q_jj - 2 q_ij. Main result: given the matrix Q we can obtain the matrix D.
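A short sketch of this direction, obtaining D from Q, again with made-up data:

```python
# Sketch: build the matrix of squared distances D from Q using
# d_ij^2 = q_ii + q_jj - 2 q_ij, and check it against direct computation.
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(5, 2))
Xc = X - X.mean(axis=0)                 # zero-mean data
Q = Xc @ Xc.T                           # scalar products among observations

q = np.diag(Q)
D = q[:, None] + q[None, :] - 2 * Q     # squared Euclidean distances

D_direct = ((Xc[:, None, :] - Xc[None, :, :]) ** 2).sum(axis=2)
print(np.allclose(D, D_direct))         # True
```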
How to recover Q given D? Note that, as the variables have zero mean, the sum of any row (or column) of Q must be zero. Summing d_ij^2 = q_ii + q_jj - 2 q_ij over i gives Σ_i d_ij^2 = t + n q_jj, where t = trace(Q); summing over both indices gives Σ_i Σ_j d_ij^2 = 2nt. Solving these equations for q_ij gives q_ij = -(1/2)(d_ij^2 - row mean - column mean + overall mean of the squared distances), which in matrix form is Q = -(1/2) P D P, where D contains the squared distances and P is the centering matrix.
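The double-centering recovery can be checked numerically; the sketch below assumes the same kind of illustrative data as before:

```python
# Sketch: recover Q from the matrix of squared distances D by double
# centering, Q = -(1/2) P D P.
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(5, 2))
Xc = X - X.mean(axis=0)
Q = Xc @ Xc.T
q = np.diag(Q)
D = q[:, None] + q[None, :] - 2 * Q     # squared distances generated from X

n = D.shape[0]
P = np.eye(n) - np.ones((n, n)) / n     # centering matrix
Q_rec = -0.5 * P @ D @ P                # double centering

print(np.allclose(Q_rec, Q))            # True: Q is recovered exactly
```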
2. Obtain X given Q. Note that we cannot recover X exactly, because this problem has many solutions: if Q = XX', then also Q = XAA'X' = (XA)(XA)' for any orthogonal matrix A, so B = XA is also a solution. The standard solution: take the spectral decomposition of the matrix Q, Q = ABA', where A contains the eigenvectors associated with the non-zero eigenvalues and B is the diagonal matrix of those eigenvalues, and take as solution X = AB^(1/2).
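A sketch of this reconstruction step (illustrative data; the tolerance used to decide which eigenvalues count as non-zero is an assumption):

```python
# Sketch: obtain a coordinate matrix from Q by the spectral decomposition
# Q = A B A', taking X = A B^(1/2) on the non-zero eigenvalues.
# The solution is unique only up to rotation.
import numpy as np

rng = np.random.default_rng(3)
X_true = rng.normal(size=(6, 2))
Xc = X_true - X_true.mean(axis=0)
Q = Xc @ Xc.T

vals, vecs = np.linalg.eigh(Q)          # eigenvalues in ascending order
keep = vals > 1e-10                     # non-zero eigenvalues only
B = vals[keep][::-1]                    # largest first
A = vecs[:, keep][:, ::-1]
X_rec = A * np.sqrt(B)                  # principal coordinates, X = A B^(1/2)

# The recovered coordinates reproduce the scalar products (and hence D)
print(np.allclose(X_rec @ X_rec.T, Q))  # True
```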
Conclusion • We say that D is compatible with a Euclidean metric if Q, obtained as Q = -(1/2)PDP, is non-negative definite (all its eigenvalues are non-negative).
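One possible check of this condition, written as a small helper (the function name and tolerance are illustrative, not from the slides):

```python
# Sketch: test whether a matrix of squared distances is compatible with a
# Euclidean metric by checking that Q = -(1/2) P D P has no meaningfully
# negative eigenvalues.
import numpy as np

def is_euclidean(D, tol=1e-8):
    n = D.shape[0]
    P = np.eye(n) - np.ones((n, n)) / n
    Q = -0.5 * P @ D @ P
    return bool(np.all(np.linalg.eigvalsh(Q) >= -tol))

# Squared distances of the points 0, 1, 2 on a line
D = np.array([[0., 1., 4.],
              [1., 0., 1.],
              [4., 1., 0.]])
print(is_euclidean(D))                  # True
```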
Example 1 • Q matrix of the example data (note that its rows and columns add up to zero; the matrix has been divided by 10,000). • Eigenstructure of Q.
Relationship with PC • PC: eigenvalues and eigenvectors of S. • Principal coordinates: eigenvalues and eigenvectors of Q. • If the data are metric, both representations are identical (see the sketch below). Principal coordinates generalize PC to data that are not exactly metric.
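A sketch checking this equivalence numerically (illustrative data; the columns are compared up to sign, since eigenvectors are defined only up to sign):

```python
# Sketch: when the original data X are available, the principal coordinates
# obtained from Q coincide (up to column signs) with the principal component
# scores obtained from the eigenvectors of S.
import numpy as np

rng = np.random.default_rng(4)
X = rng.normal(size=(7, 3))
Xc = X - X.mean(axis=0)

# Principal components: eigenvectors of Xc'Xc (same eigenvectors as S)
vals_s, vecs_s = np.linalg.eigh(Xc.T @ Xc)
pc_scores = Xc @ vecs_s[:, ::-1]        # scores, largest eigenvalue first

# Principal coordinates: eigen-decomposition of Q = Xc Xc'
vals_q, vecs_q = np.linalg.eigh(Xc @ Xc.T)
keep = vals_q[::-1] > 1e-10
pcoord = vecs_q[:, ::-1][:, keep] * np.sqrt(vals_q[::-1][keep])

# Same columns up to a sign change
print(np.allclose(np.abs(pc_scores[:, :pcoord.shape[1]]), np.abs(pcoord)))
```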
Biplots • Represent jointly the observations by the rows of V_2 and the variables by the coordinates D_2^(1/2) A_2'. • They are called biplots because a two-dimensional approximation to the data matrix is obtained.
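A sketch of biplot coordinates built from the singular value decomposition; the particular split of the singular values between row and column markers (all of them assigned to the rows here) is one common convention, assumed only for illustration:

```python
# Sketch: two-dimensional biplot coordinates from the truncated SVD of the
# centered data matrix; their product is the best rank-2 approximation.
import numpy as np

rng = np.random.default_rng(5)
X = rng.normal(size=(8, 4))
Xc = X - X.mean(axis=0)

U, d, Vt = np.linalg.svd(Xc, full_matrices=False)
rows_2d = U[:, :2] * d[:2]              # markers for the observations
cols_2d = Vt[:2].T                      # markers for the variables

# The rank-2 approximation error equals the size of the discarded singular values
X2 = rows_2d @ cols_2d.T
print(np.isclose(np.linalg.norm(Xc - X2), np.sqrt((d[2:] ** 2).sum())))  # True
```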
A common method • Idea: if there is a monotone relation between x and y, there must be an exact linear relation between the ranks of the two variables. • Ordered (ordinal) regression: assign ranks and run a regression between the ranks, iterating. A small illustration of the rank idea follows.
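A tiny illustration of the rank idea (the data and the monotone transformation are made up; ties are ignored for simplicity):

```python
# Sketch: a monotone but non-linear relation between x and y becomes an exact
# linear relation between their ranks.
import numpy as np

def ranks(v):
    # rank of each element (1 = smallest); assumes no ties
    return np.argsort(np.argsort(v)) + 1

x = np.array([0.5, 2.0, 1.2, 3.7, 0.1])
y = np.exp(x)                           # strictly monotone, clearly non-linear

rx, ry = ranks(x), ranks(y)
print(rx, ry)                           # identical rank vectors
print(np.corrcoef(rx, ry)[0, 1])        # 1.0: perfect linear relation of ranks
```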