DATA MINING from data to information Ronald Westra Dep. Mathematics Knowledge Engineering Maastricht University
PART 2 Exploratory Data Analysis
VISUALISING AND EXPLORING DATA-SPACE Data Mining Lecture II [Chapter 3 from Principles of Data Mining by Hand, Mannila, Smyth]
Observe that the spatial extent of the data differs from one dimension to the next. Also observe that in this case the set is almost 1-dimensional. Can we project the set so that its spatial extent along a single dimension is maximal?
Data X: n rows of p fields; the data vectors are the rows of X.
STEP 1: Subtract the average value from the dataset X to obtain mean-centered data.
The spatial extent of this cloud of points can be measured by the variance in the dataset X. The variance of each field is a diagonal entry of the covariance matrix $V = X^T X$ (the sample covariance up to a constant factor that depends only on n).
The projection of the dataset X in a direction a is $y = Xa$.
The spatial extent in direction a is the variance of the projected dataset y, i.e. $\sigma_a^2 = y^T y = (Xa)^T (Xa) = a^T X^T X a = a^T V a$.
We now want to maximize this extent $\sigma_a^2$ over all possible vectors a (why?).
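STEP 1 can be checked numerically. The sketch below is only an illustration under stated assumptions: the four data rows and the unit direction a are made up, and the helper names (mean_center, scatter_matrix, projected_extent, quadratic_form) are hypothetical, not from the lecture. It centers the data, builds $V = X^T X$, and compares $y^T y$ with $a^T V a$.

```python
# Illustrative sketch of STEP 1 in plain Python (no external libraries).
# The data and the direction `a` are made-up assumptions for the example.

def mean_center(X):
    """Subtract the column average from every row of X."""
    n, p = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(p)]
    return [[row[j] - means[j] for j in range(p)] for row in X]

def scatter_matrix(Xc):
    """Return V = Xc^T Xc for mean-centered data Xc (a p-by-p matrix)."""
    p = len(Xc[0])
    return [[sum(row[i] * row[j] for row in Xc) for j in range(p)]
            for i in range(p)]

def projected_extent(Xc, a):
    """Compute y = Xc a and return y^T y."""
    y = [sum(x_j * a_j for x_j, a_j in zip(row, a)) for row in Xc]
    return sum(v * v for v in y)

def quadratic_form(V, a):
    """Return a^T V a."""
    p = len(a)
    return sum(a[i] * V[i][j] * a[j] for i in range(p) for j in range(p))

# Four points lying almost on a line (n = 4 rows, p = 2 fields).
X = [[1.0, 2.0], [2.0, 4.1], [3.0, 5.9], [4.0, 8.0]]
Xc = mean_center(X)
V = scatter_matrix(Xc)

a = [0.6, 0.8]  # an arbitrary direction of length 1
extent_from_projection = projected_extent(Xc, a)
extent_from_V = quadratic_form(V, a)
print(extent_from_projection, extent_from_V)  # the two values should agree
```

Both routes give the same number up to rounding, which is the identity $y^T y = a^T V a$ used above.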
STEP 2: Maximize $\sigma_a^2 = a^T V a$ over all possible vectors a.
This is unbounded, just like maximizing $x^2$ over x, so we restrict the size of the vector a to 1: $a^T a - 1 = 0$.
So we have: maximize $a^T V a$ subject to $a^T a - 1 = 0$.
This can be solved with the method of Lagrange multipliers: maximize $f(x)$ subject to $g(x) = 0$ → $\frac{d}{dx}\{f(x) - \lambda g(x)\} = 0$.
For our case this means: $\frac{d}{da}\{a^T V a - \lambda (a^T a - 1)\} = 0$ → $2Va - 2\lambda a = 0$ → $Va = \lambda a$.
This means that we are looking for the eigenvectors and eigenvalues of the covariance matrix $V = X^T X$. Since $a^T V a = \lambda a^T a = \lambda$ for such a vector, the maximal extent is the largest eigenvalue, attained at its eigenvector.
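The eigenvalue problem $Va = \lambda a$ can be illustrated with a small numerical sketch. The lecture does not say how the eigenvectors are computed; the code below uses power iteration as one simple possibility, assuming V is symmetric, positive semi-definite, and has a single largest eigenvalue. The 2-by-2 matrix and the function names are hypothetical.

```python
# Illustrative sketch of STEP 2: find the unit vector a that satisfies
# V a = lambda a for the largest lambda, using power iteration.
# Power iteration is an assumption of this example, not the lecture's method.
import math

def mat_vec(V, a):
    """Return the matrix-vector product V a."""
    return [sum(V[i][j] * a[j] for j in range(len(a))) for i in range(len(V))]

def normalize(a):
    """Scale a to length 1, enforcing the constraint a^T a = 1."""
    norm = math.sqrt(sum(x * x for x in a))
    return [x / norm for x in a]

def leading_eigenpair(V, iterations=200):
    """Return (lambda, a) for the largest eigenvalue of a symmetric PSD V."""
    a = normalize([1.0] * len(V))       # arbitrary starting direction
    for _ in range(iterations):
        a = normalize(mat_vec(V, a))    # repeatedly apply V and rescale
    Va = mat_vec(V, a)
    lam = sum(x * y for x, y in zip(a, Va))  # lambda = a^T V a since |a| = 1
    return lam, a

# A made-up symmetric matrix standing in for V = X^T X.
V = [[3.0, 1.0], [1.0, 2.0]]
lam, a = leading_eigenpair(V)

# Residual of the eigen-equation V a - lambda a; it should be close to zero.
residual = [va - lam * ai for va, ai in zip(mat_vec(V, a), a)]
print(lam, a, residual)
```

The returned vector is the first principal axis of this example matrix, and the returned eigenvalue equals the extent $a^T V a$ along it.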
[Figure labels: principal axis 1, principal axis 2, mean]
Astronomical application: PCs for elliptical galaxies. Rotating to the PCs in $B_T$-$\Sigma$ space improves the Faber-Jackson relation as a distance indicator (Dressler et al. 1987).
Astronomical application: eigenspectra (KL transform) (Connolly et al. 1995).
[Figure: four panels labelled 1 pc, 2 pc, 3 pc, 4 pc]