Proceedings of the 2007 SIAM International Conference on Data Mining
Abstract
• The paper studies semi-supervised dimensionality reduction.
• Besides unlabeled samples, must-link and cannot-link constraints are incorporated as domain knowledge.
• The SSDR algorithm preserves the structure of the data as well as the constraints in the projected low-dimensional space.
Introduction
• There exist both supervised and unsupervised dimensionality reduction methods:
• FLD (Fisher Linear Discriminant): extracts discriminant vectors when class labels are available.
• cFLD (Constrained FLD): performs dimensionality reduction from equivalence constraints.
• PCA (Principal Component Analysis): preserves the global covariance structure of the data when class labels are not available (a short sketch of this baseline follows below).
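As a point of reference for the unsupervised baseline above, here is a minimal NumPy sketch of PCA as an eigendecomposition of the data covariance matrix. The function name pca_project and the rows-as-samples layout are illustrative assumptions, not details from the paper or the slides.

```python
import numpy as np

def pca_project(X, d):
    """Project the rows of X onto the top-d principal components.

    X : (n_samples, n_features) data matrix (hypothetical layout).
    Returns the projected data and the projection matrix W.
    """
    Xc = X - X.mean(axis=0)                 # center the data
    cov = np.cov(Xc, rowvar=False)          # global covariance structure
    eigvals, eigvecs = np.linalg.eigh(cov)  # eigendecomposition (ascending order)
    order = np.argsort(eigvals)[::-1]       # sort by decreasing variance
    W = eigvecs[:, order[:d]]               # top-d principal directions
    return Xc @ W, W
```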
Introduction (cont.)
• SSDR makes use of:
• Must-link constraints: pairs of instances belonging to the same class.
• Cannot-link constraints: pairs of instances belonging to different classes.
• The structure of the data itself.
• SSDR simultaneously preserves the structure of the data and the pairwise constraints specified by users.
SSDR Algorithm
• Find a projection vector w maximizing the objective function, the average squared distance between projected samples:
J(w) = 1/(2n^2) * Σ_{i,j} (w^T x_i − w^T x_j)^2
• Subject to: w^T w = 1 (a small numerical check of this objective is sketched below).
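A brief sketch of evaluating this unconstrained objective for a given unit vector w; base_objective is a hypothetical helper name, and rows of X are assumed to be samples.

```python
import numpy as np

def base_objective(X, w):
    """Unconstrained SSDR objective:
    J(w) = 1/(2 n^2) * sum_{i,j} (w^T x_i - w^T x_j)^2."""
    n = X.shape[0]
    p = X @ w                          # 1-D projections of all samples
    diffs = p[:, None] - p[None, :]    # all pairwise differences
    return (diffs ** 2).sum() / (2 * n ** 2)
```

For any w, this quantity equals the (biased) variance of the projected samples, which is why maximizing it under w^T w = 1 alone recovers the leading PCA direction.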
SSDR Algorithm (cont.)
• Extended objective function: the cannot-link set C (weighted by α) and the must-link set M (weighted by β) are added to the pairwise term:
J(w) = 1/(2n^2) * Σ_{i,j} (w^T x_i − w^T x_j)^2 + α/(2n_C) * Σ_{(x_i,x_j)∈C} (w^T x_i − w^T x_j)^2 − β/(2n_M) * Σ_{(x_i,x_j)∈M} (w^T x_i − w^T x_j)^2
• Final form of the extended objective function (2.5): J(w) = w^T X L X^T w, maximized subject to w^T w = 1.
• (2.5) is a typical eigen-problem, which can be solved by computing the eigenvectors of XLX^T corresponding to the largest eigenvalues.
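A compact NumPy sketch of this step, assuming rows of X are samples and that the pairwise weights follow the weighted sum above (1/n^2 everywhere, increased by α/n_C on cannot-link pairs and decreased by β/n_M on must-link pairs). The function name ssdr_directions is illustrative, and the defaults α = 1, β = 20 are taken from the experiments slide; constant factors such as the 1/2 are dropped since they do not change the eigenvectors.

```python
import numpy as np

def ssdr_directions(X, must_links, cannot_links, d, alpha=1.0, beta=20.0):
    """Sketch of the SSDR projection step (rows of X are samples).

    must_links / cannot_links : lists of index pairs (i, j).
    Returns the top-d projection vectors as columns of W.
    """
    n = X.shape[0]
    n_m = max(len(must_links), 1)       # guard against empty constraint sets
    n_c = max(len(cannot_links), 1)

    # Pairwise weight matrix S: 1/n^2 everywhere, boosted on cannot-link pairs
    # (push apart) and reduced on must-link pairs (pull together).
    S = np.full((n, n), 1.0 / n ** 2)
    for i, j in cannot_links:
        S[i, j] = S[j, i] = 1.0 / n ** 2 + alpha / n_c
    for i, j in must_links:
        S[i, j] = S[j, i] = 1.0 / n ** 2 - beta / n_m

    # Graph Laplacian L = D - S, with D the diagonal matrix of row sums of S.
    L = np.diag(S.sum(axis=1)) - S

    # Maximize w^T X^T L X w subject to w^T w = 1: a standard eigen-problem.
    M = X.T @ L @ X
    eigvals, eigvecs = np.linalg.eigh(M)
    W = eigvecs[:, np.argsort(eigvals)[::-1][:d]]   # largest eigenvalues first
    return W
```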
Experiments
• Data sets: 6 UCI data sets, the YaleB facial image data set, and 20-Newsgroups.
• Results are averaged over 100 runs, each with a different randomly generated set of constraints (a sampling sketch follows below).
• Parameters: α = 1, β = 20.
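One common way to generate such constraints from labeled data, sketched here as an assumption about the experimental setup rather than a detail stated on the slide: a randomly drawn pair becomes a must-link if its labels agree and a cannot-link otherwise.

```python
import numpy as np

def sample_constraints(y, n_pairs, rng=None):
    """Generate pairwise constraints from labels y (hypothetical helper):
    a random pair is a must-link if labels agree, a cannot-link otherwise."""
    rng = np.random.default_rng(rng)
    must, cannot = [], []
    for _ in range(n_pairs):
        i, j = rng.choice(len(y), size=2, replace=False)
        (must if y[i] == y[j] else cannot).append((int(i), int(j)))
    return must, cannot
```

The resulting pairs can be fed directly to the ssdr_directions sketch above, and the process repeated to average results over multiple constraint sets.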