Pairwise Constraint Propagation by Semidefinite Programming for Semi-Supervised Classification. Zhenguo Li (joint work with Jianzhuang Liu and Xiaoou Tang), Department of Information Engineering, The Chinese University of Hong Kong
Outline • Semi-Supervised Classification • Our Work • Experimental Results • Conclusions and Future Work
Traditional Semi-Supervised Classification • Learning from labeled and unlabeled data. • Assumption: nearby objects tend to be in the same class (cluster assumption). • Idea: the known class labels are propagated smoothly to the unlabeled data (label propagation).
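Label propagation can be illustrated with a standard iterative scheme, F ← αSF + (1 − α)Y, where S = D^{-1/2} W D^{-1/2} is the normalized affinity matrix (the "local and global consistency" method of Zhou et al.). This is a generic sketch of traditional label propagation, not part of the method presented in these slides; the Gaussian affinity, `sigma`, `alpha`, and the iteration count are assumptions.

```python
import numpy as np

def gaussian_affinity(X, sigma=1.0):
    """Dense Gaussian affinity matrix with a zero diagonal (assumed choice)."""
    sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    W = np.exp(-sq / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)
    return W

def label_propagation(X, labels, n_classes, sigma=1.0, alpha=0.99, n_iter=500):
    """Propagate known labels (>= 0) to unlabeled points (marked -1).

    Iterates F <- alpha * S @ F + (1 - alpha) * Y with S = D^{-1/2} W D^{-1/2}.
    """
    W = gaussian_affinity(X, sigma)
    d = W.sum(axis=1)
    S = W / np.sqrt(np.outer(d, d))          # symmetric normalization
    Y = np.zeros((len(X), n_classes))
    known = labels >= 0
    Y[known, labels[known]] = 1.0            # one-hot rows for labeled points
    F = Y.copy()
    for _ in range(n_iter):
        F = alpha * S @ F + (1 - alpha) * Y  # spread scores along the graph
    return F.argmax(axis=1)

# Two well-separated 1-D groups, one labeled point in each.
X = np.array([[0.0], [0.1], [0.2], [5.0], [5.1], [5.2]])
labels = np.array([0, -1, -1, 1, -1, -1])
print(label_propagation(X, labels, n_classes=2))  # [0 0 0 1 1 1]
```

Each unlabeled point takes the class with the largest propagated score; because the affinity across the gap is tiny, labels stay within their own group, which is exactly the cluster assumption at work.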
Challenges • The distributions of real-world data are often more complex than expected: a class may consist of multiple separate groups, and different classes may be close to each other or even overlap. • Pairwise constraints, which specify whether two objects belong to the same class (must-link) or to different classes (cannot-link), are natural in these circumstances. • Techniques for label propagation do not readily extend to pairwise constraints.
Our Work • We consider the general problem of classification from pairwise constraints and unlabeled data. • This problem is more general than traditional semi-supervised classification. • In contrast to label propagation, we explore an approach to pairwise constraint propagation.
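One way to see why this setting is more general than learning from labels: a set of labeled objects can always be converted into pairwise constraints (same label gives a must-link, different labels give a cannot-link), whereas pairwise constraints alone do not determine class labels. A minimal sketch; the helper name `labels_to_constraints` is hypothetical.

```python
from itertools import combinations

def labels_to_constraints(labeled):
    """Turn {object_index: class_label} into must-link / cannot-link pair lists.

    Hypothetical helper: same label -> must-link, different labels -> cannot-link.
    """
    must_link, cannot_link = [], []
    for (i, yi), (j, yj) in combinations(sorted(labeled.items()), 2):
        (must_link if yi == yj else cannot_link).append((i, j))
    return must_link, cannot_link

ml, cl = labels_to_constraints({0: "a", 3: "a", 7: "b"})
print(ml)  # [(0, 3)]
print(cl)  # [(0, 7), (3, 7)]
```

The reverse direction fails: a must-link pair says two objects share a class without saying which one, so the constraint-based problem strictly contains the label-based one.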
The Global Viewpoint • The must-link constraint requires merging the outer and inner circles into one class. • The cannot-link constraint requires keeping the middle and outer circles in different classes.
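The slide's figure is not reproduced here. As a hypothetical stand-in, the sketch below generates three concentric circles and encodes the two constraints described above as index pairs. The radii, sample counts, seed, and choice of constrained objects are all assumptions, not the toy data used in the talk.

```python
import numpy as np

def three_circles(n_per_circle=100, radii=(1.0, 2.0, 3.0), seed=0):
    """Points on three concentric circles; ring 0 = inner, 1 = middle, 2 = outer."""
    rng = np.random.default_rng(seed)
    points, ring = [], []
    for k, r in enumerate(radii):
        theta = rng.uniform(0.0, 2.0 * np.pi, n_per_circle)
        points.append(np.column_stack([r * np.cos(theta), r * np.sin(theta)]))
        ring.append(np.full(n_per_circle, k))
    return np.vstack(points), np.concatenate(ring)

X, ring = three_circles()
# First object of each ring, used as the hypothetical constrained objects.
inner, middle, outer = (int(np.flatnonzero(ring == k)[0]) for k in range(3))
must_link = [(outer, inner)]     # merge outer and inner circles into one class
cannot_link = [(middle, outer)]  # keep middle and outer circles apart
# Target partition implied by the constraints: {inner, outer} vs. {middle}.
target = (ring == 1).astype(int)
```

With these radii the must-link pair is at least 2 apart while neighbouring points on a ring are much closer, so the cluster assumption alone would never join the inner and outer circles; the constraint has to be propagated globally.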
Our Assumptions • Cluster assumption: nearby objects should be in the same class. • Pairwise constraint assumption: two objects that are respectively similar to the two objects of a must-link pair should be in the same class, and two objects that are respectively similar to the two objects of a cannot-link pair should be in different classes. • Our goal is to implement both assumptions in a unified framework.
Our Idea • Learn a nonlinear mapping that reshapes the data so that: nearby objects are mapped close together; two must-link objects are mapped close together and two cannot-link objects are mapped far apart; two objects respectively similar to the two objects of a must-link pair are mapped close together, and two objects respectively similar to the two objects of a cannot-link pair are mapped far apart. • In this way, the pairwise constraints are propagated to the entire data set.
Interpretation • Constraint satisfaction: the inequalities require two must-link objects to be mapped close together and two cannot-link objects to be mapped far apart. • Constraint propagation: by enforcing smoothness of the mapping, two objects respectively similar to the two objects of a must-link pair are mapped close together, and two objects respectively similar to the two objects of a cannot-link pair are mapped far apart. • After the mapping, each class ideally becomes compact and different classes become far apart.
The Unit Hypersphere Model • All objects are mapped onto the unit hypersphere. • Two must-link objects are mapped to the same point. • Two cannot-link objects are mapped to orthogonal vectors. • Smoothness measure
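Why these two conditions are natural on the unit hypersphere: writing f for the mapping (notation assumed here, not taken from the slides), unit norm turns the squared distance between two mapped objects into a function of their inner product alone.

```latex
\begin{aligned}
\|f(x_i) - f(x_j)\|^2 &= \|f(x_i)\|^2 + \|f(x_j)\|^2 - 2\langle f(x_i), f(x_j)\rangle
                       = 2 - 2\langle f(x_i), f(x_j)\rangle,\\
\text{must-link:}\quad f(x_i) = f(x_j) &\iff \langle f(x_i), f(x_j)\rangle = 1,\\
\text{cannot-link:}\quad \langle f(x_i), f(x_j)\rangle = 0 &\iff \|f(x_i) - f(x_j)\| = \sqrt{2}.
\end{aligned}
```

Both constraints therefore depend on the mapped objects only through their inner products, which is what allows them to be stated on a kernel matrix.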
Learning a Kernel Matrix • Consider the matrix of pairwise inner products of the mapped objects. • This matrix can be thought of as a kernel matrix over the data set, with the mapping being exactly the feature map induced by this kernel. • (Kernel trick) We can therefore obtain the feature map implicitly by explicitly learning the corresponding kernel matrix.
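The kernel trick can be made concrete: any symmetric positive semidefinite matrix K factors as K = F Fᵀ, for example through its eigendecomposition, so once K is learned an explicit embedding (one row of F per object) is available if needed. A minimal sketch; the helper name `feature_map_from_kernel` and the tolerance are assumptions.

```python
import numpy as np

def feature_map_from_kernel(K, tol=1e-10):
    """Return F with F @ F.T == K for a symmetric PSD matrix K.

    Row i of F is an explicit embedding of object i.
    """
    eigvals, eigvecs = np.linalg.eigh(K)
    keep = eigvals > tol                      # drop (numerically) zero directions
    return eigvecs[:, keep] * np.sqrt(eigvals[keep])

# A valid kernel on 3 objects: unit diagonal, objects 0 and 1 identical,
# object 2 orthogonal to both.
K = np.array([[1.0, 1.0, 0.0],
              [1.0, 1.0, 0.0],
              [0.0, 0.0, 1.0]])
F = feature_map_from_kernel(K)
print(np.allclose(F @ F.T, K))  # True
```

The embedding is unique only up to rotation, which does not matter here: distances and inner products, and hence any clustering done on them, depend only on K.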
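Learning a Kernel Matrix • The constraints become conditions on the entries of the kernel matrix: every diagonal entry equals 1 (unit hypersphere), the entry of every must-link pair equals 1 (same point), the entry of every cannot-link pair equals 0 (orthogonal), and the matrix is positive semidefinite (a valid kernel). • The smoothness measure is likewise rewritten in terms of the kernel matrix.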
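A numerical check of the kernel-matrix reformulation, under an explicit assumption: the slide's smoothness formula is not reproduced, so it is taken here to be the common graph form ½ Σᵢⱼ wᵢⱼ‖f(xᵢ) − f(xⱼ)‖², which equals tr(LK) with L = D − W and is therefore linear in K. The sketch builds K from a hand-picked unit-norm embedding, checks the entry conditions and positive semidefiniteness, and compares both sides of the smoothness identity. The weights W, the embedding, and all helper names are hypothetical.

```python
import numpy as np

def laplacian(W):
    """Unnormalised graph Laplacian L = D - W (W symmetric, non-negative)."""
    return np.diag(W.sum(axis=1)) - W

def smoothness(F, W):
    """Assumed smoothness measure: 0.5 * sum_ij w_ij * ||f_i - f_j||^2."""
    diff = F[:, None, :] - F[None, :, :]
    return 0.5 * np.sum(W * np.sum(diff ** 2, axis=-1))

def satisfies_constraints(K, must_link, cannot_link, tol=1e-9):
    """Check the kernel-matrix form of the unit-hypersphere constraints."""
    unit_diag = np.allclose(np.diag(K), 1.0, atol=tol)
    ml_ok = all(abs(K[i, j] - 1.0) <= tol for i, j in must_link)
    cl_ok = all(abs(K[i, j]) <= tol for i, j in cannot_link)
    psd = np.linalg.eigvalsh(K).min() >= -tol
    return bool(unit_diag and ml_ok and cl_ok and psd)

rng = np.random.default_rng(0)
W = rng.uniform(size=(4, 4))
W = (W + W.T) / 2                 # symmetric affinities
np.fill_diagonal(W, 0.0)
# Hypothetical unit-norm embedding (one row per object): objects 0 and 1
# coincide (must-link), object 2 is orthogonal to object 0 (cannot-link).
F = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [np.sqrt(0.5), np.sqrt(0.5)]])
K = F @ F.T
print(satisfies_constraints(K, must_link=[(0, 1)], cannot_link=[(0, 2)]))  # True
print(np.isclose(smoothness(F, W), np.trace(laplacian(W) @ K)))           # True
```

Under this assumption the objective and the entry conditions are all linear in K and the only nonlinear requirement is K ⪰ 0, which is the shape of a semidefinite program.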
Kernel K-means • Finally, we apply kernel k-means to the learned kernel matrix to partition the objects into k classes.
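Kernel k-means needs only the kernel matrix, because the squared feature-space distance from an object to a cluster mean can be written entirely in terms of kernel entries. A compact sketch; the function name and the deterministic farthest-point initialisation are choices made for this illustration, not details from the slides.

```python
import numpy as np

def kernel_kmeans(K, k, n_iter=100):
    """Kernel k-means on a precomputed kernel matrix K (n x n, PSD).

    Squared feature-space distance from object i to the mean of cluster C:
        K_ii - (2/|C|) * sum_{j in C} K_ij + (1/|C|^2) * sum_{j,l in C} K_jl
    """
    n = K.shape[0]
    diag = np.diag(K)
    # Pairwise squared feature-space distances, used only for initialisation.
    pair_d2 = diag[:, None] + diag[None, :] - 2.0 * K
    # Deterministic farthest-point initialisation (an assumption of this sketch).
    centers = [0]
    for _ in range(1, k):
        centers.append(int(pair_d2[:, centers].min(axis=1).argmax()))
    labels = pair_d2[:, centers].argmin(axis=1)
    for _ in range(n_iter):
        dist = np.full((n, k), np.inf)
        for c in range(k):
            members = labels == c
            m = members.sum()
            if m == 0:
                continue  # empty cluster: no object can be assigned to it
            dist[:, c] = (diag
                          - 2.0 * K[:, members].sum(axis=1) / m
                          + K[np.ix_(members, members)].sum() / m ** 2)
        new_labels = dist.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break  # assignments are stable: converged
        labels = new_labels
    return labels

# Ideal two-class kernel: entry 1 within a class, 0 across classes.
y = np.array([0, 0, 0, 1, 1, 1])
K = (y[:, None] == y[None, :]).astype(float)
print(kernel_kmeans(K, k=2))  # [0 0 0 1 1 1]
```
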
Experimental Results: Toy Data • Distance matrices before and after the mapping
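The "after" distance matrix can be computed from the learned kernel matrix alone, since the squared distance between two mapped objects is Kᵢᵢ + Kⱼⱼ − 2Kᵢⱼ. A minimal sketch for producing both matrices; the helper names, the example points, and the example kernel are hypothetical and are not the toy data from the slide.

```python
import numpy as np

def input_distances(X):
    """Euclidean distance matrix of the original data (one object per row)."""
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt(np.sum(diff ** 2, axis=-1))

def kernel_distances(K):
    """Distance matrix of the mapped objects, computed from K alone."""
    diag = np.diag(K)
    sq = diag[:, None] + diag[None, :] - 2.0 * K
    return np.sqrt(np.maximum(sq, 0.0))  # clip tiny negatives from round-off

X = np.array([[0.0, 0.0], [3.0, 4.0], [0.0, 1.0]])
# Hypothetical learned kernel: objects 0 and 1 merged (must-link),
# object 2 orthogonal to both.
K = np.array([[1.0, 1.0, 0.0],
              [1.0, 1.0, 0.0],
              [0.0, 0.0, 1.0]])
print(input_distances(X)[0, 1])   # 5.0
print(kernel_distances(K)[0, 1])  # 0.0
```

Objects 0 and 1 are the farthest pair in the input space but coincide after the mapping, which is the kind of change a before/after comparison of distance matrices makes visible.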
Conclusions • We have proposed a framework, Pairwise Constraint Propagation (PCP), for learning from pairwise constraints and unlabeled data: it can effectively propagate pairwise constraints, and it is formulated as a semidefinite programming (SDP) problem. • Future work includes accelerating PCP, handling noisy constraints effectively, and applying PCP to real-world problems.