Multi-Kernel Multi-Label Learning with Max-Margin Concept Network 1Wei Zhang, 1Xiangyang Xue, 2Jianping Fan, 1Xiaojing Huang, 1Bin Wu, 1Mingjie Liu 1Fudan University, China; 2UNCC, USA {weizh, xyxue}@fudan.edu.cn IJCAI-2011
Content • Motivation • Overview • Concept Network Construction • The Proposed Model • Multi-Kernel Multi-Label Learning • Experiments • Conclusions
Motivation • Semantic richness: a single label is often insufficient to describe a sample's content, so multiple labels per sample (multi-label learning) are necessary. • When multiple labels are attached to a single sample, there can be strong inter-label correlations. • Similarity diversity: a single kernel cannot effectively characterize the diverse similarity structures among samples, so multiple kernels (multi-kernel learning) are necessary.
Overview Inter-label dependency and similarity diversity are leveraged simultaneously in the proposed method. • A concept network is constructed to capture inter-label correlations for classifier training. • A maximal-margin approach is used to effectively formulate the feature-label associations and the label-label correlations. • Specific kernels are learned not only for each label but also for each pair of inter-related labels.
Concept Network Construction • A concept network is constructed to characterize the inter-label correlations and to learn the inter-related classifiers. • Each concept corresponds to one node in the concept network. • If two concepts are inter-related, there is an edge between the corresponding nodes. • Empirical conditional probabilities: $P(c_i \mid c_j) = N(c_i, c_j) / N(c_j)$, where $N(\cdot)$ counts the training samples annotated with the given concepts. If $\max\{P(c_i \mid c_j),\, P(c_j \mid c_i)\} > \theta$, then the nodes $c_i$ and $c_j$ are connected by an edge (see the sketch below).
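Below is a minimal Python sketch of this construction step, assuming a binary label matrix Y (samples × concepts) and a hypothetical threshold theta; the paper's exact thresholding rule is not reproduced here.

    import numpy as np

    def build_concept_network(Y, theta=0.3):
        """Build edges between inter-related concepts from a binary
        label matrix Y of shape (n_samples, n_concepts).
        `theta` is a hypothetical threshold, not taken from the paper."""
        n_concepts = Y.shape[1]
        counts = Y.T @ Y                # counts[i, j] = N(c_i, c_j); diagonal = N(c_i)
        edges = []
        for i in range(n_concepts):
            for j in range(i + 1, n_concepts):
                p_i_given_j = counts[i, j] / max(counts[j, j], 1)   # P(c_i | c_j)
                p_j_given_i = counts[i, j] / max(counts[i, i], 1)   # P(c_j | c_i)
                if max(p_i_given_j, p_j_given_i) > theta:
                    edges.append((i, j))
        return edges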
The Proposed Model • Our model captures the feature-concept associations and the inter-concept correlations in a unified framework: $F(\mathbf{x}, \mathbf{y}) = \sum_{i=1}^{c} y_i \langle \mathbf{w}_i, \varphi_i(\mathbf{x}) \rangle + \sum_{(i,j) \in E} y_i y_j \langle \mathbf{w}_{ij}, \varphi_{ij}(\mathbf{x}) \rangle$ • $\varphi_i(\cdot)$ and $\varphi_{ij}(\cdot)$ are functions mapping sample features $\mathbf{x}$ to kernel spaces with respect to the nodes and the edges, respectively. • $y_i \in \{-1, +1\}$, $i = 1, \dots, c$.
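As an illustration, a minimal Python sketch of this unified score under the form above; the names phi_node, phi_edge, w_node, and w_edge are assumptions for this sketch, not the paper's notation.

    import numpy as np

    def model_score(phi_node, phi_edge, w_node, w_edge, y, edges):
        """Score F(x, y) = sum_i y_i <w_i, phi_i(x)>
                         + sum_{(i,j) in E} y_i y_j <w_ij, phi_ij(x)>.
        phi_node[i] and phi_edge[(i, j)] are precomputed feature maps of x."""
        score = sum(y[i] * np.dot(w_node[i], phi_node[i]) for i in range(len(y)))
        score += sum(y[i] * y[j] * np.dot(w_edge[(i, j)], phi_edge[(i, j)])
                     for (i, j) in edges)
        return score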
Max-Margin Method for Model Learning • By considering both site and edge potentials in a unified framework, we fully leverage the associations between features and labels, as well as the correlations among labels and their dependence on the features. • To learn the proposed model, the objective function is defined as: $\min_{\mathbf{w}, \boldsymbol{\xi}} \frac{1}{2}\|\mathbf{w}\|^2 + C \sum_{n=1}^{N} \xi_n$ • subject to the max-margin constraints $F(\mathbf{x}_n, \mathbf{y}_n) - F(\mathbf{x}_n, \mathbf{y}) \geq \Delta(\mathbf{y}_n, \mathbf{y}) - \xi_n$ for all $\mathbf{y} \neq \mathbf{y}_n$, with $\xi_n \geq 0$, where $\Delta(\mathbf{y}_n, \mathbf{y})$ measures the loss between the ground-truth labeling $\mathbf{y}_n$ and a candidate labeling $\mathbf{y}$.
Learning Interdependent Classifiers • We factor the proposed global model as the sum of local models: $F(\mathbf{x}, \mathbf{y}) = \sum_{i=1}^{c} F_i(\mathbf{x}, \mathbf{y})$, where $F_i(\mathbf{x}, \mathbf{y}) = y_i \langle \mathbf{w}_i, \varphi_i(\mathbf{x}) \rangle + \frac{1}{2} \sum_{j:(i,j) \in E} y_i y_j \langle \mathbf{w}_{ij}, \varphi_{ij}(\mathbf{x}) \rangle$ • The optimization can then be approximately decoupled into $c$ interdependent sub-problems, one per label.
Similarity Diversity by Multi-Kernel • The dual of the optimization problem is as follows: $\max_{\boldsymbol{\alpha}} \sum_{n} \alpha_n - \frac{1}{2} \sum_{n,m} \alpha_n \alpha_m y_n y_m \tilde{K}(\mathbf{x}_n, \mathbf{x}_m)$, subject to $0 \leq \alpha_n \leq C$ • where the combined kernel $\tilde{K}$ aggregates the concept-specific kernels induced by $\varphi_i$ and the pairwise concept-specific kernels induced by $\varphi_{ij}$. We employ the multi-kernel technique to implement both the concept-specific and the pairwise concept-specific feature mappings, so that similarity diversity can be effectively characterized.
Multi-Kernel Learning • We first define an original kernel regardless of label information using a Gaussian kernel, and decompose the Gram matrix by spectral decomposition: $K = \sum_{k} \lambda_k \mathbf{v}_k \mathbf{v}_k^{\top}$ • To incorporate the label information, we learn a concept-specific kernel matrix $K_i$ for each label by maximizing the similarities between samples sharing that label. • To fully leverage the correlations among the concepts and their dependence on the input features, a pairwise label-specific kernel matrix $K_{ij}$ is learned analogously over samples sharing both labels. • Both the concept-specific kernel matrices and the pairwise label-specific kernel matrices share the common eigenbasis with the original kernel $K$: $K_i = \sum_k \mu_k^{(i)} \mathbf{v}_k \mathbf{v}_k^{\top}$, $K_{ij} = \sum_k \mu_k^{(ij)} \mathbf{v}_k \mathbf{v}_k^{\top}$
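A minimal numpy sketch of the shared-eigenbasis idea, assuming an RBF Gram matrix; the spectral coefficients mu would be learned from the label information, which is not reproduced here, so the placeholder choice below is purely illustrative.

    import numpy as np
    from scipy.spatial.distance import cdist

    def rbf_gram(X, gamma=1.0):
        """Original label-agnostic Gaussian (RBF) Gram matrix."""
        return np.exp(-gamma * cdist(X, X, "sqeuclidean"))

    K = rbf_gram(np.random.rand(50, 10))
    eigvals, eigvecs = np.linalg.eigh(K)   # spectral decomposition K = V diag(lambda) V^T

    def kernel_from_basis(mu, eigvecs):
        """Rebuild a label-specific kernel on the shared eigenbasis;
        `mu` are the learned spectral coefficients (hypothetical here)."""
        return eigvecs @ np.diag(mu) @ eigvecs.T

    K_i = kernel_from_basis(np.maximum(eigvals, 0), eigvecs)   # placeholder: mu = lambda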
Model Inference • For any new image, the inference problem is to find the optimal label configuration $\mathbf{y}^* = \arg\max_{\mathbf{y}} F(\mathbf{x}, \mathbf{y})$ • The size of the multi-label space is exponential in the number of classes, so it is intractable to enumerate all possible label configurations to find the best one. We employ an approximate inference technique, Iterated Conditional Modes (ICM), sketched below: • Initialize a multi-label configuration $\mathbf{y}^{(0)}$. • In each iteration, given the current labels of the other concepts $\mathbf{y}_{-i}$, sequentially update each $y_i$ using the local model: if $F_i(\mathbf{x}, y_i = +1, \mathbf{y}_{-i}) > F_i(\mathbf{x}, y_i = -1, \mathbf{y}_{-i})$ then $y_i \leftarrow +1$; otherwise $y_i \leftarrow -1$.
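A minimal Python sketch of the ICM loop under the local-model form above; local_score is an assumed callable returning $F_i(\mathbf{x}, \mathbf{y})$, and the iteration count and convergence test are simplified relative to the paper.

    import numpy as np

    def icm_inference(local_score, n_concepts, n_iters=10, seed=0):
        """Iterated Conditional Modes over labels y_i in {-1, +1}.
        `local_score(i, y)` returns F_i(x, y) with y_i already set in y;
        this is an illustrative sketch, not the paper's exact procedure."""
        rng = np.random.default_rng(seed)
        y = rng.choice([-1, 1], size=n_concepts)      # initial configuration
        for _ in range(n_iters):
            changed = False
            for i in range(n_concepts):
                y_pos, y_neg = y.copy(), y.copy()
                y_pos[i], y_neg[i] = 1, -1
                new_label = 1 if local_score(i, y_pos) > local_score(i, y_neg) else -1
                if new_label != y[i]:
                    y[i] = new_label
                    changed = True
            if not changed:                           # converged: no label flipped
                break
        return y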
Experiments We compare our method on web-page classification against state-of-the-art methods: • RML [Petterson and Caetano, 2010]; ML-KNN [Zhang and Zhou, 2007]; • Tang's method [Tang et al., 2009]; and RankSVM [Elisseeff and Weston, 2002].
Experiments • We also evaluate the method on other real-world applications: image annotation, music emotion tagging, and gene categorization.
Conclusions: Inter-label dependency and similarity diversity are simultaneously leveraged in multi-kernel multi-label learning. • A concept network is constructed to characterize the inter-label correlations effectively. • The max-margin technique effectively captures the feature-label associations and the label-label correlations. • By decoupling the multi-label learning task into interdependent sub-problems, label by label, the proposed method learns multiple interrelated classifiers jointly. • Specific kernels are learned not only for each label but also for each pair of inter-related labels, embedding the label information and the inter-label correlations.
Thanks a lot! Q & A ?