This paper introduces a Graph-Based Consensus Maximization method that combines the outputs of supervised and unsupervised models to improve prediction accuracy. By leveraging diverse models, the approach produces predictions that are robust and agree with the base models as much as possible. Applications include image categorization, movie recommendation, and research area prediction. The method optimizes over a bipartite graph of objects and groups so that the combined predictions reach a consensus among the model outputs while staying close to the groups' initial probabilities. Experimental results show improved accuracy, and a sensitivity analysis examines the effect of the method's parameters. Code and datasets are available at the link below.
Graph-based Consensus Maximization among Multiple Supervised and Unsupervised Models
Jing Gao¹, Feng Liang¹, Wei Fan², Yizhou Sun¹, Jiawei Han¹
¹ University of Illinois, Urbana-Champaign   ² IBM TJ Watson

Consensus Maximization
• Goal
  • Combine the outputs of multiple supervised and unsupervised models on a set of objects
  • The predicted labels should agree with the base models as much as possible
• Motivations
  • Unsupervised models provide useful constraints for classification tasks
  • Model diversity improves prediction accuracy and robustness
  • Combining models at the output level is needed when privacy concerns or incompatible formats prevent sharing the models themselves
• Applications
  • Image categorization: images, descriptions, notes, comments, albums, tags, ……
  • Movie recommendation: movie genres, cast, director, plots, users' viewing history, movie ratings, ……
  • Research area prediction: publication and co-authorship network, published papers, ……
  • Many more ……

Related Work
• Summary of learning algorithms: the Y-axis is the goal of learning, the X-axis is the methodology
• The proposed method can be regarded as a semi-supervised ensemble approach working at the output level

Methodology: Optimization over a Bipartite Graph
[Figure: group-object bipartite graph linking object nodes x1–x7 to group nodes g1–g9 produced by base models M1–M4; groups from supervised models carry label vectors such as [1 0 0], [0 1 0], [0 0 1]. Input: the affinity matrix of the graph and the initial probability of the groups. Output: a conditional probability vector for each node.]
• Objective function: minimize disagreement
  • An object and a group connected in the graph should have similar conditional probabilities
  • A group's probability should not deviate much from its initial probability
• Iterative solution: update the probability of each group, then update the probability of each object; iterate until convergence
  (a hedged formalization of the objective and a small code sketch of these updates appear at the end of this page)

Interpretations
• Constrained Embedding
  • Goal: embed both the group and the object nodes into a c-dimensional unit cube
  • Each group node is close to the object nodes it contains
  • Each group node is close to the constraint node from the supervised models
• Ranking on Consensus Structure
  • Rank all the groups (from both the supervised and the unsupervised models) according to their relevance to the queries
  • Groups from supervised models act as queries (similarly in the semi-supervised version)

Experimental Results
• Data Sets
  • 20 Newsgroups: newsgroup message categorization
  • Cora: paper research-area prediction
  • DBLP: researchers' research-area prediction
• Baseline Methods
  • Single models: two classification models and two clustering models
  • Proposed methods: BGCM, BGCM-L (semi-supervised version), 2-L (two models), 3-L (three models)
[Figures: Accuracy comparison and Sensitivity Analysis]

Take away messages
• The proposed consensus maximization method combines the complementary predictive powers of multiple supervised and unsupervised models to reach a better solution.

Codes and datasets available at http://ews.uiuc.edu/~jinggao3/nips09bgcm.htm
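The poster states the objective only in words: connected object and group nodes should have similar conditional probabilities, and group probabilities should not deviate much from their initial values. One way to write this as a single optimization problem, with notation ($u_i$, $q_j$, $a_{ij}$, $y_j$, $\alpha$, and the counts $n$, $v$, $s$) introduced here purely for illustration rather than taken from the poster, is

\[
\min_{u_1,\dots,u_n,\; q_1,\dots,q_v}\;\; \sum_{i=1}^{n}\sum_{j=1}^{v} a_{ij}\,\lVert u_i - q_j \rVert^2 \;+\; \alpha \sum_{j=1}^{s} \lVert q_j - y_j \rVert^2
\]

where $a_{ij}$ is the affinity matrix of the bipartite graph (1 if object $i$ belongs to group $j$), $u_i$ and $q_j$ are the conditional probability vectors of object $i$ and group $j$, $y_j$ is the initial probability of group $j$ (known for the $s$ groups coming from supervised models), and $\alpha$ weights the penalty for deviating from the initial probabilities. Setting the gradient with respect to each $q_j$ and each $u_i$ to zero yields the alternating "update group, update object" steps from the Methodology section.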
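A minimal, runnable sketch of those alternating updates, assuming the inputs are represented as NumPy arrays (the function name bgcm_sketch, the parameter alpha, and the array layout are assumptions for illustration, not the authors' released code):

```python
# Minimal sketch (not the authors' released code) of the alternating updates the poster
# describes: update the probability of each group, update the probability of each object,
# and iterate until convergence over the object-group bipartite graph.
import numpy as np

def bgcm_sketch(A, Y, supervised_mask, alpha=2.0, n_iters=100, tol=1e-6):
    """A: (n_objects, n_groups) 0/1 affinity matrix of the bipartite graph.
    Y: (n_groups, n_classes) initial probability of the groups (rows for purely
       unsupervised groups may be zero).
    supervised_mask: length n_groups, True for groups produced by supervised models.
    Returns (U, Q): conditional probability vectors for objects and for groups."""
    A = np.asarray(A, dtype=float)
    Y = np.asarray(Y, dtype=float)
    penalty = alpha * np.asarray(supervised_mask, dtype=float)   # per-group pull toward Y
    n_objects, n_classes = A.shape[0], Y.shape[1]

    Q = Y.copy()                                                  # group probabilities
    U = np.full((n_objects, n_classes), 1.0 / n_classes)          # object probabilities

    for _ in range(n_iters):
        Q_old = Q
        # Update probability of a group: average the objects connected to it,
        # pulled toward its initial probability if the group is supervised.
        Q = (A.T @ U + penalty[:, None] * Y) / (A.sum(axis=0) + penalty)[:, None]
        # Update probability of an object: average the groups it belongs to.
        U = (A @ Q) / A.sum(axis=1, keepdims=True)
        # Iterate until convergence.
        if np.abs(Q - Q_old).max() < tol:
            break
    return U, Q

# Example: 2 objects, 2 classes, 3 groups (2 from a classifier, 1 from a clusterer).
A = [[1, 0, 1], [0, 1, 1]]
Y = [[1, 0], [0, 1], [0, 0]]
U, Q = bgcm_sketch(A, Y, supervised_mask=[True, True, False])
```

Predicted labels for the objects are then read off as the argmax over the rows of U, and the groups from unsupervised models inherit class semantics through the same propagation.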