This paper introduces a Graph-Based Consensus Maximization method that combines the outputs of supervised and unsupervised models to improve prediction accuracy. By leveraging diverse models, the approach produces predictions that are robust and agree with the base models as much as possible. Applications include image categorization, movie recommendation, and research area prediction. The method optimizes over a bipartite graph of objects and groups so that the combined predictions reach a consensus among the model outputs while staying close to the groups' initial probabilities. Experimental results show improved accuracy, and a sensitivity analysis examines the effect of the method's parameters. Code and datasets are available at the link below.
Graph-based Consensus Maximization among Multiple Supervised and Unsupervised Models
Jing Gao¹, Feng Liang¹, Wei Fan², Yizhou Sun¹, Jiawei Han¹
¹ University of Illinois, Urbana-Champaign   ² IBM TJ Watson

Consensus Maximization
• Goal
  • Combine the outputs of multiple supervised and unsupervised models on a set of objects
  • The predicted labels should agree with the base models as much as possible
• Motivations
  • Unsupervised models provide useful constraints for classification tasks
  • Model diversity improves prediction accuracy and robustness
  • Combining models at the output level is needed when privacy concerns or incompatible formats prevent sharing the models themselves
• Applications
  • Image categorization: images, descriptions, notes, comments, albums, tags, ……
  • Movie recommendation: movie genres, cast, director, plots, users' viewing history, movie ratings, ……
  • Research area prediction: publication and co-authorship network, published papers, ……
  • Many more ……

Related Work
• Summary of learning algorithms: the Y-axis is the goal of learning, the X-axis is the methodology
• The proposed method can be regarded as a semi-supervised ensemble approach working at the output level

Methodology: Optimization over a Bipartite Graph
[Figure: group-object bipartite graph linking object nodes x1–x7 to group nodes g1–g9 produced by base models M1–M4; groups from supervised models carry label vectors such as [1 0 0], [0 1 0], [0 0 1]. Input: the affinity matrix of the graph and the initial probability of the groups. Output: a conditional probability vector for each node.]
• Objective function: minimize disagreement
  • An object and a group connected in the graph should have similar conditional probabilities
  • A group's probability should not deviate much from its initial probability
• Iterative solution: update the probability of each group, then update the probability of each object; iterate until convergence
  (a hedged formalization of the objective and a small code sketch of these updates appear at the end of this page)

Interpretations
• Constrained Embedding
  • Goal: embed both the group and the object nodes into a c-dimensional unit cube
  • Each group node is close to the object nodes it contains
  • Each group node is close to the constraint node from the supervised models
• Ranking on Consensus Structure
  • Rank all the groups (from both the supervised and the unsupervised models) according to their relevance to the queries
  • Groups from supervised models act as queries (similarly in the semi-supervised version)

Experimental Results
• Data Sets
  • 20 Newsgroups: newsgroup message categorization
  • Cora: paper research-area prediction
  • DBLP: researchers' research-area prediction
• Baseline Methods
  • Single models: two classification models and two clustering models
  • Proposed methods: BGCM, BGCM-L (semi-supervised version), 2-L (two models), 3-L (three models)
[Figures: Accuracy comparison and Sensitivity Analysis]

Take away messages
• The proposed consensus maximization method combines the complementary predictive powers of multiple supervised and unsupervised models to reach a better solution.

Codes and datasets available at http://ews.uiuc.edu/~jinggao3/nips09bgcm.htm
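The poster states the objective only in words: connected object and group nodes should have similar conditional probabilities, and group probabilities should not deviate much from their initial values. One way to write this as a single optimization problem, with notation ($u_i$, $q_j$, $a_{ij}$, $y_j$, $\alpha$, and the counts $n$, $v$, $s$) introduced here purely for illustration rather than taken from the poster, is

\[
\min_{u_1,\dots,u_n,\; q_1,\dots,q_v}\;\; \sum_{i=1}^{n}\sum_{j=1}^{v} a_{ij}\,\lVert u_i - q_j \rVert^2 \;+\; \alpha \sum_{j=1}^{s} \lVert q_j - y_j \rVert^2
\]

where $a_{ij}$ is the affinity matrix of the bipartite graph (1 if object $i$ belongs to group $j$), $u_i$ and $q_j$ are the conditional probability vectors of object $i$ and group $j$, $y_j$ is the initial probability of group $j$ (known for the $s$ groups coming from supervised models), and $\alpha$ weights the penalty for deviating from the initial probabilities. Setting the gradient with respect to each $q_j$ and each $u_i$ to zero yields the alternating "update group, update object" steps from the Methodology section.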
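A minimal, runnable sketch of those alternating updates, assuming the inputs are represented as NumPy arrays (the function name bgcm_sketch, the parameter alpha, and the array layout are assumptions for illustration, not the authors' released code):

```python
# Minimal sketch (not the authors' released code) of the alternating updates the poster
# describes: update the probability of each group, update the probability of each object,
# and iterate until convergence over the object-group bipartite graph.
import numpy as np

def bgcm_sketch(A, Y, supervised_mask, alpha=2.0, n_iters=100, tol=1e-6):
    """A: (n_objects, n_groups) 0/1 affinity matrix of the bipartite graph.
    Y: (n_groups, n_classes) initial probability of the groups (rows for purely
       unsupervised groups may be zero).
    supervised_mask: length n_groups, True for groups produced by supervised models.
    Returns (U, Q): conditional probability vectors for objects and for groups."""
    A = np.asarray(A, dtype=float)
    Y = np.asarray(Y, dtype=float)
    penalty = alpha * np.asarray(supervised_mask, dtype=float)   # per-group pull toward Y
    n_objects, n_classes = A.shape[0], Y.shape[1]

    Q = Y.copy()                                                  # group probabilities
    U = np.full((n_objects, n_classes), 1.0 / n_classes)          # object probabilities

    for _ in range(n_iters):
        Q_old = Q
        # Update probability of a group: average the objects connected to it,
        # pulled toward its initial probability if the group is supervised.
        Q = (A.T @ U + penalty[:, None] * Y) / (A.sum(axis=0) + penalty)[:, None]
        # Update probability of an object: average the groups it belongs to.
        U = (A @ Q) / A.sum(axis=1, keepdims=True)
        # Iterate until convergence.
        if np.abs(Q - Q_old).max() < tol:
            break
    return U, Q

# Example: 2 objects, 2 classes, 3 groups (2 from a classifier, 1 from a clusterer).
A = [[1, 0, 1], [0, 1, 1]]
Y = [[1, 0], [0, 1], [0, 0]]
U, Q = bgcm_sketch(A, Y, supervised_mask=[True, True, False])
```

Predicted labels for the objects are then read off as the argmax over the rows of U, and the groups from unsupervised models inherit class semantics through the same propagation.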