Relational Learning with Gaussian Processes
By Wei Chu, Vikas Sindhwani, Zoubin Ghahramani, S. Sathiya Keerthi (Columbia, Chicago, Cambridge, Yahoo!)
Presented by Nesreen Ahmed, Nguyen Cao, Sebastian Moreno, Philip Schatz
CS590M: Statistical Machine Learning - Fall 2008, 12/02/08
Outline • Introduction • Relational Gaussian Processes • Application • Linkage Prediction • Semi-Supervised Learning • Experiments & Results • Conclusion & Discussion
Introduction • Many domains involve relational data • Web: document links • Document categorization: citations • Computational biology: protein interactions • Inter-relationships between instances can be informative for learning tasks • Relations reflect the network structure and enrich the way instances are correlated
Introduction • Relational information is represented by a graph G = (V, E) • In supervised learning, the graph provides structural knowledge beyond the input attributes • In semi-supervised learning, the graph can also be derived from the input attributes themselves • The graph estimates the global geometric structure of the data
Gaussian Processes • A Gaussian process is a joint Gaussian distribution over the set of function values {f_x} of any arbitrary set of n instances x: P(f) = N(f; 0, Σ), where the elements of the covariance matrix are Σ_ij = K(x_i, x_j) for a covariance (kernel) function K
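A minimal sketch of this definition, assuming a squared-exponential kernel (the slides do not fix a particular kernel here; the lengthscale and the 50-point toy input are illustrative):

```python
import numpy as np

def rbf_kernel(X, Z, lengthscale=1.0):
    """Squared-exponential kernel K(x, z) = exp(-||x - z||^2 / (2 l^2))."""
    d2 = ((X[:, None, :] - Z[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-d2 / (2.0 * lengthscale ** 2))

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(50, 1))              # 50 one-dimensional instances
Sigma = rbf_kernel(X, X) + 1e-8 * np.eye(50)      # jitter for numerical stability
f = rng.multivariate_normal(np.zeros(50), Sigma)  # one draw of the function values {f_x}
```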
Relational Gaussian Processes • Linkages: an undirected linkage ε_ij is observed between instances x_i and x_j • The uncertainty in observing ε_ij induces Gaussian noise N(0, σ²) in observing the corresponding instances' function values f_xi and f_xj
Relational Gaussian Processes • Approximate inference: the posterior over function values is P(f | ε) ∝ P(f) ∏_{i,j} P(ε_ij | f_xi, f_xj), where i,j runs over the set of observed undirected linkages • The EP algorithm approximates each linkage factor by a Gaussian site function: P(ε_ij | f_xi, f_xj) ≈ t̃_ij(f_xi, f_xj) ∝ exp(−½ [f_xi, f_xj] Π̃_ij [f_xi, f_xj]ᵀ), where Π̃_ij is a 2×2 symmetric matrix
Relational Gaussian Processes • Combining the prior with the site approximations gives the approximate posterior Q(f | ε) = N(0, (Σ⁻¹ + Π)⁻¹), where Π = ∑_{i,j} Π_ij and each Π_ij is an n×n matrix with four non-zero entries augmented from the 2×2 site matrix Π̃_ij
Relational Gaussian Processes • For any finite collection of data points X, the set of random variables {f_x} conditioned on ε has a multivariate Gaussian distribution, where the elements of the covariance matrix are given by evaluating the following (posterior covariance) kernel function: K̃(x, z) = K(x, z) − k_xᵀ (Σ + Π⁻¹)⁻¹ k_z, with k_x = (K(x, x_1), …, K(x, x_n))ᵀ
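A small sketch of how this data-dependent covariance could be assembled over the n training points, assuming the EP site precisions Π̃_ij are already available (the EP updates themselves are omitted, and the function names are illustrative). It uses the algebraically equivalent form Σ − Σ(I + ΠΣ)⁻¹ΠΣ to avoid inverting the typically singular Π:

```python
import numpy as np

def augment_site(Pi_tilde, i, j, n):
    """Embed a 2x2 site precision into an n x n matrix with four
    non-zero entries, at rows/columns i and j."""
    P = np.zeros((n, n))
    P[np.ix_([i, j], [i, j])] = Pi_tilde
    return P

def rgp_posterior_cov(Sigma, sites):
    """Posterior covariance (Sigma^{-1} + Pi)^{-1}, with Pi the sum of the
    augmented site precisions, computed as Sigma - Sigma (I + Pi Sigma)^{-1} Pi Sigma.
    `sites` maps linkage pairs (i, j) to their 2x2 site matrices."""
    n = Sigma.shape[0]
    Pi = sum(augment_site(P2, i, j, n) for (i, j), P2 in sites.items())
    M = np.linalg.solve(np.eye(n) + Pi @ Sigma, Pi @ Sigma)
    return Sigma - Sigma @ M
```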
Linkage Prediction • Joint probability: the RGP posterior yields a bivariate Gaussian over the function values (f_xr, f_xs) at a pair of test points x_r and x_s • The probability of an edge between x_r and x_s is obtained by integrating the linkage likelihood against this bivariate Gaussian: P(ε_rs | ε) = ∫ P(ε_rs | f_xr, f_xs) P(f_xr, f_xs | ε) df_xr df_xs
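One way to realize this integral numerically is simple Monte Carlo over the bivariate Gaussian predictive; the exact form of the linkage likelihood follows the paper's noise model and is left as a plug-in `link_lik` here, so this is a sketch of the integration step only:

```python
import numpy as np

def edge_probability(mu, cov, link_lik, n_samples=20000, seed=0):
    """Monte Carlo estimate of P(eps_rs | data) =
    E_{(f_r, f_s) ~ N(mu, cov)}[ P(eps_rs | f_r, f_s) ].
    `mu` (length 2) and `cov` (2x2) come from the posterior kernel;
    `link_lik(fr, fs)` is the linkage likelihood, supplied by the user."""
    rng = np.random.default_rng(seed)
    F = rng.multivariate_normal(mu, cov, size=n_samples)
    return float(link_lik(F[:, 0], F[:, 1]).mean())
```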
Semi-supervised Learning • [Figure: toy dataset with two labeled samples (−1 and +1) and several unlabeled samples (?)]
Semi-supervised Learning • [Figure: the same dataset with undirected linkages from a nearest-neighborhood graph, K = 1]
Semi-supervised Learning • [Figure: nearest-neighborhood graph with K = 2] • A sketch of this graph construction follows below
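As promised above, a minimal sketch of the nearest-neighborhood graph construction behind these toy figures, symmetrizing the K-nearest-neighbor relation into undirected linkages (the function name is illustrative):

```python
import numpy as np

def knn_edges(X, k=1):
    """Undirected linkages from a k-nearest-neighborhood graph:
    connect each point to its k nearest neighbors, then symmetrize."""
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    np.fill_diagonal(d2, np.inf)            # no self-edges
    nn = np.argsort(d2, axis=1)[:, :k]      # indices of the k nearest neighbors
    edges = {tuple(sorted((i, int(j)))) for i in range(len(X)) for j in nn[i]}
    return sorted(edges)
```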
Semi-supervised Learning • Apply the RGP to the linkage graph to obtain the data-dependent posterior over function values • Labels are related to the function values through a probit noise model • Applying Bayes' rule combines the label likelihood with the RGP posterior
Semi-supervised Learning • The predictive distribution of the function value at a test point is Gaussian • Passing it through the probit noise model yields a Bernoulli distribution over the class label for classification
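A sketch of this final classification step, assuming the standard probit parameterization Φ(y·f/σ) (the slides name a probit noise model but do not spell out its form); integrating the probit against a Gaussian predictive has the closed form used below:

```python
import numpy as np
from scipy.stats import norm

def class_probability(mu_t, var_t, sigma_noise=1.0):
    """Bernoulli class probability from the Gaussian predictive (mu_t, var_t):
    P(y_t = +1) = Phi( mu_t / sqrt(sigma_noise^2 + var_t) ),
    i.e. the probit likelihood integrated against the predictive Gaussian."""
    return norm.cdf(mu_t / np.sqrt(sigma_noise ** 2 + var_t))
```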
Experiments • Experimental setup • Kernel function • Centralized kernel: a linear or Gaussian kernel shifted to the empirical mean • Noise level • Label noise = 10⁻⁴ (for RGP and GPC) • Edge noise = [5 : 0.05]
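The "centralized kernel" can be read as the usual centering of a kernel matrix at the empirical mean in feature space; a sketch under that assumption:

```python
import numpy as np

def centralize_kernel(K):
    """Center a kernel matrix at the empirical mean in feature space:
    Kc = (I - 11^T/n) K (I - 11^T/n)."""
    n = K.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n
    return H @ K @ H
```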
Results • Toy experiment: 30 samples collected from a Gaussian mixture with two components on the x-axis • Two labeled samples indicated by a diamond and a circle • Nearest-neighborhood graph with K = 3 • Best value = 0.4, chosen based on approximate model evidence
Results • [Figure: posterior covariance matrix of the RGP learnt from the data; it captures the density information of the unlabelled data] • Using the posterior covariance matrix learnt from the data as the new prior, supervised learning is carried out • [Figure: curves represent the predictive distribution for each class]
Results • Real-world experiment • Subset of the WebKB dataset, collected from the CS departments of 4 universities • Contains pages with hyperlinks interconnecting them • Pages classified into 7 categories (e.g., student, course, other) • Documents are preprocessed as vectors of input attributes • Hyperlinks are translated into undirected positive linkages: two pages are likely to be positively correlated if they are hyperlinked by the same hub page • No negative linkages • Compared with GPC & LapSVM (Sindhwani et al. 2005)
Results • Two classification tasks: student vs. non-student and other vs. non-other • Randomly selected 10% of the samples as labeled data; selection repeated 100 times • Linear kernel • [Table: average AUC for predicting the labels of unlabeled cases] • A sketch of this evaluation protocol follows below
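As referenced above, a sketch of the evaluation protocol (10% labeled, 100 random repetitions, mean AUC on the remaining cases); `fit_predict` is a hypothetical hook standing in for training RGP, GPC, or LapSVM:

```python
import numpy as np
from sklearn.metrics import roc_auc_score

def average_auc(X, y, fit_predict, frac=0.1, repeats=100, seed=0):
    """Average AUC over repeated random labeled/unlabeled splits.
    `fit_predict(X, y, labeled_idx)` returns a real-valued score for
    every point; only the unlabeled points are scored for AUC."""
    rng = np.random.default_rng(seed)
    n = len(y)
    aucs = []
    for _ in range(repeats):
        labeled = rng.choice(n, size=max(1, int(frac * n)), replace=False)
        unlabeled = np.setdiff1d(np.arange(n), labeled)
        scores = fit_predict(X, y, labeled)
        aucs.append(roc_auc_score(y[unlabeled], scores[unlabeled]))
    return float(np.mean(aucs))
```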
Conclusion • A novel Bayesian framework to learn from relational data based on Gaussian processes • The RGP provides a data-dependent covariance function for supervised learning tasks (classification) • Applied to semi-supervised learning tasks: the RGP requires very few labels to generalize on unseen test points • Incorporates unlabeled data in model selection
Discussion • The proposed framework can be extended to model: directed (asymmetric) relations as well as undirected relations, multiple classes of relations, and graphs with weighted edges • The model should be compared to other relational models • The results can be sensitive to the choice of K in the nearest-neighbor graph
Thanks! Questions?