Bayesian Sets Zoubin Ghahramani and Katherine A. Heller NIPS 2005 Presented by Qi An Mar. 17th, 2006
Outline • Introduction • Bayesian Sets • Implementation • Binary data • Exponential families • Experimental results • Conclusions
Introduction • Inspired by "Google™ Sets" • What do Jesus and Darwin have in common? • Two different views on the origin of man • There are colleges at Cambridge University named after them • The objective is to retrieve items from a concept or cluster, given a query consisting of a few items from that cluster
Introduction • Consider a universe of items D, which can be a set of web pages, movies, people or any other objects, depending on the application • Make a query consisting of a small subset of items Dc ⊂ D, which are assumed to be examples of some cluster in the data • The algorithm provides a completion to the query set Dc: it presumably includes all the elements of Dc together with the other elements of D that are also in this cluster
Introduction • View the problem from two perspectives: • Clustering on demand • Unlike completely unsupervised clustering algorithms, here the query provides supervised hints or constraints as to the membership of a particular cluster • Information retrieval • Retrieve the items that are relevant to the query and rank the output by relevance to the query
Bayesian Sets • Very simple algorithm • Given D and Dc, we aim to rank the elements of D by how well they would "fit into" a set which includes Dc • Define a score for each x ∈ D: score(x) = p(x | Dc) / p(x) • From Bayes rule, the score can be re-written as: score(x) = p(x, Dc) / (p(x) p(Dc))
Bayesian Sets • Intuitively, the score compares the probability that x and Dc were generated by the same model with the same unknown parameters θ, to the probability that x and Dc came from models with different parameters θ and θ′.
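To make the ratio concrete, here is a minimal one-feature Beta-Bernoulli sketch (our own illustration, not from the slides) that evaluates score(x) = p(x, Dc) / (p(x) p(Dc)) with exact marginal likelihoods:

```python
from math import lgamma, exp

# One binary feature with a Beta(1, 1) prior on the Bernoulli parameter.
def log_marginal(heads, tails, a=1.0, b=1.0):
    """log p(data) for Bernoulli data under a Beta(a, b) prior."""
    return (lgamma(a + b) - lgamma(a) - lgamma(b)
            + lgamma(a + heads) + lgamma(b + tails)
            - lgamma(a + b + heads + tails))

# Query Dc: three items, all with the feature present.
log_p_dc = log_marginal(3, 0)
for x in (1, 0):
    log_joint = log_marginal(3 + x, 1 - x)   # p(x, Dc): pooled counts
    log_p_x = log_marginal(x, 1 - x)         # p(x) on its own
    print(f"x={x}: score = {exp(log_joint - log_p_x - log_p_dc):.2f}")
# Prints 1.60 for x=1 and 0.40 for x=0: a candidate that shares the
# query's feature is more plausibly from the same model.
```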
Sparse Binary Data • Assume each item xi is a binary vector xi = (xi1, …, xiJ) where each component xij ∈ {0, 1} is a binary variable from an independent Bernoulli distribution: p(xi | θ) = ∏j θj^xij (1 − θj)^(1 − xij) • The conjugate prior for a Bernoulli distribution is a Beta distribution: p(θ | α, β) = ∏j [Γ(αj + βj) / (Γ(αj) Γ(βj))] θj^(αj − 1) (1 − θj)^(βj − 1) • For a query Dc = {x1, …, xN}, define α̃j = αj + Σi xij and β̃j = βj + N − Σi xij
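A small sketch of the conjugate update above (array names and prior values are ours; the paper sets the hyperparameters from empirical feature means): given a binary query matrix, the Beta posterior parameters are just prior counts plus observed counts.

```python
import numpy as np

# Hypothetical query: N = 3 items, J = 5 binary features.
Xc = np.array([[1, 0, 1, 0, 0],
               [1, 1, 1, 0, 0],
               [1, 0, 0, 0, 1]])
N, J = Xc.shape

# Placeholder Beta prior hyperparameters.
alpha = np.full(J, 0.5)
beta = np.full(J, 0.5)

# Conjugate Beta-Bernoulli update given the query:
alpha_tilde = alpha + Xc.sum(axis=0)      # alpha_j + sum_i x_ij
beta_tilde = beta + N - Xc.sum(axis=0)    # beta_j + N - sum_i x_ij
```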
Sparse Binary Data • The score can be computed as: score(x) = ∏j [(αj + βj) / (αj + βj + N)] (α̃j / αj)^xj (β̃j / βj)^(1 − xj) • If we take the log of the score and put the entire data set into one large matrix X with J columns, we can compute a vector s of log scores for all points using a single matrix-vector multiplication: s = c + Xq, where c = Σj [log(αj + βj) − log(αj + βj + N) + log β̃j − log βj] and qj = log α̃j − log αj − log β̃j + log βj
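The vectorized log score translates directly into a few lines of NumPy; this is a sketch of that computation (function and variable names are ours):

```python
import numpy as np

def bayesian_sets_log_scores(X, query_rows, alpha, beta):
    """Log scores s = c + X q for every item, where X has one row per
    item and J binary feature columns, and query_rows indexes Dc."""
    Xc = X[query_rows]
    N = Xc.shape[0]
    alpha_t = alpha + Xc.sum(axis=0)
    beta_t = beta + N - Xc.sum(axis=0)
    c = np.sum(np.log(alpha + beta) - np.log(alpha + beta + N)
               + np.log(beta_t) - np.log(beta))
    q = (np.log(alpha_t) - np.log(alpha)
         - np.log(beta_t) + np.log(beta))
    return c + X @ q      # the single matrix-vector multiplication
```

Sorting items by descending score completes the query set; for sparse X the product Xq only touches nonzero entries, which is what makes the method fast in practice.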
Exponential Families • If the model distribution is not a Bernoulli distribution, but a member of the exponential family: p(x | θ) = f(x) g(θ) exp(θᵀ u(x)) we can use the conjugate prior: p(θ | η, ν) = h(η, ν) g(θ)^η exp(θᵀ ν) so that the score is: score(x) = [h(η + N, ν + Σi u(xi)) h(η + 1, ν + u(x))] / [h(η, ν) h(η + N + 1, ν + u(x) + Σi u(xi))]
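The exponential-family score only needs the conjugate prior's normalizer. Below is a minimal sketch, assuming the caller supplies a `log_h` function returning log h(η, ν) for their chosen family (all names here are ours):

```python
import numpy as np

def log_score_expfam(u_x, U_c, eta, nu, log_h):
    """Log of the exponential-family score above.

    u_x is u(x) for the candidate item; U_c stacks u(x_i) for the N
    query items row by row; log_h(eta, nu) is the log normalizer of
    the conjugate prior for the chosen family.
    """
    N = U_c.shape[0]
    s = U_c.sum(axis=0)                 # sum_i u(x_i) over the query
    return (log_h(eta + N, nu + s)
            + log_h(eta + 1, nu + u_x)
            - log_h(eta, nu)
            - log_h(eta + N + 1, nu + u_x + s))
```

For the Beta-Bernoulli case, log_h is a sum of log-Gamma terms per feature, and this expression reduces to the binary-data score on the previous slide.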
Experimental results • The experiments are performed on three different datasets: the Grolier Encyclopedia dataset, the EachMovie dataset and the NIPS authors dataset. • The algorithm runs very quickly on all three datasets.
Conclusions • A simple algorithm which takes a query consisting of a small set of items and returns additional items from D belonging to this set. • The score is computed w.r.t. a statistical model, and the unknown model parameters are all marginalized out. • With conjugate priors, the score can be computed exactly and efficiently. • The method does well when compared to Google Sets in terms of set completions. • The algorithm is very flexible in that it can be combined with a wide variety of data types and probabilistic models.