Dependency Networks for Collaborative Filtering and Data Visualization
UAI-2000
Presenter: Kyu-Baek Hwang
Abstract
• Dependency networks
• An alternative to Bayesian networks
• A (possibly cyclic) directed graph
• Basic properties of dependency networks
• Dependency networks for collaborative filtering
• Dependency networks for data visualization
Introduction
• A dependency network
• A collection of regression/classification models among variables, combined using Gibbs sampling
• Disadvantages
• Not useful for encoding causal relationships
• Advantages
• Quite useful for encoding predictive relationships
Representation of Joint Distribution
• In Bayesian networks: p(x) = Πi p(xi | pai)
• In dependency networks: the joint distribution is defined implicitly, via an ordered Gibbs sampler
• Initialize each variable randomly.
• Resample each Xi according to its local distribution p(xi | pai).
• Theorem 1: An ordered Gibbs sampler applied to a dependency network for X, where each Xi is discrete and each local distribution p(xi | pai) is positive, has a unique stationary distribution for X.
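The ordered Gibbs sampler above can be sketched in a few lines for binary variables. This is a minimal illustration, not the paper's implementation; the names `local_dists`, `parents`, and `n_sweeps` are illustrative.

```python
import random

def ordered_gibbs(local_dists, parents, n_vars, n_sweeps=1000, seed=0):
    """Draw one sample from a dependency network over binary variables.

    local_dists[i](pa) returns P(X_i = 1 | pa), where pa is a tuple of the
    current values of X_i's parents; parents[i] lists those parent indices.
    """
    rng = random.Random(seed)
    x = [rng.randint(0, 1) for _ in range(n_vars)]    # random initialization
    for _ in range(n_sweeps):
        for i in range(n_vars):                       # fixed variable order
            pa = tuple(x[j] for j in parents[i])
            x[i] = 1 if rng.random() < local_dists[i](pa) else 0
    return x
```

By Theorem 1, repeated sweeps converge to a unique stationary distribution; by Theorem 2, if the local distributions are consistent with some positive joint p(x), samples collected this way are samples from p(x).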
Conditional Distribution
• Gibbs sampling is used to answer conditional queries.
• Not a serious practical disadvantage
• Learning
• Local distributions do not represent causal relationships.
• Each local distribution can be learned without regard to acyclicity constraints.
• Consistency and inconsistency
• Inconsistent dependency networks
• The conditional distributions cannot all be obtained from a single joint distribution p(x).
• Theorem 2: If a dependency network for X is consistent with a positive distribution p(x), then the stationary distribution defined in Theorem 1 is equal to p(x).
Other Properties of Dependency Networks
• Markov networks and dependency networks
• Theorem 3: The set of positive distributions consistent with a dependency network structure is equal to the set of positive distributions defined by a Markov network structure with the same adjacencies.
• The two define the same distributions, but their representational forms differ: potentials vs. conditional probabilities.
• Minimality of a dependency network
• For every node Xi and every parent paij, Xi is not independent of paij given the remaining parents of Xi.
• Theorem 4: A minimal consistent dependency network for a positive distribution p(x) must be bi-directional.
Learning Dependency Networks
• Each local distribution for Xi is simply a regression/classification model for xi with X \ {xi} as inputs.
• Generalized linear models, neural networks, support vector machines, …
• In this paper, probabilistic decision trees were used.
• Trees are grown by a simple hill-climbing search guided by a Bayesian score.
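The key point above is that each local distribution is fit independently, with no acyclicity check across models. The sketch below makes that concrete with Laplace-smoothed counting in place of the paper's decision-tree learner; the function names and the fixed `parent_sets` input are assumptions for illustration.

```python
from collections import Counter

def learn_local(data, i, parent_idx):
    """Estimate P(X_i = 1 | pa_i) from binary data by smoothed counting."""
    ones, totals = Counter(), Counter()
    for row in data:
        pa = tuple(row[j] for j in parent_idx)
        totals[pa] += 1
        ones[pa] += row[i]
    # Laplace smoothing: an unseen parent configuration defaults to 0.5
    return lambda pa: (ones[pa] + 1) / (totals[pa] + 2)

def learn_dependency_network(data, parent_sets):
    # Each local model is fit on its own; parent_sets may contain cycles.
    return [learn_local(data, i, ps) for i, ps in enumerate(parent_sets)]
```

In the paper the parent set is not fixed in advance: the tree-growing search itself decides which input variables each local model uses.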
Collaborative Filtering
• Preference prediction
• Implicit/explicit voting
• Binary/non-binary preferences
• Bayesian network approach: infer p(xi | observed votes) for each unrated item
• In a dependency network, the same quantity is read directly from the local distribution for Xi.
Datasets for Collaborative Filtering
• MS.COM (visits to web pages)
• Nielsen (TV shows watched)
• MSNBC (stories read within the site)
Evaluation Criteria and Experimental Procedure
• Accuracy of the ranked list produced by a predictive model
• Average accuracy of a model over test cases
• Each case in the test set is randomly partitioned into <input set | measurement set>, e.g.
• <0, 1, 1, 0, 1, 0 | 1, 0, 0, 1, 1, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1>
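One common way to score such a ranked list is the half-life utility of Breese et al. (1998): measurement-set items are sorted by predicted probability, and items the user actually liked contribute more the nearer the top they land. Whether this deck used exactly this criterion is an assumption; the sketch below is illustrative.

```python
def cf_list_score(predicted_probs, actual_votes, halflife=5):
    """Score one test case: rank measurement items by predicted probability,
    then sum 2**(-k / (halflife - 1)) over 0-based positions k of items
    the user actually voted for (actual_votes[i] in {0, 1})."""
    order = sorted(range(len(predicted_probs)),
                   key=lambda i: predicted_probs[i], reverse=True)
    return sum(actual_votes[i] / 2 ** (k / (halflife - 1))
               for k, i in enumerate(order))
```

A model's overall accuracy is then the average of this score over all test cases, often normalized by the best achievable score.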
Results on Accuracy • Higher score indicates better performance.
Results on the Prediction Time • Number of predictions per second
Results on Computational Resources • Computational resources for model learning
Data Visualization
• Goal: visualize predictive (not causal) relationships
• Bayesian networks often interfere with the visualization of such relationships.
• Dependent or independent
• Example: DNViewer applied to Media Metrix data
DNViewer • A dependency network for Media Metrix data
DNViewer for Local Distribution • Local probability distribution
Summary and Future Work
• The dependency network
• defines a joint distribution over its variables.
• is easy to learn from data.
• is useful for collaborative filtering and data visualization.
• is specified through conditional distributions, whereas the Bayesian network is specified through a factored joint distribution.