The Nested Dirichlet Process

The Nested Dirichlet Process • Paper by Abel Rodriguez, David B. Dunson, and Alan E. Gelfand, submitted to JASA, 2006 • Duke University Machine Learning Group • Presented by Kai Ni, Nov. 10, 2006

Outline • Introduction • Nested Dirichlet process • Application on haplotype inference

Motivation • General problem – Extending the Dirichlet process to accommodate multiple dependent distributions. • Methods • Inducing dependence through a shared source, as in the dependent Dirichlet process (DDP) and the hierarchical Dirichlet process (HDP). • Inducing dependence through linear combinations of realizations of independent Dirichlet processes. For example, Muller (2004) defines the distribution of each group as a mixture of a global component and a local component.

Background • The paper is motivated by two related problems: clustering probability distributions and simultaneous multilevel clustering in a nested setting. • Consider the example of hospital analysis: • In assessing quality of care, we need to cluster centers according to the distribution of patient outcomes and identify outlying centers. • We also want to simultaneously cluster patients within the centers, and borrow information across centers that have similar clusters.

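The Dirichlet process • A single clustering problem can be analyzed with a Dirichlet process (DP). The stick-breaking construction is usually the starting point of the analysis: $G = \sum_{k=1}^{\infty} \pi_k \delta_{\theta_k}$, with $\theta_k \sim H$, $\pi_k = u_k \prod_{l<k} (1 - u_l)$ and $u_k \sim \mathrm{Beta}(a_k, b_k)$. • If $a_k = 1 - a$ and $b_k = b + ka$, with $0 \le a < 1$ and $b > -a$, this yields the Pitman-Yor process. Setting $a = 0$ and $b = \alpha$ results in the standard DP with concentration $\alpha$.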
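The stick-breaking construction can be sketched in a few lines of NumPy. This is a minimal illustration, not code from the paper: it assumes the Pitman-Yor parameterization u_k ~ Beta(1 - a, b + ka), truncates the infinite sum at a hypothetical level K by setting the last stick to 1, and uses a standard normal as a stand-in for the base measure H.

```python
import numpy as np

def stick_breaking_weights(a, b, K, rng):
    """Truncated stick-breaking weights for a Pitman-Yor process.

    u_k ~ Beta(1 - a, b + k * a) for k = 1..K. Setting a = 0 gives the
    standard DP with concentration b. The truncation level K is an
    assumption of this sketch.
    """
    k = np.arange(1, K + 1)
    u = rng.beta(1.0 - a, b + k * a)
    u[-1] = 1.0  # truncation: the last atom absorbs all remaining mass
    # stick length left before break k: prod_{l<k} (1 - u_l)
    remaining = np.concatenate(([1.0], np.cumprod(1.0 - u[:-1])))
    return u * remaining

rng = np.random.default_rng(0)
weights = stick_breaking_weights(a=0.0, b=3.0, K=50, rng=rng)  # standard DP
atoms = rng.normal(size=50)  # hypothetical base measure H = N(0, 1)
```

The pair (weights, atoms) represents one truncated draw G; sampling from G amounts to picking an atom with probability given by its weight.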

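The nested Dirichlet process mixture • Suppose $y_{ij}$, for $i = 1, \dots, n_j$, are observations within center $j$. We assume exchangeability for centers, with $y_{ij} \sim F_j$ independently for $j = 1, \dots, J$. • A collection of distributions $\{F_1, \dots, F_J\}$ is said to follow a nested Dirichlet process mixture if $F_j(\cdot) = \int p(\cdot \mid \theta)\, G_j(d\theta)$ with $G_j \sim Q = \sum_{k=1}^{\infty} \pi_k^* \delta_{G_k^*}$ and $G_k^* = \sum_{l=1}^{\infty} w_{lk}^* \delta_{\theta_{lk}^*}$, where $\theta_{lk}^* \sim H$, $\pi_k^* = v_k^* \prod_{s<k} (1 - v_s^*)$ with $v_k^* \sim \mathrm{Beta}(1, \alpha)$, and $w_{lk}^* = u_{lk}^* \prod_{s<l} (1 - u_{sk}^*)$ with $u_{lk}^* \sim \mathrm{Beta}(1, \beta)$.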

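The nested Dirichlet process • The collection $\{G_1, \dots, G_J\}$, used as the mixing distribution, is said to follow a nested Dirichlet process with parameters $\alpha$, $\beta$ and $H$, written $\{G_1, \dots, G_J\} \sim \mathrm{nDP}(\alpha, \beta, H)$. • From the construction, we have $G_j \mid Q \sim Q$ with $Q \sim \mathrm{DP}(\alpha\, \mathrm{DP}(\beta H))$, and marginally $G_j \sim \mathrm{DP}(\beta H)$ for every $j$. • For each $G_j$ and any measurable set $B$, we have the properties $E[G_j(B)] = H(B)$ and $\mathrm{Var}[G_j(B)] = H(B)(1 - H(B))/(\beta + 1)$.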
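To make the nesting concrete, the following sketch draws from a doubly truncated version of the nDP prior. It is an illustration under stated assumptions, not the authors' sampler: the top level has weights built from Beta(1, alpha) sticks over K candidate distributions, each candidate distribution has weights built from Beta(1, beta) sticks over L atoms, and the base measure H is taken to be a standard normal purely for the example. The function names are hypothetical.

```python
import numpy as np

def dp_sticks(concentration, size, rng):
    """Truncated DP stick-breaking weights that sum to one."""
    u = rng.beta(1.0, concentration, size=size)
    u[-1] = 1.0  # truncation: last atom absorbs the remaining mass
    remaining = np.concatenate(([1.0], np.cumprod(1.0 - u[:-1])))
    return u * remaining

def sample_truncated_ndp(alpha, beta, K, L, J, rng):
    """One prior draw of {G_1, ..., G_J} from a (K, L)-truncated nDP."""
    pi = dp_sticks(alpha, K, rng)  # weights over K candidate distributions
    w = np.stack([dp_sticks(beta, L, rng) for _ in range(K)])  # shape (K, L)
    theta = rng.normal(size=(K, L))  # atoms, assuming H = N(0, 1)
    z = rng.choice(K, size=J, p=pi)  # center j uses candidate z[j]
    return pi, w, theta, z

rng = np.random.default_rng(1)
pi, w, theta, z = sample_truncated_ndp(alpha=3.0, beta=3.0, K=35, L=55, J=10, rng=rng)
# G_j is the discrete distribution with weights w[z[j]] on atoms theta[z[j]]
```

Centers with the same label in z share their entire distribution, atoms and weights alike, which is what clusters centers by distribution rather than by a summary of it.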

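Prior correlation • The prior correlation between two distributions $G_j$ and $G_{j'}$ is $\mathrm{Cor}(G_j(B), G_{j'}(B)) = 1/(1 + \alpha)$. • The prior correlation between draws from the process is $\mathrm{Cor}(\theta_{ij}, \theta_{i'j'}) = 1/(1 + \beta)$ if $j = j'$ and $1/((1 + \alpha)(1 + \beta))$ if $j \ne j'$. • The correlation within a center is larger than the one between centers. • The nDP generalizes three standard models, which arise as limiting cases of its parameters.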
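The correlations can be checked numerically. The sketch below assumes the closed forms 1/(1 + alpha) between two centers' distributions, 1/(1 + beta) between draws within a center, and 1/((1 + alpha)(1 + beta)) between draws from different centers. It also estimates by Monte Carlo the prior probability that two centers pick the same distribution, which equals the expected sum of squared top-level stick-breaking weights. The function names, truncation level and replication count are choices made for this illustration.

```python
import numpy as np

def corr_between_distributions(alpha):
    """Prior correlation between G_j and G_j' for two different centers."""
    return 1.0 / (1.0 + alpha)

def corr_between_draws(alpha, beta, same_center):
    """Prior correlation between two draws theta_ij and theta_i'j'."""
    within = 1.0 / (1.0 + beta)
    return within if same_center else within / (1.0 + alpha)

def mc_prob_shared_distribution(alpha, K=100, n_rep=10000, seed=0):
    """Monte Carlo estimate of P(G_j = G_j') as E[sum_k pi_k^2]."""
    rng = np.random.default_rng(seed)
    v = rng.beta(1.0, alpha, size=(n_rep, K))
    v[:, -1] = 1.0  # truncate each replicate at K sticks
    remaining = np.concatenate(
        [np.ones((n_rep, 1)), np.cumprod(1.0 - v[:, :-1], axis=1)], axis=1
    )
    pi = v * remaining
    return float(np.mean(np.sum(pi ** 2, axis=1)))

estimate = mc_prob_shared_distribution(alpha=3.0)
exact = corr_between_distributions(3.0)
```

With alpha = 3 the estimate should land close to the exact value of 0.25, up to Monte Carlo error.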

Truncations

Truncation error example for nDP(3,3,H) • As the number of groups J increases, K needs to be increased. A typical choice is K = 35 and L = 55.
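A rough feel for these truncation levels comes from the prior mass that a truncated stick-breaking representation leaves out. The sketch below is not the truncation error bound from the paper. It only uses the standard fact that, with Beta(1, c) sticks, the expected mass remaining after n breaks is (c / (1 + c))^n. Pairing K with alpha (the level of distributions) and L with beta (the level of atoms) is an assumption of this example.

```python
def expected_tail_mass(concentration, n_sticks):
    """Expected prior mass left after n_sticks breaks with Beta(1, c) sticks.

    Each break leaves a fraction (1 - u) of the stick with
    E[1 - u] = c / (1 + c), and the breaks are independent.
    """
    return (concentration / (1.0 + concentration)) ** n_sticks

# nDP(3, 3, H) with the typical choice K = 35 and L = 55
top_level_tail = expected_tail_mass(3.0, 35)     # mass beyond K distributions
bottom_level_tail = expected_tail_mass(3.0, 55)  # mass beyond L atoms
print(f"top level: {top_level_tail:.2e}, bottom level: {bottom_level_tail:.2e}")
```

Both values are small for these settings, which is consistent with treating K = 35 and L = 55 as adequate for concentration parameters around 3.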

Sampling by double truncation

Simulated data • Showing the discriminating capability of the nDP and its ability to provide more accurate density estimates.

Density estimation result • Case (a) uses the nDP and case (b) uses the DPM. • The nDP captures the small mode better and also emphasizes the importance of the main mode. • The entropy of the estimate (red) relative to the true distribution (black) is 0.011 under the nDP and 0.017 under the DPM.

Health care quality in the United States • Data – 3077 hospitals in 51 territories (50 states + DC). The number of hospitals per state varies, as does the number of patients per hospital. • Four covariates are available for each center: type of hospital, ownership, whether the hospital provides emergency services and whether it has an accreditation. • We are interested in clustering states according to their quality. After adjusting for the effect of the available covariates with a main-effects ANOVA, we use the nDP to model the state-specific error distributions.

Conclusion • The authors proposed the nested Dirichlet process to simultaneously cluster groups and observations within groups. • The groups are clustered by their entire distribution rather than by particular features of it. • While being non-parametric, the nDP encompasses a number of typical parametric and non-parametric models as limiting cases.
