Bayesian Generalized Product Partition Model

By David Dunson and Ju-Hyun Park. Presentation by Eric Wang, 2/15/08.

Outline
• Introduce Product Partition Models (PPM).
• Relate PPM to DP via the Blackwell-MacQueen Polya urn scheme.
• Introduce predictor dependence into PPM to form the Generalized PPM (GPPM).
• Discussion and results.
• Conclusion.

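Product Partition Model
• A PPM is formally defined as $\pi(S^*) = c_0 \prod_{h=1}^{k} c(S_h)$ (1), where $S^* = \{S_1, \dots, S_k\}$ is a partition of $\{1, \dots, n\}$, $c(S_h) \ge 0$ is the cohesion of subset $S_h$, and $c_0$ is a normalizing constant.
• Let $y_h$ denote the data for subjects in cluster h, h = 1,…,k.
• The probability of a partition is therefore proportional to the product of the cohesions of its subsets.
• The posterior on $S^*$ after seeing data $y$ is also a PPM: $\pi(S^* \mid y) \propto \prod_{h=1}^{k} c(S_h)\, f_h(y_h)$, where $f_h(y_h)$ is the marginal likelihood of the data in cluster h, so the updated cohesion is $c(S_h)\, f_h(y_h)$.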
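For small n, a product partition prior can be computed by brute force, which makes the roles of the cohesion and the normalizing constant concrete. The sketch below enumerates every partition of {1,…,n}, multiplies the cohesions of its blocks, and normalizes. The function names and the example cohesion (block size) are illustrative assumptions, not taken from the presentation.

```python
from math import prod


def set_partitions(items):
    """Yield every partition of `items` as a list of blocks (lists)."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for smaller in set_partitions(rest):
        # Put `first` into each existing block in turn ...
        for i, block in enumerate(smaller):
            yield smaller[:i] + [[first] + block] + smaller[i + 1:]
        # ... or let `first` start a new block.
        yield [[first]] + smaller


def ppm_prior(n, cohesion):
    """Return {partition: probability}, with probability proportional to
    the product of cohesion(block) over the blocks of the partition."""
    parts = list(set_partitions(list(range(1, n + 1))))
    unnorm = [prod(cohesion(block) for block in p) for p in parts]
    total = sum(unnorm)  # equals 1 / c_0
    keys = [tuple(sorted(tuple(sorted(b)) for b in p)) for p in parts]
    return {key: u / total for key, u in zip(keys, unnorm)}


# Hypothetical cohesion that favours larger blocks: c(S_h) = |S_h|.
prior = ppm_prior(4, cohesion=lambda block: float(len(block)))
```

Enumeration is only feasible for very small n, since the number of partitions grows as the Bell numbers; it is meant as a check on the definition, not as an inference method.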

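Product Partition Model
• A PPM can also be induced hierarchically: $y_i \mid S_i, \theta \sim f(y_i \mid \theta_{S_i})$, $S_i \sim \sum_{h=1}^{k} \pi_h \delta_h$, $\theta_h \sim G_0$, where $S_i = h$ if $i \in S_h$.
• Taking $k \to \infty$ induces a nonparametric PPM.
• A prior on the weights $\pi = (\pi_1, \dots, \pi_k)$ imposes a particular form on the cohesion; a convenient choice corresponds to the Dirichlet process (DP).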

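Relating DP and PPM
• In the DP, $\theta_i \mid G \sim G$ and $G \sim \mathrm{DP}(\alpha G_0)$.
• G appears explicitly in the stick-breaking representation. If it is marginalized out, it yields the Blackwell-MacQueen (1973) formulation: $(\theta_i \mid \theta_1, \dots, \theta_{i-1}) \sim \frac{\alpha}{\alpha + i - 1}\, G_0 + \frac{1}{\alpha + i - 1} \sum_{j=1}^{i-1} \delta_{\theta_j}$, where $\theta_i$ is the parameter value taken by the ith subject.
• The joint distribution of a particular set $(\theta_1, \dots, \theta_n)$ is therefore the product of these sequential conditionals: $\pi(\theta_1, \dots, \theta_n) = \prod_{i=1}^{n} \frac{\alpha\, G_0(\theta_i) + \sum_{j<i} \delta_{\theta_j}(\theta_i)}{\alpha + i - 1}$.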
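The Blackwell-MacQueen urn can be simulated directly, without ever representing G. The following is a minimal sketch: each new draw is a fresh value from the base measure with probability α/(α + i − 1), and otherwise a copy of a uniformly chosen earlier draw. The standard normal base measure and the function name are assumptions made for illustration.

```python
import random


def polya_urn_draws(n, alpha, base_draw, rng):
    """Draw theta_1..theta_n sequentially from the Blackwell-MacQueen urn."""
    thetas = []
    for i in range(n):  # i earlier draws exist at this point
        if rng.random() < alpha / (alpha + i):
            thetas.append(base_draw(rng))      # fresh value from G_0
        else:
            thetas.append(rng.choice(thetas))  # copy a uniformly chosen earlier draw
    return thetas


rng = random.Random(0)
# Assumed base measure G_0 = N(0, 1), chosen only for illustration.
thetas = polya_urn_draws(200, alpha=1.0,
                         base_draw=lambda r: r.gauss(0.0, 1.0), rng=rng)
n_unique = len(set(thetas))  # number of clusters induced by ties
```

Ties among the draws define the clusters, so `n_unique` is the number of clusters k; it grows slowly with n for fixed α.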

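Relating DP and PPM
• It can be shown directly that the Blackwell-MacQueen formulation leads to (2): $\pi(\theta_1, \dots, \theta_n) = \frac{\prod_{h=1}^{k} \alpha\,(k_h - 1)!\; G_0(\theta^*_h)}{\prod_{i=1}^{n} (\alpha + i - 1)}$,
• where $k_h$ is the number of subjects taking unique value $\theta^*_h$.
• $\theta^*_h$ is the unique value shared by the subjects in cluster h, with the subjects in each cluster re-sorted by their ids.
• With normalizing constant $c_0 = \Gamma(\alpha)/\Gamma(\alpha + n)$ and cohesion $c(S_h) = \alpha\,(k_h - 1)!$, this becomes (3): $\pi(\theta_1, \dots, \theta_n) = c_0 \prod_{h=1}^{k} c(S_h)\, G_0(\theta^*_h)$.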

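Relating DP and PPM
• From slide 3, writing the prior and likelihood together gives (4): $\pi(S^* \mid y) \propto \prod_{h=1}^{k} c(S_h) \int \prod_{i \in S_h} f(y_i \mid \theta^*_h)\, dG_0(\theta^*_h)$.
• Notice that, as in (1), G can be marginalized out to obtain the same product form.
• Specifically, integrate over all possible unique values that can be taken by $\theta^*_h$ for subset h.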

Relating DP and PPM
• Therefore, the DP is a special case of the PPM with cohesion $c(S_h) = \alpha\,(k_h - 1)!$ and normalizing constant $c_0 = \Gamma(\alpha)/\Gamma(\alpha + n)$.
• However, (2) follows the premise of the DP that the data are exchangeable, and it does not incorporate dependence on predictors.
• Next, PPMs will be generalized so that predictor dependence is incorporated.
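The special-case claim can be checked numerically. The sketch below computes the probability of a cluster-label sequence in two ways: sequentially from the urn, and in closed form as c_0 ∏ c(S_h) with c(S_h) = α (k_h − 1)! and c_0 = Γ(α)/Γ(α + n). The label sequence and the value of α are arbitrary illustrative choices.

```python
from collections import Counter
from math import factorial, gamma, prod


def urn_partition_prob(labels, alpha):
    """Probability of a cluster-label sequence under the sequential urn."""
    counts, p = Counter(), 1.0
    for i, h in enumerate(labels):  # i subjects already seated
        if counts[h] == 0:
            p *= alpha / (alpha + i)      # open a new cluster
        else:
            p *= counts[h] / (alpha + i)  # join existing cluster h
        counts[h] += 1
    return p


def ppm_partition_prob(labels, alpha):
    """Closed form c_0 * prod_h c(S_h), with c(S_h) = alpha * (k_h - 1)!."""
    n = len(labels)
    c0 = gamma(alpha) / gamma(alpha + n)
    sizes = Counter(labels).values()
    return c0 * prod(alpha * factorial(k - 1) for k in sizes)


labels = ["a", "b", "a", "a", "c", "b"]
p_urn = urn_partition_prob(labels, alpha=0.7)
p_ppm = ppm_partition_prob(labels, alpha=0.7)
```

The two values agree up to floating-point error, and reordering the subjects leaves the probability unchanged, which is the exchangeability the DP assumes.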

Generalized PPM
• The goal of the paper is to formulate (1) such that the cohesion depends on the subjects' predictors: $\pi(S^* \mid X) = c_0(X) \prod_{h=1}^{k} c(S_h, x_h)$, where $x_h$ denotes the predictors of the subjects in cluster h.
• This can be done following a process very similar to the non-predictor case above.
• Once again, the connection between the DP and the PPM will be used. The resulting model will henceforth be referred to as the GPPM.
• The formulation is interesting because the predictors are treated as random variables rather than as known fixed values (as in the KSBP).

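GPPM
• Consider the following hierarchical model: $y_i \mid x_i, \theta_i \sim f(y_i \mid x_i, \theta_i)$, $x_i \mid \gamma_i \sim f(x_i \mid \gamma_i)$, $(\theta_i, \gamma_i) \mid G \sim G$, $G \sim \mathrm{DP}(\alpha G_0)$,
• where $G_0 = G_{0\theta} \otimes G_{0\gamma}$ constitutes a base measure on $\theta$ and $\gamma$, the parameters of the data and the predictor, respectively.
• This model will segment the subjects {1,…,n} into k clusters. As before, $S_i = h$ denotes that subject i belongs to cluster h.
• $\theta^*_h$ and $\gamma^*_h$ denote the unique values of the parameters associated with the subjects in cluster h and with their predictors.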

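GPPM
• The joint distribution of $(\theta, \gamma)$ can be developed in a similar manner to (2), giving (5): $\pi(\theta, \gamma) = c_0 \prod_{h=1}^{k} \alpha\,(k_h - 1)!\; G_{0\theta}(\theta^*_h)\, G_{0\gamma}(\gamma^*_h)$, with $c_0 = \Gamma(\alpha)/\Gamma(\alpha + n)$.
• The conditional distribution of $\theta$ given the predictors X is (6): $\pi(\theta \mid X) \propto \prod_{h=1}^{k} \alpha\,(k_h - 1)!\; G_{0\theta}(\theta^*_h) \int \prod_{i \in S_h} f(x_i \mid \gamma^*_h)\, dG_{0\gamma}(\gamma^*_h)$.
• For comparison, (2) is $\pi(\theta) = c_0 \prod_{h=1}^{k} \alpha\,(k_h - 1)!\; G_0(\theta^*_h)$.
• The cohesion in (6) is (7): $c(S_h, x_h) = \alpha\,(k_h - 1)! \int \prod_{i \in S_h} f(x_i \mid \gamma^*_h)\, dG_{0\gamma}(\gamma^*_h)$.
• (7) meets the criteria originally set out.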

GPPM
• Some thoughts on the GPPM so far:
• As noted earlier, the posterior distribution of a PPM is still in the class of PPMs, but with updated cohesion.
• Similarly, the posterior of a GPPM also takes the form of a GPPM.
• (2) and (6) are quite similar. The extra portion of (6) is the marginalized probability of the predictors in each cluster.
• If the predictor likelihood $f(x_i \mid \gamma_i)$ does not depend on $\gamma_i$, then the GPPM reverts to the Blackwell-MacQueen formulation, as seen clearly in the following theorem.

Generalized Polya Urn Scheme • The following theorem shows that the GPPM can induce a Blackwell-MacQueen Polya Urn scheme, generalized for predictor dependence:
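The theorem statement itself does not appear in the transcript. A form consistent with the description of the two weights in this section is sketched below; the notation (θ for response parameters, γ for predictor parameters, a superscript (i) for quantities computed with subject i removed) is an assumption and may differ from the slide.

```latex
\begin{aligned}
(\theta_i \mid \theta^{(i)}, X) &\sim w_{i0}(X)\, G_{0\theta}
    + \sum_{h=1}^{k^{(i)}} w_{ih}(X)\, \delta_{\theta^{*}_h}, \\
w_{i0}(X) &\propto \alpha\, f_0(x_i), \qquad
    f_0(x_i) = \int f(x_i \mid \gamma)\, dG_{0\gamma}(\gamma), \\
w_{ih}(X) &\propto k_h^{(i)}\, f_h(x_i), \qquad
    f_h(x_i) = \int f(x_i \mid \gamma)\, dG_{0\gamma}\big(\gamma \mid x_h^{(i)}\big).
\end{aligned}
```

When $f_h(x_i) = f_0(x_i)$ for every h, the predictor terms cancel in the normalization and the weights reduce to $\alpha/(\alpha + n - 1)$ and $k_h^{(i)}/(\alpha + n - 1)$, which is the ordinary Blackwell-MacQueen urn.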

Generalized Polya Urn Scheme
• By the above theorem, subject i does one of two things:
• 1) Draws a previously unseen unique value, with weight proportional to the concentration parameter times the marginal density of its predictor under the base measure.
• 2) Draws a previously used unique value, equal to the parameters of cluster h, with weight proportional to the number of subjects that have already chosen that value times the marginal likelihood of its predictor given the predictors already in cluster h.
• Further, since the predictors are treated as random variables, updating the posterior on each cluster's predictor parameters makes the GPPM a flexible, nonparametric way to adapt the distance measure in predictor space.
• In this paper G is always integrated out; however, Dunson alludes to variational techniques that could be developed in a similar fashion, following the fast variational DP proposed by Kurihara et al. (2006).
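The two cases above can be sketched in code under a deliberately simplified predictor model. This is not the Normal-Wishart setup used in the presentation: it assumes a one-dimensional predictor with x | γ ~ N(γ, s2) and γ ~ N(m0, v0), chosen because both marginal densities are then Gaussian in closed form. All names and default values are hypothetical.

```python
from math import exp, pi, sqrt


def normal_pdf(x, mean, var):
    """Density of N(mean, var) at x."""
    return exp(-0.5 * (x - mean) ** 2 / var) / sqrt(2.0 * pi * var)


def generalized_urn_weights(x_new, clusters, alpha, m0=0.0, v0=1.0, s2=0.05):
    """Return [w_0, w_1, ..., w_k]: the new-cluster weight first, then one
    weight per existing cluster (each cluster is a list of predictor values).

    Assumed model: x | gamma ~ N(gamma, s2), gamma ~ N(m0, v0).
    """
    # Case 1: alpha times the marginal density of x_new under the base measure.
    unnorm = [alpha * normal_pdf(x_new, m0, v0 + s2)]
    for xs in clusters:
        n_h = len(xs)
        # Conjugate posterior of gamma given the predictors already in the cluster.
        v_n = 1.0 / (1.0 / v0 + n_h / s2)
        m_n = v_n * (m0 / v0 + sum(xs) / s2)
        # Case 2: cluster size times the predictive density of x_new.
        unnorm.append(n_h * normal_pdf(x_new, m_n, v_n + s2))
    total = sum(unnorm)
    return [u / total for u in unnorm]


clusters = [[0.1, 0.2, 0.15], [0.9, 1.0, 0.95]]
w = generalized_urn_weights(0.12, clusters, alpha=1.0)
```

With these inputs the new subject's predictor sits inside the first cluster, so `w[1]` dominates even though both clusters have the same size; an ordinary urn would give the two clusters equal weight.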

Generalized Polya Urn Scheme
• Consider, for example, a Normal-Wishart prior on the predictor parameters: the predictor is Gaussian given a cluster-specific mean and precision, the mean is Gaussian given the precision, and the precision follows a Wishart distribution specified by its degrees of freedom and mean. Two multiplicative constants scale the precision, one in the predictor distribution and one in the prior on the mean.
• Notice that this formulation adds another multiplier to the precision of the predictor distribution. This is analogous to the kernel width in the KSBP, and it encourages tight local clustering in predictor space.
• The marginal distributions of the predictors in Theorem 1 take the forms described on the next slide.

Generalized Polya Urn Scheme
• The marginal distribution of the predictor in the first weight is a non-central multivariate t-distribution whose degrees of freedom, mean, and scale are determined by the prior hyperparameters.
• The marginal distribution of the predictor in the second weight has the same functional form but with updated hyperparameters, which depend on the empirical mean of the predictors in cluster h, computed without predictor i.
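For orientation, the textbook Normal-Wishart result has the same structure. The block below uses a standard parameterization, which is an assumption: it omits the extra precision multiplier mentioned on the previous slide and parameterizes the Wishart by its scale matrix V (mean νV) rather than by its mean, so the constants will differ from those on the slide.

```latex
\begin{aligned}
&\Lambda \sim \mathrm{W}_p(\nu, V), \qquad
 \mu \mid \Lambda \sim \mathrm{N}_p\big(\mu_0, (\kappa \Lambda)^{-1}\big), \qquad
 x \mid \mu, \Lambda \sim \mathrm{N}_p\big(\mu, \Lambda^{-1}\big) \\
&\Longrightarrow\quad
 x \sim t_{\nu - p + 1}\!\Big(\mu_0,\; \tfrac{\kappa + 1}{\kappa\,(\nu - p + 1)}\, V^{-1}\Big). \\[4pt]
&\text{After observing } x_1, \dots, x_m \text{ with mean } \bar{x}
 \text{ and scatter } S = \textstyle\sum_j (x_j - \bar{x})(x_j - \bar{x})^{\top}: \\
&\kappa_m = \kappa + m, \qquad \nu_m = \nu + m, \qquad
 \mu_m = \frac{\kappa \mu_0 + m \bar{x}}{\kappa + m}, \\
&V_m^{-1} = V^{-1} + S + \frac{\kappa m}{\kappa + m}\,
 (\bar{x} - \mu_0)(\bar{x} - \mu_0)^{\top}.
\end{aligned}
```

Substituting the updated hyperparameters into the first line gives the predictive density for a new predictor value given the predictors already in a cluster, which is the role played by the second weight.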

Generalized Polya Urn Scheme
• Posterior updating in this model is straightforward using MCMC. The conditional posterior of the parameters combines the base prior updated with the data likelihood and the weights from Theorem 1.
• The indicators are updated separately from the cluster parameters: the membership indicators are sampled from their multinomial posterior.
• Next, the cluster parameters are updated conditional on the indicators and the number of clusters k.
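The indicator update can be sketched as a single categorical draw. In the sketch below, `prior_weights` stands for the predictor-dependent urn weights and `likelihoods` for the response likelihood of subject i under each option (index 0 being the marginal likelihood under the base prior for a new cluster). Both inputs, and the function name, are assumptions; how they are computed depends on the chosen likelihood and base measure.

```python
import random


def sample_indicator(prior_weights, likelihoods, rng):
    """Sample a membership index for one subject.

    Index 0 means 'open a new cluster'; index h >= 1 means 'join cluster h'.
    Returns the sampled index and the normalized posterior probabilities.
    """
    unnorm = [w * l for w, l in zip(prior_weights, likelihoods)]
    total = sum(unnorm)
    probs = [u / total for u in unnorm]
    index = rng.choices(range(len(probs)), weights=probs, k=1)[0]
    return index, probs


rng = random.Random(1)
# Hypothetical inputs: urn weights for (new, cluster 1, cluster 2) and the
# response likelihood of the subject under each option.
index, probs = sample_indicator([0.2, 0.5, 0.3], [0.1, 0.4, 1.5], rng)
```

In a full sampler this draw would be repeated for every subject in each sweep, followed by the update of the cluster parameters given the new indicators.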

Results
• Dunson et al. demonstrate results on conditional density regression problems, using a model specified by a p-dimensional predictor, a data likelihood, and parameters for each cluster h.
• Results are demonstrated on 3 datasets:
• Simulated single Gaussian (p = 2)
• Simulated mixture of two Gaussians (p = 2)
• Epidemiology data (p = 3)

Results
• Simulated single Gaussian data, 500 data points.
• The predictor is generated iid from a uniform distribution over (0,1).
• Data were simulated from a single Gaussian conditional density (equation shown on the slide).
• The algorithm was run for 10,000 iterations with a 1,000-iteration burn-in. Mixing was fast and the estimates were good.
[Figure: raw data, y against x.]
[Figure: conditional distributions of y for two different values of x. The dotted line is the truth, the solid line is the estimate, and the dashed lines are 99% credible intervals.]

Results
• Simulated two-Gaussian mixture data, 500 data points.
• The predictor is generated iid from a uniform distribution over (0,1).
• Data were simulated from a mixture of two Gaussians (equation shown on the slide).
[Figure: two columns of plots, PPM on the left and GPPM on the right.]
Here, the left column of plots is for a PPM (non-generalized), while the right column is for the GPPM on the same dataset. Notice the much better fit in the bottom plots, and that the GPPM is not dragged toward 0 as the second peak appears when the predictor approaches 0.

Results
• Epidemiologic application:
• DDE has been shown to increase the rate of pre-term birth. Two predictors correspond to the DDE dose for child i and the mother's age after normalization, respectively.
• The dataset contained 2,313 subjects.
• The MCMC sampler for the GPPM was run for 30,000 iterations with a 10,000-iteration burn-in.
• The results confirmed earlier findings of a slightly decreasing trend in the response as the DDE level rises.
• These findings are similar to previous KSBP work on the same dataset, but the implementation was simpler.

Results
[Figure: raw data and results for the epidemiologic application. Dashed lines indicate 99% credible intervals.]

Conclusion
• A GPPM was formulated beginning with the Blackwell-MacQueen Polya urn scheme.
• The GPPM incorporates predictor dependence by treating the predictor as a random variable.
• It is similar in spirit to the KSBP, but is able to bypass issues such as kernel width selection and the inability to implement a continuous distribution in predictor space.
• Future research directions could explore Dunson's mention of a variational method similar to the formulation proposed in this paper.
