Learning with Tree-averaged Densities and Distributions

Sergey Kirshner
Alberta Ingenuity Centre for Machine Learning, Department of Computing Science, University of Alberta, Canada
NIPS 2007, Poster W12, December 5, 2007

Overview
• Goal: fit a density to complete multivariate data
• A new density estimation model based on averaging over tree-dependence structures
• Distribution = univariate marginals + copula
• Bayesian averaging over tree-structured copulas
• Efficient parameter estimation for tree-averaged copulas
• Can solve problems with 10 to 30 dimensions

Most Popular Distribution: the Multivariate Gaussian
• Interpretable
• Closed under taking marginals
• Generalizes to multiple dimensions
• Models pairwise dependence
• Tractable
• Takes up 245 of the 691 pages of Continuous Multivariate Distributions by Kotz, Balakrishnan, and Johnson

What If the Data Is NOT Gaussian?

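Curse of Dimensionality [Bellman 57]
• Splitting each of the d axes into bins of width 1/n yields n^d cells, a number that grows exponentially with d
• For a standard d-dimensional Gaussian vector X, $P\left(X \in [-2,2]^d\right) \approx 0.9545^d$, so the mass inside this fixed cube also shrinks exponentially with d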
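A quick numerical illustration of both quantities on this slide. The one-dimensional factor 0.9545 is P(|Z| ≤ 2) = erf(√2) for a standard normal Z; the choice of 10 bins per axis and the listed dimensions are illustrative assumptions, not values from the talk.

```python
import math


def grid_cells(n: int, d: int) -> int:
    """Number of cells when each of d axes is split into n bins of width 1/n."""
    return n ** d


def gaussian_mass_in_cube(d: int, half_width: float = 2.0) -> float:
    """P(X in [-half_width, half_width]^d) for X with d independent N(0, 1) coordinates."""
    one_dim = math.erf(half_width / math.sqrt(2.0))  # P(|Z| <= half_width)
    return one_dim ** d


# Illustrative table: cells needed with 10 bins per axis, and the shrinking cube mass.
for d in (1, 2, 10, 30):
    print(d, grid_cells(10, d), round(gaussian_mass_in_cube(d), 4))
```

Even at d = 10 the grid needs ten billion cells, which is why the deck avoids estimating the joint density on a grid.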

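Avoiding the Curse, Step 1: Separating Univariate Marginals
$f(x_1,\dots,x_d) = \prod_{i=1}^{d} f_i(x_i) \cdot c\left(F_1(x_1),\dots,F_d(x_d)\right)$
• The product of the univariate marginals is the density the variables would have if they were independent
• The multivariate dependence term c is the copula (density)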

Monotonic Transformation of the Variables
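The monotonic transformation behind the copula construction is the probability integral transform: if X has a continuous cdf F, then U = F(X) is uniform on (0, 1), and because F is increasing the ordering of the data is preserved. A minimal sketch, assuming an exponential marginal with a known rate (the rate, seed, and sample size are illustrative choices):

```python
import math
import random


def exponential_cdf(x: float, rate: float) -> float:
    """CDF of the exponential distribution with the given rate."""
    return 1.0 - math.exp(-rate * x)


rng = random.Random(0)
rate = 2.0
xs = [rng.expovariate(rate) for _ in range(20000)]

# Probability integral transform: U = F(X) should be Uniform(0, 1).
us = [exponential_cdf(x, rate) for x in xs]

mean_u = sum(us) / len(us)                                   # close to 1/2
frac_below_quarter = sum(u < 0.25 for u in us) / len(us)     # close to 1/4
print(round(mean_u, 3), round(frac_below_quarter, 3))
```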

Copula
A copula C is a multivariate distribution (cdf) defined on the unit hypercube $[0,1]^d$ with uniform univariate marginals:
$C(1,\dots,1,u_i,1,\dots,1) = u_i$ for every $i$ and every $u_i \in [0,1]$.

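Sklar’s Theorem [Sklar 59]
Any multivariate cdf F with marginals $F_1,\dots,F_d$ can be written as
$F(x_1,\dots,x_d) = C\left(F_1(x_1),\dots,F_d(x_d)\right)$
for some copula C, which is unique when the marginals are continuous: distribution = univariate marginals + copula.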

Example: Bivariate Gaussian Copula
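For reference, the bivariate Gaussian copula is $C_\rho(u,v) = \Phi_\rho\left(\Phi^{-1}(u), \Phi^{-1}(v)\right)$, and its density is the bivariate normal density divided by the two standard normal marginal densities. A minimal sketch using only the standard library (`statistics.NormalDist` supplies $\Phi^{-1}$); the function name is an illustrative choice:

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal: mean 0, standard deviation 1


def gaussian_copula_density(u: float, v: float, rho: float) -> float:
    """Density c(u, v) of the bivariate Gaussian copula with correlation rho.

    c(u, v) = phi_rho(a, b) / (phi(a) * phi(b)), where a = Phi^{-1}(u), b = Phi^{-1}(v).
    """
    if not (0.0 < u < 1.0 and 0.0 < v < 1.0):
        raise ValueError("u and v must lie strictly inside (0, 1)")
    if not -1.0 < rho < 1.0:
        raise ValueError("rho must lie strictly inside (-1, 1)")
    a = _STD_NORMAL.inv_cdf(u)
    b = _STD_NORMAL.inv_cdf(v)
    one_minus_r2 = 1.0 - rho * rho
    exponent = -(rho * rho * (a * a + b * b) - 2.0 * rho * a * b) / (2.0 * one_minus_r2)
    return math.exp(exponent) / math.sqrt(one_minus_r2)


# With rho > 0 the density is higher where u and v agree than where they disagree.
print(gaussian_copula_density(0.9, 0.9, 0.7), gaussian_copula_density(0.9, 0.1, 0.7))
```

At rho = 0 the density is identically 1, the independence copula.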

Useful Properties of Copulas
• Preserve concordance between the variables (a rank-based measure of dependence)
• Preserve mutual information
• Can be viewed as a canonical form of a multivariate distribution for the purpose of estimating multivariate dependence
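Concordance depends only on the copula, so a rank-based statistic such as Kendall's tau is unchanged when each variable is passed through its own strictly increasing function. A small sketch to check this; the tau-a estimator, the simulated data, and the transforms exp and cube are illustrative choices:

```python
import math
import random
from itertools import combinations


def kendall_tau(xs, ys):
    """Kendall's tau-a: (concordant - discordant) / number of pairs. O(n^2)."""
    concordant = discordant = 0
    for i, j in combinations(range(len(xs)), 2):
        s = (xs[i] - xs[j]) * (ys[i] - ys[j])
        if s > 0:
            concordant += 1
        elif s < 0:
            discordant += 1
    n_pairs = len(xs) * (len(xs) - 1) // 2
    return (concordant - discordant) / n_pairs


rng = random.Random(1)
xs = [rng.gauss(0.0, 1.0) for _ in range(300)]
ys = [x + rng.gauss(0.0, 1.0) for x in xs]  # positively dependent pair

tau_raw = kendall_tau(xs, ys)
# Strictly increasing transforms of each margin keep every pair's concordance.
tau_transformed = kendall_tau([math.exp(x) for x in xs], [y ** 3 for y in ys])
print(tau_raw, tau_transformed)
```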

Copula Density
$c(u_1,\dots,u_d) = \dfrac{\partial^d C(u_1,\dots,u_d)}{\partial u_1 \cdots \partial u_d}$

Separating Univariate Marginals
• Fit the univariate marginals (parametrically or non-parametrically)
• Replace each data point with the values of the fitted marginal cdfs
• Estimate the copula density from the transformed data
Inference functions for margins [Joe and Xu 96]; canonical maximum likelihood [Genest et al 95]
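A minimal sketch of this two-step recipe, assuming non-parametric margins (rescaled ranks r/(N+1), as in the canonical maximum likelihood setting) and a bivariate Gaussian copula. As a swapped-in simplification, the copula parameter is estimated by the sample correlation of the normal scores rather than by maximizing the copula likelihood; the helper names and the simulated data are hypothetical.

```python
import math
import random
from statistics import NormalDist


def pseudo_observations(column):
    """Rescaled ranks r / (N + 1): a non-parametric estimate of F_i at each data point."""
    n = len(column)
    order = sorted(range(n), key=lambda k: column[k])
    u = [0.0] * n
    for rank, k in enumerate(order, start=1):
        u[k] = rank / (n + 1)
    return u


def normal_scores_correlation(u, v):
    """Sample correlation of Phi^{-1}(u) and Phi^{-1}(v) (simplified copula estimate)."""
    inv = NormalDist().inv_cdf
    a = [inv(x) for x in u]
    b = [inv(y) for y in v]
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)


# Simulated data: non-Gaussian margins, Gaussian copula with rho = 0.6.
rng = random.Random(2)
rho = 0.6
x1, x2 = [], []
for _ in range(5000):
    z1 = rng.gauss(0.0, 1.0)
    z2 = rho * z1 + math.sqrt(1 - rho ** 2) * rng.gauss(0.0, 1.0)
    x1.append(math.exp(z1))  # log-normal margin
    x2.append(z2 ** 3)       # heavy-tailed margin (strictly increasing transform)

# Steps 1-2: replace each value by its estimated marginal cdf value.
u1, u2 = pseudo_observations(x1), pseudo_observations(x2)
# Step 3: estimate the copula parameter from the pseudo-observations alone.
rho_hat = normal_scores_correlation(u1, u2)
print(round(rho_hat, 3))
```

The margins never enter step 3, which is the point of separating them.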

What Next?
• Aren't we back to square one?
  • We are still estimating a multivariate density from data
• Not quite:
  • All marginals are fixed
  • There are many approaches for copulas, but the vast majority focus on the bivariate case
• Design models that use only pairs of variables

Tree-Structured Densities
[Figure: a tree-structured dependency graph over x1, ..., x6]

Tree-Structured Copulas
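Because every univariate marginal of a copula is uniform (density 1), a tree-structured copula density reduces to a product of bivariate copula densities over the tree edges E: $c(u) = \prod_{(i,j)\in E} c_{ij}(u_i,u_j)$. A minimal sketch, assuming Gaussian bivariate copulas on the edges and a hypothetical chain tree; the caller is assumed to pass a valid tree (d - 1 edges, no cycles):

```python
import math
from statistics import NormalDist

_INV = NormalDist().inv_cdf


def bivariate_gaussian_copula_density(u, v, rho):
    """c(u, v) for a bivariate Gaussian copula with correlation rho."""
    a, b = _INV(u), _INV(v)
    s = 1.0 - rho * rho
    return math.exp(-(rho * rho * (a * a + b * b) - 2.0 * rho * a * b) / (2.0 * s)) / math.sqrt(s)


def tree_copula_density(u, edges):
    """Tree-structured copula density: product of bivariate copula densities over edges.

    `edges` maps index pairs (i, j) of a spanning tree to a Gaussian correlation.
    """
    density = 1.0
    for (i, j), rho in edges.items():
        density *= bivariate_gaussian_copula_density(u[i], u[j], rho)
    return density


# Hypothetical chain tree 0 - 1 - 2 - 3.
tree = {(0, 1): 0.7, (1, 2): -0.3, (2, 3): 0.5}
print(tree_copula_density([0.2, 0.4, 0.6, 0.9], tree))
```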

Chow-Liu Algorithm (for Copulas)
[Figure: candidate trees over a1, ..., a4]
Pair | A1A2 | A1A3 | A1A4 | A2A3 | A2A4 | A3A4
Bivariate copula | c(a1,a2) | c(a1,a3) | c(a1,a4) | c(a2,a3) | c(a2,a4) | c(a3,a4)
Edge weight | 0.3126 | 0.0229 | 0.0172 | 0.0230 | 0.0183 | 0.2603
The Chow-Liu algorithm keeps the maximum-weight spanning tree, with pairwise mutual information as the edge weights.
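A minimal sketch of the spanning-tree step, using Kruskal's algorithm on the six edge weights listed on this slide. Treating those numbers as the mutual-information weights is an assumption, since the transcript does not label them; the function name is illustrative.

```python
def maximum_spanning_tree(nodes, weights):
    """Kruskal's algorithm for a maximum-weight spanning tree.

    `weights` maps (node, node) pairs to edge weights.
    """
    parent = {v: v for v in nodes}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    tree = []
    for (i, j), _w in sorted(weights.items(), key=lambda item: item[1], reverse=True):
        root_i, root_j = find(i), find(j)
        if root_i != root_j:  # adding (i, j) does not create a cycle
            parent[root_i] = root_j
            tree.append((i, j))
    return tree


slide_weights = {
    ("A1", "A2"): 0.3126, ("A1", "A3"): 0.0229, ("A1", "A4"): 0.0172,
    ("A2", "A3"): 0.0230, ("A2", "A4"): 0.0183, ("A3", "A4"): 0.2603,
}
tree = maximum_spanning_tree(["A1", "A2", "A3", "A4"], slide_weights)
print(tree)  # [('A1', 'A2'), ('A3', 'A4'), ('A2', 'A3')]
```

With these weights the selected tree is the chain A1 - A2 - A3 - A4: the two strong pairs are taken first, and A2A3 (0.0230) narrowly beats A1A3 (0.0229) as the link between them.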

Distribution over Spanning Trees [Meilă and Jaakkola 00, 06]
[Figure: complete graph over a1, ..., a4 with edge weights β12, β13, β14, β23, β24, β34, and example spanning trees]
• Sums over all spanning trees take only O(d^3)!
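In the Meilă and Jaakkola construction each spanning tree T gets probability $P(T) \propto \prod_{(u,v)\in T} \beta_{uv}$. The normalizing sum over all trees is given by Kirchhoff's matrix-tree theorem as a single determinant of the weighted graph Laplacian with one row and column removed, consistent with the O(d^3) cost on the slide. A pure-Python sketch (function names are illustrative):

```python
def determinant(matrix):
    """Determinant via Gaussian elimination with partial pivoting, O(n^3)."""
    a = [row[:] for row in matrix]
    n = len(a)
    det = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if a[pivot][col] == 0.0:
            return 0.0
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            det = -det
        det *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return det


def spanning_tree_partition(beta):
    """Z = sum over spanning trees T of prod_{(u,v) in T} beta[u][v] (matrix-tree theorem).

    `beta` must be symmetric and nonnegative; its diagonal is ignored.
    """
    d = len(beta)
    laplacian = [[(sum(beta[u]) - beta[u][u]) if u == v else -beta[u][v]
                  for v in range(d)] for u in range(d)]
    minor = [row[1:] for row in laplacian[1:]]  # drop the first row and column
    return determinant(minor)


# With unit weights Z counts the spanning trees: Cayley's formula d^(d-2).
ones4 = [[0.0 if u == v else 1.0 for v in range(4)] for u in range(4)]
print(spanning_tree_partition(ones4))
```

For d = 4 this gives 16 (up to floating-point rounding), matching 4^2.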

Tree-Averaged Copula
• Can compute the sum over all d^{d-2} spanning trees
• Can be viewed as a mixture over many, many spanning trees
• Can use EM to estimate the parameters
  • even though there are d^{d-2} mixture components!
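One way to see why the d^{d-2}-component mixture stays tractable: with $P(T) \propto \prod_{(i,j)\in T} \beta_{ij}$, the tree average $\sum_T P(T) \prod_{(i,j)\in T} c_{ij}(u_i,u_j)$ equals the matrix-tree sum with edge weights $\beta_{ij} c_{ij}(u_i,u_j)$ divided by the matrix-tree sum with weights $\beta_{ij}$. The sketch below checks this ratio of determinants against explicit enumeration for d = 4. The weights and pairwise density values are hypothetical numbers, and the code illustrates the identity rather than the author's implementation.

```python
import math
from itertools import combinations


def determinant(m):
    """Determinant by Gaussian elimination with partial pivoting, O(n^3)."""
    a = [row[:] for row in m]
    n, det = len(a), 1.0
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(a[r][c]))
        if a[p][c] == 0.0:
            return 0.0
        if p != c:
            a[c], a[p] = a[p], a[c]
            det = -det
        det *= a[c][c]
        for r in range(c + 1, n):
            f = a[r][c] / a[c][c]
            for k in range(c, n):
                a[r][k] -= f * a[c][k]
    return det


def tree_sum(w):
    """Matrix-tree theorem: sum over spanning trees of the product of edge weights."""
    d = len(w)
    lap = [[sum(w[i][k] for k in range(d) if k != i) if i == j else -w[i][j]
            for j in range(d)] for i in range(d)]
    return determinant([row[1:] for row in lap[1:]])


def tree_averaged_density(beta, pair_density):
    """sum_T P(T) prod_{(i,j) in T} c_ij, with P(T) proportional to prod beta_ij.

    pair_density[i][j] holds c_ij(u_i, u_j) already evaluated at one data point.
    """
    d = len(beta)
    weighted = [[beta[i][j] * pair_density[i][j] if i != j else 0.0 for j in range(d)]
                for i in range(d)]
    return tree_sum(weighted) / tree_sum(beta)


def is_spanning_tree(d, edges):
    """d - 1 edges with no cycle on d nodes form a spanning tree."""
    parent = list(range(d))

    def find(x):
        while parent[x] != x:
            x = parent[x]
        return x

    for i, j in edges:
        ri, rj = find(i), find(j)
        if ri == rj:
            return False
        parent[ri] = rj
    return True


def brute_force_density(beta, pair_density):
    """Same mixture by explicit enumeration of all spanning trees (tiny d only)."""
    d = len(beta)
    num = den = 0.0
    for edges in combinations(list(combinations(range(d), 2)), d - 1):
        if is_spanning_tree(d, edges):
            weight = math.prod(beta[i][j] for i, j in edges)
            num += weight * math.prod(pair_density[i][j] for i, j in edges)
            den += weight
    return num / den


# Hypothetical symmetric edge weights and pairwise copula density values, d = 4.
beta = [[0.0, 1.0, 0.5, 2.0], [1.0, 0.0, 1.5, 0.3], [0.5, 1.5, 0.0, 0.8], [2.0, 0.3, 0.8, 0.0]]
dens = [[0.0, 1.8, 0.9, 1.2], [1.8, 0.0, 0.7, 1.1], [0.9, 0.7, 0.0, 2.5], [1.2, 1.1, 2.5, 0.0]]
print(tree_averaged_density(beta, dens), brute_force_density(beta, dens))
```

The determinant route costs O(d^3) per data point, while enumeration grows as d^{d-2}.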

EM for Tree-Averaged Copulas
• E-step: compute the expected sufficient statistics for each data point, a sum over all d^{d-2} trees that is intractable if the trees are enumerated directly!
  • Can be done in O(d^3) per data point
• M-step: update β and Θ
  • The update of Θ is often linear in the number of points
    • Gaussian copula: requires solving a cubic equation
  • The update of β is essentially iterative scaling
    • Can be done in O(d^3) per iteration
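One way a cubic arises for the Gaussian copula: for a single bivariate Gaussian copula with per-point weights $w_n$ (assumed here to stand in for E-step posteriors), setting the derivative of $\sum_n w_n \log c(a_n, b_n; \rho)$ to zero gives $W\rho(1-\rho^2) + S_{ab}(1+\rho^2) - \rho S_2 = 0$, where $W = \sum_n w_n$, $S_{ab} = \sum_n w_n a_n b_n$, $S_2 = \sum_n w_n (a_n^2 + b_n^2)$, and $a_n, b_n$ are normal scores. The sizes of $W$, $S_{ab}$, $S_2$ cost one pass over the data. This sketch solves the cubic by bracketing and bisection; it is an illustration under these assumptions, not the paper's M-step.

```python
import math
import random


def weighted_gaussian_copula_rho(a, b, w):
    """Weighted ML estimate of a bivariate Gaussian copula correlation (sketch).

    a, b: normal scores Phi^{-1}(u), Phi^{-1}(v); w: nonnegative per-point weights.
    """
    big_w = sum(w)
    s_ab = sum(wi * ai * bi for wi, ai, bi in zip(w, a, b))
    s_2 = sum(wi * (ai * ai + bi * bi) for wi, ai, bi in zip(w, a, b))

    def cubic(r):
        # (1 - r^2)^2 times the log-likelihood slope, so it has the same sign.
        return big_w * r * (1 - r * r) + s_ab * (1 + r * r) - r * s_2

    def loglik(r):
        return (-0.5 * big_w * math.log(1 - r * r)
                - (r * r * s_2 - 2 * r * s_ab) / (2 * (1 - r * r)))

    # Bracket every sign change of the cubic on a grid inside (-1, 1), then bisect.
    eps, steps = 1e-9, 4000
    grid = [-1 + eps + k * (2 - 2 * eps) / steps for k in range(steps + 1)]
    roots = []
    for lo, hi in zip(grid, grid[1:]):
        if cubic(lo) == 0.0:
            roots.append(lo)
        elif cubic(lo) * cubic(hi) < 0:
            for _ in range(60):
                mid = 0.5 * (lo + hi)
                if cubic(lo) * cubic(mid) <= 0:
                    hi = mid
                else:
                    lo = mid
            roots.append(0.5 * (lo + hi))
    # For non-degenerate data the log-likelihood tends to -inf at rho = +/-1,
    # so its maximum is the stationary point with the largest log-likelihood.
    return max(roots, key=loglik)


rng = random.Random(3)
rho_true = 0.6
a, b = [], []
for _ in range(4000):
    z = rng.gauss(0.0, 1.0)
    a.append(z)
    b.append(rho_true * z + math.sqrt(1 - rho_true ** 2) * rng.gauss(0.0, 1.0))
rho_hat = weighted_gaussian_copula_rho(a, b, [1.0] * len(a))  # unit weights: plain ML
print(round(rho_hat, 3))
```

Rescaling all weights by a constant leaves the estimate unchanged, as expected for a weighted ML update.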

Experiments: Log-Likelihood on Test Data
• UCI ML Repository MAGIC data set
• 12,000 10-dimensional vectors
• 2,000 examples in each test set
• Averaged over 10 partitions

Binary-Continuous Data

Summary
• Multivariate distribution = univariate marginals + copula
• Copula density estimation via tree-averaging
  • Closed form
• Tractable parameter estimation algorithm in the ML framework (EM)
  • O(N d^3) per iteration
• Only bivariate distributions are estimated at each step
  • Potentially avoiding the curse of dimensionality
• New model for multi-site rainfall amounts (Poster W12)
