Hierarchical Double Dirichlet Process Mixture of Gaussian Processes

Presented by Patrick Dallaire – DAMAS Workshop, November 2nd, 2012 • Hierarchical Double Dirichlet Process Mixture of Gaussian Processes • Paper from Tayal et al. (2012), AAAI

INTRODUCTION

PROBLEM DESCRIPTION • Consider a non-stationary time series such as:

PROPOSED MODELS • Gaussian processes • Infinite mixture of Gaussian processes • Dirichlet process mixture of Gaussian processes • Hierarchical double Dirichlet process mixture of Gaussian processes

OUTLINE • Bayesian modeling • Dirichlet processes • Hierarchical Dirichlet processes • Gaussian processes

BAYESIAN MODELING

THE BAYESIAN APPROACH • Define a model linking the unknown parameters to the data: • Specify a prior probability distribution expressing our belief about the parameters: • Compute the posterior distribution of the parameters given the data with Bayes' theorem:
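
The formulas for these three steps did not survive in the transcript. In standard notation, which is an assumption here (θ for the unknown parameters, D for the data), they are usually written as:

```latex
\begin{align}
  \text{model (likelihood):} \quad & p(\mathcal{D} \mid \theta) \\
  \text{prior:} \quad & p(\theta) \\
  \text{posterior (Bayes' theorem):} \quad & p(\theta \mid \mathcal{D})
    = \frac{p(\mathcal{D} \mid \theta)\, p(\theta)}{p(\mathcal{D})}
\end{align}
```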

COMPUTATIONAL ISSUES • Bayes' theorem involves different quantities: • The shape of the posterior is given by: • The marginal likelihood is used as a normalizing constant:
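
As a minimal sketch of these quantities, assume a hypothetical coin-flip model (not taken from the slides) with a uniform prior on the coin's bias. The shape of the posterior is the likelihood times the prior, and the marginal likelihood is the integral of that shape, approximated here on a grid with the trapezoid rule. The function name and the grid size are illustrative choices.

```python
def grid_posterior(heads, tails, grid_size=1001):
    """Approximate the posterior over a coin's bias on an evenly spaced grid."""
    step = 1.0 / (grid_size - 1)
    thetas = [i / (grid_size - 1) for i in range(grid_size)]
    # Shape of the posterior: likelihood times prior (uniform prior density = 1).
    unnormalized = [(t ** heads) * ((1.0 - t) ** tails) * 1.0 for t in thetas]
    # Marginal likelihood: integral of the shape over theta (trapezoid rule).
    evidence = step * (sum(unnormalized)
                       - 0.5 * (unnormalized[0] + unnormalized[-1]))
    # Dividing by the marginal likelihood makes the density integrate to one.
    posterior = [u / evidence for u in unnormalized]
    return thetas, posterior, evidence


thetas, posterior, evidence = grid_posterior(heads=6, tails=2)
```

A grid works for a single parameter, but its cost grows exponentially with the number of parameters, which is one reason the normalizing constant is a computational issue in practice.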

POSTERIOR PREDICTION • The predictive distribution can be formulated as: • Predictions should account for all the posterior uncertainty about the parameters:
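
The slide's formula is missing from the transcript. The usual form of the posterior predictive distribution, written with assumed symbols (θ for the parameters, D for the observed data, y* for a new observation), averages the model's prediction over the posterior:

```latex
p(y_* \mid \mathcal{D}) = \int p(y_* \mid \theta)\, p(\theta \mid \mathcal{D})\, d\theta
```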

CONJUGATE MODELS • Integrals involved in Bayesian inference can be analytically intractable, increasing the computational complexity. • A model is said to be conjugate when the posterior and prior distributions belong to the same family. • Posterior computation for conjugate models is done analytically.
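
A small illustration of conjugacy, using the Beta-Bernoulli pair as an assumed example (the slides do not name a specific model): a Beta prior combined with 0/1 observations gives a Beta posterior, so the update reduces to adding counts. The function names are hypothetical.

```python
def beta_bernoulli_update(alpha, beta, outcomes):
    """Return the Beta posterior parameters after observing 0/1 outcomes."""
    successes = sum(outcomes)
    failures = len(outcomes) - successes
    # Conjugacy: the posterior is again a Beta, with counts added to the prior.
    return alpha + successes, beta + failures


def predictive_success_probability(alpha, beta):
    """P(next outcome = 1) under a Beta(alpha, beta) belief: the Beta mean."""
    return alpha / (alpha + beta)


# Uniform Beta(1, 1) prior, then six successes and two failures.
posterior_alpha, posterior_beta = beta_bernoulli_update(
    1.0, 1.0, [1, 0, 1, 1, 0, 1, 1, 1])
next_success = predictive_success_probability(posterior_alpha, posterior_beta)
```

No integral has to be evaluated numerically here: both the posterior and the predictive probability are available in closed form.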

GAUSSIAN PROCESSES

INTRODUCTION • Gaussian processes (GPs) are used for supervised learning to estimate a function of interest • GPs are probability distributions over spaces of functions • They belong to the class of nonparametric Bayesian approaches

NORMAL DISTRIBUTION • Let us assume a random variable

NORMAL DISTRIBUTION • We place the random variable such as:

INDEXING RANDOM VARIABLES • Assume multiple random variables, each identified by an index and placed at its own position:

MULTIVARIATE NORMAL • According to this construction, we have a set of i.i.d. normally distributed random variables • The joint probability can be represented as: • What happens when adding covariance?

MULTIVARIATE NORMAL • An example with dependent variables
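
To see what adding covariance does, here is a minimal sketch that draws pairs of unit-variance normal variables with a chosen correlation. It uses the Cholesky factor of the 2x2 covariance matrix [[1, rho], [rho, 1]]; the function names, the value of rho and the sample size are illustrative assumptions.

```python
import math
import random


def sample_bivariate_normal(rho, rng):
    """Draw (x1, x2) with zero means, unit variances and correlation rho."""
    z1 = rng.gauss(0.0, 1.0)
    z2 = rng.gauss(0.0, 1.0)
    # Rows of the Cholesky factor: [1, 0] and [rho, sqrt(1 - rho^2)].
    return z1, rho * z1 + math.sqrt(1.0 - rho ** 2) * z2


def empirical_correlation(pairs):
    """Sample correlation coefficient of a list of (x1, x2) pairs."""
    n = len(pairs)
    mean1 = sum(p[0] for p in pairs) / n
    mean2 = sum(p[1] for p in pairs) / n
    cov = sum((p[0] - mean1) * (p[1] - mean2) for p in pairs) / n
    var1 = sum((p[0] - mean1) ** 2 for p in pairs) / n
    var2 = sum((p[1] - mean2) ** 2 for p in pairs) / n
    return cov / math.sqrt(var1 * var2)


rng = random.Random(0)
independent = [sample_bivariate_normal(0.0, rng) for _ in range(20000)]
dependent = [sample_bivariate_normal(0.9, rng) for _ in range(20000)]
```

With rho = 0 the two variables are independent; with rho = 0.9 knowing one of them strongly constrains the other, which is the property a Gaussian process exploits between nearby inputs.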

INFINITE NORMAL • Assume that the random variables are now indexed by input values in a continuous input space • Since every point of this space carries a normal variable, there are infinitely many normal variables • Let us denote by f(x) the normal variable at input x • We must define how these variables covary

DEFINITION • A Gaussian process is a set of random variables for which any finite subset of its variables has a multivariate normal joint distribution • To specify a prior distribution in a space of functions, we define: f(x) ~ GP(m(x), k(x, x')), where m(x) is the mean function and k(x, x') is the covariance function

SAMPLING EXAMPLE
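
The figure for this slide is not in the transcript. As a sketch of how such samples can be produced, the code below draws functions from a zero-mean GP prior evaluated on a finite grid. The squared-exponential covariance, its hyperparameter values, the jitter term and all function names are assumptions made for illustration; the slides do not say which covariance function was used.

```python
import math
import random


def squared_exponential(x1, x2, length_scale=1.0, signal_var=1.0):
    """Covariance between the function values at inputs x1 and x2."""
    return signal_var * math.exp(-0.5 * ((x1 - x2) / length_scale) ** 2)


def cholesky(a):
    """Lower-triangular factor L of a positive-definite matrix, a = L L^T."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(a[i][i] - s)
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low


def sample_gp_prior(xs, kernel, rng, jitter=1e-6):
    """Draw one function from a zero-mean GP prior, evaluated at xs."""
    n = len(xs)
    # Any finite set of function values is jointly multivariate normal.
    cov = [[kernel(xs[i], xs[j]) + (jitter if i == j else 0.0)
            for j in range(n)] for i in range(n)]
    low = cholesky(cov)
    z = [rng.gauss(0.0, 1.0) for _ in range(n)]
    # Correlate independent standard normals with the Cholesky factor.
    return [sum(low[i][k] * z[k] for k in range(i + 1)) for i in range(n)]


rng = random.Random(1)
xs = [0.5 * i for i in range(11)]
samples = [sample_gp_prior(xs, squared_exponential, rng) for _ in range(3)]
```

Each entry of `samples` is one random function evaluated at the eleven inputs; a shorter length scale would give functions that vary more quickly.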

PRIOR OVER FUNCTIONS • Specifying a GP consists in defining its mean and covariance functions • The covariance function determines which types of functions are likely under the prior

LEARNING EXAMPLE
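
The learning figure is also missing. A minimal sketch of GP regression follows, assuming a zero-mean prior, a squared-exponential covariance with unit length scale, and Gaussian observation noise; these choices, the toy sine data and the function names are illustrative and not taken from the slides. The posterior mean and variance at a test input use the standard conditioning formulas for a multivariate normal.

```python
import math


def kernel(x1, x2, length_scale=1.0, signal_var=1.0):
    """Squared-exponential covariance (an assumed choice)."""
    return signal_var * math.exp(-0.5 * ((x1 - x2) / length_scale) ** 2)


def cholesky(a):
    """Lower-triangular factor L of a positive-definite matrix, a = L L^T."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(a[i][i] - s)
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low


def solve_cholesky(low, b):
    """Solve (L L^T) x = b by forward then backward substitution."""
    n = len(b)
    y = [0.0] * n
    for i in range(n):
        y[i] = (b[i] - sum(low[i][k] * y[k] for k in range(i))) / low[i][i]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(low[k][i] * x[k]
                           for k in range(i + 1, n))) / low[i][i]
    return x


def gp_posterior(train_x, train_y, test_x, noise_var=1e-4):
    """Posterior mean and variance of the latent function at each test input."""
    n = len(train_x)
    cov = [[kernel(train_x[i], train_x[j]) + (noise_var if i == j else 0.0)
            for j in range(n)] for i in range(n)]
    low = cholesky(cov)
    alpha = solve_cholesky(low, train_y)  # (K + noise * I)^-1 y
    means, variances = [], []
    for x_star in test_x:
        k_star = [kernel(x_star, x) for x in train_x]
        means.append(sum(k * a for k, a in zip(k_star, alpha)))
        v = solve_cholesky(low, k_star)
        variances.append(kernel(x_star, x_star)
                         - sum(k * vi for k, vi in zip(k_star, v)))
    return means, variances


train_x = [-2.0, -1.0, 0.0, 1.0, 2.0]
train_y = [math.sin(x) for x in train_x]
means, variances = gp_posterior(train_x, train_y, [1.0, 0.5, 10.0])
```

The three test inputs show the typical behavior: near a training point the variance collapses, between training points it is larger, and far from the data the prediction reverts to the prior mean and variance.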

DIRICHLET PROCESSES

DIRICHLET PROCESS • A Dirichlet process (DP) is a distribution over discrete distributions, denoted as: G ~ DP(α, G0) • The parameter G0 is the base distribution and α is the concentration parameter • Sampling a DP can be done according to:

STICK-BREAKING CONSTRUCTION
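
The slide's formulas are missing, so the sketch below follows the standard stick-breaking construction: each weight is a Beta(1, alpha) fraction of the stick length that remains, and each atom is an independent draw from the base distribution. Truncating at a fixed number of atoms, the standard normal base distribution in the usage lines, and the function names are assumptions made to keep the example finite and runnable.

```python
import random


def stick_breaking(alpha, base_sampler, num_atoms, rng):
    """Truncated stick-breaking draw from a DP with concentration alpha.

    Returns the weights, the atoms and the stick mass left unassigned
    by the truncation.
    """
    weights, atoms = [], []
    remaining = 1.0
    for _ in range(num_atoms):
        # Break off a Beta(1, alpha) fraction of what is left of the stick.
        fraction = rng.betavariate(1.0, alpha)
        weights.append(remaining * fraction)
        # Each atom is an independent draw from the base distribution.
        atoms.append(base_sampler(rng))
        remaining *= 1.0 - fraction
    return weights, atoms, remaining


rng = random.Random(0)
weights, atoms, leftover = stick_breaking(
    alpha=2.0, base_sampler=lambda r: r.gauss(0.0, 1.0), num_atoms=50, rng=rng)
```

A small alpha concentrates most of the mass on the first few atoms, while a large alpha spreads it over many atoms.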

CLUSTERING PROPERTY • A random draw from a Dirichlet process is discrete with probability one • Only a finite number of its atoms will have an appreciable mass • A data point drawn from the random distribution is associated with a cluster with probability:
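
The probability formula on this slide was lost. One standard way to see the clustering property, which may differ from the slide's own presentation, is the Chinese restaurant process: with the random distribution integrated out, a new point joins an existing cluster with probability proportional to that cluster's size and opens a new cluster with probability proportional to alpha. The function name and parameter values below are illustrative.

```python
import random


def chinese_restaurant_process(num_points, alpha, rng):
    """Sequentially assign points to clusters under a DP with concentration alpha."""
    assignments = []
    counts = []  # counts[k] is the current size of cluster k
    for _ in range(num_points):
        # Existing clusters are weighted by size, a new cluster by alpha.
        cluster = rng.choices(range(len(counts) + 1),
                              weights=counts + [alpha])[0]
        if cluster == len(counts):
            counts.append(0)
        counts[cluster] += 1
        assignments.append(cluster)
    return assignments, counts


rng = random.Random(0)
assignments, counts = chinese_restaurant_process(
    num_points=200, alpha=1.0, rng=rng)
```

The number of clusters is not fixed in advance; it grows slowly with the number of points, and large clusters tend to attract further points.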

DIRICHLET PROCESS MIXTURE OF GAUSSIAN PROCESSES

INTRODUCTION • Gaussian processes using a stationary set of hyperparameters may be too restrictive • A Dirichlet process can be used to group the observed data into clusters • Each cluster could be given a private set of hyperparameters representing the local behavior of the function

GENERATIVE PROCESS • Partition the data into clusters with the Dirichlet process • For each cluster, sample an input Gaussian and a set of hyperparameters • For each data point in a cluster, sample its input position according to the input Gaussian • Sample output variables according to the respective Gaussian process
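
The four steps above can be sketched as follows. Everything not stated on the slide is an assumption chosen only to make the example runnable: the partition is drawn with a Chinese restaurant process, the priors on the input Gaussian and on the length scale are hypothetical, the covariance is squared-exponential, and a fixed noise variance keeps the covariance matrix well conditioned.

```python
import math
import random


def crp_partition(num_points, alpha, rng):
    """Step 1: partition the points into clusters (Chinese restaurant process)."""
    counts, labels = [], []
    for _ in range(num_points):
        k = rng.choices(range(len(counts) + 1), weights=counts + [alpha])[0]
        if k == len(counts):
            counts.append(0)
        counts[k] += 1
        labels.append(k)
    return labels, len(counts)


def cholesky(a):
    """Lower-triangular factor L of a positive-definite matrix, a = L L^T."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(a[i][i] - s)
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low


def sample_gp_outputs(xs, length_scale, noise_var, rng):
    """Step 4: draw noisy outputs at xs from a zero-mean GP."""
    n = len(xs)
    cov = [[math.exp(-0.5 * ((xs[i] - xs[j]) / length_scale) ** 2)
            + (noise_var if i == j else 0.0)
            for j in range(n)] for i in range(n)]
    low = cholesky(cov)
    z = [rng.gauss(0.0, 1.0) for _ in range(n)]
    return [sum(low[i][k] * z[k] for k in range(i + 1)) for i in range(n)]


def generate_dp_gp_data(num_points, alpha, rng):
    labels, num_clusters = crp_partition(num_points, alpha, rng)
    # Step 2: per-cluster input Gaussian and hyperparameters (hypothetical priors).
    clusters = [{"input_mean": rng.gauss(0.0, 5.0),
                 "input_std": 1.0,
                 "length_scale": rng.lognormvariate(0.0, 0.5),
                 "noise_var": 0.01} for _ in range(num_clusters)]
    # Step 3: each point's input position comes from its cluster's input Gaussian.
    xs = [rng.gauss(clusters[k]["input_mean"], clusters[k]["input_std"])
          for k in labels]
    ys = [0.0] * num_points
    for k, params in enumerate(clusters):
        members = [i for i, label in enumerate(labels) if label == k]
        outputs = sample_gp_outputs([xs[i] for i in members],
                                    params["length_scale"],
                                    params["noise_var"], rng)
        for i, y in zip(members, outputs):
            ys[i] = y
    return xs, ys, labels


xs, ys, labels = generate_dp_gp_data(num_points=40, alpha=1.0,
                                     rng=random.Random(3))
```

Because each cluster has its own length scale, the generated data can be smooth in one input region and rapidly varying in another, which a single stationary GP cannot represent.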

EXAMPLE • [Figure with labels: Popularity, Data, cluster, cluster]
