Presented by Patrick Dallaire – DAMAS Workshop, November 2, 2012. Hierarchical Double Dirichlet Process Mixture of Gaussian Processes. Paper from Tayal et al. (2012), AAAI.
PROBLEM DESCRIPTION • Consider a non-stationary time series such as:
PROPOSED MODELS • Gaussian processes • Infinite mixture of Gaussian processes • Dirichlet process mixture of Gaussian processes • Hierarchical double Dirichlet process mixture of Gaussian processes
OUTLINE • Bayesian modeling • Dirichlet processes • Hierarchical Dirichlet processes • Gaussian processes
THE BAYESIAN APPROACH • Define a model linking the unknown parameters to the data: • Specify a prior probability distribution expressing our belief about the parameters: • Compute the posterior distribution of the parameters given the data with Bayes' theorem:
COMPUTATIONAL ISSUES • Bayes' theorem involves different quantities: • The shape of the posterior is given by: • The marginal likelihood is used as a normalizing constant:
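The equations on the two slides above did not survive extraction. As a hedged reconstruction in standard notation (the symbols $\theta$ for the parameters and $\mathcal{D}$ for the data are assumptions, not necessarily the ones used on the slides), the quantities named above would read:

```latex
\begin{align}
  p(\theta \mid \mathcal{D}) &= \frac{p(\mathcal{D} \mid \theta)\, p(\theta)}{p(\mathcal{D})}
    && \text{(Bayes' theorem: likelihood times prior over marginal likelihood)} \\
  p(\theta \mid \mathcal{D}) &\propto p(\mathcal{D} \mid \theta)\, p(\theta)
    && \text{(shape of the posterior)} \\
  p(\mathcal{D}) &= \int p(\mathcal{D} \mid \theta)\, p(\theta)\, d\theta
    && \text{(marginal likelihood, the normalizing constant)}
\end{align}
```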
POSTERIOR PREDICTION • The predictive distribution can be formulated as: • Predictions should account for all of the posterior uncertainty about the parameters:
CONJUGATE MODELS • Integrals involved in Bayesian inference can be analytically intractable, which increases the computational cost. • A model is said to be conjugate when the posterior and prior distributions belong to the same family. • Posterior computation for conjugate models is done analytically.
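The predictive formula itself is missing from the slide. A sketch in standard notation, assuming $\theta$ denotes the parameters, $\mathcal{D}$ the observed data, and $y_*$ a new observation (all three symbols are assumptions), averages the model's prediction over the posterior:

```latex
\begin{equation}
  p(y_* \mid \mathcal{D}) = \int p(y_* \mid \theta)\, p(\theta \mid \mathcal{D})\, d\theta
\end{equation}
```

Plugging in a single point estimate of $\theta$ instead of integrating would ignore the posterior uncertainty that the slide asks to account for.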
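As an illustration of conjugacy, here is a minimal sketch using the Beta-Bernoulli pair, a textbook example that is not taken from the slides. The function names and the data are hypothetical. The posterior is obtained by simple counting, with no integral to evaluate.

```python
from fractions import Fraction


def beta_bernoulli_update(a, b, observations):
    """Return the Beta posterior parameters after observing 0/1 outcomes.

    Prior: theta ~ Beta(a, b). Likelihood: each observation ~ Bernoulli(theta).
    Conjugacy means the posterior is again a Beta distribution.
    """
    successes = sum(observations)
    failures = len(observations) - successes
    return a + successes, b + failures


def beta_mean(a, b):
    """Mean of a Beta(a, b) distribution, as an exact fraction."""
    return Fraction(a, a + b)


# Hypothetical data: 7 successes and 3 failures, with a uniform Beta(1, 1) prior.
data = [1, 1, 0, 1, 1, 0, 1, 1, 0, 1]
post_a, post_b = beta_bernoulli_update(1, 1, data)
print(post_a, post_b)             # posterior Beta parameters
print(beta_mean(post_a, post_b))  # posterior mean of theta
```

The prior parameters act like pseudo-counts, which is why the update reduces to addition.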
INTRODUCTION • Gaussian processes (GPs) are used for supervised learning to estimate a function of interest • GPs are probability distributions over a space of functions • They belong to the class of nonparametric Bayesian approaches
NORMAL DISTRIBUTION • Let us assume a random variable
NORMAL DISTRIBUTION • We place the random variable such as:
INDEXING RANDOM VARIABLES • Assume multiple variables indexed by and placed at :
MULTIVARIATE NORMAL • According to this construction, we have a set of i.i.d. normally distributed random variables • The joint probability can be represented as: • What happens when adding covariance?
MULTIVARIATE NORMAL • An example with dependent variables
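The example figure is not preserved, so the following sketch shows the effect of adding covariance for the two-variable case. It assumes zero means and unit variances; the correlation value 0.9 and all function names are illustrative choices, not values from the slides.

```python
import math
import random


def sample_bivariate_normal(rho, n, rng):
    """Draw n pairs from a zero-mean, unit-variance bivariate normal with correlation rho."""
    pairs = []
    for _ in range(n):
        z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        # Apply the Cholesky factor of [[1, rho], [rho, 1]] to independent draws.
        pairs.append((z1, rho * z1 + math.sqrt(1.0 - rho ** 2) * z2))
    return pairs


def empirical_correlation(pairs):
    """Sample correlation coefficient of a list of (x, y) pairs."""
    n = len(pairs)
    mx = sum(x for x, _ in pairs) / n
    my = sum(y for _, y in pairs) / n
    cov = sum((x - mx) * (y - my) for x, y in pairs) / n
    sx = math.sqrt(sum((x - mx) ** 2 for x, _ in pairs) / n)
    sy = math.sqrt(sum((y - my) ** 2 for _, y in pairs) / n)
    return cov / (sx * sy)


rng = random.Random(0)
independent = sample_bivariate_normal(0.0, 20000, rng)  # i.i.d. case
dependent = sample_bivariate_normal(0.9, 20000, rng)    # with covariance
corr_independent = empirical_correlation(independent)
corr_dependent = empirical_correlation(dependent)
print(round(corr_independent, 2), round(corr_dependent, 2))
```

With covariance, knowing one variable narrows down the other, which is the property a Gaussian process exploits between nearby inputs.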
INFINITE NORMAL • Assume that random variables are now indexed by input values in a continuous space • Since normal variables cover this whole space, there are infinitely many of them • Let us denote by $f(x)$ the normal variable at input $x$ • We must define how these variables covary
DEFINITION • A Gaussian process is a set of random variables for which any finite subset has a multivariate normal joint distribution • To specify a prior distribution in a space of functions, we define: $f(x) \sim \mathcal{GP}(m(x), k(x, x'))$, where $m(x)$ is the mean function and $k(x, x')$ is the covariance function
PRIOR OVER FUNCTIONS • Specifying a GP consists of defining its mean and covariance functions • The covariance function determines which kinds of functions are probable under the prior
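To make the role of the covariance function concrete, here is a minimal sketch that draws functions from a zero-mean GP prior. The squared-exponential kernel, the length-scale values, the input grid, and the `roughness` measure are all assumptions made for illustration; the slides do not say which covariance function was used.

```python
import math
import random


def squared_exponential(x1, x2, length_scale=1.0, variance=1.0):
    """Squared-exponential covariance, one common (assumed) choice of covariance function."""
    return variance * math.exp(-0.5 * ((x1 - x2) / length_scale) ** 2)


def cholesky(matrix):
    """Lower-triangular Cholesky factor of a symmetric positive-definite matrix."""
    n = len(matrix)
    lower = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(lower[i][k] * lower[j][k] for k in range(j))
            if i == j:
                lower[i][j] = math.sqrt(matrix[i][i] - s)
            else:
                lower[i][j] = (matrix[i][j] - s) / lower[j][j]
    return lower


def sample_gp_prior(xs, length_scale, rng, jitter=1e-6):
    """Draw one function from a zero-mean GP prior, evaluated at the inputs xs."""
    n = len(xs)
    # Covariance matrix of the finite set of variables; jitter keeps it well conditioned.
    cov = [[squared_exponential(xs[i], xs[j], length_scale) + (jitter if i == j else 0.0)
            for j in range(n)] for i in range(n)]
    lower = cholesky(cov)
    z = [rng.gauss(0.0, 1.0) for _ in range(n)]
    # Correlate independent standard normals with the Cholesky factor.
    return [sum(lower[i][k] * z[k] for k in range(i + 1)) for i in range(n)]


def roughness(values):
    """Mean absolute difference between neighbouring function values."""
    return sum(abs(b - a) for a, b in zip(values, values[1:])) / (len(values) - 1)


rng = random.Random(0)
xs = [0.25 * i for i in range(21)]  # inputs on [0, 5]
short = [roughness(sample_gp_prior(xs, 0.3, rng)) for _ in range(20)]
long_ = [roughness(sample_gp_prior(xs, 2.0, rng)) for _ in range(20)]
print(sum(short) / 20, sum(long_) / 20)
```

Under these assumptions, a short length scale yields wiggly draws and a long one yields smooth draws, so the hyperparameters of the covariance function encode prior beliefs about the function's behavior.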
DIRICHLET PROCESS • A Dirichlet process (DP) is a distribution over discrete distributions, denoted as: $G \sim \mathrm{DP}(\alpha, G_0)$ • The parameter $G_0$ is the base distribution and $\alpha$ is the concentration parameter • Sampling a DP can be done according to:
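The sampling formula on the slide is lost. One standard way to draw from a DP is the stick-breaking construction, sketched below; it is offered as an assumption and may not be the construction the slide showed. The truncation to a fixed number of atoms, the standard normal base distribution, and all names are illustrative.

```python
import random


def stick_breaking_weights(alpha, num_atoms, rng):
    """Truncated stick-breaking weights for concentration parameter alpha.

    Each step breaks off a Beta(1, alpha) fraction of the stick that remains.
    Returns the weights and the leftover mass ignored by the truncation.
    """
    weights, remaining = [], 1.0
    for _ in range(num_atoms):
        fraction = rng.betavariate(1.0, alpha)
        weights.append(remaining * fraction)
        remaining *= 1.0 - fraction
    return weights, remaining


def sample_dp(alpha, base_sampler, num_atoms, rng):
    """Approximate draw from a DP: atoms from the base distribution, weights from stick-breaking."""
    atoms = [base_sampler(rng) for _ in range(num_atoms)]
    weights, leftover = stick_breaking_weights(alpha, num_atoms, rng)
    return atoms, weights, leftover


rng = random.Random(0)
# Hypothetical base distribution: a standard normal.
atoms, weights, leftover = sample_dp(1.0, lambda r: r.gauss(0.0, 1.0), 50, rng)
print(sorted(weights, reverse=True)[:5], leftover)
```

The result is a discrete distribution: a list of atom locations, each carrying a weight. A small concentration parameter puts most of the mass on a few atoms, while a large one spreads it over many.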
CLUSTERING PROPERTY • A random draw from a Dirichlet process is discrete with probability one • Only a finite number of its atoms will have an appreciable mass • A data point drawn from the random distribution is associated with cluster $k$ with probability $\pi_k$, the mass of the corresponding atom
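The clustering behavior can be simulated without building the random distribution explicitly. The sketch below uses the Chinese restaurant process, the standard marginal view of DP clustering; the slides do not name it, so treat the choice, the function name, and the parameter values as assumptions.

```python
import random


def chinese_restaurant_process(num_points, alpha, rng):
    """Assign points to clusters sequentially.

    Point n joins existing cluster k with probability counts[k] / (n + alpha)
    and opens a new cluster with probability alpha / (n + alpha).
    """
    assignments, counts = [], []
    for n in range(num_points):
        r = rng.uniform(0.0, n + alpha)
        cumulative = 0.0
        chosen = len(counts)  # default: open a new cluster
        for k, count in enumerate(counts):
            cumulative += count
            if r < cumulative:
                chosen = k
                break
        if chosen == len(counts):
            counts.append(0)
        counts[chosen] += 1
        assignments.append(chosen)
    return assignments, counts


rng = random.Random(0)
assignments, counts = chinese_restaurant_process(200, 1.0, rng)
print(len(counts), sorted(counts, reverse=True))
```

Large clusters attract new points in proportion to their size, so a few clusters usually end up holding most of the data while the number of clusters grows slowly with the number of points.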
INTRODUCTION • Gaussian processes using a stationary set of hyperparameters may be too restrictive • A Dirichlet process can be used to group the observed data into clusters • Each cluster could be given a private set of hyperparameters representing the local behavior of the function
GENERATIVE PROCESS • Partition the data into clusters with the Dirichlet process • For each cluster, sample an input Gaussian and a set of hyperparameters • For each data point in a cluster, sample its input position according to the input Gaussian • Sample output variables according to the respective Gaussian process
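The four steps above can be sketched end to end as follows. This is an illustration under stated assumptions, not the model from Tayal et al.: the partition uses a Chinese restaurant process, the priors on the input Gaussian and on the length scale are arbitrary uniform ranges, the kernel is squared-exponential with a small noise term, and every name is hypothetical.

```python
import math
import random


def crp_partition(num_points, alpha, rng):
    """Step 1: cluster labels from a Chinese restaurant process (assumed DP representation)."""
    labels, counts = [], []
    for n in range(num_points):
        r = rng.uniform(0.0, n + alpha)
        cumulative, chosen = 0.0, len(counts)
        for k, count in enumerate(counts):
            cumulative += count
            if r < cumulative:
                chosen = k
                break
        if chosen == len(counts):
            counts.append(0)
        counts[chosen] += 1
        labels.append(chosen)
    return labels, len(counts)


def cholesky(matrix):
    """Lower-triangular Cholesky factor of a symmetric positive-definite matrix."""
    n = len(matrix)
    lower = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(lower[i][k] * lower[j][k] for k in range(j))
            if i == j:
                lower[i][j] = math.sqrt(matrix[i][i] - s)
            else:
                lower[i][j] = (matrix[i][j] - s) / lower[j][j]
    return lower


def sample_gp(xs, length_scale, noise_var, rng):
    """Zero-mean GP draw at xs: squared-exponential kernel plus observation noise."""
    n = len(xs)
    cov = [[math.exp(-0.5 * ((xs[i] - xs[j]) / length_scale) ** 2)
            + (noise_var if i == j else 0.0) for j in range(n)] for i in range(n)]
    lower = cholesky(cov)
    z = [rng.gauss(0.0, 1.0) for _ in range(n)]
    return [sum(lower[i][k] * z[k] for k in range(i + 1)) for i in range(n)]


def generate(num_points, alpha, rng):
    labels, num_clusters = crp_partition(num_points, alpha, rng)
    # Step 2: per-cluster input Gaussian and GP hyperparameters (hypothetical priors).
    clusters = [{"mean": rng.uniform(0.0, 10.0),
                 "std": rng.uniform(0.5, 1.5),
                 "length_scale": rng.uniform(0.2, 2.0)} for _ in range(num_clusters)]
    # Step 3: each point's input position comes from its cluster's input Gaussian.
    xs = [rng.gauss(clusters[z]["mean"], clusters[z]["std"]) for z in labels]
    # Step 4: outputs of each cluster come from that cluster's own Gaussian process.
    ys = [0.0] * num_points
    for k, params in enumerate(clusters):
        members = [i for i, z in enumerate(labels) if z == k]
        outputs = sample_gp([xs[i] for i in members], params["length_scale"], 0.01, rng)
        for i, y in zip(members, outputs):
            ys[i] = y
    return labels, xs, ys


rng = random.Random(0)
labels, xs, ys = generate(50, 1.0, rng)
print(len(set(labels)), "clusters for", len(xs), "points")
```

Because each cluster carries its own length scale, different regions of the input space can show different local behavior, which is the motivation for mixing Gaussian processes.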
EXAMPLE • [Figure not preserved; surviving labels: Popularity, Data, cluster, cluster]