This paper explores the use of Variational Bayesian Inference and Gaussian Process Factor Analysis (GPFA) for modeling spatio-temporal data. The authors demonstrate the applications of factor analysis in various fields, including dimensionality reduction, dictionary learning, feature selection, matrix completion, and spatial dynamic data analysis.
Variational Gaussian-process factor analysis for modeling spatio-temporal data Jaakko Luttinen and Alexander Ilin NIPS 2009 Presented by Bo Chen 2.26, 2010
Outline • Introduction: Factor Analysis (FA) • Introduction: Gaussian Process (GP) • Spatio-Temporal Factor Analysis • Factor Analysis with GP prior (GPFA) • Variational Bayesian Inference • Speeding up GPFA • Experiments
The Applications of Factor Analysis • 1. Dimensionality Reduction • 2. Dictionary Learning (Denoising and Inpainting) • 3. Feature Selection (Gene Analysis) • 4. Matrix Completion (Regression) • 5. Spatial Dynamic Data Analysis • …. Goal: uncover the prominent structure in the data
Gaussian Process
A probability distribution over functions: the function values {f(x)} at any arbitrary set of n inputs x follow a joint Gaussian distribution.
Pros: • Uses the extra information from the input space • Nonlinearity
Cons: • Computational complexity
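The "distribution over functions" view can be made concrete by drawing one function from a GP prior at a finite set of inputs. The following is a minimal sketch, not the authors' code: the squared exponential kernel, the jitter term added for numerical stability, and all function names are assumptions chosen for illustration, and only the Python standard library is used.

```python
import math
import random


def sq_exp_kernel(x1, x2, length_scale=1.0, variance=1.0):
    """Squared exponential covariance between two scalar inputs."""
    return variance * math.exp(-0.5 * ((x1 - x2) / length_scale) ** 2)


def cholesky(a):
    """Lower-triangular Cholesky factor of a symmetric positive-definite matrix."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(a[i][i] - s)
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low


def sample_gp_prior(inputs, length_scale=1.0, jitter=1e-6, rng=None):
    """Draw one zero-mean GP sample f = L z, where K = L L^T and z ~ N(0, I)."""
    rng = rng or random.Random(0)  # fixed seed by default (illustrative choice)
    n = len(inputs)
    # Covariance matrix of the n function values; jitter keeps it well conditioned.
    cov = [
        [
            sq_exp_kernel(inputs[i], inputs[j], length_scale)
            + (jitter if i == j else 0.0)
            for j in range(n)
        ]
        for i in range(n)
    ]
    low = cholesky(cov)
    z = [rng.gauss(0.0, 1.0) for _ in range(n)]
    return [sum(low[i][k] * z[k] for k in range(i + 1)) for i in range(n)]
```

The Cholesky factorization costs O(n^3) in the number of inputs, which is the computational burden listed under "Cons".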
Spatio-Temporal Factor Analysis
• w:d: spatially distributed loading vector of factor d (spatial information)
• xd:: time series of factor d (time information)
(M. N. Schmidt, ICML 2009)
The m-th row of Y corresponds to a spatial location lm (e.g., a location on a two-dimensional map), and the n-th column corresponds to a time instance tn.
Introduce Gaussian Process Prior Each time signal xd: contains values of a latent function X(t) computed at time instances tn. Each spatial signal w:d contains measurements of a function W(l) at different locations lm. The likelihood function of the observed data:
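Under the isotropic Gaussian noise assumption stated later in the talk, a likelihood of this kind would plausibly take the form below. This is a hedged sketch rather than a quotation: it assumes the standard factor-analysis model in which entry y_mn is the inner product of the m-th row of W and the n-th column of X plus noise of variance σ², and O (the set of observed index pairs) is notation introduced here.

```latex
p(Y \mid W, X, \sigma)
  = \prod_{(m,n) \in O}
    \mathcal{N}\!\left( y_{mn} \,\middle|\, \mathbf{w}_{m:}^{\top} \mathbf{x}_{:n},\ \sigma^{2} \right)
```

Restricting the product to observed entries is what lets the model handle missing values without imputing them first.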
Variational Bayesian Inference The approximation of the true posterior: The lower bound of the marginal log-likelihood: Maximizing the lower bound, we can get
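For orientation, the generic variational Bayesian construction behind these statements can be written as follows. This is a sketch under assumptions: it takes the approximate posterior to factorize between W and X, and it omits the hyperparameters and the noise level for brevity, so the paper's exact expressions may differ in detail.

```latex
\begin{align}
  p(W, X \mid Y) &\approx q(W, X) = q(W)\, q(X), \\
  \log p(Y) &\ge \mathcal{L}(q)
    = \int q(W)\, q(X)\,
      \log \frac{p(Y \mid W, X)\, p(W)\, p(X)}{q(W)\, q(X)}
      \, dW\, dX .
\end{align}
```

Maximizing the bound with respect to q(W) and q(X) in turn tightens the approximation to the true posterior.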
Inferred Posterior
where Z: is a DNx1 vector formed by concatenating the vectors:
U is a DNxDN block-diagonal matrix with the following DxD matrices on the diagonal:
In the paper, the authors assume isotropic noise:
Speeding Up GPFA (1) • Component-Wise Factorization
Speeding Up GPFA (2)
• Inducing inputs: a set of auxiliary variables that contain the values of the latent functions Wd(l), Xd(t) at some locations
If the inducing inputs summarize the data well,
The approximate posterior:
Maximizing the new variational lower bound, we will get
Some VB update details can be found in this paper and in M. K. Titsias, AISTATS'09.
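The role of the auxiliary variables can be illustrated with the standard Gaussian conditional that links a latent function's values at the original inputs to its values at the inducing inputs. The notation is introduced here for illustration only: x̂_d denotes the values of Xd(t) at the inducing inputs, and the K matrices are the corresponding blocks of the GP covariance. The paper's own symbols may differ.

```latex
p(\mathbf{x}_{d:} \mid \hat{\mathbf{x}}_{d})
  = \mathcal{N}\!\left( \mathbf{x}_{d:} \,\middle|\,
      K_{x\hat{x}} K_{\hat{x}\hat{x}}^{-1} \hat{\mathbf{x}}_{d},\
      K_{xx} - K_{x\hat{x}} K_{\hat{x}\hat{x}}^{-1} K_{\hat{x}x} \right)
```

When the inducing inputs summarize the data well, the conditional covariance is small, and only the much smaller matrix over the inducing inputs has to be inverted.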
Artificial Experiments
• M=30 sensors (two-dimensional spatial locations)
• N=200 time instances
• D=4 temporal signals xd:, generated by sampling from GP priors with different covariance kernels (see next page)
• The loadings were generated from GPs over the two-dimensional space using the squared exponential covariance kernel.
• Data Y: 452 points are selected as observed and the remaining ones are treated as missing.
• The hyperparameters of the Gaussian processes were initialized randomly close to the values used for data generation, assuming that a good guess about the hidden signals can be obtained by exploratory analysis of the data.
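The shape of this experiment (a 30 x 200 data matrix with only 452 observed entries) can be sketched as follows. This is an illustration, not the authors' setup: the loadings and signals are i.i.d. Gaussian stand-ins rather than GP samples, and the noise level, seed, and function name are assumptions.

```python
import random


def make_masked_data(M=30, N=200, D=4, n_observed=452, noise_std=0.1, seed=0):
    """Build Y = W X + noise and keep only n_observed randomly chosen entries."""
    rng = random.Random(seed)
    # Stand-ins for the GP draws (assumption): i.i.d. standard normal values.
    W = [[rng.gauss(0.0, 1.0) for _ in range(D)] for _ in range(M)]
    X = [[rng.gauss(0.0, 1.0) for _ in range(N)] for _ in range(D)]
    # Noisy factor-model output; noise_std is an illustrative value.
    Y = [
        [
            sum(W[m][d] * X[d][n] for d in range(D)) + rng.gauss(0.0, noise_std)
            for n in range(N)
        ]
        for m in range(M)
    ]
    # Choose the observed entries uniformly at random without replacement.
    all_idx = [(m, n) for m in range(M) for n in range(N)]
    observed = set(rng.sample(all_idx, n_observed))
    # Missing entries are marked with None.
    Y_masked = [
        [Y[m][n] if (m, n) in observed else None for n in range(N)]
        for m in range(M)
    ]
    return Y_masked, observed
```

With these sizes only 452 of the 6000 entries are observed, so the reconstruction has to lean heavily on the smoothness encoded in the priors.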
Covariance Kernels • Squared exponential function to model a slowly changing component: • Periodic function with decay to model a quasi-periodic component: • Compactly supported piecewise polynomial function to model two fast changing components with different time scales • Squared exponential to model the spatial information
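The kernel families named above can be sketched in code. The parameterizations below are common textbook forms and are assumptions, since the formulas are not given here: the quasi-periodic kernel is a periodic kernel multiplied by a squared exponential decay, and the compactly supported kernel is the piecewise polynomial of Rasmussen and Williams with q=2 and b=3 for one-dimensional inputs. All function and parameter names are illustrative.

```python
import math


def squared_exponential(r, length_scale):
    """Smooth kernel for slowly changing components; r is the input distance."""
    return math.exp(-r ** 2 / (2.0 * length_scale ** 2))


def quasi_periodic(tau, period, smoothness, decay):
    """Periodic kernel in the lag tau, damped by a squared exponential decay."""
    periodic = math.exp(
        -2.0 * math.sin(math.pi * tau / period) ** 2 / smoothness ** 2
    )
    return periodic * squared_exponential(tau, decay)


def piecewise_polynomial(tau, cutoff, b=3):
    """Compactly supported kernel: exactly zero once |tau| reaches cutoff."""
    r = abs(tau) / cutoff
    if r >= 1.0:
        return 0.0
    poly = (b * b + 4 * b + 3) * r ** 2 + (3 * b + 6) * r + 3
    return (1.0 - r) ** (b + 2) * poly / 3.0
```

Because the piecewise polynomial kernel vanishes beyond its cutoff, the resulting covariance matrix is sparse, which suits fast-changing components observed at many time instances.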
Reconstruction of Global SST Using the MOHSST5 Dataset
The authors demonstrate how the presented model can be used to reconstruct global sea surface temperatures (SST) from historical measurements.
Data Description:
1. U.K. Meteorological Office historical SST data set containing monthly SST anomalies for the 1856-1991 period in 5°x5° longitude-latitude bins.
2. The dataset contains approximately 1600 time instances and 1700 spatial locations in total.
3. The dataset is sparse, especially during the 19th century and the World Wars: 55% of the values are missing, so it consists of more than 10^6 observations in total.
Available at http://iridl.ldeo.columbia.edu/SOURCES/.KAPLAN/.RSA_MOHSST5.cuf/.OS/.ssta/?help+datafiles
Covariance Kernels:
1. Five time signals xd: to describe climate trends: squared exponential kernel
2. Five temporal components to capture periodic signals: quasi-periodic kernel
3. Five components to model prominent interannual phenomena such as El Nino: squared exponential kernel
4. The remaining 65 time signals: piecewise polynomial kernel
5. Spatial patterns w:d: scaled squared exponential kernel. The distance r between the locations li and lj was measured on the surface of the Earth using the spherical law of cosines.
Inducing inputs:
1. Each spatial function wd(l): 500 inducing inputs
2. The 15 temporal functions X(t) that model slow climate variability: (1) the slowest: 80; (2) quasi-periodic: 300; (3) interannual: 300
3. The remaining temporal phenomena: priors with a sparse covariance matrix, which therefore allow efficient computations
4. The inducing inputs were taken as a random subset of the original inputs and then kept fixed throughout learning.
Experimental Methodology
Factor number: D=80
Training set: 20%; Testing set: 80%
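The distance r used by the spatial kernel can be computed with the spherical law of cosines, as in the minimal sketch below. The Earth radius value, the degree-based interface, and the function name are assumptions made for illustration. The clamp guards against rounding pushing the cosine slightly outside [-1, 1].

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius (assumed value for this sketch)


def great_circle_distance(lat1, lon1, lat2, lon2, radius=EARTH_RADIUS_KM):
    """Distance along the sphere between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dlon = math.radians(lon2 - lon1)
    # Spherical law of cosines: cosine of the central angle between the points.
    cos_angle = (
        math.sin(phi1) * math.sin(phi2)
        + math.cos(phi1) * math.cos(phi2) * math.cos(dlon)
    )
    cos_angle = max(-1.0, min(1.0, cos_angle))  # guard against rounding error
    return radius * math.acos(cos_angle)
```

Measuring distance on the sphere rather than in longitude-latitude coordinates keeps the kernel consistent near the poles and across the date line, where grid coordinates distort true separation.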
Results El Nino Reconstruction Error: 0.5714 El Nino Reconstruction Error: 0.6180
Conclusions • 1. Gaussian process factor analysis models spatio-temporal phenomena on different scales by using properly selected GPs. • 2. The parameters are inferred with variational Bayesian inference, which takes into account the uncertainty about the unknown parameters. • 3. The model uses all available data and combines all modeling assumptions in one estimation procedure.