Bayesian Robust Principal Component Analysis

Reading Group Bayesian Robust Principal Component Analysis (Xinghao Ding, Lihan He, and Lawrence Carin) Presenter:RaghuRanganathan ECE / CMR Tennessee Technological University January 21, 2011

Paper contribution The problem of matrix decomposition into low-rank and sparse components is considered employing a hierarchical approach The matrix is assumed noisy, with unknown and possibly non-stationary noise statistics The Bayesian framework approximately infers the noise statistics in addition to the low-rank and sparse outlier contributions The model proposed is robust to a broad range of noise levels without having to change the hyper-parameter settings In addition, a Markov dependency between successive rows of the matrix is inferred by the Bayesian model to exploit additional structure in the observed matrix, particularly, in video applications

Introduction Most high-dimensional data such as images, biological data, and social network data (Netflix data) reside in a low-dimensional subspace or low-dimensional manifold

Noise models In low-rank matrix representations, two types of noise models are usually considered One causes small scale perturbation to all the matrix elements, e.g. i.i.d. Gaussian noise added to each element. In this case, if the noise energy is small compared to the dominant singular values of the SVD, it does not significantly affect the principal vectors The second case is sparse noise with arbitrary magnitude, impacting a small subset of matrix elements, for example a moving object in video, in the presence of a static background manifests such sparse noise

Convex optimization approach

Bayesian approach The observation matrix is considered to be of the form Y = L (low-rank)+ S (sparse)+ E (noise), with the presence of both sparse noise, S, and dense noise E. In the proposed Bayesian model, the noise statistics of E are approximately learned, along with learning S, and L. The proposed model is robust to a broad range of noise variances The Bayesian model infers approximation to the posterior distributions on the model parameters, and obtains approximate probability distributions for L, S, and E The advantage of Bayesian model is that prior knowledge is employed in the inference

Bayesian approach The Bayesian framework exploits the anticipated structure in the sparse component. In video analysis, it is desired to separate the spatially localized moving objects (sparse component), from the static or quasi-static background (low-rank component) in the presence frame dependent additive noise E. The correlation between the sparse components of the video from frame to frame (column to column in the matrix) has to be considered In this paper, a Markov dependency in time and space is assumed between the sparse components of consecutive matrix columns This structure is incorporated into the Bayesian framework, with the Markov parameters inferred through the observed matrix

Bayesian Robust PCA The work in this paper is closely related to the low-rank matrix completion problem where we try to approximate a matrix (with noisy entries) by a low-rank matrix and to predict the missing entries The matrix Y = L + S + E is missing random entries; the proposed model can make estimates for the missing entries (in terms of the low-rank term L) The S term is defined as a sparse set of matrix entries; the location of S must be inferred while estimating the values of L, S, and E Typically, in Bayesian inference, a sparseness promoting prior is imposed on the desired signal, and the posterior distribution of the sparse signal is inferred.

Bayesian Low-rank and Sparse Model

C. Noise component The measurement noise is drawn i.i.d from a Gaussian distribution, and the noise affects all measurements The noise variance is assumed unknown, and is learned within the model inference. Mathematically, the noise is modeled as The model can learn different noise variances for different parts of E, i.e. each column/row of Y (each frame) in general have its own noise level. The noise structure is modified as

Relation to the optimization based approach

Relation to the optimization based approach In the Bayesian model, it is not required to know the noise variance a priori, the model will learn the noise during inference For the low-rank component instead of the constraint to impose sparseness of singular values, the Gaussian prior together with the beta-Bernoulli distribution is used to obtain an constraint For the sparse component, instead of the constraint, the constraint and the beta-Bernoulli distribution is employed to enforce sparsity Compared to the Laplacian prior (gives many small entries close to 0), the beta-Bernoulli prior yields exactly zero values In Bayesian learning, numerical methods are used to estimate the distribution for the unknown parameters, whereas in the optimization based approach, a solution to the minimum of a function similar to

Markov dependency of Sparse Term in Time and Space

Posterior inference

Experimental results

B. Video example The application of video surveillance with a fixed camera is considered The objective is to reconstruct a near static background and moving foreground from a video sequence The data are organized such that the column m of Y is constructed by concatenating all pixels of frame m from a grayscale video sequence The background is modeled as the low-rank component, and the moving foreground as the sparse component. The rank r is usually small for a static background, and the sparse components across frames (columns of Y) are strongly correlated, modeled by a Markov dependency

Conclusions The authors have developed a new robust Bayesian PCA framework for analysis of matrices with sparsely distributed noise of arbitrary magnitude The Bayesian approach is found to be robust to densely distributed noise, and the noise statistics may be inferred based on the data, with no tuning of hyperparameters In addition, using the Markov property, the model allows the noise statistics to vary from frame to frame Future research directions would involve a moving camera which would assume the background resides in a low-dimensional manifold as opposed to low-dimensional linear subspace The Bayesian framework may be extended to infer the properties of the low-dimensional manifold

Bayesian Robust Principal Component Analysis

Bayesian Robust Principal Component Analysis

Presentation Transcript

Principal component analysis

Principal Component Analysis

Principal Component Analysis

Principal Component Analysis

Principal Component Analysis

Principal Component Analysis

Detect Road Traffic Events by Bayesian Robust Principal Component Analysis

Principal Component Analysis

Principal Component Analysis

Sparsity Control for Robust Principal Component Analysis

Principal Component Analysis

Principal Component Analysis

Principal Component Analysis

Principal Component Analysis

Principal Component Analysis

Principal Component Analysis

Principal Component Analysis

Principal Component Analysis

Sparsity Control for Robust Principal Component Analysis