Matrix Factorization with Unknown Noise

Deyu Meng
References:
• Deyu Meng, Fernando De la Torre. Robust Matrix Factorization with Unknown Noise. International Conference on Computer Vision (ICCV), 2013.
• Qian Zhao, Deyu Meng, Zongben Xu, Wangmeng Zuo, Lei Zhang. Robust Principal Component Analysis with Complex Noise. International Conference on Machine Learning (ICML), 2014.

Low-rank matrix factorization is widely used in computer vision.
• Structure from motion and photometric stereo (e.g., Zheng et al., 2012; Eriksson and Hengel, 2010)
• Face modeling and background subtraction (e.g., Candes et al., 2012)

Complete, clean data (or with Gaussian noise) • SVD: Global solution
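For complete data under the L2 (Frobenius) objective, the global solution is the truncated SVD (the Eckart-Young theorem). A minimal NumPy sketch; the function name `best_rank_k` is illustrative and does not come from the slides:

```python
import numpy as np

def best_rank_k(X, k):
    """Return the rank-k matrix closest to X in Frobenius norm."""
    # Thin SVD: X = U @ diag(s) @ Vt, singular values sorted in descending order.
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    # Keep only the k leading singular triplets.
    return (U[:, :k] * s[:k]) @ Vt[:k]
```

This only applies when every entry is observed. With missing entries or non-Gaussian noise the truncated SVD is no longer the optimal fit.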

In practice: • Some data are always missing • The noise is always heavy and complex

L2 norm model
• Methods: Young diagram (CVPR, 2008); L2 Wiberg (IJCV, 2007); LM_S/LM_M (IJCV, 2008); SALS (CVIU, 2010); LRSDP (NIPS, 2010); Damped Wiberg (ICCV, 2011); Weighted SVD (Technometrics, 1979); WLRA (ICML, 2003); Damped Newton (CVPR, 2005); CWM (AAAI, 2013); Reg-ALM-L1 (CVPR, 2013)
• Pros: smooth model, faster algorithms, global optimum for non-missing data
• Cons: not robust to heavy outliers

L1 norm model
• Methods: Torre & Black (ICCV, 2001); R1PCA (ICML, 2006); PCAL1 (PAMI, 2008); ALP/AQP (CVPR, 2005); L1Wiberg (CVPR, 2010, best paper award); RegL1ALM (CVPR, 2012)
• Pros: robust to extreme outliers
• Cons: non-smooth model, slow algorithms, performs badly on data with Gaussian noise
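The robustness trade-off between the two norms already shows up in the simplest possible fit: approximating a data vector by one constant. The L2 fit is the mean and the L1 fit is the median. The sketch below is a toy illustration only; the function names and the sample values are made up for this example:

```python
import numpy as np

def l2_constant_fit(x):
    """Minimizer of sum((x - c)**2) over c: the mean."""
    return float(np.mean(x))

def l1_constant_fit(x):
    """A minimizer of sum(abs(x - c)) over c: the median."""
    return float(np.median(x))

# Four clean values near 1 and one gross outlier (illustrative data).
x = np.array([1.0, 1.1, 0.9, 1.0, 100.0])
l2_fit = l2_constant_fit(x)  # pulled far from 1 by the outlier
l1_fit = l1_constant_fit(x)  # stays at the bulk of the data
```

The same effect carries over to matrix factorization: a squared loss lets a few large residuals dominate the fit, while an absolute loss does not.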

• The L2 model is optimal for Gaussian noise • The L1 model is optimal for Laplacian noise • But real noise is generally neither Gaussian nor Laplacian
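The sense in which each norm is "optimal" can be made explicit through maximum likelihood. The notation below is assumed for illustration and is not on the slide: x_{ij} are data entries, u_i and v_j are rows of the factors U and V, \Omega is the set of observed entries, and \sigma and b are scale parameters.

```latex
% Model: x_{ij} = u_i^\top v_j + \varepsilon_{ij}, \quad (i,j) \in \Omega

% Gaussian noise, \varepsilon_{ij} \sim \mathcal{N}(0, \sigma^2): squared (L2) loss
-\log p(X \mid U, V)
  = \frac{1}{2\sigma^2} \sum_{(i,j)\in\Omega} \bigl(x_{ij} - u_i^\top v_j\bigr)^2 + \text{const}

% Laplacian noise, \varepsilon_{ij} \sim \mathrm{Laplace}(0, b): absolute (L1) loss
-\log p(X \mid U, V)
  = \frac{1}{b} \sum_{(i,j)\in\Omega} \bigl|x_{ij} - u_i^\top v_j\bigr| + \text{const}
```

Minimizing the L2 or L1 objective is therefore maximum likelihood estimation under the corresponding noise assumption, and is not guaranteed to be a good estimator under any other noise.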

Yale B faces: … (figure labels: camera noise; saturation and shadow noise)

We propose a Mixture of Gaussians (MoG) noise model • Universal approximation property of MoG: any continuous distribution can be approximated by a MoG (Maz’ya and Schmidt, 1996) • E.g., a Laplace distribution can be equivalently expressed as a scale mixture of Gaussians (Andrews and Mallows, 1974)
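The Laplace example can be checked numerically. One standard construction is: if W is exponential with mean 1 and Z is standard normal, then sqrt(2W)·Z follows a Laplace(0, 1) distribution. The sketch below draws samples that way; the function name and parameters are illustrative assumptions, not part of the slides:

```python
import numpy as np

def laplace_via_gaussian_scale_mixture(n, scale=1.0, seed=0):
    """Draw n Laplace(0, scale) samples as Gaussians with random variance."""
    rng = np.random.default_rng(seed)
    w = rng.exponential(1.0, size=n)   # mixing variable W ~ Exp(mean 1)
    z = rng.standard_normal(n)         # Z ~ N(0, 1)
    # Conditional on W, the sample is Gaussian with variance 2 * W * scale**2.
    return scale * np.sqrt(2.0 * w) * z
```

For scale 1 the samples should have mean absolute value close to 1 and variance close to 2, the moments of a Laplace(0, 1) distribution.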

MLE model • Use the EM algorithm to solve it!
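The slide's formula did not survive extraction. As a hedged reconstruction, a standard way to write a maximum likelihood model with MoG noise is given below; the symbols (K components, mixing weights \pi_k, variances \sigma_k^2, observed index set \Omega) are assumptions chosen for illustration.

```latex
% Each observed entry is a low-rank term plus zero-mean MoG noise
x_{ij} = u_i^\top v_j + \varepsilon_{ij}, \qquad
\varepsilon_{ij} \sim \sum_{k=1}^{K} \pi_k \, \mathcal{N}(0, \sigma_k^2), \qquad
\pi_k \ge 0, \; \sum_{k=1}^{K} \pi_k = 1

% Log-likelihood to maximize over U, V, \pi, \sigma
\max_{U, V, \pi, \sigma} \;
\sum_{(i,j)\in\Omega} \log \sum_{k=1}^{K} \pi_k \,
\mathcal{N}\bigl(x_{ij} \mid u_i^\top v_j, \sigma_k^2\bigr)
```

The sum inside the logarithm is what makes direct optimization awkward and motivates EM, which introduces a latent component label for every observed entry.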

E Step: • M Step:
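The update formulas on this slide were lost in extraction. The sketch below shows what EM typically looks like for a low-rank model with zero-mean MoG noise. The E step computes the responsibility of each Gaussian for each residual. The M step updates the mixing weights and variances in closed form, then refits the factors by weighted least squares. This is an illustration under stated assumptions, not the authors' implementation: the name `mog_lrmf`, the SVD warm start, the variance initialisation, the small ridge term, and the single alternating least-squares pass per iteration are all choices made here.

```python
import numpy as np

def mog_lrmf(X, mask, rank, n_components=2, n_iter=100):
    """EM sketch: X ~ U @ V.T on entries where mask is True, MoG residuals."""
    m, n = X.shape
    Xf = np.where(mask, X, 0.0)                 # zero-fill missing entries
    # Assumed warm start: truncated SVD of the zero-filled data.
    P, s, Qt = np.linalg.svd(Xf, full_matrices=False)
    U = P[:, :rank] * np.sqrt(s[:rank])
    V = Qt[:rank].T * np.sqrt(s[:rank])
    pi = np.full(n_components, 1.0 / n_components)
    # Assumed initialisation: variances spread over decades of residual variance.
    r0 = (Xf - U @ V.T)[mask]
    sigma2 = max(np.var(r0), 1e-10) * 10.0 ** (-np.arange(n_components, dtype=float))
    ridge = 1e-8 * np.eye(rank)                 # assumed, keeps solves well posed
    for _ in range(n_iter):
        e2 = ((Xf - U @ V.T) ** 2)[..., None]   # squared residuals, (m, n, 1)
        # E step: gamma[i, j, k] proportional to pi_k * N(e_ij | 0, sigma2_k)
        log_p = np.log(pi) - 0.5 * np.log(2 * np.pi * sigma2) - e2 / (2 * sigma2)
        log_p -= log_p.max(axis=2, keepdims=True)
        gamma = np.exp(log_p)
        gamma /= gamma.sum(axis=2, keepdims=True)
        gamma *= mask[..., None]                # missing entries carry no weight
        # M step, mixture part: closed-form weights and variances
        Nk = gamma.sum(axis=(0, 1))
        pi = np.maximum(Nk / Nk.sum(), 1e-12)
        sigma2 = (gamma * e2).sum(axis=(0, 1)) / np.maximum(Nk, 1e-12)
        sigma2 = np.maximum(sigma2, 1e-10)
        # M step, factor part: weighted L2 fit with w_ij = sum_k gamma_ijk / (2 sigma2_k)
        W = (gamma / (2 * sigma2)).sum(axis=2)
        for i in range(m):
            A = (V * W[i][:, None]).T @ V + ridge
            U[i] = np.linalg.solve(A, V.T @ (W[i] * Xf[i]))
        for j in range(n):
            A = (U * W[:, j][:, None]).T @ U + ridge
            V[j] = np.linalg.solve(A, U.T @ (W[:, j] * Xf[:, j]))
    return U, V, pi, sigma2
```

The factor update is where the model meets the L2 methods: entries assigned to a wide Gaussian get small weights, so suspected outliers are down-weighted while the subproblem stays a smooth weighted least-squares fit.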

Synthetic experiments • Three noise cases: Gaussian noise, sparse noise, and mixture noise • Six error measurements: some are the quantities that the L2 and L1 methods optimize; the others are good measures of how well the ground-truth subspace is estimated
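The six measures themselves are not legible on this slide. As an assumption for illustration, the sketch below computes the kinds of quantity the slide describes: an L1 and an L2 reconstruction error (what the L1 and L2 methods optimize when evaluated against the noisy data, or a recovery measure when evaluated against the ground truth), and the largest principal angle between an estimated and a ground-truth subspace. All function names are hypothetical.

```python
import numpy as np

def l1_error(A, B, mask=None):
    """Sum of absolute differences, optionally over observed entries only."""
    D = np.abs(A - B)
    return float(D.sum() if mask is None else D[mask].sum())

def l2_error(A, B, mask=None):
    """Root of the summed squared differences (Frobenius norm if unmasked)."""
    D = (A - B) ** 2
    return float(np.sqrt(D.sum() if mask is None else D[mask].sum()))

def subspace_angle(U1, U2):
    """Largest principal angle (radians) between the column spaces of U1, U2."""
    Q1, _ = np.linalg.qr(U1)
    Q2, _ = np.linalg.qr(U2)
    # Singular values of Q1.T @ Q2 are the cosines of the principal angles.
    cosines = np.linalg.svd(Q1.T @ Q2, compute_uv=False)
    return float(np.arccos(np.clip(cosines.min(), -1.0, 1.0)))
```

The subspace angle is invariant to the choice of basis, which matters because a factorization U @ V.T is only determined up to an invertible transform of the factors.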

Results (compared groups: L1 methods, our method, L2 methods)
• Gaussian noise experiments: MoG performs similarly to the L2 methods and better than the L1 methods.
• Sparse noise experiments: MoG performs as well as the best L1 method and better than the L2 methods.
• Mixture noise experiments: MoG performs better than all competing L2 and L1 methods.

Why is MoG robust to outliers? • L1 methods perform well under outliers or heavy noise because the Laplacian noise model they assume is a heavy-tailed distribution. • When the noise is fitted with two Gaussians, the resulting MoG distribution is also heavy-tailed.
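The heavy-tail claim can be checked with a short moment calculation. The notation is assumed for illustration: a zero-mean mixture with weights \pi_k and variances \sigma_k^2.

```latex
% Moments of \varepsilon \sim \sum_k \pi_k \mathcal{N}(0, \sigma_k^2)
\mathbb{E}[\varepsilon^2] = \sum_k \pi_k \sigma_k^2, \qquad
\mathbb{E}[\varepsilon^4] = 3 \sum_k \pi_k \sigma_k^4

% Kurtosis, bounded below by the Gaussian value 3 via Jensen's inequality
\kappa = \frac{\mathbb{E}[\varepsilon^4]}{\mathbb{E}[\varepsilon^2]^2}
       = \frac{3 \sum_k \pi_k \sigma_k^4}{\bigl(\sum_k \pi_k \sigma_k^2\bigr)^2}
       \;\ge\; 3
```

Equality holds only when all components with nonzero weight share one variance. Any mixture of a narrow and a wide zero-mean Gaussian therefore has heavier tails than a single Gaussian, with the wide component absorbing the outliers.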

Face modeling experiments

Explanation • Camera noise • Saturation and shadow noise

Background Subtraction

Summary • We propose an LRMF model with Mixture of Gaussians (MoG) noise • The new method handles outliers as well as L1-norm methods do, but more efficiently • The extracted noise components have certain physical meanings

Thanks!
