Face Recognition by Independent Component Analysis Authors: Marian Stewart Bartlett, Javier R. Movellan, Terrence J. Sejnowski Lecturer: Fang Fang
General Information • IEEE TRANSACTIONS ON NEURAL NETWORKS, VOL. 13, NO. 6, NOVEMBER 2002 • http://ieeexplore.ieee.org/Xplore/dynhome.jsp
Marian Stewart Bartlett • Assistant Research Professor at the Institute for Neural Computation, University of California-San Diego • Education: B.S. in mathematics and computer science from Middlebury College, 1988; Ph.D. in cognitive science and psychology from the University of California-San Diego, La Jolla, 1998 • Advisor: T. Sejnowski • Research interests: image analysis through unsupervised learning; facial identity recognition; facial expression analysis; independent component analysis for pattern recognition • Homepage: http://mplab.ucsd.edu/~marni/index.html • Email: marni@salk.edu
Publications • Book: Face Image Analysis by Unsupervised Learning. Foreword by Terrence J. Sejnowski. • Papers: Bartlett, M.S., Littlewort, G.C. Automatic Recognition of Facial Actions in Spontaneous Expressions. Journal of Multimedia 1(6), pp. 22-35, 2006. Bartlett, M.S., Littlewort, G.C. Fully Automatic Facial Action Recognition in Spontaneous Behavior. Automatic Face and Gesture Recognition, 2006. Bartlett, M.S., Littlewort, G.C. Recognizing Facial Expression: Machine Learning and Application to Spontaneous Behavior. CVPR 2005. Littlewort, G.C., Bartlett, M.S. Dynamics of Facial Expression Extracted Automatically from Video. CVPR 2004. Bartlett, M.S., Littlewort, G.C. Real Time Face Detection and Expression Recognition: Development and Application to Human-Computer Interaction. CVPR 2003.
Javier R. Movellan • Born in Palencia, Spain • Research Associate at Carnegie Mellon University, 1989-1993; Assistant Professor in the Department of Cognitive Science, University of California-San Diego (UCSD), 1993-2001; Research Associate with the Institute for Neural Computation and head of the Machine Perception Laboratory at UCSD • Education: B.S., Universidad Autonoma de Madrid, Spain; Ph.D., University of California-Berkeley, 1989, where he was a Fulbright Scholar • Research interests: development of perceptual computer interfaces; analyzing the statistical structure of natural signals in order to help understand how the brain works • Email: javier@inc.ucsd.edu
Publications Javier R. Movellan: Local Algorithm to Learn Trajectories with Stochastic Neural Networks. NIPS 1993. Javier R. Movellan: Visual Speech Recognition with Stochastic Networks. NIPS 1994. Javier R. Movellan, Paul Mineiro: Bayesian Robustification for Audio Visual Fusion. NIPS 1997. Javier R. Movellan: A Learning Theorem for Networks at Detailed Stochastic Equilibrium. Neural Computation 1998. Javier R. Movellan, Paul Mineiro: Robust Sensor Fusion: Analysis and Application to Audio Visual Speech Recognition. Machine Learning 1998. Javier R. Movellan, Paul Mineiro: Partially Observable SDE Models for Image Sequence Recognition Tasks. NIPS 2000. Javier R. Movellan, Thomas Wachtler: Factorial Coding of Color in Primary Visual Cortex. NIPS 2002.
Terrence J. Sejnowski • Joined the faculty of the Department of Biophysics at Johns Hopkins University in 1982 • An Investigator with the Howard Hughes Medical Institute; a Professor at The Salk Institute for Biological Studies, where he directs the Computational Neurobiology Laboratory; Professor of Biology at the University of California-San Diego. Dr. Sejnowski received the IEEE Neural Networks Pioneer Award in 2002. • Education: B.S. in physics from Case Western Reserve University; Ph.D. in physics from Princeton University, 1978 • Research interests: the long-range goal is to build linking principles from brain to behavior using computational models • Email: terry@salk.edu
Publications Terrence J. Sejnowski, B. Yuhas: Combining Visual and Acoustic Speech Signals with a Neural Network Improves Intelligibility. NIPS 1989. Nicol N. Schraudolph, Terrence J. Sejnowski: Competitive Anti-Hebbian Learning of Invariants. NIPS 1991. Steven J. Nowlan, Terrence J. Sejnowski: Filter Selection Model for Generating Visual Motion Signals. NIPS 1992. Jutta Kretzberg, Terrence J. Sejnowski: Variability of Postsynaptic Responses Depends Non-Linearly on the Number of Synaptic Inputs. Neurocomputing 2003. Odelia Schwartz, Terrence J. Sejnowski: Assignment of Multiplicative Mixtures in Natural Images. NIPS 2004. Odelia Schwartz, Terrence J. Sejnowski: A Bayesian Framework for Tilt Perception and Confidence. NIPS 2005.
Outline • Abstract • Introduction to ICA • Two ICA architectures for representing faces • Experimental results and conclusions
Abstract • A number of current face recognition algorithms use face representations found by unsupervised statistical methods. Typically these methods find a set of basis images and represent faces as a linear combination of those images. • Principal component analysis (PCA) is a popular example of such methods. The basis images found by PCA depend only on pairwise relationships between pixels in the image database. In a task such as face recognition, in which important information may be contained in the high-order relationships among pixels, it seems reasonable to expect that better basis images may be found by methods sensitive to these high-order statistics. • Independent component analysis (ICA), a generalization of PCA, is one such method. We used a version of ICA derived from the principle of optimal information transfer through sigmoidal neurons. • ICA was performed on face images in the FERET database under two different architectures, one which treated the images as random variables and the pixels as outcomes, and a second which treated the pixels as random variables and the images as outcomes. The first architecture found spatially local basis images for the faces. The second architecture produced a factorial face code. • Both ICA representations were superior to representations based on PCA for recognizing faces across days and changes in expression. A classifier that combined the two ICA representations gave the best performance.
Introduction • PCA can only separate pairwise linear dependencies between pixels. High-order dependencies will still show in the joint distribution of PCA coefficients and, thus, will not be properly separated. • In a task such as face recognition, much of the important information may be contained in the high-order relationships among the image pixels. • Independent component analysis (ICA) is one such generalization of PCA that is sensitive to these high-order relationships.
Independent Component Analysis
ICA Ⅰ: Independent Component Analysis Based on Higher-Order Statistics • Goal: decompose a signal into a number of mutually independent components. • Background: first proposed by Jutten and Herault in 1981 and 1991 (published in Signal Processing); Comon's 1994 paper first articulated the concept of independent components. • Algorithms: 1. Information maximization (infomax), Bell and Sejnowski. 2. Minimization of mutual information (MMI), Shun-ichi Amari. 3. Fixed-point algorithm (FastICA), Hyvarinen and Oja, which extracts signals by maximizing their non-Gaussianity.
ICA Ⅱ: The Model • The ICA model treats the observed (original) signals as linear combinations of hidden variables that are non-Gaussian and mutually independent. • The original signals are written as $x = As$, where $x$ is the observed signal, $s$ contains the independent components, and $A$ is the mixing matrix. • Equivalently, $x = \sum_i a_i s_i$, where the $a_i$ are the columns of $A$ and the components $s_i$ are mutually independent.
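As a concrete illustration of this generative model, here is a minimal numerical sketch (the names $s$, $A$, $x$ follow the equations above; the Laplacian source distribution is an illustrative choice, not taken from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)

# ICA generative model x = A s: n mutually independent, non-Gaussian
# sources mixed by an unknown square matrix A.
n, T = 4, 10000
s = rng.laplace(size=(n, T))   # super-Gaussian (sparse) sources, one per row
A = rng.normal(size=(n, n))    # unknown mixing matrix
x = A @ s                      # observed signals: each row is one mixture

# ICA seeks an unmixing matrix W such that u = W x recovers s,
# up to permutation and scaling of the rows.
```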
ICA Ⅲ: How can we judge whether the components are mutually independent? • Mutual information: $I(u) = \int p(u)\,\log\frac{p(u)}{\prod_i p_i(u_i)}\,du \ge 0$, with equality if and only if the components of $u$ are independent. • Choose the matrix $W$, compute $u = Wx$, and adjust $W$ so that $I(u)$ is minimized.
ICA Ⅳ • Problem: minimizing mutual information directly requires estimating the densities $p(u)$ and $p_i(u_i)$, which is both cumbersome and inaccurate. • Solution: introduce a nonlinearity at the output, which implicitly brings in the higher-order statistics (information maximization, infomax). • Approach: after the output $u = Wx$, apply a component-wise nonlinear function $g(\cdot)$ in place of explicit estimation of higher-order statistics. • Goal: for a suitable $g(\cdot)$, adjust the matrix $W$ so that the total entropy $H(y)$ of the output $y = g(u)$ is maximized; maximal $H(y)$ implies minimal mutual information among the components of $y$.
ICA Ⅴ: The Infomax Algorithm • Let $x$ be an $n$-dimensional random vector and $W$ an $n \times n$ invertible matrix, so that $u = Wx$ and $y = g(u)$ is an $n$-dimensional random variable representing the outputs of the $n$ neurons. • Typically, the logistic function is used as the sigmoid nonlinearity: $g(u) = \frac{1}{1 + e^{-u}}$.
ICA Ⅵ: The Infomax Algorithm • This is achieved by performing gradient ascent on the entropy of the output with respect to the weight matrix $W$: $$\Delta W \propto \frac{\partial H(y)}{\partial W} = (W^T)^{-1} + E[\hat{y}x^T],$$ where $\hat{y}_i = \frac{\partial}{\partial u_i}\ln\frac{\partial y_i}{\partial u_i}$ is the ratio between the second and first partial derivatives of the activation function (for the logistic function, $\hat{y} = 1 - 2y$). • Computation of the matrix inverse can be avoided by employing the natural gradient, which amounts to multiplying the absolute gradient by $W^T W$: $$\Delta W \propto (I + E[\hat{y}u^T])\,W.$$
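A minimal sketch of this update rule for the logistic nonlinearity (it assumes the data have already been centered and sphered as described below; the learning rate and batch handling are illustrative):

```python
import numpy as np

def infomax_step(W, x_batch, lr=0.001):
    # One natural-gradient infomax update: Delta W = lr * (I + y_hat u^T) W,
    # where y_hat = 1 - 2 g(u) for the logistic nonlinearity g.
    # x_batch has shape (n, batch_size).
    n, batch_size = x_batch.shape
    u = W @ x_batch                          # recovered sources
    y_hat = 1.0 - 2.0 / (1.0 + np.exp(-u))   # = 1 - 2*g(u)
    grad = (np.eye(n) + (y_hat @ u.T) / batch_size) @ W
    return W + lr * grad

# Usage sketch: start from W = np.eye(n) and iterate over batches of the
# sphered data until W stops changing.
```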
Appendix: Proof of Information Maximization Ⅰ • Proof: since $y = g(Wx)$ is an invertible transformation, $p(y) = p(x)\,/\,\left|\det\frac{\partial y}{\partial x}\right|$, so the entropy of the output is $$H(y) = -E[\ln p(y)] = H(x) + E\left[\ln\left|\det\frac{\partial y}{\partial x}\right|\right] = H(x) + \ln|\det W| + \sum_i E[\ln g'(u_i)].$$
Proof of Information Maximization Ⅱ • Differentiating with respect to $W$: the first term $H(x)$ does not depend on $W$; the second term gives $\frac{\partial}{\partial W}\ln|\det W| = (W^T)^{-1}$; the third term gives $\frac{\partial}{\partial W}\sum_i E[\ln g'(u_i)] = E[\hat{y}x^T]$. • Because the expectation is an ensemble average with $p(x)$ as the probability density function, the ensemble average may be dropped when the update is applied stochastically (sample by sample). ∴ $\Delta W \propto (W^T)^{-1} + \hat{y}x^T$.
ICA Ⅶ: Preprocessing • Centering: center the training samples so that each sample becomes a zero-mean vector, i.e. $x \leftarrow x - E[x]$. • Sphering (whitening): transform the data so that its rows are mutually orthogonal with equal energy, each equal to 1; this removes the first- and second-order dependencies. The whitening matrix can be taken as $W_z = 2\langle xx^T\rangle^{-1/2}$ or $W_z = D^{-1/2}E^T$, where $x$ is the original signal and $D$ and $E$ are the eigenvalue and eigenvector matrices of the covariance matrix of $x$. • Note: after sphering, the covariance of $W_z x$ is proportional to the identity.
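A sketch of these two preprocessing steps, using the eigendecomposition form of the whitening matrix (without the paper's factor of 2; assumes the covariance has full rank):

```python
import numpy as np

def center_and_sphere(X):
    # X holds one signal per row. Centering gives each row zero mean;
    # sphering removes first- and second-order dependencies so that the
    # covariance of the returned data is (approximately) the identity.
    X = X - X.mean(axis=1, keepdims=True)
    d, E = np.linalg.eigh(np.cov(X))             # eigenvalues D, eigenvectors E
    Wz = E @ np.diag(1.0 / np.sqrt(d)) @ E.T     # a whitening matrix, C^{-1/2}
    return Wz @ X, Wz
```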
Comparison of ICA and PCA Ⅰ • If the sources are Gaussian, the likelihood of the data depends only on first- and second-order statistics. • Second-order statistics capture the amplitude spectrum of images but not their phase spectrum; the high-order statistics capture the phase spectrum. • The phase spectrum, not the power spectrum, contains the structural information in images that drives human perception. • For a given sample of natural images, we can scramble their phase spectrum while maintaining their power spectrum. This will dramatically alter the appearance of the images but will not change their second-order statistics (see the sketch after the example below).
Comparison of ICA and PCA: Example 1 • [Figure] Original image; scrambled phase; reconstructions with the amplitude of the original face and the phase of the other face.
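The phase-scrambling demonstration described above is straightforward to reproduce; here is an FFT-based sketch (the random phases are drawn from a noise image so the result stays real-valued):

```python
import numpy as np

def scramble_phase(img, seed=0):
    # Keep the amplitude spectrum of img but randomize its phase spectrum.
    # Second-order statistics (the power spectrum) are preserved, yet the
    # structure that drives perception is destroyed.
    rng = np.random.default_rng(seed)
    amplitude = np.abs(np.fft.fft2(img))
    noise_phase = np.angle(np.fft.fft2(rng.normal(size=img.shape)))
    return np.real(np.fft.ifft2(amplitude * np.exp(1j * noise_phase)))
```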
Comparison of ICA and PCA Ⅱ: Advantages of ICA • It provides a better probabilistic model of the data, which better identifies where the data concentrate in $n$-dimensional space. • It uniquely identifies the mixing matrix $A$. • It finds a not-necessarily orthogonal basis which may reconstruct the data better than PCA in the presence of noise. • It is sensitive to high-order statistics in the data, not just the covariance matrix.
Comparison of ICA and PCA: Example 2 • [Figure] Top: 3-D data distribution and corresponding PC and IC axes. Bottom left: distribution of the first PCA coordinates of the data. Bottom right: distribution of the first ICA coordinates of the data. • If only two components are allowed, ICA chooses a different subspace than PCA.
ICA Representations of Faces • Basis images (independent basis images): images are random variables and pixels are trials; in this architecture it makes sense to talk about independence of images or functions of images. • Factorial code: pixels are random variables and images are trials; here it makes sense to talk about independence of pixels or functions of pixels.
ICA Representations of Faces • Two architectures for performing ICA on images. • Architecture I for finding statistically independent basis images: performing source separation on the face images produces IC images in the rows of $U$. • Architecture II for finding a factorial code: performing source separation on the pixels produces a factorial code in the columns of the output matrix $U$.
IMAGE DATA • The data set contained images of 425 individuals, with up to four frontal views of each individual: neutral expression and change of expression from session 1; neutral expression and change of expression from session 2. • Training used a single frontal view of each individual; recognition was tested under the three remaining conditions. • Coordinates for eye and mouth locations were provided with the FERET database and were used to crop and scale the images to 60 × 50 pixels.
ARCHITECTURE I: Statistically Independent Basis Images • Performing ICA directly on the 3000-dimensional image data was intractable under our present memory limitation, so ICA was performed on a set of m linear combinations of those images: the first m principal component eigenvectors of the image set (PCA followed by ICA).
PCA → ICA • Let $X$ be the matrix of zero-mean training images (one image per row) and $P_m$ the matrix whose columns are the first $m$ PC eigenvectors. The PC representation of the set of zero-mean images in $X$ based on $P_m$ is defined as $R_m = XP_m$. • A minimum squared error approximation of $X$ is obtained by $\hat{X} = R_m P_m^T$. • The ICA algorithm produced a matrix $W_I = W \cdot W_z$ such that $W_I P_m^T = U$, i.e. $P_m^T = W_I^{-1}U$. • Therefore $\hat{X} = R_m P_m^T = R_m W_I^{-1}U$. • Coefficients: the rows of $B = R_m W_I^{-1}$ contain the coefficients of the linear combination of the statistically independent basis images in $U$ that comprise $\hat{X}$.
PCA → ICA • A representation for test images was obtained by using the PC representation based on the training images to obtain $R_{test} = X_{test}P_m$, and then computing $B_{test} = R_{test}W_I^{-1}$. • PCA was employed to serve two purposes: to reduce the number of sources to a tractable number, and to provide a convenient method for calculating representations of test images.
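Putting the Architecture I equations together, here is a sketch of the whole pipeline. `infomax_ica` is a hypothetical routine standing in for the infomax algorithm sketched earlier; it is assumed to return the unmixing matrix $W_I$ for data given one variable per row:

```python
import numpy as np

def architecture_one(X, X_test, m=200):
    # X: zero-mean training images, one image per row (e.g. 425 x 3000).
    # A full implementation would use the usual eigenface trick instead of
    # eigendecomposing the 3000 x 3000 pixel covariance directly.
    d, E = np.linalg.eigh(np.cov(X, rowvar=False))
    P_m = E[:, np.argsort(d)[::-1][:m]]      # first m PC eigenvectors (columns)

    R_m = X @ P_m                            # PC representation, R_m = X P_m
    W_I = infomax_ica(P_m.T)                 # hypothetical ICA routine
    U = W_I @ P_m.T                          # independent basis images (rows of U)
    B = R_m @ np.linalg.inv(W_I)             # coefficients for training images

    B_test = (X_test @ P_m) @ np.linalg.inv(W_I)   # coefficients for test images
    return B, B_test, U
```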
Face Recognition Performance • Face recognition performance was evaluated for the coefficient vectors $b$ by the nearest neighbor algorithm, using cosines as the similarity measure: $c = \frac{b_{test}\cdot b_{train}}{\|b_{test}\|\,\|b_{train}\|}$.
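A sketch of this classifier (`labels` is assumed to be a NumPy array of identity labels, one per training image):

```python
import numpy as np

def nearest_neighbor_cosine(B_train, labels, B_test):
    # Nearest neighbor on coefficient vectors, with the cosine of the
    # angle between vectors as the similarity measure.
    train = B_train / np.linalg.norm(B_train, axis=1, keepdims=True)
    test = B_test / np.linalg.norm(B_test, axis=1, keepdims=True)
    similarities = test @ train.T          # all pairwise cosines
    return labels[np.argmax(similarities, axis=1)]
```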
Subspace Selection • The ICA coefficients consistently had greater class discriminability than the PCA coefficients. • [Figure] Discriminability of the ICA coefficients (solid lines) and of the PCA components (dotted lines) for the three test cases; components were sorted by the magnitude of the class discriminability r.
Subspace Selection • The ICA-defined subspace encoded more information about facial identity than the PCA-defined subspace. • [Figure] Improvement in face recognition performance for the ICA and PCA representations using subsets of components selected by the class discriminability r; the improvement is indicated by the gray segments at the top of the bars.
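The slides use the class discriminability r without spelling it out; below is a sketch under the assumption, taken from the paper, that r is the ratio of between-class to within-class variance of each coefficient:

```python
import numpy as np

def class_discriminability(B, labels):
    # r for each coefficient (column of B): variance of the class means
    # divided by the mean within-class variance.
    classes = np.unique(labels)
    class_means = np.array([B[labels == c].mean(axis=0) for c in classes])
    sigma_between = class_means.var(axis=0)
    sigma_within = np.mean([B[labels == c].var(axis=0) for c in classes], axis=0)
    return sigma_between / sigma_within

# Components can then be sorted by the magnitude of r, as in the plots above:
# order = np.argsort(np.abs(class_discriminability(B, labels)))[::-1]
```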
ARCHITECTURE Ⅱ: A Factorial Code • The data matrix $X$ is transposed so that rows represent different pixels and columns represent different images.
ARCHITECTURE Ⅱ • In order to reduce the dimensionality of the input, ICA was performed on the first 200 PCA coefficients of the face images rather than directly on the image pixels. • Representation for the training images: $U = W_I R_m^T$; the columns of $U$ contain the factorial code. • The representational code for test images is obtained by computing $U_{test} = W_I R_{test}^T$.
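A sketch of Architecture II under the same assumptions as the Architecture I sketch (`infomax_ica` is again a hypothetical routine returning the unmixing matrix; $R_m$ and $R_{test}$ are the PCA coefficient matrices from before):

```python
import numpy as np

def architecture_two(R_m, R_test):
    # ICA on the first m PCA coefficients rather than on the raw pixels.
    # R_m: m PCA coefficients per training image, one image per row.
    W_I = infomax_ica(R_m.T)       # hypothetical ICA routine
    U = W_I @ R_m.T                # factorial code: one column per training image
    U_test = W_I @ R_test.T        # representational code for test images
    return U, U_test
```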
This approach tends to generate basis images that look more face-like than the basis images generated by PCA.
Face Recognition Performance • There was no significant difference in the performances of the two ICA representations.
Face Recognition Performance • Selecting subsets of components by class discriminability had little effect on the recognition performance of the ICA-factorial representation.
Examination of the ICA Representations • Pairwise mutual information was measured between the basis images (Architecture I) and between the coding variables (Architecture II), to examine how independent each representation actually is.
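The paper's exact estimator is not given on the slide; a common histogram-based sketch for the pairwise mutual information between two coefficient (or basis-image) vectors:

```python
import numpy as np

def pairwise_mutual_information(a, b, bins=16):
    # Histogram estimate of I(a; b) in nats; zero (up to estimation error)
    # iff the two variables are independent.
    joint, _, _ = np.histogram2d(a, b, bins=bins)
    p_ab = joint / joint.sum()
    p_a = p_ab.sum(axis=1, keepdims=True)
    p_b = p_ab.sum(axis=0, keepdims=True)
    nz = p_ab > 0
    return np.sum(p_ab[nz] * np.log(p_ab[nz] / (p_a @ p_b)[nz]))
```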
Discussion Ⅰ • In this paper, we explored one such generalization: Bell and Sejnowski's ICA algorithm. • We explored two different architectures for developing image representations of faces using ICA. • The purpose of the comparison in this paper was to examine ICA- and PCA-based representations under identical conditions. • Both ICA representations outperformed the "eigenface" representation for recognizing images of faces sampled on a different day from the training images. • There was no significant difference between PCA and ICA when using Euclidean distance as the similarity measure (Moghaddam).
Discussion Ⅱ • It is an open question whether these techniques would enhance performance with PCA and ICA equally. • It is possible that the factorial code representation may prove advantageous with more powerful recognition engines than nearest neighbor on cosines, such as a Bayesian classifier. • The research presented here found that face representations in which high-order dependencies are separated into individual coefficients gave superior recognition performance to representations which only separate second-order redundancies.