Unsupervised Joint Alignment of Complex Images
Gary B. Huang, Vidit Jain, Erik Learned-Miller
Joint Face Alignment
• The Recognition Pipeline: detection, alignment, recognition
  • Most systems ignore the middle stage (alignment), relying on the initial detector to do a rough alignment
  • Alignment reduces variability and allows for conditioning on spatial position and analysis of structure
• Two major drawbacks to current alignment methods
  • Designed for a single class
  • Require manual labeling of either specific features or pose
    • More involved than simple discrete labels for detection and recognition
    • AAM (active appearance model): ~80 landmarks for >100 training images
• Unsupervised method with congealing
  • No manually selected landmarks or hand-selected parts
  • No image explicitly labeled as canonical pose
  • End result entirely determined by data
Congealing
• Intuition
  • Intra-class images have similar structure and shape
  • Thus, low variability of pixel values at a specific location
• Distribution Field
  • Distribution over alphabet ({0,1} for binary images) at each pixel
  • Set of images defines an empirical distribution field
• Congealing alternates two steps
  • Update distribution field from transformed images
  • Increase likelihood of each image with respect to distribution field
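The two alternating steps above can be sketched for binary images as follows. This is a minimal illustration, not the authors' implementation: the transformation set is reduced to circular horizontal shifts (an assumption made to keep the example short; the slides do not specify the transformation family), and all function names and the `eps` clamp are hypothetical.

```python
import math

def distribution_field(images):
    """Empirical per-pixel probability of value 1 over a set of binary images."""
    n = len(images)
    rows, cols = len(images[0]), len(images[0][0])
    return [[sum(img[r][c] for img in images) / n for c in range(cols)]
            for r in range(rows)]

def binary_entropy(p):
    """Entropy in bits of a Bernoulli(p) pixel."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))

def field_entropy(field):
    """Sum of the per-pixel entropies of a distribution field."""
    return sum(binary_entropy(p) for row in field for p in row)

def shift(img, dx):
    """Circular horizontal shift by dx pixels (assumed transformation set)."""
    cols = len(img[0])
    return [[row[(c - dx) % cols] for c in range(cols)] for row in img]

def log_likelihood(img, field, eps=1e-3):
    """Log-likelihood of a binary image under a distribution field.
    Probabilities are clamped to [eps, 1 - eps] to avoid log(0)."""
    total = 0.0
    for img_row, field_row in zip(img, field):
        for v, p in zip(img_row, field_row):
            p = min(max(p, eps), 1 - eps)
            total += math.log(p if v == 1 else 1 - p)
    return total

def congeal(images, shifts=(-1, 0, 1), iterations=5):
    """Alternate: (1) update the field, (2) move each image to its
    most likely candidate transformation under that field."""
    images = list(images)
    fields = []
    for _ in range(iterations):
        field = distribution_field(images)
        fields.append(field)  # saved so new images can be aligned later
        images = [max((shift(img, dx) for dx in shifts),
                      key=lambda cand: log_likelihood(cand, field))
                  for img in images]
    return images, fields
```

Calling `congeal` on a few images of a vertical bar at slightly different columns should pull the bars onto a common column, driving the field entropy down.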
Congealing
• How to align a new image after congealing?
  • Insert into training set, re-run algorithm
  • More efficient to save the sequence of distribution fields from congealing
    • High-entropy to low-entropy sequence: the “Image Funnel”
  • Funneling: increase likelihood of new image at each iteration according to the corresponding distribution field
[Figure: New Image → Image Funnel → Aligned Image]
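Funneling, as described above, can be sketched as a single pass over the saved distribution fields. This is a hedged illustration for binary images: the circular-shift transformation set, the `eps` clamp, and the names `shift`, `log_likelihood`, and `funnel` are assumptions for the example, not details given in the slides.

```python
import math

def shift(img, dx):
    """Circular horizontal shift by dx pixels (assumed transformation set)."""
    cols = len(img[0])
    return [[row[(c - dx) % cols] for c in range(cols)] for row in img]

def log_likelihood(img, field, eps=1e-3):
    """Log-likelihood of a binary image under a per-pixel Bernoulli field."""
    total = 0.0
    for img_row, field_row in zip(img, field):
        for v, p in zip(img_row, field_row):
            p = min(max(p, eps), 1 - eps)  # clamp to avoid log(0)
            total += math.log(p if v == 1 else 1 - p)
    return total

def funnel(img, fields, shifts=(-1, 0, 1)):
    """Align a new image by replaying the saved fields in order.
    At each stage the image moves to the candidate transformation
    that is most likely under that stage's distribution field."""
    for field in fields:
        img = max((shift(img, dx) for dx in shifts),
                  key=lambda cand: log_likelihood(cand, field))
    return img

# Hand-made two-stage funnel: a broad early field, then a sharp final field.
fields = [
    [[0.0, 0.3, 0.4, 0.3, 0.0]],
    [[0.0, 0.0, 1.0, 0.0, 0.0]],
]
aligned = funnel([[1, 0, 0, 0, 0]], fields)
```

No distribution field is re-estimated here, which is why funneling a new image is cheaper than inserting it into the training set and re-running congealing.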
Congealing Complex Images
• Congealing has proven to work well on certain object classes
  • Traditionally applied directly to pixel values
  • Applied successfully to binary handwritten digits and MRI volumes
• Our goal: extend congealing to deal with noise in real-world images
  • Complex and variable lighting effects
  • Occlusions
  • Highly varied foreground objects (hair, hats, glasses…)
  • Highly varied backgrounds
Congealing Complex Images
• Extending Congealing to Complex Images
  • Traditionally congealing is done on pixel intensities
    • High variation due to lighting and variable foreground leads to high entropy even when correctly aligned
  • Congealing on edge values
    • No “basin of attraction”; plateaus in the optimization landscape
  • Integrate over a window: SIFT descriptor at each pixel
    • Each descriptor is a 32-dimensional vector, too large to estimate entropy directly
Congealing Complex Images
• Extending Congealing to Face Images (cont.)
  • Cluster SIFT descriptors using k-means
  • Congealing on hard assignments forces pixels to take a relatively small number of values
    • Similar local minima problems as with edge values
    • Initial experiments with hard assignments led to congealing terminating early with no significant changes from the initial alignment
  • Use soft assignment of pixels to clusters
    • Each pixel is a multinomial distribution, with probabilities equal to the probability of belonging to each cluster
  • Does not change the nature of the distribution field
    • Distribution field is still a set of distributions, one at each pixel, over the possible clusters
  • Analogy with grayscale using a binary alphabet
    • Gray pixels are treated as mixtures of underlying black and white “subpixels”
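The soft assignment above can be illustrated with a small sketch. It assumes each cluster is modeled as an equal-weight isotropic Gaussian around its k-means center with a shared `sigma`; that modeling choice, the `sigma` parameter, and the function names are assumptions for illustration, since the slides only state that each pixel carries its cluster membership probabilities.

```python
import math

def soft_assign(descriptor, centers, sigma=1.0):
    """Posterior probability of each cluster for one descriptor, assuming
    equal-weight isotropic Gaussians centered on the cluster centers."""
    sq_dists = [sum((d - c) ** 2 for d, c in zip(descriptor, center))
                for center in centers]
    nearest = min(sq_dists)  # subtracted for numerical stability
    weights = [math.exp(-(s - nearest) / (2 * sigma ** 2)) for s in sq_dists]
    total = sum(weights)
    return [w / total for w in weights]

def pixel_distribution(posteriors):
    """Distribution-field entry at one pixel location: the average of the
    per-image multinomials observed at that location."""
    n = len(posteriors)
    k = len(posteriors[0])
    return [sum(p[j] for p in posteriors) / n for j in range(k)]

def entropy(dist):
    """Entropy in bits of a multinomial distribution."""
    return -sum(p * math.log2(p) for p in dist if p > 0)

# Two hypothetical cluster centers in a toy 2-D descriptor space.
centers = [[0.0, 0.0], [2.0, 0.0]]
near_first = soft_assign([0.0, 0.0], centers)  # mostly cluster 0
halfway = soft_assign([1.0, 0.0], centers)     # split evenly
field_entry = pixel_distribution([near_first, halfway])
```

A hard assignment would map both descriptors to a single cluster index; the soft version keeps the ambiguity of the halfway descriptor, which is the property the slides credit with smoothing the optimization landscape.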
Congealing Complex Images
[Figure: window around pixel → SIFT vector and clusters → posterior distribution]
Results (faces)
• Congealed with 300 images from “Faces in the Wild”
  • Realistic data set of news photos with different people, complex backgrounds, variable illumination and foreground appearance
Results (cars)
• Congealed with 125 rear car images (variable background/lighting)
• Achieved with no labeling and no changes to code
Results on Recognition
• Tested effect on recognition
  • Used trained hyper-feature based recognizer (Jain et al.)
  • Tested using outputs of Viola-Jones, Zhou (supervised), and funneling
• Congealing improves recognition with no added supervision
Future Work
• Two-tier alignment process
  • Score alignment results based on likelihood under the final distribution field; align low-scoring images in a separate stage
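The scoring step of the proposed two-tier process could look roughly like the sketch below. It is an illustration for binary images only: the `threshold` value, the `eps` clamp, and the function names are hypothetical, and the slides do not say how the second alignment stage would work, so it is left out.

```python
import math

def log_likelihood(img, field, eps=1e-3):
    """Log-likelihood of a binary image under a per-pixel Bernoulli field."""
    total = 0.0
    for img_row, field_row in zip(img, field):
        for v, p in zip(img_row, field_row):
            p = min(max(p, eps), 1 - eps)  # clamp to avoid log(0)
            total += math.log(p if v == 1 else 1 - p)
    return total

def split_by_score(images, final_field, threshold):
    """Score each aligned image under the final distribution field and
    separate well-aligned images from those needing a second stage."""
    well_aligned, needs_second_stage = [], []
    for img in images:
        score = log_likelihood(img, final_field)
        if score >= threshold:
            well_aligned.append(img)
        else:
            needs_second_stage.append(img)
    return well_aligned, needs_second_stage

final_field = [[0.0, 1.0, 0.0]]
good, poor = split_by_score([[[0, 1, 0]], [[1, 0, 0]]], final_field,
                            threshold=-1.0)  # threshold is an assumed value
```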