WORD-PREDICTION AS A TOOL TO EVALUATE LOW-LEVEL VISION PROCESSES Prasad Gabbur, Kobus Barnard University of Arizona
Overview
• Word-prediction using translation model for object recognition
• Feature evaluation
• Segmentation evaluation
• Modifications to Normalized Cuts segmentation algorithm
• Evaluation of color constancy algorithms
• Effects of illumination color change on object recognition
• Strategies to deal with illumination color change
Motivation
• Low-level computer vision algorithms
  • Segmentation, edge detection, feature extraction, etc.
  • Building blocks of computer vision systems
• Is there a generic task to evaluate these algorithms quantitatively?
• Word-prediction using translation model for object recognition
  • Sufficiently general
  • Quantitative evaluation is possible
Translation model for object recognition Translate from visual to semantic description
Approach
Model the joint probability distribution of visual representations and associated words using a large, annotated image collection (the Corel database).
Image pre-processing
• Segment each image* and compute visual features [f1 f2 f3 … fN] for each region
• Region features and the associated words (e.g., sun, sky, waves, sea) feed the joint distribution
* Thanks to N-Cuts team [Shi, Tal, Malik] for their segmentation algorithm
Model: joint visual/textual concepts*
• Each concept node l carries a frequency table over words and a Gaussian over region (blob) features
• Learn P(w|l), P(b|l), and P(l) from data using EM
* Barnard et al., JMLR 2003
Annotating images
• Segment the image into regions b1, b2, …
• Compute P(w|b) for each region
• Sum over regions to get P(w|image): P(w|b1) + P(w|b2) + …
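The per-region summation above can be sketched in a few lines. This is an illustrative sketch only: the function name and toy vocabulary are made up, and the per-region distributions P(w|b) would in practice come from the learned translation model rather than being hand-written.

```python
import numpy as np

def annotate_image(region_word_probs, vocabulary, top_k=4):
    """Predict words for an image by summing per-region word
    distributions, mirroring the annotation step in the slides.

    region_word_probs: array of shape (n_regions, n_words), where
    row r is P(w | b_r) for region b_r (assumed precomputed by
    the learned model).
    """
    # P(w | image) is proportional to the sum over regions of P(w | b_r)
    p_word = region_word_probs.sum(axis=0)
    p_word /= p_word.sum()          # normalize to a distribution
    best = np.argsort(p_word)[::-1][:top_k]
    return [vocabulary[i] for i in best]

# Toy example: two regions, one mostly "sun", one mostly "sky"
vocab = ["sun", "sky", "waves", "sea", "cat"]
probs = np.array([
    [0.70, 0.10, 0.10, 0.05, 0.05],
    [0.10, 0.60, 0.10, 0.10, 0.10],
])
print(annotate_image(probs, vocab, top_k=2))  # → ['sun', 'sky']
```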
Measuring performance
• Large region-labeled databases are not available
• Large annotated databases are available
• Example: actual keywords CAT, TIGER, GRASS, FOREST vs. predicted words CAT, HORSE, GRASS, WATER
• Record percent correct
• Use annotation performance as a proxy for recognition
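The percent-correct proxy can be sketched as follows. `annotation_accuracy` is a hypothetical helper; treating the score as the fraction of predicted words found among the actual keywords is one plausible reading of "record percent correct", not necessarily the exact scoring used in the experiments.

```python
def annotation_accuracy(predicted, actual):
    """Percent of predicted words that appear among the image's
    actual keywords -- an annotation-as-recognition proxy score.
    """
    if not predicted:
        return 0.0
    hits = sum(1 for w in predicted if w in set(actual))
    return 100.0 * hits / len(predicted)

# Example from the slide: CAT and GRASS are correct, HORSE and
# WATER are not, so 2 of 4 predictions score as correct.
score = annotation_accuracy(
    ["CAT", "HORSE", "GRASS", "WATER"],
    ["CAT", "TIGER", "GRASS", "FOREST"],
)
print(score)  # → 50.0
```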
Experimental protocol
• Corel database; each CD contains 100 images on one specific topic like "aircraft"
• Sampling scheme: 160 CDs split into 80 CDs for training/test (75% training, 25% test) and 80 CDs held out as novel data
• Average results over 10 different samplings
Semantic evaluation of vision processes
• Feature sets: combinations of visual features
• Segmentation methods
  • Mean-Shift [Comaniciu, Meer]
  • Normalized Cuts [Shi, Tal, Malik]
• Color constancy algorithms
  • Train with illumination change
  • Color constancy processing: gray-world, scale-by-max
Feature evaluation: features
• Size
• Location
• Context (average surrounding color)
• Color (RGB, L*a*b, rgS)
  • Average color
  • Standard deviation
• Texture: responses to a bank of filters
  • Even and odd symmetric
  • Rotationally symmetric (DOG)
• Shape
  • Second moment
  • Compactness
  • Convexity
  • Outer boundary descriptor
Feature evaluation
[Chart: annotation performance for different feature combinations (bigger is better)]
Base = Size + Location + Second moment + Compactness
Segmentation evaluation Mean Shift (Comaniciu, Meer) Normalized Cuts (N-Cuts) (Shi, Tal, Malik)
Segmentation evaluation
[Chart: annotation performance vs. # regions (bigger is better)]
• Performance depends on the number of regions used for annotation
• Mean Shift is better than N-Cuts for # regions < 6
Normalized Cuts
• Graph partitioning technique: bi-partitions an edge-weighted graph in an optimal sense
• Normalized cut (Ncut) is the optimizing criterion
• Edge weight wij => similarity between nodes i and j; minimize Ncut(A,B) over partitions (A, B)
• Image segmentation
  • Each pixel is a node
  • Edge weight is similarity between pixels
  • Similarity based on color, texture, and contour cues
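The criterion being minimized is the normalized cut of Shi and Malik. Writing cut(A,B) for the total edge weight between the two sides and assoc(A,V) for the total weight connecting A to the whole graph V:

```latex
\mathrm{Ncut}(A,B)
  = \frac{\mathrm{cut}(A,B)}{\mathrm{assoc}(A,V)}
  + \frac{\mathrm{cut}(A,B)}{\mathrm{assoc}(B,V)},
\qquad
\mathrm{cut}(A,B) = \sum_{i \in A,\; j \in B} w_{ij},
\qquad
\mathrm{assoc}(A,V) = \sum_{i \in A,\; j \in V} w_{ij}
```

Normalizing by each side's total association penalizes cutting off small isolated groups, which a plain minimum cut would favor.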
Normalized Cuts: original algorithm
[Diagram: initial pixel-level presegmentation → region-level segmentation → final pixel-level segmentation]
• Produces splits in homogeneous regions, e.g., "sky"
• Local connectivity between pixels
Modifications to Normalized Cuts: meta-segmentation
[Diagram: presegmentation followed by region-level iterations 1 … n; original vs. modified connectivity between regions k and l]
Modifications to Normalized Cuts
[Example segmentations: two image pairs, original vs. modified]
Original vs. Modified
[Chart: annotation performance vs. # regions (bigger is better)]
• For # regions < 6, the modified algorithm out-performs the original
• For # regions > 6, the original is better
Incorporating high-level information into segmentation algorithms
• Low-level segmenters split up objects (e.g., the black and white halves of a penguin)
• Word-prediction gives a way to incorporate high-level semantic information into segmentation algorithms
• Propose a merge between regions that have similar posterior distributions over words
Illumination change
• Illumination color change makes recognition difficult
[Images: same scene under Illuminant 1 and Illuminant 2]*
• Strategies to deal with illumination change:
  • Train for illumination change
  • Color constancy pre-processing and normalization
* http://www.cs.sfu.ca/~colour/data
Train for illumination change
• Training includes variation of color under expected illumination changes
[Matas et al. 1994, Matas 1996, Matas et al. 2000]
Color constancy pre-processing
• Training database: canonical (reference) illuminant
• Test input (unknown illuminant) → color constancy algorithm maps the image as if it were taken under the reference illuminant → recognition system
[Funt et al. 1998]
Color normalization
• The same algorithm maps both training and test images as if they were taken under the reference illuminant
• Training database (unknown illuminant) → algorithm → normalized training database (canonical illuminant)
• Test input (unknown illuminant) → algorithm → recognition system
[Funt and Finlayson 1995, Finlayson et al. 1998]
Simulating illumination change
[Images: a scene rendered under 12 illuminants, numbered 0-11; illuminant 0 is the canonical illuminant]
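One common way to simulate an illumination color change is the diagonal (von Kries) model, which scales each channel by the ratio of the test and canonical illuminant colors. The function and the example illuminant RGBs below are illustrative assumptions, not the exact procedure used with the SFU illuminant data.

```python
import numpy as np

def simulate_illuminant(image, canonical_rgb, test_rgb):
    """Simulate an illumination color change with the diagonal
    (von Kries) model: scale each channel by the ratio of the
    test illuminant's RGB to the canonical illuminant's RGB.
    Illuminant RGBs are assumed given for each light source.
    """
    gains = np.asarray(test_rgb, float) / np.asarray(canonical_rgb, float)
    return np.clip(image * gains, 0.0, 255.0)

img = np.full((2, 2, 3), 100.0)  # flat gray patch under canonical light
shifted = simulate_illuminant(img, (255, 255, 255), (255, 200, 150))
print(shifted[0, 0])  # reddish cast: G and B are scaled down
```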
Train with illumination variation
[Chart: annotation performance (bigger is better)]
• Experiment A. Training: no illumination change. Testing: no illumination change
• Experiment B. Training: no illumination change. Testing: illumination change
• Experiment C. Training: illumination change. Testing: illumination change
Color constancy pre-processing: gray-world
• Training: canonical illuminant
• Test: unknown illuminant → gray-world algorithm → canonical illuminant (mean color = constant)
Color constancy pre-processing: scale-by-max
• Training: canonical illuminant
• Test: unknown illuminant → scale-by-max algorithm → canonical illuminant (max color = constant)
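Both corrections above can be sketched as simple per-channel rescalings; the target constants (mid-gray 128 for gray-world, white 255 for scale-by-max) are assumptions for illustration, and real implementations often handle clipping and specular highlights more carefully.

```python
import numpy as np

def gray_world(image):
    """Gray-world correction: scale each channel so the image's
    mean color comes out constant (here, mid-gray 128)."""
    means = image.reshape(-1, 3).mean(axis=0)
    return image * (128.0 / means)

def scale_by_max(image):
    """Scale-by-max correction: scale each channel so its maximum
    comes out constant (here, 255), discounting the illuminant."""
    maxes = image.reshape(-1, 3).max(axis=0)
    return image * (255.0 / maxes)

# Toy image with an arbitrary color cast
img = np.random.default_rng(0).uniform(10, 200, size=(4, 4, 3))
gw, sm = gray_world(img), scale_by_max(img)
```

After correction, two images of the same scene under different illuminants map to (approximately) the same colors, which is why either can serve as pre-processing or normalization before recognition.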
Color constancy pre-processing
[Chart: annotation performance (bigger is better)]
• Experiment A. Training: no illumination change. Testing: no illumination change
• Experiment B. Training: no illumination change. Testing: illumination change
• Others. Training: no illumination change. Testing: illumination change + color constancy algorithm
Color normalization
• Gray-world: both training and test images normalized so that mean color = constant
• Scale-by-max: both training and test images normalized so that max color = constant
Color normalization
[Chart: annotation performance (bigger is better)]
• Experiment A. Training: no illumination change. Testing: no illumination change
• Experiment B. Training: no illumination change. Testing: illumination change
• Others. Training: no illumination change + color constancy algorithm. Testing: illumination change + color constancy algorithm
Conclusions
• Translation (visual to semantic) model for object recognition
  • Identify and evaluate low-level vision processes for recognition
• Feature evaluation
  • Color and texture are the most important, in that order
  • Shape needs better segmentation methods
• Segmentation evaluation
  • Performance depends on # regions used for annotation
  • Mean Shift and modified N-Cuts do better than original N-Cuts for # regions < 6
• Color constancy evaluation
  • Training with illumination change helps
  • Color constancy processing helps (scale-by-max better than gray-world)