Recognizing Objects and Actions in Images Jitendra Malik U.C. Berkeley

Many kinds of images… • Ordinary optical images/video • Ubiquitous, cheap • X-ray tomography • Volumetric data • Range sensors • 2.5D data • ……

From images/video to objects • Labeled sets: tiger, grass, etc.

Recognition • Possible for both instances and object classes (Mona Lisa vs. faces, or Beetle vs. cars) • Tolerant to changes in pose and illumination, and to occlusion

Examples of Actions • Movement and posture change • run, walk, crawl, jump, hop, swim, skate, sit, stand, kneel, lie, dance (various), … • Object manipulation • pick, carry, hold, lift, throw, catch, push, pull, write, type, touch, hit, press, stroke, shake, stir, turn, eat, drink, cut, stab, kick, point, drive, bike, insert, extract, juggle, play musical instrument (various)… • Conversational gesture • point, … • Sign Language

Outline • Finding/recognizing faces • Recognizing objects • Recognizing actions

Face Detection Carnegie Mellon University Results on various images submitted to the CMU on-line face detector

Accuracy of Face Detection Carnegie Mellon University • MIT-CMU test set • 94% detection rate with one false detection every 2 images • Ordinary consumer photographs (according to Gretag Imaging) • 88% detection rate with one false detection per image H. Schneiderman and T. Kanade, “Object Detection Using the Statistics of Parts,” to appear in Int. Jour. of Comp. Vision, 2002.

(courtesy, J. Phillips) • Test of commercial-off-the-shelf (COTS) facial recognition products • Test developed by NIST based on FERET - 13,872 x 13,872 image evaluation matrix resulting in over 192 million matches • Formal test was conducted May to June 2000 • Results released in February 2001 • Sponsored by DoD Counterdrug Technology Development Program Office, National Institute of Justice, NAVSEA Crane Division and DARPA http://www.dodcounterdrug.com/facialrecognition/ DoD Counterdrug

Pose • Gallery: 200 people • Probe set: 400 images [Plot: probability of ID at pose angles 10°, 20°, 25°, 45°]

Illumination • Gallery: 227 • Indoor/Ambient: 236 • Outdoor: 190 [Plot: probability of ID]

FRVT 2000 Results Critical Parameters • Rotations • Distance/resolution • Duplicates • Illumination

Biological Shape • D’Arcy Thompson: On Growth and Form, 1917 • studied transformations between shapes of organisms

Matching Framework (Belongie, Malik & Puzicha, PAMI 2002) ... model target • Find correspondences between points on shape • Estimate transformation & measure similarity

Comparing Pointsets

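Comparing Shape Contexts • Compute matching costs using the chi-squared distance: $C_{ij} = \frac{1}{2} \sum_{k=1}^{K} \frac{[h_i(k) - h_j(k)]^2}{h_i(k) + h_j(k)}$, where $h_i(k)$ and $h_j(k)$ are the $K$-bin normalized shape context histograms at point $i$ on the model and point $j$ on the target • Recover correspondences by solving the linear assignment problem with costs $C_{ij}$ [Jonker & Volgenant 1987]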
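The two steps on this slide (chi-squared costs, then a linear assignment) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, the histograms are toy values, and the assignment is found by brute force over permutations as a stand-in for the Jonker & Volgenant solver, so it is only usable for a handful of points.

```python
from itertools import permutations


def chi2_cost(g, h):
    """Chi-squared distance between two normalized histograms g and h."""
    total = 0.0
    for gk, hk in zip(g, h):
        if gk + hk > 0:  # skip bins that are empty in both histograms (avoids 0/0)
            total += (gk - hk) ** 2 / (gk + hk)
    return 0.5 * total


def cost_matrix(hists_model, hists_target):
    """C[i][j] = chi-squared cost of matching model point i to target point j."""
    return [[chi2_cost(g, h) for h in hists_target] for g in hists_model]


def best_assignment(C):
    """Minimum-cost one-to-one assignment by exhaustive search.

    Stand-in for a real linear assignment solver; cost grows as n!,
    so use it only on very small square cost matrices.
    """
    n = len(C)
    best = min(permutations(range(n)),
               key=lambda p: sum(C[i][p[i]] for i in range(n)))
    return list(best), sum(C[i][best[i]] for i in range(n))


# Toy example: three points whose histograms are permuted between the shapes.
model = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
target = [[0, 0, 1], [1, 0, 0], [0, 1, 0]]
assignment, total_cost = best_assignment(cost_matrix(model, target))
```

Here `assignment[i]` is the index of the target point matched to model point `i`. For realistic point counts a polynomial-time solver would replace `best_assignment`.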

Matching Framework ... model target • Find correspondences between points on shape • Estimate transformation & measure similarity

Thin Plate Spline Model • 2D counterpart to cubic spline: • Minimizes bending energy: • Solve by inverting linear system • Can be regularized when data is inexact Duchon (1977), Meinguet (1979), Wahba (1991)
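The formulas this slide refers to did not survive in the transcript. For reference, the standard textbook form of the thin plate spline is given below; the notation ($U$, $K$, $P$, $w$, $a$, $v$, $\lambda$) is the conventional one and is an assumption here, not necessarily the notation used on the original slide.

```latex
% Interpolant: affine part plus a weighted sum of radial basis functions
f(x, y) = a_1 + a_x x + a_y y
          + \sum_{i=1}^{n} w_i \, U\bigl(\lVert (x_i, y_i) - (x, y) \rVert\bigr),
\qquad U(r) = r^2 \log r^2

% Bending energy minimized by f
I_f = \iint_{\mathbb{R}^2}
      \left( \frac{\partial^2 f}{\partial x^2} \right)^{2}
      + 2 \left( \frac{\partial^2 f}{\partial x \, \partial y} \right)^{2}
      + \left( \frac{\partial^2 f}{\partial y^2} \right)^{2} \, dx \, dy

% Linear system for the coefficients, with K_{ij} = U(\lVert (x_i, y_i) - (x_j, y_j) \rVert),
% row i of P equal to (1, x_i, y_i), and v the target values at the n points
\begin{pmatrix} K + \lambda I & P \\ P^{T} & 0 \end{pmatrix}
\begin{pmatrix} w \\ a \end{pmatrix}
=
\begin{pmatrix} v \\ 0 \end{pmatrix}
```

Setting $\lambda = 0$ gives exact interpolation; $\lambda > 0$ is the regularized case for inexact data. A 2D coordinate transformation uses two such functions, one for each output coordinate.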

Matching Example model target

Object Recognition Experiments • Handwritten digits • COIL 3D objects (Nayar-Murase) • Human body configurations • Trademarks

Terms in Similarity Score • Shape Context difference • Local Image appearance difference • orientation • gray-level correlation in Gaussian window • … (many more possible) • Bending energy
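One simple way to turn these terms into a single score is a weighted sum. The slide lists the terms but not how they are combined, so the sketch below is an assumption: the function name, the additive form, and the default weights are all hypothetical placeholders that would need tuning on real data.

```python
def similarity_score(shape_context_cost, appearance_cost, bending_energy,
                     w_sc=1.0, w_ac=1.0, w_be=1.0):
    """Combine the three dissimilarity terms into one score (lower = more similar).

    The additive form and the weights are illustrative assumptions,
    not values taken from the slides.
    """
    return (w_sc * shape_context_cost
            + w_ac * appearance_cost
            + w_be * bending_energy)


# Toy values: down-weight the bending energy term.
score = similarity_score(0.5, 0.25, 1.0, w_be=0.5)
```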

Handwritten Digit Recognition (test error rates) • MNIST 600 000 (distortions): LeNet 5: 0.8%; SVM: 0.8%; Boosted LeNet 4: 0.7% • MNIST 60 000: linear: 12.0%; 40 PCA + quad: 3.3%; 1000 RBF + linear: 3.6%; K-NN: 5%; K-NN (deskewed): 2.4%; K-NN (tangent dist.): 1.1%; SVM: 1.1%; LeNet 5: 0.95% • MNIST 20 000: K-NN, Shape Context matching: 0.63%
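The shape context entry in this list is a K-NN classifier whose distance is a shape matching score. A minimal sketch of K-NN with a pluggable distance function follows; the function names are hypothetical, and the toy distance on scalars merely stands in for a real shape distance.

```python
from collections import Counter


def knn_classify(query, prototypes, distance, k=3):
    """Label a query by majority vote among its k nearest prototypes.

    prototypes: list of (item, label) pairs.
    distance:   any function distance(query, item) -> float.
    """
    ranked = sorted(prototypes, key=lambda p: distance(query, p[0]))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


def error_rate(test_set, prototypes, distance, k=3):
    """Fraction of (item, label) test pairs that are misclassified."""
    wrong = sum(1 for item, label in test_set
                if knn_classify(item, prototypes, distance, k) != label)
    return wrong / len(test_set)


# Toy stand-in for a shape distance: absolute difference of scalars.
def toy_distance(a, b):
    return abs(a - b)


prototypes = [(0.0, "a"), (0.1, "a"), (1.0, "b"), (1.1, "b"), (0.9, "b")]
label = knn_classify(0.2, prototypes, toy_distance, k=3)
```

Because the classifier only needs pairwise distances, the same code works unchanged when `toy_distance` is swapped for a shape matching cost.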

COIL Object Database

Prototypes Selected for 2 Categories Details in Belongie, Malik & Puzicha (NIPS2000)

Error vs. Number of Views

Human body configurations

Deformable Matching (Mori & Malik, ECCV 2002) • Kinematic chain-based deformation model • Use iterations of correspondence and deformation • Keypoints on exemplars are deformed to locations on query image

Results

Tracking by Repeated Finding

Examples of Actions • Movement and posture change • run, walk, crawl, jump, hop, swim, skate, sit, stand, kneel, lie, dance (various), … • Object manipulation • pick, carry, hold, lift, throw, catch, push, pull, write, type, touch, hit, press, stroke, shake, stir, turn, eat, drink, cut, stab, kick, point, drive, bike, insert, extract, juggle, play musical instrument (various)… • Conversational gesture • point, … • Sign Language

Activities and Situation Assessment • Example: Withdrawing money from an ATM • Activities constructed by composing actions. Partial order plans may be a good model. • Activities may involve multiple agents • Detecting unusual situations or activity patterns is facilitated by the video → activity transform
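To make the partial order plan idea concrete, the sketch below checks whether an observed sequence of recognized actions satisfies a set of "A must happen before B" constraints. The action names and constraints for the ATM example are hypothetical, chosen only for illustration; a sequence that violates the constraints would be flagged as unusual.

```python
def satisfies_partial_order(observed, constraints):
    """Return True if every (before, after) constraint holds in `observed`.

    A constraint holds when both actions occur and the first occurrence
    of `before` precedes the first occurrence of `after`.
    """
    first_seen = {}
    for index, action in enumerate(observed):
        first_seen.setdefault(action, index)  # keep the first occurrence only
    for before, after in constraints:
        if before not in first_seen or after not in first_seen:
            return False
        if first_seen[before] >= first_seen[after]:
            return False
    return True


# Hypothetical plan for withdrawing money; taking the card and taking
# the cash are unordered with respect to each other.
atm_plan = [
    ("insert card", "enter PIN"),
    ("enter PIN", "take cash"),
    ("insert card", "take card"),
]

normal = ["insert card", "enter PIN", "take card", "take cash"]
card_left_behind = ["insert card", "enter PIN", "take cash"]
```

The plan is a partial order because it accepts either ordering of "take card" and "take cash", while still rejecting the sequence in which the card is never retrieved.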

Objects in space: Segment/region-of-interest • Features (points, curves, wavelet coefficients, ...) • Correspondence and deform into alignment • Recover parameters of generative model • Discriminative classifier
Actions in spacetime: Segment/volume-of-interest • Features (points, curves, wavelets, motion vectors, ...) • Correspondence and deform into alignment • Recover parameters of generative model • Discriminative classifier

Key cues for action recognition • “Morpho-kinesics” of action (shape and movement of the body) • Identity of the object/s • Activity context

Image/Video → Stick figure → Action • Stick figures can be specified in a variety of ways or at various resolutions (degrees of freedom) • 2D joint positions • 3D joint positions • Joint angles • Complete representation • Evidence that it is effectively computable
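The stick figure specifications listed above are related: joint angles can be derived from joint positions. As a small illustration, the sketch below computes the angle at one joint from three 2D joint positions. The function name and the example coordinates are hypothetical.

```python
import math


def joint_angle(a, b, c):
    """Angle in degrees at joint b, between segments b->a and b->c.

    a, b, c are 2D points (x, y), e.g. shoulder, elbow, wrist.
    """
    v1 = (a[0] - b[0], a[1] - b[1])
    v2 = (c[0] - b[0], c[1] - b[1])
    dot = v1[0] * v2[0] + v1[1] * v2[1]
    cross = v1[0] * v2[1] - v1[1] * v2[0]
    # atan2 of |cross| and dot gives an unsigned angle in [0, 180] degrees.
    return math.degrees(math.atan2(abs(cross), dot))


# Hypothetical arm: shoulder, elbow, wrist.
bent_elbow = joint_angle((0.0, 0.0), (1.0, 0.0), (1.0, 1.0))
straight_elbow = joint_angle((0.0, 0.0), (1.0, 0.0), (2.0, 0.0))
```

A joint-angle description is unchanged by translating or rotating the figure in the image plane, which is one reason it is a lower-dimensional alternative to raw joint positions.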

Mathematical Challenges • Modeling shape variation • Nearest neighbor search in high dimensions • Combining statistical optimality with computational efficiency • Reconstruction algorithms for novel sensing modalities
