Attentive People Finding. James Elder Centre for Vision Research York University Toronto, Canada. Joint work with: Simon Prince Bob Hou. Collaborative Project: “Monitoring Changes to Urban Environments with a Network of Sensors”
Research Context
Collaborative Project: “Monitoring Changes to Urban Environments with a Network of Sensors”
Funding: GEOIDE (Geomatics for Informed Decisions), a Canadian agency. "This ‘network of networks’ brings together the skills, technology and people from different communities of practice, in order to develop and consolidate the Canadian competences in geomatics."
What is our project?
Monitoring Changes to Urban Environments: "This project will study visual detection and interpretation of changes to urban environments using continuous and non-continuous sensing from a multiplicity of diverse sensors using networks of video cameras, augmented with high-resolution satellite imagery. It will also investigate the problem of how such information can be integrated and managed within a computer, leading to the development of a prototype information system for monitoring urban environments."
Project Team
University Principal Investigators: David Clausi (Waterloo), Geoffrey Edwards (Laval), James Elder (York), Frank Ferrie (McGill), Jim Little (UBC)
Main Industry Partners: CAE, Genetec, Aimetis
Timeframe
April 2005 – March 2009
Objectives
1. Establishment of urban test facilities involving networks of multi-sensor wireless cameras with associated satellite data, and development of intercalibration software. (Elder, Ferrie, Little)
2. Development of algorithms for fusing offline satellite data with streaming video from terrestrial sensors for the construction of more complete 3D urban models. (Clausi)
3. Development of algorithms for inferring approximate intrinsic images from monocular video (ordinal depth maps, reflectance maps, …). (Elder, Ferrie, Little)
4. Development of algorithms for identifying and modeling typical dynamic events (e.g. pedestrian and automobile traffic, changes in climate, air quality, seasonal changes) and detecting unusual events. (Elder, Ferrie, Little)
5. Development of algorithms for deriving and updating navigational maps based upon derived models. (Edwards)
6. Development of integrated demonstration system. (Ferrie)
Possible Application Areas
• Disaster management (e.g., earthquakes)
• Traffic monitoring (e.g., automobile, trucking, pedestrian)
• Security (e.g., people tracking, activity and identity recognition)
• Urban planning (e.g., 3D dynamic scene visualization)
• Environmental monitoring (e.g., air quality)
Pre-Attentive and Attentive Sensing (with S. Prince, Y. Hou, M. Sizinitsev, E. Olevskey)
[Figure labels: wide-field image, foveal image, pan, tilt]
Wide-Field Body Detection
Min: 15x2 pixels; Max: 98x78 pixels; Median: 52x14 pixels
Wide-Field Face Detection
Min: 2x2 pixels; Max: 34x31 pixels; Median: 6x6 pixels
Motion Scaling (from Johnston & Wright, 1986)
Biological Motion (from Ikeda, Blake & Watanabe, 2005)
Structural Coherence (with L. Velisavljevic): Psychophysical Method
[Figure: trial sequence; interval labels: 1000 ms, 59 ms, 506 ms, until response]
Image Conditions
[Figure labels: coherent vs. scrambled; colour vs. monochrome]
Results
[Figure: % correct (axis 58 to 82), data and model, for four conditions: colour coherent, colour incoherent, BW coherent, BW incoherent]
Spatial Coherence
[Figure: percent correct (axis 50 to 90) vs. mean distance from fixation (3° to 18°); unscrambled vs. scrambled images; colour and monochromatic conditions]
Summary
Pre-Attentive (Peripheral) Vision:
• Motion discrimination
• Colour discrimination
• Biological motion
• Contour integration
• Coherent structure
Preattentive System Design
[Block diagram: three cue channels (motion, foreground, skin), each with the blocks raw pixel, pixel model, pixel posterior, spatial integrator, region response, region model, and region likelihood ratio. The channel outputs are combined (×) with the system priors to give the system posterior.]
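The per-cue chain named on this slide (raw pixel, pixel posterior, spatial integration, region response) can be sketched in a few lines. This is only an illustration under stated assumptions, not the system described in the talk: the function names, the box-average integrator, and the likelihood values are all hypothetical.

```python
def pixel_posterior(lik_target, lik_background, prior_target):
    """Bayes' rule for one pixel: P(target | observation)."""
    num = lik_target * prior_target
    den = num + lik_background * (1.0 - prior_target)
    # Fall back to the prior if the observation is impossible under both models.
    return num / den if den > 0 else prior_target


def region_response(posteriors, top, left, height, width):
    """Assumed spatial integrator: mean pixel posterior inside a box."""
    total = 0.0
    for r in range(top, top + height):
        for c in range(left, left + width):
            total += posteriors[r][c]
    return total / (height * width)


# Toy 4x4 posterior map: weak evidence everywhere, strong evidence in a
# 2x2 blob in the lower-right corner (made-up likelihood values).
post_map = [[pixel_posterior(0.1, 0.9, 0.5)] * 4 for _ in range(4)]
for r in (2, 3):
    for c in (2, 3):
        post_map[r][c] = pixel_posterior(0.9, 0.1, 0.5)

blob = region_response(post_map, 2, 2, 2, 2)    # window over the blob
empty = region_response(post_map, 0, 0, 2, 2)   # window over background
```

A window centred on the blob gives a much higher region response than a background window, which is the property a later region model would need in order to separate the two cases.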
Priors as Attentive Feedback
[Block diagram labels: likelihood, prior, posterior, spatial prior, motion kernel, mean body indicator, random sampler, gaze command, gaze control, attentive sensor, high-resolution face detection, non-max suppression, confirmed face location]
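The diagram mentions a prior, a likelihood, a posterior, and a random sampler that issues the gaze command. One way to read this is that the gaze target is drawn at random in proportion to the posterior over image locations. The sketch below assumes that reading; the grid size, the values, and the function names are hypothetical, and the feedback step (updating the prior from a confirmed face location) is not modelled.

```python
import random


def posterior_map(prior, likelihood):
    """Multiply prior and likelihood cell by cell, then normalise to sum to 1."""
    unnorm = [[p * l for p, l in zip(prow, lrow)]
              for prow, lrow in zip(prior, likelihood)]
    total = sum(sum(row) for row in unnorm)
    return [[v / total for v in row] for row in unnorm]


def sample_gaze(posterior, rng):
    """Draw one (row, col) gaze target with probability given by the posterior."""
    cells = [(r, c) for r, row in enumerate(posterior) for c in range(len(row))]
    weights = [posterior[r][c] for r, c in cells]
    return rng.choices(cells, weights=weights, k=1)[0]


prior = [[0.25, 0.25], [0.25, 0.25]]      # flat spatial prior (assumed)
likelihood = [[0.0, 0.0], [0.0, 2.0]]     # all evidence in one cell (made up)
post = posterior_map(prior, likelihood)
gaze = sample_gaze(post, random.Random(0))
```

Sampling rather than always taking the maximum lets lower-probability locations receive occasional fixations, which seems consistent with the "random sampler" block, although the talk does not spell out the reason for that choice.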
Pixel Posteriors
[Figure: original frame and pixel posterior maps (scale 0 to 1) for motion, foreground, and skin]
Spatial Integration
[Figure: area under ROC curve (axis 0.70 to 0.86) vs. exponent γ (10^-1 to 10^1, log scale) for motion, foreground, and skin]
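The plot reports area under the ROC curve for each cue. As a reminder of what that number measures, the sketch below computes it directly as the probability that a randomly chosen positive example scores higher than a randomly chosen negative one, with ties counted as half. The scores are made up for illustration and the function name is hypothetical.

```python
def auc(positive_scores, negative_scores):
    """Area under the ROC curve via pairwise comparison; ties count half."""
    wins = 0.0
    for p in positive_scores:
        for n in negative_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positive_scores) * len(negative_scores))


# Hypothetical region responses for windows with and without a person.
pos = [0.9, 0.8, 0.4]
neg = [0.7, 0.3, 0.2]
area = auc(pos, neg)   # 8 of the 9 pairs are ordered correctly
```

An area of 0.5 corresponds to chance and 1.0 to perfect separation, so the plotted range of roughly 0.70 to 0.86 describes individually weak cues.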
Spatial Integration
[Figure: region log likelihood ratio maps (scale -4 to 4) for motion, foreground, and skin, and the joint region log likelihood ratio]
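If the three cues are treated as conditionally independent given the hypothesis (an assumption made here for illustration; the slide does not state how the joint ratio is formed), the joint log likelihood ratio is the sum of the per-cue log likelihood ratios, and adding the prior log odds gives a posterior. The numbers and names below are hypothetical.

```python
import math


def joint_log_lr(log_lrs):
    """Sum per-cue log likelihood ratios (valid under conditional independence)."""
    return sum(log_lrs)


def posterior_from_log_lr(log_lr, prior):
    """Convert a log likelihood ratio and a prior probability into a posterior."""
    log_odds = log_lr + math.log(prior / (1.0 - prior))
    return 1.0 / (1.0 + math.exp(-log_odds))


# Made-up region log likelihood ratios at one image location.
cues = {"motion": 2.0, "foreground": 1.0, "skin": -0.5}
joint = joint_log_lr(cues.values())
post = posterior_from_log_lr(joint, prior=0.5)
```

Working in the log domain keeps the combination additive, so a weak negative cue (skin here) lowers the joint score without vetoing the other two.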
Combining Detectors
[Figure: ROC curves, p(Hit) vs. p(False Positive), for foreground (13 x 20), skin (4 x 5), motion (20 x 20), the combined detector, and Xiong & Jaynes]
Performance
System evaluation on distinct test database:
• 74% of fixations capture human heads
• 83% of people are fixated at least once
3D POSE PROBLEM
• Capture training and test database
• Horizontal pose (known) varies over 180 degrees
• Pose for each image known precisely
• Points on each face identified
• Image regions extracted
• Features are weighted sums of pixels in region
An Alternate Approach: 2D to 3D (with VisionSphere Technologies)
Attentive People Finding
• Realistic environments and behaviour make this a hard problem.
• Humans: primitive mechanisms are preserved in the periphery; more complex mechanisms are not.
• Our approach: probabilistic combination of simple, weak cues.
• Ongoing work: attentive feedback.
Colour Scaling (from Rovamo & Iivanainen, 1991)
Contour Integration (from Hess & Dakin, 1999)
Interactive Attentive Sensing
Needed: fast saccadic programming algorithms!
SUMMARY
A supervised method to make a feature set more invariant to a known nuisance parameter.
EIGEN-LIGHTFIELDS (Gross, Matthews, Baker) < INVARIANCE (Prince, Elder) << 3D MODEL (Blanz et al.)
• Towards the left: fast; no knowledge of faces; no knowledge of 3D transformations
• Towards the right: slower; uses lots of domain-specific knowledge; better results
Algorithm Summary
TO TRAIN:
• ESTIMATE MEAN AND COV OF MANIFOLD AS A FUNCTION OF DISTRACTOR VARIABLE
• ALTERNATELY ESTIMATE: INVARIANT VECTORS Ci; TRANSFORMATIONS F1..n
TO CALCULATE INVARIANT VECTORS:
• ESTIMATE NUISANCE VALUE, v
• TRANSFORM BY APPROPRIATE Fv
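The second half of the algorithm (estimate the nuisance value v, then transform by the matching Fv) can be illustrated on a two-dimensional toy case. The sketch below assumes, purely for illustration, that the nuisance-dependent transformation is a plane rotation of the feature vector and that v has already been estimated. Neither the rotation model nor the function names come from the talk, and the training stage is not shown.

```python
import math


def rotate(vec, angle):
    """Rotate a 2D vector counter-clockwise by `angle` radians."""
    x, y = vec
    c, s = math.cos(angle), math.sin(angle)
    return (c * x - s * y, s * x + c * y)


def invariant_vector(feature, nuisance_estimate):
    """Undo the assumed nuisance transformation: rotate back by the estimate."""
    return rotate(feature, -nuisance_estimate)


# One identity observed under two different nuisance values (30 and 75 degrees).
identity_code = (1.0, 0.5)
observed_a = rotate(identity_code, math.radians(30))
observed_b = rotate(identity_code, math.radians(75))

# With the nuisance value known, both observations map to the same vector.
recovered_a = invariant_vector(observed_a, math.radians(30))
recovered_b = invariant_vector(observed_b, math.radians(75))
```

The two observed feature vectors differ, yet both map back to the same invariant vector, which is the decomposition into invariant part plus nuisance parameter that the talk sets as its goal.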
PROBLEM STATEMENT
Problem: Image variation due to nuisance parameters such as pose change is greater than variation due to identity. This is reflected in most “features”.
[Figure: feature space]
GOAL: Decompose Conventional Feature Vector to Invariant Feature + Nuisance Parameter
[Figure: conventional feature vector (axes X1, X2) = invariant vector C + nuisance parameters (φ1, θ1), (φ2, θ2), …]
TOY DATA SET – IN PLANE ORIENTATION
• TRAINING IMAGES – angle known, several images of each face present
• TEST IMAGES – angle unknown
• PROBE IMAGE – angle unknown
Choice of features: first few EIGENVECTORS
THE FIRST TWO FEATURE DIMENSIONS
[Figure: axes X1 and X2; annotation: increasing θ]