SVMs for (x) Recognition (From Moghaddam / Yang’s “Gender Classification with SVMs”) Brian Whitman
“Commodity Intelligence” • ‘Wow factor’ important • Collaborative filtering • ‘Simple’ tasks sometimes the most useful • An SVM embedded evaluator… • Cameras with ‘common sense’
Why SVM for feature detection? • Quick evaluation model • Machines (SVs) are easily stored and small
Experiment: Gender ID • Using MITFaces dataset • ~7500 faces with varying genders, races, ages, expressions, ‘extras’ • All aligned 160x160 with left eye at 80,80 • Face content is usually only 80x40
Representation? • Simple pixel values • Why?
Sample size • Maintain the ‘ground rule’ of ML • Dimensions < Examples*2 • At 3200 dims (80x40), this is hard • Training parameters (maximum Lagrange multiplier value, kernel width) help • We use 80x40 and 40x20 in our examples
Training stage • Choose 3200 random adult faces for training and 3200 random faces for testing • Extract the 80x40 ‘face window’ from each face and treat its 3200 doubles (0..1) as one training example • Train the SVM on the pixel values of the training set (about 30 minutes on a dual 2 GHz P4 Xeon Linux machine)
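The window extraction in the training stage could look roughly like the sketch below. It is only an illustration: the slides do not say where the window sits relative to the eye position, so `top` and `left` are hypothetical parameters, and reading 80x40 as 80 rows by 40 columns of 8-bit greyscale pixels is also an assumption.

```python
from typing import List, Sequence


def extract_face_window(image: Sequence[Sequence[int]],
                        top: int, left: int,
                        height: int = 80, width: int = 40) -> List[float]:
    """Crop a height x width window and flatten it to floats in [0, 1]."""
    if (top < 0 or left < 0
            or top + height > len(image)
            or left + width > len(image[0])):
        raise ValueError("window falls outside the image")
    vector = []
    for row in image[top:top + height]:
        for pixel in row[left:left + width]:
            vector.append(pixel / 255.0)  # assumes 8-bit greyscale input
    return vector


# Synthetic 160x160 image standing in for one aligned face.
image = [[(r + c) % 256 for c in range(160)] for r in range(160)]

# top/left are placeholders; the slides do not give the window position.
vec = extract_face_window(image, top=40, left=60)
print(len(vec))  # 3200 values, one training example
```

Each such vector, paired with a +1 or -1 label, would be one row of the training set handed to the SVM trainer.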
Testing Stage • Take the other 3200 face vectors and present them to the learned SVM • If the SVM output is > 0, male; if < 0, female • Confidence: some linear combination of # of support vectors and magnitude of result • Had no problem doing this at 10 Hz on an 800 MHz Pentium III with many other processes running
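The sign test above amounts to evaluating the SVM decision function on one face vector. A minimal sketch follows, assuming a Gaussian (RBF) kernel, which the slides hint at with "kernel width" but never name. The support vectors, coefficients, bias and `gamma` are toy values, and the returned magnitude is only the raw score, not the confidence formula the slides allude to.

```python
import math
from typing import List, Sequence, Tuple


def rbf_kernel(a: Sequence[float], b: Sequence[float], gamma: float) -> float:
    """Gaussian kernel exp(-gamma * ||a - b||^2); the kernel choice is an assumption."""
    sq_dist = sum((x - y) ** 2 for x, y in zip(a, b))
    return math.exp(-gamma * sq_dist)


def svm_decision(x: Sequence[float],
                 support_vectors: List[Sequence[float]],
                 coeffs: List[float],
                 bias: float,
                 gamma: float) -> float:
    """f(x) = sum_i coeff_i * K(sv_i, x) + bias, where coeff_i = alpha_i * y_i."""
    return sum(c * rbf_kernel(sv, x, gamma)
               for sv, c in zip(support_vectors, coeffs)) + bias


def classify(x: Sequence[float],
             support_vectors: List[Sequence[float]],
             coeffs: List[float],
             bias: float,
             gamma: float) -> Tuple[str, float]:
    """Apply the slide's rule: score > 0 is male, score < 0 is female."""
    score = svm_decision(x, support_vectors, coeffs, bias, gamma)
    if score > 0:
        label = "male"
    elif score < 0:
        label = "female"
    else:
        label = "undecided"
    return label, abs(score)


# Toy two-dimensional model, not a trained face classifier.
toy_svs = [[0.0, 0.0], [1.0, 1.0]]
toy_coeffs = [-1.0, 1.0]

label_a, magnitude_a = classify([0.9, 0.9], toy_svs, toy_coeffs, bias=0.0, gamma=1.0)
label_b, magnitude_b = classify([0.1, 0.1], toy_svs, toy_coeffs, bias=0.0, gamma=1.0)
print(label_a, label_b)
```

Evaluation costs one kernel computation per support vector, which is consistent with the quick evaluation and small storage claimed earlier in the deck.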
In-class face gender results • 80x40 (C=100, aux=100): 93% of faces classified correctly (95% male, 90% female) • 40x20 (C=100, aux=10): 97% of faces classified correctly (98% male, 95% female)
Next step: Realtime • Media Lab is where webcams go to die • Webcam at 160x120, ‘face region’ cropped to 80x40, then downsampled to 40x20 • Webcam delivers frames at 10 Hz; we convert each frame to greyscale and present it to the previously trained SVM • Results… mixed
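The per-frame preprocessing could be sketched as below. The slides do not say how the greyscale conversion or the downsampling was done, so the Rec. 601 luma weights and the 2x2 block averaging here are assumptions, as is the reading of 80x40 as 80 rows by 40 columns.

```python
from typing import List, Sequence, Tuple

Pixel = Tuple[int, int, int]


def to_greyscale(rgb: Sequence[Sequence[Pixel]]) -> List[List[float]]:
    """Convert 8-bit RGB pixels to luma (Rec. 601 weights, an assumption)."""
    return [[0.299 * r + 0.587 * g + 0.114 * b for (r, g, b) in row]
            for row in rgb]


def downsample_2x(grey: Sequence[Sequence[float]]) -> List[List[float]]:
    """Halve each dimension by averaging non-overlapping 2x2 blocks."""
    height, width = len(grey), len(grey[0])
    out = []
    for r in range(0, height - 1, 2):
        out_row = []
        for c in range(0, width - 1, 2):
            total = (grey[r][c] + grey[r][c + 1]
                     + grey[r + 1][c] + grey[r + 1][c + 1])
            out_row.append(total / 4.0)
        out.append(out_row)
    return out


def frame_to_vector(face_region: Sequence[Sequence[Pixel]]) -> List[float]:
    """80x40 RGB face region -> 800 greyscale values in [0, 1]."""
    small = downsample_2x(to_greyscale(face_region))
    return [value / 255.0 for row in small for value in row]


# Synthetic all-white face region standing in for a cropped webcam frame.
face_region = [[(255, 255, 255)] * 40 for _ in range(80)]
small = downsample_2x(to_greyscale(face_region))
vector = frame_to_vector(face_region)
print(len(small), len(small[0]), len(vector))  # 40 20 800
```

The resulting 800-value vector has the same layout as the 40x20 training examples, so it can be passed straight to the trained SVM.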
Realtime examples • (If the demo crashes)
‘Creepybot’ • With better control over alignment • Monitors Windows clipboard • Same architecture as the Creepycam
Creepybot Examples • (If the demo crashes)
Other parameters • MITFaces has a great data label set • Train an SVM for appearance of each descriptor: • Race • Age • Gender • Expression • Moustache
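Training one SVM per descriptor means turning each label into its own +1/-1 target vector. A small sketch of that step follows; the label keys and values are invented for illustration and are not the actual MITFaces label format.

```python
from typing import Dict, List


def binary_targets(labels: List[Dict[str, str]],
                   descriptor: str,
                   positive_value: str) -> List[int]:
    """Return +1 where the descriptor matches positive_value, else -1."""
    return [1 if face.get(descriptor) == positive_value else -1
            for face in labels]


# Hypothetical label records, one per face.
labels = [
    {"age": "adult", "expression": "smiling"},
    {"age": "child", "expression": "serious"},
    {"age": "adult", "expression": "serious"},
]

adult_targets = binary_targets(labels, "age", "adult")
smiling_targets = binary_targets(labels, "expression", "smiling")
print(adult_targets, smiling_targets)  # [1, -1, 1] [1, -1, -1]
```

Each target vector would be paired with the same pixel vectors to train a separate binary SVM.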
Per-class results (40x20, etc…) • “Adult or not” • Overall: 94% • (Not adult: 403/516) (78%) • (Adult: 2605/2684) (97%)
Per-class results… • “Smiling or not” • Overall: 88% • (Not smiling: 1354/1520) (89%) • (Smiling: 1450 / 1672) (87%)
Per-class results • “Serious or not” • Overall: 88% • (Not serious: 1517/1712) (89%) • (Serious: 1311/1484) (88%)
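The overall figures on the three per-class slides follow from pooling the correct counts and totals of the two classes. The short script below recomputes them from the counts quoted on the slides; the dictionary layout is just a convenient way to hold those numbers.

```python
from typing import Dict, Tuple

# (correct, total) per class, copied from the per-class result slides.
results: Dict[str, Dict[str, Tuple[int, int]]] = {
    "adult":   {"negative": (403, 516),   "positive": (2605, 2684)},
    "smiling": {"negative": (1354, 1520), "positive": (1450, 1672)},
    "serious": {"negative": (1517, 1712), "positive": (1311, 1484)},
}

overall: Dict[str, float] = {}
for name, classes in results.items():
    correct = sum(c for c, _ in classes.values())
    total = sum(t for _, t in classes.values())
    overall[name] = correct / total
    print(f"{name}: {correct}/{total} = {overall[name]:.1%}")
```

The pooled totals are 3200, 3192 and 3196 faces respectively, and the rounded overall accuracies come out at 94%, 88% and 88%, matching the slides.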
Could we do better? • Representation is lacking • But results are surprisingly good • For realtime, need auto-alignment / rescaling, or a better representation • Could this lead to an invasion of cheap intelligent cameras, each with tacky switches for feature detection and marketing?