Describing People: A Poselet-Based Approach to Attribute Classification
Lubomir Bourdev (1,2), Subhransu Maji (1), Jitendra Malik (1)
(1) EECS, U.C. Berkeley; (2) Adobe Systems Inc.
Goal: Extract attributes from images of people. Who has long hair? Who has short pants?
Prior work on Poselets
• Introduced by [Bourdev and Malik, ICCV09]
• Detection with poselets [Bourdev et al, ECCV10]
• Applications
  • Segmentation [Brox et al, ECCV10] [Maire et al, ICCV11]
  • Actions [Yang et al, CVPR10] [Maji et al, CVPR11] [Yao et al, ICCV11]
  • Human parsing [Wang et al, CVPR11]
  • Semantic contours [Hariharan et al, ICCV11]
  • Subordinate level categorization [Farrell et al, ICCV11]
Prior work on Attributes
• Attributes as intermediate parts
• Discovering attributes from text
• Discovering attributes from images
• Attributes from motion capture
• Joint learning of classes & attributes
• Image retrieval with attributes
• Attributes and actions
• Active learning with attributes
• Attributes of people
• Gender attribute
References: [Cottrell and Metcalfe, NIPS90] [Golomb et al, NIPS90] [Moghaddam & Yang, PAMI02] [Ferrari & Zisserman, NIPS07] [Kumar et al, ECCV08] [Gallagher and Chen, CVPR08] [Cao et al, ACM08] [Lampert et al, CVPR09] [Farhadi et al, CVPR09] [Wang et al, BMVC09] [Wang and Forsyth, ICCV09] [Kumar et al, ICCV09] [Farhadi et al, CVPR10] [Berg et al, ECCV10] [Wang and Mori, ECCV10] [Sigal et al, ECCV10] [Branson et al, ECCV10] [Hwang et al, CVPR11] [Parikh and Grauman, CVPR11] [Douze et al, CVPR11] [Kovashka et al, ICCV11] [Liu et al, CVPR11] [Qiu et al, ICCV11] [Yao et al, ICCV11] [Dhar et al, CVPR11] [Parikh and Grauman, ICCV11] [Siddiquie et al, CVPR11]
Poselets [Bourdev & Malik ICCV09]
Poselets
Examples may differ visually but have common semantics
Finding correspondences at training time
Given part of a human pose, how do we find a similar pose configuration in the training set?
Finding correspondences at training time
• We use keypoints to annotate the joints, eyes, nose, etc. of people
• [Figure: annotated keypoints, e.g. Left Shoulder, Left Hip]
Finding correspondences at training time Residual Error
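The slides name a residual error between keypoint configurations but do not define it. As a minimal sketch, assume it is the least-squares residual left after aligning one patch's 2D keypoints to another's with a similarity transform (translation, rotation, uniform scale), normalized by the spread of the target keypoints. The function name `alignment_residual` and the normalization are illustrative assumptions, not the authors' definition.

```python
def alignment_residual(src, dst):
    """Residual after aligning src keypoints to dst with a 2D similarity transform.

    src, dst: equal-length lists of (x, y) tuples for the same named keypoints.
    Returns a value in [0, 1]; 0.0 means dst is an exact translated, rotated,
    scaled copy of src. (Assumed definition; the slides do not give one.)
    """
    if len(src) != len(dst) or len(src) < 2:
        raise ValueError("need at least two corresponding keypoints")
    # Represent points as complex numbers: a 2D rotation plus scale is then
    # multiplication by a single complex coefficient.
    z = [complex(x, y) for x, y in src]
    w = [complex(x, y) for x, y in dst]
    n = len(z)
    z_mean = sum(z) / n
    w_mean = sum(w) / n
    zc = [p - z_mean for p in z]  # centering removes translation
    wc = [p - w_mean for p in w]
    z_energy = sum(abs(p) ** 2 for p in zc)
    w_energy = sum(abs(p) ** 2 for p in wc)
    if z_energy == 0 or w_energy == 0:
        raise ValueError("keypoints must not all coincide")
    # Closed-form least-squares rotation and scale.
    a = sum(p.conjugate() * q for p, q in zip(zc, wc)) / z_energy
    residual = sum(abs(q - a * p) ** 2 for p, q in zip(zc, wc))
    return residual / w_energy
```

Under this assumed definition, a patch whose keypoints are a rotated, scaled, shifted copy of the seed's keypoints scores near zero, and the score grows as the pose deviates.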
Training poselet classifiers
• Given a seed patch
• Find the closest patch for every other person
• Sort them by residual error
• Threshold them
• [Figure: example patches labeled with residual error: 0.15, 0.20, 0.10, 0.35, 0.15, 0.85]
Training poselet classifiers
• Given a seed patch
• Find the closest patch for every other person
• Sort them by residual error
• Threshold them
• Use them as positive training examples to train a linear SVM with HOG features
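The sort-and-threshold steps can be sketched as follows, using the six residual values shown on the slide. The threshold of 0.30 and the function name `select_positive_examples` are assumptions for illustration; the slides do not state the cutoff. HOG extraction and SVM training are not covered here.

```python
def select_positive_examples(residuals, threshold):
    """Rank candidate patches by residual error and keep those under a cutoff.

    residuals: one residual error per candidate patch (lower is a closer match).
    Returns the indices of the kept candidates, best match first.
    """
    ranked = sorted(range(len(residuals)), key=lambda i: residuals[i])
    return [i for i in ranked if residuals[i] <= threshold]


# Residual errors from the slide; the 0.30 threshold is a hypothetical value.
residuals = [0.15, 0.20, 0.10, 0.35, 0.15, 0.85]
kept = select_positive_examples(residuals, threshold=0.30)
print(kept)  # [2, 0, 4, 1]
```

With this cutoff the patches with residuals 0.35 and 0.85 are discarded, and the remaining four would serve as positive examples for the poselet classifier.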
Goal: Extract attributes of this person
Input:
• Target person bounds
• Bounds of other nearby people
Step 1: Detect poselet activations • [Bourdev et al, ECCV10]
Step 2: Cluster the activations • [Bourdev et al, ECCV10]
Step 3: Predict person bounds • [Bourdev et al, ECCV10]
Step 4: Identify the correct cluster • Max-flow in bipartite graph
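The slide gives only "max-flow in bipartite graph" for this step. One way to read it, sketched below under stated assumptions, is a maximum bipartite matching between the given person bounds and the predicted bounds of each cluster, with an edge wherever two boxes overlap enough. Augmenting-path matching is equivalent to max-flow with unit capacities. The box format, the IoU edge test, the `min_iou` value, and both function names are hypothetical.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x0, y0, x1, y1)."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def match_clusters_to_people(cluster_boxes, person_boxes, min_iou=0.3):
    """Maximum bipartite matching; returns {person_index: cluster_index}."""
    # Edge between a person and a cluster when their boxes overlap enough.
    edges = [
        [c for c, cb in enumerate(cluster_boxes) if iou(pb, cb) >= min_iou]
        for pb in person_boxes
    ]
    cluster_owner = {}  # cluster index -> person index

    def try_assign(p, seen):
        # Search for an augmenting path starting at person p.
        for c in edges[p]:
            if c in seen:
                continue
            seen.add(c)
            if c not in cluster_owner or try_assign(cluster_owner[c], seen):
                cluster_owner[c] = p
                return True
        return False

    for p in range(len(person_boxes)):
        try_assign(p, set())
    return {p: c for c, p in cluster_owner.items()}
```

If the target person is listed first in `person_boxes`, the cluster at key `0` of the returned dictionary would be the one whose poselet activations are used for that person. Matching all people jointly keeps a neighbor's cluster from being assigned to the target when the boxes overlap.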
Start with its poselet activations Poselet Activations
Features
• Pyramid HOG
• LAB histogram
• Skin features
  • Hands-skin
  • Legs-skin
[Figure panels: Poselet patch, Skin mask, Arms mask, B .* C. Pipeline: Poselet Activations -> Features]
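The "B .* C" label in the figure reads as an elementwise product of two masks. Assuming B is the skin mask and C is the arms mask (the slide text does not say which panel is which), a hands-skin feature could be summarized as the fraction of the arms region that is skin. The function name `masked_skin_fraction` and the choice of summary statistic are illustrative assumptions.

```python
def masked_skin_fraction(skin_mask, part_mask):
    """Fraction of a body-part mask that is covered by skin.

    skin_mask, part_mask: same-shape 2D lists with values in [0, 1].
    The elementwise product keeps skin only where the part mask is active.
    """
    same_rows = len(skin_mask) == len(part_mask)
    if not same_rows or any(len(s) != len(p) for s, p in zip(skin_mask, part_mask)):
        raise ValueError("masks must have the same shape")
    product = [
        [s * p for s, p in zip(skin_row, part_row)]
        for skin_row, part_row in zip(skin_mask, part_mask)
    ]
    part_total = sum(sum(row) for row in part_mask)
    if part_total == 0:
        return 0.0  # part not visible in this patch
    return sum(sum(row) for row in product) / part_total
```

The same function would apply to a legs mask for the legs-skin feature, again under the assumption that the feature is a masked skin fraction.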
Attribute Classification Overview
• Poselet Activations -> Features -> Poselet-level Attribute Classifiers -> Person-level Attribute Classifiers -> Context-level Attribute Classifiers
Our dataset
• Source: VOC 2010 trainval for Person + H3D
• ~8000 annotations (4000 train + 4000 test)
• 9 binary attributes specified by 5 independent annotators via AMT
• Ground truth label: If 4 of the 5 agree
• Dataset will be made publicly available
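The 4-of-5 agreement rule can be sketched as below. The slide does not say what happens when fewer than four annotators agree; returning `None` (no ground-truth label) in that case is an assumption, as is the function name `aggregate_label`.

```python
def aggregate_label(votes, min_agree=4):
    """Combine annotator votes for one binary attribute of one person.

    votes: list of booleans, one per annotator.
    Returns True or False when at least min_agree annotators agree,
    otherwise None (assumed here to mean "no ground-truth label").
    """
    yes = sum(1 for v in votes if v)
    no = len(votes) - yes
    if yes >= min_agree:
        return True
    if no >= min_agree:
        return False
    return None


print(aggregate_label([True, True, True, True, False]))   # True
print(aggregate_label([True, True, True, False, False]))  # None
```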
Visual search on our test set “Wears hat” “Female”
“Has long hair” “Wears glasses”
“Wears shorts” “Has long sleeves”