Pattern Recognition, Lecture 1 - Overview Jim Rehg School of Interactive Computing Georgia Institute of Technology Atlanta, Georgia USA June 12, 2007
Goal • Learn a function that maps features x to predictions C, given a dataset D = {Ck , xk} • Elements of the problem • Knowledge about data-generating process and task • Design of feature space for x based on data • Decision rule f : x → C’ • Loss function L(C’,C) for measuring quality of prediction • Learning algorithm for computing f from D • Empirical measurement of classifier performance • Visualization of classifier performance and data properties • Computational cost of classification (and learning)
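The elements above can be made concrete with a minimal sketch (illustrative only; the names zero_one_loss and empirical_error are not from the lecture): a 0-1 loss L(C’,C) and the empirical error of a decision rule f over a dataset D = {(Ck, xk)}.

```python
def zero_one_loss(c_pred, c_true):
    """0-1 loss L(C', C): 1 for a misclassification, 0 otherwise."""
    return 0 if c_pred == c_true else 1

def empirical_error(f, dataset):
    """Average loss of decision rule f over D = {(C_k, x_k)},
    i.e. the fraction of examples f gets wrong."""
    return sum(zero_one_loss(f(x), c) for c, x in dataset) / len(dataset)
```

Usage: with f = lambda x: 1 if x > 0 else 0 and a toy D, empirical_error(f, D) measures how often the rule's prediction disagrees with the label.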
Example: Skin Detection in Web Images • Images containing people are interesting • Most images with people in them contain visible skin • Skin can be detected in images based on its color. • Goal: Automatic detection of “adult” images • DEC Cambridge Research Lab, 1998
Physics of Skin Color • Skin color is due to melanin and hemoglobin. • Hue (normalized color) of skin is largely invariant across the human population. • Saturation of skin color varies with concentration of melanin and hemoglobin (e.g. lips). • Detailed color models exist for melanoma identification using calibrated illumination. • But observed skin color will be affected by lighting, image acquisition device, etc.
Skin Classification Via Statistical Inference • Joint work with Michael Jones at DEC CRL • M. Jones and J. M. Rehg, “Statistical Color Models with Application to Skin Detection”, IJCV, 2001. • Model color distribution in skin and nonskin cases • Estimate p(RGB | skin) and p(RGB | nonskin) • Decision rule: f : RGB → {“skin”, “nonskin”} • Pixel is “skin” when p(skin | RGB) > p(nonskin | RGB) • Data set D • 12,000 example photos sampled from a 2 million image set obtained from an AltaVista web crawl • 1 billion hand-labeled pixels in training set
Some Example Photos Example skin images Example non-skin images
Manually Labeling Skin and Nonskin Labeled skin pixels are segmented by hand: Labeled nonskin pixels are easily obtained from images without people
Skin Color Modeling Using Histograms • Feature space design • Standard RGB color space - easily available, efficient • Histogram probability model P(RGB | skin) P(RGB | nonskin)
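A minimal sketch of the histogram probability model, assuming 8-bit RGB pixels and equal-width bins (the helper names and the choice of 32 bins per channel are illustrative, not taken from the paper):

```python
import numpy as np

def color_histogram(pixels, bins=32):
    """Build a normalized 3D color histogram from an (N, 3) array of
    RGB pixels with values in [0, 256). Each axis is quantized into
    `bins` equal-width cells; the result sums to 1 and can be read
    as an estimate of P(RGB | class)."""
    hist, _ = np.histogramdd(pixels, bins=(bins, bins, bins),
                             range=((0, 256),) * 3)
    return hist / hist.sum()

def lookup(hist, pixel, bins=32):
    """Return the histogram probability of a single RGB pixel."""
    idx = tuple((np.asarray(pixel) * bins // 256).astype(int))
    return hist[idx]
```

Building one histogram from hand-labeled skin pixels and one from nonskin pixels gives the two class-conditional models used by the decision rule.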
Skin Color Histogram Segmented skin regions produce a histogram in RGB space showing the distribution of skin colors. Three views of the same skin histogram are shown:
Non-Skin Color Histogram Three views of the same non-skin histogram showing the distribution of non-skin colors:
Decision Rule • Class labels: “skin” C = 1, “nonskin” C = 0 • Decide f = 1 (“skin”) when p(skin | RGB) > p(nonskin | RGB), otherwise f = 0 • Equivalently, by Bayes rule: f = 1 when p(RGB | skin) p(skin) > p(RGB | nonskin) p(nonskin), otherwise f = 0
Likelihood Ratio Test • Decide f = 1 when p(RGB | skin) / p(RGB | nonskin) > θ, otherwise f = 0 • The threshold θ corresponds to the ratio of class priors p(nonskin) / p(skin) and is usually treated as a parameter (threshold) which is adjusted to trade off between types of errors
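The likelihood ratio test can be sketched in a few lines (written with a multiplication rather than a division so a zero nonskin probability does not cause a divide-by-zero; the function name is illustrative):

```python
def skin_decision(p_skin, p_nonskin, theta):
    """Likelihood ratio test: f = 1 ("skin") when
    p(RGB|skin) / p(RGB|nonskin) > theta, else f = 0.
    theta plays the role of the prior ratio p(nonskin)/p(skin)
    and is swept to trade off the two types of errors."""
    return 1 if p_skin > theta * p_nonskin else 0
```

Raising theta makes the classifier more conservative (fewer false positives, lower detection rate); lowering it does the opposite.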
Skin Classifier Architecture • Input image → per-pixel lookup of P(RGB | skin) and P(RGB | nonskin) → likelihood ratio test (f = 1 vs. f = 0) → output “skin” mask
Measuring Classifier Quality • Given a testing set T = {Cj , xj} that was not used for training, apply the classifier to obtain predictions f(xj) • Using the indicator function I[B] for a boolean B (1 if B is true, 0 otherwise), the testing set is partitioned into four categories: true positives, false negatives, false positives, and true negatives
Measuring Classifier Quality • A standard convention is to report • Detection rate dR: the fraction of positive examples classified correctly • False positive rate fR: the fraction of negative examples classified incorrectly
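The four categories and the two reported rates can be computed directly from labels and predictions (a minimal sketch; the function name is illustrative):

```python
def detection_and_false_positive_rates(labels, preds):
    """Partition a test set into the four outcome categories
    (TP, FN, FP, TN) and report dR (fraction of positives
    classified correctly) and fR (fraction of negatives
    classified incorrectly). labels, preds: sequences of 0/1."""
    tp = sum(1 for c, f in zip(labels, preds) if c == 1 and f == 1)
    fn = sum(1 for c, f in zip(labels, preds) if c == 1 and f == 0)
    fp = sum(1 for c, f in zip(labels, preds) if c == 0 and f == 1)
    tn = sum(1 for c, f in zip(labels, preds) if c == 0 and f == 0)
    dR = tp / (tp + fn)
    fR = fp / (fp + tn)
    return dR, fR
```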
Trading Off Types of Errors • Consider θ → 0 • Classifier always outputs f = 1 regardless of input • All positive examples correct, all negative examples incorrect • dR = 1 and fR = 1 • Consider θ → ∞ • Classifier always outputs f = 0 regardless of input • All positive examples incorrect, all negative examples correct • dR = 0 and fR = 0
ROC Curve • Each sample point on the ROC curve (detection rate dR vs. false positive rate fR) is obtained by scoring T with a particular threshold θ • Generating the ROC curve does not require classifier retraining
ROC Curve • A fair way to compare two classifiers is to show their ROC curves for the same T • ROC stands for “Receiver Operating Characteristic” and was originally developed for tuning radar receivers
Scalar Measures of Classifier Performance • Equal error rate • Area under the ROC curve
ROC Curve Summary • ROC curve gives an “application independent” measure of classifier performance • Performance reports based on a single point on the ROC curve are generally meaningless • Several possible scalar “summaries” • Area under the ROC curve • Equal error rate • Compute the ROC by iterating over the values of the threshold θ • Compute the detection and false positive rates on the testing set for each value of θ and plot the resulting point
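The sweep over θ can be sketched as follows, assuming each test pixel is summarized by its likelihood ratio p(RGB|skin)/p(RGB|nonskin) and a 0/1 label (illustrative helper, not the authors' code); note that only the threshold changes, so no retraining is involved:

```python
def roc_points(ratios, labels, thetas):
    """For each threshold theta, classify every example by the
    likelihood ratio test (f = 1 when ratio > theta) and record
    the resulting (fR, dR) point of the ROC curve."""
    pos = sum(labels)
    neg = len(labels) - pos
    pts = []
    for t in thetas:
        preds = [1 if r > t else 0 for r in ratios]
        tp = sum(p for p, c in zip(preds, labels) if c == 1)
        fp = sum(p for p, c in zip(preds, labels) if c == 0)
        pts.append((fp / neg, tp / pos))
    return pts
```

As θ sweeps from 0 to ∞ the points move from (1, 1) down to (0, 0), tracing the full curve.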
Example Results Skin examples: Nonskin examples:
Skin Detector Performance • Extremely good results considering that only the color of a single pixel is being used • Best published results (at the time) • One of the largest datasets used in a vision model (nearly 1 billion labeled pixels) • (ROC curve: detection rate dR vs. false positive rate fR) • But why does it work so well?
Analyzing the color distributions Why does it work so well? 2D color histogram for photos on the web projected onto a slice through the 3D histogram: Surface plot of the 2D histogram:
Contour Plots Full color model (includes skin and non-skin):
Contour Plots Continued Non-skin model: Skin model: Skin color distribution is surprisingly well-separated from the background distribution of color in web images
Adult Image Detection • Observation: Adult images usually contain large areas of skin • Output of skin detector can be used to create a feature vector for an image • Adult image classifier trained on feature vectors • Exploring joint image/text analysis • Pipeline: Image → Skin Detector → Skin Features → Neural net Classifier → Adult?, in parallel with HTML → Text Features → Classifier
More Examples • Examples classified as not adult • Example incorrectly classified as adult: closeups of faces are a failure mode due to large amounts of skin
Adult Image Detection Results
Two sets of HTML pages were collected. Crawl A: adult sites (2365 pages, 11323 images). Crawl B: non-adult sites (2692 pages, 13973 images).

                                                 image-based   text-based   combined "OR"
                                                 detector      detector     detector
% of adult images rated correctly (set A):       85.8%         84.9%        93.9%
% of non-adult images rated correctly (set B):   92.5%         98.9%        92.0%
Computational Cost Analysis • General image properties • Average width = 301 pixels • Average height = 269 pixels • Time to read an image = 0.078 sec • Skin Color Based Adult Image Detector • Time to classify = 0.043 sec • Implies a throughput of roughly 23 images/sec
Summary of Skin Detection Example • What are the factors that made skin detection successful? • Problem which seemed hard a priori but turned out to be easy (classes surprisingly separable) • Low dimensionality makes adequate data collection feasible and classifier design a non-issue • Intrinsic dimensions are clear a priori • Concentration of the nonskin model along the grey line is completely predictable from the design of perceptual color spaces
Perspectives on Pattern Recognition • Our goal is to uncover the underlying organization for what often seems to be a laundry list of methods: • Linear and Tree Classifiers • Gaussian Mixture Classifiers • Logistic Regression • Neural Networks • Support Vector Machines • Gaussian Process Classifiers • AdaBoost • …
Statistical Perspective • Statistical Inference Approach • Probability model p(C, x | θ), where θ is a vector of parameters estimated from D using statistical inference • Decision rule is derived from p(C, x | θ) • Two philosophical schools • Frequentist Statistics • Bayesian Statistics • Learning Theory Approach • Classifiers with distribution-free performance guarantees • Connections to CS theory, computability, etc. • Examples: PAC learning, structural risk minimization, etc.
Decision Theory Perspective • Three ways to obtain the decision rule f (x) • Generative Modeling • Model p(x | C) and p(C) using D • Obtain p(C | x) using Bayes Rule • Obtain the decision rule from the posterior • Advantages • Use p(x) for novelty detection • Sample from p(x) to generate synthetic data and assess model quality • Use p(C | x) to assess confidence in answer (reject region) • Easy to compose modules that output posterior probabilities
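The generative route can be illustrated with a toy 1-D problem: model p(x | C) for each class with a Gaussian, combine with priors, and obtain p(C | x) by Bayes rule (all parameter values below are illustrative assumptions, not from the lecture):

```python
import math

def gaussian_pdf(x, mu, sigma):
    """Class-conditional density p(x | C) modeled as a 1-D Gaussian."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))

def posterior(x, mu1, s1, prior1, mu0, s0):
    """Generative modeling: p(C=1 | x) via Bayes rule from the two
    class-conditional Gaussians and the prior p(C=1) = prior1."""
    p1 = gaussian_pdf(x, mu1, s1) * prior1
    p0 = gaussian_pdf(x, mu0, s0) * (1 - prior1)
    return p1 / (p1 + p0)
```

Because the full joint is modeled, the same pieces also give p(x) for novelty detection and allow sampling synthetic data, as listed above.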
Decision Theory Perspective (cont.) • Discriminative modeling • Obtain the posterior p(C | x) directly from D • Derive the decision rule from the posterior • Advantages • The posterior is often much simpler than the likelihood function • The posterior is more directly related to the classification rule and may yield fewer prediction errors
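The discriminative route fits p(C | x) directly without ever modeling p(x | C). A standard example is logistic regression, sketched here for 1-D data with plain gradient ascent (a toy illustration, not the lecture's method):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(xs, cs, lr=0.1, steps=3000):
    """Discriminative modeling: fit the posterior
    p(C=1 | x) = sigmoid(w*x + b) directly from data (xs, cs)
    by gradient ascent on the log-likelihood."""
    w, b = 0.0, 0.0
    for _ in range(steps):
        gw = gb = 0.0
        for x, c in zip(xs, cs):
            err = c - sigmoid(w * x + b)  # residual drives the update
            gw += err * x
            gb += err
        w += lr * gw / len(xs)
        b += lr * gb / len(xs)
    return w, b
```

The decision rule then thresholds the fitted posterior at 1/2, with no class-conditional densities in sight.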