ECSE 6610 Pattern Recognition
Professor Qiang Ji, Spring 2011
Pattern Recognition Overview
(figure: pipeline diagram. Training raw data passes through feature extraction and training to produce a learned classifier/regressor; testing raw data passes through the same feature extraction, and the learned classifier/regressor maps the test features to output values)
• Feature extraction: extract the most discriminative features to concisely represent the original data, typically involving dimensionality reduction
• Training/learning: learn a mapping function that maps input features to output values
• Classification/regression: map the input to a discrete output value for classification, and to a continuous output value for regression
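To make the pipeline concrete, here is a minimal Python sketch, not the course's code: a toy "feature extraction" step (mean centering) followed by training and testing a nearest-centroid classifier. All function names and data are illustrative assumptions.

```python
import numpy as np

def extract_features(raw, mean):
    # Toy "feature extraction": center the raw data. Real feature
    # extraction (e.g., PCA) would also reduce dimensionality.
    return raw - mean

def train(features, labels):
    # Learn a mapping from features to labels: one centroid per class.
    classes = np.unique(labels)
    centroids = np.array([features[labels == c].mean(axis=0) for c in classes])
    return classes, centroids

def classify(features, model):
    # Map each input to the discrete label of the nearest class centroid.
    classes, centroids = model
    d = np.linalg.norm(features[:, None, :] - centroids[None, :, :], axis=2)
    return classes[np.argmin(d, axis=1)]

rng = np.random.default_rng(0)
raw_train = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(3, 1, (50, 2))])
y_train = np.array([0] * 50 + [1] * 50)
mu = raw_train.mean(axis=0)

model = train(extract_features(raw_train, mu), y_train)
raw_test = rng.normal(3, 1, (5, 2))
print(classify(extract_features(raw_test, mu), model))  # mostly class 1
```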
Pattern Recognition Overview (cont’d)
• Supervised learning: both the input (features) and the output (class labels) are provided
• Unsupervised learning: only the input is given
  • Clustering
  • Dimensionality reduction
  • Density estimation
• Semi-supervised learning: some inputs have output labels and others do not
Examples of Pattern Recognition Applications
• Computer/machine vision: object recognition, activity recognition, image segmentation, inspection
• Medical imaging: cell classification
• Optical character recognition: machine- or hand-written character/digit recognition
• Brain-computer interfaces: classifying human brain states from EEG signals
• Speech recognition: speaker recognition, speech understanding, language translation
• Robotics: obstacle detection, scene understanding, navigation
Probability Calculus
• U is the sample space; an event X is a subset of the outcomes in U
• For any events X and Y: P(X ∨ Y) = P(X) + P(Y) - P(X ∧ Y)
• If X and Y are mutually exclusive (X ∧ Y = ∅), this reduces to P(X ∨ Y) = P(X) + P(Y)
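A quick sanity check of the inclusion-exclusion identity by enumerating a toy sample space, a single fair die roll; the events below are made up for illustration.

```python
from fractions import Fraction

# Verify P(X or Y) = P(X) + P(Y) - P(X and Y) on a fair die roll,
# with X = "even" and Y = "greater than 3".
U = {1, 2, 3, 4, 5, 6}
X = {2, 4, 6}
Y = {4, 5, 6}

def prob(event):
    # Each outcome in U is equally likely.
    return Fraction(len(event), len(U))

assert prob(X.union(Y)) == prob(X) + prob(Y) - prob(X.intersection(Y))
print(prob(X.union(Y)))  # 2/3
```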
Probability Calculus (cont’d)
• Conditional independence: A and B are conditionally independent given C if P(A, B | C) = P(A | C) P(B | C), equivalently P(A | B, C) = P(A | C)
• The chain rule: given three events A, B, C, P(A, B, C) = P(A) P(B | A) P(C | A, B)
The Rules of Probability
• Sum rule: p(X) = Σ_Y p(X, Y)
• Product rule: p(X, Y) = p(Y | X) p(X)
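Both rules are easy to verify numerically on a small joint distribution stored as a table; the probabilities below are arbitrary illustrative numbers.

```python
import numpy as np

# Sum and product rules on a small joint distribution p(X, Y),
# stored as a table with rows indexed by X and columns by Y.
p_xy = np.array([[0.10, 0.20],
                 [0.30, 0.40]])

# Sum rule: marginalize out Y to get p(X).
p_x = p_xy.sum(axis=1)              # [0.3, 0.7]

# Product rule: p(X, Y) = p(Y | X) p(X), so p(Y | X) = p(X, Y) / p(X).
p_y_given_x = p_xy / p_x[:, None]

# Recombining recovers the joint, confirming the product rule.
assert np.allclose(p_y_given_x * p_x[:, None], p_xy)
print(p_x, p_y_given_x)
```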
Bayes’ Theorem
p(Y | X) = p(X | Y) p(Y) / p(X), where p(X) = Σ_Y p(X | Y) p(Y)
posterior ∝ likelihood × prior
Bayes Rule
(figure: sample space partitioned into events A1, ..., A6)
• Based on the definition of conditional probability:
  p(A | B) = p(A, B) / p(B) = p(B | A) p(A) / p(B)
• For a partition {Ai} of the sample space and evidence E:
  p(Ai | E) = p(E | Ai) p(Ai) / p(E) = p(E | Ai) p(Ai) / Σ_i p(E | Ai) p(Ai)
• p(Ai | E) is the posterior probability given evidence E
• p(Ai) is the prior probability
• p(E | Ai) is the likelihood of the evidence given Ai
• p(E) is the probability of the evidence E
Bayesian Rule (cont’d)
P(H | E1, E2) = P(E2 | H, E1) P(H | E1) / P(E2 | E1)
Assuming E1 and E2 are independent given H, the above equation may be written as
P(H | E1, E2) = P(E2 | H) P(H | E1) / P(E2 | E1)
where P(H | E1) acts as the prior and P(E2 | H) is the likelihood of H given E2.
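This factorization justifies sequential updating: the posterior after E1 becomes the prior when E2 arrives. A small Python sketch with made-up probabilities:

```python
# Sequential Bayesian update under the conditional-independence
# assumption above: fold in E1, then E2. All numbers are illustrative.
p_h = 0.5                      # prior P(H)
lik_h = [0.9, 0.7]             # P(E1 | H),  P(E2 | H)
lik_not_h = [0.2, 0.4]         # P(E1 | ~H), P(E2 | ~H)

for l_h, l_n in zip(lik_h, lik_not_h):
    # The posterior after each piece of evidence becomes the next prior.
    p_h = l_h * p_h / (l_h * p_h + l_n * (1 - p_h))
print(p_h)  # P(H | E1, E2) ≈ 0.887
```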
A Simple Example Consider two related variables: 1. Drug (D) with values y or n 2. Test (T) with values +ve or –ve And suppose we have the following probabilities: P(D = y) = 0.001 P(T = +ve | D = y) = 0.8 P(T = +ve | D = n) = 0.01 These probabilities are sufficient to define a joint probability distribution. Suppose an athlete tests positive. What is the probability that he has taken the drug?
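The slide leaves the computation open; applying Bayes' rule with the numbers given:

```latex
P(D{=}y \mid T{=}{+})
  = \frac{P(T{=}{+} \mid D{=}y)\,P(D{=}y)}
         {P(T{=}{+} \mid D{=}y)\,P(D{=}y) + P(T{=}{+} \mid D{=}n)\,P(D{=}n)}
  = \frac{0.8 \times 0.001}{0.8 \times 0.001 + 0.01 \times 0.999}
  \approx 0.074
```

So even after a positive test, the probability that the athlete has taken the drug is only about 7.4%, because the prior P(D = y) = 0.001 is so small.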
Expectation (or Mean)
• For a discrete RV X: E[X] = Σ_x x p(x)
• For a continuous RV X: E[X] = ∫ x p(x) dx
• Conditional expectation: E[X | y] = Σ_x x p(x | y)
Expectations (cont’d)
• Conditional expectation (discrete): E[f | y] = Σ_x p(x | y) f(x)
• Approximate expectation (discrete and continuous): E[f] ≈ (1/N) Σ_{n=1}^{N} f(x_n), where the x_n are drawn from p(x)
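The approximate-expectation formula is just Monte Carlo averaging; a short numpy check, with the sampling distribution and f chosen purely for illustration:

```python
import numpy as np

# Monte Carlo version of the approximate-expectation formula:
# E[f] ≈ (1/N) Σ f(x_n) with x_n drawn from p(x).
rng = np.random.default_rng(0)
x = rng.normal(loc=0.0, scale=1.0, size=100_000)  # samples from p(x) = N(0, 1)

# For f(x) = x^2, the exact expectation is E[x^2] = 1 (the variance).
print(np.mean(x ** 2))  # close to 1.0
```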
Variance
• The variance of a RV X: var[X] = E[(X - E[X])^2] = E[X^2] - E[X]^2
• Standard deviation: σ = sqrt(var[X])
• Covariance of RVs X and Y: cov[X, Y] = E[(X - E[X])(Y - E[Y])] = E[XY] - E[X]E[Y]
• Chebyshev inequality: P(|X - E[X]| ≥ kσ) ≤ 1/k^2 for any k > 0
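These identities and the Chebyshev bound can be checked empirically with numpy; the exponential distribution below is an arbitrary choice, since the bound holds for any distribution with finite variance.

```python
import numpy as np

# Empirical check of the variance identity and the Chebyshev bound.
rng = np.random.default_rng(1)
x = rng.exponential(scale=2.0, size=200_000)

var = np.mean(x ** 2) - np.mean(x) ** 2        # E[x^2] - E[x]^2
sigma = np.sqrt(var)

# Chebyshev: P(|x - mu| >= k*sigma) <= 1/k^2, here with k = 2.
mu, k = np.mean(x), 2.0
tail = np.mean(np.abs(x - mu) >= k * sigma)
print(tail, "<=", 1 / k**2)                    # e.g. ~0.05 <= 0.25
```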
Independence
• If X and Y are independent, then
  p(x, y) = p(x) p(y)
  E[XY] = E[X] E[Y]
  cov[X, Y] = 0
Probability Densities
p(x) is the density function, while P(x) is the cumulative distribution:
P(x) = ∫_{-∞}^{x} p(z) dz, so p(x) = dP(x)/dx, with p(x) ≥ 0 and ∫ p(x) dx = 1.
P(x) is a non-decreasing function.
The Multivariate Gaussian
N(x | μ, Σ) = (1 / ((2π)^{D/2} |Σ|^{1/2})) exp(-(1/2) (x - μ)^T Σ^{-1} (x - μ))
where μ is the mean vector and Σ is the covariance matrix.
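A direct numpy translation of the density formula, as a sketch; scipy.stats.multivariate_normal provides the same computation, and the example μ and Σ below are made up.

```python
import numpy as np

def gaussian_pdf(x, mu, Sigma):
    # N(x | mu, Sigma) for a D-dimensional Gaussian.
    D = len(mu)
    diff = x - mu
    norm = 1.0 / np.sqrt((2 * np.pi) ** D * np.linalg.det(Sigma))
    return norm * np.exp(-0.5 * diff @ np.linalg.solve(Sigma, diff))

mu = np.array([0.0, 0.0])
Sigma = np.array([[1.0, 0.5],
                  [0.5, 2.0]])
print(gaussian_pdf(np.array([0.0, 0.0]), mu, Sigma))  # peak density ≈ 0.1203
```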
Minimum Misclassification Rate
Two types of mistakes:
• False positive (type 1): x from class C2 is assigned to region R1
• False negative (type 2): x from class C1 is assigned to region R2
p(mistake) = p(x ∈ R1, C2) + p(x ∈ R2, C1) = ∫_{R1} p(x, C2) dx + ∫_{R2} p(x, C1) dx
The above is called the Bayes error. The minimum Bayes error is achieved by placing the decision boundary at x0, the point where p(x0, C1) = p(x0, C2).
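For two 1-D Gaussian class-conditionals with equal priors (illustrative numbers, not from the slides), the threshold x0 and the Bayes error can be found numerically:

```python
import numpy as np

# Locate the minimum-error threshold x0 for two classes with 1-D
# Gaussian class-conditionals and equal priors.
def joint(x, mu, sigma, prior):
    # p(x, Ck) = p(x | Ck) P(Ck)
    return prior * np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

xs = np.linspace(-5.0, 10.0, 100_001)
p1 = joint(xs, mu=0.0, sigma=1.0, prior=0.5)   # class C1
p2 = joint(xs, mu=3.0, sigma=1.0, prior=0.5)   # class C2

x0 = xs[np.argmax(p2 > p1)]                    # first point where p(x, C2) exceeds p(x, C1)
bayes_error = np.trapz(np.minimum(p1, p2), xs) # area under the smaller joint
print(x0, bayes_error)                         # x0 ≈ 1.5, Bayes error ≈ 0.067
```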
Generative vs Discriminative
• Generative approach: model the class-conditional densities p(x | Ck) and priors p(Ck), then use Bayes’ theorem to obtain p(Ck | x)
• Discriminative approach: model the posterior p(Ck | x) directly
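A sketch contrasting the two approaches on 1-D data, under assumptions of my own choosing: a generative model with Gaussian class-conditionals plus Bayes' theorem, versus logistic regression (a discriminative model) fit by gradient ascent. All numbers are illustrative.

```python
import numpy as np

rng = np.random.default_rng(2)
x1 = rng.normal(0.0, 1.0, 500)   # class C1 samples
x2 = rng.normal(2.0, 1.0, 500)   # class C2 samples

# Generative: model p(x | Ck) as Gaussians, then apply Bayes' theorem.
def posterior_generative(x):
    p_x_c1 = np.exp(-0.5 * (x - x1.mean())**2 / x1.var()) / np.sqrt(2*np.pi*x1.var())
    p_x_c2 = np.exp(-0.5 * (x - x2.mean())**2 / x2.var()) / np.sqrt(2*np.pi*x2.var())
    return p_x_c2 * 0.5 / (p_x_c1 * 0.5 + p_x_c2 * 0.5)   # P(C2 | x)

# Discriminative: model p(Ck | x) directly with logistic regression,
# fit by gradient ascent on the log-likelihood.
X = np.concatenate([x1, x2])
t = np.concatenate([np.zeros(500), np.ones(500)])
w, b = 0.0, 0.0
for _ in range(2000):
    y = 1 / (1 + np.exp(-(w * X + b)))     # sigmoid
    w += 0.01 * np.mean((t - y) * X)
    b += 0.01 * np.mean(t - y)

# At the midpoint x = 1.0, both posteriors are ≈ 0.5 by symmetry.
print(posterior_generative(1.0), 1 / (1 + np.exp(-(w * 1.0 + b))))
```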