ECSE 6610 Pattern Recognition
Professor Qiang Ji
Spring, 2011
Pattern Recognition Overview
[Figure: block diagram. Training: Raw Data, Feature extraction, Features, Training (with Output Values), Learned Classifier/Regressor. Testing: Raw Data, Feature extraction, Features, Classification/Regression, Unknown Output Values.]
• Feature extraction: extract the most discriminative features to concisely represent the original data, typically involving dimensionality reduction.
• Training/Learning: learn a mapping function that maps input to output.
• Classification/regression: map the input to a discrete output value for classification and to a continuous output value for regression.
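As a rough illustration of the training and testing paths described above, the following sketch uses a made-up one-dimensional feature (the mean of each raw signal) and a nearest-class-mean classifier. The function names `extract_feature`, `train`, and `classify`, and the toy data, are assumptions for illustration and are not part of the course material.

```python
from statistics import mean

def extract_feature(raw):
    """Feature extraction: reduce a raw signal to one number (its mean)."""
    return mean(raw)

def train(raw_signals, labels):
    """Training: learn one feature mean per class from labeled data."""
    by_class = {}
    for raw, label in zip(raw_signals, labels):
        by_class.setdefault(label, []).append(extract_feature(raw))
    return {label: mean(feats) for label, feats in by_class.items()}

def classify(model, raw):
    """Classification: map a raw signal to the class with the closest mean."""
    x = extract_feature(raw)
    return min(model, key=lambda label: abs(model[label] - x))

# Hypothetical toy data: "low" signals hover near 0, "high" signals near 5.
train_raw = [[0.1, -0.2, 0.0], [0.3, 0.1, -0.1], [5.2, 4.9, 5.1], [4.8, 5.0, 5.3]]
train_labels = ["low", "low", "high", "high"]

model = train(train_raw, train_labels)
prediction = classify(model, [4.5, 5.5, 5.0])
print(prediction)
```

A regressor would follow the same two paths, with `classify` replaced by a function that returns a continuous value.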
Pattern Recognition Overview (cont’d)
• Supervised learning: both input (features) and output (class labels) are provided.
• Unsupervised learning: only input is given.
  - Clustering
  - Dimensionality reduction
  - Density estimation
• Semi-supervised learning: some inputs have output labels and others do not.
Examples of Pattern Recognition Applications
• Computer/Machine Vision: object recognition, activity recognition, image segmentation, inspection
• Medical Imaging: cell classification
• Optical Character Recognition: machine or handwritten character/digit recognition
• Brain Computer Interface: classify human brain states from EEG signals
• Speech Recognition: speaker recognition, speech understanding, language translation
• Robotics: obstacle detection, scene understanding, navigation
Probability Calculus
• U is the sample space.
• X is a subset of the outcomes, i.e., an event.
• $P(X \lor Y) = P(X) + P(Y) - P(X \land Y)$
• If $P(X \land Y) = 0$, i.e., X and Y are mutually exclusive, then $P(X \lor Y) = P(X) + P(Y)$.
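The addition rule above can be checked on a small sample space. The sketch below assumes a fair six-sided die as the sample space U. The events X, Y, Z and the helper `prob` are hypothetical choices for illustration.

```python
from fractions import Fraction

# Sample space U: the six faces of a fair die (an assumed toy example).
U = {1, 2, 3, 4, 5, 6}

def prob(event):
    """Probability of an event (a subset of U) under equally likely outcomes."""
    return Fraction(len(event & U), len(U))

X = {2, 4, 6}      # "even"
Y = {4, 5, 6}      # "greater than 3"
Z = {1, 3}         # shares no outcome with X, so X and Z are mutually exclusive

# General rule: P(X or Y) = P(X) + P(Y) - P(X and Y)
lhs = prob(X | Y)
rhs = prob(X) + prob(Y) - prob(X & Y)

# Mutually exclusive case: the intersection is empty, so probabilities add.
exclusive_sum = prob(X) + prob(Z)
print(lhs, rhs, prob(X | Z), exclusive_sum)
```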
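Probability Calculus (cont’d)
• Conditional independence: A and B are conditionally independent given C if $P(A, B \mid C) = P(A \mid C)\,P(B \mid C)$.
• The Chain Rule: given three events A, B, C, $P(A, B, C) = P(A \mid B, C)\,P(B \mid C)\,P(C)$.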
The Rules of Probability
• Sum Rule: $p(X) = \sum_{Y} p(X, Y)$
• Product Rule: $p(X, Y) = p(Y \mid X)\,p(X)$
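A small numerical check can make the sum and product rules concrete. The joint table below is an invented example over two binary variables, and the helper names `marginal_x` and `conditional_y_given_x` are assumptions for illustration.

```python
# Hypothetical joint distribution p(X, Y) over two binary variables.
joint = {
    ("x0", "y0"): 0.1, ("x0", "y1"): 0.3,
    ("x1", "y0"): 0.2, ("x1", "y1"): 0.4,
}

def marginal_x(x):
    """Sum rule: p(X) = sum over Y of p(X, Y)."""
    return sum(p for (xi, _), p in joint.items() if xi == x)

def conditional_y_given_x(y, x):
    """Conditional p(Y | X) = p(X, Y) / p(X)."""
    return joint[(x, y)] / marginal_x(x)

# Product rule: p(X, Y) = p(Y | X) p(X)
p_x0 = marginal_x("x0")
reconstructed = conditional_y_given_x("y1", "x0") * p_x0
print(p_x0, reconstructed)
```

Multiplying the conditional by the marginal recovers the joint entry, which is the product rule read from right to left.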
Bayes’ Theorem
$p(Y \mid X) = \frac{p(X \mid Y)\,p(Y)}{p(X)}$, where $p(X) = \sum_{Y} p(X \mid Y)\,p(Y)$
posterior ∝ likelihood × prior
Bayes Rule
[Figure: events A1 to A6]
$p(A \mid B) = \frac{p(A, B)}{p(B)} = \frac{p(B \mid A)\,p(A)}{p(B)}$
$p(A_i \mid E) = \frac{p(E \mid A_i)\,p(A_i)}{p(E)} = \frac{p(E \mid A_i)\,p(A_i)}{\sum_{j} p(E \mid A_j)\,p(A_j)}$
• Based on the definition of conditional probability
• p(Ai|E) is the posterior probability given evidence E
• p(Ai) is the prior probability
• p(E|Ai) is the likelihood of the evidence given Ai
• p(E) is the probability of the evidence E
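Bayesian Rule (cont’d)
For a hypothesis H and two pieces of evidence E1 and E2,
$p(H \mid E_1, E_2) = \frac{p(E_1, E_2 \mid H)\,p(H)}{p(E_1, E_2)}$
Assuming E1 and E2 are independent given H, this may be written as
$p(H \mid E_1, E_2) = \frac{p(E_2 \mid H)\,p(H \mid E_1)}{p(E_2 \mid E_1)}$
where $p(H \mid E_1)$ is the prior and $p(E_2 \mid H)$ is the likelihood of H given E2.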
A Simple Example
Consider two related variables:
1. Drug (D) with values y or n
2. Test (T) with values +ve or –ve
And suppose we have the following probabilities:
P(D = y) = 0.001
P(T = +ve | D = y) = 0.8
P(T = +ve | D = n) = 0.01
These probabilities are sufficient to define a joint probability distribution. Suppose an athlete tests positive. What is the probability that he has taken the drug?
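One way to answer the question is to apply Bayes' theorem directly to the three given probabilities. The sketch below does that arithmetic. The variable names are assumptions chosen for readability.

```python
# Probabilities given in the example.
p_drug = 0.001               # P(D = y)
p_pos_given_drug = 0.8       # P(T = +ve | D = y)
p_pos_given_clean = 0.01     # P(T = +ve | D = n)

# Evidence: total probability of a positive test, summed over D = y and D = n.
p_pos = p_pos_given_drug * p_drug + p_pos_given_clean * (1 - p_drug)

# Posterior via Bayes' theorem: P(D = y | T = +ve).
p_drug_given_pos = p_pos_given_drug * p_drug / p_pos
print(round(p_drug_given_pos, 4))
```

The posterior comes out at roughly 7.4%: because the prior P(D = y) is so small, most positive tests come from athletes who have not taken the drug.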
Expectation (or Mean)
• For discrete RV X: $E[X] = \sum_{x} x\,p(x)$
• For continuous RV X: $E[X] = \int x\,p(x)\,dx$
• Conditional Expectation: $E[X \mid Y = y] = \sum_{x} x\,p(x \mid y)$
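Expectations
• Conditional Expectation (discrete): $E_{x}[f \mid y] = \sum_{x} p(x \mid y)\,f(x)$
• Approximate Expectation (discrete and continuous): $E[f] \approx \frac{1}{N} \sum_{n=1}^{N} f(x_n)$, where the $x_n$ are samples drawn from $p(x)$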
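The approximate expectation is a sample average, so it can be sketched in a few lines. The example below assumes X is uniform on [0, 1] and f(x) = x², for which the exact expectation is 1/3. The helper name `approx_expectation` and the sample size are illustrative choices.

```python
import random

def approx_expectation(f, sampler, n):
    """Approximate E[f(X)] by averaging f over n samples drawn from sampler."""
    return sum(f(sampler()) for _ in range(n)) / n

rng = random.Random(0)  # fixed seed so the run is repeatable

# Assumed example: X uniform on [0, 1], f(x) = x**2, so E[f] = 1/3 exactly.
estimate = approx_expectation(lambda x: x * x, rng.random, 100_000)
print(estimate)
```

The estimate approaches 1/3 as the number of samples grows, with an error that shrinks roughly as one over the square root of N.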
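Variance
• The variance of a RV X: $\mathrm{Var}(X) = E[(X - E[X])^2] = E[X^2] - E[X]^2$
• Standard deviation: $\sigma_X = \sqrt{\mathrm{Var}(X)}$
• Covariance of RVs X and Y: $\mathrm{Cov}(X, Y) = E[(X - E[X])(Y - E[Y])] = E[XY] - E[X]\,E[Y]$
• Chebyshev inequality: $P(|X - E[X]| \ge k\,\sigma_X) \le \frac{1}{k^2}$ for any $k > 0$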
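The Chebyshev inequality bounds how much probability mass can lie k or more standard deviations from the mean, whatever the distribution. As a hedged illustration, the sketch below draws samples from an exponential distribution (an arbitrary choice) and compares the observed tail fraction with the bound 1/k². The helper `tail_fraction` is a name introduced here.

```python
import random
from statistics import fmean, pstdev

rng = random.Random(1)
# Assumed example: 50,000 draws from an exponential distribution with rate 1.
xs = [rng.expovariate(1.0) for _ in range(50_000)]

mu = fmean(xs)
sigma = pstdev(xs, mu)   # population standard deviation of the sample

def tail_fraction(k):
    """Fraction of samples at least k standard deviations from the mean."""
    return sum(abs(x - mu) >= k * sigma for x in xs) / len(xs)

for k in (2, 3, 4):
    # Chebyshev: the tail fraction cannot exceed 1 / k**2.
    print(k, tail_fraction(k), 1 / k**2)
```

The observed fractions are typically far below the bound, which reflects that Chebyshev holds for every distribution and is therefore loose for any particular one.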
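Independence
• If X and Y are independent, then
  - $p(x, y) = p(x)\,p(y)$
  - $E[XY] = E[X]\,E[Y]$
  - $\mathrm{Cov}(X, Y) = 0$
  - $\mathrm{Var}(X + Y) = \mathrm{Var}(X) + \mathrm{Var}(Y)$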
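Probability Densities
p(x) is the density function, while P(x) is the cumulative distribution: $P(z) = \int_{-\infty}^{z} p(x)\,dx$, with $p(x) \ge 0$ and $\int_{-\infty}^{\infty} p(x)\,dx = 1$. P(x) is a non-decreasing function.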
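The Multivariate Gaussian
$\mathcal{N}(\mathbf{x} \mid \boldsymbol{\mu}, \boldsymbol{\Sigma}) = \frac{1}{(2\pi)^{D/2}\,|\boldsymbol{\Sigma}|^{1/2}} \exp\left\{ -\frac{1}{2} (\mathbf{x} - \boldsymbol{\mu})^{T} \boldsymbol{\Sigma}^{-1} (\mathbf{x} - \boldsymbol{\mu}) \right\}$
μ = mean vector, Σ = covariance matrix, D = dimension of x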
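To make the roles of the mean vector and covariance matrix concrete, here is a minimal sketch that evaluates the standard multivariate Gaussian density in the two-dimensional case, using only the standard library. The function name `gaussian_pdf_2d` and the example values are assumptions. A general implementation would use a linear algebra library instead of the hand-written 2x2 inverse.

```python
import math

def gaussian_pdf_2d(x, mu, cov):
    """Density of a 2-D Gaussian at x, with mean mu and 2x2 covariance cov."""
    (a, b), (c, d) = cov
    det = a * d - b * c
    if a <= 0 or det <= 0:
        raise ValueError("covariance must be positive definite")
    # Inverse of a 2x2 matrix, written out explicitly.
    inv = ((d / det, -b / det), (-c / det, a / det))
    dx = (x[0] - mu[0], x[1] - mu[1])
    # Quadratic form (x - mu)^T cov^{-1} (x - mu).
    quad = (dx[0] * (inv[0][0] * dx[0] + inv[0][1] * dx[1])
            + dx[1] * (inv[1][0] * dx[0] + inv[1][1] * dx[1]))
    # Normalizer 1 / ((2*pi)^(D/2) * sqrt(det)) with D = 2.
    norm = 1.0 / (2 * math.pi * math.sqrt(det))
    return norm * math.exp(-0.5 * quad)

# Assumed example values for illustration: zero mean, identity covariance.
mu = (0.0, 0.0)
cov = ((1.0, 0.0), (0.0, 1.0))
peak = gaussian_pdf_2d((0.0, 0.0), mu, cov)
print(peak)
```

With an identity covariance the density peaks at the mean with value 1/(2π), and it falls off with the quadratic form in the exponent.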
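Minimum Misclassification Rate
Two types of mistakes:
• False positive (type 1)
• False negative (type 2)
$p(\text{mistake}) = p(x \in R_1, C_2) + p(x \in R_2, C_1) = \int_{R_1} p(x, C_2)\,dx + \int_{R_2} p(x, C_1)\,dx$
where $R_k$ is the region in which x is assigned to class $C_k$. This probability is called the Bayes error. The minimum Bayes error is achieved at the decision boundary x0, where $p(x_0, C_1) = p(x_0, C_2)$.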
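The idea that the error is smallest at one particular boundary x0 can be explored numerically. The sketch below assumes a toy one-dimensional problem with two Gaussian classes of equal prior and unit variance, and a single threshold separating the decision regions. All names and numbers are illustrative assumptions. It sums the two kinds of mistake for each candidate threshold and picks the smallest.

```python
import math

def norm_cdf(x, mean, std):
    """Cumulative distribution of a 1-D Gaussian."""
    return 0.5 * (1.0 + math.erf((x - mean) / (std * math.sqrt(2.0))))

# Assumed toy problem: two classes with equal priors and unit variance.
PRIOR1, MEAN1 = 0.5, 0.0   # class C1, decided when x < threshold (region R1)
PRIOR2, MEAN2 = 0.5, 2.0   # class C2, decided when x >= threshold (region R2)

def p_mistake(threshold):
    """P(x in R1, C2) + P(x in R2, C1) for a single threshold."""
    c2_in_r1 = PRIOR2 * norm_cdf(threshold, MEAN2, 1.0)
    c1_in_r2 = PRIOR1 * (1.0 - norm_cdf(threshold, MEAN1, 1.0))
    return c2_in_r1 + c1_in_r2

# Grid search over candidate thresholds between -1 and 3.
candidates = [-1.0 + 0.01 * i for i in range(401)]
best = min(candidates, key=p_mistake)
print(best, p_mistake(best))
```

In this symmetric setup the best threshold lands midway between the two class means, which is where the two joint densities cross. Moving the threshold in either direction trades one type of mistake for more of the other.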
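Generative vs Discriminative
• Generative approach: model the class-conditional densities $p(x \mid C_k)$ and priors $p(C_k)$ (equivalently, the joint $p(x, C_k)$), then use Bayes’ theorem to obtain $p(C_k \mid x)$.
• Discriminative approach: model $p(C_k \mid x)$ directly.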