Pattern Recognition
Ku-Yaw Chang (canseco@mail.dyu.edu.tw)
Assistant Professor, Department of Computer Science and Information Engineering, Da-Yeh University
Outline
• Introduction
• Features and Classes
• Supervised vs. Unsupervised
• Statistical vs. Structural (Syntactic)
• Statistical Decision Theory
Supervised vs. Unsupervised
• Supervised learning
  • Uses a training set of patterns of known class to classify additional, similar samples
• Unsupervised learning
  • Divides samples into groups or clusters based on measures of similarity, without any prior knowledge of class membership
Supervised vs. Unsupervised
Dividing the class into two groups:
• Supervised learning
  • Male features
  • Female features
• Unsupervised learning
  • Male vs. Female
  • Tall vs. Short
  • With vs. Without glasses
  • …
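The contrast between the two settings can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the heights and labels are invented, the supervised part uses a simple nearest-mean rule, and the unsupervised part uses a basic two-cluster averaging loop. None of these choices come from the slides.

```python
from statistics import mean

def train_nearest_mean(samples, labels):
    """Supervised: use the known class labels to compute one mean per class."""
    classes = sorted(set(labels))
    return {c: mean(x for x, y in zip(samples, labels) if y == c) for c in classes}

def classify(x, class_means):
    """Assign x to the class whose mean is closest."""
    return min(class_means, key=lambda c: abs(x - class_means[c]))

def two_means(samples, iterations=10):
    """Unsupervised: split samples into two clusters without using any labels."""
    centers = [min(samples), max(samples)]  # simple deterministic starting point
    groups = [[], []]
    for _ in range(iterations):
        groups = [[], []]
        for x in samples:
            nearest = 0 if abs(x - centers[0]) <= abs(x - centers[1]) else 1
            groups[nearest].append(x)
        # keep the old center if a cluster happens to be empty
        centers = [mean(g) if g else c for g, c in zip(groups, centers)]
    return groups

# Hypothetical heights in cm and hypothetical class labels (invented for illustration)
heights = [152, 158, 160, 163, 175, 178, 181, 185]
labels = ["F", "F", "F", "F", "M", "M", "M", "M"]

class_means = train_nearest_mean(heights, labels)
print(classify(170, class_means))   # label predicted from the labeled training set
print(two_means(heights))           # two unlabeled groups found by similarity alone
```

Note that the clusters returned by two_means carry no names: whether they correspond to male vs. female or to tall vs. short is an interpretation added afterwards.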
Statistical vs. Structural
• Statistical PR
  • Obtains features by manipulating the measurements as purely numerical (or Boolean) variables
• Structural (Syntactic) PR
  • Designs features in some intuitive way that corresponds to human perception of the objects
Statistical vs. Structural
• Optical Character Recognition (OCR)
  • Statistical PR
  • Structural PR
Statistical Decision Theory
• An automated classification system
  • Classified data sets
  • Selected features
Statistical Decision Theory
• Hypothetical Basketball Association (HBA)
  • apg: average number of points per game
• To predict the winner of the game
  • Based on the difference between the home team’s apg and the visiting team’s apg for previous games
• Training set
  • Scores of previously played games
  • Home team classified as a winner or a loser
Statistical Decision Theory
• Given a game to be played, predict whether the home team will be a winner or a loser using the feature:
  dapg = Home Team apg – Visiting Team apg
Statistical Decision Theory
• A histogram of dapg
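One way such a per-class histogram could be built is sketched below. The dapg values are invented for illustration (they are not the HBA training data), and the helper name histogram and the interval width of 2.0 are assumptions of this sketch.

```python
import math
from collections import Counter

def histogram(values, width=1.0):
    """Count how many values fall in each interval [left, left + width)."""
    counts = Counter(math.floor(v / width) * width for v in values)
    return dict(sorted(counts.items()))

# Hypothetical dapg values, invented for illustration (not the HBA data)
won_dapg = [1.3, 4.2, 2.7, -0.5, 6.1, 2.1]
lost_dapg = [-3.4, -1.2, 0.4, -5.8, -1.9]

for name, values in (("won", won_dapg), ("lost", lost_dapg)):
    print(name)
    for left, count in histogram(values, width=2.0).items():
        # one '#' per game whose dapg falls in the interval
        print(f"  [{left:5.1f}, {left + 2.0:5.1f}) {'#' * count}")
```

Plotting the two histograms on the same axis shows where the classes overlap, which is where a single threshold must make mistakes.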
Statistical Decision Theory
• The classification cannot be performed perfectly using the single feature dapg.
• Probability of membership in each class
  • Make the decision with the smallest expected penalty
• Decision boundary or threshold
  • The value T for which the home team is predicted to have
    • Lost: dapg is less than or equal to T
    • Won: dapg is greater than T
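The idea of choosing the decision with the smallest expected penalty can be illustrated as follows. The class probabilities and penalty values are hypothetical, and the function names are assumptions of this sketch rather than anything defined in the slides.

```python
def expected_penalty(decision, class_probs, penalty):
    """Average penalty of a decision, weighted by the probability of each true class."""
    return sum(p * penalty[(decision, true_class)]
               for true_class, p in class_probs.items())

def best_decision(class_probs, penalty):
    """Pick the decision whose expected penalty is smallest."""
    decisions = sorted({d for d, _ in penalty})
    return min(decisions, key=lambda d: expected_penalty(d, class_probs, penalty))

# Hypothetical class membership probabilities for one game
class_probs = {"won": 0.7, "lost": 0.3}

# penalty[(decision, true_class)]: equal cost for either kind of mistake
equal_penalty = {("won", "won"): 0, ("won", "lost"): 1,
                 ("lost", "won"): 1, ("lost", "lost"): 0}

# Hypothetical case where wrongly predicting a win is five times as costly
skewed_penalty = {("won", "won"): 0, ("won", "lost"): 5,
                  ("lost", "won"): 1, ("lost", "lost"): 0}

print(best_decision(class_probs, equal_penalty))
print(best_decision(class_probs, skewed_penalty))
```

With equal penalties the more probable class is chosen; with the skewed penalties the decision can flip even though the probabilities are unchanged.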
Statistical Decision Theory
• T = -1
  • Home team’s apg = 103.4
  • Visiting team’s apg = 102.1
  • dapg = 103.4 – 102.1 = 1.3 and 1.3 > T
  • The home team is predicted to win the game
• T = 0.8 or T = -6.5
  • T = 0.8 achieves the minimum error rate
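The threshold rule from the worked example (predict a home win when dapg > T) and a brute-force search for a threshold with the fewest training errors can be sketched as below. The training pairs are invented for illustration, so the threshold found here is unrelated to the T = 0.8 reported for the HBA data. The function names are assumptions of this sketch.

```python
def dapg(home_apg, visiting_apg):
    """Feature: home team's apg minus visiting team's apg."""
    return home_apg - visiting_apg

def predict_home_win(d, threshold):
    """Predict a home win when dapg exceeds the threshold T."""
    return d > threshold

def count_errors(samples, threshold):
    """Number of training games whose outcome the threshold rule gets wrong."""
    return sum(predict_home_win(d, threshold) != won for d, won in samples)

def best_threshold(samples):
    """Try each observed dapg value as T and return the first one with the fewest errors."""
    candidates = sorted({d for d, _ in samples})
    # also allow a threshold below every sample ("always predict a win")
    candidates.insert(0, candidates[0] - 1.0)
    return min(candidates, key=lambda t: count_errors(samples, t))

# Worked example from the slide, with T = -1
d = dapg(103.4, 102.1)
print(round(d, 1), predict_home_win(d, -1))

# Hypothetical training set of (dapg, home_team_won) pairs, invented for illustration
training = [(-5.8, False), (-3.4, False), (-1.9, False), (-1.2, True),
            (0.4, False), (1.3, True), (2.1, True), (2.7, True), (4.2, True)]
t = best_threshold(training)
print(t, count_errors(training, t))
```

Only the observed dapg values need to be tried as candidates, because the number of errors can change only when T crosses one of them.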
Statistical Decision Theory
• Adding another feature to increase the accuracy of classification
  • dwp = Home Team wp – Visiting Team wp
  • wp denotes the winning percentage
Statistical Decision Theory
• Feature vector (dapg, dwp)
  • Presented as a scatterplot
[Figure: scatterplot of the games in the (dapg, dwp) feature space, each point marked W or L]
Statistical Decision Theory
• The feature space can be divided into two decision regions by a straight line
  • Linear decision boundary
• If the classes in a feature space cannot be perfectly separated by a straight line, a more complex boundary might be used.
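A linear decision boundary in a two-feature space can be written as w1*dapg + w2*dwp + b = 0, with one class on each side of the line. The sketch below assumes this form. The coefficients are invented for illustration and are not fitted to any data, and the function names are hypothetical.

```python
def side_of_line(point, weights, bias):
    """Signed score w1*x1 + w2*x2 + b; its sign tells which decision region the point is in."""
    return sum(w * x for w, x in zip(weights, point)) + bias

def classify_game(point, weights, bias):
    """Label a (dapg, dwp) feature vector 'W' if the score is positive, otherwise 'L'."""
    return "W" if side_of_line(point, weights, bias) > 0 else "L"

# Hypothetical boundary 1.0*dapg + 0.1*dwp + 0.0 = 0 (coefficients are assumptions)
WEIGHTS = (1.0, 0.1)
BIAS = 0.0

print(classify_game((1.3, 5.0), WEIGHTS, BIAS))    # score is about 1.3 + 0.5
print(classify_game((-2.0, 4.0), WEIGHTS, BIAS))   # score is about -2.0 + 0.4
```

Changing the weights rotates the line and changing the bias shifts it, so fitting a linear boundary amounts to choosing these three numbers.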
Exercise One
• The values of a feature x for nine samples from class A are 1, 2, 3, 3, 4, 4, 6, 6, 8. Nine samples from class B have x values of 4, 6, 7, 7, 8, 9, 9, 10, 12. Make a histogram (with an interval width of 1) for each class and find a decision boundary (threshold) that minimizes the total number of misclassifications for this training data set.
Exercise Two
• Can the feature vectors (x, y) = (2,3), (3,5), (4,2), (2,7) from class A be separated from four samples from class B located at (6,2), (5,4), (5,6), (3,7) by a linear decision boundary? If so, give the equation of one such boundary and plot it. If not, find a boundary that separates them as well as possible.