This presentation provides an overview of the problem of classification in data mining, including different approaches and key issues. It also explores specific algorithms such as decision trees, K-nearest neighbors, Naïve Bayesian classifiers, and neural networks.
Classification: a task of induction to find patterns. CSE 591: Data Mining by H. Liu
Outline • Data and its format • Problem of classification • Learning a classifier • Different approaches • Key issues
Data and its format • Data • attribute-value pairs • with/without class • Data type • continuous/discrete • nominal • Data format • flat
Sample data
Induction from databases • Inferring knowledge from data • Contrast with the task of deduction: inferring information that is a logical consequence of querying a database • Who taught this class before? • Which courses does Mary attend? • Deductive databases: extending the RDBMS
Classification • One type of induction, applied to data with class labels • Example rule: if weather is rainy then no golf
Different approaches • There exist many techniques • Decision trees • Neural networks • K-nearest neighbors • Naïve Bayesian classifiers • Support Vector Machines • Ensemble methods • Co-training • and many more ...
A decision tree for the golf data:
Outlook?
  sunny    → Humidity? (high → NO, normal → YES)
  overcast → YES
  rain     → Wind? (strong → NO, weak → YES)
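The tree above can be encoded as a chain of conditionals; a minimal sketch (the function name and the lowercase attribute values are assumptions taken from the figure):

```python
def classify_golf(outlook, humidity, wind):
    """Walk the decision tree from the slide: Outlook at the root,
    Humidity under 'sunny', Wind under 'rain'."""
    if outlook == "sunny":
        return "NO" if humidity == "high" else "YES"
    if outlook == "overcast":
        return "YES"
    if outlook == "rain":
        return "NO" if wind == "strong" else "YES"
    raise ValueError(f"unknown outlook: {outlook}")
```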
Inducing a decision tree • There are many possible trees • let's try it on the golfing data • How to find the most compact one that is consistent with the data? • Why the most compact? • Occam's razor principle • Issue of efficiency w.r.t. optimality
Entropy and information gain • Entropy of a node S with class proportions p_i: H(S) = -Σ_i p_i log2 p_i • Information gain of splitting S on attribute A into subsets S_v: IG(S, A) = H(S) - Σ_v (|S_v|/|S|) H(S_v) • i.e., the difference in entropy between the node before and after splitting
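Both quantities can be computed directly from class counts; a minimal sketch for categorical data (the function names are illustrative, not from the slides):

```python
import math
from collections import Counter

def entropy(labels):
    """H(S) = -sum of p_i * log2(p_i) over the class proportions p_i."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(rows, labels, attr_index):
    """Entropy of the node minus the weighted entropy of the
    subsets produced by splitting on the attribute at attr_index."""
    total = entropy(labels)
    by_value = {}
    for row, label in zip(rows, labels):
        by_value.setdefault(row[attr_index], []).append(label)
    remainder = sum(len(sub) / len(labels) * entropy(sub)
                    for sub in by_value.values())
    return total - remainder
```

For the golf data (9 yes, 5 no), the root entropy is about 0.940 bits.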
Building a compact tree • The key to building a decision tree: which attribute to choose in order to branch • The heuristic is to choose the attribute with the maximum information gain (IG) • Another explanation: reduce uncertainty as much as possible
Learning a decision tree • Applying this procedure to the golf data yields:
Outlook?
  sunny    → Humidity? (high → NO, normal → YES)
  overcast → YES
  rain     → Wind? (strong → NO, weak → YES)
K-Nearest Neighbor • One of the most intuitive classification algorithms • An unseen instance's class is determined by its nearest neighbor in the training data • Problem: this is sensitive to noise • Instead of using one neighbor, we can use k neighbors and take a majority vote
K-NN • New problems • lazy learning: all work is deferred until a query arrives • large storage: the entire training set must be kept • An example • How good is k-NN?
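The scheme from the last two slides fits in a few lines; a minimal sketch (the function name, Euclidean distance, and majority vote are illustrative choices, not fixed by the slides):

```python
import math
from collections import Counter

def knn_classify(train, query, k=3):
    """train: list of (feature_vector, label) pairs.
    Predict by majority vote among the k training examples
    nearest to the query in Euclidean distance."""
    neighbors = sorted(train, key=lambda ex: math.dist(ex[0], query))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]
```

Note the "lazy learning" and "large storage" issues are visible here: nothing happens at training time, and `train` must be kept around in full.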
Naïve Bayes Classifier • A direct application of Bayes' rule: P(C|X) = P(X|C)P(C)/P(X), where X is the vector (x1, x2, …, xn) • With the true probabilities, that's the best classifier you can build • But estimating the joint P(X|C) directly is the problem
NBC (2) • Assume conditional independence among the xi's given the class • Then P(C|X) ∝ P(C) · Π_i P(xi|C) • An example • How good is it in reality?
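Under the conditional-independence assumption, training reduces to counting; a minimal sketch (no smoothing, and the helper names are hypothetical):

```python
from collections import Counter, defaultdict

def train_nbc(rows, labels):
    """Estimate P(C) and each P(x_i | C) from raw counts."""
    priors = Counter(labels)                 # class -> count
    cond = defaultdict(Counter)              # (attr_index, class) -> value counts
    for row, c in zip(rows, labels):
        for i, v in enumerate(row):
            cond[(i, c)][v] += 1
    return priors, cond

def nbc_classify(priors, cond, row):
    """argmax over C of P(C) * prod_i P(x_i | C); the constant P(X) is dropped."""
    n = sum(priors.values())
    best, best_score = None, -1.0
    for c, count in priors.items():
        score = count / n
        for i, v in enumerate(row):
            score *= cond[(i, c)][v] / count
        if score > best_score:
            best, best_score = c, score
    return best
```

Without smoothing, an unseen attribute value zeroes out a class's score; real implementations typically add Laplace smoothing.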
Classification via Neural Networks • A perceptron: a weighted sum of the inputs passed through a squashing function (figure omitted)
What can a perceptron do? • The neuron as a computing device • It separates linearly separable points • Nice things about a perceptron • distributed representation • local learning • weight adjusting
Linear threshold unit • Basic concepts: projection and thresholding • Figure values: W = [0.11, 0.6], input L = [0.7, 0.7], threshold 0.5; the unit outputs 1 when the projection W·L reaches the threshold
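The projection-then-threshold computation can be checked against the figure's numbers (which may be approximate); `ltu` is an illustrative name, not from the slides. With W = [0.11, 0.6] and L = [0.7, 0.7], W·L = 0.497, just below the 0.5 threshold:

```python
def ltu(weights, x, threshold):
    """Linear threshold unit: project x onto the weight vector,
    then output 1 iff the projection reaches the threshold."""
    activation = sum(w * xi for w, xi in zip(weights, x))
    return 1 if activation >= threshold else 0
```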
E.g. 1: solution region for the AND problem • Find a weight vector that satisfies all the constraints:
x1 x2 | AND
 0  0 |  0
 0  1 |  0
 1  0 |  0
 1  1 |  1
E.g. 2: solution region for the XOR problem?
x1 x2 | XOR
 0  0 |  0
 0  1 |  1
 1  0 |  1
 1  1 |  0
• There is none: XOR is not linearly separable, so no single weight vector satisfies all four constraints
Learning by error reduction • Perceptron learning algorithm • If the activation level of the output unit is 1 when it should be 0, reduce the weight on the link to the ith input unit by r·Li, where Li is the ith input value and r is the learning rate • If the activation level of the output unit is 0 when it should be 1, increase the weight on the link to the ith input unit by r·Li • Otherwise, do nothing
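The three-case rule above can be sketched and run on the AND data from the earlier slide; `train_perceptron`, the learning rate, and the epoch count are illustrative choices:

```python
def train_perceptron(examples, threshold=0.5, r=0.1, epochs=20):
    """Perceptron learning rule from the slide: on a false positive,
    subtract r*L_i from each weight; on a false negative, add r*L_i;
    on a correct output, do nothing."""
    w = [0.0] * len(examples[0][0])
    for _ in range(epochs):
        for x, target in examples:
            out = 1 if sum(wi * xi for wi, xi in zip(w, x)) >= threshold else 0
            if out == 1 and target == 0:
                w = [wi - r * xi for wi, xi in zip(w, x)]
            elif out == 0 and target == 1:
                w = [wi + r * xi for wi, xi in zip(w, x)]
    return w
```

Because AND is linearly separable, this converges to a weight vector in the solution region; on XOR it would cycle forever.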
Multi-layer perceptrons • Using the chain rule, we can back-propagate the errors through a multi-layer perceptron • Layers: input layer → hidden layer → output layer
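A minimal sketch of chain-rule back-propagation for a 2-2-1 sigmoid network on the XOR data; the fixed initial weights, squared-error loss, learning rate, and epoch count are all assumptions. Convergence to a perfect XOR solution is not guaranteed from every starting point, so the example only demonstrates that back-propagation reduces the loss:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# XOR: not linearly separable, so a single perceptron fails,
# but one hidden layer suffices.
DATA = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]

def train_mlp(epochs=20000, lr=0.5):
    """Back-propagate squared-error gradients through a 2-2-1 sigmoid
    network (fixed initial weights for reproducibility)."""
    w_h = [[0.5, -0.5, 0.1], [-0.3, 0.4, -0.2]]   # hidden: 2 inputs + bias
    w_o = [0.6, -0.6, 0.05]                       # output: 2 hidden + bias
    for _ in range(epochs):
        for (x1, x2), t in DATA:
            h = [sigmoid(w[0] * x1 + w[1] * x2 + w[2]) for w in w_h]
            y = sigmoid(w_o[0] * h[0] + w_o[1] * h[1] + w_o[2])
            d_o = (y - t) * y * (1 - y)            # output-layer delta
            # hidden-layer deltas via the chain rule
            d_h = [d_o * w_o[j] * h[j] * (1 - h[j]) for j in range(2)]
            w_o = [w_o[0] - lr * d_o * h[0],
                   w_o[1] - lr * d_o * h[1],
                   w_o[2] - lr * d_o]
            for j in range(2):
                w_h[j] = [w_h[j][0] - lr * d_h[j] * x1,
                          w_h[j][1] - lr * d_h[j] * x2,
                          w_h[j][2] - lr * d_h[j]]

    def predict(x1, x2):
        h = [sigmoid(w[0] * x1 + w[1] * x2 + w[2]) for w in w_h]
        return sigmoid(w_o[0] * h[0] + w_o[1] * h[1] + w_o[2])
    return predict
```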