Data Mining Techniques: Classification


  1. Data Mining Techniques: Classification

  2. Classification
  • What is classification?
  • Classifying tuples in a database
  • In the training set E:
    • each tuple has the same set of attributes as the tuples in the large database W
    • additionally, each tuple has a known class identity
  • Derive the classification mechanism from the training set E, and then use this mechanism to classify general data (in W)

  3. Learning Phase
  • Learning
    • The class label attribute is credit_rating
    • Training data are analyzed by a classification algorithm
    • The classifier is represented in the form of classification rules

  4. Testing Phase
  • Testing (Classification)
    • Test data are used to estimate the accuracy of the classification rules
    • If the accuracy is considered acceptable, the rules can be applied to the classification of new data tuples (a sketch of the two-phase workflow follows)
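Taken together, slides 2–4 describe a two-phase workflow: derive a classifier from the labeled training set E, estimate its accuracy on held-out test tuples, and only then apply it to W. A minimal sketch of that workflow, assuming a toy credit_rating dataset and a deliberately trivial majority-class "classifier" (both are illustrative assumptions, not taken from the slides):

```python
# Minimal sketch of the learning and testing phases on labeled tuples.
# The dataset, attribute names, and the majority-class "classifier" are
# illustrative assumptions, not part of the original slides.
from collections import Counter

training_set = [  # (attributes, class label credit_rating)
    ({"age": "youth",  "income": "high"}, "fair"),
    ({"age": "youth",  "income": "low"},  "fair"),
    ({"age": "senior", "income": "high"}, "excellent"),
    ({"age": "senior", "income": "low"},  "fair"),
]
test_set = [
    ({"age": "senior", "income": "high"}, "excellent"),
    ({"age": "youth",  "income": "high"}, "fair"),
]

# Learning phase: derive a (trivial) classification mechanism from E.
majority_label = Counter(label for _, label in training_set).most_common(1)[0][0]

def classify(attrs):
    # Stand-in for the real classification rules derived in the learning phase.
    return majority_label

# Testing phase: estimate accuracy on held-out tuples before classifying W.
correct = sum(classify(attrs) == label for attrs, label in test_set)
print(f"estimated accuracy: {correct / len(test_set):.2f}")
```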

  5. Classification by Decision Tree
  • A top-down decision tree generation algorithm: ID3 and its extended version C4.5 (Quinlan '93)
  • J.R. Quinlan, C4.5: Programs for Machine Learning, Morgan Kaufmann, 1993

  6. Decision Tree Generation
  • At start, all the training examples are at the root
  • Partition the examples recursively based on selected attributes
  • Attribute Selection
    • Favor the partitioning that makes the majority of examples belong to a single class
  • Tree Pruning (overfitting problem)
    • Aims at removing tree branches that may lead to errors when classifying test data
    • Training data may contain noise, …

  7. Another Example
  (figure: a training table of 11 numbered examples; not reproduced in the transcript)

  8. Decision Tree

  9. Decision Tree

  10. Decision Tree Generation
  • Attribute Selection (Split Criterion)
    • Information Gain (ID3/C4.5/See5)
    • Gini Index (CART/IBM Intelligent Miner)
    • Inference Power
  • These measures are also called goodness functions and are used to select the attribute to split on at a tree node during the tree generation phase

  11. Decision Tree Generation
  • Branching Scheme
    • Determining the tree branch to which a sample belongs
    • Binary vs. k-ary splitting
  • When to stop the further splitting of a node
    • Impurity measure
  • Labeling Rule
    • A node is labeled as the class to which most samples at the node belong

  12. Decision Tree Generation Algorithm: ID3
  • ID: Iterative Dichotomiser
  • Entropy: Info(S) = −Σi pi log2(pi), where pi is the proportion of samples in S that belong to class Ci
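Slides 13–21 work through ID3 numerically. As a compact companion, here is a minimal sketch of the entropy and information-gain computations ID3 uses to pick the split attribute; the helper names and the toy rows are illustrative assumptions, not the table from the slides:

```python
import math
from collections import Counter

def entropy(labels):
    """Info(S) = -sum_i p_i * log2(p_i) over the class distribution of S."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())

def information_gain(rows, attribute, class_key="class"):
    """Gain(S, A) = Info(S) - sum_v (|S_v|/|S|) * Info(S_v)."""
    labels = [r[class_key] for r in rows]
    remainder = 0.0
    for value in {r[attribute] for r in rows}:
        subset = [r[class_key] for r in rows if r[attribute] == value]
        remainder += len(subset) / len(rows) * entropy(subset)
    return entropy(labels) - remainder

# Toy usage (illustrative data): a perfectly separating attribute gives gain 1.0.
rows = [
    {"outlook": "sunny", "class": "no"},
    {"outlook": "sunny", "class": "no"},
    {"outlook": "rain",  "class": "yes"},
    {"outlook": "rain",  "class": "yes"},
]
print(information_gain(rows, "outlook"))
```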

  13. Decision Tree Algorithm: ID3

  14. Decision Tree Algorithm: ID3

  15. Decision Tree Algorithm: ID3

  16. Decision Tree Algorithm: ID3

  17. Decision Tree Algorithm: ID3

  18. Exercise 2

  19. Decision Tree Generation Algorithm: ID3

  20. Decision Tree Generation Algorithm: ID3

  21. Decision Tree Generation Algorithm: ID3

  22. How to Use a Tree
  • Directly
    • Test the attribute values of the unknown sample against the tree
    • A path is traced from the root to a leaf, which holds the class label
  • Indirectly
    • The decision tree is converted to classification rules
    • One rule is created for each path from the root to a leaf
    • IF-THEN rules are easier for humans to understand
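A minimal sketch of the indirect use: walk every root-to-leaf path of a tree and emit one IF-THEN rule per path. The nested-dict tree encoding and the attribute names (loosely based on the watch-the-game example on slide 25) are illustrative assumptions:

```python
# Convert a decision tree into IF-THEN classification rules, one per
# root-to-leaf path. Tree encoding and attribute names are assumptions.
tree = {
    "attribute": "home_team",
    "branches": {
        "wins":  {"attribute": "location",
                  "branches": {"out with friends": "beer", "sitting at home": "diet soda"}},
        "loses": {"attribute": "location",
                  "branches": {"out with friends": "beer", "sitting at home": "milk"}},
    },
}

def tree_to_rules(node, conditions=()):
    if not isinstance(node, dict):                       # leaf: emit one rule for this path
        cond = " AND ".join(f"{a} = {v}" for a, v in conditions)
        return [f"IF {cond} THEN {node}"]
    rules = []
    for value, child in node["branches"].items():        # internal node: follow every branch
        rules += tree_to_rules(child, conditions + ((node["attribute"], value),))
    return rules

for rule in tree_to_rules(tree):
    print(rule)
```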

  23. Generating Classification Rules

  24. Generating Classification Rules

  25. Generating Classification Rules
  • Four decision rules are generated by the tree:
    • Watch the game and home team wins and out with friends, then beer
    • Watch the game and home team wins and sitting at home, then diet soda
    • Watch the game and home team loses and out with friends, then beer
    • Watch the game and home team loses and sitting at home, then milk
  • Optimization of these rules (the two "beer" rules differ only in the win/lose condition, so that condition can be dropped):
    • Watch the game and out with friends, then beer
    • Watch the game and home team wins and sitting at home, then diet soda
    • Watch the game and home team loses and sitting at home, then milk

  26. Decision Tree Generation Algorithm: ID3
  • All attributes are assumed to be categorical (discretized)
  • Can be modified for continuous-valued attributes
    • Dynamically define new discrete-valued attributes that partition the continuous attribute values into a discrete set of intervals
    • A ≥ V | A < V
  • Prefers attributes with many values
  • Cannot handle missing attribute values
  • Attribute dependencies are not considered in this algorithm

  27. Attribute Selection in C4.5
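The slide's figure is not reproduced in the transcript, but C4.5 replaces plain information gain with the gain ratio, Gain(S, A) / SplitInfo(S, A), which damps the bias toward many-valued attributes noted on slide 26. A minimal, self-contained sketch; the helper names and toy usage are assumptions for illustration:

```python
import math
from collections import Counter

def _entropy(labels):
    """Info(S) = -sum_i p_i * log2(p_i)."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())

def gain_ratio(rows, attribute, class_key="class"):
    """GainRatio(S, A) = Gain(S, A) / SplitInfo(S, A), C4.5's selection measure."""
    total = len(rows)
    labels = [r[class_key] for r in rows]
    partitions = {}
    for r in rows:
        partitions.setdefault(r[attribute], []).append(r[class_key])
    # Gain(S, A): entropy of S minus the weighted entropy of the partitions.
    remainder = sum(len(p) / total * _entropy(p) for p in partitions.values())
    gain = _entropy(labels) - remainder
    # SplitInfo(S, A): entropy of the partition sizes; large for many-valued attributes.
    split_info = -sum(len(p) / total * math.log2(len(p) / total) for p in partitions.values())
    return gain / split_info if split_info else 0.0

# Toy usage: a unique-ID attribute has the same raw gain as "outlook" here,
# but its larger SplitInfo cuts its gain ratio in half.
rows = [
    {"row_id": 1, "outlook": "sunny", "class": "no"},
    {"row_id": 2, "outlook": "sunny", "class": "no"},
    {"row_id": 3, "outlook": "rain",  "class": "yes"},
    {"row_id": 4, "outlook": "rain",  "class": "yes"},
]
print(gain_ratio(rows, "outlook"), gain_ratio(rows, "row_id"))   # 1.0  0.5
```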

  28. Handling Continuous Attributes

  29. Handling Continuous Attributes

  30. Handling Continuous Attributes
  (figure: the data sorted by each continuous attribute, with the first, second, and third cut points marked; not reproduced in the transcript)

  31. Handling Continuous Attributes
  • Example tree (Buy/Sell decision), read from the figure:
    • First cut (root): split on Price On Date T+1 at 18.02 (> 18.02 | <= 18.02)
    • Second cut: split on Price On Date T at 17.84 (> 17.84 | <= 17.84)
    • Third cut: split on Price On Date T+1 at 17.70 (> 17.70 | <= 17.70)
    • Leaf labels: Buy and Sell
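Slides 28–31 show cut points being chosen on sorted continuous values. A minimal sketch of that idea, assuming each candidate midpoint between consecutive sorted values is scored by the weighted entropy of the resulting binary split; the prices and labels below are illustrative, not the slide's table:

```python
import math
from collections import Counter

def _entropy(labels):
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())

def best_cut(values, labels):
    """Try a binary split (A <= v | A > v) at each midpoint between consecutive
    distinct sorted values; keep the threshold with the lowest weighted entropy."""
    pairs = sorted(zip(values, labels))
    best = (None, float("inf"))
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue                                   # no cut between equal values
        cut = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [l for v, l in pairs if v <= cut]
        right = [l for v, l in pairs if v > cut]
        score = (len(left) * _entropy(left) + len(right) * _entropy(right)) / len(pairs)
        if score < best[1]:
            best = (cut, score)
    return best

# Illustrative prices and Buy/Sell labels (not the table from the slides).
prices = [17.60, 17.70, 17.84, 18.02, 18.40, 18.90]
labels = ["Sell", "Sell", "Sell", "Buy", "Buy", "Buy"]
print(best_cut(prices, labels))   # -> cut near 17.93 with weighted entropy 0.0
```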

  32. Exercise 3: Analyzing House Prices
  • CM: No. of Homes in Community
  • SF: Square Feet

  33. Unknown Attribute Values in C4.5
  • Two cases: unknown values in the training data and unknown values during testing

  34. Unknown Attribute Values: Adjustment of the Attribute Selection Measure

  35. Fill in Approach

  36. Probability Approach

  37. Probability Approach

  38. Unknown Attribute Values: Partitioning the Training Set

  39. Probability Approach

  40. Unknown Attribute Values: Classifying an Unseen Case

  41. Probability Approach
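Slides 38–41 describe the probability approach: a case whose value for the tested attribute is unknown is sent down every branch, each branch weighted by its observed frequency in the training data, and the resulting class distributions are summed. A minimal sketch under that assumption; the tree encoding, weights, and numbers are illustrative, not from the slides:

```python
# Probability approach for classifying an unseen case with an unknown value:
# explore all branches, weight each by its training-data frequency, and sum
# the leaf class distributions. Tree encoding and numbers are assumptions.
def classify_with_unknowns(node, case):
    if "branches" not in node:                         # leaf: a class distribution
        return node["distribution"]
    attr = node["attribute"]
    if case.get(attr) in node["branches"]:             # value known: follow one branch
        return classify_with_unknowns(node["branches"][case[attr]], case)
    combined = {}                                      # value unknown: explore all branches
    for value, child in node["branches"].items():
        weight = node["weights"][value]                # P(branch) estimated from training data
        for cls, p in classify_with_unknowns(child, case).items():
            combined[cls] = combined.get(cls, 0.0) + weight * p
    return combined

tree = {
    "attribute": "outlook",
    "weights": {"sunny": 0.4, "rain": 0.6},
    "branches": {
        "sunny": {"distribution": {"yes": 1.0}},
        "rain":  {"distribution": {"yes": 0.25, "no": 0.75}},
    },
}
print(classify_with_unknowns(tree, {}))   # outlook unknown -> roughly {'yes': 0.55, 'no': 0.45}
```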

  42. Evaluation – Coincidence Matrix
  • Cost = $190 * (closing a good account) + $10 * (keeping a bad account open)
  • Decision Tree Model
    • Accuracy = (36 + 632) / 718 = 93.0%
    • Precision for Insolvent = 36 / 58 = 62.07%
    • Recall for Insolvent = 36 / 64 = 56.25%
    • F Measure = 2 * Precision * Recall / (Precision + Recall) = 2 * 62.07% * 56.25% / (62.07% + 56.25%) = 0.698 / 1.183 = 0.59
    • Cost = $190 * 22 + $10 * 28 = $4,460 (a sketch recomputing these numbers follows)
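The same numbers can be recomputed from the four cells of the coincidence (confusion) matrix implied by the slide for the Insolvent class, TP = 36, FP = 22, FN = 28, TN = 632:

```python
# Evaluation metrics for the Insolvent class, using the slide's matrix counts.
tp, fp, fn, tn = 36, 22, 28, 632
total = tp + fp + fn + tn                                    # 718 cases

accuracy  = (tp + tn) / total                                # (36 + 632) / 718 = 0.930
precision = tp / (tp + fp)                                   # 36 / 58 = 0.6207
recall    = tp / (tp + fn)                                   # 36 / 64 = 0.5625
f_measure = 2 * precision * recall / (precision + recall)    # 0.59
cost      = 190 * fp + 10 * fn                               # $190 per closed good account,
                                                             # $10 per bad account kept open
print(accuracy, precision, recall, f_measure, cost)          # ... 4460
```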

  43. Decision Tree Generation Algorithm: Gini Index
  • If a data set S contains examples from n classes, the gini index gini(S) is defined as gini(S) = 1 − Σj pj², where pj is the relative frequency of class Cj in S
  • If a data set S is split into two subsets S1 and S2 with sizes N1 and N2 respectively, the gini index of the split data is defined as gini_split(S) = (N1/N) * gini(S1) + (N2/N) * gini(S2), where N = N1 + N2

  44. Decision Tree Generation Algorithm: Gini Index
  • The attribute that provides the smallest gini_split(S) is chosen to split the node
  • The computation cost of the gini index is less than that of information gain
  • All attribute splits are binary in IBM Intelligent Miner
    • A ≥ V | A < V
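A minimal sketch of both measures from slide 43; the example labels are illustrative:

```python
from collections import Counter

def gini(labels):
    """gini(S) = 1 - sum_j p_j^2."""
    total = len(labels)
    return 1.0 - sum((c / total) ** 2 for c in Counter(labels).values())

def gini_split(left, right):
    """gini_split(S) = (N1/N) * gini(S1) + (N2/N) * gini(S2) for a binary split."""
    n = len(left) + len(right)
    return len(left) / n * gini(left) + len(right) / n * gini(right)

# Pick the split with the smallest gini_split value.
print(gini_split(["yes", "yes", "yes"], ["no", "no"]))   # pure split  -> 0.0
print(gini_split(["yes", "no", "yes"], ["no", "yes"]))   # mixed split -> about 0.47
```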

  45. Decision Tree Generation Algorithm: Inference Power
  • A feature that is useful in inferring the group identity of a data tuple is said to have a good inference power to that group identity
  • In Table 1, given the attributes (features) "Gender", "Beverage", and "State", try to find their inference power to "Group id"

  46. Naive Bayesian Classification
  • Each data sample is an n-dimensional feature vector
    • X = (x1, x2, …, xn) for attributes A1, A2, …, An
  • Suppose there are m classes
    • C = {C1, C2, …, Cm}
  • The classifier predicts X to belong to the class Ci that has the highest posterior probability conditioned on X
    • X belongs to Ci iff P(Ci|X) > P(Cj|X) for all 1 <= j <= m, j != i

  47. Naive Bayesian Classification
  • P(Ci|X) = P(X|Ci) P(Ci) / P(X)
    • P(Ci|X) = P(Ci ∩ X) / P(X) ; P(X|Ci) = P(Ci ∩ X) / P(Ci) => P(Ci|X) P(X) = P(X|Ci) P(Ci)
  • P(Ci) = si / s
    • si is the number of training samples of class Ci
    • s is the total number of training samples
  • Assumption: independence between attributes
    • P(X|Ci) = P(x1|Ci) P(x2|Ci) P(x3|Ci) … P(xn|Ci)
  • P(X) is the same for every class, so it can be ignored when comparing classes

  48. Naive Bayesian Classification
  Classify X = (age = "<=30", income = "medium", student = "yes", credit_rating = "fair")
  • P(buys_computer = yes) = 9/14
  • P(buys_computer = no) = 5/14
  • P(age <= 30 | buys_computer = yes) = 2/9
  • P(age <= 30 | buys_computer = no) = 3/5
  • P(income = medium | buys_computer = yes) = 4/9
  • P(income = medium | buys_computer = no) = 2/5
  • P(student = yes | buys_computer = yes) = 6/9
  • P(student = yes | buys_computer = no) = 1/5
  • P(credit_rating = fair | buys_computer = yes) = 6/9
  • P(credit_rating = fair | buys_computer = no) = 2/5
  • P(X | buys_computer = yes) = 2/9 * 4/9 * 6/9 * 6/9 = 0.044
  • P(X | buys_computer = no) = 3/5 * 2/5 * 1/5 * 2/5 = 0.019
  • P(buys_computer = yes | X) ∝ P(X | buys_computer = yes) P(buys_computer = yes) = 0.028
  • P(buys_computer = no | X) ∝ P(X | buys_computer = no) P(buys_computer = no) = 0.007
  • Since 0.028 > 0.007, X is classified as buys_computer = yes
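The same example can be reproduced numerically; only the probabilities listed on the slide are used, and P(X) is dropped because it is identical for both classes:

```python
# Naive Bayes scoring for the buys_computer example:
# P(Ci|X) is proportional to P(Ci) * prod_k P(xk|Ci).
import math

priors = {"yes": 9 / 14, "no": 5 / 14}
likelihoods = {   # P(xk | class) from the slide, in the order:
                  # age<=30, income=medium, student=yes, credit_rating=fair
    "yes": [2 / 9, 4 / 9, 6 / 9, 6 / 9],
    "no":  [3 / 5, 2 / 5, 1 / 5, 2 / 5],
}
scores = {c: priors[c] * math.prod(likelihoods[c]) for c in priors}
print(scores)                       # {'yes': ~0.028, 'no': ~0.007}
print(max(scores, key=scores.get))  # 'yes' -> classify X as buys_computer = yes
```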
