
SEEM4630 2013-2014 Tutorial 2 Classification: Decision tree, Naïve Bayes & k-NN



  1. SEEM4630 2013-2014 Tutorial 2 Classification: Decision tree, Naïve Bayes & k-NN. Wentao TIAN, wttian@se.cuhk.edu.hk

  2. Classification: Definition • Given a collection of records (the training set), where each record contains a set of attributes and one of the attributes is the class. • Find a model for the class attribute as a function of the values of the other attributes: • Decision tree • Naïve Bayes • k-NN • Goal: previously unseen records should be assigned a class as accurately as possible.

  3. Decision Tree • Goal • Construct a tree so that instances belonging to different classes are separated • Basic algorithm (a greedy algorithm) • The tree is constructed in a top-down, recursive manner • At the start, all the training examples are at the root • Test attributes are selected on the basis of a heuristic or statistical measure (e.g., information gain) • Examples are partitioned recursively based on the selected attributes

  4. Attribute Selection Measure 1: Information Gain • Let pi be the probability that a tuple in D belongs to class Ci, estimated by |Ci,D|/|D| • Expected information (entropy) needed to classify a tuple in D: Info(D) = -Σi pi log2(pi) • Information needed (after using A to split D into v partitions) to classify D: InfoA(D) = Σj (|Dj|/|D|) × Info(Dj) • Information gained by branching on attribute A: Gain(A) = Info(D) - InfoA(D)
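
A minimal Python sketch of these three quantities, assuming each class distribution is passed in as a list of counts; the function names (entropy, info_after_split, info_gain) are illustrative, not from the tutorial:

    # Sketch of the slide-4 quantities; class distributions are lists of counts.
    from math import log2

    def entropy(counts):
        """Info(D) = -sum_i p_i * log2(p_i), with p_i = count_i / total."""
        total = sum(counts)
        return -sum(c / total * log2(c / total) for c in counts if c > 0)

    def info_after_split(partitions):
        """Info_A(D): weighted entropy of the partitions induced by attribute A."""
        total = sum(sum(p) for p in partitions)
        return sum(sum(p) / total * entropy(p) for p in partitions)

    def info_gain(parent_counts, partitions):
        """Gain(A) = Info(D) - Info_A(D)."""
        return entropy(parent_counts) - info_after_split(partitions)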

  5. Attribute Selection Measure 2: Gain Ratio • The information gain measure is biased towards attributes with a large number of values • C4.5 (a successor of ID3) uses the gain ratio to overcome this problem (a normalization of information gain): SplitInfoA(D) = -Σj (|Dj|/|D|) log2(|Dj|/|D|) • GainRatio(A) = Gain(A) / SplitInfo(A)
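
Continuing the same sketch (and reusing log2 and info_gain from the block above), SplitInfo and GainRatio might look like this; the names are again illustrative:

    def split_info(partitions):
        """SplitInfo_A(D) = -sum_j (|D_j|/|D|) * log2(|D_j|/|D|)."""
        sizes = [sum(p) for p in partitions]
        total = sum(sizes)
        return -sum(s / total * log2(s / total) for s in sizes if s > 0)

    def gain_ratio(parent_counts, partitions):
        """GainRatio(A) = Gain(A) / SplitInfo(A)."""
        return info_gain(parent_counts, partitions) / split_info(partitions)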

  6. Attribute Selection Measure 3: Gini Index • If a data set D contains examples from n classes, the gini index gini(D) is defined as gini(D) = 1 - Σj pj², where pj is the relative frequency of class j in D • If D is split on A into two subsets D1 and D2, the gini index of the split is defined as giniA(D) = (|D1|/|D|) gini(D1) + (|D2|/|D|) gini(D2) • Reduction in impurity: Δgini(A) = gini(D) - giniA(D)
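
A corresponding sketch for the gini-based measure, again taking class distributions as lists of counts (function names are illustrative):

    def gini(counts):
        """gini(D) = 1 - sum_j p_j^2."""
        total = sum(counts)
        return 1.0 - sum((c / total) ** 2 for c in counts)

    def gini_split(d1_counts, d2_counts):
        """gini_A(D) for a binary split of D into D1 and D2."""
        n1, n2 = sum(d1_counts), sum(d2_counts)
        total = n1 + n2
        return n1 / total * gini(d1_counts) + n2 / total * gini(d2_counts)

    def gini_reduction(parent_counts, d1_counts, d2_counts):
        """Reduction in impurity: gini(D) - gini_A(D)."""
        return gini(parent_counts) - gini_split(d1_counts, d2_counts)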

  7. Example

  8. Tree induction example • Entropy of data S: Info(S) = -9/14 log2(9/14) - 5/14 log2(5/14) = 0.94 • Split data by attribute Outlook: S [9+, 5-] → Sunny [2+, 3-], Overcast [4+, 0-], Rain [3+, 2-] • Gain(Outlook) = 0.94 - 5/14[-2/5 log2(2/5) - 3/5 log2(3/5)] - 4/14[-4/4 log2(4/4) - 0/4 log2(0/4)] - 5/14[-3/5 log2(3/5) - 2/5 log2(2/5)] = 0.94 - 0.69 = 0.25
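
These two numbers can be reproduced with the entropy / info_gain helpers sketched after slide 4; the count lists below are read off the [+, -] labels above:

    S = [9, 5]                                  # [+, -] for the whole data set
    outlook = [[2, 3], [4, 0], [3, 2]]          # Sunny, Overcast, Rain
    print(round(entropy(S), 2))                 # 0.94
    print(round(info_gain(S, outlook), 2))      # 0.25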

  9. Tree induction example • Split data by attribute Temperature: S [9+, 5-] → <15 [3+, 1-], 15-25 [4+, 2-], >25 [2+, 2-] • Gain(Temperature) = 0.94 - 4/14[-3/4 log2(3/4) - 1/4 log2(1/4)] - 6/14[-4/6 log2(4/6) - 2/6 log2(2/6)] - 4/14[-2/4 log2(2/4) - 2/4 log2(2/4)] = 0.94 - 0.91 = 0.03

  10. Tree induction example • Split data by attribute Humidity: S [9+, 5-] → High [3+, 4-], Normal [6+, 1-] • Gain(Humidity) = 0.94 - 7/14[-3/7 log2(3/7) - 4/7 log2(4/7)] - 7/14[-6/7 log2(6/7) - 1/7 log2(1/7)] = 0.94 - 0.79 = 0.15 • Split data by attribute Wind: S [9+, 5-] → Weak [6+, 2-], Strong [3+, 3-] • Gain(Wind) = 0.94 - 8/14[-6/8 log2(6/8) - 2/8 log2(2/8)] - 6/14[-3/6 log2(3/6) - 3/6 log2(3/6)] = 0.94 - 0.89 = 0.05

  11. Tree induction example • Gain(Outlook) = 0.25, Gain(Temperature) = 0.03, Gain(Humidity) = 0.15, Gain(Wind) = 0.05, so Outlook is chosen as the root: Outlook → Sunny: ??, Overcast: Yes, Rain: ?? • Training data:

Outlook | Temperature | Humidity | Wind | Play Tennis
Sunny | >25 | High | Weak | No
Sunny | >25 | High | Strong | No
Overcast | >25 | High | Weak | Yes
Rain | 15-25 | High | Weak | Yes
Rain | <15 | Normal | Weak | Yes
Rain | <15 | Normal | Strong | No
Overcast | <15 | Normal | Strong | Yes
Sunny | 15-25 | High | Weak | No
Sunny | <15 | Normal | Weak | Yes
Rain | 15-25 | Normal | Weak | Yes
Sunny | 15-25 | Normal | Strong | Yes
Overcast | 15-25 | High | Strong | Yes
Overcast | >25 | Normal | Weak | Yes
Rain | 15-25 | High | Strong | No
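
As a cross-check, all four gains can be computed directly from the table, assuming the info_gain helper sketched after slide 4 is in scope; records and attribute_gain are illustrative names:

    from collections import Counter, defaultdict

    # The 14 training records: ((Outlook, Temperature, Humidity, Wind), PlayTennis)
    records = [
        (("Sunny", ">25", "High", "Weak"), "No"),    (("Sunny", ">25", "High", "Strong"), "No"),
        (("Overcast", ">25", "High", "Weak"), "Yes"), (("Rain", "15-25", "High", "Weak"), "Yes"),
        (("Rain", "<15", "Normal", "Weak"), "Yes"),   (("Rain", "<15", "Normal", "Strong"), "No"),
        (("Overcast", "<15", "Normal", "Strong"), "Yes"), (("Sunny", "15-25", "High", "Weak"), "No"),
        (("Sunny", "<15", "Normal", "Weak"), "Yes"),  (("Rain", "15-25", "Normal", "Weak"), "Yes"),
        (("Sunny", "15-25", "Normal", "Strong"), "Yes"), (("Overcast", "15-25", "High", "Strong"), "Yes"),
        (("Overcast", ">25", "Normal", "Weak"), "Yes"), (("Rain", "15-25", "High", "Strong"), "No"),
    ]

    def attribute_gain(records, attr_index):
        """Information gain of one attribute over a set of (attributes, label) records."""
        parent = list(Counter(label for _, label in records).values())
        by_value = defaultdict(Counter)
        for attrs, label in records:
            by_value[attrs[attr_index]][label] += 1
        partitions = [list(c.values()) for c in by_value.values()]
        return info_gain(parent, partitions)

    for i, name in enumerate(["Outlook", "Temperature", "Humidity", "Wind"]):
        print(name, round(attribute_gain(records, i), 2))   # 0.25, 0.03, 0.15, 0.05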

  12. Tree induction example • Entropy of branch Sunny: Info(Sunny) = -2/5 log2(2/5) - 3/5 log2(3/5) = 0.97 • Split the Sunny branch by attribute Temperature: Sunny [2+, 3-] → <15 [1+, 0-], 15-25 [1+, 1-], >25 [0+, 2-] • Gain(Temperature) = 0.97 - 1/5[-1/1 log2(1/1) - 0/1 log2(0/1)] - 2/5[-1/2 log2(1/2) - 1/2 log2(1/2)] - 2/5[-0/2 log2(0/2) - 2/2 log2(2/2)] = 0.97 - 0.4 = 0.57 • Split the Sunny branch by attribute Humidity: Sunny [2+, 3-] → High [0+, 3-], Normal [2+, 0-] • Gain(Humidity) = 0.97 - 3/5[-0/3 log2(0/3) - 3/3 log2(3/3)] - 2/5[-2/2 log2(2/2) - 0/2 log2(0/2)] = 0.97 - 0 = 0.97 • Split the Sunny branch by attribute Wind: Sunny [2+, 3-] → Weak [1+, 2-], Strong [1+, 1-] • Gain(Wind) = 0.97 - 3/5[-1/3 log2(1/3) - 2/3 log2(2/3)] - 2/5[-1/2 log2(1/2) - 1/2 log2(1/2)] = 0.97 - 0.95 = 0.02

  13. Tree induction example • Humidity gives the highest gain on the Sunny branch, so the tree so far is: Outlook → Sunny: Humidity (High: No, Normal: Yes), Overcast: Yes, Rain: ??

  14. Tree induction example • Entropy of branch Rain: Info(Rain) = -3/5 log2(3/5) - 2/5 log2(2/5) = 0.97 • Split the Rain branch by attribute Temperature: Rain [3+, 2-] → <15 [1+, 1-], 15-25 [2+, 1-], >25 [0+, 0-] • Gain(Temperature) = 0.97 - 2/5[-1/2 log2(1/2) - 1/2 log2(1/2)] - 3/5[-2/3 log2(2/3) - 1/3 log2(1/3)] - 0 = 0.97 - 0.95 = 0.02 • Split the Rain branch by attribute Humidity: Rain [3+, 2-] → High [1+, 1-], Normal [2+, 1-] • Gain(Humidity) = 0.97 - 2/5[-1/2 log2(1/2) - 1/2 log2(1/2)] - 3/5[-2/3 log2(2/3) - 1/3 log2(1/3)] = 0.97 - 0.95 = 0.02 • Split the Rain branch by attribute Wind: Rain [3+, 2-] → Weak [3+, 0-], Strong [0+, 2-] • Gain(Wind) = 0.97 - 3/5[-3/3 log2(3/3) - 0/3 log2(0/3)] - 2/5[-0/2 log2(0/2) - 2/2 log2(2/2)] = 0.97 - 0 = 0.97

  15. Tree induction example • Final tree: Outlook → Sunny: Humidity (High: No, Normal: Yes), Overcast: Yes, Rain: Wind (Weak: Yes, Strong: No)
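
The whole induction above can be summarised as a short recursive procedure. This is a simplified ID3-style sketch, not the tutorial's own code; it reuses records, attribute_gain and the Counter import from the sketch after slide 11:

    def build_tree(records, attr_names):
        """Greedy top-down induction: pick the highest-gain attribute, split, recurse."""
        labels = [label for _, label in records]
        if len(set(labels)) == 1:                 # pure node -> leaf
            return labels[0]
        if not attr_names:                        # no attributes left -> majority leaf
            return Counter(labels).most_common(1)[0][0]
        best = max(range(len(attr_names)), key=lambda i: attribute_gain(records, i))
        tree = {}
        for value in {attrs[best] for attrs, _ in records}:
            subset = [(attrs[:best] + attrs[best + 1:], label)
                      for attrs, label in records if attrs[best] == value]
            tree[value] = build_tree(subset, attr_names[:best] + attr_names[best + 1:])
        return {attr_names[best]: tree}

    # Outlook at the root, Humidity under Sunny, Wind under Rain, matching slide 15.
    print(build_tree(records, ["Outlook", "Temperature", "Humidity", "Wind"]))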

  16. Bayesian Classification • A statistical classifier: performs probabilistic prediction, i.e., predicts class membership probabilities • Foundation: based on Bayes' Theorem: P(Ci|X) = P(X|Ci) P(Ci) / P(X), where X = (x1, ..., xn) and xi is the value of attribute Ai • P(Ci|X) is the posterior probability, P(Ci) the prior probability and P(X|Ci) the likelihood; the model (these probabilities) is computed from the data • Choose the class label that has the highest posterior probability

  17. Naïve Bayes Classifier • Problem: the joint probabilities P(x1, ..., xn|Ci) are difficult to estimate • Naïve Bayes assumption: attributes are conditionally independent given the class • Then P(X|Ci) = P(x1|Ci) × P(x2|Ci) × ... × P(xn|Ci), and the predicted class is the one that maximises P(X|Ci) P(Ci)
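
A minimal count-based sketch of such a classifier, assuming categorical attributes and no smoothing; the class and method names are illustrative, not from the tutorial:

    from collections import Counter, defaultdict

    class NaiveBayes:
        """Naïve Bayes for categorical attributes, estimated by counting."""
        def fit(self, records):
            # records: list of (attribute_tuple, class_label)
            self.class_counts = Counter(label for _, label in records)
            self.cond_counts = defaultdict(Counter)   # (attr_index, value) -> label counts
            for attrs, label in records:
                for i, value in enumerate(attrs):
                    self.cond_counts[(i, value)][label] += 1
            self.total = len(records)
            return self

        def predict(self, attrs):
            best, best_score = None, -1.0
            for label, count in self.class_counts.items():
                score = count / self.total                                  # prior P(C)
                for i, value in enumerate(attrs):
                    score *= self.cond_counts[(i, value)][label] / count    # P(x_i | C)
                if score > best_score:
                    best, best_score = label, score
            return best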

  18. Example: Naïve Bayes Classifier • Test record: A=m, B=q, C=? • Probabilities estimated from the training data: P(C=t) = 1/2, P(C=f) = 1/2, P(A=m|C=t) = 2/5, P(A=m|C=f) = 1/5, P(B=q|C=t) = 2/5, P(B=q|C=f) = 2/5

  19. Example: Naïve Bayes Classifier • For C = t: P(A=m|C=t) × P(B=q|C=t) × P(C=t) = 2/5 × 2/5 × 1/2 = 2/25, so P(C=t|A=m, B=q) = (2/25) / P(A=m, B=q), which is higher • For C = f: P(A=m|C=f) × P(B=q|C=f) × P(C=f) = 1/5 × 2/5 × 1/2 = 1/25, so P(C=f|A=m, B=q) = (1/25) / P(A=m, B=q) • Conclusion: for A=m, B=q, predict C=t
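
A quick sketch that reproduces these two scores, using the probabilities listed on slide 18 (variable names are illustrative):

    # Compare the two unnormalised posteriors for the test record A=m, B=q.
    priors = {"t": 1/2, "f": 1/2}
    p_a_m  = {"t": 2/5, "f": 1/5}        # P(A=m | C)
    p_b_q  = {"t": 2/5, "f": 2/5}        # P(B=q | C)
    scores = {c: p_a_m[c] * p_b_q[c] * priors[c] for c in priors}
    print({c: round(s, 2) for c, s in scores.items()})   # {'t': 0.08, 'f': 0.04}, i.e. 2/25 vs 1/25
    print(max(scores, key=scores.get))                   # 't'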

  20. Nearest Neighbor Classification • Input: a set of stored records and k, the number of nearest neighbors • To classify an unknown record: • Compute the distance to every stored record, e.g. Euclidean distance d(p, q) = sqrt(Σi (pi - qi)²) • Identify the k nearest neighbors • Determine the class label of the unknown record from the class labels of its nearest neighbors (i.e., by majority vote), as in the sketch below
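
A minimal k-NN sketch following these steps, with Euclidean distance and a simple majority vote (the function name knn_classify is illustrative):

    from collections import Counter
    from math import dist   # Euclidean distance, Python 3.8+

    def knn_classify(training, query, k):
        """training: list of (point, label) pairs; returns the majority label of the k nearest."""
        neighbors = sorted(training, key=lambda pl: dist(pl[0], query))[:k]
        return Counter(label for _, label in neighbors).most_common(1)[0][0]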

  21. Nearest Neighbor Classification: A Discrete Example • Input: 8 training instances • P1 (4, 2) → Orange • P2 (0.5, 2.5) → Orange • P3 (2.5, 2.5) → Orange • P4 (3, 3.5) → Orange • P5 (5.5, 3.5) → Orange • P6 (2, 4) → Black • P7 (4, 5) → Black • P8 (2.5, 5.5) → Black • New instance: Pn (4, 4) = ?, with k = 1 and k = 3 • Calculate the distances: d(P1, Pn) = 2, d(P2, Pn) = 3.80, d(P3, Pn) = 2.12, d(P4, Pn) = 1.12, d(P5, Pn) = 1.58, d(P6, Pn) = 2, d(P7, Pn) = 1, d(P8, Pn) = 2.12

  22. Nearest Neighbor Classification • [Plots of the eight training points and Pn for k = 1 and k = 3.] • With k = 1, the single nearest neighbor is P7 (distance 1, Black), so Pn is classified as Black • With k = 3, the nearest neighbors are P7 (Black), P4 (Orange) and P5 (Orange), so the majority vote classifies Pn as Orange
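
Running the knn_classify sketch from slide 20 on these eight points reproduces both outcomes:

    points = [((4, 2), "Orange"), ((0.5, 2.5), "Orange"), ((2.5, 2.5), "Orange"),
              ((3, 3.5), "Orange"), ((5.5, 3.5), "Orange"), ((2, 4), "Black"),
              ((4, 5), "Black"), ((2.5, 5.5), "Black")]
    print(knn_classify(points, (4, 4), k=1))   # Black  (nearest neighbor is P7)
    print(knn_classify(points, (4, 4), k=3))   # Orange (P7, P4, P5 -> 2 Orange vs 1 Black)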

  23. Nearest Neighbor Classification… • Scaling issues • Attributes may have to be scaled to prevent distance measures from being dominated by one of the attributes, i.e. every attribute should be brought into the same range • Min-Max normalization: v' = (v - min) / (max - min) • Example: two data records a = (1, 1000), b = (0.5, 1); dis(a, b) = sqrt(0.5² + 999²) ≈ 999.0, dominated almost entirely by the second attribute (see the sketch below)
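
A small sketch of min-max normalization and of why it matters for the example above; the attribute ranges used for scaling ([0, 1] and [1, 1000]) are assumptions chosen only to illustrate the effect:

    from math import dist

    def min_max(value, lo, hi, new_lo=0.0, new_hi=1.0):
        """Min-max normalization: v' = (v - min)/(max - min) * (new_max - new_min) + new_min."""
        return (value - lo) / (hi - lo) * (new_hi - new_lo) + new_lo

    a, b = (1, 1000), (0.5, 1)
    print(round(dist(a, b), 2))          # 999.0 -- dominated by the second attribute
    # After scaling each attribute to [0, 1] (assumed ranges: [0, 1] and [1, 1000]):
    a_s = (min_max(1, 0, 1), min_max(1000, 1, 1000))
    b_s = (min_max(0.5, 0, 1), min_max(1, 1, 1000))
    print(round(dist(a_s, b_s), 2))      # 1.12 -- both attributes now contribute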

  24. Classification: Lazy & Eager Learning • Two types of learning methodologies • Lazy learning: instance-based learning (k-NN) • Eager learning: decision-tree and Bayesian classification, ANN & SVM

  25. Differences Between Lazy & Eager Learning • Lazy learning • Does not require model building • Less time training but more time predicting • Effectively uses a richer hypothesis space, since it uses many local linear functions to form its implicit global approximation to the target function • Eager learning • Requires model building • More time training but less time predicting • Must commit to a single hypothesis that covers the entire instance space

  26. Thank you & Questions?
