Tutorial #3: Classification
Classification • A Tree Classification algorithm is used to compute a decision tree. Decision trees are easy to understand and modify, and the model developed can be expressed as a set of decision rules.
Classification • By classifying larger data sets, you can improve the accuracy of the classification model. In classification, the input is a set of example records, called a training set, where each record consists of several fields or attributes. Attributes are either numerical (coming from an ordered domain) or categorical (coming from an unordered domain). One of the attributes, called the class label field (target field), indicates the class to which each example belongs.
Classification • A Decision Tree model contains rules that predict the target variable. • This tutorial builds such a model with the ID3 Tree Classification algorithm.
ID3 Algorithm • First: Calculate Entropy(S) for the whole data set. • Second: Try each candidate attribute split and calculate the Gain of each one. • Third: Build the tree by splitting on the attribute with the maximum Gain, then repeat the process on each branch. The two formulas, in the form the following slides apply, are given below.
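These are the standard ID3 definitions (Quinlan, 1986), written in the same notation the worked example uses; p_i is the fraction of examples in S belonging to class i, and S_v is the subset of S on which the split A takes value v:

Entropy(S) = - Σ_i p_i log2(p_i)

Gain(S, A) = Entropy(S) - Σ_v (|S_v| / |S|) Entropy(S_v)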
[Training data table: 9 persons described by the attributes Hair Length, Weight, and Age, each labeled Male or Female.]
Let us try splitting on Hair Length.
9 Persons: Entropy(4F, 5M) = -(4/9)log2(4/9) - (5/9)log2(5/9) = 0.9911
Hair Length < 4? yes: 3 Males; no: 4 Females, 2 Males
Entropy(0F, 3M) = -(0/3)log2(0/3) - (3/3)log2(3/3) = 0 (taking 0 log2 0 = 0)
Entropy(4F, 2M) = -(4/6)log2(4/6) - (2/6)log2(2/6) = 0.9183
Gain(Hair Length < 4) = 0.9911 - (3/9 * 0 + 6/9 * 0.9183) = 0.3789
Let us try splitting on Weight.
9 Persons: Entropy(4F, 5M) = -(4/9)log2(4/9) - (5/9)log2(5/9) = 0.9911
Weight < 170? yes: 4 Females, 1 Male; no: 4 Males
Entropy(4F, 1M) = -(4/5)log2(4/5) - (1/5)log2(1/5) = 0.7219
Entropy(0F, 4M) = -(0/4)log2(0/4) - (4/4)log2(4/4) = 0
Gain(Weight < 170) = 0.9911 - (5/9 * 0.7219 + 4/9 * 0) = 0.5900
Let us try splitting on Age.
9 Persons: Entropy(4F, 5M) = -(4/9)log2(4/9) - (5/9)log2(5/9) = 0.9911
Age <= 40? yes: 3 Females, 3 Males; no: 1 Female, 2 Males
Entropy(3F, 3M) = -(3/6)log2(3/6) - (3/6)log2(3/6) = 1
Entropy(1F, 2M) = -(1/3)log2(1/3) - (2/3)log2(2/3) = 0.9183
Gain(Age <= 40) = 0.9911 - (6/9 * 1 + 3/9 * 0.9183) = 0.0183
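The arithmetic above is easy to check in code. Here is a minimal Python sketch (not part of the original tutorial; the function names are illustrative) that reproduces the entropy and gain values for the three candidate splits:

from math import log2

def entropy(counts):
    """Shannon entropy of a class distribution, given as a list of
    class counts; empty classes contribute 0 by convention."""
    total = sum(counts)
    return -sum(c / total * log2(c / total) for c in counts if c > 0)

def gain(parent, children):
    """Information gain of a split: parent entropy minus the
    size-weighted average entropy of the child nodes."""
    total = sum(parent)
    weighted = sum(sum(ch) / total * entropy(ch) for ch in children)
    return entropy(parent) - weighted

parent = [4, 5]                                  # 4 Females, 5 Males
print(round(entropy(parent), 4))                 # 0.9911
print(round(gain(parent, [[0, 3], [4, 2]]), 4))  # Hair Length < 4: 0.3789
print(round(gain(parent, [[4, 1], [0, 4]]), 4))  # Weight < 170:    0.59
print(round(gain(parent, [[3, 3], [1, 2]]), 4))  # Age <= 40:       0.0183

Weight < 170 gives the largest gain (0.5900), so it becomes the root split of the tree on the next slide.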
Decision Tree:
9 Persons
Weight < 170? no: 4 Males (leaf); yes: continue with Hair Length
Hair Length < 4? yes: 1 Male (leaf); no: 4 Females (leaf)
Convert Decision Trees to rules.
Rules to Classify Males/Females:
If Weight is greater than or equal to 170, classify as Male.
Else if Hair Length is less than 4, classify as Male.
Else classify as Female.
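As a sketch of how these rules look in code (again not from the original tutorial; the function and parameter names are my own), the whole tree collapses to two nested tests:

def classify(weight, hair_length):
    """Decision rules read off the tree above."""
    if weight >= 170:          # right branch of the root: all Males
        return "Male"
    if hair_length < 4:        # short hair in the lighter group: Male
        return "Male"
    return "Female"            # everyone else: Female

print(classify(weight=180, hair_length=2))   # Male   (hypothetical values)
print(classify(weight=150, hair_length=10))  # Female (hypothetical values)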
Try the Weka Program • Load the same data (in the file test.csv) into Weka and confirm that it produces the same tree.
References: • Quinlan, J.R. (1986). Induction of Decision Trees. Machine Learning, 1, 81-106. • http://dms.irb.hr/tutorial/tut_dtrees.php • http://www.dcs.napier.ac.uk/~peter/vldb/dm/node11.html • http://www2.cs.uregina.ca/~dbd/cs831/notes/ml/dtrees/4_dtrees2.html • Professor Sin-Min Lee, SJSU: http://cs.sjsu.edu/~lee/cs157b/cs157b.html