Ch13. Decision Tree. KH Wong
We will learn: the Classification and Regression Tree (CART), also called a Decision Tree
• CART (Classification and Regression Trees) → uses the Gini Index (classification) as its metric.
• Other approaches: ID3 (Iterative Dichotomiser 3) → uses the Entropy function and Information Gain as metrics.
• https://medium.com/deep-math-machine-learning-ai/chapter-4-decision-trees-algorithms-b93975f7a1f1 https://machinelearningmastery.com/classification-and-regression-trees-for-machine-learning/
To build the tree you need training data
• You should have enough data for training. It is a supervised learning algorithm.
• Divide the whole dataset (100%) into, e.g.:
• Training set (70%): for training your classifier
• Validation set (10%): for tuning the parameters
• Test set (20%): for testing the performance of your classifier
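The exact percentages are a design choice. As a minimal sketch (assuming scikit-learn is available, which the slides do not specify), a 70/10/20 split could be coded as:

```python
# Minimal sketch of a 70/10/20 train/validation/test split using scikit-learn.
# The ratios are illustrative, not prescribed by the slides.
from sklearn.model_selection import train_test_split

def split_data(X, y, seed=0):
    # First carve out the 20% test set.
    X_rest, X_test, y_rest, y_test = train_test_split(
        X, y, test_size=0.20, random_state=seed)
    # Then split the remaining 80% into 70% train / 10% validation
    # (0.125 of the remaining 80% equals 10% of the whole dataset).
    X_train, X_val, y_train, y_val = train_test_split(
        X_rest, y_rest, test_size=0.125, random_state=seed)
    return X_train, X_val, X_test, y_train, y_val, y_test
```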
CART can perform classification or regression functions
• So when do we use classification and when regression?
• Classification trees: outputs are class symbols, not real numbers, e.g. high, medium, low.
• Regression trees: outputs are target variables (real numbers), e.g. 1.234, 5.678.
• See http://www.simafore.com/blog/bid/62482/2-main-differences-between-classification-and-regression-trees
Classification tree approaches • Famous trees are ID3, C4.5 and CART. • What are the differences? • We only learn CART here. https://www.quora.com/What-are-the-differences-between-ID3-C4-5-and-CART
Decision tree diagram • https://www.python-course.eu/Decision_Trees.php
Common terms used with Decision trees
• Root Node: represents the entire population or sample; it further gets divided into two or more homogeneous sets.
• Splitting: the process of dividing a node into two or more sub-nodes.
• Decision Node: when a sub-node splits into further sub-nodes, it is called a decision node.
• Leaf / Terminal Node: nodes that do not split are called leaf or terminal nodes.
• Pruning: removing sub-nodes of a decision node; it can be seen as the opposite of splitting.
• Branch / Sub-Tree: a subsection of the entire tree is called a branch or sub-tree.
• Parent and Child Node: a node that is divided into sub-nodes is called the parent node of those sub-nodes, and the sub-nodes are its children.
https://medium.com/greyatom/decision-trees-a-simple-way-to-visualize-a-decision-dc506a403aeb
CART Model Representation
• CART is a binary tree.
• Each node (root or decision node) represents a single input variable (x) and a split point on that variable (assuming the variable is numeric).
• The leaf nodes of the tree contain an output variable (y) which is used to make a prediction.
• Given a dataset with two inputs (x), height in centimeters and weight in kilograms, and an output of sex as male or female, below is a crude example of a binary decision tree (completely fictitious, for demonstration purposes only).
(Diagram labels: root/decision node = attribute (variable); leaf node = class variable or prediction.)
https://machinelearningmastery.com/classification-and-regression-trees-for-machine-learning/
A simple example of a decision tree
• Use height and weight to guess the sex of a person.
• The decision tree splits the input space into rectangles (when there are p=2 input variables) or hyper-rectangles with more inputs.
• Testing whether a person is male or not: Height > 180 cm? No. Weight > 80 kg? No. Therefore: Female.
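As a minimal illustration (a sketch, not code from the slides; the function name predict_sex is made up, and the thresholds 180 cm and 80 kg are the ones in the fictitious example above), the tree corresponds to this decision rule:

```python
# Hedged sketch of the height/weight decision rule shown above.
# Thresholds (180 cm, 80 kg) come from the slide's fictitious example.
def predict_sex(height_cm: float, weight_kg: float) -> str:
    if height_cm > 180:
        return "Male"
    if weight_kg > 80:
        return "Male"
    return "Female"

print(predict_sex(173, 79))  # Female
print(predict_sex(177, 85))  # Male
```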
CMSC5707, Ch13. Exercise 1: Decision Tree
• Why is it a binary tree? Answer: ____________________
• How many nodes and leaves? Answer: ________________
• Male or Female if
• 183 cm, 77 kg? ANS: ______
• 173 cm, 79 kg? ANS: ______
• 177 cm, 85 kg? ANS: ______
CMSC5707, Ch13. Answer 1: Decision Tree
• Why is it a binary tree? Answer: each internal node splits into exactly two branches.
• How many nodes and leaves? Answer: nodes: 2, leaves: 3.
• Male or Female if
• 183 cm, 77 kg? ANS: Male
• 173 cm, 79 kg? ANS: Female
• 177 cm, 85 kg? ANS: Male
How to create a CART
• Greedy Splitting: grow the tree.
• Stopping Criterion: stop when the number of samples in a leaf is small enough.
• Pruning The Tree: remove unnecessary leaves to make the tree more efficient and to reduce overfitting.
Greedy Splitting
• During the process of growing the tree, you grow leaves from a node by splitting.
• You need a metric to evaluate whether a split is good or not, e.g. one of the following:
• Split on the attribute whose Gini (impurity) index is the lowest: Gini_index = 1 - Σ_i (p_i)^2
• Or split on the attribute whose Information Gain (based on Entropy) is the highest:
• Information Gain (IG) = Entropy(parent) - weighted sum of Entropy(children), where Entropy = -Σ_i p_i*log_2(p_i)
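As a hedged sketch (not code from the original slides), the two metrics can be computed from class counts as follows:

```python
# Minimal sketch: Gini impurity and entropy from a list of class counts,
# matching the formulas above (illustrative, not the slides' own code).
from math import log2

def gini_index(counts):
    total = sum(counts)
    probs = [c / total for c in counts]
    return 1.0 - sum(p * p for p in probs)

def entropy(counts):
    total = sum(counts)
    probs = [c / total for c in counts]
    # Convention: 0 * log2(0) is treated as 0, so empty classes are skipped.
    return -sum(p * log2(p) for p in probs if p > 0)
```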
Example: data input
• 10 samples: 4 bus, 3 car, 3 train
https://www.saedsayad.com/decision_tree.htm
1) Split metric: Entropy
• Prob(bus) = 4/10 = 0.4
• Prob(car) = 3/10 = 0.3
• Prob(train) = 3/10 = 0.3
• Entropy = -0.4*log_2(0.4) - 0.3*log_2(0.3) - 0.3*log_2(0.3) = 1.571
• (note: log_2 is log base 2.)
• Another example: if P(bus)=1, P(car)=0, P(train)=0
• Entropy = -1*log_2(1) - 0*log_2(0.00001) - 0*log_2(0.000001) = 0
• Entropy = 0: the node is very pure, impurity is 0
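A quick standalone check of these numbers (the small constants like 0.00001 on the slide just stand in for the convention that 0*log_2(0) counts as 0):

```python
# Verify the entropy values quoted above (0*log2(0) is taken as 0).
from math import log2

probs = [0.4, 0.3, 0.3]                           # bus, car, train
print(-sum(p * log2(p) for p in probs))           # ≈ 1.571

pure = [1.0, 0.0, 0.0]                            # only bus
print(-sum(p * log2(p) for p in pure if p > 0))   # -0.0, i.e. zero entropy: a pure node
```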
Exercise 2. 2) Split metric: Gini (impurity) index
• Prob(bus) = 4/10 = 0.4
• Prob(car) = 3/10 = 0.3
• Prob(train) = 3/10 = 0.3
• Gini index = _____________________________?
• Another example, if the class has only bus: P(bus)=1, P(car)=0, P(train)=0
• Gini impurity index = _____________________________?
Answer 2. 2) Split metric: Gini (impurity) index
• Prob(bus) = 4/10 = 0.4
• Prob(car) = 3/10 = 0.3
• Prob(train) = 3/10 = 0.3
• Gini index = 1 - (0.4*0.4 + 0.3*0.3 + 0.3*0.3) = 0.66
• Another example, if the class has only bus: P(bus)=1, P(car)=0, P(train)=0
• Gini impurity index = 1 - 1*1 - 0*0 - 0*0 = 0
• Impurity is 0
Exercise 3. Train
• If the first 2 rows are not bus but train, find the entropy and Gini index.
• Prob(bus) = 2/10 = 0.2
• Prob(car) = 3/10 = 0.3
• Prob(train) = 5/10 = 0.5
• Entropy = _______________________________
• Gini index = _____________________________
ANSWER 3. Train
• If the first 2 rows are not bus but train, find the entropy and Gini index.
• Prob(bus) = 2/10 = 0.2
• Prob(car) = 3/10 = 0.3
• Prob(train) = 5/10 = 0.5
• Entropy = -0.2*log_2(0.2) - 0.3*log_2(0.3) - 0.5*log_2(0.5) = 1.485
• Gini index = 1 - (0.2*0.2 + 0.3*0.3 + 0.5*0.5) = 0.62
3) Split metric: Classification error
• Classification error = 1 - max(0.4, 0.3, 0.3) = 1 - 0.4 = 0.6
• Another example: if P(bus)=1, P(car)=0, P(train)=0
• Classification error = 1 - max(1, 0, 0) = 0
• Impurity is 0 if there is only bus
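A one-line hedged sketch of this metric, mirroring the arithmetic above:

```python
# Classification-error impurity: one minus the largest class probability.
def classification_error(probs):
    return 1.0 - max(probs)

print(classification_error([0.4, 0.3, 0.3]))  # ≈ 0.6
print(classification_error([1.0, 0.0, 0.0]))  # 0.0 (pure node)
```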
4) Split metric: Variance reduction
• Introduced in CART,[3] variance reduction is often employed in cases where the target variable is continuous (regression tree), meaning that use of many other metrics would first require discretization before being applied. The variance reduction of a node N is defined as the total reduction of the variance of the target variable due to the split at this node:
https://en.wikipedia.org/wiki/Decision_tree_learning
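The expression itself did not survive extraction. A common way to write it (a sketch of the standard weighted-variance form; the original slide's exact notation may differ) is:

$$
\Delta\mathrm{Var}(N) \;=\; \mathrm{Var}(S) \;-\; \frac{|S_L|}{|S|}\,\mathrm{Var}(S_L) \;-\; \frac{|S_R|}{|S|}\,\mathrm{Var}(S_R)
$$

where S is the set of samples reaching node N, and S_L, S_R are the subsets sent to the left and right children. The split giving the largest reduction is chosen.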
Splitting procedure: Recursive Partitioning for CART
• Take all of your training data.
• Consider all possible values of all variables.
• Select the variable/value pair (X = t1) (e.g. X1 = Height) that produces the greatest "separation" (or maximum homogeneity, i.e. less impurity within each of the new parts) in the target.
• (X = t1) is called a "split".
• If X < t1 (e.g. Height < 180 cm) then send the data point to the "left"; otherwise, send it to the "right".
• Now repeat the same process on these two "nodes".
• You get a "tree".
• Note: CART only uses binary splits.
https://www.casact.org/education/specsem/f2005/handouts/cart.ppt
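A hedged Python sketch of this greedy recursive-partitioning loop for numeric features, using the Gini index as the split metric (illustrative only: the function names are made up, and a real CART implementation adds pruning and more careful stopping rules):

```python
# Hedged sketch of greedy recursive partitioning with binary splits
# on numeric features, scored by weighted Gini impurity.
from collections import Counter

def gini_of(labels):
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(X, y):
    """Return (feature index, threshold) with the lowest weighted child Gini."""
    best = (None, None, float("inf"))
    for f in range(len(X[0])):
        for t in sorted({row[f] for row in X}):
            left = [lab for row, lab in zip(X, y) if row[f] < t]
            right = [lab for row, lab in zip(X, y) if row[f] >= t]
            if not left or not right:
                continue
            score = (len(left) * gini_of(left) + len(right) * gini_of(right)) / len(y)
            if score < best[2]:
                best = (f, t, score)
    return best[0], best[1]

def build_tree(X, y, min_samples=2):
    # Stopping criterion: node is pure or too small -> leaf with the majority class.
    if len(set(y)) == 1 or len(y) < min_samples:
        return Counter(y).most_common(1)[0][0]
    f, t = best_split(X, y)
    if f is None:                      # no useful split found -> leaf
        return Counter(y).most_common(1)[0][0]
    left = [i for i, row in enumerate(X) if row[f] < t]
    right = [i for i, row in enumerate(X) if row[f] >= t]
    return {"feature": f, "threshold": t,
            "left":  build_tree([X[i] for i in left],  [y[i] for i in left],  min_samples),
            "right": build_tree([X[i] for i in right], [y[i] for i in right], min_samples)}
```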
Example 1 • Design a decision tree • https://sefiks.com/2018/08/27/a-step-by-step-cart-decision-tree-example/
Example 1
Gini index or Information gain approach
• Gini index: the Gini index is a metric for classification tasks in CART. It stores the sum of squared probabilities of each class. We can formulate it as illustrated below.
• Gini_index = 1 - Σ (p_i)^2, for i = 1 to the number of classes
• Split on the attribute whose Gini (impurity) index is the lowest.
• Or split on the attribute whose Information Gain (based on Entropy) is the highest:
• Information Gain (IG) = Entropy(parent) - weighted sum of Entropy(children)
Outlook
GINI index approach:
Outlook is a nominal feature. It can be sunny, overcast or rain. Summarizing the final decisions for the outlook feature:
Gini(Outlook=Sunny) = 1 - (2/5)^2 - (3/5)^2 = 0.48
Gini(Outlook=Overcast) = 1 - (4/4)^2 - (0/4)^2 = 0
Gini(Outlook=Rain) = 1 - (3/5)^2 - (2/5)^2 = 0.48
Then, we calculate the weighted sum of Gini indexes for the outlook feature:
Gini(Outlook) = (5/14)*0.48 + (4/14)*0 + (5/14)*0.48 = 0.343
Information gain by entropy approach:
Overall decision: yes=9, no=5
Parent entropy = -(9/14)*log_2(9/14) - (5/14)*log_2(5/14) = 0.94
Weighted_entropy(Outlook=Sunny) = (5/14)*( -(2/5)*log_2(2/5) - (3/5)*log_2(3/5) ) = 0.347
Weighted_entropy(Outlook=Overcast) = (4/14)*( -(4/4)*log_2(4/4) - (0/4)*log_2(0/4) ) = 0
Weighted_entropy(Outlook=Rain) = (5/14)*( -(3/5)*log_2(3/5) - (2/5)*log_2(2/5) ) = 0.347
Information_gain_for_outlook = Parent entropy - Weighted_entropy(Outlook=Sunny) - Weighted_entropy(Outlook=Overcast) - Weighted_entropy(Outlook=Rain) = 0.94 - 0.347 - 0 - 0.347 = 0.246
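As a hedged cross-check (a sketch, not the slides' code), the same Outlook numbers can be reproduced from the (yes, no) counts per value:

```python
# Weighted Gini and information gain for a categorical feature,
# using the Outlook (yes, no) counts quoted above.
from math import log2

def gini_yes_no(yes, no):
    n = yes + no
    return 1 - (yes / n) ** 2 - (no / n) ** 2

def entropy_yes_no(yes, no):
    n = yes + no
    return -sum((c / n) * log2(c / n) for c in (yes, no) if c)

outlook = {"Sunny": (2, 3), "Overcast": (4, 0), "Rain": (3, 2)}  # (yes, no)
total = sum(y + n for y, n in outlook.values())                  # 14

weighted_gini = sum((y + n) / total * gini_yes_no(y, n) for y, n in outlook.values())
parent_entropy = entropy_yes_no(9, 5)
info_gain = parent_entropy - sum((y + n) / total * entropy_yes_no(y, n)
                                 for y, n in outlook.values())

print(round(weighted_gini, 3), round(info_gain, 3))
# 0.343 0.247 (the slide's 0.246 comes from rounded intermediate values)
```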
Exercise 4: Temperature
GINI index approach:
Similarly, temperature is a nominal feature and it can take 3 different values: Cool, Hot and Mild. Let's summarize the decisions for the temperature feature.
Gini(Temp=Hot) = __________________________________?
Gini(Temp=Cool) = __________________________________?
Gini(Temp=Mild) = __________________________________?
We'll calculate the weighted sum of Gini indexes for the temperature feature:
Gini(Temp) = ____________________________________________?
Information gain by entropy approach:
Overall decision: yes=9, no=5
Parent entropy = -(9/14)*log_2(9/14) - (5/14)*log_2(5/14) = 0.94 (same as the last page)
Weighted_entropy(Temp=Hot) = (4/14)*( -(2/4)*log_2(2/4) - (2/4)*log_2(2/4) ) = 0.286
Weighted_entropy(Temp=Cool) = (4/14)*( -(3/4)*log_2(3/4) - (1/4)*log_2(1/4) ) = 0.232
Weighted_entropy(Temp=Mild) = (6/14)*( -(4/6)*log_2(4/6) - (2/6)*log_2(2/6) ) = 0.394
Information_gain_for_temperature = Parent entropy - Weighted_entropy(Temp=Hot) - Weighted_entropy(Temp=Cool) - Weighted_entropy(Temp=Mild) = 0.94 - 0.286 - 0.232 - 0.394 = 0.028
ANSWER 4: Temperature
GINI index approach:
Similarly, temperature is a nominal feature and it can take 3 different values: Cool, Hot and Mild. Let's summarize the decisions for the temperature feature.
Gini(Temp=Hot) = 1 - (2/4)^2 - (2/4)^2 = 0.5
Gini(Temp=Cool) = 1 - (3/4)^2 - (1/4)^2 = 0.375
Gini(Temp=Mild) = 1 - (4/6)^2 - (2/6)^2 = 0.445
We'll calculate the weighted sum of Gini indexes for the temperature feature:
Gini(Temp) = (4/14)*0.5 + (4/14)*0.375 + (6/14)*0.445 = 0.439
Information gain by entropy approach:
Overall decision: yes=9, no=5
Parent entropy = -(9/14)*log_2(9/14) - (5/14)*log_2(5/14) = 0.94 (same as the last page)
Weighted_entropy(Temp=Hot) = (4/14)*( -(2/4)*log_2(2/4) - (2/4)*log_2(2/4) ) = 0.286
Weighted_entropy(Temp=Cool) = (4/14)*( -(3/4)*log_2(3/4) - (1/4)*log_2(1/4) ) = 0.232
Weighted_entropy(Temp=Mild) = (6/14)*( -(4/6)*log_2(4/6) - (2/6)*log_2(2/6) ) = 0.394
Information_gain_for_temperature = Parent entropy - Weighted_entropy(Temp=Hot) - Weighted_entropy(Temp=Cool) - Weighted_entropy(Temp=Mild) = 0.94 - 0.286 - 0.232 - 0.394 = 0.028
Humidity
GINI index approach:
Humidity is a binary class feature. It can be high or normal.
Gini(Humidity=High) = 1 - (3/7)^2 - (4/7)^2 = 1 - 0.183 - 0.326 = 0.489
Gini(Humidity=Normal) = 1 - (6/7)^2 - (1/7)^2 = 1 - 0.734 - 0.02 = 0.244
The weighted sum for the humidity feature is calculated next:
Gini(Humidity) = (7/14)*0.489 + (7/14)*0.244 = 0.367
Information gain by entropy approach:
Overall decision: yes=9, no=5
Parent entropy = -(9/14)*log_2(9/14) - (5/14)*log_2(5/14) = 0.94 (same as the last page)
Weighted_entropy(Humidity=High) = (7/14)*( -(3/7)*log_2(3/7) - (4/7)*log_2(4/7) ) = 0.492
Weighted_entropy(Humidity=Normal) = (7/14)*( -(6/7)*log_2(6/7) - (1/7)*log_2(1/7) ) = 0.296
Information_gain_for_humidity = Parent entropy - Weighted_entropy(Humidity=High) - Weighted_entropy(Humidity=Normal) = 0.94 - 0.492 - 0.296 = 0.152
Exercise 5: Wind
GINI index approach:
Wind is a binary class feature similar to humidity. It can be weak or strong.
Gini(Wind=Weak) = 1 - (6/8)^2 - (2/8)^2 = 0.375
Gini(Wind=Strong) = 1 - (3/6)^2 - (3/6)^2 = 0.5
Gini(Wind) = (8/14)*0.375 + (6/14)*0.5 = 0.428
Information gain by entropy approach:
Overall decision: yes=9, no=5
Parent entropy = _________________________________________?
Weighted_entropy(Wind=Weak) = ____________________________?
Weighted_entropy(Wind=Strong) = ___________________________?
Information_gain_for_wind = Parent entropy - Weighted_entropy(Wind=Weak) - Weighted_entropy(Wind=Strong) = ____________________?
Answer 5: Wind
GINI index approach:
Wind is a binary class feature similar to humidity. It can be weak or strong.
Gini(Wind=Weak) = 1 - (6/8)^2 - (2/8)^2 = 0.375
Gini(Wind=Strong) = 1 - (3/6)^2 - (3/6)^2 = 0.5
Gini(Wind) = (8/14)*0.375 + (6/14)*0.5 = 0.428
Information gain by entropy approach:
Overall decision: yes=9, no=5
Parent entropy = -(9/14)*log_2(9/14) - (5/14)*log_2(5/14) = 0.94 (same as the last page)
Weighted_entropy(Wind=Weak) = (8/14)*( -(6/8)*log_2(6/8) - (2/8)*log_2(2/8) ) = 0.464
Weighted_entropy(Wind=Strong) = (6/14)*( -(3/6)*log_2(3/6) - (3/6)*log_2(3/6) ) = 0.428
Information_gain_for_wind = Parent entropy - Weighted_entropy(Wind=Weak) - Weighted_entropy(Wind=Strong) = 0.94 - 0.464 - 0.428 = 0.048
Exercise 6: Time to decide. Use either Gini or information gain
• We've calculated Gini index values for each feature. The winner will be the outlook feature because its cost is the lowest. We'll put the outlook decision at the top of the tree.
• Split on the attribute whose Gini (impurity) index is the lowest: Gini_index = 1 - Σ (p_i)^2
• Or split on the attribute whose Information Gain (based on Entropy) is the highest:
• Information Gain (IG) = Entropy(parent) - weighted sum of Entropy(children)
Answer 6: Time to decide. Use either Gini or information gain
• We've calculated Gini index values for each feature. The winner is the outlook feature because its cost is the lowest. We'll put the outlook decision at the top of the tree.
• Split on the attribute whose Gini (impurity) index is the lowest: Gini_index = 1 - Σ (p_i)^2
• Or split on the attribute whose Information Gain (based on Entropy) is the highest:
• Information Gain (IG) = Entropy(parent) - weighted sum of Entropy(children)
• Both results agree with each other.
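A tiny hedged sketch of the decision step itself, plugging in the weighted Gini values from the preceding slides:

```python
# Pick the root split: the feature with the lowest weighted Gini index
# (values computed on the previous slides).
gini_scores = {"Outlook": 0.343, "Temperature": 0.439,
               "Humidity": 0.367, "Wind": 0.428}
print(min(gini_scores, key=gini_scores.get))  # Outlook
```

Equivalently, Outlook also has the highest information gain (0.246), so both criteria pick the same root.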
Time to decide: Put the outlook decision at the top of the tree. All "Decision=yes", so the branch for "Overcast" is over.
You might realize that the sub-dataset in the overcast leaf has only "yes" decisions. This means that the overcast branch is over.
We will apply the same principles to those sub-datasets in the following steps. Focus on the sub-dataset for the sunny outlook. We need to find the Gini index scores for the temperature, humidity and wind features respectively. Total population under Outlook=Sunny: 5, yes=2, no=3.
Gini of temperature for sunny outlook
Gini approach:
Gini(Outlook=Sunny & Temp=Hot) = 1 - (0/2)^2 - (2/2)^2 = 0
Gini(Outlook=Sunny & Temp=Cool) = 1 - (1/1)^2 - (0/1)^2 = 0
Gini(Outlook=Sunny & Temp=Mild) = 1 - (1/2)^2 - (1/2)^2 = 0.5
Gini(Outlook=Sunny & Temp) = (2/5)*0 + (1/5)*0 + (2/5)*0.5 = 0.2
Information gain by entropy:
Total population under Outlook=Sunny: 5, yes=2, no=3
Parent entropy_Outlook_sunny = -(2/5)*log_2(2/5) - (3/5)*log_2(3/5) = 0.97
Weighted_entropy(Outlook=Sunny & Temp=Hot) = (2/5)*( -(0/2)*log_2(0/2) - (2/2)*log_2(2/2) ) = 0
Weighted_entropy(Outlook=Sunny & Temp=Cool) = (1/5)*( -(1/1)*log_2(1/1) - (0/1)*log_2(0/1) ) = 0
Weighted_entropy(Outlook=Sunny & Temp=Mild) = (2/5)*( -(1/2)*log_2(1/2) - (1/2)*log_2(1/2) ) = 0.4
Information_gain_for_temp_under_sunny = Parent entropy_Outlook_sunny - Weighted_entropy(Temp=Hot) - Weighted_entropy(Temp=Cool) - Weighted_entropy(Temp=Mild) = 0.97 - 0 - 0 - 0.4 = 0.57
Gini of humidity for sunny outlook
Gini approach:
Gini(Outlook=Sunny and Humidity=High) = 1 - (0/3)^2 - (3/3)^2 = 0
Gini(Outlook=Sunny and Humidity=Normal) = 1 - (2/2)^2 - (0/2)^2 = 0
Gini(Outlook=Sunny and Humidity) = (3/5)*0 + (2/5)*0 = 0
Information gain by entropy:
Weighted_entropy(Outlook=Sunny and Humidity=High) = (3/5)*( -(0/3)*log_2(0/3) - (3/3)*log_2(3/3) ) = 0
Weighted_entropy(Outlook=Sunny and Humidity=Normal) = (2/5)*( -(2/2)*log_2(2/2) - (0/2)*log_2(0/2) ) = 0
Information_gain_for_humidity_under_sunny = Parent entropy_Outlook_sunny - Weighted_entropy(Humidity=High) - Weighted_entropy(Humidity=Normal) = 0.97 - 0 - 0 = 0.97
Gini of wind for sunny outlook
Gini approach:
Gini(Outlook=Sunny and Wind=Weak) = 1 - (1/3)^2 - (2/3)^2 = 0.445
Gini(Outlook=Sunny and Wind=Strong) = 1 - (1/2)^2 - (1/2)^2 = 0.5
Gini(Outlook=Sunny and Wind) = (3/5)*0.445 + (2/5)*0.5 = 0.467
Information gain by entropy:
Weighted_entropy(Outlook=Sunny and Wind=Weak) = (3/5)*( -(1/3)*log_2(1/3) - (2/3)*log_2(2/3) ) = 0.551
Weighted_entropy(Outlook=Sunny and Wind=Strong) = (2/5)*( -(1/2)*log_2(1/2) - (1/2)*log_2(1/2) ) = 0.4
Information_gain_for_wind_under_sunny = Parent entropy_Outlook_sunny - Weighted_entropy(Wind=Weak) - Weighted_entropy(Wind=Strong) = 0.97 - 0.551 - 0.4 = 0.019
Decision for sunny outlook
• We've calculated the Gini index scores for each feature when the outlook is sunny. The winner is humidity because it has the lowest value. We'll put a humidity check at the extension of the sunny outlook branch.
• Split on the attribute whose Gini (impurity) index is the lowest: Gini_index = 1 - Σ (p_i)^2
• Or split on the attribute whose Information Gain (based on Entropy) is the highest:
• Information Gain (IG) = Entropy(parent) - weighted sum of Entropy(children)
• Both results agree with each other. Humidity is picked as the second-level node.
Result
• Both results agree with each other. Humidity is picked as the second-level node.
• When humidity is "High", the decision is pure "No"; when humidity is "Normal", the decision is pure "Yes".
As seen, decision is always "no" for high humidity and sunny outlook. On the other hand, decision will always be "yes" for normal humidity and sunny outlook. This branch is over.
Now we will work on the Rain branch
• We need to focus on the rain outlook. We'll calculate Gini index scores for the temperature, humidity and wind features when the outlook is rain.
Gini of temperature for rain outlook
Gini approach:
Gini(Outlook=Rain and Temp=Cool) = 1 - (1/2)^2 - (1/2)^2 = 0.5
Gini(Outlook=Rain and Temp=Mild) = 1 - (2/3)^2 - (1/3)^2 = 0.444
Gini(Outlook=Rain and Temp) = (2/5)*0.5 + (3/5)*0.444 = 0.466
Information gain by entropy:
Total population under Outlook=Rain: 5, yes=3, no=2
Parent entropy_Outlook_rain = -(3/5)*log_2(3/5) - (2/5)*log_2(2/5) = 0.97
Weighted_entropy(Outlook=Rain and Temp=Cool) = (2/5)*( -(1/2)*log_2(1/2) - (1/2)*log_2(1/2) ) = 0.4
Weighted_entropy(Outlook=Rain and Temp=Mild) = (3/5)*( -(2/3)*log_2(2/3) - (1/3)*log_2(1/3) ) = 0.551
Information_gain_for_temp_under_rain = Parent entropy_Outlook_rain - Weighted_entropy(Temp=Cool) - Weighted_entropy(Temp=Mild) = 0.97 - 0.4 - 0.551 = 0.019
Gini of humidity for rain outlook
Gini approach:
Gini(Outlook=Rain and Humidity=High) = 1 - (1/2)^2 - (1/2)^2 = 0.5
Gini(Outlook=Rain and Humidity=Normal) = 1 - (2/3)^2 - (1/3)^2 = 0.444
Gini(Outlook=Rain and Humidity) = (2/5)*0.5 + (3/5)*0.444 = 0.466
Information gain by entropy:
Weighted_entropy(Outlook=Rain and Humidity=High) = (2/5)*( -(1/2)*log_2(1/2) - (1/2)*log_2(1/2) ) = 0.4
Weighted_entropy(Outlook=Rain and Humidity=Normal) = (3/5)*( -(2/3)*log_2(2/3) - (1/3)*log_2(1/3) ) = 0.551
Information_gain_for_humidity_under_rain = Parent entropy_Outlook_rain - Weighted_entropy(Humidity=High) - Weighted_entropy(Humidity=Normal) = 0.97 - 0.4 - 0.551 = 0.019
Gini of wind for rain outlook
Gini approach:
Gini(Outlook=Rain and Wind=Weak) = 1 - (3/3)^2 - (0/3)^2 = 0
Gini(Outlook=Rain and Wind=Strong) = 1 - (0/2)^2 - (2/2)^2 = 0
Gini(Outlook=Rain and Wind) = (3/5)*0 + (2/5)*0 = 0
Information gain by entropy:
Weighted_entropy(Outlook=Rain and Wind=Weak) = (3/5)*( -(3/3)*log_2(3/3) - (0/3)*log_2(0/3) ) = 0
Weighted_entropy(Outlook=Rain and Wind=Strong) = (2/5)*( -(0/2)*log_2(0/2) - (2/2)*log_2(2/2) ) = 0
Information_gain_for_wind_under_rain = Parent entropy_Outlook_rain - Weighted_entropy(Wind=Weak) - Weighted_entropy(Wind=Strong) = 0.97 - 0 - 0 = 0.97
Decision for rain outlook
• The winner is the wind feature for the rain outlook because it has the minimum Gini index score among the features.
• Put the wind feature on the rain outlook branch and monitor the new sub-datasets.
• Split on the attribute whose Gini (impurity) index is the lowest: Gini_index = 1 - Σ (p_i)^2
• Or split on the attribute whose Information Gain (based on Entropy) is the highest:
• Information Gain (IG) = Entropy(parent) - weighted sum of Entropy(children)
Put the wind feature on the rain outlook branch and monitor the new sub-datasets. You can repeat the calculation to find the complete solution.
• You might realize that the sub-dataset for weak wind has only "yes" decisions and the sub-dataset for strong wind has only "no" decisions. This means these leaves are over.
• Sub-datasets shown for weak and strong wind under the rain outlook.
Final result • As seen, decision is always "yes" when "wind" is "weak". On the other hand, decision is always "no" if "wind" is "strong". This means that this branch is over.
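As an optional cross-check, and assuming the slides use the standard 14-row "play tennis" weather dataset (its per-feature yes/no tallies match the counts quoted on the preceding slides), the same data can be fed to scikit-learn's CART implementation. The one-hot encoding below is an assumption: scikit-learn trees require numeric inputs, so the printed tree uses binary indicator splits rather than the multiway category splits drawn in the slides.

```python
# Hedged capstone: fit scikit-learn's CART (Gini criterion) on the assumed
# 14-row weather dataset and print the learned tree.
import pandas as pd
from sklearn.tree import DecisionTreeClassifier, export_text

data = pd.DataFrame({
    "Outlook":  ["Sunny","Sunny","Overcast","Rain","Rain","Rain","Overcast",
                 "Sunny","Sunny","Rain","Sunny","Overcast","Overcast","Rain"],
    "Temp":     ["Hot","Hot","Hot","Mild","Cool","Cool","Cool","Mild","Cool",
                 "Mild","Mild","Mild","Hot","Mild"],
    "Humidity": ["High","High","High","High","Normal","Normal","Normal","High",
                 "Normal","Normal","Normal","High","Normal","High"],
    "Wind":     ["Weak","Strong","Weak","Weak","Weak","Strong","Strong","Weak",
                 "Weak","Weak","Strong","Strong","Weak","Strong"],
    "Decision": ["No","No","Yes","Yes","Yes","No","Yes","No","Yes","Yes",
                 "Yes","Yes","Yes","No"],
})
X = pd.get_dummies(data.drop(columns="Decision"))   # one-hot encode categories
clf = DecisionTreeClassifier(criterion="gini", random_state=0)
clf.fit(X, data["Decision"])
print(export_text(clf, feature_names=list(X.columns)))
```

With default settings the fitted tree separates the 14 training rows perfectly; its shape can differ slightly from the hand-built tree because one-hot splits are binary rather than three-way on Outlook.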
Example 2: Design a tree to find out whether an umbrella is needed