A decision tree is a classification algorithm in machine learning (ML). It is a decision support tool that uses a tree-like model of decisions and their possible consequences, including chance event outcomes, resource costs, and utility. It is one way to display an algorithm that contains only conditional control statements.

To understand the decision tree algorithm, we first need to understand classification.

https://www.learnbay.co/data-science-course/blog-post/decision-tree/
Decision Tree - Data Science Program, Bangalore
Introduction: Given an event, predict its category. Examples: Who won a given ball game? How should we file a given email? What word sense was intended for a given occurrence of a word? An event is a list of features. Examples: Ball game: Which players were on offense? Email: Who sent the email? Disambiguation: What was the preceding word?
Introduction: Use a decision tree to predict categories for new events. Use training data to build the decision tree. (Diagram: training events and categories → decision tree; new events → decision tree → category.)
Decision Tree for PlayTennis: The root tests Outlook (Sunny, Overcast, Rain); the Sunny branch tests Humidity (High → No, Normal → Yes). Each internal node tests an attribute, each branch corresponds to an attribute value, and each leaf node assigns a classification.
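A minimal Python sketch of this structure (the Node/Leaf classes and the classify helper are illustrative assumptions, not code from the slides; the example encodes the full PlayTennis tree that appears later in the deck):

```python
# A tiny decision-tree representation: internal nodes test an attribute,
# branches are attribute values, leaves hold a classification.

class Leaf:
    def __init__(self, label):
        self.label = label

class Node:
    def __init__(self, attribute, branches):
        self.attribute = attribute   # attribute tested at this node
        self.branches = branches     # dict: attribute value -> subtree

def classify(tree, example):
    """Follow the branch matching the example's attribute value until a leaf."""
    while isinstance(tree, Node):
        tree = tree.branches[example[tree.attribute]]
    return tree.label

play_tennis = Node("Outlook", {
    "Sunny": Node("Humidity", {"High": Leaf("No"), "Normal": Leaf("Yes")}),
    "Overcast": Leaf("Yes"),
    "Rain": Node("Wind", {"Strong": Leaf("No"), "Weak": Leaf("Yes")}),
})

print(classify(play_tennis, {"Outlook": "Sunny", "Humidity": "Normal", "Wind": "Weak"}))  # Yes
```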
Word Sense Disambiguation: Given an occurrence of a word, decide which sense, or meaning, was intended. Example: "run". run1: move swiftly (I ran to the store.) run2: operate (I run a store.) run3: flow (Water runs from the spring.) run4: length of torn stitches (Her stockings had a run.) etc.
Word Sense Disambiguation: Categories: use word sense labels (run1, run2, etc.) to name the possible categories. Features: features describe the context of the word we want to disambiguate. Possible features include: near(w): is the given word near an occurrence of word w? pos: the word's part of speech. left(w): is the word immediately preceded by the word w? etc.
Word Sense Disambiguation: Example decision tree: the root tests pos (noun vs. verb); lower nodes test context features such as near(race), near(river), and near(stocking), with leaves labeled by senses such as run1, run3, and run4. (Note: decision trees for WSD tend to be quite large.)
Decision Tree for Conjunction: Outlook=Sunny ∧ Wind=Weak. The root tests Outlook; Overcast and Rain lead to No, while Sunny leads to a Wind test (Strong → No, Weak → Yes).
Decision Tree for Disjunction: Outlook=Sunny ∨ Wind=Weak. The root tests Outlook; Sunny leads to Yes, while Overcast and Rain each lead to a Wind test (Strong → No, Weak → Yes).
Decision Tree for XOR: Outlook=Sunny XOR Wind=Weak. The root tests Outlook and every branch tests Wind: Sunny (Strong → Yes, Weak → No), Overcast (Strong → No, Weak → Yes), Rain (Strong → No, Weak → Yes).
Decision Tree: decision trees represent disjunctions of conjunctions. The PlayTennis tree (Outlook at the root; Sunny → Humidity, Overcast → Yes, Rain → Wind) corresponds to: (Outlook=Sunny ∧ Humidity=Normal) ∨ (Outlook=Overcast) ∨ (Outlook=Rain ∧ Wind=Weak).
When to Consider Decision Trees: Instances are describable by attribute-value pairs; the target function is discrete valued; a disjunctive hypothesis may be required; the training data may be noisy; attribute values may be missing. Examples: medical diagnosis, credit risk analysis, object classification for a robot manipulator (Tan 1993).
Top-Down Induction of Decision Trees (ID3): A ← the "best" decision attribute for the next node. Assign A as the decision attribute for the node. For each value of A, create a new descendant. Sort the training examples to the leaf nodes according to the attribute value of the branch. If all training examples are perfectly classified (same value of the target attribute), stop; else iterate over the new leaf nodes.
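A compact recursive sketch of this procedure in Python (the function names, the entropy/gain helpers, and the toy data are assumptions for illustration, not the original ID3 source):

```python
import math
from collections import Counter

def entropy(labels):
    counts = Counter(labels)
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def information_gain(examples, labels, attribute):
    total = len(examples)
    gain = entropy(labels)
    for value in set(e[attribute] for e in examples):
        subset = [lab for e, lab in zip(examples, labels) if e[attribute] == value]
        gain -= (len(subset) / total) * entropy(subset)
    return gain

def id3(examples, labels, attributes):
    if len(set(labels)) == 1:          # all examples perfectly classified: stop
        return labels[0]
    if not attributes:                 # no attributes left: return the majority class
        return Counter(labels).most_common(1)[0][0]
    # Choose the "best" attribute for this node by information gain.
    best = max(attributes, key=lambda a: information_gain(examples, labels, a))
    tree = {best: {}}
    for value in set(e[best] for e in examples):
        idx = [i for i, e in enumerate(examples) if e[best] == value]
        tree[best][value] = id3([examples[i] for i in idx],
                                [labels[i] for i in idx],
                                [a for a in attributes if a != best])
    return tree

# Toy usage (made-up events):
events = [{"Outlook": "Sunny", "Wind": "Weak"}, {"Outlook": "Sunny", "Wind": "Strong"},
          {"Outlook": "Overcast", "Wind": "Weak"}, {"Outlook": "Rain", "Wind": "Strong"}]
targets = ["No", "No", "Yes", "No"]
print(id3(events, targets, ["Outlook", "Wind"]))
# e.g. {'Outlook': {'Sunny': 'No', 'Overcast': 'Yes', 'Rain': 'No'}}
```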
Which attribute is best? Both A1 and A2 split S = [29+,35-]: A1 produces subsets [21+,5-] and [8+,30-]; A2 produces subsets [18+,33-] and [11+,2-].
Entropy: S is a sample of training examples; p+ is the proportion of positive examples and p- is the proportion of negative examples. Entropy measures the impurity of S: Entropy(S) = -p+ log2 p+ - p- log2 p-.
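A quick numerical check of this formula (a small Python sketch; the helper name is mine):

```python
import math

def entropy(p_pos, p_neg):
    """Entropy(S) = -p+ log2 p+ - p- log2 p-  (0 log 0 is taken as 0)."""
    return -sum(p * math.log2(p) for p in (p_pos, p_neg) if p > 0)

# Samples that appear later in the slides:
print(round(entropy(29/64, 35/64), 2))  # 0.99 for [29+,35-]
print(round(entropy(9/14, 5/14), 2))    # 0.94 for [9+,5-]
```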
Entropy: Entropy(S) is the expected number of bits needed to encode the class (+ or -) of a randomly drawn member of S (under the optimal, shortest-length code). Why? Information theory: an optimal-length code assigns -log2 p bits to a message having probability p. So the expected number of bits to encode (+ or -) of a random member of S is -p+ log2 p+ - p- log2 p-.
Information Gain: Gain(S,A) is the expected reduction in entropy due to sorting S on attribute A. Entropy([29+,35-]) = -29/64 log2 29/64 - 35/64 log2 35/64 = 0.99. As above, A1 splits [29+,35-] into [21+,5-] and [8+,30-]; A2 splits it into [18+,33-] and [11+,2-].
Information Gain: Entropy([21+,5-]) = 0.71, Entropy([8+,30-]) = 0.74; Gain(S,A1) = Entropy(S) - (26/64)·Entropy([21+,5-]) - (38/64)·Entropy([8+,30-]) = 0.27. Entropy([18+,33-]) = 0.94, Entropy([11+,2-]) = 0.62; Gain(S,A2) = Entropy(S) - (51/64)·Entropy([18+,33-]) - (13/64)·Entropy([11+,2-]) = 0.12.
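These numbers can be reproduced with a short sketch (Python; the entropy/gain helper names are mine):

```python
import math

def entropy(pos, neg):
    total = pos + neg
    return -sum((c / total) * math.log2(c / total) for c in (pos, neg) if c > 0)

def gain(parent, splits):
    """parent: (pos, neg) counts of S; splits: list of (pos, neg) counts per branch."""
    n = sum(parent)
    return entropy(*parent) - sum((p + q) / n * entropy(p, q) for p, q in splits)

print(round(gain((29, 35), [(21, 5), (8, 30)]), 2))   # A1: 0.27
print(round(gain((29, 35), [(18, 33), (11, 2)]), 2))  # A2: 0.12
```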
Selecting the Next Attribute: S = [9+,5-], E = 0.940. Humidity: High → [3+,4-] (E = 0.985), Normal → [6+,1-] (E = 0.592); Gain(S, Humidity) = 0.940 - (7/14)·0.985 - (7/14)·0.592 = 0.151. Wind: Weak → [6+,2-] (E = 0.811), Strong → [3+,3-] (E = 1.0); Gain(S, Wind) = 0.940 - (8/14)·0.811 - (6/14)·1.0 = 0.048. Humidity provides greater information gain than Wind with respect to the target classification.
Selecting the Next Attribute: S = [9+,5-], E = 0.940. Outlook: Sunny → [2+,3-] (E = 0.971), Overcast → [4+,0-] (E = 0.0), Rain → [3+,2-] (E = 0.971); Gain(S, Outlook) = 0.940 - (5/14)·0.971 - (4/14)·0.0 - (5/14)·0.971 = 0.247.
Selecting the Next Attribute
• The information gain values for the 4 attributes are:
• Gain(S, Outlook) = 0.247
• Gain(S, Humidity) = 0.151
• Gain(S, Wind) = 0.048
• Gain(S, Temperature) = 0.029
• where S denotes the collection of training examples
ID3 Algorithm: Outlook splits [D1,D2,…,D14] = [9+,5-] into Sunny (Ssunny = [D1,D2,D8,D9,D11] = [2+,3-], still to be split), Overcast ([D3,D7,D12,D13] = [4+,0-] → Yes), and Rain ([D4,D5,D6,D10,D14] = [3+,2-], still to be split). Gain(Ssunny, Humidity) = 0.970 - (3/5)·0.0 - (2/5)·0.0 = 0.970. Gain(Ssunny, Temp.) = 0.970 - (2/5)·0.0 - (2/5)·1.0 - (1/5)·0.0 = 0.570. Gain(Ssunny, Wind) = 0.970 - (2/5)·1.0 - (3/5)·0.918 = 0.019.
ID3 Algorithm (resulting tree): Outlook: Sunny → Humidity (High → No [D1,D2], Normal → Yes [D8,D9,D11] [mistake]); Overcast → Yes [D3,D7,D12,D13]; Rain → Wind (Strong → No [D6,D14], Weak → Yes [D4,D5,D10]).
Occam's Razor: "If two theories explain the facts equally well, then the simpler theory is to be preferred." Arguments in favor: there are fewer short hypotheses than long hypotheses; a short hypothesis that fits the data is unlikely to be a coincidence; a long hypothesis that fits the data might be a coincidence. Arguments opposed: there are many ways to define small sets of hypotheses.
Overfitting: One of the biggest problems with decision trees is overfitting.
Avoiding Overfitting: Stop growing the tree when a split is not statistically significant, or grow the full tree and then post-prune. Select the "best" tree by: measuring performance over the training data; measuring performance over a separate validation data set; or minimizing |tree| + |misclassifications(tree)|.
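A hedged scikit-learn sketch of the "grow full, then post-prune" strategy, using cost-complexity pruning as one concrete pruning mechanism and a held-out validation set to pick the best tree (the dataset here is a placeholder):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Placeholder data standing in for a real training set.
X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.3, random_state=0)

# Grow the full tree, then get candidate pruning levels (ccp_alpha values).
full_tree = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
ccp_alphas = full_tree.cost_complexity_pruning_path(X_train, y_train).ccp_alphas

# Select the "best" pruned tree by accuracy on the separate validation set.
best_alpha, best_score = None, -1.0
for alpha in ccp_alphas:
    tree = DecisionTreeClassifier(random_state=0, ccp_alpha=alpha).fit(X_train, y_train)
    score = tree.score(X_val, y_val)
    if score > best_score:
        best_alpha, best_score = alpha, score

print(f"best ccp_alpha={best_alpha:.5f}, validation accuracy={best_score:.3f}")
```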
Converting a Tree to Rules: For the PlayTennis tree (Outlook at the root; Sunny → Humidity, Overcast → Yes, Rain → Wind):
R1: If (Outlook=Sunny) ∧ (Humidity=High) Then PlayTennis=No
R2: If (Outlook=Sunny) ∧ (Humidity=Normal) Then PlayTennis=Yes
R3: If (Outlook=Overcast) Then PlayTennis=Yes
R4: If (Outlook=Rain) ∧ (Wind=Strong) Then PlayTennis=No
R5: If (Outlook=Rain) ∧ (Wind=Weak) Then PlayTennis=Yes
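For trees fit with scikit-learn, a textual rule-like view of the learned tree can be printed with export_text; each root-to-leaf path corresponds to one If/Then rule. A small sketch on a placeholder dataset (turning the printout into rules like R1-R5 above is a manual step):

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

# Placeholder dataset standing in for the PlayTennis examples.
iris = load_iris()
clf = DecisionTreeClassifier(max_depth=2, random_state=0).fit(iris.data, iris.target)

# One printed root-to-leaf path = one If/Then rule.
print(export_text(clf, feature_names=list(iris.feature_names)))
```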
Continuous-Valued Attributes: Create a discrete attribute to test a continuous one. Example: Temperature = 24.5°C; (Temperature > 20.0°C) ∈ {true, false}. Where should the threshold be set?
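One common answer is to try midpoints between adjacent sorted values and keep the threshold with the highest information gain; a small illustrative sketch (the temperatures, labels, and helper names below are made up):

```python
import math

def entropy(labels):
    n = len(labels)
    return -sum((labels.count(c) / n) * math.log2(labels.count(c) / n) for c in set(labels))

def best_threshold(values, labels):
    """Try midpoints between adjacent distinct sorted values; return (threshold, gain)."""
    pairs = sorted(zip(values, labels))
    base = entropy(labels)
    best = (None, -1.0)
    for (v1, _), (v2, _) in zip(pairs, pairs[1:]):
        if v1 == v2:
            continue
        t = (v1 + v2) / 2
        left = [lab for v, lab in pairs if v <= t]
        right = [lab for v, lab in pairs if v > t]
        g = base - len(left) / len(pairs) * entropy(left) - len(right) / len(pairs) * entropy(right)
        if g > best[1]:
            best = (t, g)
    return best

# Made-up temperatures (°C) and PlayTennis labels:
temps = [12.0, 18.5, 20.5, 24.5, 27.0, 30.0]
play = ["No", "No", "Yes", "Yes", "Yes", "No"]
print(best_threshold(temps, play))  # (19.5, ~0.46) on this toy data
```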
Random Forest Classifier: A random forest classifier is an extension of bagging that uses de-correlated trees. To say it in simple words: a random forest builds multiple decision trees and merges them together to get a more accurate and stable prediction.
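In practice this is available off the shelf, e.g. scikit-learn's RandomForestClassifier (a minimal sketch on a placeholder dataset):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# n_estimators = number of de-correlated trees; max_features limits the
# features considered at each split (the m < M idea described below).
forest = RandomForestClassifier(n_estimators=100, max_features="sqrt", random_state=0)
forest.fit(X_train, y_train)
print("test accuracy:", forest.score(X_test, y_test))
```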
Random Forest Classifier (construction, starting from training data with N examples and M features):
• Create bootstrap samples from the training data.
• Construct a decision tree from each bootstrap sample.
• At each node, when choosing the split feature, choose only among m < M features.
• To classify, take the majority vote over all the trees.
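A from-scratch sketch of these steps (bootstrap sampling, per-split feature subsampling via max_features, majority vote); the data and the number of trees are illustrative placeholders:

```python
import numpy as np
from collections import Counter
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X, y = make_classification(n_samples=300, n_features=12, random_state=0)  # N examples, M features

trees = []
for i in range(25):
    # Bootstrap sample: draw N example indices with replacement.
    idx = rng.integers(0, len(X), size=len(X))
    # Fit a tree that considers only m < M randomly chosen features at each split.
    tree = DecisionTreeClassifier(max_features="sqrt", random_state=i)
    tree.fit(X[idx], y[idx])
    trees.append(tree)

def predict(x):
    # Majority vote over the individual trees' predictions.
    votes = [t.predict(x.reshape(1, -1))[0] for t in trees]
    return Counter(votes).most_common(1)[0][0]

print(predict(X[0]), "true:", y[0])
```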
Unknown Attribute Values: What if some examples have missing values of A? Use the training example anyway and sort it through the tree: if node n tests A, assign the most common value of A among the other examples sorted to node n; or assign the most common value of A among the other examples with the same target value; or assign probability pi to each possible value vi of A and assign a fraction pi of the example to each descendant in the tree. Classify new examples in the same fashion.
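A small sketch of the first strategy (fill a missing value of A with the most common value among the other examples sorted to the node); the helper is an illustrative assumption, not part of ID3 itself:

```python
from collections import Counter

def fill_missing(example, attribute, examples_at_node):
    """If example[attribute] is missing (None), substitute the most common value
    of that attribute among the other examples sorted to this node."""
    if example.get(attribute) is None:
        values = [e[attribute] for e in examples_at_node if e.get(attribute) is not None]
        example = {**example, attribute: Counter(values).most_common(1)[0][0]}
    return example

node_examples = [{"Outlook": "Sunny"}, {"Outlook": "Sunny"}, {"Outlook": "Rain"}]
print(fill_missing({"Outlook": None}, "Outlook", node_examples))  # {'Outlook': 'Sunny'}
```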
Cross-Validation: Estimate the accuracy of a hypothesis induced by a supervised learning algorithm. Predict the accuracy of a hypothesis over future unseen instances. Select the optimal hypothesis from a given set of alternative hypotheses: pruning decision trees, model selection, feature selection, combining multiple classifiers (boosting).
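A minimal sketch of estimating a decision tree's accuracy with k-fold cross-validation in scikit-learn (placeholder data; 5 folds and max_depth=3 are arbitrary choices):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=10, random_state=0)

# 5-fold cross-validation: each fold is held out once while training on the rest.
scores = cross_val_score(DecisionTreeClassifier(max_depth=3, random_state=0), X, y, cv=5)
print("fold accuracies:", scores, "mean:", scores.mean())
```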