A decision tree is a classification algorithm in machine learning (ML). It is a decision support tool that uses a tree-like model of decisions and their possible consequences, including chance event outcomes, resource costs, and utility. It is one way to display an algorithm that contains only conditional control statements.

To understand the decision tree algorithm, we first need to understand classification.

https://www.learnbay.co/data-science-course/blog-post/decision-tree/
Decision Tree - Data Science Program, Bangalore
Introduction: Given an event, predict its category. Examples: Who won a given ball game? How should we file a given email? What word sense was intended for a given occurrence of a word? An event is a list of features. Examples: Ball game: Which players were on offense? Email: Who sent the email? Disambiguation: What was the preceding word?
Introduction: Use a decision tree to predict categories for new events. Use training data to build the decision tree. (Diagram: training events and categories → decision tree; new events → decision tree → category.)
Decision Tree for PlayTennis: The root tests Outlook (Sunny, Overcast, Rain); the Sunny branch tests Humidity (High → No, Normal → Yes). Each internal node tests an attribute, each branch corresponds to an attribute value, and each leaf node assigns a classification.
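A minimal Python sketch of this structure (the Node/Leaf classes and the classify helper are illustrative assumptions, not code from the slides; the example encodes the full PlayTennis tree that appears later in the deck):

```python
# A tiny decision-tree representation: internal nodes test an attribute,
# branches are attribute values, leaves hold a classification.

class Leaf:
    def __init__(self, label):
        self.label = label

class Node:
    def __init__(self, attribute, branches):
        self.attribute = attribute   # attribute tested at this node
        self.branches = branches     # dict: attribute value -> subtree

def classify(tree, example):
    """Follow the branch matching the example's attribute value until a leaf."""
    while isinstance(tree, Node):
        tree = tree.branches[example[tree.attribute]]
    return tree.label

play_tennis = Node("Outlook", {
    "Sunny": Node("Humidity", {"High": Leaf("No"), "Normal": Leaf("Yes")}),
    "Overcast": Leaf("Yes"),
    "Rain": Node("Wind", {"Strong": Leaf("No"), "Weak": Leaf("Yes")}),
})

print(classify(play_tennis, {"Outlook": "Sunny", "Humidity": "Normal", "Wind": "Weak"}))  # Yes
```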
Word Sense Disambiguation: Given an occurrence of a word, decide which sense, or meaning, was intended. Example: "run". run1: move swiftly (I ran to the store.) run2: operate (I run a store.) run3: flow (Water runs from the spring.) run4: length of torn stitches (Her stockings had a run.) etc.
Word Sense Disambiguation: Categories: use word sense labels (run1, run2, etc.) to name the possible categories. Features: features describe the context of the word we want to disambiguate. Possible features include: near(w): is the given word near an occurrence of word w? pos: the word's part of speech. left(w): is the word immediately preceded by the word w? etc.
Word Sense Disambiguation: Example decision tree: the root tests pos (noun vs. verb); lower nodes test context features such as near(race), near(river), and near(stocking), with leaves labeled by senses such as run1, run3, and run4. (Note: decision trees for WSD tend to be quite large.)
Decision Tree for Conjunction: Outlook=Sunny ∧ Wind=Weak. The root tests Outlook; Overcast and Rain lead to No, while Sunny leads to a Wind test (Strong → No, Weak → Yes).
Decision Tree for Disjunction: Outlook=Sunny ∨ Wind=Weak. The root tests Outlook; Sunny leads to Yes, while Overcast and Rain each lead to a Wind test (Strong → No, Weak → Yes).
Decision Tree for XOR: Outlook=Sunny XOR Wind=Weak. The root tests Outlook and every branch tests Wind: Sunny (Strong → Yes, Weak → No), Overcast (Strong → No, Weak → Yes), Rain (Strong → No, Weak → Yes).
Decision Tree: decision trees represent disjunctions of conjunctions. The PlayTennis tree (Outlook at the root; Sunny → Humidity, Overcast → Yes, Rain → Wind) corresponds to: (Outlook=Sunny ∧ Humidity=Normal) ∨ (Outlook=Overcast) ∨ (Outlook=Rain ∧ Wind=Weak).
When to Consider Decision Trees: Instances are describable by attribute-value pairs; the target function is discrete valued; a disjunctive hypothesis may be required; the training data may be noisy; attribute values may be missing. Examples: medical diagnosis, credit risk analysis, object classification for a robot manipulator (Tan 1993).
Top-Down Induction of Decision Trees (ID3): A ← the "best" decision attribute for the next node. Assign A as the decision attribute for the node. For each value of A, create a new descendant. Sort the training examples to the leaf nodes according to the attribute value of the branch. If all training examples are perfectly classified (same value of the target attribute), stop; else iterate over the new leaf nodes.
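A compact recursive sketch of this procedure in Python (the function names, the entropy/gain helpers, and the toy data are assumptions for illustration, not the original ID3 source):

```python
import math
from collections import Counter

def entropy(labels):
    counts = Counter(labels)
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def information_gain(examples, labels, attribute):
    total = len(examples)
    gain = entropy(labels)
    for value in set(e[attribute] for e in examples):
        subset = [lab for e, lab in zip(examples, labels) if e[attribute] == value]
        gain -= (len(subset) / total) * entropy(subset)
    return gain

def id3(examples, labels, attributes):
    if len(set(labels)) == 1:          # all examples perfectly classified: stop
        return labels[0]
    if not attributes:                 # no attributes left: return the majority class
        return Counter(labels).most_common(1)[0][0]
    # Choose the "best" attribute for this node by information gain.
    best = max(attributes, key=lambda a: information_gain(examples, labels, a))
    tree = {best: {}}
    for value in set(e[best] for e in examples):
        idx = [i for i, e in enumerate(examples) if e[best] == value]
        tree[best][value] = id3([examples[i] for i in idx],
                                [labels[i] for i in idx],
                                [a for a in attributes if a != best])
    return tree

# Toy usage (made-up events):
events = [{"Outlook": "Sunny", "Wind": "Weak"}, {"Outlook": "Sunny", "Wind": "Strong"},
          {"Outlook": "Overcast", "Wind": "Weak"}, {"Outlook": "Rain", "Wind": "Strong"}]
targets = ["No", "No", "Yes", "No"]
print(id3(events, targets, ["Outlook", "Wind"]))
# e.g. {'Outlook': {'Sunny': 'No', 'Overcast': 'Yes', 'Rain': 'No'}}
```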
Which attribute is best? Both A1 and A2 split S = [29+,35-]: A1 produces subsets [21+,5-] and [8+,30-]; A2 produces subsets [18+,33-] and [11+,2-].
Entropy: S is a sample of training examples; p+ is the proportion of positive examples and p- is the proportion of negative examples. Entropy measures the impurity of S: Entropy(S) = -p+ log2 p+ - p- log2 p-.
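A quick numerical check of this formula (a small Python sketch; the helper name is mine):

```python
import math

def entropy(p_pos, p_neg):
    """Entropy(S) = -p+ log2 p+ - p- log2 p-  (0 log 0 is taken as 0)."""
    return -sum(p * math.log2(p) for p in (p_pos, p_neg) if p > 0)

# Samples that appear later in the slides:
print(round(entropy(29/64, 35/64), 2))  # 0.99 for [29+,35-]
print(round(entropy(9/14, 5/14), 2))    # 0.94 for [9+,5-]
```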
Entropy: Entropy(S) is the expected number of bits needed to encode the class (+ or -) of a randomly drawn member of S (under the optimal, shortest-length code). Why? Information theory: an optimal-length code assigns -log2 p bits to a message having probability p. So the expected number of bits to encode (+ or -) of a random member of S is -p+ log2 p+ - p- log2 p-.
Information Gain: Gain(S,A) is the expected reduction in entropy due to sorting S on attribute A. Entropy([29+,35-]) = -29/64 log2 29/64 - 35/64 log2 35/64 = 0.99. As above, A1 splits [29+,35-] into [21+,5-] and [8+,30-]; A2 splits it into [18+,33-] and [11+,2-].
Information Gain: Entropy([21+,5-]) = 0.71, Entropy([8+,30-]) = 0.74; Gain(S,A1) = Entropy(S) - (26/64)·Entropy([21+,5-]) - (38/64)·Entropy([8+,30-]) = 0.27. Entropy([18+,33-]) = 0.94, Entropy([11+,2-]) = 0.62; Gain(S,A2) = Entropy(S) - (51/64)·Entropy([18+,33-]) - (13/64)·Entropy([11+,2-]) = 0.12.
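These numbers can be reproduced with a short sketch (Python; the entropy/gain helper names are mine):

```python
import math

def entropy(pos, neg):
    total = pos + neg
    return -sum((c / total) * math.log2(c / total) for c in (pos, neg) if c > 0)

def gain(parent, splits):
    """parent: (pos, neg) counts of S; splits: list of (pos, neg) counts per branch."""
    n = sum(parent)
    return entropy(*parent) - sum((p + q) / n * entropy(p, q) for p, q in splits)

print(round(gain((29, 35), [(21, 5), (8, 30)]), 2))   # A1: 0.27
print(round(gain((29, 35), [(18, 33), (11, 2)]), 2))  # A2: 0.12
```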
Selecting the Next Attribute: S = [9+,5-], E = 0.940. Humidity: High → [3+,4-] (E = 0.985), Normal → [6+,1-] (E = 0.592); Gain(S, Humidity) = 0.940 - (7/14)·0.985 - (7/14)·0.592 = 0.151. Wind: Weak → [6+,2-] (E = 0.811), Strong → [3+,3-] (E = 1.0); Gain(S, Wind) = 0.940 - (8/14)·0.811 - (6/14)·1.0 = 0.048. Humidity provides greater information gain than Wind with respect to the target classification.
Selecting the Next Attribute: S = [9+,5-], E = 0.940. Outlook: Sunny → [2+,3-] (E = 0.971), Overcast → [4+,0-] (E = 0.0), Rain → [3+,2-] (E = 0.971); Gain(S, Outlook) = 0.940 - (5/14)·0.971 - (4/14)·0.0 - (5/14)·0.971 = 0.247.
Selecting the Next Attribute
• The information gain values for the 4 attributes are:
• Gain(S, Outlook) = 0.247
• Gain(S, Humidity) = 0.151
• Gain(S, Wind) = 0.048
• Gain(S, Temperature) = 0.029
• where S denotes the collection of training examples
ID3 Algorithm: Outlook splits [D1,D2,…,D14] = [9+,5-] into Sunny (Ssunny = [D1,D2,D8,D9,D11] = [2+,3-], still to be split), Overcast ([D3,D7,D12,D13] = [4+,0-] → Yes), and Rain ([D4,D5,D6,D10,D14] = [3+,2-], still to be split). Gain(Ssunny, Humidity) = 0.970 - (3/5)·0.0 - (2/5)·0.0 = 0.970. Gain(Ssunny, Temp.) = 0.970 - (2/5)·0.0 - (2/5)·1.0 - (1/5)·0.0 = 0.570. Gain(Ssunny, Wind) = 0.970 - (2/5)·1.0 - (3/5)·0.918 = 0.019.
ID3 Algorithm (resulting tree): Outlook: Sunny → Humidity (High → No [D1,D2], Normal → Yes [D8,D9,D11] [mistake]); Overcast → Yes [D3,D7,D12,D13]; Rain → Wind (Strong → No [D6,D14], Weak → Yes [D4,D5,D10]).
Occam's Razor: "If two theories explain the facts equally well, then the simpler theory is to be preferred." Arguments in favor: there are fewer short hypotheses than long hypotheses; a short hypothesis that fits the data is unlikely to be a coincidence; a long hypothesis that fits the data might be a coincidence. Arguments opposed: there are many ways to define small sets of hypotheses.
Overfitting: One of the biggest problems with decision trees is overfitting.
Avoiding Overfitting: Stop growing the tree when a split is not statistically significant, or grow the full tree and then post-prune. Select the "best" tree by: measuring performance over the training data; measuring performance over a separate validation data set; or minimizing |tree| + |misclassifications(tree)|.
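A hedged scikit-learn sketch of the "grow full, then post-prune" strategy, using cost-complexity pruning as one concrete pruning mechanism and a held-out validation set to pick the best tree (the dataset here is a placeholder):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Placeholder data standing in for a real training set.
X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.3, random_state=0)

# Grow the full tree, then get candidate pruning levels (ccp_alpha values).
full_tree = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
ccp_alphas = full_tree.cost_complexity_pruning_path(X_train, y_train).ccp_alphas

# Select the "best" pruned tree by accuracy on the separate validation set.
best_alpha, best_score = None, -1.0
for alpha in ccp_alphas:
    tree = DecisionTreeClassifier(random_state=0, ccp_alpha=alpha).fit(X_train, y_train)
    score = tree.score(X_val, y_val)
    if score > best_score:
        best_alpha, best_score = alpha, score

print(f"best ccp_alpha={best_alpha:.5f}, validation accuracy={best_score:.3f}")
```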
Converting a Tree to Rules: For the PlayTennis tree (Outlook at the root; Sunny → Humidity, Overcast → Yes, Rain → Wind):
R1: If (Outlook=Sunny) ∧ (Humidity=High) Then PlayTennis=No
R2: If (Outlook=Sunny) ∧ (Humidity=Normal) Then PlayTennis=Yes
R3: If (Outlook=Overcast) Then PlayTennis=Yes
R4: If (Outlook=Rain) ∧ (Wind=Strong) Then PlayTennis=No
R5: If (Outlook=Rain) ∧ (Wind=Weak) Then PlayTennis=Yes
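For trees fit with scikit-learn, a textual rule-like view of the learned tree can be printed with export_text; each root-to-leaf path corresponds to one If/Then rule. A small sketch on a placeholder dataset (turning the printout into rules like R1-R5 above is a manual step):

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

# Placeholder dataset standing in for the PlayTennis examples.
iris = load_iris()
clf = DecisionTreeClassifier(max_depth=2, random_state=0).fit(iris.data, iris.target)

# One printed root-to-leaf path = one If/Then rule.
print(export_text(clf, feature_names=list(iris.feature_names)))
```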
Continuous-Valued Attributes: Create a discrete attribute to test a continuous one. Example: Temperature = 24.5°C; (Temperature > 20.0°C) ∈ {true, false}. Where should the threshold be set?
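One common answer is to try midpoints between adjacent sorted values and keep the threshold with the highest information gain; a small illustrative sketch (the temperatures, labels, and helper names below are made up):

```python
import math

def entropy(labels):
    n = len(labels)
    return -sum((labels.count(c) / n) * math.log2(labels.count(c) / n) for c in set(labels))

def best_threshold(values, labels):
    """Try midpoints between adjacent distinct sorted values; return (threshold, gain)."""
    pairs = sorted(zip(values, labels))
    base = entropy(labels)
    best = (None, -1.0)
    for (v1, _), (v2, _) in zip(pairs, pairs[1:]):
        if v1 == v2:
            continue
        t = (v1 + v2) / 2
        left = [lab for v, lab in pairs if v <= t]
        right = [lab for v, lab in pairs if v > t]
        g = base - len(left) / len(pairs) * entropy(left) - len(right) / len(pairs) * entropy(right)
        if g > best[1]:
            best = (t, g)
    return best

# Made-up temperatures (°C) and PlayTennis labels:
temps = [12.0, 18.5, 20.5, 24.5, 27.0, 30.0]
play = ["No", "No", "Yes", "Yes", "Yes", "No"]
print(best_threshold(temps, play))  # (19.5, ~0.46) on this toy data
```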
Random Forest Classifier: A random forest classifier is an extension of bagging that uses de-correlated trees. To say it in simple words: a random forest builds multiple decision trees and merges them together to get a more accurate and stable prediction.
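In practice this is available off the shelf, e.g. scikit-learn's RandomForestClassifier (a minimal sketch on a placeholder dataset):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# n_estimators = number of de-correlated trees; max_features limits the
# features considered at each split (the m < M idea described below).
forest = RandomForestClassifier(n_estimators=100, max_features="sqrt", random_state=0)
forest.fit(X_train, y_train)
print("test accuracy:", forest.score(X_test, y_test))
```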
Random Forest Classifier (construction, starting from training data with N examples and M features):
• Create bootstrap samples from the training data.
• Construct a decision tree from each bootstrap sample.
• At each node, when choosing the split feature, choose only among m < M features.
• To classify, take the majority vote over all the trees.
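A from-scratch sketch of these steps (bootstrap sampling, per-split feature subsampling via max_features, majority vote); the data and the number of trees are illustrative placeholders:

```python
import numpy as np
from collections import Counter
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X, y = make_classification(n_samples=300, n_features=12, random_state=0)  # N examples, M features

trees = []
for i in range(25):
    # Bootstrap sample: draw N example indices with replacement.
    idx = rng.integers(0, len(X), size=len(X))
    # Fit a tree that considers only m < M randomly chosen features at each split.
    tree = DecisionTreeClassifier(max_features="sqrt", random_state=i)
    tree.fit(X[idx], y[idx])
    trees.append(tree)

def predict(x):
    # Majority vote over the individual trees' predictions.
    votes = [t.predict(x.reshape(1, -1))[0] for t in trees]
    return Counter(votes).most_common(1)[0][0]

print(predict(X[0]), "true:", y[0])
```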
Unknown Attribute Values: What if some examples have missing values of A? Use the training example anyway and sort it through the tree: if node n tests A, assign the most common value of A among the other examples sorted to node n; or assign the most common value of A among the other examples with the same target value; or assign probability pi to each possible value vi of A and assign a fraction pi of the example to each descendant in the tree. Classify new examples in the same fashion.
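A small sketch of the first strategy (fill a missing value of A with the most common value among the other examples sorted to the node); the helper is an illustrative assumption, not part of ID3 itself:

```python
from collections import Counter

def fill_missing(example, attribute, examples_at_node):
    """If example[attribute] is missing (None), substitute the most common value
    of that attribute among the other examples sorted to this node."""
    if example.get(attribute) is None:
        values = [e[attribute] for e in examples_at_node if e.get(attribute) is not None]
        example = {**example, attribute: Counter(values).most_common(1)[0][0]}
    return example

node_examples = [{"Outlook": "Sunny"}, {"Outlook": "Sunny"}, {"Outlook": "Rain"}]
print(fill_missing({"Outlook": None}, "Outlook", node_examples))  # {'Outlook': 'Sunny'}
```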
Cross-Validation: Estimate the accuracy of a hypothesis induced by a supervised learning algorithm. Predict the accuracy of a hypothesis over future unseen instances. Select the optimal hypothesis from a given set of alternative hypotheses: pruning decision trees, model selection, feature selection, combining multiple classifiers (boosting).
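A minimal sketch of estimating a decision tree's accuracy with k-fold cross-validation in scikit-learn (placeholder data; 5 folds and max_depth=3 are arbitrary choices):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=10, random_state=0)

# 5-fold cross-validation: each fold is held out once while training on the rest.
scores = cross_val_score(DecisionTreeClassifier(max_depth=3, random_state=0), X, y, cv=5)
print("fold accuracies:", scores, "mean:", scores.mean())
```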