
ML-05-decision-trees

Decision Tree Machine Learning Algorithm


Presentation Transcript


  1. CS60050 Machine Learning: Decision Tree Classifier. Slides taken from course materials of Tan, Steinbach, Kumar.

  2. Illustrating the Classification Task. [Diagram: a learning algorithm performs induction on the training set to learn a model; the model is then applied to the test set (deduction).]

  3. Intuition behind a decision tree • Ask a series of questions about a given record • Each question is about one of the attributes • The answer to one question decides what question to ask next (or whether a next question is needed) • Continue asking questions until we can infer the class of the given record

  4. Example of a Decision Tree. [Diagram: training data and the learned model. Splitting attributes: the root tests Refund (Yes -> NO; No -> MarSt); MarSt tests marital status (Married -> NO; Single, Divorced -> TaxInc); TaxInc tests taxable income (<80K -> NO; >80K -> YES).]

  5. Structure of a decision tree • A decision tree is a hierarchical structure • One root node: no incoming edge, zero or more outgoing edges • Internal nodes: exactly one incoming edge, two or more outgoing edges • Leaf (terminal) nodes: exactly one incoming edge, no outgoing edges • Each leaf node is assigned a class label • Each non-leaf node contains a test condition on one of the attributes
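
A minimal sketch of this node structure in Python (the class name DecisionNode and its fields are illustrative assumptions, not part of the slides):

```python
from dataclasses import dataclass, field
from typing import Dict, Optional

@dataclass
class DecisionNode:
    """One node of a decision tree as described on slide 5."""
    attribute: Optional[str] = None   # attribute tested at a non-leaf node
    children: Dict[str, "DecisionNode"] = field(default_factory=dict)  # branch label -> child
    label: Optional[str] = None       # class label, set only at leaf nodes

    def is_leaf(self) -> bool:
        # Leaf nodes have no outgoing edges (no children).
        return not self.children
```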

  6. Applying a Decision Tree Classifier. [Diagram: the tree induction algorithm learns a decision-tree model from the training set (induction); the model is then applied to the test set (deduction).]

  7. Apply Model to Test Data. Start from the root of the tree. Once a decision tree has been constructed (learned), it is easy to apply it to test data. [Diagram: a test record is routed down the Refund / MarSt / TaxInc tree.]

  8.-11. Apply Model to Test Data. [Diagram frames: the same tree as slide 7; the test record advances one node per slide through Refund, then MarSt, then TaxInc, toward a leaf.]

  12. Apply Model to Test Data. [Diagram: the test record reaches the Married branch of MarSt.] Assign Cheat to "No".
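
The apply-model walk of slides 7-12 can be sketched with the hypothetical DecisionNode above, assuming each record is a dict of attribute values and that the continuous TaxInc test has been encoded as branch labels:

```python
def classify(node: DecisionNode, record: dict) -> str:
    """Route a test record from the root to a leaf; return the leaf's class label."""
    while not node.is_leaf():
        node = node.children[record[node.attribute]]
    return node.label

# The Refund / MarSt / TaxInc tree from the slides (branch labels are assumptions).
tree = DecisionNode(attribute="Refund", children={
    "Yes": DecisionNode(label="No"),
    "No": DecisionNode(attribute="MarSt", children={
        "Married": DecisionNode(label="No"),
        "Single,Divorced": DecisionNode(attribute="TaxInc", children={
            "<80K": DecisionNode(label="No"),
            ">=80K": DecisionNode(label="Yes"),
        }),
    }),
})

record = {"Refund": "No", "MarSt": "Married"}  # the slide's test record path
print(classify(tree, record))                  # -> "No": assign Cheat to "No"
```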

  13. Learning a Decision Tree Classifier. [Diagram: the induction/deduction pipeline of slide 6, with the tree induction algorithm learning the model.] How to learn a decision tree?

  14. A Decision Tree (seen earlier). [Diagram: the same Refund / MarSt / TaxInc model and training data as slide 4.]

  15. Another Decision Tree on the same dataset. [Diagram: a tree whose root tests MarSt (Married -> NO; Single, Divorced -> Refund); Refund (Yes -> NO; No -> TaxInc); TaxInc (<80K -> NO; >80K -> YES).] There could be more than one tree that fits the same data!

  16. Challenge in learning a decision tree • Exponentially many decision trees can be constructed from a given set of attributes • Some of the trees are more 'accurate', i.e., better classifiers, than the others • Finding the optimal tree is computationally infeasible • Efficient algorithms are available to learn a reasonably accurate (although potentially suboptimal) decision tree in reasonable time • These employ a greedy strategy: make locally optimal choices about which attribute to use next to partition the data

  17. Decision Tree Induction • Many algorithms: • Hunt's Algorithm (one of the earliest) • CART • ID3, C4.5 • SLIQ, SPRINT

  18. General Structure of Hunt's Algorithm • Let Dt be the set of training records that reach a node t • General procedure: – If Dt contains records that all belong to the same class yt, then t is a leaf node labeled as yt – If Dt is an empty set, then t is a leaf node labeled by the default class yd – If Dt contains records that belong to more than one class, use an attribute test to split the data into smaller subsets, and recursively apply the procedure to each subset (a sketch of this recursion follows below). [Diagram: node t with its record set Dt.]
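
A minimal sketch of this recursion, reusing the hypothetical DecisionNode above; choose_split and partition are placeholder callables standing in for the attribute-selection and data-partitioning steps covered on later slides:

```python
from collections import Counter

def hunt(records, labels, default_label, choose_split, partition):
    """Grow a tree over parallel lists of records and class labels."""
    if not records:                              # empty D_t -> leaf with default class y_d
        return DecisionNode(label=default_label)
    counts = Counter(labels)
    if len(counts) == 1:                         # pure D_t -> leaf labeled with that class y_t
        return DecisionNode(label=labels[0])
    attribute = choose_split(records, labels)    # attribute-selection step (e.g. best Gini)
    majority = counts.most_common(1)[0][0]       # default class passed down to empty children
    node = DecisionNode(attribute=attribute)
    subsets = partition(records, labels, attribute)  # {branch label: (sub_records, sub_labels)}
    for outcome, (sub_records, sub_labels) in subsets.items():
        node.children[outcome] = hunt(sub_records, sub_labels, majority,
                                      choose_split, partition)
    return node
```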

  19. Hunt's Algorithm. [Diagram: the training data.] The default class is "Don't cheat" since it is the majority class in the dataset.

  20. Hunt's Algorithm. [Diagram: the first split.] For now, assume that "Refund" has been decided to be the best attribute for splitting in some way (to be discussed soon).

  21. Hunt's Algorithm. [Diagram: the Refund = No branch is split on Marital Status: Single, Divorced -> Cheat; Married -> Don't Cheat; the Refund = Yes leaf remains Don't Cheat.]

  22. Hunt's Algorithm. [Diagram: successive refinements. First Refund (Yes -> Don't Cheat; No -> Don't Cheat); then the No branch is split on Marital Status (Married -> Don't Cheat; Single, Divorced -> Cheat); finally the Single, Divorced branch is split on Taxable Income (<80K -> Don't Cheat; >=80K -> Cheat).]

  23.-24. Tree Induction • Greedy strategy: split the records based on an attribute test that optimizes a certain criterion • Issues: – Determining how to split the records: how to specify the attribute test condition? how to determine the best split? – Determining when to stop splitting

  25. How to Specify the Test Condition? • Depends on attribute type: – Nominal: two or more distinct values (special case: binary), e.g., marital status: {single, divorced, married} – Ordinal: two or more distinct values that have an ordering, e.g., shirt size: {S, M, L, XL} – Continuous: a continuous range of values • Depends on the number of ways to split: – 2-way split – Multi-way split

  26. Splitting Based on Nominal Attributes • Multi-way split: use as many partitions as there are distinct values, e.g., CarType -> {Family}, {Sports}, {Luxury} • Binary split: divides the values into two subsets; need to find the optimal partitioning (see the sketch below), e.g., CarType -> {Sports, Luxury} vs. {Family}, or {Family, Luxury} vs. {Sports}
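
Finding the optimal binary partitioning means examining every unordered two-subset grouping of the attribute's values ($2^{k-1} - 1$ candidates for k values). A small illustrative enumeration (the function name is an assumption, not from the slides):

```python
from itertools import combinations

def binary_partitions(values):
    """Yield each unordered two-subset split of a nominal attribute's values once."""
    first, rest = values[0], values[1:]
    # Fixing the first value on the left side makes each partition appear exactly once.
    for r in range(len(rest)):
        for combo in combinations(rest, r):
            left = {first, *combo}
            right = set(rest) - set(combo)
            yield left, right

for left, right in binary_partitions(["Family", "Sports", "Luxury"]):
    print(left, "vs", right)
# {'Family'} vs {'Sports', 'Luxury'}
# {'Family', 'Sports'} vs {'Luxury'}
# {'Family', 'Luxury'} vs {'Sports'}
```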

  27. Splitting Based on Ordinal Attributes • Multi-way split: use as many partitions as there are distinct values, e.g., Size -> {Small}, {Medium}, {Large} • Binary split: divides the values into two subsets; need to find the optimal partitioning, e.g., Size -> {Small, Medium} vs. {Large}, or {Small} vs. {Medium, Large} • What about the split {Small, Large} vs. {Medium}? (It groups non-adjacent values, violating the ordering, so it is not a valid split for an ordinal attribute.)

  28. Splitting Based on Continuous Attributes • Different ways of handling: – Discretization to form an ordinal categorical attribute: static (discretize once at the beginning) or dynamic (ranges can be found by equal-interval bucketing, equal-frequency bucketing (percentiles), or clustering) – Binary decision: (A < v) or (A >= v); consider all possible splits and find the best cut; can be more compute-intensive

  29. Splitting Based on Continuous Attributes. [Diagram: (i) binary split: Taxable Income > 80K? (Yes / No); (ii) multi-way split: Taxable Income in <10K, [10K, 25K), [25K, 50K), [50K, 80K), >80K.]

  30. Tree Induction (recap of slides 23-24; the next slides address how to determine the best split).

  31. What is meant by "determine the best split"? Before splitting: 10 records of class 0, 10 records of class 1. [Diagram: three candidate test conditions. Own Car? (Yes: C0: 6, C1: 4; No: C0: 4, C1: 6). Car Type? (Family: C0: 1, C1: 3; Sports: C0: 8, C1: 0; Luxury: C0: 1, C1: 7). Student ID? (c1 ... c20, one record per branch, each either C0: 1, C1: 0 or C0: 0, C1: 1).] Which test condition is the best?

  32. How to determine the Best Split • Greedy approach: nodes with a homogeneous class distribution are preferred • Need a measure of node impurity. [Diagram: C0: 5, C1: 5 is non-homogeneous with a high degree of impurity; C0: 9, C1: 1 is homogeneous with a low degree of impurity.]

  33. Measures of Node Impurity • Gini Index • Entropy • Misclassification error

  34. How to Find the Best Split. [Diagram: before splitting, the node has impurity M0. Test A? yields nodes N1 and N2 with impurities M1 and M2, combined into M12; test B? yields nodes N3 and N4 with impurities M3 and M4, combined into M34.] Compare Gain = M0 - M12 vs. M0 - M34.

  35. Measures of Node Impurity (recap; next: the Gini Index).

  36. Measure of Impurity: Gini Index • Gini index for a given node t: $GINI(t) = 1 - \sum_{j} [p(j|t)]^2$, where p(j|t) is the relative frequency of class j at node t

  37. Examples of computing GINI, using $GINI(t) = 1 - \sum_{j} [p(j|t)]^2$: • P(C1) = 0/6 = 0, P(C2) = 6/6 = 1: Gini = 1 - P(C1)^2 - P(C2)^2 = 1 - 0 - 1 = 0 • P(C1) = 1/6, P(C2) = 5/6: Gini = 1 - (1/6)^2 - (5/6)^2 = 0.278 • P(C1) = 2/6, P(C2) = 4/6: Gini = 1 - (2/6)^2 - (4/6)^2 = 0.444
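
The same computations, transcribed directly into Python (the function name gini is an assumption):

```python
def gini(counts):
    """Gini index of a node from its per-class record counts."""
    n = sum(counts)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in counts)

print(round(gini([0, 6]), 3))  # 0.0
print(round(gini([1, 5]), 3))  # 0.278
print(round(gini([2, 4]), 3))  # 0.444
```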

  38. Measure of Impurity: Gini Index • Gini index for a given node t: $GINI(t) = 1 - \sum_{j} [p(j|t)]^2$, where p(j|t) is the relative frequency of class j at node t • Maximum $(1 - 1/n_c)$ when records are equally distributed among all classes, implying the least interesting information ($n_c$: number of classes) • Minimum (0.0) when all records belong to one class, implying the most interesting information

  39. Splitting Based on GINI • Used in CART, SLIQ, SPRINT • When a node p is split into k partitions (children), the quality of the split is computed as $GINI_{split} = \sum_{i=1}^{k} \frac{n_i}{n} GINI(i)$, where $n_i$ is the number of records at child i and n is the number of records at node p.
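
A sketch of this split-quality formula, reusing the gini() sketch above; children_counts holds one per-class count list per child:

```python
def gini_split(children_counts):
    """Weighted average of children's Gini: sum over children of (n_i / n) * GINI(i)."""
    n = sum(sum(c) for c in children_counts)
    return sum(sum(c) / n * gini(c) for c in children_counts)
```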

  40. Binary Attributes: Computing the GINI Index • Splits into two partitions • Effect of weighting partitions: larger and purer partitions are sought. [Diagram: test B? splits the records into Node N1 and Node N2.] Gini(N1) = 1 - (5/7)^2 - (2/7)^2 = 0.408; Gini(N2) = 1 - (1/5)^2 - (4/5)^2 = 0.32; Gini(children) = 7/12 × 0.408 + 5/12 × 0.32 = 0.371
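
The slide's worked example, reproduced with the sketches above:

```python
print(round(gini([5, 2]), 3))                  # Gini(N1) = 0.408
print(round(gini([1, 4]), 3))                  # Gini(N2) = 0.32
print(round(gini_split([[5, 2], [1, 4]]), 3))  # weighted Gini of the split = 0.371
```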

  41. Categorical Attributes: Computing the Gini Index • For each distinct value, gather the counts for each class in the dataset • Use the count matrix to make decisions. [Diagram: count matrices for a multi-way split and for two-way splits (finding the best partition of values).]

  42. Continuous Attributes: Computing the Gini Index • Use binary decisions based on one value, e.g., Taxable Income > 80K? (Yes / No) • Several choices for the splitting value: the number of possible splitting values equals the number of distinct values • Each splitting value v has a count matrix associated with it: the class counts in each of the two partitions, A < v and A >= v • Simple method to choose the best v: for each v, scan the database to gather the count matrix and compute its Gini index • Computationally inefficient! It repeats work.

  43. Continuous Attributes: Computing the Gini Index... • For efficient computation, for each attribute: – Sort the attribute on its values – Linearly scan these values, each time updating the count matrix and computing the Gini index – Choose the split position that has the least Gini index (see the sketch below). [Diagram: sorted values with the candidate split positions between adjacent values.]
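
A minimal sketch of this sorted linear scan, reusing gini() from above; midpoint thresholds and the function name are illustrative assumptions:

```python
from collections import Counter

def best_continuous_split(values, labels):
    """Return (best_threshold, best_gini) for binary splits of the form A < v."""
    pairs = sorted(zip(values, labels))
    total = Counter(labels)
    left = Counter()                       # class counts that have moved left of the cut
    best_v, best_g = None, float("inf")
    for i in range(len(pairs) - 1):
        left[pairs[i][1]] += 1             # update the count matrix incrementally
        if pairs[i][0] == pairs[i + 1][0]:
            continue                       # only cut between distinct attribute values
        v = (pairs[i][0] + pairs[i + 1][0]) / 2   # midpoint candidate threshold
        right = total - left
        n_left, n_right = i + 1, len(pairs) - i - 1
        g = (n_left * gini(list(left.values())) +
             n_right * gini(list(right.values()))) / len(pairs)
        if g < best_g:
            best_v, best_g = v, g
    return best_v, best_g
```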

  44. Measures of Node Impurity (recap; next: Entropy).

  45. Alternative Splitting Criteria based on INFO • Entropy at a given node t: $Entropy(t) = -\sum_{j} p(j|t) \log_2 p(j|t)$, where p(j|t) is the relative frequency of class j at node t • Measures the homogeneity of a node

  46. Examples of computing Entropy, using $Entropy(t) = -\sum_{j} p(j|t) \log_2 p(j|t)$: • P(C1) = 0/6 = 0, P(C2) = 6/6 = 1: Entropy = -0 log 0 - 1 log 1 = -0 - 0 = 0 • P(C1) = 1/6, P(C2) = 5/6: Entropy = -(1/6) log2(1/6) - (5/6) log2(5/6) = 0.65 • P(C1) = 2/6, P(C2) = 4/6: Entropy = -(2/6) log2(2/6) - (4/6) log2(4/6) = 0.92
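
The same computations in Python (the function name entropy is an assumption; base-2 logarithms as on the slide):

```python
import math

def entropy(counts):
    """Entropy of a node from its per-class record counts (0 log 0 taken as 0)."""
    n = sum(counts)
    return -sum((c / n) * math.log2(c / n) for c in counts if c > 0)

print(round(entropy([0, 6]), 2))  # 0.0
print(round(entropy([1, 5]), 2))  # 0.65
print(round(entropy([2, 4]), 2))  # 0.92
```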

  47. Alternative Splitting Criteria based on INFO • Entropy at a given node t: $Entropy(t) = -\sum_{j} p(j|t) \log_2 p(j|t)$, where p(j|t) is the relative frequency of class j at node t • Measures the homogeneity of a node • Maximum ($\log n_c$) when records are equally distributed among all classes, implying the least information • Minimum (0.0) when all records belong to one class, implying the most information

  48. Splitting Based on INFO... • Information Gain: $GAIN_{split} = Entropy(p) - \sum_{i=1}^{k} \frac{n_i}{n} Entropy(i)$ • Parent node p is split into k partitions; $n_i$ is the number of records in partition i • Measures the reduction in entropy achieved because of the split. Choose the split that achieves the most reduction (maximizes GAIN) • Used in ID3 and C4.5 • Disadvantage: tends to prefer splits that result in a large number of partitions, each being small but pure.
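
A sketch of information gain, reusing entropy() from above; the example numbers are the Car Type counts from slide 31:

```python
def information_gain(parent_counts, children_counts):
    """Reduction in entropy achieved by a split into the given children."""
    n = sum(parent_counts)
    weighted = sum(sum(c) / n * entropy(c) for c in children_counts)
    return entropy(parent_counts) - weighted

# Parent (10, 10) split by Car Type into Family, Sports, Luxury (slide 31).
print(round(information_gain([10, 10], [[1, 3], [8, 0], [1, 7]]), 3))  # 0.62
```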

  49. Splitting Based on INFO... • Gain Ratio: $GainRATIO_{split} = \frac{GAIN_{split}}{SplitINFO}$, where $SplitINFO = -\sum_{i=1}^{k} \frac{n_i}{n} \log_2 \frac{n_i}{n}$ • Parent node p is split into k partitions; $n_i$ is the number of records in partition i • Adjusts information gain by the entropy of the partitioning (SplitINFO); higher-entropy partitioning (a large number of small partitions) is penalized! • Used in C4.5 • Designed to overcome the disadvantage of information gain
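
A sketch of gain ratio, reusing the sketches above; base-2 logarithms are assumed, matching the entropy definition:

```python
import math

def gain_ratio(parent_counts, children_counts):
    """Information gain normalized by the entropy of the partitioning (SplitINFO)."""
    n = sum(parent_counts)
    split_info = -sum(sum(c) / n * math.log2(sum(c) / n)
                      for c in children_counts if sum(c) > 0)
    if split_info == 0:
        return 0.0   # degenerate split: all records land in a single child
    return information_gain(parent_counts, children_counts) / split_info

print(round(gain_ratio([10, 10], [[1, 3], [8, 0], [1, 7]]), 3))  # 0.408
```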

  50. Measures of Node Impurity (recap; next: Misclassification error).
