
Presentation Transcript


  1. Decision Trees Radosław Wesołowski, Tomasz Pękalski, Michal Borkowicz, Maciej Kopaczyński, 12-03-2008

  2. What is it anyway? Decision tree T – a tree with a root (in the graph-theory sense), in which we assign the following meanings to its elements: • inner nodes represent attributes, • edges represent values of the attribute, • leaves represent classification decisions. • Using a decision tree we can visualize a program built only of ‘if-then’ instructions, as the sketch below shows.
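
A decision tree read as nested ‘if-then’ instructions might look like the minimal Python sketch below; the attributes (outlook, humidity, windy) and their values are purely illustrative, not taken from the slides.

    def classify(example):
        if example["outlook"] == "sunny":          # inner node: attribute 'outlook'
            if example["humidity"] == "high":      # inner node: attribute 'humidity'
                return "no"                        # leaf: classification decision
            return "yes"
        elif example["outlook"] == "overcast":     # edge: value 'overcast'
            return "yes"
        else:                                      # edge: value 'rainy'
            return "no" if example["windy"] else "yes"

    print(classify({"outlook": "sunny", "humidity": "normal"}))  # -> 'yes'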

  3. Testing functions Let us consider an attribute A (e.g. temperature). Let VA denote the set of all possible values of A (0 K up to infinity). Let Rt denote the set of all possible test results (hot, mild, cold). By a testing function we mean a map t: VA → Rt. We distinguish two main types of testing functions, depending on the set VA – discrete and continuous.
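
As a rough illustration (the temperature thresholds below are my own assumptions, not values from the slides), a discrete testing function can pass the attribute value through unchanged, while a continuous one discretises the value into the result set Rt = {hot, mild, cold}:

    def t_discrete(value):
        # the attribute already takes values from a finite set, e.g. 'hot', 'mild', 'cold'
        return value

    def t_continuous(kelvin):
        # discretise a temperature from the range (0 K, infinity) into Rt
        if kelvin >= 300.0:
            return "hot"
        elif kelvin >= 285.0:
            return "mild"
        return "cold"

    print(t_continuous(290.0))  # -> 'mild'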

  4. Quality of a decision tree (Occam's razor): • we prefer small, simple trees, • we want to gain maximum accuracy of classification (training set, test set). • For example: • Q(T) = α*size(T) + β*accuracy(T)
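
One possible reading of this criterion is sketched below with assumed weights (a negative weight on size encodes the preference for small trees; the numbers are only examples):

    def quality(size, accuracy, alpha=-0.01, beta=1.0):
        # alpha < 0 penalises large trees (Occam's razor), beta rewards accuracy
        return alpha * size + beta * accuracy

    print(quality(size=15, accuracy=0.92))   # small tree, good accuracy            -> 0.77
    print(quality(size=120, accuracy=0.95))  # large tree, slightly better accuracy -> -0.25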

  5. Optimal tree – we are given: • a training set S, • a set of testing functions TEST, • a quality criterion Q. • Target: find T optimising Q(T). • Fact: usually this is an NP-hard problem. • Conclusion: we have to use heuristics.

  6. Building a decision tree: • top-down method: a. at the beginning the root contains all training examples, b. we divide them recursively, choosing one attribute at a time (a sketch follows below); • bottom-up method: we remove subtrees or edges to improve accuracy when classifying new cases.
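
A compact sketch of the top-down method on a list of (attributes, decision) training examples; the helper names are my own, and the attribute chosen at each step is the one with the highest information gain, defined on the next slides:

    from collections import Counter
    import math

    def entropy(examples):
        counts = Counter(d for _, d in examples)
        total = len(examples)
        return -sum(c / total * math.log2(c / total) for c in counts.values())

    def best_attribute(examples, attributes):
        def gain(a):
            after = 0.0
            for v in {e[a] for e, _ in examples}:
                subset = [(e, d) for e, d in examples if e[a] == v]
                after += len(subset) / len(examples) * entropy(subset)
            return entropy(examples) - after
        return max(attributes, key=gain)

    def build_tree(examples, attributes):
        decisions = [d for _, d in examples]
        if len(set(decisions)) == 1 or not attributes:       # stop: pure node or nothing left to test
            return Counter(decisions).most_common(1)[0][0]   # leaf: classification decision
        a = best_attribute(examples, attributes)             # choose one attribute at a time
        children = {}
        for v in {e[a] for e, _ in examples}:                # one edge per value of the attribute
            subset = [(e, d) for e, d in examples if e[a] == v]
            children[v] = build_tree(subset, [x for x in attributes if x != a])
        return (a, children)                                 # inner node

    data = [({"outlook": "sunny", "windy": False}, "yes"),
            ({"outlook": "sunny", "windy": True},  "no"),
            ({"outlook": "rainy", "windy": False}, "yes")]
    print(build_tree(data, ["outlook", "windy"]))  # e.g. ('windy', {False: 'yes', True: 'no'})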

  7. Entropy – the average number of bits needed to represent a decision d for a randomly chosen object from a given set S. Why? Because an optimal binary representation assigns –log2(p) bits to a decision whose probability is p. We have the formula: entropy(p1, ..., pn) = –p1*log2(p1) – ... – pn*log2(pn)
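
The formula translates directly into a few lines of Python; the probabilities in the example calls are illustrative:

    import math

    def entropy(probabilities):
        # entropy(p1, ..., pn) = -p1*log2(p1) - ... - pn*log2(pn)
        return -sum(p * math.log2(p) for p in probabilities if p > 0)

    print(entropy([0.5, 0.5]))  # 1.0 bit: a fair yes/no decision
    print(entropy([0.9, 0.1]))  # ~0.47 bits: a nearly certain decision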

  8. Information gain: gain(·) = info before dividing – info after dividing, i.e. the entropy of the set before the split minus the weighted average entropy of the subsets after the split.
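
A minimal sketch of this difference, computed from decision labels before and after a split (the variable names and toy labels are my own):

    import math
    from collections import Counter

    def entropy(decisions):
        total = len(decisions)
        return -sum(c / total * math.log2(c / total) for c in Counter(decisions).values())

    def information_gain(decisions, subsets):
        # info before dividing minus the weighted info of the subsets after dividing
        after = sum(len(s) / len(decisions) * entropy(s) for s in subsets)
        return entropy(decisions) - after

    labels = ["yes", "yes", "no", "no"]
    print(information_gain(labels, [["yes", "yes"], ["no", "no"]]))  # 1.0: a perfect split
    print(information_gain(labels, [["yes", "no"], ["yes", "no"]]))  # 0.0: nothing gained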

  9. Overtraining: We say that a model H overfits if there is a model H’ such that: • training_error(H) < training_error(H’), • testing_error(H) > testing_error(H’). • Avoiding overtraining: • adequate stopping criteria, • post-pruning, • pre-pruning.
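
One quick way to see overtraining in practice is to compare the training and testing accuracy of a fully grown tree against a depth-limited one; the sketch below assumes scikit-learn and one of its bundled datasets, neither of which appears in the original slides:

    from sklearn.datasets import load_breast_cancer
    from sklearn.model_selection import train_test_split
    from sklearn.tree import DecisionTreeClassifier

    X, y = load_breast_cancer(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    for depth in (None, 3):  # None: grow fully; 3: a crude pre-pruning stop criterion
        model = DecisionTreeClassifier(max_depth=depth, random_state=0).fit(X_train, y_train)
        # the unpruned tree typically scores ~1.0 on training data but often worse on test data
        print(depth, model.score(X_train, y_train), model.score(X_test, y_test))

scikit-learn also offers cost-complexity post-pruning through the ccp_alpha parameter of DecisionTreeClassifier.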

  10. Some decision tree algorithms: • 1R, • ID3 (Iterative Dichotomiser 3), • C4.5 (ID3 + discretization + pruning), • CART (Classification and Regression Trees), • CHAID (CHi-squared Automatic Interaction Detection).
