Decision Trees
Radosław Wesołowski, Tomasz Pękalski, Michal Borkowicz, Maciej Kopaczyński
12-03-2008
What is it anyway? Decision tree T – a tree with a root (in the graph-theory sense), in which we assign the following meanings to its elements:
• inner nodes represent attributes,
• edges represent values of the attribute,
• leaves represent classification decisions.
Using a decision tree we can visualize a program built only of 'if-then' instructions.
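As a hedged illustration, a tree over weather data can be written as exactly such an if-then program; the attributes (outlook, humidity, wind) and their values below are assumptions made for this sketch, not taken from the slides:

```python
# A decision tree expressed as nested if-then instructions.
# Attribute names and values are illustrative assumptions.
def classify(outlook, humidity, wind):
    if outlook == "sunny":
        return "don't play" if humidity == "high" else "play"
    if outlook == "overcast":
        return "play"
    # remaining case: outlook == "rainy"
    return "don't play" if wind == "strong" else "play"

print(classify("sunny", "high", "weak"))  # -> don't play
```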
Testing functions. Let us consider an attribute A (e.g. temperature). Let V_A denote the set of all possible values of A (0 K up to infinity). Let R_t denote the set of all possible test results (hot, mild, cold). By a testing function we mean a map t: V_A → R_t. We distinguish two main types of testing functions, depending on the set V_A: discrete and continuous.
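A minimal sketch of both kinds of test, assuming temperature cut-points of 15 °C and 25 °C (these thresholds are illustrative, not given on the slide):

```python
# t: V_A -> R_t for a discrete and a continuous attribute.
def discrete_test(colour):
    # V_A is a finite set of symbols; the identity map is a valid test.
    return colour

def continuous_test(temperature_celsius):
    # V_A is continuous; the test discretizes it into R_t = {cold, mild, hot}.
    if temperature_celsius < 15:   # assumed threshold
        return "cold"
    if temperature_celsius < 25:   # assumed threshold
        return "mild"
    return "hot"

print(continuous_test(30))  # -> hot
```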
Quality of a decision tree (Occam's razor):
• we prefer small, simple trees,
• we want to gain maximum accuracy of classification (training set, test set).
For example:
Q(T) = α·size(T) + β·accuracy(T)
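One possible reading of this criterion, with hypothetical weights alpha and beta (the slide does not fix their values):

```python
# Q(T) = alpha * size(T) + beta * accuracy(T)
# A negative alpha penalizes large trees; beta rewards accuracy.
def quality(size, accuracy, alpha=-0.01, beta=1.0):  # weights are assumptions
    return alpha * size + beta * accuracy

print(quality(size=20, accuracy=0.93))  # -> 0.73
```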
Optimal tree – we are given:
• a training set S,
• a set of testing functions TEST,
• a quality criterion Q.
Target: a tree T optimising Q(T).
Fact: usually this is an NP-hard problem.
Conclusion: we have to use heuristics.
Building a decision tree:
• top-down method:
a. at the beginning the root contains all training examples,
b. we divide them recursively, choosing one attribute at a time (see the sketch below);
• bottom-up method: we remove subtrees or edges to gain precision when judging new cases.
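A minimal top-down sketch: the root starts with all training examples and they are divided recursively, one attribute at a time. The naive attribute choice and stopping rule here are simplifying assumptions, not the slides' exact method:

```python
from collections import Counter

def build(examples, attributes):
    """examples: list of (attribute_dict, label); attributes: list of attribute names."""
    labels = [label for _, label in examples]
    if len(set(labels)) == 1 or not attributes:       # stop criterion
        return Counter(labels).most_common(1)[0][0]   # leaf: majority decision
    attr = attributes[0]                              # naive choice; ID3 would use information gain
    tree = {attr: {}}
    for value in {ex[attr] for ex, _ in examples}:    # one edge per attribute value
        subset = [(ex, lab) for ex, lab in examples if ex[attr] == value]
        tree[attr][value] = build(subset, attributes[1:])
    return tree

data = [({"outlook": "sunny"}, "no"), ({"outlook": "rain"}, "yes")]
print(build(data, ["outlook"]))  # -> {'outlook': {'sunny': 'no', 'rain': 'yes'}}
```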
Entropy – the average number of bits needed to represent a decision d for a randomly chosen object from a given set S. Why? Because an optimal binary code assigns –log2(p) bits to a decision whose probability is p. We have the formula:
entropy(p1, ..., pn) = – p1·log2(p1) – ... – pn·log2(pn)
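The formula translated directly into code (probabilities are assumed to be given and to sum to 1):

```python
from math import log2

def entropy(probabilities):
    # zero-probability terms contribute nothing, so they are skipped
    return -sum(p * log2(p) for p in probabilities if p > 0)

print(entropy([0.5, 0.5]))   # -> 1.0 bit
print(entropy([0.9, 0.1]))   # -> ~0.469 bits
```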
Information gain: gain(A) = info before dividing – info after dividing, i.e. the entropy of the set minus the size-weighted average entropy of the subsets obtained by splitting on attribute A.
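A short sketch of this difference, assuming a split is described by the label lists of the resulting subsets (the helper names here are made up for illustration):

```python
from math import log2

def _entropy_of_labels(labels):
    # entropy of a list of decisions, as defined on the previous slide
    return -sum((labels.count(c) / len(labels)) * log2(labels.count(c) / len(labels))
                for c in set(labels))

def information_gain(parent_labels, subsets):
    before = _entropy_of_labels(parent_labels)                  # info before dividing
    after = sum(len(s) / len(parent_labels) * _entropy_of_labels(s)
                for s in subsets)                               # weighted info after dividing
    return before - after

# A split into pure halves recovers the full 1 bit of information.
print(information_gain(["yes", "yes", "no", "no"],
                       [["yes", "yes"], ["no", "no"]]))  # -> 1.0
```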
Overtraining: We say that a model H overfits if there is a model H’ such that:
• training_error(H) < training_error(H’),
• testing_error(H) > testing_error(H’) (see the small check below).
Avoiding overtraining:
• adequate stopping criteria,
• post-pruning,
• pre-pruning.
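The definition can be checked mechanically; a tiny sketch, with error values invented purely for illustration:

```python
def overfits(train_err_h, test_err_h, train_err_h2, test_err_h2):
    # H overfits if some H' is worse on the training set but better on the test set.
    return train_err_h < train_err_h2 and test_err_h > test_err_h2

# hypothetical errors: H memorizes the training set but generalizes badly
print(overfits(train_err_h=0.02, test_err_h=0.30,
               train_err_h2=0.10, test_err_h2=0.15))  # -> True
```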
Some decision tree algorithms:
• 1R,
• ID3 (Iterative Dichotomiser 3),
• C4.5 (ID3 + discretization + pruning),
• CART (Classification and Regression Trees),
• CHAID (CHi-squared Automatic Interaction Detection).