







  1. Assume that a decision tree has been constructed from training data, and it includes a node that tests on V at the frontier of the tree, with its left child yielding a prediction of class C1 (because the only two training data there are C1) and the right child predicting C2 (because the only training datum there is C2). The situation is illustrated by the following subtree:

  [Subtree: a node testing V; the V = -1 branch leads to a leaf predicting C1 (training counts C1: 2, C2: 0); the V = 1 branch leads to a leaf predicting C2 (training counts C1: 0, C2: 1).]

  Suppose that during subsequent use, it is found that:
  i) a large number of items (N > 1000) are classified to the node (the one with the test on V);
  ii) 50% of these have V = -1 and 50% have V = 1;
  iii) post-classification analysis shows that of the N items reaching the node during usage, 67% were C1 and 33% were C2;
  iv) of the 0.5 * N items that went to the left leaf during usage, 67% were C1 and 33% were C2;
  v) of the 0.5 * N items that went to the right leaf during usage, 67% were also C1 and 33% were C2.

  a) What was the error rate on the sample of N items that went to the subtree shown above?
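One way to work through part (a) numerically is the sketch below; it applies the usage statistics i)–v) from the problem to each leaf. N = 1000 is an illustrative value (the problem only says N > 1000; the error *rate* does not depend on N).

```python
# Sketch: error rate of the unpruned subtree on the N usage items.
# N = 1000 is illustrative; the problem states only N > 1000.
N = 1000
left = right = 0.5 * N        # 50% of items take each branch (stat ii)

# Left leaf predicts C1; 33% of items reaching it are actually C2 (stat iv).
errors_left = 0.33 * left
# Right leaf predicts C2; 67% of items reaching it are actually C1 (stat v).
errors_right = 0.67 * right

error_rate = (errors_left + errors_right) / N
print(error_rate)             # 0.5, i.e. a 50% error rate
```

Note that the right leaf, fit to a single training datum, is wrong on 67% of the usage items it sees, which is what drives the overall rate up to 50%.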

  2. b) What would the error rate on the same sample of N items have been if the subtree above had been pruned so as not to include the final test on V, but to instead be a single leaf predicting C1 (training counts C1: 2, C2: 1)?

  c) Suppose that the decision tree were being used to predict whether a pixel in a brain image represented part of tumor tissue or part of healthy brain tissue, for the purpose of delineating the portions of the brain that would be removed through surgery (e.g., any tissue predicted to be tumor would be removed, and any tissue predicted to be healthy would remain). Explain the implications of an overfitting decision tree in this scenario.
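For part (b), a self-contained sketch comparing the pruned leaf against the unpruned subtree, again using the usage statistics from the problem (N = 1000 is illustrative; the rates are independent of N):

```python
# Sketch: error rate after pruning the subtree to a single leaf predicting C1,
# compared with the unpruned subtree. Statistics come from the problem text.
N = 1000

# Unpruned: left leaf predicts C1 (33% of its items are C2 -> errors),
# right leaf predicts C2 (67% of its items are C1 -> errors), 50/50 split.
unpruned_error = (0.5 * N * 0.33 + 0.5 * N * 0.67) / N

# Pruned: every item is predicted C1, so the 33% of the N items that are
# actually C2 (stat iii) are misclassified.
pruned_error = (0.33 * N) / N

print(unpruned_error, pruned_error)   # 0.5 vs 0.33
```

On this usage sample, pruning away the test on V lowers the error rate from 50% to 33%, which is the point of the exercise: the extra split helped on the tiny training sample but hurts in deployment.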
