
Learning Decision Trees

Learning Decision Trees. Brief tutorial by M Werner. Medical Diagnosis Example. Goal: diagnose a disease from a blood test. Clinical use: a blood sample is obtained from a patient; the blood is tested to measure the current expression of various proteins, say by using a DNA microarray.


Presentation Transcript


  1. Learning Decision Trees: brief tutorial by M Werner

  2. Medical Diagnosis Example
     • Goal: diagnose a disease from a blood test
     • Clinical use:
       • A blood sample is obtained from a patient
       • The blood is tested to measure the current expression of various proteins, say by using a DNA microarray
       • The data is analyzed to produce a Yes or No answer

  3. Data Analysis
     • Use a decision tree such as:
     [Figure: binary decision tree. The root node tests P1 > K1; lower nodes test P2 > K2, P3 > K3, and P4 > K4. Each test has a Y branch and an N branch, and every path ends in a Yes or No leaf.]

  4. How to Build the Decision Tree
     • Start with blood samples from patients known either to have the disease or not to have it (the training set).
       • Suppose there are 20 patients: 10 are known to have the disease and 10 are known not to.
     • From the training set, get expression levels for all proteins of interest.
       • For example, with 20 patients and 50 proteins we get a 50 × 20 array of real numbers.
       • Rows are proteins
       • Columns are patients
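As a minimal sketch of the training-set layout described in slide 4, the snippet below builds a 50 × 20 array (rows are proteins, columns are patients) together with one known outcome per patient. The expression values are synthetic random numbers, and the names `make_training_set`, `expression`, and `has_disease` are hypothetical; none of them come from the tutorial.

```python
import random

N_PROTEINS = 50  # rows, as in the slide
N_PATIENTS = 20  # columns, as in the slide


def make_training_set(seed=0):
    """Return (expression, has_disease) filled with synthetic values.

    The numbers are random placeholders, not real measurements.
    """
    rng = random.Random(seed)
    # Assumption for illustration: the first 10 patients are the diseased
    # ones and the remaining 10 are the healthy ones.
    has_disease = [True] * 10 + [False] * 10
    # expression[p][j] is the level of protein p in patient j.
    expression = [
        [rng.uniform(0.0, 1.0) for _ in range(N_PATIENTS)]
        for _ in range(N_PROTEINS)
    ]
    return expression, has_disease


expression, has_disease = make_training_set()
print(len(expression), len(expression[0]))  # rows, columns
```

Keeping the outcomes in a separate list, indexed the same way as the columns, makes it easy to count diseased and healthy patients on each side of a candidate test.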

  5. Choosing the decision nodes
     • We would like the tree to be as short as possible
     • Start with all 20 patients in one group (10 have the disease, 10 don't; written 10/10)
     • Choose the protein and the level that gain the most information
     • Possible splitting condition: Px > Kx splits the 10/10 group into 9/3 (mostly diseased) and 1/7 (mostly not diseased)
     • Alternative splitting condition: Py > Ky splits the 10/10 group into 7/7 and 3/3

  6. How to determine information gain
     • Purity: a measure of the degree to which the patients in a group share the same outcome.
     • A group that splits 1/7 is fairly pure: most patients don't have the disease
     • 0/8 is even purer
     • 4/4 is the opposite of pure. This group is said to have high entropy: knowing that a patient is in this group does not make her more or less likely to have the disease.
     • The decision tree should reduce entropy as test conditions are evaluated

  7. Measuring Purity (Entropy)
     • Let f(i,j) = Prob(Outcome = j in node i)
     • For example, if node 2 has a 9/3 split:
       • f(2,0) = 9/12 = 0.75
       • f(2,1) = 3/12 = 0.25
     • Gini impurity: $I_G(i) = 1 - \sum_j f(i,j)^2$
     • Entropy: $I_E(i) = -\sum_j f(i,j) \log_2 f(i,j)$
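The two impurity measures can be sketched in a few lines of Python. This assumes the standard definitions: Gini impurity is one minus the sum of the squared fractions f(i,j), and entropy is the negative sum of f(i,j) times its base-2 logarithm, so it is measured in bits. The function names are hypothetical, and a group is passed as a list of outcome counts such as `[9, 3]`.

```python
import math


def class_fractions(counts):
    """Turn outcome counts for one node into the fractions f(i, j).

    Assumes the group is non-empty.
    """
    total = sum(counts)
    return [c / total for c in counts]


def gini_impurity(counts):
    """One minus the sum of squared outcome fractions."""
    return 1.0 - sum(f * f for f in class_fractions(counts))


def entropy(counts):
    """Entropy in bits; outcomes with zero count contribute nothing."""
    return sum(-f * math.log2(f) for f in class_fractions(counts) if f > 0)


# The 9/3 node from the slide: f = 0.75 and 0.25.
print(gini_impurity([9, 3]))  # 0.375
print(entropy([9, 3]))        # roughly 0.811
```

With these definitions a 4/4 group has entropy 1 bit, the maximum for two outcomes, and a 0/8 group has entropy 0, which matches the purity ordering on slide 6.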

  8. Computing Entropy

  9. Goal is to use a test which best reduces total entropy in the subgroups
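One common way to make "best reduces total entropy" concrete is information gain: the entropy of the parent group minus the average entropy of the subgroups, weighted by subgroup size. The sketch below assumes that definition and base-2 entropy, and applies it to the two candidate splits from slide 5. The function names are hypothetical.

```python
import math


def entropy(counts):
    """Entropy in bits of a group given its outcome counts."""
    total = sum(counts)
    return sum(-(c / total) * math.log2(c / total) for c in counts if c > 0)


def information_gain(parent, children):
    """Parent entropy minus the size-weighted entropy of the subgroups."""
    n = sum(parent)
    weighted = sum(sum(child) / n * entropy(child) for child in children)
    return entropy(parent) - weighted


# Counts are [diseased, not diseased], as in the slide 5 splits.
gain_px = information_gain([10, 10], [[9, 3], [1, 7]])
gain_py = information_gain([10, 10], [[7, 7], [3, 3]])
print(gain_px)  # roughly 0.296 bits
print(gain_py)  # zero up to rounding: both subgroups are still 50/50
```

Under this measure Px > Kx is the better test: it removes about 0.3 bits of uncertainty, while Py > Ky leaves both subgroups as mixed as the original group.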

  10. Building the Tree
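The procedure from slides 5 to 9 can be sketched as a greedy recursion: pick the protein and level with the largest information gain, split the group, and repeat on each subgroup until it is pure or no test helps. This is an illustrative sketch, not the tutorial's own implementation. It assumes one row per patient (the transpose of the 50 × 20 array on slide 4), tries midpoints between observed values as candidate levels, and uses hypothetical names throughout.

```python
import math


def entropy(labels):
    """Entropy in bits of a list of outcome labels."""
    n = len(labels)
    counts = [labels.count(v) for v in set(labels)]
    return sum(-(c / n) * math.log2(c / n) for c in counts)


def best_split(rows, labels):
    """Return (gain, protein_index, level) for the best test, or None."""
    base = entropy(labels)
    best = None
    for p in range(len(rows[0])):
        values = sorted(set(row[p] for row in rows))
        # Candidate levels: midpoints between consecutive observed values.
        for lo, hi in zip(values, values[1:]):
            k = (lo + hi) / 2
            yes = [lab for row, lab in zip(rows, labels) if row[p] > k]
            no = [lab for row, lab in zip(rows, labels) if row[p] <= k]
            weighted = (len(yes) * entropy(yes) + len(no) * entropy(no)) / len(labels)
            gain = base - weighted
            if best is None or gain > best[0]:
                best = (gain, p, k)
    return best


def build_tree(rows, labels):
    """Return a leaf label, or a dict describing a test and its subtrees."""
    if len(set(labels)) == 1:  # pure group: stop with a leaf
        return labels[0]
    split = best_split(rows, labels)
    if split is None or split[0] <= 0:  # no useful test: majority-vote leaf
        return max(set(labels), key=labels.count)
    _, p, k = split
    yes = [i for i, row in enumerate(rows) if row[p] > k]
    no = [i for i, row in enumerate(rows) if row[p] <= k]
    return {
        "protein": p,
        "level": k,
        "yes": build_tree([rows[i] for i in yes], [labels[i] for i in yes]),
        "no": build_tree([rows[i] for i in no], [labels[i] for i in no]),
    }


def classify(tree, row):
    """Follow the tests from the root until a leaf label is reached."""
    while isinstance(tree, dict):
        tree = tree["yes"] if row[tree["protein"]] > tree["level"] else tree["no"]
    return tree


# Tiny made-up example: 4 patients, 2 proteins.
rows = [[1.0, 5.0], [2.0, 1.0], [8.0, 4.0], [9.0, 2.0]]
labels = ["No", "No", "Yes", "Yes"]
tree = build_tree(rows, labels)
print(tree)
print(classify(tree, [7.0, 0.0]))
```

Stopping when the best gain is zero keeps the sketch short, but it can end early on data where only a combination of tests is informative. Practical implementations usually add limits such as a maximum depth or a minimum group size.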

  11. Links
      • http://www.ece.msstate.edu/research/isip/publications/courses/ece_8463/lectures/current/lecture_27/lecture_27.pdf
      • Decision Trees & Data Mining
      • Andrew Moore Tutorial
