Artificial Intelligence Project #3 : Analysis of Decision Tree Learning Using WEKA

This project analyzes decision tree learning and its application in classifying gene expression patterns in leukemia patients. The analysis is performed using WEKA, a popular machine learning tool. The project aims to compare the performance of different decision tree algorithms and identify factors that influence their effectiveness.

medina

Presentation Transcript


  1. Artificial Intelligence Project #3: Analysis of Decision Tree Learning Using WEKA May 23, 2006

  2. Introduction • Decision tree learning is a method for approximating a discrete-valued target function • The learned function is represented by a decision tree • A decision tree can also be re-represented as a set of if-then rules to improve human readability

  3. An Example of Decision Tree

  4. Decision Tree Representation (1/2) • Decision trees classify instances by sorting them down the tree from the root to some leaf node • Node • Specifies a test of some attribute • Branch • Corresponds to one of the possible values of this attribute

  5. Decision Tree Representation (2/2) • Tree: Outlook • Sunny: Humidity (High: No, Normal: Yes) • Overcast: Yes • Rain: Wind (Strong: No, Weak: Yes) • Each path corresponds to a conjunction of attribute tests • Example: the instance (Outlook=Sunny, Temperature=Hot, Humidity=High, Wind=Strong) satisfies (Outlook=Sunny ∧ Humidity=High), so it is classified No • Decision trees represent a disjunction of conjunctions of constraints on the attribute values of instances • (Outlook=Sunny ∧ Humidity=Normal) ∨ (Outlook=Overcast) ∨ (Outlook=Rain ∧ Wind=Weak) • What is the merit of tree representation?
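The Outlook/Humidity/Wind tree on this slide can be written down directly as data. The following is a minimal sketch, not part of the original slides: the nested-dict encoding and the helper names `classify` and `paths_to` are assumptions made for illustration. It sorts an instance down the tree and then reads the "Yes" leaves back out as a disjunction of conjunctions.

```python
# The tree from the slide, encoded as nested dicts (an illustrative encoding).
# Internal node: {"attribute": name, "branches": {value: subtree}}; leaf: "Yes" or "No".
TREE = {
    "attribute": "Outlook",
    "branches": {
        "Sunny": {"attribute": "Humidity", "branches": {"High": "No", "Normal": "Yes"}},
        "Overcast": "Yes",
        "Rain": {"attribute": "Wind", "branches": {"Strong": "No", "Weak": "Yes"}},
    },
}


def classify(tree, instance):
    """Sort an instance down the tree from the root to a leaf and return the leaf label."""
    node = tree
    while isinstance(node, dict):
        node = node["branches"][instance[node["attribute"]]]
    return node


def paths_to(tree, label, prefix=()):
    """Collect every root-to-leaf path (a conjunction of tests) that ends in `label`."""
    if not isinstance(tree, dict):
        return [prefix] if tree == label else []
    found = []
    for value, subtree in tree["branches"].items():
        found += paths_to(subtree, label, prefix + ((tree["attribute"], value),))
    return found


# The instance used on the slide; Temperature is never tested by this tree.
example = {"Outlook": "Sunny", "Temperature": "Hot", "Humidity": "High", "Wind": "Strong"}
print(classify(TREE, example))  # No

# The "Yes" paths joined with OR give the disjunction of conjunctions.
yes_rule = " ∨ ".join(
    "(" + " ∧ ".join(f"{attr}={val}" for attr, val in path) + ")"
    for path in paths_to(TREE, "Yes")
)
print(yes_rule)
```

Each path returned by `paths_to` is also one if-then rule, which is the re-representation mentioned in the introduction.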

  6. Appropriate Problems for Decision Tree Learning • Instances are represented by attribute-value pairs • The target function has discrete output values • Disjunctive descriptions may be required • The training data may contain errors • Both errors in classification of the training examples and errors in the attribute values • The training data may contain missing attribute values • Suitable for classification

  7. Study • 60 leukemia patients → Bone marrow samples → Affymetrix GeneChip arrays → Gene expression data • Treatment-specific changes in gene expression discriminate in vivo drug response in human leukemia cells, MH Cheok et al., Nature Genetics 35, 2003.

  8. Gene Expression Data • # of data examples • 120 (60: before treatment, 60: after treatment) • # of genes measured • 12600 (Affymetrix HG-U95A array) • Task • Classification between “before treatment” and “after treatment” based on gene expression pattern

  9. Affymetrix GeneChip Arrays • Use short oligos to detect gene expression level. • Each gene is probed by a set of short oligos. • Each gene expression level is summarized by • Signal: numerical value describing the abundance of mRNA • A/P call: denotes the statistical significance of signal

  10. Preprocessing • Remove the genes having more than 60 ‘A’ calls • # of genes: 12600 → 3190 • Discretization of gene expression level • Criterion: median gene expression value of each sample • 0 (low) and 1 (high)
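The two preprocessing steps can be sketched on toy data as follows. This is an illustration under stated assumptions, not the authors' script: the function names, the toy values, filtering before discretizing, and treating a value strictly above the sample median as "high" are all choices made here, since the slide does not specify them.

```python
from statistics import median


def filter_genes(calls, max_absent):
    """Return indices of genes with at most `max_absent` 'A' calls.

    calls[s][g] is the A/P call of gene g in sample s.
    """
    n_genes = len(calls[0])
    return [
        g for g in range(n_genes)
        if sum(1 for sample in calls if sample[g] == "A") <= max_absent
    ]


def discretize(signals):
    """Binarize each sample against its own median: 1 (high) if above it, else 0 (low).

    Assumption: ties with the median count as low.
    """
    binary = []
    for sample in signals:
        threshold = median(sample)
        binary.append([1 if value > threshold else 0 for value in sample])
    return binary


# Toy data: 4 samples x 4 genes (the project data has 120 samples and 12600 genes).
signals = [
    [10.0, 200.0, 50.0, 120.0],
    [12.0, 180.0, 40.0, 90.0],
    [300.0, 20.0, 90.0, 60.0],
    [8.0, 150.0, 60.0, 70.0],
]
calls = [
    ["A", "P", "P", "P"],
    ["A", "P", "P", "P"],
    ["P", "A", "P", "P"],
    ["A", "P", "P", "P"],
]

# Toy threshold of 2; the slide uses 60 'A' calls out of 120 samples.
kept = filter_genes(calls, max_absent=2)
filtered = [[row[g] for g in kept] for row in signals]
binary = discretize(filtered)
print(kept)    # [1, 2, 3]
print(binary)  # [[1, 0, 0], [1, 0, 0], [0, 1, 0], [1, 0, 0]]
```

With the project's numbers, `max_absent=60` would correspond to "remove the genes having more than 60 'A' calls".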

  11. Gene Filtering • Using mutual information • Estimated probabilities were used. • # of genes: 3190 → 1000 • Final dataset • # of attributes: 1001 (one for the class) • Class: 0 (after treatment), 1 (before treatment) • # of data examples: 120
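A minimal sketch of mutual-information gene filtering, assuming the probabilities are estimated as relative frequencies and that the genes with the highest mutual information with the class are kept. The slide does not give the exact estimator or ranking rule, and the names `mutual_information` and `top_k_genes` are hypothetical.

```python
from collections import Counter
from math import log2


def mutual_information(xs, ys):
    """I(X;Y) in bits, with probabilities estimated as relative frequencies."""
    n = len(xs)
    count_x = Counter(xs)
    count_y = Counter(ys)
    count_xy = Counter(zip(xs, ys))
    mi = 0.0
    for (x, y), c in count_xy.items():
        p_xy = c / n
        mi += p_xy * log2(p_xy / ((count_x[x] / n) * (count_y[y] / n)))
    return mi


def top_k_genes(data, labels, k):
    """Rank genes by mutual information with the class and return the k best indices.

    data[s][g] is the discretized (0/1) value of gene g in sample s.
    Ties are broken by the lower gene index (an arbitrary choice).
    """
    n_genes = len(data[0])
    scored = [
        (mutual_information([row[g] for row in data], labels), g)
        for g in range(n_genes)
    ]
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [g for _, g in scored[:k]]


# Toy data: 4 samples x 3 genes; class 1 = before treatment, 0 = after treatment.
labels = [1, 1, 0, 0]
data = [
    [1, 1, 1],
    [1, 0, 1],
    [0, 1, 1],
    [0, 0, 0],
]
# Gene 0 matches the class exactly, gene 1 is independent of it, gene 2 is in between.
print(top_k_genes(data, labels, k=2))  # [0, 2]
```

In the project, the same ranking with `k=1000` would reduce 3190 genes to 1000.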

  12. Final Dataset • [Figure: data matrix with 1000 genes and 120 examples]

  13. Materials for the Project • Given • Preprocessed microarray data file: data2.txt • Downloadable • WEKA (http://www.cs.waikato.ac.nz/ml/weka/)
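WEKA's native input format is ARFF, so one possible preparation step is to write the discretized data as nominal attributes. The sketch below is an assumption-laden illustration: the layout of data2.txt is not described in the slides, so the code starts from rows already held in memory with the class value last, and the relation name `leukemia` and attribute names `gene0`, `gene1`, ... are invented for the example.

```python
def to_arff(rows, relation="leukemia", class_name="class"):
    """Render 0/1 rows as ARFF text with nominal {0,1} attributes.

    Assumptions: each row holds the gene values followed by the class value,
    and all values are 0 or 1. Names are hypothetical.
    """
    n_genes = len(rows[0]) - 1
    lines = [f"@relation {relation}", ""]
    lines += [f"@attribute gene{i} {{0,1}}" for i in range(n_genes)]
    lines += [f"@attribute {class_name} {{0,1}}", "", "@data"]
    lines += [",".join(str(value) for value in row) for row in rows]
    return "\n".join(lines) + "\n"


# Two toy examples with two genes each; the last column is the class.
print(to_arff([[0, 1, 1], [1, 0, 0]]))
```

Declaring the attributes as nominal rather than numeric matters here because ID3 handles only nominal attributes.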

  14. Analysis of Decision Tree Learning

  15. Analysis of Decision Tree Learning

  16. Submission • Due date: June 15 (Thu.), 12:00 (noon) • Report: Hard copy (301-419) & e-mail. • Use ID3, J48, and one other decision tree algorithm with learning parameters. • Show the experimental results of each algorithm. Except for ID3, try to improve performance by changing the learning parameters. • Analyze what causes the differences between the selected algorithms. • E-mail: jwha@bi.snu.ac.kr
