1 / 19

Bayesian Network Diagnosis Project Analysis

Analyze influence of network and data size on Bayesian network learning. Compare classifier accuracy. Data generation and structural learning using WEKA software. Analyze leukemia gene data for classification. Preprocess gene data, filter genes using mutual information. Goal to submit results by November 17.

jbetty
Download Presentation

Bayesian Network Diagnosis Project Analysis

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Artificial Intelligence Project #3 : Diagnosis Using Bayesian Networks October 18, 2005

  2. Goals of the Project • Analysis of the influence of network size and data size on structural learning of Bayesian networks • Six Bayesian networks of various sizes are given. • Generate data examples from each Bayesian network. • Learn Bayesian network structures from the generated data. • Analyze the learned results according to the network size and the data size. • Classification using Bayesian networks • A microarray dataset consisting of two classes of samples is given. • Learn Bayesian network classifiers from the dataset. • Compare the classification accuracy of Bayesian network classifiers with that of other classifiers such as neural networks. (c) 2005 SNU CSE Biointelligence Lab

  3. Given Bayesian Networks • Randomly generated • # of variables: 10, 30, and 45 • All variables are binary • Network file format: *.dsc for MSBNX(http://research.microsoft.com/adapt/MSBNx/) (c) 2005 SNU CSE Biointelligence Lab

  4. Example Bayesian Network Structure I (c) 2005 SNU CSE Biointelligence Lab

  5. Example Bayesian Network Structure II (c) 2005 SNU CSE Biointelligence Lab

  6. *.dsc Files Node name Possible states Parents Child Conditional probability distribution (c) 2005 SNU CSE Biointelligence Lab

  7. Data Generation X1 X2 Sample X1 from P(X1) Sample X2 from P(X2) Sample X3 from P(X3| X1) Sample X4 from P(X4| X1, X2) Sample X5 from P(X5| X3) Sample X6 from P(X6| X4) X3 X4 X5 X6 (c) 2005 SNU CSE Biointelligence Lab

  8. Data Generation Tool • data_generator • Usage: data_generator [network file style] [# of nodes] [# of data samples] [input file] [output file]... (c) 2005 SNU CSE Biointelligence Lab

  9. Structural Learning of Bayesian Networks • Using WEKA software (http://www.cs.waikato.ac.nz/ml/weka/) (c) 2005 SNU CSE Biointelligence Lab

  10. Learning Example Learned network structure The original network structure (c) 2005 SNU CSE Biointelligence Lab

  11. Materials for the First One • Given • Bayesian networks • sf_10.dsc, sf_30.dsc, sf_45.dsc, md_10.dsc, md_30.dsc, md_45.dsc • Data generation tool • data_generator.exe [for Windows], data_generator [for Linux] • Downloadable • MSBNX (http://research.microsoft.com/adapt/MSBNx/) • WEKA (http://www.cs.waikato.ac.nz/ml/weka/) • You should write your own code for comparing Bayesian network structures. (c) 2005 SNU CSE Biointelligence Lab

  12. 60 leukemia patients Bone marrow samples Affymetrix GeneChip arrays Gene expression data Study • Treatment-specific changes in gene expression discriminate in vivo drug response in human leukemia cells, MH Cheok et al., Nature Genetics 35, 2003. (c) 2005 SNU CSE Biointelligence Lab

  13. Gene Expression Data • # of data examples • 120 (60: before treatment, 60: after treatment) • # of genes measured • 12600 (Affymetrix HG-U95A array) • Task • Classification between “before treatment” and “after treatment” based on gene expression pattern (c) 2005 SNU CSE Biointelligence Lab

  14. Affymetrix GeneChip Arrays • Use short oligos to detect gene expression level. • Each gene is probed by a set of short oligos. • Each gene expression level is summarized by • Signal: numerical value describing the abundance of mRNA • A/P call: denotes the statistical significance of signal (c) 2005 SNU CSE Biointelligence Lab

  15. Preprocessing • Remove the genes having more than 60 ‘A’ calls • # of genes: 12600  3190 • Discretization of gene expression level • Criterion: median gene expression value of each sample • 0 (low) and 1 (high) (c) 2005 SNU CSE Biointelligence Lab

  16. Gene Filtering • Using mutual information • Estimated probabilities were used. • # of genes: 3190  1000 • Final dataset • # of attributes: 1001 (one for the class) • Class: 0 (after treatment), 1 (before treatment) • # of data examples: 120 (c) 2005 SNU CSE Biointelligence Lab

  17. Final Dataset 1000 120 (c) 2005 SNU CSE Biointelligence Lab

  18. Materials for the Second One • Given • Preprocessed microarray data file: data2.txt • Downloadable • WEKA (http://www.cs.waikato.ac.nz/ml/weka/) (c) 2005 SNU CSE Biointelligence Lab

  19. Due: November 17, 2005 • Analysis of the influence of network size and data size on structural learning of Bayesian networks • Six Bayesian networks of various sizes are given. • Generate data examples from each Bayesian network. • Learn Bayesian network structures from the generated data. • Analyze the learned results according to the network size and the data size. • Classification using Bayesian networks • A microarray dataset consisting of two classes of samples is given. • Learn Bayesian network classifiers from the dataset. • Compare the classification accuracy of Bayesian network classifiers with that of other classifiers such as neural networks. • Submission • Kyu-BaekHwang (301-419) • 1 hard copy. (c) 2005 SNU CSE Biointelligence Lab

More Related