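Recent Research and Development on Microarray Data Mining. Shin-Mu Tseng (曾新穆), tsengsm@mail.ncku.edu.tw, Dept. of Computer Science and Information Engineering, National Cheng Kung University, Taiwan, R.O.C. August 13, 2001.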


Presentation Transcript


  1. Recent Research and Development on Microarray Data Mining • Shin-Mu Tseng (曾新穆), tsengsm@mail.ncku.edu.tw • Dept. of Computer Science and Information Engineering, National Cheng Kung University, Taiwan, R.O.C. • August 13, 2001

  2. Outline • Microarray Techniques • Goal of Microarray Data Mining • Clustering Methods • Efficient Microarray Data Mining • Conclusions

  3. Current Status • The Human Genome Project is in its finishing stage, revealing that there are about 30,000 functional genes in a human cell • For more than 90% of these genes, we know little about their real functions

  4. Microarray Techniques • Main Advantage of Microarray Techniques • They allow the expression of thousands of genes to be studied simultaneously in a single experiment • Microarray Process • Arrayer • Experiments: Hybridization • Image Capturing of Results • Analysis

  5. Goal of Microarray Mining • Multi-Condition Expression Analysis • [Table: expression levels of genes 1, 2, 3, 4, …, 1000 under test conditions A, B, C, …; sample value series: 0.4, 0.9, 0, 0.5, …, 0.8 / 0.2, 0.8, 0.3, 0.2, …, 0.7 / 0.6, 0.2, 0, 0.7, …, 0.3]

  6. Goal of Microarray Mining • Multi-Condition Expression Analysis • [Table repeated from slide 5]

  7. Sample Clustering Results

  8. Clustering Methods • Types of Clustering Methods • Partitioning: K-Means, K-Medoids, PAM, CLARA, … • Hierarchical: HAC, BIRCH, CURE, ROCK • Density-based: CAST, DBSCAN, OPTICS, CLIQUE, … • Grid-based: STING, CLIQUE, WaveCluster, … • Model-based: COBWEB, SOM, CLASSIT, AutoClass, …

  9. Clustering Methods (cont.) • Partitioning • Hierarchical

  10. Clustering Methods (cont.) • Density-based • Grid-based

  11. CAST Clustering • Input • S: a symmetric n × n similarity matrix, S(i, j) ∈ [0, 1] • t: affinity threshold (0 < t < 1) • Method 1. Choose a seed for generating a new cluster 2. ADD: add qualified items to the cluster 3. REMOVE: remove unqualified items from the stable cluster 4. Repeat Steps 1-3 until no more clusters can be generated
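
The four steps above can be sketched in Python as follows. This is a minimal illustration, not the speaker's implementation: the slide does not say how the seed is chosen or what "qualified" means, so the sketch assumes the commonly described CAST rule in which an item's affinity is its summed similarity to the current cluster members and an item qualifies when that affinity is at least t times the cluster size. The function name `cast`, the seed rule, the `max_rounds` guard, and the toy matrix are all assumptions.

```python
import numpy as np

def cast(S, t, max_rounds=1000):
    """Sketch of CAST on a symmetric similarity matrix S with threshold t."""
    S = np.asarray(S, dtype=float)
    unassigned = set(range(S.shape[0]))
    clusters = []
    while unassigned:
        # Step 1 (assumed rule): seed with the unassigned item that is most
        # similar, in total, to the other unassigned items.
        cand = sorted(unassigned)
        seed = max(cand, key=lambda i: S[i, cand].sum())
        cluster = {seed}
        affinity = S[:, seed].copy()  # summed similarity to cluster members
        for _ in range(max_rounds):   # guard against oscillation
            changed = False
            # Step 2, ADD: pull in the best outside item while it qualifies.
            while True:
                outside = sorted(unassigned - cluster)
                if not outside:
                    break
                best = max(outside, key=lambda i: affinity[i])
                if affinity[best] < t * len(cluster):
                    break
                cluster.add(best)
                affinity += S[:, best]
                changed = True
            # Step 3, REMOVE: drop the weakest member while it fails the test.
            while len(cluster) > 1:
                worst = min(sorted(cluster), key=lambda i: affinity[i])
                if affinity[worst] >= t * len(cluster):
                    break
                cluster.remove(worst)
                affinity -= S[:, worst]
                changed = True
            if not changed:
                break
        clusters.append(sorted(cluster))
        unassigned -= cluster         # Step 4: repeat on what is left
    return clusters

# Toy similarity matrix (made up): items 0-2 resemble each other, as do 3-4.
S_demo = np.full((5, 5), 0.1)
S_demo[:3, :3] = 0.9
S_demo[3:, 3:] = 0.9
np.fill_diagonal(S_demo, 1.0)
demo_clusters = cast(S_demo, t=0.5)
```

A higher t demands tighter clusters and usually yields more of them, which is why the choice of threshold matters for the later slides.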

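  12. Similarity Measurements: Correlation Coefficients • The most popular correlation coefficient is the Pearson correlation coefficient (1892) • Correlation between X = {X1, X2, …, Xn} and Y = {Y1, Y2, …, Yn}: $r(X, Y) = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{\sqrt{\sum_{i=1}^{n} (X_i - \bar{X})^2}\,\sqrt{\sum_{i=1}^{n} (Y_i - \bar{Y})^2}}$, where $\bar{X}$ and $\bar{Y}$ are the means of X and Y [the slide's formula image is missing from the transcript; the standard Pearson definition is shown]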

  13. Similarity Measurements: Correlation Coefficients (cont.) • It captures the similarity of the "shapes" of two expression profiles and ignores differences in their magnitudes.
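
The "shape, not magnitude" property can be checked with a short Python sketch of the standard Pearson coefficient. The function name `pearson` and the sample profile are illustrative assumptions, not taken from the talk's software.

```python
import numpy as np

def pearson(x, y):
    """Standard Pearson correlation of two equal-length expression profiles."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xc = x - x.mean()  # centre each profile on its own mean
    yc = y - y.mean()
    denom = np.sqrt((xc ** 2).sum() * (yc ** 2).sum())
    if denom == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return float((xc * yc).sum() / denom)

# Illustrative profile; scaling and shifting it changes magnitude, not shape.
profile = np.array([0.4, 0.9, 0.0, 0.5, 0.8])
scaled = 3.0 * profile + 5.0
r_same_shape = pearson(profile, scaled)   # close to +1
r_opposite = pearson(profile, -profile)   # close to -1
```

Pearson values lie in [-1, 1], while the CAST input on slide 11 expects similarities in [0, 1]. The slides do not say how the two are reconciled; a rescaling such as (r + 1) / 2 is one possible assumption.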

  14. Problems in Microarray Mining • How can microarray data be clustered so that the following requirements are met simultaneously? • Efficiency • Accuracy • Automation

  15. Problems in Microarray Mining (cont.) • How can microarray data be clustered so that the following requirements are met simultaneously? • Efficiency • Accuracy • Automation • → Good clustering methods + validation techniques

  16. Efficient Microarray Mining • Improved CAST algorithm for clustering • Hubert’s Γ statistic for validation • Iterative sampled computation for automatic clustering
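
As a rough illustration of the validation step, the sketch below computes a normalized Hubert's Γ in Python. The slide does not give the exact formulation used, so this assumes the common one: the correlation, over all item pairs, between the similarity matrix and a 0/1 matrix that marks whether a pair shares a cluster. The name `hubert_gamma` and the toy data are assumptions.

```python
import numpy as np

def hubert_gamma(S, labels):
    """Normalized Hubert's Gamma for a clustering of a similarity matrix S."""
    S = np.asarray(S, dtype=float)
    labels = np.asarray(labels)
    iu = np.triu_indices(len(labels), k=1)  # every unordered pair once
    x = S[iu]                               # observed similarities
    same = labels[:, None] == labels[None, :]
    y = same[iu].astype(float)              # 1 if the pair shares a cluster
    xc = x - x.mean()
    yc = y - y.mean()
    denom = np.sqrt((xc ** 2).sum() * (yc ** 2).sum())
    return float((xc * yc).sum() / denom) if denom > 0 else 0.0

# Toy similarity matrix (made up): items 0-2 form one group, 3-4 another.
S_toy = np.full((5, 5), 0.1)
S_toy[:3, :3] = 0.9
S_toy[3:, 3:] = 0.9
np.fill_diagonal(S_toy, 1.0)
gamma_good = hubert_gamma(S_toy, [0, 0, 0, 1, 1])  # matches the structure
gamma_bad = hubert_gamma(S_toy, [0, 1, 0, 1, 0])   # cuts across it
```

Under this formulation a higher Γ means the clustering agrees better with the similarity data, so it can be used to compare clusterings produced at different thresholds.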

  17. Reduce the Computation 1. Narrow down the threshold range 2. Split and conquer: find a "nearly-best" result • Example: m = 4 • LM: left margin, RM: right margin • [Figure: threshold axis from 0 to 100% with LM and RM marked]
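
One plausible reading of the split-and-conquer idea is sketched below in Python. The slide gives only the outline, so the details are assumptions: the range [LM, RM] is split into m equal parts, the score is evaluated at the split points, and the range is narrowed to the neighbourhood of the best point until it is smaller than a tolerance. The names `search_threshold` and `tol` and the toy score function are hypothetical; in the talk's setting the score would presumably be the Γ statistic of the clustering obtained at threshold t.

```python
def search_threshold(score, lm=0.0, rm=1.0, m=4, tol=0.01):
    """Narrow [lm, rm] around the threshold with the best score (a sketch)."""
    if m < 3:
        raise ValueError("m must be at least 3 for the range to shrink")
    cache = {}  # threshold -> score, so no threshold is evaluated twice

    def evaluate(t):
        if t not in cache:
            cache[t] = score(t)
        return cache[t]

    while rm - lm > tol:
        step = (rm - lm) / m
        points = [lm + k * step for k in range(m + 1)]
        best = max(points, key=evaluate)
        # Keep only the parts adjacent to the best split point.
        lm, rm = max(lm, best - step), min(rm, best + step)
    best = max(cache, key=cache.get)
    return best, cache[best], len(cache)

# Toy score with a single peak at t = 0.37, standing in for "cluster, then
# validate"; it is not the talk's data.
best_t, best_score, n_evaluations = search_threshold(
    lambda t: -(t - 0.37) ** 2
)
```

The result is "nearly best" rather than optimal: if the score has several peaks, narrowing around the best sampled point can discard the true maximum.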

  18. Experimental Results • Dataset • Source: Michael Eisen's Lab, Lawrence Berkeley National Lab (LBNL) (http://rana.lbl.gov/EisenData.htm) • Microarray expression data of the yeast Saccharomyces cerevisiae, containing 6221 genes under 80 conditions • The similarity matrix was computed in advance

  19. Experimental Results (cont.) • Without range narrowing • Executions: 19 • Execution time: 246 sec • Γ statistic: 0.5138 • With range narrowing • Executions: 13 • Execution time: 27 sec • Γ statistic: 0.5137

  20. Experimental Results (cont.)

  21. Conclusions • Microarray data analysis is an emerging field that needs the support of data mining techniques • Accuracy • Efficiency • Automation
