
Microarray Data Analysis



Presentation Transcript


  1. Microarray Data Analysis Jahangheer Shaik

  2. Summary • Problem Definition • Proposed methods • Summary • Future work

  3. Problem Definition • Given the unique characteristics of microarray data, there are two major areas of research. • Finding Differentially expressed genes • Mostly two-sample/multi-sample experiments • Functionally classifying genes • Two-sample/multi-sample experiments • Time series datasets • This dissertation will focus on two-sample/multi-sample experiments

  4. Objectives • Finding genes that best differentiate different classes of tissues • Which gene-gene interactions best differentiate groups of samples? • Are subtypes of disease discernible by the genes? • Visualization of high-dimensional data • Finding uncharacterized genes with patterns similar to well-characterized ones • Gene-gene interactions present among tissue samples

  5. Finding Differentially Expressed Genes (DEGs) • Statistics-based methods • Conventional methods rely on the mean and standard deviation (sigma) • They produce a high false discovery rate • They do not address the problem of reproducibility • Mukherjee et al. [1] propose a concept of reproducibility across bootstrapped datasets
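A minimal sketch of the two ideas on this slide, assuming the "mean and sigma" statistic is a Welch t-statistic and that reproducibility is measured as how often a gene stays in the top k across bootstrap resamples. The function names, the choice of statistic, and the top-k criterion are illustrative assumptions, not the method of [1].

```python
import math
import random
import statistics


def welch_t(a, b):
    """Welch t-statistic built from the mean and variance of two samples."""
    va, vb = statistics.variance(a), statistics.variance(b)
    se = math.sqrt(va / len(a) + vb / len(b))
    return (statistics.mean(a) - statistics.mean(b)) / se if se > 0 else 0.0


def top_genes(class_a, class_b, k):
    """Indices of the k genes with the largest |t|; each row is one gene."""
    scores = [abs(welch_t(a, b)) for a, b in zip(class_a, class_b)]
    return set(sorted(range(len(scores)), key=lambda g: -scores[g])[:k])


def selection_frequency(class_a, class_b, k, n_boot=200, seed=0):
    """Fraction of bootstrap resamples in which each gene lands in the top k."""
    rng = random.Random(seed)
    n_a, n_b = len(class_a[0]), len(class_b[0])
    counts = [0] * len(class_a)
    for _ in range(n_boot):
        # Resample the sample columns with replacement, separately per class.
        cols_a = [rng.randrange(n_a) for _ in range(n_a)]
        cols_b = [rng.randrange(n_b) for _ in range(n_b)]
        boot_a = [[row[c] for c in cols_a] for row in class_a]
        boot_b = [[row[c] for c in cols_b] for row in class_b]
        for g in top_genes(boot_a, boot_b, k):
            counts[g] += 1
    return [c / n_boot for c in counts]
```

A gene whose selection frequency stays close to 1 is selected reproducibly; a gene that ranks highly on the full data but has a low frequency is a candidate false discovery.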

  6. Finding DEGs • The mean alone is not a good representative of the samples, so it is supplemented with the Hausdorff distance measure. J. Shaik and M. Yeasin, "Adaptive Ranking and Selection of Differentially Expressed Genes from Microarray Data," WSEAS Transactions on Biology and Biomedicine, vol. 3, pp. 125-133, 2006.
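To illustrate why a set distance can add information to the mean, here is a small sketch of the Hausdorff distance between the expression values of one gene in two classes. How the cited paper combines this distance with the mean is not stated in the transcript, so the sketch only computes the two quantities side by side.

```python
def directed_hausdorff(a, b):
    """Largest distance from a value in a to its nearest value in b."""
    return max(min(abs(x - y) for y in b) for x in a)


def hausdorff(a, b):
    """Symmetric Hausdorff distance between two sets of 1-D values."""
    return max(directed_hausdorff(a, b), directed_hausdorff(b, a))


# Toy gene: both classes have the same mean but a different spread.
class_a = [1.0, 1.0, 3.0, 3.0]
class_b = [2.0, 2.0, 2.0, 2.0]
mean_gap = abs(sum(class_a) / len(class_a) - sum(class_b) / len(class_b))
set_gap = hausdorff(class_a, class_b)
```

In this toy case the mean difference is zero while the Hausdorff distance is not, so a ranking based on the mean alone would miss the gene.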

  7. Schematic showing effect of skewness • Schematic showing effect of kurtosis • J. Shaik and M. Yeasin, “Ranking Function Based on Higher Order Statistics (RF-HOS) for Two Sample Microarray Experiments”, accepted for ISBRA07, Atlanta, GA, 2007.

  8. Ranking function based on higher order statistics
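The ranking function itself is not reproduced in this transcript. As a hedged illustration of the higher-order statistics it draws on, the sketch below computes the sample skewness and excess kurtosis of one gene's expression values, using population moments and assuming the values are not all identical.

```python
import statistics


def skewness(x):
    """Third standardized moment: 0 for a symmetric sample."""
    m, s = statistics.fmean(x), statistics.pstdev(x)
    return sum((v - m) ** 3 for v in x) / (len(x) * s ** 3)


def excess_kurtosis(x):
    """Fourth standardized moment minus 3 (0 for a normal distribution)."""
    m, s = statistics.fmean(x), statistics.pstdev(x)
    return sum((v - m) ** 4 for v in x) / (len(x) * s ** 4) - 3.0
```

For example, `skewness([1, 1, 1, 1, 10])` is positive because of the long right tail, a feature of the sample that the mean and standard deviation alone do not describe.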

  9. Clustering based approach • The clustering based approach measures the ability of coexpressed genes to maximally separate the different sample cases

  10. Modules Developed • ASI Clustering Algorithm • Fuzzy ASI Clustering Algorithm • Progressive Framework • Two-way Clustering Framework

  11. Fuzzy-ASI • It is a modification of the hard ASI algorithm • The memberships are not absolute • The progressive framework offers an interesting choice for the fuzzification factor • Computationally more intensive than hard ASI
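The Fuzzy-ASI update rule is not given in the transcript. To show what non-absolute memberships and a fuzzification factor mean in practice, the sketch below uses the standard fuzzy c-means membership formula as a stand-in; the function name and the parameter `m` (the fuzzification factor) are assumptions for illustration.

```python
def fuzzy_memberships(distances, m=2.0):
    """Fuzzy c-means style memberships of one point from its centroid distances."""
    if m <= 1.0:
        raise ValueError("fuzzification factor m must be greater than 1")
    # A point sitting exactly on one or more centroids belongs fully to them.
    zeros = [j for j, d in enumerate(distances) if d == 0.0]
    if zeros:
        return [1.0 / len(zeros) if j in zeros else 0.0
                for j in range(len(distances))]
    power = 2.0 / (m - 1.0)
    inverse = [d ** -power for d in distances]
    total = sum(inverse)
    return [v / total for v in inverse]
```

With distances `[1.0, 3.0]`, `m=2.0` gives memberships of about 0.9 and 0.1. Values of `m` close to 1 push the memberships toward a hard 0/1 assignment, and large values push them toward equal shares.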

  12. Progressive framework

  13. Visualization • Heat maps [2] • PCA-based methods (most widely used) • 3D star coordinate projection (proposed in this dissertation)
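As a reference point for the PCA-based option, here is a minimal sketch of projecting samples onto their leading principal components with NumPy's SVD. It assumes a samples-by-genes matrix; the function name and return values are illustrative choices.

```python
import numpy as np


def pca_project(X, n_components=2):
    """Project rows of X (samples x genes) onto the leading principal components."""
    Xc = X - X.mean(axis=0)  # center each gene
    # Rows of Vt are the principal directions, ordered by singular value.
    _, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = Xc @ Vt[:n_components].T
    explained = (s[:n_components] ** 2) / np.sum(s ** 2)
    return scores, explained
```

The `scores` array can be passed directly to a 2D or 3D scatter plot, and `explained` reports the fraction of total variance captured by each plotted axis.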

  14. Results

  15. Visualization

  16. References • S. Mukherjee, S. J. Roberts, and M. J. Laan, "Data-adaptive Test Statistics for Microarray Data," Bioinformatics, vol. 21, pp. 108-114, 2005. • U. Alon, N. Barkai, D. A. Notterman, K. Gish, S. Ybarra, D. Mack, and A. J. Levine, "Broad patterns of gene expression revealed by clustering analysis of tumor and normal colon tissues probed by oligonucleotide arrays," Proc. Natl. Acad. Sci. USA, vol. 96, pp. 6745-6750, 1999. • S. Raychaudhuri, J. M. Stuart, and R. B. Altman, "Principal Components Analysis to Summarize Microarray Experiments: Application to Sporulation Time Series," presented at Pacific Symposium on Biocomputing, Honolulu, Hawaii, 2000. • I. Guyon, "An Introduction to Variable and Feature Selection," Journal of Machine Learning Research, vol. 3, pp. 1157-1182, 2003. • J. M. Ray and W. G. Hearl, "Methods for Evaluating Differential Gene Expression in Tissues and Cells," Drug Development, pp. 50-55, 2005.

  17. References • D. Dembele and P. Kastner, "Fuzzy C-Means for Clustering Microarray Data," Bioinformatics, vol. 19, pp. 973-980, 2003. • S. Y. Kim, T. M. Choi, and J. S. Bae, "Fuzzy Types Clustering for Microarray Data," International Journal of Computational Intelligence, vol. 2, pp. 12-15, 2005. • W. Yang, L. Rueda, and A. Ngom, "A Simulated Annealing Approach to Find the Optimal Parameters for Fuzzy Clustering Microarray Data," Chilean Computer Science Society (SCCC 05), pp. 45-55, 2005. • D. L. Davies and D. W. Bouldin, "A Cluster Separation Measure," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 1, pp. 224-227, 1979. • I. Lonnstedt and T. Speed, "Replicated Microarray Data," Statistica Sinica, vol. 12, pp. 31-46, 2002.

  18. ASI F =

  19. Adaptive Subspace Iteration (ASI) • Microarray data • Partition matrix (absolute memberships) • Subspace structure • Projection of data onto subspace • Projection of centroid onto subspace

  20. 3D star coordinate projection
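The transcript does not describe how the 3D projection is constructed. The sketch below shows the general star coordinate idea under stated assumptions: each dimension gets its own axis vector in 3D, each value is min-max scaled to [0, 1], and a point is placed at the sum of its scaled values times the axis vectors. The sphere layout in `sphere_axes` is a hypothetical choice, not the layout used in the dissertation.

```python
import math


def sphere_axes(n_dims):
    """Hypothetical axis layout: n_dims unit vectors spread over a sphere."""
    golden = math.pi * (3.0 - math.sqrt(5.0))
    axes = []
    for j in range(n_dims):
        z = 1.0 - 2.0 * (j + 0.5) / n_dims
        r = math.sqrt(1.0 - z * z)
        axes.append((r * math.cos(golden * j), r * math.sin(golden * j), z))
    return axes


def star_project(rows):
    """Map each row (one value per dimension) to a point in 3D."""
    n_dims = len(rows[0])
    lo = [min(r[j] for r in rows) for j in range(n_dims)]
    hi = [max(r[j] for r in rows) for j in range(n_dims)]
    axes = sphere_axes(n_dims)
    points = []
    for r in rows:
        p = [0.0, 0.0, 0.0]
        for j in range(n_dims):
            span = hi[j] - lo[j]
            # Min-max scale the value to [0, 1]; constant dimensions contribute 0.
            w = (r[j] - lo[j]) / span if span > 0 else 0.0
            for c in range(3):
                p[c] += w * axes[j][c]
        points.append(tuple(p))
    return points
```

Unlike PCA, the axes here are fixed rather than derived from the data, so the contribution of each dimension to a point's position can be read off directly from its axis vector.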
