Microarray Data Analysis

Microarray Data Analysis Jahangheer Shaik

Outline • Problem Definition • Proposed Methods • Summary • Future Work

Problem Definition • Given the unique characteristics of microarray data, there are two major areas of research: • Finding differentially expressed genes (mostly two-sample/multi-sample experiments) • Functionally classifying genes (two-sample/multi-sample experiments; time series datasets) • This dissertation focuses on two-sample/multi-sample experiments

Objectives • Finding genes that best differentiate different classes of tissues • Which gene-gene interactions best differentiate groups of samples? • Are subtypes of disease discernible by the genes? • Visualization of high-dimensional data • Finding uncharacterized genes with patterns similar to well-characterized ones • Gene-gene interactions present among tissue samples

Finding Differentially Expressed Genes (DEGs) • Statistics-based methods • Conventional methods rely on the mean and standard deviation (sigma) • High false discovery rate • Do not address the problem of reproducibility • A concept of reproducibility across bootstrapped datasets is proposed by Mukherjee et al. [1]
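As a rough illustration of the conventional mean-and-sigma ranking and of checking reproducibility over bootstrapped datasets, here is a minimal sketch. It uses Welch's t-statistic as a stand-in for "conventional methods" and counts how often each gene reaches the top k across resamples. The function names, the dictionary layout (gene name mapped to a list of expression values per class), and the choice of statistic are assumptions for illustration, not the procedure of [1].

```python
import math
import random
import statistics


def welch_t(a, b):
    """Welch's t-statistic for two groups of expression values."""
    va, vb = statistics.variance(a), statistics.variance(b)
    se = math.sqrt(va / len(a) + vb / len(b))
    # Guard against a zero standard error (e.g. a constant resample).
    return (statistics.mean(a) - statistics.mean(b)) / se if se > 0 else 0.0


def top_genes(data_a, data_b, k):
    """Names of the k genes with the largest absolute t-statistic."""
    scores = {g: abs(welch_t(data_a[g], data_b[g])) for g in data_a}
    return sorted(scores, key=scores.get, reverse=True)[:k]


def bootstrap_frequency(data_a, data_b, k, n_boot=200, seed=0):
    """Fraction of bootstrap resamples in which each gene lands in the top k."""
    rng = random.Random(seed)
    genes = list(data_a)
    n_a = len(data_a[genes[0]])
    n_b = len(data_b[genes[0]])
    counts = dict.fromkeys(genes, 0)
    for _ in range(n_boot):
        # Resample sample indices with replacement, separately per class.
        ia = [rng.randrange(n_a) for _ in range(n_a)]
        ib = [rng.randrange(n_b) for _ in range(n_b)]
        ra = {g: [data_a[g][i] for i in ia] for g in genes}
        rb = {g: [data_b[g][i] for i in ib] for g in genes}
        for g in top_genes(ra, rb, k):
            counts[g] += 1
    return {g: c / n_boot for g, c in counts.items()}


# Toy data (made up): g1 differs between classes, g2 does not.
class_a = {"g1": [1.0, 1.2, 0.9, 1.1], "g2": [5.0, 5.1, 4.9, 5.2]}
class_b = {"g1": [3.0, 3.1, 2.9, 3.2], "g2": [5.0, 4.8, 5.3, 5.1]}
frequency = bootstrap_frequency(class_a, class_b, k=1)
```

A gene whose selection frequency stays close to 1 is picked reproducibly; a gene that is selected only in a few resamples is a likely false discovery.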

Finding DEGs • The mean alone is not a good representative of the samples, so it is supplemented with the Hausdorff distance measure. J. Shaik and M. Yeasin, "Adaptive Ranking and Selection of Differentially Expressed Genes from Microarray Data," WSEAS Transactions on Biology and Biomedicine, vol. 3, pp. 125-133, 2006.
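To see why a distance between sample sets can add information beyond the mean, the following sketch computes the standard Hausdorff distance between the expression values of one gene in two classes. How the cited paper combines this distance with the mean is not stated on the slide, so the code shows only the distance itself; the function names and toy values are illustrative assumptions.

```python
def directed_hausdorff(a, b):
    """Largest distance from a point in a to its nearest point in b."""
    return max(min(abs(x - y) for y in b) for x in a)


def hausdorff(a, b):
    """Symmetric Hausdorff distance between two sets of 1-D values."""
    return max(directed_hausdorff(a, b), directed_hausdorff(b, a))


# Toy values (made up): one outlying sample in the second class.
normal = [1.0, 1.1, 0.9, 1.0]
tumor = [1.0, 1.1, 0.9, 4.0]
distance = hausdorff(normal, tumor)
```

Here the class means differ by 0.75, while the Hausdorff distance is driven by the single sample at 4.0, which the mean partly averages away.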

Schematic showing effect of skewness • Schematic showing effect of kurtosis. J. Shaik and M. Yeasin, "Ranking Function Based on Higher Order Statistics (RF-HOS) for Two Sample Microarray Experiments," accepted for ISBRA07, Atlanta, GA, 2007.

Ranking function based on higher order statistics
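The higher-order statistics in question are skewness (third standardized moment) and kurtosis (fourth standardized moment). The sketch below computes both for one gene's expression values using the population (biased) definitions. It does not reproduce the RF-HOS ranking function, whose form is not given on the slide; the function names are illustrative.

```python
def central_moment(values, order):
    """Population central moment of the given order."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** order for v in values) / len(values)


def skewness(values):
    """Third standardized moment; 0 for a symmetric sample."""
    m2 = central_moment(values, 2)
    return central_moment(values, 3) / m2 ** 1.5 if m2 > 0 else 0.0


def excess_kurtosis(values):
    """Fourth standardized moment minus 3 (0 for a normal distribution)."""
    m2 = central_moment(values, 2)
    return central_moment(values, 4) / m2 ** 2 - 3.0 if m2 > 0 else 0.0


symmetric = [1.0, 2.0, 3.0, 4.0, 5.0]
right_skewed = [1.0, 1.0, 1.0, 1.0, 6.0]
```

Two groups can share the same mean and standard deviation yet differ in skewness or kurtosis, which is the motivation for ranking on these moments.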

Clustering based approach • The clustering based approach measures the ability of coexpressed genes to maximally separate the different sample cases

Modules Developed • ASI Clustering Algorithm • Fuzzy ASI Clustering Algorithm • Progressive Framework • Two-way Clustering Framework

Fuzzy-ASI • A modification of the hard ASI algorithm • Memberships are not absolute: a gene can belong to several clusters to different degrees • The progressive framework offers an interesting choice for the fuzzification factor • Computationally more intensive than hard ASI
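The role of the fuzzification factor can be illustrated with the standard fuzzy c-means membership formula. This is an assumption made for illustration: the slide does not give the Fuzzy-ASI update rule, and the function name and the parameter m (the fuzzifier) are borrowed from fuzzy c-means rather than taken from the dissertation.

```python
def fuzzy_memberships(distances, m=2.0):
    """Fuzzy c-means memberships of one point, given its distance to each centroid.

    m > 1 is the fuzzification factor; values near 1 approach hard assignment.
    """
    if m <= 1.0:
        raise ValueError("fuzzification factor m must be greater than 1")
    # A point that coincides with one or more centroids belongs to them fully.
    zeros = [i for i, d in enumerate(distances) if d == 0]
    if zeros:
        return [1.0 / len(zeros) if i in zeros else 0.0
                for i in range(len(distances))]
    power = 2.0 / (m - 1.0)
    inverse = [d ** -power for d in distances]
    total = sum(inverse)
    return [v / total for v in inverse]


crisp = fuzzy_memberships([1.0, 3.0], m=2.0)
soft = fuzzy_memberships([1.0, 3.0], m=3.0)
```

With distances 1 and 3, m = 2 gives memberships of about 0.9 and 0.1, while m = 3 gives 0.75 and 0.25: a larger fuzzification factor spreads membership more evenly across clusters.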

Progressive framework

Visualization • Heat maps [2] • PCA-based methods (most commonly used) • 3D star coordinate projection (proposed in this dissertation)
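The general idea of a star coordinate projection is that each dimension gets its own axis vector from the origin, and a sample is placed at the sum of the axis vectors weighted by its normalized values. The sketch below applies that idea in 3D. The axis layout (a Fibonacci lattice on the unit sphere), the min-max normalization, and the function names are assumptions for illustration; the slides do not specify how the dissertation arranges the axes.

```python
import math


def sphere_axes(n_dims):
    """Spread n_dims unit vectors over a sphere (Fibonacci lattice, assumed layout)."""
    golden_angle = math.pi * (3.0 - math.sqrt(5.0))
    axes = []
    for i in range(n_dims):
        z = 1.0 - 2.0 * (i + 0.5) / n_dims
        radius = math.sqrt(1.0 - z * z)
        theta = golden_angle * i
        axes.append((radius * math.cos(theta), radius * math.sin(theta), z))
    return axes


def star_project(samples):
    """Map each high-dimensional sample to a 3-D point."""
    n_dims = len(samples[0])
    axes = sphere_axes(n_dims)
    lows = [min(s[j] for s in samples) for j in range(n_dims)]
    highs = [max(s[j] for s in samples) for j in range(n_dims)]
    points = []
    for sample in samples:
        point = [0.0, 0.0, 0.0]
        for j, value in enumerate(sample):
            span = highs[j] - lows[j]
            # Min-max normalize each dimension to [0, 1].
            weight = (value - lows[j]) / span if span > 0 else 0.0
            for c in range(3):
                point[c] += weight * axes[j][c]
        points.append(tuple(point))
    return points


# Toy samples (made up): three samples measured on four genes.
toy_samples = [[0.0, 0.0, 0.0, 0.0], [1.0, 1.0, 1.0, 1.0], [1.0, 0.0, 0.0, 0.0]]
projected = star_project(toy_samples)
```

A sample with the minimum value in every dimension lands at the origin, and a sample that is high in only one dimension lands on that dimension's axis, which is what makes the layout readable.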

Results

Visualization

References • S. Mukherjee, S. J. Roberts, and M. J. Laan, "Data-adaptive Test Statistics for Microarray Data," Bioinformatics, vol. 21, pp. 108-114, 2005. • U. Alon, N. Barkai, D. A. Notterman, K. Gish, S. Ybarra, D. Mack, and A. J. Levine, "Broad patterns of gene expression revealed by clustering analysis of tumor and normal colon tissues probed by oligonucleotide arrays," Proc. Natl. Acad. Sci. USA, vol. 96, pp. 6745-6750, 1999. • S. Raychaudhuri, J. M. Stuart, and R. B. Altman, "Principal components analysis to summarize MicroArray Experiments: Application to Sporulation Time Series," presented at Pacific Symposium on Biocomputing, Honolulu, Hawaii, 2000. • I. Guyon, "An Introduction to Variable and Feature Selection," Journal of Machine Learning Research, vol. 3, pp. 1157-1182, 2003. • J. M. Ray and W. G. Hearl, "Methods for Evaluating Differential Gene Expression in Tissues and Cells," Drug Development, pp. 50-55, 2005.

References • D. Dembele and P. Kastner, "Fuzzy C-Means for Clustering Microarray Data," Bioinformatics, vol. 19, pp. 973-980, 2003. • S. Y. Kim, T. M. Choi, and J. S. Bae, "Fuzzy Types Clustering for Microarray Data," International Journal of Computational Intelligence, vol. 2, pp. 12-15, 2005. • W. Yang, L. Rueda, and A. Ngom, "A Simulated Annealing Approach to Find the Optimal Parameters for Fuzzy Clustering Microarray Data," Chilean Computer Science Society (SCCC 05), pp. 45-55, 2005. • D. L. Davies and D. W. Bouldin, "A Cluster Separation Measure," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 1, pp. 224-227, 1979. • I. Lonnstedt and T. Speed, "Replicated Microarray Data," Statistica Sinica, vol. 12, pp. 31-46, 2002.

ASI F =

Adaptive Subspace Iteration (ASI) • Microarray data • Partition matrix (absolute memberships) • Subspace structure • Projection of data onto subspace • Projection of centroid onto subspace

3D star coordinate projection
