Tree-based Clustering for Gene Expression Data

Baoying Wang and William Perrizo
North Dakota State University
Fargo, ND 58102

[Figure: virtual attractors and local attractors]

Tree-based Hybrid Clustering
• We propose an efficient hybrid clustering method: Clustering using Attractor tree and Merging Process (CAMP).
• CAMP consists of two processes:
  • Clustering using Local Attractor Trees (CLAT)
  • Cluster Merging Process based on similarity
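The slides do not spell out how CLAT builds its local attractor trees. The following is a minimal, hypothetical sketch that assumes a simple density hill-climbing rule. The density of a point is the number of other points within a `radius`. Each point links to its densest neighbour, and a point with no denser neighbour becomes a local attractor, the root of a tree. The names `local_attractor_trees` and `clusters_from_trees` and the `radius` parameter are illustrative and do not come from the paper.

```python
import math
from typing import Dict, List, Sequence

Point = Sequence[float]

def local_attractor_trees(points: List[Point], radius: float) -> List[int]:
    """Return parent[i] for every point; a point that is its own parent is a local attractor.

    Hypothetical rule (not necessarily CLAT): density = number of other points within
    `radius`; each point links to the candidate with the highest (density, lower index) key.
    Every link strictly increases that key, so the links form trees with no cycles.
    """
    n = len(points)
    neighbours = [
        [j for j in range(n) if j != i and math.dist(points[i], points[j]) <= radius]
        for i in range(n)
    ]
    density = [len(nb) for nb in neighbours]

    def key(k: int):
        return (density[k], -k)  # ties broken in favour of the lower index

    # Each point climbs to the best of itself and its neighbours.
    return [max([i] + neighbours[i], key=key) for i in range(n)]

def clusters_from_trees(parent: List[int]) -> Dict[int, List[int]]:
    """Group point indices by the local attractor (tree root) they climb to."""
    groups: Dict[int, List[int]] = {}
    for i in range(len(parent)):
        root = i
        while parent[root] != root:
            root = parent[root]
        groups.setdefault(root, []).append(i)
    return groups

pts = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0), (5.1, 5.0), (5.0, 5.1)]
parent = local_attractor_trees(pts, radius=0.5)
print(parent)                       # [0, 0, 0, 3, 3, 3]
print(clusters_from_trees(parent))  # {0: [0, 1, 2], 3: [3, 4, 5]}
```

The parent-array representation keeps each tree cheap to store. It also supplies the tree shape, height and fan-out, that the similarity step relies on.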

Cluster Similarity
• We consider both the relative connectivity and the relative closeness between clusters.
• The cluster similarity is defined in terms of:
  • h_i, the average height of the ith attractor tree
  • f_i, the average fan-out of the ith attractor tree
  • d(A_i, A_j), the distance between two local attractors A_i and A_j
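The quantities h_i, f_i and d(A_i, A_j) can be computed from an attractor tree stored as a parent array. This sketch makes three assumptions. "Average height" is taken to mean the mean depth of the tree's nodes below the root. "Average fan-out" is taken to mean the mean number of children per non-leaf node. d is taken to be Euclidean distance. These interpretations and the helper names `tree_stats` and `attractor_distance` are hypothetical, and the sketch does not reproduce how CAMP combines the three quantities into one similarity score.

```python
import math
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

def tree_stats(parent: List[int], root: int) -> Tuple[float, float]:
    """Return (h, f) = (average height, average fan-out) of the tree rooted at `root`.

    Assumptions: height = mean depth over all nodes of the tree (root has depth 0);
    fan-out = mean number of children over nodes that have at least one child.
    """
    children: Dict[int, List[int]] = defaultdict(list)
    for node, p in enumerate(parent):
        if node != p:
            children[p].append(node)
    depths: List[int] = []
    fanouts: List[int] = []
    stack = [(root, 0)]  # iterative depth-first traversal from the root
    while stack:
        node, depth = stack.pop()
        depths.append(depth)
        kids = children.get(node, [])
        if kids:
            fanouts.append(len(kids))
        stack.extend((k, depth + 1) for k in kids)
    avg_height = sum(depths) / len(depths)
    avg_fanout = sum(fanouts) / len(fanouts) if fanouts else 0.0
    return avg_height, avg_fanout

def attractor_distance(points: List[Sequence[float]], a_i: int, a_j: int) -> float:
    """d(A_i, A_j), assumed here to be the Euclidean distance between two attractors."""
    return math.dist(points[a_i], points[a_j])

parent = [0, 0, 1, 1, 4]      # two trees: rooted at 0 (0 -> 1 -> {2, 3}) and at 4
print(tree_stats(parent, 0))  # (1.25, 1.5)
print(tree_stats(parent, 4))  # (0.0, 0.0)
```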

Merging Process
[Figure: attractors A_i, A_j and A_v]
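A similarity-driven merging step can be sketched as a greedy agglomerative pass that repeatedly merges the most similar pair of clusters until no pair reaches a threshold. The similarity function is passed in as a parameter. `placeholder_similarity` scores clusters by 1 / (1 + distance between centroids). It is a hypothetical stand-in, not CAMP's similarity measure. The `threshold` stopping rule and all function names are likewise assumptions.

```python
import math
from itertools import combinations
from typing import Callable, List, Sequence

Point = Sequence[float]
Cluster = List[int]  # indices into `points`
Similarity = Callable[[List[Point], Cluster, Cluster], float]

def centroid(points: List[Point], members: Cluster) -> List[float]:
    dim = len(points[members[0]])
    return [sum(points[m][k] for m in members) / len(members) for k in range(dim)]

def placeholder_similarity(points: List[Point], a: Cluster, b: Cluster) -> float:
    # HYPOTHETICAL stand-in, NOT CAMP's measure: 1 / (1 + distance between centroids).
    return 1.0 / (1.0 + math.dist(centroid(points, a), centroid(points, b)))

def merge_clusters(points: List[Point], clusters: List[Cluster],
                   similarity: Similarity, threshold: float) -> List[Cluster]:
    """Greedily merge the most similar pair until no pair reaches `threshold`."""
    clusters = [list(c) for c in clusters]
    while len(clusters) > 1:
        (i, j), best = max(
            (((i, j), similarity(points, clusters[i], clusters[j]))
             for i, j in combinations(range(len(clusters)), 2)),
            key=lambda pair_score: pair_score[1],
        )
        if best < threshold:
            break  # no remaining pair is similar enough
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # combinations yields i < j, so index i is unaffected
    return clusters

pts = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
print(merge_clusters(pts, [[0, 1], [2, 3], [4, 5]], placeholder_similarity, threshold=0.3))
# [[0, 1, 2, 3], [4, 5]]
```

The loop is written for clarity, not speed. It rescores every pair after each merge, which is quadratic per step. A real implementation would cache pairwise scores.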

Performance Study
• We used three datasets: DS1, DS2 and DS3.
• DS1 contains expression levels of 8,613 human genes measured at 12 time points.
• DS2 is a 6,221 × 80 gene expression matrix.
• DS3 is the largest dataset, with 13,413 genes under 36 experimental conditions.
• Our approach outperforms k-means, BIRCH and CAST when the dataset is large.
[Figure: Run time]

Conclusion
• CAMP combines the characteristics of both the partitioning and hierarchical approaches, and the features of both the density-based and distance-based approaches.
• Hybrid clustering is much faster than hierarchical clustering and more flexible than partitioning clustering.
• Combining the density-based and distance-based approaches accommodates various kinds of clusters as well as noisy data.
