Tree-based Clustering for Gene Expression Data
Baoying Wang and William Perrizo
North Dakota State University, Fargo, ND 58102
Tree-based Hybrid Clustering
• We propose an efficient hybrid clustering method: Clustering using Attractor tree and Merging Process (CAMP).
• CAMP consists of two processes:
  • Clustering using Local Attractor Trees (CLAT)
  • Cluster Merging Process based on similarity
[Figure labels: virtual attractors, local attractors]
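The slides name the CLAT step but do not spell out how the attractor trees are built. The following is only a minimal sketch of one plausible reading, not the authors' algorithm. It assumes that a point's density is the number of points within a fixed radius and that each point links to the densest point in its neighbourhood, so that tree roots act as local attractors. The names `build_attractor_trees`, `clusters_from_trees`, and `radius` are hypothetical.

```python
import math

def build_attractor_trees(points, radius):
    """Return parent[i] for every point; a root (parent[i] == i) is a local attractor.

    Assumption (not from the slides): density = number of points within
    `radius`, and each point links to the densest point in its neighbourhood.
    """
    n = len(points)
    neighbours = [
        [j for j in range(n) if math.dist(points[i], points[j]) <= radius]
        for i in range(n)
    ]
    density = [len(nb) for nb in neighbours]
    parent = []
    for i in range(n):
        # Ties are broken by lower index, so the (density, -index) key strictly
        # increases along every link and the links can never form a cycle.
        parent.append(max(neighbours[i], key=lambda j: (density[j], -j)))
    return parent

def clusters_from_trees(parent):
    """Group point indices by the root (local attractor) of their tree."""
    clusters = {}
    for i in range(len(parent)):
        root = i
        while parent[root] != root:
            root = parent[root]
        clusters.setdefault(root, []).append(i)
    return clusters

# Two well-separated clumps, each with a dense centre point.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5),
          (10, 10), (10, 11), (11, 10), (11, 11), (10.5, 10.5)]
parent = build_attractor_trees(points, radius=1.0)
clusters = clusters_from_trees(parent)
print(clusters)
```

In this toy input each clump collapses onto its centre point, which becomes the local attractor. A merging step would then decide which of these trees to combine.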
Cluster Similarity
• We consider both relative connectivity and relative closeness between clusters.
• The cluster similarity is defined using the following quantities:
  • hi: the average height of the ith attractor tree
  • fi: the average fan-out of the ith attractor tree
  • d(Ai, Aj): the distance between two local attractors Ai and Aj
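The three quantities listed above can be computed from a parent-pointer representation of the attractor trees. This is a hedged sketch that deliberately stops short of a similarity formula, since the slides do not reproduce one. It assumes that "average height" means the mean depth of the nodes in a tree, that "average fan-out" means the mean number of children over nodes that have at least one child, and that the attractor distance is Euclidean. The helper names `tree_stats` and `attractor_distance` are hypothetical.

```python
import math
from collections import defaultdict

def tree_stats(parent, root):
    """Return (average node depth, average fan-out) of the tree rooted at `root`.

    `parent[i]` is the parent of node i; a root satisfies parent[i] == i.
    Both averages are assumed interpretations of h_i and f_i.
    """
    children = defaultdict(list)
    for node, p in enumerate(parent):
        if p != node:
            children[p].append(node)

    depths, fanouts = [], []
    stack = [(root, 0)]
    while stack:
        node, depth = stack.pop()
        depths.append(depth)
        kids = children[node]
        if kids:  # only internal nodes contribute to the fan-out average
            fanouts.append(len(kids))
        stack.extend((child, depth + 1) for child in kids)

    avg_height = sum(depths) / len(depths)
    avg_fanout = sum(fanouts) / len(fanouts) if fanouts else 0.0
    return avg_height, avg_fanout

def attractor_distance(a_i, a_j):
    """Euclidean distance between two attractor coordinates (assumed metric)."""
    return math.dist(a_i, a_j)

# Two small trees: root 0 with nodes {0, 1, 2, 3}, root 4 with nodes {4, 5}.
parent = [0, 0, 0, 1, 4, 4]
h0, f0 = tree_stats(parent, 0)
h1, f1 = tree_stats(parent, 4)
d01 = attractor_distance((0.0, 0.0), (3.0, 4.0))
print(h0, f0, h1, f1, d01)
```

Any similarity score built on these values would combine the tree shapes (connectivity) with the attractor distance (closeness). How they are weighted is left open here.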
Merging Process
[Figure labels: Ai, Aj, Av]
Performance Study
• We used three datasets: DS1, DS2, and DS3.
• DS1 contains expression levels of 8,613 human genes measured at 12 time points.
• DS2 is a gene expression matrix of 6,221 × 80.
• DS3 is the largest dataset, with 13,413 genes under 36 experimental conditions.
• Our approach outperforms k-means, BIRCH, and CAST when the dataset is large.
[Figure: run time]
Conclusion
• CAMP combines the characteristics of the partitioning and hierarchical approaches with the features of the density-based and distance-based approaches.
• Hybrid clustering is much faster than hierarchical clustering and more flexible than partitioning clustering.
• Combining the density-based and distance-based approaches accounts for various kinds of clusters and for noisy data.