Tree-based Clustering for Gene Expression Data

Baoying Wang and William Perrizo, North Dakota State University, Fargo, ND 58102. Topics: virtual attractors, local attractors, tree-based hybrid clustering.

Presentation Transcript


  1. Tree-based Clustering for Gene Expression Data. Baoying Wang and William Perrizo, North Dakota State University, Fargo, ND 58102

  2. Tree-based Hybrid Clustering [figure: virtual attractors and local attractors]
     • We propose an efficient hybrid clustering method: Clustering using Attractor tree and Merging Process (CAMP).
     • CAMP consists of two processes:
       • Clustering using Local Attractor Trees (CLAT)
       • Cluster Merging Process based on similarity

  3. Cluster Similarity
     • We consider both relative connectivity and relative closeness between clusters.
     • The cluster similarity is defined in terms of:
       h_i – the average height of the ith attractor tree
       f_i – the average fan-out of the ith attractor tree
       d(A_i, A_j) – the distance between the two local attractors A_i and A_j
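The three quantities named on this slide can be computed from a simple tree representation. The following is a minimal sketch, not the authors' implementation. It assumes that an attractor tree is stored as a child-to-parent map rooted at the local attractor, that "average height" means the mean depth of the tree's nodes, that "average fan-out" means the mean number of children per internal node, and that the distance between attractors is Euclidean. The names `tree_stats` and `attractor_distance` are hypothetical. How these quantities combine into a single similarity score is not preserved in the transcript, so the sketch stops short of that.

```python
import math


def tree_stats(parent):
    """Return (average height, average fan-out) for one attractor tree.

    `parent` maps each node to its parent; the root (the local attractor)
    maps to None. Both definitions below are assumptions:
      - average height  = mean depth of all nodes (root has depth 0)
      - average fan-out = mean number of children per internal node
    """
    children = {}
    for node, p in parent.items():
        if p is not None:
            children.setdefault(p, []).append(node)

    def depth(node):
        d = 0
        while parent[node] is not None:
            node = parent[node]
            d += 1
        return d

    avg_height = sum(depth(n) for n in parent) / len(parent)
    if children:
        avg_fanout = sum(len(c) for c in children.values()) / len(children)
    else:
        avg_fanout = 0.0  # a tree consisting of the attractor alone
    return avg_height, avg_fanout


def attractor_distance(a, b):
    """Euclidean distance between two attractors (assumed metric)."""
    return math.dist(a, b)


# Toy tree: attractor "A" with children g1 and g2; g3 hangs off g1.
toy_tree = {"A": None, "g1": "A", "g2": "A", "g3": "g1"}
h, f = tree_stats(toy_tree)
d = attractor_distance((0.0, 0.0), (3.0, 4.0))
```

Storing the tree as a parent map keeps the depth computation a simple walk to the root; for large trees, depths would be better cached than recomputed per node.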

  4. Merging Process [figure: attractors A_i, A_j, and A_v]
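One way to picture a similarity-driven merging process is a greedy loop that repeatedly merges the most similar pair of clusters. The sketch below is written under stated assumptions and is not CAMP's actual procedure: the stopping rule (a fixed `threshold`), the function name `merge_clusters`, and the placeholder `toy_similarity` are all hypothetical. The slides' own similarity measure would be passed in place of the placeholder, and the sketch does not model how an attractor for a merged cluster is formed.

```python
from itertools import combinations


def merge_clusters(clusters, similarity, threshold):
    """Greedily merge the most similar pair until no pair reaches `threshold`.

    clusters   : iterable of sets of items
    similarity : callable(set, set) -> float, supplied by the caller
    threshold  : assumed stopping rule; not taken from the slides
    """
    clusters = [set(c) for c in clusters]
    while len(clusters) > 1:
        # Score every pair and pick the best one.
        (i, j), best = max(
            (((i, j), similarity(clusters[i], clusters[j]))
             for i, j in combinations(range(len(clusters)), 2)),
            key=lambda item: item[1],
        )
        if best < threshold:
            break
        clusters[i] |= clusters[j]  # merge j into i (i < j always holds)
        del clusters[j]
    return clusters


def toy_similarity(a, b):
    """Placeholder: closeness of 1-D cluster means (NOT the CAMP measure)."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    return 1.0 / (1.0 + abs(mean_a - mean_b))


merged = merge_clusters([{1, 2}, {3}, {10, 11}], toy_similarity, threshold=0.3)
```

In the toy run, the first two clusters are close enough to merge, while the third stays separate because its similarity to the merged cluster falls below the threshold.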

  5. Performance Study
     • We used three datasets: DS1, DS2, and DS3.
     • DS1 contains expression levels of 8,613 human genes measured at 12 time points.
     • DS2 is a gene expression matrix of 6,221 × 80.
     • DS3 is the largest dataset, with 13,413 genes under 36 experimental conditions.
     • Our approach outperforms k-means, BIRCH, and CAST when the dataset is large.
     [figure: run time]

  6. Conclusion
     • CAMP combines the characteristics of the partitioning and hierarchical approaches with the features of the density-based and distance-based approaches.
     • Hybrid clustering is much faster than hierarchical clustering and more flexible than partitioning clustering.
     • Combining the density-based and distance-based approaches accommodates various kinds of clusters and noisy data.
