Generating Robust and Consensus Clusters from Gene Expression Data

Allan Tucker (a), Stephen Swift (a), Xiaohui Liu (a), Nigel Martin (b), Christine Orengo (c), Paul Kellam (c)

Introduction • Many different clustering algorithms used for gene expression analysis • Little work on inter-method consistency or cross-comparison • Important due to differing results (each algorithm implicitly forces a structure on data) • Obtaining a consensus across methods should improve confidence

The Talk • Compare a number of existing methods for clustering gene expression data • Algorithms for generating robust clusters and consensus clusters • Tested on a set of Amersham Scorecard data with known structure and experimentally obtained virus B-Cell data • Provides specific advantages in the analysis of array based gene expression data

Clustering Methods • Hierarchical Clustering (R) • PAM (R) • CAST (C++) • Simulated Annealing (C++)

Datasets • Amersham Scorecard (ASC): 597 genes, 24 blocks with 32 columns and 12 rows, under 30 experimental conditions; contains repeated experiments, which we assume should cluster together • B Cell Data: 1987 genes

Comparison of Methods

The Agreement Matrix
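The slide gives only the title, so the following is a minimal sketch under an assumed definition: entry (i, j) of the agreement matrix counts how many clustering methods placed genes i and j in the same cluster. The function name `agreement_matrix` and the toy label vectors are hypothetical and are not taken from the talk.

```python
def agreement_matrix(labelings):
    """Count, for every pair of genes, how many methods co-cluster them.

    labelings: one list of cluster labels per clustering method; all lists
    must have the same length (one label per gene).
    Assumption: agreement means "same label within a single method".
    """
    n_genes = len(labelings[0])
    matrix = [[0] * n_genes for _ in range(n_genes)]
    for labels in labelings:
        if len(labels) != n_genes:
            raise ValueError("every method must label every gene")
        for i in range(n_genes):
            for j in range(n_genes):
                if labels[i] == labels[j]:
                    matrix[i][j] += 1
    return matrix


# Toy example: three hypothetical methods clustering five genes.
labelings = [
    [0, 0, 1, 1, 2],
    [0, 0, 0, 1, 1],
    [1, 1, 2, 2, 0],
]
A = agreement_matrix(labelings)
for row in A:
    print(row)
```

Label values only need to be consistent within one method, so the third method above can use different numbers for what is the same grouping as the first.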

Robust Clustering • Takes agreement matrix as input • Place all genes into robust clusters that have full agreement • Deterministic algorithm • Should give higher degree of confidence in clusters • Not all genes will be assigned
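The bullets above can be illustrated with a small sketch. It assumes that "full agreement" means the agreement count for a pair equals the number of methods, and that a gene with no full-agreement partner is left unassigned. The function name `robust_clusters` and the hard-coded matrix are hypothetical; this is one deterministic way to do the grouping, not necessarily the authors' procedure.

```python
def robust_clusters(agreement, n_methods):
    """Group genes that every method placed together.

    agreement: square matrix of co-clustering counts (assumed to come from
    real clusterings, so full agreement is transitive).
    Returns (clusters, unassigned), both as lists of gene indices.
    """
    n_genes = len(agreement)
    assigned = [False] * n_genes
    clusters = []
    for i in range(n_genes):
        if assigned[i]:
            continue
        # Collect every later gene that all methods co-cluster with gene i.
        members = [i] + [
            j for j in range(i + 1, n_genes)
            if not assigned[j] and agreement[i][j] == n_methods
        ]
        if len(members) >= 2:  # a lone gene does not form a robust cluster
            for m in members:
                assigned[m] = True
            clusters.append(members)
    unassigned = [g for g in range(n_genes) if not assigned[g]]
    return clusters, unassigned


# Hypothetical agreement matrix for five genes and three methods.
A = [
    [3, 3, 1, 0, 0],
    [3, 3, 1, 0, 0],
    [1, 1, 3, 2, 0],
    [0, 0, 2, 3, 1],
    [0, 0, 0, 1, 3],
]
clusters, unassigned = robust_clusters(A, n_methods=3)
print(clusters, unassigned)
```

In this toy case only genes 0 and 1 reach full agreement, so the other three genes stay unassigned, which mirrors the "not all genes will be assigned" point.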

Robust Clustering

Dataset                      ASC     B-cell
No. of robust clusters       24      154
% of variables assigned      79%     25%
Max. robust cluster size     44      14
Min. robust cluster size     2       2
Mean robust cluster size     10.2    3.2

Consensus Clustering • “Full agreement” requirement for robust clusters can be too restrictive • Algorithm for generating consensus clusters given minimum agreement parameter • Approximate stochastic algorithm
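The slide does not spell out the algorithm, so the sketch below makes two loudly labelled assumptions: the objective rewards a cluster by the sum of (agreement minus a minimum-agreement parameter `beta`) over its gene pairs, and the stochastic search is a basic simulated annealing over single-gene moves. All names, parameter defaults, and the toy matrix are hypothetical.

```python
import math
import random


def score(labels, agreement, beta):
    """Assumed objective: sum of (agreement - beta) over co-clustered pairs."""
    total = 0.0
    n = len(labels)
    for i in range(n):
        for j in range(i + 1, n):
            if labels[i] == labels[j]:
                total += agreement[i][j] - beta
    return total


def consensus_clusters(agreement, beta, steps=2000, temp=1.0,
                       cooling=0.995, seed=0):
    """Approximate search for a high-scoring partition (not guaranteed optimal)."""
    rng = random.Random(seed)
    n = len(agreement)
    labels = list(range(n))  # start with every gene in its own cluster
    current = score(labels, agreement, beta)
    best_labels, best = labels[:], current
    for _ in range(steps):
        gene = rng.randrange(n)
        new_label = rng.randrange(n)
        if new_label == labels[gene]:
            continue
        old_label = labels[gene]
        labels[gene] = new_label  # tentatively move one gene
        candidate = score(labels, agreement, beta)
        delta = candidate - current
        # Always accept improvements; accept worse moves with a
        # probability that shrinks as the temperature cools.
        if delta >= 0 or rng.random() < math.exp(delta / temp):
            current = candidate
            if current > best:
                best, best_labels = current, labels[:]
        else:
            labels[gene] = old_label  # undo the rejected move
        temp = max(temp * cooling, 1e-6)
    return best_labels, best


# Hypothetical agreement matrix for five genes and three methods.
A = [
    [3, 3, 1, 0, 0],
    [3, 3, 1, 0, 0],
    [1, 1, 3, 2, 0],
    [0, 0, 2, 3, 1],
    [0, 0, 0, 1, 3],
]
labels, best = consensus_clusters(A, beta=1.5)
print(labels, best)
```

With `beta` set to 1.5, a pair agreed on by two of the three methods contributes positively, so pairs that fall short of full agreement can still end up together; raising `beta` toward the number of methods makes the result approach the robust clusters.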

Consensus Clustering (diagram labels: Input Cluster Results, Agreement Matrix, Consensus Clusters)

Consensus Clustering (figure panels: B-Cell Dataset, ASC Dataset)

Consensus Clustering

Summary • Clustering biological data is very useful • Biases in clustering algorithms mean that success in identifying patterns can vary • Consensus algorithms are already used in protein secondary structure prediction • We apply a similar strategy with robust and consensus clustering

Conclusions • Robust clusters are good for identifying common transcriptional modules • Also for identifying genes with a common functional pathway • Useful for creating clusters of genes with high confidence • Can be restrictive, since genes that do not have full agreement are discarded

Conclusions • Consensus clustering relaxes full agreement requirement • Resembles defined clusters in synthetic data very well • Reliably picks out features in the virus gene expression data • Fulfils desire not to rely on one clustering algorithm during gene expression analysis

Acknowledgements • The Biotechnology and Biological Sciences Research Council (BBSRC), UK • The Engineering and Physical Sciences Research Council (EPSRC), UK
