
Gibbs biclustering of microarray data

Biclustering groups genes that behave similarly under a subset of conditions, which suits heterogeneous microarray compendia better than clustering over all conditions.


Presentation Transcript


  1. Gibbs biclustering of microarray data Yves Moreau

  2. From genome projects to transcriptome projects
     • Microarray cost per expression measurement
     • Budgets and expertise
     • Publicly available microarray data
     • Need for exchange standards & repositories
     • Big consortia set up big microarray projects
     • Genome projects → “transcriptome” projects (= compendia)
     • Change in microarray projects (sequence analysis)
     • Analyze public data first to generate a hypothesis
     • Design and perform your own microarray experiment
     CBS Microarray Course

  3. Why biclustering?
     • Data becomes more heterogeneous
     • Gene clustering
       • Group genes that behave similarly over all conditions
     • Gene biclustering
       • Group genes that behave similarly over a subset of conditions
       • “Feature selection”
       • More suitable for a heterogeneous compendium

  4. Discretizing microarray data
     • Microarray data is continuous
     • Discretize by equal frequency: High, Medium, Low
     [Figure: distribution of expression values for a given gene]
     [Figure: discretized microarray data set with a bicluster; axes: genes, conditions]
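
The equal-frequency discretization on this slide can be sketched as follows. This is a minimal illustration, assuming a genes × conditions NumPy array and three levels (0 = Low, 1 = Medium, 2 = High); the function name `discretize_equal_frequency` is hypothetical and does not come from the presentation.

```python
import numpy as np

def discretize_equal_frequency(expr, n_levels=3):
    """Discretize each gene (row) into n_levels bins of roughly equal size.

    expr: float array of shape (genes, conditions).
    Returns an integer array of the same shape with values 0..n_levels-1.
    """
    # Interior quantiles, e.g. 1/3 and 2/3 for three levels.
    qs = np.linspace(0.0, 1.0, n_levels + 1)[1:-1]
    # One set of cut points per gene: shape (n_levels - 1, genes).
    edges = np.quantile(expr, qs, axis=1)
    # The level of an entry is the number of cut points it exceeds.
    return (expr[None, :, :] > edges[:, :, None]).sum(axis=0)

# Hypothetical example: one gene measured under nine conditions.
gene = np.array([[1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0]])
levels = discretize_equal_frequency(gene)
```

Cut points are computed per gene, so "High" means high relative to that gene's own distribution of expression values.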

  5. Bicluster

  6. Likelihood [Figure: pattern vs. background, scale 0 to 1]

  7. Likelihood (scale 0 to 1)
       .9   .9   .9   .9   .9
       .9   .05  .9   .9   .9
       .9   .9   .9   .9   .9
       .05  .9   .9   .9   .9
       .9   .9   .9   .9   .05

  8. Likelihood (scale 0 to 1): get the right genes
       .9   .05  .05  .05  .9
       .05  .9   .9   .05  .05
       .05  .05  .05  .05  .05
       .05  .05  .9   .9   .05

  9. Likelihood (scale 0 to 1): get the right conditions
       .9   .9   .05  .05  .9
       .9   .05  .05  .9   .9
       .9   .9   .05  .05  .9
       .05  .9   .05  .05  .9
       .9   .9   .05  .05  .05

  10. Likelihood (scale 0 to 1): get the right frequency pattern
       .6   .6   .2   .2   .6
       .6   .2   .2   .2   .6
       .6   .6   .2   .2   .6
       .2   .6   .2   .2   .6
       .2   .6   .2   .2   .2
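
One way to read slides 7 to 10 is that each number is the probability of one matrix entry under the chosen genes, conditions, and pattern, and that the likelihood is the product of those probabilities. That reading is an assumption, not something the transcript states. Under it, the four configurations can be compared by their mean log-probability per entry (slide 8 has 20 entries rather than 25, so a per-entry mean is used):

```python
import numpy as np

def mean_log_likelihood(probs):
    """Mean log-probability per entry; the sum of logs is the log of the product."""
    return float(np.mean(np.log(probs)))

# Values transcribed from slides 7-10, in the order they appear.
configs = {
    "slide 7": np.array([[.9, .9, .9, .9, .9], [.9, .05, .9, .9, .9],
                         [.9, .9, .9, .9, .9], [.05, .9, .9, .9, .9],
                         [.9, .9, .9, .9, .05]]),
    "slide 8": np.array([[.9, .05, .05, .05, .9], [.05, .9, .9, .05, .05],
                         [.05, .05, .05, .05, .05], [.05, .05, .9, .9, .05]]),
    "slide 9": np.array([[.9, .9, .05, .05, .9], [.9, .05, .05, .9, .9],
                         [.9, .9, .05, .05, .9], [.05, .9, .05, .05, .9],
                         [.9, .9, .05, .05, .05]]),
    "slide 10": np.array([[.6, .6, .2, .2, .6], [.6, .2, .2, .2, .6],
                          [.6, .6, .2, .2, .6], [.2, .6, .2, .2, .6],
                          [.2, .6, .2, .2, .2]]),
}
scores = {name: mean_log_likelihood(m) for name, m in configs.items()}
best = max(scores, key=scores.get)
```

With these numbers the slide 7 configuration scores highest, which fits the later slides being labelled as cases where the genes, the conditions, or the frequency pattern still need to be corrected.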

  11. Optimizing the bicluster
      • Find the right bicluster
        • Genes
        • Conditions
        • Pattern
      • For a given choice of genes and conditions, the “best” pattern is given by the frequencies found in the extracted pattern
        • No more need to optimize over the pattern
      • Maximum likelihood: find genes and conditions that maximize the likelihood
      • Gibbs sampling: find genes and conditions that optimize [expression missing from transcript]
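
The statement that the best pattern is given by the observed frequencies can be sketched as follows. The code assumes discretized data with integer levels 0..n_levels-1 and one frequency vector per condition; the names `best_pattern` and `block_log_likelihood` are hypothetical.

```python
import numpy as np

def best_pattern(block, n_levels=3):
    """Per-condition level frequencies in the extracted block.

    block: integer array (selected genes x selected conditions).
    Returns an array of shape (conditions, n_levels).
    """
    counts = np.stack([(block == k).sum(axis=0) for k in range(n_levels)], axis=1)
    return counts / block.shape[0]

def block_log_likelihood(block, pattern):
    """Log-likelihood of the block when entry (g, j) has probability pattern[j, level]."""
    cols = np.arange(block.shape[1])
    return float(np.log(pattern[cols, block]).sum())

# Hypothetical extracted block: 4 genes, 2 conditions.
block = np.array([[2, 0], [2, 1], [2, 0], [2, 0]])
pattern = best_pattern(block)
ll_best = block_log_likelihood(block, pattern)
ll_uniform = block_log_likelihood(block, np.full((2, 3), 1 / 3))
```

Because the observed frequencies are the maximum-likelihood estimate for each condition, `ll_best` is never lower than the value obtained with any other pattern, here a uniform one. The search therefore only has to range over genes and conditions.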

  12. Gibbs sampling [Figure: current configuration → next gene configuration]

  13. [Figure: updated gene configuration → next complete configuration] → iterate many times
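
The alternation on slides 12 and 13 (update the genes, then the conditions, then iterate) can be sketched as a small Gibbs sampler. The transcript does not give the model, so everything specific here is an assumption: binary membership labels with a uniform prior, a per-condition pattern integrated out under a symmetric Dirichlet prior with parameter `alpha`, and a background equal to the overall level frequencies. The function name `gibbs_bicluster` is hypothetical.

```python
from math import lgamma
import numpy as np

def gibbs_bicluster(x, n_levels=3, n_sweeps=200, burn_in=100, alpha=1.0, seed=0):
    """Sample gene/condition memberships of one bicluster in discretized data.

    x: integer array (genes x conditions) with levels 0..n_levels-1.
    Returns, for each gene and each condition, the fraction of post-burn-in
    sweeps in which it belonged to the bicluster.
    """
    rng = np.random.default_rng(seed)
    n_genes, n_conds = x.shape
    # Background model (assumption): smoothed overall level frequencies.
    bg = (np.bincount(x.ravel(), minlength=n_levels) + alpha) / (x.size + n_levels * alpha)
    log_bg = np.log(bg)
    genes = rng.random(n_genes) < 0.5   # random initial configuration
    conds = rng.random(n_conds) < 0.5
    gene_avg, cond_avg = np.zeros(n_genes), np.zeros(n_conds)

    def sample(log_odds):
        # Bernoulli draw with P(in) = 1 / (1 + exp(-log_odds)).
        return rng.random() < 1.0 / (1.0 + np.exp(-np.clip(log_odds, -50, 50)))

    for sweep in range(n_sweeps):
        # Gene step: resample each gene given the other genes and the conditions.
        for g in range(n_genes):
            genes[g] = False
            others = x[genes][:, conds]
            mine = x[g, conds]
            match = (others == mine).sum(axis=0)
            # Predictive probability of this gene's levels under the pattern.
            pred = (match + alpha) / (others.shape[0] + n_levels * alpha)
            genes[g] = sample(np.sum(np.log(pred) - log_bg[mine]))
        # Condition step: resample each condition given the current genes.
        for j in range(n_conds):
            counts = np.bincount(x[genes, j], minlength=n_levels)
            # Dirichlet-multinomial marginal likelihood of the column.
            log_pattern = (lgamma(n_levels * alpha)
                           - lgamma(n_levels * alpha + counts.sum())
                           + sum(lgamma(alpha + c) - lgamma(alpha) for c in counts))
            conds[j] = sample(log_pattern - np.dot(counts, log_bg))
        if sweep >= burn_in:
            gene_avg += genes
            cond_avg += conds
    n_kept = n_sweeps - burn_in
    return gene_avg / n_kept, cond_avg / n_kept
```

The returned averages can be thresholded (for example at 0.5) to obtain a single bicluster. Integrating out the pattern means it never has to be sampled explicitly, which is one way to act on the earlier remark that there is no need to optimize over the pattern.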

  14. Gibbs biclustering

  15. Simulated data

  16. Remarks
      • Gibbs biclustering allows noisy patterns
      • Optimized configuration is obtained by averaging successive iterated configurations
      • Biclustering is oriented
        • Find a subset of samples for which a subset of genes is consistently expressed across those samples
        • Find a subset of genes that are consistently expressed across a subset of samples
      • Searching for multiple patterns
        • For gene biclustering, remove the data of the genes from the current bicluster
        • Search for a new pattern
        • Stop if only the empty pattern is repeatedly found
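
The procedure for finding multiple patterns can be sketched as a loop around any single-bicluster search. In this sketch `find_bicluster` is a hypothetical callable supplied by the caller, and the limits `max_empty` and `max_patterns` are assumptions; the slide only says to stop when the empty pattern is found repeatedly.

```python
import numpy as np

def find_multiple_biclusters(x, find_bicluster, max_empty=3, max_patterns=10):
    """Search for gene biclusters one at a time, removing found genes each time.

    find_bicluster(sub) is a hypothetical callable returning two boolean masks
    (genes, conditions) for the submatrix it receives.
    Returns a list of (gene indices, condition indices) in the original matrix.
    """
    remaining = np.arange(x.shape[0])   # genes not yet assigned to a bicluster
    found, empty_runs = [], 0
    while remaining.size and empty_runs < max_empty and len(found) < max_patterns:
        gene_mask, cond_mask = find_bicluster(x[remaining])
        if not gene_mask.any() or not cond_mask.any():
            empty_runs += 1             # empty pattern: retry, up to max_empty times
            continue
        empty_runs = 0
        found.append((remaining[gene_mask], np.flatnonzero(cond_mask)))
        remaining = remaining[~gene_mask]   # remove these genes' data
    return found
```

Retrying after an empty result only makes sense for a stochastic search such as Gibbs sampling, where a new run may find a pattern that the previous run missed.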

  17. Multiple biclusters

  18. Leukemia fingerprints

  19. Mixed-Lineage Leukemia
      • Armstrong et al., Nature Genetics, 2002
      • Mixed-Lineage Leukemia (MLL) is a subtype of ALL
        • Caused by chromosomal rearrangement in MLL gene
        • Poorer prognosis than ALL
      • Microarray analysis shows that MLL is distinct from ALL
      • FLT3 tyrosine kinase distinguishes most strongly between MLL, ALL, and AML
        • Candidate drug target

  20. PCA Features

  21. Biclustering leukemia data
      • Bicluster patients
        • Find patients for which a subset of genes has a consistent expression profile across this group of patients
      • Discovery set
        • 21 ALL, 17 MLL, 25 AML
      • Validation set
        • 3 ALL, 3 MLL, 3 AML

  22. Discovering ALL
      • Bicluster 1: 18 out of 21 ALL patients

  23. Discovering MLL
      • Bicluster 2: 14 out of 17 MLL patients

  24. Discovering AML
      • Bicluster 3: 19 out of 25 AML patients

  25. Rescoring ALL

  26. Rescoring MLL

  27. Rescoring AML
