Microarrays and Cancer Segal et al.



  1. Microarrays and Cancer: Segal et al. • CS 466 • Saurabh Sinha

  2. Genomics and pathology • Genomics provides high-throughput measurements of molecular mechanisms • Microarrays, ChIP-on-chip, etc. • Genomics may provide the molecular underpinnings of pathology, in a highly comprehensive manner • Revolutionize the diagnosis and management of diseases, including cancer

  3. Prior applications to cancer • Gene expression measurements have been applied to cancer diagnosis • Measure each gene’s expression in several normal tissue samples, and several pathological (diseased) samples • Find subset of genes differentially expressed in the two sample groups • If such “gene signatures” of particular cancer types are found, they can become the basis of tests for malignancy

  4. We want better … • Genes may be differentially expressed, but not enough to cross certain thresholds used in the analysis • Analyzing the data on a gene-by-gene basis is error prone -- microarray data has inherent noise • Finding the genes involved in one type of cancer is only the first step; it does not reveal the underlying processes

  5. Part 1: Cancer modules

  6. A “module” level view • Many methods use “gene modules” (sets of genes) as the basic blocks for analysis • Instead of trying to find changes in individual gene expression profiles, look for entire sets of genes with changing expression profiles

  7. The study of Mootha et al. • Showed that expression of “oxidative phosphorylation” genes (a particular set of genes) is reduced in diabetic muscle • Signal not very strong when looking at individual genes, but highly significant when looking at the “gene module”

  8. [Figure labels: Disease tissue (diabetes mellitus type 2); Normal tissue (normal tolerance to glucose); Grey: all genes; Red: oxidative phosphorylation genes] Source: Nature Genetics 37, S38 - S45 (2005)

  9. Segal et al.: Methodology • Compile a large collection of cancer-related microarrays (microarrays measuring gene expression in cancer tissues or normal tissue) • Compile a large collection of gene sets (modules) from earlier studies • Identify gene sets (modules) induced or repressed in a microarray • Identify modules induced in several arrays, or repressed in several arrays • Check if these arrays are enriched in some clinical annotation

  10. Identify gene sets (modules) induced or repressed in a microarray • Given the expression value $E_{g,m}$ of each gene g in microarray experiment m • Compute the average expression $E_g$ of gene g over all microarrays • If $E_{g,m}$ is 2-fold greater than $E_g$, call gene g induced in array m • Categorize each gene as induced or not induced in the array. Source: Nature Genetics 36, 1090 - 1098 (2004)
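
The induced/not-induced call on this slide can be sketched in a few lines of Python. This is only an illustration: the function name `induced_genes`, the nested-dictionary layout, and the toy values are hypothetical, and the sketch assumes non-negative, linear-scale expression values (on a log2 scale, a 2-fold change would instead be a difference of 1).

```python
from statistics import mean

def induced_genes(expression, array, fold=2.0):
    """Return the genes whose value in `array` is at least `fold` times
    their average over all arrays.

    `expression` maps gene -> {array name -> expression value}.
    Assumes linear-scale, non-negative values (an assumption of this sketch).
    """
    induced = set()
    for gene, values in expression.items():
        average = mean(values.values())  # E_g: average over all microarrays
        if average > 0 and values[array] >= fold * average:
            induced.add(gene)
    return induced

# Hypothetical toy data: two genes measured on four arrays.
expression = {
    "geneA": {"m1": 8.0, "m2": 1.0, "m3": 1.0, "m4": 2.0},  # average 3.0
    "geneB": {"m1": 2.0, "m2": 2.0, "m3": 2.0, "m4": 2.0},  # average 2.0
}

print(induced_genes(expression, "m1"))  # geneA only: 8.0 >= 2 * 3.0
```

A repressed call would follow the same pattern with the comparison reversed, for example a value at most half of the average.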

  11. Identify gene sets (modules) induced or repressed in a microarray • |All genes| = N • |Module| = n • |Induced| = m • |Intersection| = k • Hypergeometric test(N,n,m,k): if a set of m genes was chosen at random (sampling without replacement), what is the probability that its intersection with the module would contain k or more genes? [Figure: Venn diagram of the Induced set and the Module within All genes, showing their Intersection]

  12. Identify gene sets (modules) induced or repressed in a microarray • |All genes| = N • |Module| = n • |Induced| = m • |Intersection| = k • Hypergeometric test(N,n,m,k): sum over i >= k of the probability that, if a set of m genes was chosen at random (sampling without replacement), its intersection with the module would contain exactly i genes [Figure: Venn diagram of the Induced set and the Module within All genes, showing their Intersection]

  13. “p-value” of the hypergeometric test • Identify gene sets (modules) induced or repressed in a microarray • |All genes| = N • |Module| = n • |Induced| = m • |Intersection| = k • Hypergeometric test(N,n,m,k) [Figure: Venn diagram of the Induced set and the Module within All genes, showing their Intersection]
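
The formula shown on this slide is not part of the transcript. Using the slide's own symbols (N genes in total, n in the module, m induced, k in the intersection), the standard upper-tail hypergeometric probability matching the "sum over i >= k" description would read as follows; treat it as a reconstruction from those definitions rather than a copy of the original slide.

```latex
p\text{-value}(N, n, m, k)
  = \sum_{i=k}^{\min(n,\,m)}
    \frac{\binom{n}{i}\,\binom{N-n}{m-i}}{\binom{N}{m}}
```

Each term is the probability that a random draw of m genes (without replacement) contains exactly i of the n module genes.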

  14. Identify gene sets (modules) induced or repressed in a microarray • |All genes| = N • |Module| = n • |Induced| = m • |Intersection| = k • Hypergeometric test(N,n,m,k) • If the “p-value” is very small, then we infer that the intersection is “statistically significant”, i.e., the module is induced in the microarray • Similarly, define a module as repressed in a microarray [Figure: Venn diagram of the Induced set and the Module within All genes, showing their Intersection]
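
As a minimal sketch of the test described on these slides, the tail probability can be computed with exact integer binomials from Python's standard library. The function name `hypergeom_pvalue` and the example numbers are hypothetical; the slides do not say which significance cutoff counts as "very small", so none is assumed here.

```python
from math import comb

def hypergeom_pvalue(N, n, m, k):
    """Probability that a random set of m genes, drawn without replacement
    from N genes, shares at least k genes with a module of size n."""
    upper = min(n, m)  # the intersection cannot exceed either set
    total = comb(N, m)  # number of ways to choose the m induced genes
    tail = sum(comb(n, i) * comb(N - n, m - i) for i in range(k, upper + 1))
    return tail / total

# Toy example: 10 genes, a module of 4, 3 induced genes, 2 of them in the module.
# The tail is (C(4,2)*C(6,1) + C(4,3)*C(6,0)) / C(10,3) = 40/120, i.e. one third.
print(hypergeom_pvalue(10, 4, 3, 2))
```

With a realistic number of genes the same function still works, because `math.comb` uses arbitrary-precision integers and only the final division produces a float.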

  15. Segal et al.: Methodology • Compile a large collection of cancer-related microarrays (microarrays measuring gene expression in cancer tissues or normal tissue) • Compile a large collection of gene sets (modules) from earlier studies • Identify gene sets (modules) induced or repressed in a microarray • Identify modules induced in several arrays, or repressed in several arrays • Check if these arrays are enriched in some clinical annotation

  16. Source: Nature Genetics 36, 1090 - 1098 (2004)

  17. Segal et al.: Methodology • Compile a large collection of cancer-related microarrays (microarrays measuring gene expression in cancer tissues or normal tissue) • Compile a large collection of gene sets (modules) from earlier studies • Identify gene sets (modules) induced or repressed in a microarray • Identify modules induced in several arrays, or repressed in several arrays • Check if these arrays are enriched in some clinical annotation

  18. Source: Nature Genetics 36, 1090 - 1098 (2004) Identify modules induced in several arrays, or repressed in several arrays

  19. Segal et al.: Methodology • Compile a large collection of cancer-related microarrays (microarrays measuring gene expression in cancer tissues or normal tissue) • Compile a large collection of gene sets (modules) from earlier studies • Identify gene sets (modules) induced or repressed in a microarray • Identify modules induced in several arrays, or repressed in several arrays • Check if these arrays are enriched in some clinical annotation

  20. Source: Nature Genetics 36, 1090 - 1098 (2004) Check if these arrays are enriched in some clinical annotation
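
The slides do not spell out how enrichment in a clinical annotation is scored. One plausible sketch, assuming the same hypergeometric tail is reused with arrays in place of genes, is shown below; `annotation_enrichment`, the array names, and the annotation labels are all hypothetical.

```python
from math import comb

def tail_probability(N, n, m, k):
    """Upper-tail hypergeometric probability of an overlap of at least k."""
    tail = sum(comb(n, i) * comb(N - n, m - i) for i in range(k, min(n, m) + 1))
    return tail / comb(N, m)

def annotation_enrichment(all_arrays, induced_arrays, annotations):
    """For each annotation, test whether the arrays in which a module is
    induced overlap the annotated arrays more than expected by chance.

    `annotations` maps a label -> set of arrays carrying that label.
    Returns a dict label -> p-value.
    """
    N = len(all_arrays)
    m = len(induced_arrays)
    pvalues = {}
    for label, annotated in annotations.items():
        k = len(induced_arrays & annotated)  # induced arrays with this label
        pvalues[label] = tail_probability(N, len(annotated), m, k)
    return pvalues

# Toy example: a module induced in three of eight arrays, all three tumors.
all_arrays = {f"a{i}" for i in range(1, 9)}
induced_arrays = {"a1", "a2", "a3"}
annotations = {
    "tumor": {"a1", "a2", "a3", "a4"},
    "normal": {"a5", "a6", "a7", "a8"},
}
result = annotation_enrichment(all_arrays, induced_arrays, annotations)
print(result["tumor"])  # 4/56: C(4,3) ways out of C(8,3)
```

Running this for every module and every annotation yields one p-value per (module, condition) pair, which is the kind of table a module map could be drawn from.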

  21. Segal et al.: Cancer “module maps” • Red(m,c): microarrays in which module m was overexpressed (induced) are enriched in condition c • Green(m,c): microarrays in which module m was underexpressed (repressed) are enriched in condition c • Rows and columns are not in an arbitrary order; they have been “clustered” to display similar rows (or columns) together. Source: Nature Genetics 37, S38 - S45 (2005)

  22. Insights from the cancer module map • Some modules are activated or repressed across many tumor types. Such modules could be related to general tumorigenic processes • Some modules are specifically activated or repressed in certain tumor types or stages of tumor progression

  23. From modules to regulation • A module map shows the transcriptional changes underlying cancer • Transcriptional changes are a result of transcription factors and their binding sites • A deeper understanding of cancer would come from finding out which transcription factors and binding sites led to the transcriptional changes

  24. Part 2: Cis-regulatory elements

  25. Genomics and gene regulation • Such knowledge comes from genomics data • ChIP-chip studies identify which transcription factors bind which DNA sequences • Analysis of DNA sequence, using known binding site motifs, gives us putative binding sites • Cross-species conservation also tells us something about possible locations of binding sites

  26. Cis-regulatory analysis • Identify a set of genes whose promoters contain the same binding sites • Such a set of genes is likely to be regulated by the same TF • Often called a “regulatory module” • Earlier studies mined microarrays for “co-expressed” genes, then used motif finding algorithms to discover their shared binding sites
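
The first step on this slide, finding genes whose promoters contain the same binding sites, can be sketched as a simple consensus scan. Everything named here is an assumption made for illustration: the function names, the toy promoter sequences, and the two consensus strings are hypothetical, and a real analysis would typically use position weight matrices rather than exact consensus matches.

```python
import re

# IUPAC nucleotide codes used to write degenerate consensus motifs.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "N": "[ACGT]",
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def motif_regex(consensus):
    """Compile an IUPAC consensus string into a regular expression."""
    return re.compile("".join(IUPAC[base] for base in consensus))

def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]

def genes_with_all_motifs(promoters, motifs):
    """Return the genes whose promoter contains every motif on either strand."""
    patterns = [motif_regex(m) for m in motifs]
    shared = set()
    for gene, seq in promoters.items():
        strands = (seq, reverse_complement(seq))
        if all(any(p.search(s) for s in strands) for p in patterns):
            shared.add(gene)
    return shared

# Hypothetical promoters and motifs, chosen only to exercise the code.
promoters = {
    "g1": "TTGACGTCAAAGATAAGG",
    "g2": "CCCCGATAAGCCCC",
    "g3": "AAAATGACGTCATTTT",
}
print(genes_with_all_motifs(promoters, ["TGACGTCA", "GATAAG"]))  # g1 only
```

The returned set is a candidate “regulatory module” in the slide's sense: genes that share the same sites and are therefore likely to be regulated by the same TF.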

  27. Cis-regulatory analysis • Another approach (Segal et al. 2003) tried to solve the problem in an integrated manner • Find a set of genes such that • their expression profiles are similar (microarrays) • they share the same binding sites (sequence) • Joint learning of “regulatory module” from two very different types of data: microarray and sequence • An important theme in current bioinformatics

  28. Cis-regulatory analysis • Connection between gene expression and cis-regulatory elements (binding sites) also explored in Beer & Tavazoie. • Found rules on combinations and locations of binding sites that would cause the gene to be over- or under-expressed

  29. Source: Nature Genetics 37, S38 - S45 (2005) • The binding sites “RRPE” and “PAC” must occur within 240 bp and 140 bp of the gene start, respectively • Genes containing both motifs, following certain rules on location, are tightly co-regulated • Genes containing only one of the motifs, or both in an incorrect positional configuration, have close to random expression
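
A positional rule of this kind is easy to express as a predicate over site locations. The sketch below reads the slide's two distances as applying to RRPE and PAC in that order, which is an assumption; the function name, the input layout (distances in bp upstream of the gene start), and the example values are hypothetical, and the sketch ignores any further constraints such as motif order or orientation.

```python
# Maximum allowed distance (bp) from the gene start for each motif,
# assuming the slide's "240 bp and 140 bp" map to RRPE and PAC in that order.
LIMITS = {"RRPE": 240, "PAC": 140}

def satisfies_positional_rule(site_distances, limits=LIMITS):
    """Return True if every motif in `limits` has at least one site
    no farther than its limit from the gene start.

    `site_distances` maps motif name -> list of distances (bp) of its
    occurrences from the gene start.
    """
    return all(
        any(distance <= limit for distance in site_distances.get(motif, []))
        for motif, limit in limits.items()
    )

# Hypothetical genes: both motifs well placed, PAC too far, PAC missing.
well_placed = {"RRPE": [200], "PAC": [100]}
pac_too_far = {"RRPE": [200], "PAC": [300]}
pac_missing = {"RRPE": [200]}

print(satisfies_positional_rule(well_placed))  # True
print(satisfies_positional_rule(pac_too_far))  # False
print(satisfies_positional_rule(pac_missing))  # False
```

Under the slide's summary, only genes in the first category would be expected to be tightly co-regulated.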

  30. Eukaryotes • These studies have mostly focused on yeast (which is a eukaryote, but has a small, compact genome) • Not much work of this type has been done in the longer, more complex genomes of metazoans (e.g., humans, rodents, fruit flies) • Metazoan genomes are not compact, so it may not suffice to look at the sequence right next to a gene. Intergenic regions are long, and cis-regulatory signals may not be close to the gene

  31. One study in humans • HeLa cells are an “immortal” cell-line derived from cervical cancer cells in a person who died in 1951. • Used extensively in studying cancer • Method of Segal et al. (joint learning of regulatory modules from gene expression and sequence data) applied to these cells

  32. One study in humans • Gene expression data used: microarrays measuring genes during cell cycle in HeLa cells • Sequence: 1000 bp promoters (upstream) of human genes
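
Extracting a fixed-length upstream promoter can be sketched as follows. The slide only says "1000 bp promoters (upstream)", so the details here are assumptions: the function name, the 0-based coordinate convention, the use of the transcription start site as the gene start, and the toy chromosome are all hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def upstream_promoter(chrom_seq, tss, strand, length=1000):
    """Return up to `length` bp immediately upstream of the gene start.

    `tss` is the 0-based index of the first transcribed base on the
    forward-strand chromosome sequence. For genes on the "-" strand,
    upstream lies to the right, and the result is reverse-complemented
    so that it reads 5' to 3' toward the gene.
    """
    if strand == "+":
        start = max(0, tss - length)  # clip at the chromosome start
        return chrom_seq[start:tss]
    if strand == "-":
        end = min(len(chrom_seq), tss + 1 + length)  # clip at the chromosome end
        return chrom_seq[tss + 1:end].translate(COMPLEMENT)[::-1]
    raise ValueError("strand must be '+' or '-'")

# Toy chromosome; a 4 bp "promoter" keeps the example readable.
chrom = "AAAACCCCGGGGTTTT"
print(upstream_promoter(chrom, 8, "+", length=4))  # CCCC
print(upstream_promoter(chrom, 7, "-", length=4))  # CCCC (reverse complement of GGGG)
```

As the earlier slide on eukaryotes notes, a window this short may miss distant cis-regulatory signals in metazoan genomes, so the `length` parameter is a modeling choice rather than a biological boundary.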

  33. Source: Nature Genetics 37, S38 - S45 (2005) Result of analysis: two motifs were found to be shared by this set of genes, which have similar expression profiles. One of the identified motifs (NFAT) is known to be involved in the cell cycle

  34. Summary • The common theme is to analyze sets of genes, and relate their common expression patterns to cancer types or to presence of cis-regulatory motifs • Search algorithms may be required to identify some of these features
