
Microarrays and Promoter Analysis

Microarrays and Promoter Analysis. Gregory Gonye Topic 5 ELEG-667. Overview. Opportunity: general cDNA data, ESTs Genomic data Microarrays nuts and bolts analysis Opportunity: specific Yeast example (Church et al.) Statement of Work: Project Review. Opportunity.


Presentation Transcript


  1. Microarrays and Promoter Analysis Gregory Gonye Topic 5 ELEG-667

  2. Overview • Opportunity: general • cDNA data, ESTs • Genomic data • Microarrays • nuts and bolts • analysis • Opportunity: specific • Yeast example (Church et al.) • Statement of Work: Project Review

  3. Opportunity • Large scale EST projects for many organisms • Genome sequencing projects for many organisms • Highly parallel technologies for gene expression measurement • Sophisticated analysis for associating genes by expression • Full length cDNA sequence data

  4. Opportunity • Large scale EST projects for many organisms • Sequence data for mRNAs and some protein • Clones corresponding to mRNAs (cDNA clones) for reagents • Expression data (significance with scale) • Homologies between species

  5. Opportunity • Genome sequencing projects for many organisms • genomic DNA data contains all the information • mRNA • transcriptional control elements: promoters and enhancers • organization • Comparative genomics • Conservation=Function

  6. Opportunity • Highly parallel technologies for gene expression measurement • Sophisticated analysis for associating genes by expression • exploits EST projects for data and reagents • produces associations/hypotheses to be tested • uses and produces annotation

  7. Opportunity • Large scale EST projects for many organisms • Genome sequencing projects for many organisms • Highly parallel technologies for gene expression measurement • Sophisticated analysis for associating genes by expression

  8. Opportunity • Full length cDNA sequence data • Promoter-proximal mRNA sequence • “real” protein data (full length ORFs) • reagents for expression • data for converting mRNA to gene

  9. Opportunity • Large scale EST projects for many organisms • Genome sequencing projects for many organisms • Highly parallel technologies for gene expression measurement • Sophisticated analysis for associating genes by expression • Full length cDNA sequence data

  10. Opportunity • Whole greater than sum of parts: • [mRNA data, EST and full length] X [genomic data] = [Genes] • [Genes] X [genomic data] = [Promoters] • [cDNA]microarray X [mRNA] = [Genes]assoc • [Genes]assoc X [Promoters] = Functional data

  11. Opportunity • In plain English: The cDNA data is used to identify and locate genes in the genomic data. Using the gene location and gene structure, we predict the promoter region for each gene. A subset of these genes is associated by a microarray experiment. Analysis of the promoters of the associated genes, within and across species, can lead to knowledge of how these genes are regulated.

  12. Overview • Opportunity: general • cDNA data, ESTs • Genomic data • Microarrays • nuts and bolts • analysis • Opportunity: specific • Yeast example (Church et al.) • Statement of Work: Project Review

  13. Molecular Biology 101:Genome (DNA) to Genes to mRNA Required to understand relationships of data sets Three large scale efforts: • Expressed Sequence Tag • Full length cDNA • Genomic DNA

  14. Biological Information Flow = Central Dogma • DNA: ATGACTGCTTTT (coding strand) paired with TACTGACGAAAA (template strand) → transcription, splicing (higher organisms) → RNA: AUGACUGCUUUU → translation → Protein: Met-Thr-Ala-Phe
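
The flow on this slide can be traced in a few lines of Python. This is a minimal sketch, not part of the original deck: the function names are hypothetical, and the codon table is deliberately limited to the four codons that appear in the slide's example rather than the full genetic code.

```python
# Toy codon table: only the four codons used in the slide's example
# (an assumption for brevity; the real genetic code has 64 codons).
CODON_TABLE = {"AUG": "Met", "ACU": "Thr", "GCU": "Ala", "UUU": "Phe"}


def transcribe(coding_strand: str) -> str:
    """Return the mRNA matching the DNA coding strand (T replaced by U)."""
    return coding_strand.upper().replace("T", "U")


def translate(mrna: str) -> str:
    """Translate an mRNA, read in frame from its first base, into residues."""
    usable = len(mrna) - len(mrna) % 3  # ignore a trailing partial codon
    codons = [mrna[i:i + 3] for i in range(0, usable, 3)]
    return "-".join(CODON_TABLE[codon] for codon in codons)


mrna = transcribe("ATGACTGCTTTT")
print(mrna)             # AUGACUGCUUUU
print(translate(mrna))  # Met-Thr-Ala-Phe
```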

  15. Exons and Introns in Eucaryotes • DNA: exon 1, intron 1, exon 2, intron 2 • Primary Transcript • mature messenger RNA

  16. cDNA and ESTs mRNA is converted to a DNA copy = complementary DNA, cDNA

  17. cDNA Synthesis • mRNA with 3’ poly(A) tail (AAAAAn) annealed to oligo-dT primer (TTTTTn) • Reverse Transcriptase, dNTPs and primer: cDNA first strand • RNAse H, DNAP, dNTPs: Second Strand
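
At the sequence level, the two synthesis steps amount to taking reverse complements. The sketch below is illustrative only: the function names and the short example mRNA are assumptions, and it models base pairing, not the enzymology.

```python
RNA_TO_DNA = {"A": "T", "U": "A", "G": "C", "C": "G"}
DNA_TO_DNA = {"A": "T", "T": "A", "G": "C", "C": "G"}


def first_strand_cdna(mrna: str) -> str:
    """DNA complementary to the mRNA, written 5'->3'.

    Because the primer anneals to the poly(A) tail, the product
    starts with a run of T.
    """
    return "".join(RNA_TO_DNA[base] for base in reversed(mrna.upper()))


def second_strand(first_strand: str) -> str:
    """Reverse complement of the first strand: the mRNA sequence as DNA."""
    return "".join(DNA_TO_DNA[base] for base in reversed(first_strand.upper()))


mrna = "AUGACUGCUUUUAAAAA"  # toy message with a short poly(A) tail
first = first_strand_cdna(mrna)
print(first)                 # TTTTTAAAAGCAGTCAT
print(second_strand(first))  # ATGACTGCTTTTAAAAA
```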

  18. cDNA and ESTs mRNA is converted to a DNA copy = complementary DNA, cDNA

  19. cDNA and ESTs • mRNA is converted to a DNA copy • = complementary DNA, cDNA • cDNA is directionally inserted into a vector (plasmid) DNA • 1. Clonal propagation/amplification in E. coli • 2. Addition of known sequence flanking unknown cDNA

  20. cDNA and ESTs • mRNA is converted to a DNA copy • = complementary DNA, cDNA • cDNA is directionally inserted into a vector (plasmid) DNA • 1. Clonal propagation/amplification in E. coli • 2. Addition of known sequence flanking unknown cDNA • EST is obtained from cDNA insert (~400-800 bases) using known vector sequence (universal) as priming site • EST is sequence data. EST clone is the reagent used to obtain that data. EST is sequence for only part of the cDNA in the EST clone.

  21. EST data resources • dbEST and UniGene • http://ncbi.nlm.nih.gov/UniGene • TIGR Gene Indexes • http://www.tigr.org/tdb/tgi.shtml Objectives: • Cluster sequences to identify individual mRNAs (~1 cluster=1 mRNA) • Annotate clusters • Distribute clones

  22. Molecular Biology 101:Genome (DNA) to Genes to mRNA Required to understand relationships of data sets Three large scale efforts: • Expressed Sequence Tag • Full length cDNA • Genomic DNA

  23. Full length cDNA • Why?: • complete protein coding information • complete exon information (=gene when combined with genomic data) • 5’ end directs search for promoter elements

  24. Full length cDNA • Why?: • complete protein coding information • complete exon information (=gene when combined with genomic data) • 5’ end directs search for promoter elements • Who?: • Y. Hayashizaki, RIKEN group Yokohama, Japan http://genome.rtc.riken.go.jp/

  25. Full Length cDNA: How? • Issues: • Need copying to be complete, no partial products • Need to differentiate full length mRNAs from degraded mRNAs • Cloning requirements remain the same: • directional, universal vector, high efficiency • Need to redirect sequencing to full insert, not single run from one end

  26. Full Length cDNA: How? • Issues: • Need copying to be complete, no partial products • Problem is secondary structure of mRNAs and lack of processivity of Reverse Transcriptase • Solution: Thermo-stabilize RTase using trehalose and run the RT reaction at high temperature, which destabilizes secondary structure and allows longer elongation products

  27. Full Length cDNA: How? • Issues: • Need to differentiate full length mRNAs from degraded mRNAs • Step One: Cap-specific chemical biotinylation, RTase+trehalose • Full length: 7meG cap becomes Bio-G • Degraded: P end, no biotin

  28. Full Length cDNA: How? • Issues: • Need to differentiate full length mRNAs from degraded mRNAs • Step Two: SA (streptavidin) capture of Bio-G • Purified: Full length (Bio-G bound to SA)

  29. Full length cDNA • Results: • FANTOM Consortium: Functional Annotation of Mouse ~21,000 nonredundant cDNAs • Full length clones used for protein expression • Large scale protein-protein interaction matrix • 5’ mRNA sequence available on large scale

  30. Molecular Biology 101:Genome (DNA) to Genes to mRNA Required to understand relationships of data sets Three large scale efforts: • Expressed Sequence Tag • Full length cDNA • Genomic DNA

  31. Genomic DNA Sequencing • Genomic DNA Structure: • Genes: Promoters, Enhancers, Exons, Introns • Intergenic: Structural DNA, repeats, telomeres • Chromosomes: Varied size and number • [Diagram: Promoter, exon 1, intron 1, exon 2, intron 2]

  32. Genomic DNA Sequencing • Mammalian genomes about 3 billion bases • Genomes broken into chunks of 100-500kb • Bacterial Artificial Chromosomes, BAC libraries • BAC inserts ordered by end sequencing and cross hybridization to generate nonredundant “Golden path” • Sequencing effort distributed/coordinated internationally

  33. Genomic DNA Sequencing • Two complementary approaches: • Walking: • Subclone pieces of BAC inserts • start from both ends of sub-BAC inserts and sequence inward • from sequence generated design next set of sequencing primers, rerun, redesign,… • Shotgun: • Generate random fragments and size-select ~2000bp • Sequence from both ends • Assemble sequences to contigs, assemble contigs
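
The "assemble sequences to contigs" step of the shotgun approach can be illustrated with a toy greedy assembler. This is a sketch under strong assumptions (error-free reads, a single strand, no repeats) and is not how production assemblers work; the function names and the `min_len` parameter are hypothetical.

```python
def overlap(a: str, b: str, min_len: int) -> int:
    """Length of the longest suffix of a that equals a prefix of b.

    Returns 0 when the best overlap is shorter than min_len.
    """
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def greedy_assemble(reads, min_len=3):
    """Repeatedly merge the pair of sequences with the longest overlap."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while True:
        best_len, best_i, best_j = 0, None, None
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best_len:
                        best_len, best_i, best_j = n, i, j
        if best_len == 0:
            return contigs  # nothing left to merge
        merged = contigs[best_i] + contigs[best_j][best_len:]
        contigs = [c for k, c in enumerate(contigs)
                   if k not in (best_i, best_j)] + [merged]


print(greedy_assemble(["ATGACTG", "CTGCTTT", "TTTAAGG"]))
# ['ATGACTGCTTTAAGG']
```

Reads that share no overlap of at least `min_len` bases stay as separate contigs, which mirrors the gaps that the slide's final "assemble contigs" step has to close.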

  34. Genomic DNA Sequencing • Resources: • Trace Archives (NCBI): Individual unassembled shotgun sequence data • Genomic section of GenBank (NCBI): Contigs, BAC end sequences, BAC assemblies • Whitehead Institute: Assemblies • Ensembl: Annotation of assembled genomes

  35. Convergence of Data Three large scale efforts: • Expressed Sequence Tag>>partial Exons • Full length cDNA>>5’ Exons • Genomic DNA>>Gene Predictions Annotated Genome

  36. Overview • Opportunity: general • cDNA data, ESTs • Genomic data • Microarrays • nuts and bolts • analysis • Opportunity: specific • Yeast example (Church et al.) • Statement of Work: Project Review

  37. Overview • Opportunity: general • cDNA data, ESTs • Genomic data • Microarrays • nuts and bolts • analysis • Opportunity: specific • Yeast example (Church et al.) • Statement of Work: Project Review

  38. Highly Parallel Gene Expression Analysis: cDNA Microarrays • Molecular reagents produced from EST sequencing projects • EST= Expressed Sequence Tag • Clustering of ESTs identifies mRNA diversity • Nonredundant sets of reagents available (>40,000 for human) • Technology to use reagents in parallel: microarrays • cDNA or oligonucleotide microarrays

  39. Microarray-based Gene Expression Analysis • Manufactured at high density with robotics • 1x3” glass slide common format • “printing” or “in situ synthesis” • 20-30,000 spots per slide • mRNA is converted to labeled cDNA for fluorescent hybridization analysis • Ratio-metric approach compares expression between samples or to a reference sample by cohybridization: “fold-change”
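
The ratio-metric "fold-change" for one spot can be sketched as follows. The background subtraction, the intensity floor, and the signed fold-change convention are assumptions made for illustration; the slides do not specify a normalization scheme, and the function names are hypothetical.

```python
import math


def log2_ratio(sample, reference, background=0.0, floor=1.0):
    """log2 of the background-corrected sample/reference intensity ratio.

    Intensities are clamped to `floor` so a spot at or below background
    cannot produce a zero or negative ratio.
    """
    s = max(sample - background, floor)
    r = max(reference - background, floor)
    return math.log2(s / r)


def fold_change(sample, reference, background=0.0, floor=1.0):
    """Signed fold-change: +4.0 means 4x higher in the sample, -4.0 means 4x lower."""
    lr = log2_ratio(sample, reference, background, floor)
    return 2 ** lr if lr >= 0 else -(2 ** -lr)


print(log2_ratio(4000, 1000))   # 2.0
print(fold_change(500, 2000))   # -4.0
```

Working in log2 makes up- and down-regulation symmetric around zero, which is convenient for the clustering steps that follow.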

  40. Schematic Diagram: Microarray-based Analysis • Array: EST project cDNA clones → PCR → printing → cDNA microarray • Samples: Tissue1 → RNA1 → labeled cDNA1; Tissue2 → RNA2 → labeled cDNA2 • Cohybridize to microarray • Scan microarray to detect each fluorophore (16 bit grayscale images) • Identify signal pixels (spot finding) • Quantitate pixel intensity>>ratios

  41. [Figure: panels labeled Control, Ethanol, and Mix]

  42. Advanced Analyses • Clustering • What: • genes • experiments • experiments and genes • Why: • Classification: how many “types” of samples are there? can type be predicted? Which genes are best predictors? • Coregulation: which genes respond alike? Pathway(s) implicated? Epistasis? Regulatory network prediction

  43. Overview • Opportunity: general • cDNA data, ESTs • Genomic data • Microarrays • nuts and bolts • analysis • Opportunity: specific • Yeast example (Church et al.) • Statement of Work: Project Review

  44. Overview • Opportunity: general • cDNA data, ESTs • Genomic data • Microarrays • nuts and bolts • analysis • Opportunity: specific • Yeast example (Church et al.) • Statement of Work: Project Review

  45. Post-Clustering Analysis • Why are members of a cluster clustering together? • Functional (Pathway): all ribosomal proteins, DNA synthesis machinery, lysine biosynthesis • Serial regulation: cascades • Transcriptional coregulation via common regulatory elements: conserved promoters

  46. What is a promoter? • Cis-acting: physically associated with the gene • Directional: defines transcription initiation site and coding strand • with TATA box fairly homogeneous • without TATA box less stringent • Core elements recruit pol II (but inactive) • Regulatory elements are binding sites for Transcription Factors (+/- active pol II complex) • Sequence-specificity determines P(occupied)
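
One simple way to make P(occupied) concrete is a single-site equilibrium binding model, in which occupancy depends on the factor concentration and a sequence-dependent dissociation constant. The model choice, the function name, and the parameters `tf_conc` and `kd` are assumptions for illustration; the slide does not name a specific model.

```python
def p_occupied(tf_conc: float, kd: float) -> float:
    """Equilibrium occupancy of one binding site: [TF] / ([TF] + Kd).

    tf_conc and kd must be in the same units. A better sequence match
    corresponds to a smaller kd and therefore a higher occupancy.
    """
    if tf_conc < 0 or kd <= 0:
        raise ValueError("tf_conc must be >= 0 and kd must be > 0")
    return tf_conc / (tf_conc + kd)


# Same factor concentration, two sites of different affinity.
print(p_occupied(tf_conc=10.0, kd=10.0))   # 0.5
print(p_occupied(tf_conc=10.0, kd=90.0))   # 0.1
```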

  47. Convergence at Harvard • Yeast: • Large genomic-scale gene expression data set • complete annotated genome • NO introns, minimal intergenic sequence • Predicted promoters for every ORF • Church et al.: Combined microarray data, clustering, and promoter informatics to identify conserved, cluster enriched, regulatory domains
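
Because yeast ORFs have no introns and little intergenic sequence, a first-pass promoter prediction can be as simple as taking a fixed window upstream of each ORF. The sketch below assumes 0-based, half-open ORF coordinates on the forward strand; the function names, the coordinate convention, and the default window length are illustrative choices, not taken from the original analysis code.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G", "N": "N"}


def reverse_complement(seq: str) -> str:
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


def upstream_region(chrom_seq, orf_start, orf_end, strand, length=600):
    """Return up to `length` bases upstream of an ORF, read 5'->3'.

    orf_start/orf_end are 0-based, half-open forward-strand coordinates.
    The window is truncated at the chromosome ends.
    """
    if strand == "+":
        begin = max(0, orf_start - length)
        return chrom_seq[begin:orf_start]
    if strand == "-":
        end = min(len(chrom_seq), orf_end + length)
        return reverse_complement(chrom_seq[orf_end:end])
    raise ValueError("strand must be '+' or '-'")


chrom = "AAACCCATGGGGTTT"
print(upstream_region(chrom, 6, 12, "+", length=4))  # ACCC
print(upstream_region(chrom, 3, 9, "-", length=4))   # ACCC
```

For a reverse-strand ORF the upstream window lies at higher forward coordinates, so it is reverse-complemented to keep every promoter oriented toward its gene.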

  48. Tavazoie et al. • Whole genome expression data (Affy) • Synchronized culture • 15 time points across two cell cycles • K-means clustering (Euclidean distance) of 3000 highest variance ORFs to 30 clusters • Obtained 600bp promoter sequence from genome sequence for each ORF • Used AlignACE (Gibbs sampling algorithm) to discover motifs
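
The variance filter and the k-means step can be sketched in plain Python. This is an illustrative toy, not the authors' implementation: the function names are hypothetical, the centroids are seeded from the first k profiles (a deterministic simplification), and the motif-discovery step with AlignACE is not reproduced.

```python
import math
import statistics


def top_variance(profiles: dict, n: int) -> list:
    """Names of the n expression profiles with the highest variance."""
    ranked = sorted(profiles,
                    key=lambda name: statistics.pvariance(profiles[name]),
                    reverse=True)
    return ranked[:n]


def euclidean(a, b) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def kmeans(points, k, max_iter=100):
    """Plain k-means with Euclidean distance; returns (labels, centroids)."""
    centroids = [list(p) for p in points[:k]]  # simplistic seeding (assumption)
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda c: euclidean(p, centroids[c]))
                      for p in points]
        for c in range(k):
            members = [p for p, lab in zip(points, new_labels) if lab == c]
            if members:  # keep the old centroid if a cluster empties
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
    return labels, centroids


profiles = {
    "orfA": [0.0, 0.0, 1.0],
    "orfB": [5.0, 5.0, 5.2],
    "orfC": [0.0, 0.2, 1.1],
    "orfD": [5.1, 4.9, 5.0],
    "flat": [2.0, 2.0, 2.0],
}
selected = top_variance(profiles, 4)  # drops the zero-variance profile
labels, _ = kmeans([profiles[name] for name in sorted(selected)], k=2)
print(dict(zip(sorted(selected), labels)))
# {'orfA': 0, 'orfB': 1, 'orfC': 0, 'orfD': 1}
```

With the numbers on the slide, the same calls would be made with n=3000 and k=30 over 15-point time courses.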

  49. Tavazoie et al.: Results

  50. Livesey et al.: extend to mouse • Crx knockout mouse vs. wt • cone and rod differentiation in retina • Microarray analysis • Crx+/+ vs Crx-/- retina RNA • 16 genes out of 960 tested • Promoter analysis • proximal 250bp of genes with available promoters (5) • used AlignACE to detect motifs • found single or tandem CBE elements in all
