
Michele Markstein IEEE CSB 2003 Stanford University August 11, 2003 michele@opengenomics.org

Computing non-coding cis-regulatory DNAs


Presentation Transcript


  1. Computing non-coding cis-regulatory DNAs. Michele Markstein, IEEE CSB 2003, Stanford University, August 11, 2003. michele@opengenomics.org

  2. OUTLINE (first half) 1. Brief Review of Central Dogma (DNA -> RNA -> Protein): base-pairing, gene architecture, transcription, translation 2. Landscape of the Human Genome 3. Cis-regulation: Enhancers, Insulators, Chromatin Boundaries

  3. BASE PAIRING: DNA serves as a template for DNA and RNA. The Building Block of DNA is the NUCLEOTIDE (P = phosphate, S = sugar, plus a BASE). [Diagram: two antiparallel strands, each labeled 5’ to 3’, with sugar-phosphate backbones; A pairs with T and G pairs with C; each strand is labeled as a Template Strand.]
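
The pairing rules on this slide can be sketched in a few lines of Python. This is only an illustration, not code from the talk: the function names `reverse_complement` and `transcribe` are made up here, and the input is assumed to contain only the letters A, C, G and T.

```python
# Watson-Crick pairing rules from the slide: A pairs with T, G pairs with C.
DNA_PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(template: str) -> str:
    """Return the strand that pairs with `template`, written 5' to 3'.

    The two strands are antiparallel, so the partner strand is read in
    the opposite direction: reverse the template, then swap each base.
    """
    return "".join(DNA_PAIR[base] for base in reversed(template.upper()))


def transcribe(template: str) -> str:
    """Return the RNA copied from a DNA template strand (U replaces T)."""
    return reverse_complement(template).replace("T", "U")


if __name__ == "__main__":
    print(reverse_complement("ATGC"))  # GCAT
    print(transcribe("ATGC"))          # GCAU
```

The same lookup table serves both replication and transcription, which is the point of the slide: one strand is enough to specify its partner.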

  4. Gene Architecture and the Central Dogma. [Diagram: DNA with a TATA box, exon 1, intron 1, exon 2, intron 2, exon 3, with AUG and UAA marked. Nucleus: Transcription -> mRNA -> splicing -> Mature mRNA; introns stay in the nucleus, exons exit the nucleus. Cytoplasm: Translation -> protein -> protein folding.]

  5. Another View of Exon/Intron Structure. [Slide shows a long block of raw genomic DNA sequence (beginning GGGTGTTTCCAAAAATACTC...) with the regions Exon 1, Intron 1, Exon 2, Intron 2, Exon 3 marked (E1, E2, E3).]

  6. Snapshot of RNA transcription

  7. Puzzle: how do you translate a 4-letter alphabet into a 20-letter alphabet? The Triplet Code: nucleotides -> amino acids. Three positions with 4 possible nucleotides each give 4 x 4 x 4 = 64 combinations. Each triplet is called a Codon.
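
A short Python sketch of the counting argument behind the triplet code. The helper name `words` is hypothetical; the only facts it relies on are the 4-letter nucleotide alphabet and the 20 amino acids named on the slide.

```python
from itertools import product

BASES = "ACGU"      # the 4-letter RNA alphabet
AMINO_ACIDS = 20    # size of the protein alphabet


def words(length: int) -> int:
    """Count the distinct nucleotide 'words' of a given length."""
    return len(list(product(BASES, repeat=length)))


if __name__ == "__main__":
    for n in (1, 2, 3):
        enough = words(n) >= AMINO_ACIDS
        print(f"word length {n}: {words(n)} combinations, enough for 20? {enough}")
```

One- and two-letter words give only 4 and 16 combinations, so three is the shortest word length that can cover 20 amino acids.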

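  8. The “Genetic Code”: codons -> amino acids. [Diagram: generic tRNA carrying an amino acid (Pro, Gly), with its anti-codon paired to a codon on the mRNA; anti-codon letters shown: C G G C U U; mRNA shown: G G A C C A U U U; codon positions numbered 1, 2, 3.]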

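  9. The Ribosome sets the reading frame. [Diagram: tRNAs carrying His and Met, with anti-codons G U A and U A C, reading the sequence C A U G C A U C in different frames; mRNA shown: A U G G G A A A G C G G A C C A U U U.]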
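
The reading-frame idea can be made concrete with a small Python sketch that translates the slide's sequence C A U G C A U C in each of its three frames. The function name `translate` is hypothetical, the table is the standard genetic code in one-letter amino-acid symbols (`*` marks a stop codon), and trailing bases that do not fill a codon are simply ignored.

```python
from itertools import product

# Standard genetic code, codons enumerated in U, C, A, G order.
BASES = "UCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)
}


def translate(mrna: str, frame: int = 0) -> str:
    """Translate `mrna` starting at offset `frame` (0, 1 or 2)."""
    codons = (mrna[i:i + 3] for i in range(frame, len(mrna) - 2, 3))
    return "".join(CODON_TABLE[codon] for codon in codons)


if __name__ == "__main__":
    message = "CAUGCAUC"
    for frame in range(3):
        print(frame, translate(message, frame))
    # 0 HA  (CAU = His, GCA = Ala)
    # 1 MH  (AUG = Met, CAU = His)
    # 2 CI  (UGC = Cys, AUC = Ile)
```

The same eight letters give three different peptides, which is why the frame has to be fixed by where translation starts.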

  10. Anatomy of mRNA. [Diagram: mRNA = 5’ UTR, AUG ... UAA, 3’ UTR; the region from AUG to UAA is translated into Protein.] UTR = untranslated region. mRNA is composed of EXONS. Not all of the mRNA necessarily serves as template for protein synthesis (hence 5’ and 3’ UTRs); therefore not all EXONS or parts of EXONS necessarily serve as template for protein synthesis.
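
As an illustration of this anatomy, the following Python sketch splits a mature mRNA into 5’ UTR, coding region and 3’ UTR. It is a simplification under stated assumptions: translation is taken to start at the first AUG and to end at the first in-frame stop codon, and the name `split_mrna` and the example sequence are invented for this sketch.

```python
STOP_CODONS = {"UAA", "UAG", "UGA"}


def split_mrna(mrna: str):
    """Return (five_prime_utr, coding_region, three_prime_utr), or None.

    Assumes the first AUG is the start codon and reads codon by codon
    until the first in-frame stop codon.
    """
    start = mrna.find("AUG")
    if start == -1:
        return None
    for i in range(start, len(mrna) - 2, 3):
        if mrna[i:i + 3] in STOP_CODONS:
            end = i + 3  # include the stop codon in the coding region
            return mrna[:start], mrna[start:end], mrna[end:]
    return None  # no in-frame stop codon found


if __name__ == "__main__":
    utr5, cds, utr3 = split_mrna("GGCACCAUGCCAGGAUAACUUAAA")
    print(utr5)  # GGCACC
    print(cds)   # AUGCCAGGAUAA
    print(utr3)  # CUUAAA
```

Every base in the example is exonic, yet only the middle segment is read by the ribosome, matching the slide's point that exons are not necessarily coding.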

  11. The Human Genome is estimated to have 25,000 – 30,000 genes. The estimate of 100,000 genes was a “back of the envelope” guess by a Harvard Professor in the mid-80’s: gene = 30,000 bp, genome = 3 billion bp, so 3 billion / 30,000 = 100,000 genes. Table from Lander ES, et al. Initial sequencing and analysis of the human genome. Nature. 2001 Feb 15;409(6822):860-921. Erratum in: Nature 2001 Aug 2;412(6846):565. Nature 2001 Jun 7;411(6838):720. PMID: 11237011 [PubMed - indexed for MEDLINE]

  12. Copied from NCBI

  13. Genome size does not correlate with complexity. YEAST: 0.012 x 10^9 bp, ~5,500 genes. HUMAN: 3 x 10^9 bp, ~30,000 genes. AMOEBA: 600 x 10^9 bp, ? genes.

  14. 1-2% of the human genome encodes proteins. REPEATS: 50%. GENES (exons, introns): 25%. ? (cis-regulation?): 15%. H: 10%. H = largely unsequenced heterochromatin.

  15. The human genome is AT-rich: G + C content = 41%. CG di-nucleotides are expected at a frequency of 0.2 x 0.2 = 0.04, BUT are observed only 1/5 as frequently as expected. Why? The C in CG is often methylated, and spontaneous de-amination converts the methylated C to T.
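
The expected-versus-observed comparison on this slide can be sketched in Python. Two assumptions are made here that are not in the talk: C and G are taken to be equally frequent (each half of the G + C content), and the observed/expected ratio uses the common convention (CG count x length) / (C count x G count). The function names are hypothetical.

```python
def expected_cpg_frequency(gc_content: float) -> float:
    """Expected CG dinucleotide frequency if bases occurred independently.

    Assumes C and G are equally frequent, i.e. each is gc_content / 2.
    """
    return (gc_content / 2) ** 2


def cpg_observed_over_expected(seq: str) -> float:
    """Ratio of observed CG frequency to the frequency expected by chance."""
    seq = seq.upper()
    n = len(seq)
    c, g = seq.count("C"), seq.count("G")
    if c == 0 or g == 0:
        return 0.0
    observed = seq.count("CG") / n
    expected = (c / n) * (g / n)
    return observed / expected


if __name__ == "__main__":
    # With G + C = 41%, the expectation is close to the slide's 0.2 x 0.2.
    print(round(expected_cpg_frequency(0.41), 3))
    # A CG-rich toy sequence scores well above 1; the slide reports that
    # the genome as a whole scores about 1/5.
    print(cpg_observed_over_expected("CGCGCG"))
```

A ratio near 1 means CG occurs about as often as chance predicts; values far below 1 reflect the depletion described on the slide.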

  16. CpG islands are associated with the beginning of genes. From: Lander ES, et al. Initial sequencing and analysis of the human genome. Nature. 2001 Feb 15;409(6822):860-921. Erratum in: Nature 2001 Aug 2;412(6846):565. Nature 2001 Jun 7;411(6838):720.

  17. 2 Major Classes of Repeats. Transposons: 45% of our genome. Simple Repeats: 3% of our genome; (A)n or (CA)n or (CGG)n, where n = 1 to 11; generally microsatellites, which exhibit great variation. Junk or “rich paleontological record”? 1 in 600 mutations in humans are due to transposons; 10% of mutations in mouse are due to transposons. Why?
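
Simple repeats of the form (A)n, (CA)n or (CGG)n are easy to locate with a regular expression. The Python sketch below is illustrative only: the function name `find_simple_repeats` and the default threshold of three tandem copies are assumptions made here, not values from the talk.

```python
import re


def find_simple_repeats(seq: str, unit: str, min_copies: int = 3):
    """Find tandem runs of `unit` with at least `min_copies` copies.

    Returns a list of (start_position, number_of_copies) tuples.
    """
    pattern = re.compile(f"(?:{re.escape(unit)}){{{min_copies},}}")
    return [
        (match.start(), len(match.group()) // len(unit))
        for match in pattern.finditer(seq)
    ]


if __name__ == "__main__":
    print(find_simple_repeats("TTCACACACAGG", "CA"))    # [(2, 4)]
    print(find_simple_repeats("AACGGCGGCGGTT", "CGG"))  # [(2, 3)]
```

Because the copy number at such a locus varies between individuals, comparing the reported counts across samples is what makes microsatellites useful as markers.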

  18. 4 TYPES OF TRANSPOSONS. LINES = long interspersed repeats (L1 still active). SINES = short interspersed repeats (ALU sequences). Diagram from Lander ES, et al. Initial sequencing and analysis of the human genome. Nature. 2001 Feb 15;409(6822):860-921. Erratum in: Nature 2001 Aug 2;412(6846):565. Nature 2001 Jun 7;411(6838):720. PMID: 11237011 [PubMed - indexed for MEDLINE]

  19. LINES = long interspersed repeats (L1 still active); spreads by “copy & paste”. [Diagram: DNA -> mRNA in the cell nucleus; mRNA in the cell cytoplasm; steps labeled 1 and 2.] Full-length LINE = 6 kb; encodes 2 ORFs; about 60-100 LINES still mobile; a new L1 jump in every 10-250 people born. 1. Reverse Transcriptase 2. endonuclease

  20. SINES: do not encode proteins. They take advantage of the LINEs’ machinery to move. Retrovirus-like transposons: like LINES except they make the double-stranded DNA in the cytoplasm. Encode 2 proteins: Reverse Transcriptase and Integrase. HIV and other Retroviruses have 2 extra genes: coat protein and envelope protein. DNA Transposons: a dying breed. They require virgin genomes to survive because they don’t have the advantage of “cis-preference”.

  21. CREATIVE or DESTRUCTIVE FORCE? 3’ transduction: LINEs have a tendency to transcribe DNA beyond their 3’ end and thereby move host DNA. [Diagram: MER85 5’, ORF (1.7 kb), MER85 3’.] Novel protein: closest sequence is the insect piggyBac transposon; expressed in fetal brain and cancer cells; maintained for 40-50 Myr. Other candidates: intronless genes. Most LINES are found in AT-rich, gene-poor regions: they integrate at TTTT/A.

  22. Alus accumulate in GC-rich, gene-rich regions! Why? Increased loss at AT regions? Selective benefit to retaining Alus near genes? They may be used in the stress response to mediate QUICK responses; e.g. they have been shown to promote translation. Graph from Lander ES, et al. Initial sequencing and analysis of the human genome. Nature. 2001 Feb 15;409(6822):860-921. Erratum in: Nature 2001 Aug 2;412(6846):565. Nature 2001 Jun 7;411(6838):720. PMID: 11237011 [PubMed - indexed for MEDLINE]

  23. Alu sequences are evenly spread out across most chromosomes (the exception is Chr. 19). Graph from Lander ES, et al. Initial sequencing and analysis of the human genome. Nature. 2001 Feb 15;409(6822):860-921. Erratum in: Nature 2001 Aug 2;412(6846):565. Nature 2001 Jun 7;411(6838):720. PMID: 11237011 [PubMed - indexed for MEDLINE]

  24. Gene Regulation. Odorant receptor (neurons). Drosomycin, anti-microbial peptide (liver, secreted into blood). Genomic Equivalence: all cells have the same DNA but they express only a subset of available genes. Berkeley Drosophila Genome Browser at www.fruitfly.org

  25. Gary Felsenfeld & Mark Groudine, NATURE | VOL 421 | 23 JANUARY 2003 | www.nature.com/nature; also in Alberts’ textbook Molecular Biology of the Cell

  26. simplified anatomy of a gene Slide from Mike Levine

  27. Changes in regulatory DNA cause changes in morphology Slide from Mike Levine

  28. in vivo assay for enhancer activity Slide from Mike Levine

  29. Regulatory DNA is modular Slide Courtesy of Mike Levine

  30. Enhancers can also be intronic. THE EXPERIMENT: A 263 bp cluster of Dorsal binding sites in the intron of a gene called “sog” was cloned and fused to a lacZ reporter. This fusion construct was injected into the fly germline to make transgenic flies. RESULT: In situ hybridization shows mRNA localization in fly embryos. The embryo on the left shows sog mRNA in blue; the embryo on the right shows lacZ mRNA in blue. Both patterns are about the same, indicating that the Dorsal cluster is sufficient to drive the sog pattern of expression. Markstein et al., Genome-wide analysis of clustered Dorsal binding sites identifies putative target genes in the Drosophila embryo. Proc Natl Acad Sci U S A. 2002 Jan 22;99(2):763-8. Epub 2001 Dec 18.

  31. Gene Regulation: Trafficking Problem

  32. Gene Regulation: Trafficking Problem. Promoter competition; Tethering Element; Insulator

  33. Butler and Kadonaga, Genes and Development, 2002

  34. Gene Regulation: Trafficking Problem. Promoter competition. Human: over half of transcription start sites are associated with CpG islands. Ohler, U., Liao, G.C., Niemann, H., and Rubin, G.M. Computational analysis of core promoters in the Drosophila genome. Genome Biology 3, RESEARCH0087. Epub 2002 Dec 20.

  35. Promoter-proximal tethering elements regulate enhancer-promoter specificity in the Drosophila Antennapedia complex. Vincent C. Calhoun, Angelike Stathopoulos, and Michael Levine. PNAS July 9, 2002 vol. 99 no. 14 9243–9247

  36. A Microarray Experiment involves RNA-DNA base pairing on spotted DNA chips. Learn all about microarrays at Pat Brown’s Homepage: http://cmgm.stanford.edu/pbrown/

  37. Spellman PT, Rubin GM. Evidence for large domains of similarly expressed genes in the Drosophila genome. J Biol. 2002;1(1):5. Epub 2002 Jun 18.

  38. Genes are organized into co-expression domains: on average about 10 genes per 100,000 bp (in flies). We don’t know what determines the boundaries or if they are functional. Weitzman JB. Transcriptional territories in the genome. J Biol. 2002;1(1):2. Epub 2002 Jun 25.

  39. OUTLINE (second half) 1. Identifying regulatory regions by phylogenetic comparisons in yeast 2. Phylogenetic comparisons in mouse-human 3. Ab initio predictions of enhancers in flies

  40. PHYLOGENETIC APPROACH IN YEAST Kellis M, Patterson N, Endrizzi M, Birren B, Lander ES. Sequencing and comparison of yeast species to identify genes and regulatory elements. Nature. 2003 May 15;423(6937):241-54.

  41. Kellis et al. 2003

  42. PHYLOGENETIC APPROACH IN MAMMALS. Loots GG, Locksley RM, Blankespoor CM, Wang ZE, Miller W, Rubin EM, Frazer KA. Identification of a coordinate regulator of interleukins 4, 13, and 5 by cross-species sequence comparisons. Science. 2000 Apr 7;288(5463):136-40.

  43. Ab initio Method of predicting enhancers: Scan the Genome for Clusters of Binding Sites. Cis-Analyst: http://rana.lbl.gov/cis-analyst/ Fly Enhancer: http://flyenhancer.org Cluster Buster: http://sullivan.bu.edu/cluster-buster/

  44. Defining TF binding sites. SELEX = systematic evolution of ligands by exponential enrichment. 1. Mix your TF with a pool of all possible 25-mers 2. Isolate 25-mers that bind your TF 3. Cut 25-mers out of gel and sequence. [Gel diagram: lanes without (-) and with (+) TF added to all 25-mers; bound 25-mers separate from free 25-mers.]

  45. SELEX Results for Dorsal: GGGAATTCCC, GGGAATTCCC, GGGTTATCCC, GGGAATTCCA (sequences recovered from the gel). Analyze about 30 independently obtained sequences. Consensus?
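
One simple way to approach the "consensus?" question is to tally the bases at each position of the aligned sequences. The Python sketch below uses only the four Dorsal sequences listed on this slide (the talk mentions about 30 in practice), assumes they are already aligned and of equal length, and takes the most frequent base per position; the function names are hypothetical.

```python
from collections import Counter

# The four example sequences shown on the slide.
SELEX_SEQUENCES = ["GGGAATTCCC", "GGGAATTCCC", "GGGTTATCCC", "GGGAATTCCA"]


def position_counts(sequences):
    """Count how often each base occurs at each aligned position."""
    return [Counter(column) for column in zip(*sequences)]


def consensus(sequences):
    """Build a consensus from the most frequent base at each position."""
    return "".join(
        counts.most_common(1)[0][0] for counts in position_counts(sequences)
    )


if __name__ == "__main__":
    for index, counts in enumerate(position_counts(SELEX_SEQUENCES), start=1):
        print(index, dict(counts))
    print(consensus(SELEX_SEQUENCES))  # GGGAATTCCC
```

The per-position counts are more informative than the single consensus string: positions 1 to 3 and 7 to 9 are invariant in this small sample, while positions 4 to 6 and 10 tolerate substitutions.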

  46. Berman BP, Nibu Y, Pfeiffer BD, Tomancak P, Celniker SE, Levine M, Rubin GM, Eisen MB. Exploiting transcription factor binding site clustering to identify cis-regulatory modules involved in pattern formation in the Drosophila genome. Proc Natl Acad Sci U S A. 2002 Jan 22;99(2):757-62.

  47. Berman et al., 2003

  48. Markstein M., unpublished data 2003
