Genome organization

Eukaryotic genomes are complex, and DNA amounts and organization vary widely between species. C-value paradox: the amount of DNA in the haploid cell of an organism is not related to its evolutionary complexity or number of genes.


Presentation Transcript


  1. Genome organization: Eukaryotic genomes are complex, and DNA amounts and organization vary widely between species.

  2. Genome Organization

  3. C-value paradox: The amount of DNA in the haploid cell of an organism is not related to its evolutionary complexity or number of genes.

  4. Highly Repeated Sequences

  5. There are different classes of eukaryotic DNA based on sequence complexity.

  6. Amount of DNA in a Genome Does Not Correlate with Complexity [Figure: genome sizes on a logarithmic axis from 10^5 to 10^12 base pairs]

  7. How many genes do humans have? The original estimate was between 50,000 and 100,000 genes. We now think humans have ~20,000 genes. How does this compare to other organisms? Mice have ~30,000 genes; pufferfish, ~35,000; nematodes (C. elegans), ~19,000; yeast (S. cerevisiae), ~6,000; the microbe responsible for tuberculosis, ~4,000.

  8. Single Copy Sequences: Exome

  9. Even the Amount of DNA a Gene Spans Differs Among Species

  10. Problems? Some gene products are RNA (tRNA, rRNA, others) instead of protein. Some nucleic acid sequences that do not encode gene products (noncoding regions) are necessary for production of the gene product (protein or RNA). Eukaryotic genes are complex!

  11. Gene Identification • Open reading frames • Sequence conservation (database searches, synteny) • Sequence features (CpG islands) • Evidence for transcription (ESTs, microarrays) • Gene inactivation (transformation, RNAi)
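
One of the sequence features listed above, CpG islands, can be screened for with a simple composition test. The sketch below is illustrative only: the thresholds (length of at least 200 bp, GC content above 50%, observed/expected CpG ratio above 0.6) are commonly quoted criteria and are assumptions here, not values from the slides, and the function names are hypothetical.

```python
def cpg_stats(seq):
    """Return (GC fraction, observed/expected CpG ratio) for a DNA string."""
    seq = seq.upper()
    n = len(seq)
    c = seq.count("C")
    g = seq.count("G")
    cpg = seq.count("CG")  # "CG" cannot overlap itself, so this counts every CpG
    gc_fraction = (c + g) / n if n else 0.0
    # Expected CpG count if C and G were distributed independently: C * G / N
    obs_exp = (cpg * n) / (c * g) if c and g else 0.0
    return gc_fraction, obs_exp


def looks_like_cpg_island(seq, min_len=200, min_gc=0.5, min_obs_exp=0.6):
    """Apply the assumed thresholds to a candidate window."""
    gc_fraction, obs_exp = cpg_stats(seq)
    return len(seq) >= min_len and gc_fraction > min_gc and obs_exp > min_obs_exp


if __name__ == "__main__":
    print(looks_like_cpg_island("CG" * 100))  # CpG-rich toy window
    print(looks_like_cpg_island("AT" * 100))  # no C or G at all
```

In practice a window like this would be slid along the genome; the fixed-window test here only shows the arithmetic behind the criterion.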

  12. Unique genes

  13. Noncoding regions • Regulatory regions (RNA polymerase binding site, transcription factor binding sites) • Introns • Polyadenylation [poly(A)] sites

  14. Splice Sites: Eukaryotes only. Removal of internal parts of the newly transcribed RNA. Takes place in the cell nucleus. Splice sites are difficult to predict.

  15. One gene, many proteins via alternative splicing, 3’ cleavage, and polyadenylation

  16. Exon Shuffling

  17. Trans-Splicing in Higher Eukaryotes Gingeras, Nature (2009) 461, 206-211

  18. Non-contiguous Transcription Generates an Enormous Number of Possible Transcripts • Trans-splicing exists in higher eukaryotes as well as in lower ones such as trypanosomes • Figure legend: blue, only co-linear; red, all combinations. Six 2-exon co-linear combinations from four exons; 325 combinations of 3 exons, non-co-linear • Reassortment of exons coding for ncRNA or protein domains could dramatically increase the number of functional products beyond the number of ‘genes’ (Gingeras, Nature (2009) 461, 206-211)
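
The count of six 2-exon co-linear combinations from four exons can be checked by enumeration. This is a minimal sketch with hypothetical exon labels; it assumes that "co-linear" means the exons keep their genomic order, and it does not attempt to reproduce the 325 figure, whose underlying exon set is not stated in the transcript.

```python
from itertools import combinations, permutations

exons = ["E1", "E2", "E3", "E4"]  # hypothetical labels, in genomic order

# Co-linear: exons keep their genomic order, so each pair is an unordered choice.
colinear_pairs = list(combinations(exons, 2))

# Order-free joining: any exon may precede any other, so order matters.
any_order_pairs = list(permutations(exons, 2))

print(len(colinear_pairs))   # 6
print(len(any_order_pairs))  # 12
```

Even in this toy case, dropping the co-linearity constraint doubles the number of 2-exon products, which is the slide's point about non-contiguous transcription.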

  19. Why genome size isn’t the only concern (size doesn’t matter?) • More sophisticated regulation of expression? • Proteome vastly larger than genome? • Alternative splicing • RNA editing • Post-translational modifications? • Cellular location? • Moonlighting

  20. Gene families: e.g. globins, actin, myosin. Clustered or dispersed. Pseudogenes.

  21. Pseudogenes: Nonfunctional copies of genes. Formed by duplication of an ancestral gene, or by reverse transcription (and integration). Not expressed due to mutations that produce a stop codon (nonsense or frameshift) or prevent mRNA processing, or due to lack of regulatory sequences.

  22. Duplicated genes: Encode closely related (homologous) proteins. Formed by duplication of an ancestral gene followed by mutation • Five functional genes and two pseudogenes

  23. Paralogs vs Orthologs: Different members of the globin gene family are paralogs, having evolved one from another through gene duplication. Paralogs are separated by a gene duplication event. Each specific gene family member (e.g. a specific gene in human) is an ortholog of the same family member in another species (e.g. mouse). Both evolved from an ancestral globin gene. Orthologs are separated by a speciation event. It is not always easy to distinguish true orthologs from paralogs, especially in polyploid organisms!
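
The distinction above can be made concrete: two genes are orthologs if their most recent common ancestor in the gene tree is a speciation event, and paralogs if it is a duplication. The sketch below uses a hypothetical four-gene globin tree; the gene names, node names, and event labels are all assumptions made for illustration, not data from the slides.

```python
# Hypothetical toy gene tree: each node maps to its parent node.
PARENT = {
    "human_alpha": "alpha_split",
    "mouse_alpha": "alpha_split",
    "human_beta": "beta_split",
    "mouse_beta": "beta_split",
    "alpha_split": "ancestral_globin",
    "beta_split": "ancestral_globin",
}

# Event assumed at each internal node of the toy tree.
EVENT = {
    "alpha_split": "speciation",
    "beta_split": "speciation",
    "ancestral_globin": "duplication",
}


def ancestors(node):
    """Return the path from the node's parent up to the root."""
    path = []
    while node in PARENT:
        node = PARENT[node]
        path.append(node)
    return path


def relationship(gene_a, gene_b):
    """Classify two genes by the event at their most recent common ancestor."""
    seen = set(ancestors(gene_a))
    for node in ancestors(gene_b):
        if node in seen:
            return "orthologs" if EVENT[node] == "speciation" else "paralogs"
    raise ValueError("genes do not share an ancestor in this tree")


if __name__ == "__main__":
    print(relationship("human_alpha", "mouse_alpha"))  # orthologs
    print(relationship("human_alpha", "human_beta"))   # paralogs
```

The hard part in real data is inferring the event labels, which is why the slide warns that orthologs and paralogs are not always easy to tell apart.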

  24. Protein-coding sequences: less than 1.5% of the genome in humans!

  25. Noncoding RNAs (ncRNA): Do not have translated ORFs. Small. Not polyadenylated.

  26. Functions of Known lncRNAs • Transcriptional interference - lncRNA transcription turns off transcription of nearby gene • Initiation of chromatin remodeling - lncRNA transcription turns on transcription of nearby gene • Promoter inactivation - lncRNA binds to TFIIB and to promoter DNA • Activation of an accessory protein - lncRNA binds to allosteric effector protein TLS and inhibits histone acetyltransferase, decreasing transcription (Ponting et al., Cell (2009) 136, 629-641)

  27. Functions of Known lncRNAs • Activation of transcription factors - binding of lncRNA to Dlx2 activates Dlx5/6 activity • Oligomerization of an accessory protein - lncRNA induces heat shock factor trimerization • Transport of transcription factors - lncRNA NRON keeps NFAT out of nucleus • Epigenetic silencing of gene clusters - Xist RNA inactivates X chromosome • Epigenetic repression of genes in trans - HOTAIR binds PRC2, leading to methylation and silencing of several genes in HOXD locus (Ponting et al., Cell (2009) 136, 629-641)

  28. ncRNA • ~97-98% of the transcriptional output of the human genome is ncRNA • Introns • Transfer RNAs (tRNA): ~500 tRNA genes in the human genome • Ribosomal RNAs: tandem arrays on several chromosomes; 150-200 copies of the 28S – 5.8S – 18S cluster; 200-300 copies of the 5S cluster

  29. Genome Organization - ncRNA: The level of transcription from human chromosomes 21 and 22 is an order of magnitude higher than can be accounted for by known or predicted exons. Almost half of all transcripts from well-constructed mouse cDNA libraries are ncRNAs (identified because they do not contain an open reading frame longer than 100 codons).
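
The 100-codon criterion mentioned above can be sketched as a longest-ORF scan. This is an illustrative approximation, not the pipeline used in the cited cDNA studies: it assumes the input is the sense strand, scans only the three forward reading frames, requires an ATG start and an in-frame stop, and uses hypothetical function names.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def longest_orf_codons(seq):
    """Longest ORF length in codons (ATG up to, not including, the stop)
    across the three forward reading frames."""
    seq = seq.upper().replace("U", "T")  # accept RNA or DNA input
    best = 0
    for frame in range(3):
        start = None  # codon index of the current ORF's ATG, if one is open
        codons = [seq[i:i + 3] for i in range(frame, len(seq) - 2, 3)]
        for idx, codon in enumerate(codons):
            if start is None and codon == "ATG":
                start = idx
            elif start is not None and codon in STOP_CODONS:
                best = max(best, idx - start)
                start = None
    return best


def is_putative_ncrna(seq, max_codons=100):
    """Flag a transcript as noncoding if no ORF exceeds max_codons."""
    return longest_orf_codons(seq) <= max_codons


if __name__ == "__main__":
    short_orf = "ATG" + "GCC" * 10 + "TAA"
    long_orf = "ATG" + "GCC" * 120 + "TAA"
    print(is_putative_ncrna(short_orf), is_putative_ncrna(long_orf))
```

A length cutoff like this misclassifies transcripts that encode short peptides, so it is best read as a first-pass filter.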

  30. Repeat sequences – 50% or more of the genome

  31. Repetitive DNA • Moderately repeated DNA: tandemly repeated rRNA, tRNA and histone genes (gene products needed in high amounts); large duplicated gene families; mobile DNA • Simple-sequence DNA: tandemly repeated short sequences; found in centromeres and telomeres (and elsewhere); used in DNA fingerprinting to identify individuals

  32. Segmental duplications: Found especially around centromeres and telomeres. Often come from nonhomologous chromosomes. Many can come from the same source. Tend to be large (10 to 50 kb). Unique to humans?

  33. Repetitive DNA - Segmental duplications

  34. Mobile DNA • Moves within genomes • Most of the moderately repeated DNA sequences found throughout higher eukaryotic genomes • L1 LINE is ~5% of human DNA (~50,000 copies) • Alu is ~5% of human DNA (>500,000 copies) • Some encode enzymes that catalyze movement
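
The percentages and copy numbers above imply an average length per copy, which makes a quick sanity check. The sketch below assumes a haploid human genome of about 3.2 billion base pairs, a figure that is not given on the slide; the function name is hypothetical.

```python
HUMAN_GENOME_BP = 3.2e9  # assumed haploid genome size; not stated on the slide


def mean_copy_length(genome_fraction, copies, genome_bp=HUMAN_GENOME_BP):
    """Average base pairs per copy implied by a genome fraction and copy number."""
    return genome_fraction * genome_bp / copies


l1_mean = mean_copy_length(0.05, 50_000)    # L1 LINE: ~5% of DNA, ~50,000 copies
alu_mean = mean_copy_length(0.05, 500_000)  # Alu: ~5% of DNA, >500,000 copies

print(round(l1_mean), round(alu_mean))  # 3200 320
```

Because the slide gives the Alu copy number as a lower bound, the 320 bp result is an upper bound on the implied average Alu length.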

  35. Repetitive DNA – Highly repetitive satellite DNA