
The evolution of expression patterns in the Arabidopsis genome

The evolution of expression patterns in the Arabidopsis genome. Todd Vision Department of Biology University of North Carolina at Chapel Hill. Driving forces in genome evolution. Proximate vs. ultimate explanations


Presentation Transcript


  1. The evolution of expression patterns in the Arabidopsis genome • Todd Vision • Department of Biology • University of North Carolina at Chapel Hill

  2. Driving forces in genome evolution • Proximate vs. ultimate explanations • Deleterious mutations are frequent and selection cannot effectively act on all of them • Substitutions • Insertions and deletions • Duplications • Transpositions • Cellular processes will be affected by this rain of mutations • At the molecular level, we must entertain ultimate explanations that do not invoke adaptation

  3. An example: Codon bias • Genes differ in how often they use the preferred codon for a given amino acid, which affects • Translational efficiency • Translational accuracy • The strongest codon bias is typically seen in short, highly expressed genes under strong purifying selection • Realized codon bias is a balance between selection for preferred codons and a continual rain of mutations toward unpreferred codons

  4. What are the consequences of mutational rain on the regulatory networks that modulate gene expression?

  5. Outline • Arabidopsis gene expression (MPSS) • Two evolutionary issues in the evolution of expression profiles: • Physical clustering of co-expressed genes • Divergence of duplicated genes

  6. Digital expression profiling • “Bar-code” counting raises fewer concerns about cross-hybridization, probe selection, background hybridization, etc. • Serial Analysis of Gene Expression (SAGE) • Count occurrence of 10-12 bp mRNA signatures • Long SAGE: 21-22 bp signatures • Uses conventional sequencing technology • Massively Parallel Signature Sequencing (MPSS) • Count occurrence of 17-20 bp mRNA signatures • Cloning and sequencing is done on microbeads • Commercialized by Lynx Therapeutics

  7. MPSS library construction (Brenner et al., PNAS 97:1665-70) • Extract mRNA from tissue • Convert to cDNA • Add linker • Cut w/ Sau3A (GATC) • 3’ - Add unique 32 bp tag and standard primer • 5’ - Add standard primer (added by cloning) • Anneal to beads coated with unique anti-tag (32 bp, complementary to tag on mRNA) • PCR • Remove 3’ primer and expose single stranded unique tag (digest, 3’ to 5’ exonuclease)

  8. MPSS library construction (Brenner et al., PNAS 97:1665-70) • Sort by FACS to remove ‘empty’ beads • The result of the library construction is a set of microbeads. • Each bead contains many DNA molecules, all derived from the 3’ end of a single transcript. • Beads are loaded in a monolayer on a microscope slide for the sequencing of 17-20 bp from the 5’ end.

  9. MPSS Sequencing (Brenner et al., Nat. Biotech. 18:630-4) • Sequence by hybridization • Add adaptors • 16 cycles for 4 bp • Digest with Type IIS enzyme to uncover next 4 bases • Repeat cycle • Steps of four bases; overhang is shifted by four bases in each round

  10. MPSS Sequencing • Each bead provides a signature of 17-20 bp • Total # of tags: >1,000,000 • Two sets of signatures are generated from each sample in different reading frames staggered by two bases

      Tag #     Signature sequence    # of beads (frequency)
      1         GATCAATCGGACTTGTC          2
      2         GATCGTGCATCAGCAGT         53
      3         GATCCGATACAGCTTTG        212
      4         GATCTATGGGTATAGTC        349
      5         GATCCATCGTTTGGTGC        417
      6         GATCCCAGCAAGATAAC        561
      7         GATCCTCCGTCTTCACA        672
      8         GATCACTTCTCTCATTA        702
      9         GATCTACCAGAACTCGG        814
      ...       ...                      ...
      30,285    GATCGGACCGATCGACT      2,935

  11. A catalog of signatures in the Arabidopsis genome • All potential signatures (GATC + 13 bp) are identified on both strands of the genomic sequence. • There is one potential signature approximately every 293 bp on each strand of the genome • A signature is classified according to its position relative to the 29,084 genes & pseudogenes in the TIGR annotation • Signatures may not be unique. The number of ‘hits’ in the genome is recorded

      “Hits”    At genome    % of total     Random
      1           748204       87.407%      845057
      2            88392       10.326%        6134
      3            11019        1.287%          21
      4             3512        0.410%           0
      5             1452        0.170%           0
      6              874        0.102%           0
      7              470        0.055%           0
      8              326        0.038%           0
      9              237        0.028%           0
      10             192        0.022%           0
      11             158        0.018%           0
      12-20          707        0.083%           0
      21-30          247        0.029%           0
      31-50          124        0.014%           0
      > 50            86        0.010%           0
      Total      851,212                   851,212
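The cataloguing step on this slide (find every GATC + 13 bp signature on both strands, then record how many times each one occurs) can be sketched roughly as follows. This is an illustrative sketch only, not the pipeline used for the talk: the function names are hypothetical, the genome is assumed to be a plain string of A/C/G/T, and overlapping GATC sites are each counted.

```python
from collections import Counter

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an A/C/G/T string."""
    return seq.translate(_COMPLEMENT)[::-1]


def potential_signatures(seq: str, site: str = "GATC", tail: int = 13):
    """Yield every full-length signature (site + `tail` bases) on one strand."""
    length = len(site) + tail
    start = seq.find(site)
    while start != -1:
        signature = seq[start:start + length]
        if len(signature) == length:  # skip sites too close to the sequence end
            yield signature
        start = seq.find(site, start + 1)


def count_hits(genome: str) -> Counter:
    """Map each potential signature to its number of 'hits' on both strands."""
    genome = genome.upper()
    hits = Counter(potential_signatures(genome))
    hits.update(potential_signatures(reverse_complement(genome)))
    return hits


def unique_signatures(hits: Counter) -> set:
    """Signatures that occur exactly once in the genome."""
    return {sig for sig, n in hits.items() if n == 1}
```

A signature with a hit count above 1 is ambiguous: an expressed tag with that sequence could have come from any of its genomic locations, which is why the hit count is recorded alongside each signature.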

  12. Classifying signatures • Triangles refer to colors used on our web page: • Class 1 - in an exon, same strand as ORF. • Class 2 - within 500 bp after stop codon, same strand as ORF. • Class 3 - anti-sense of ORF (like Class 1, but on opposite strand). • Class 4 - in genome but NOT class 1, 2, 3, 5 or 6. • Class 5 - entirely within intron, same strand. • Class 6 - entirely within intron, anti-sense. • Class 0 - signatures found in the expression libraries but not the genome. • Grey = potential signature NOT expressed • Figure labels: Typical signatures; Duplicated: expression may be from other site in genome; Potential alternative splicing or nested gene; Potential alternative termination; Anti-sense transcript or nested gene?; Potential anti-sense transcript; Potential un-annotated ORF
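The class definitions above can be read as a small decision rule. The sketch below is one possible reading, under assumptions that are not in the slides: the `Gene` record and `classify` function are hypothetical, coordinates are inclusive, "in an exon" is taken to mean overlapping any coding exon, and a signature is tested against a single gene at a time. Class 0 (signature absent from the genome) would be assigned before this step and is not handled here.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Gene:
    strand: str                       # "+" or "-"
    cds_exons: List[Tuple[int, int]]  # sorted, inclusive genomic coordinates (assumed coding exons)
    stop: int                         # outermost genomic coordinate of the stop codon


def overlaps(a_start: int, a_end: int, b_start: int, b_end: int) -> bool:
    return a_start <= b_end and b_start <= a_end


def classify(start: int, end: int, strand: str, gene: Gene, window: int = 500) -> int:
    """Return the signature class (1-6) relative to one gene."""
    same_strand = strand == gene.strand

    # Classes 1 and 3: signature touches an exon (sense or anti-sense).
    if any(overlaps(start, end, s, e) for s, e in gene.cds_exons):
        return 1 if same_strand else 3

    # Classes 5 and 6: signature lies entirely within an intron.
    introns = [(e1 + 1, s2 - 1)
               for (_, e1), (s2, _) in zip(gene.cds_exons, gene.cds_exons[1:])]
    if any(s <= start and end <= e for s, e in introns):
        return 5 if same_strand else 6

    # Class 2: within `window` bp downstream of the stop codon, same strand.
    if same_strand:
        if gene.strand == "+":
            downstream = gene.stop < start and end <= gene.stop + window
        else:
            downstream = gene.stop - window <= start and end < gene.stop
        if downstream:
            return 2

    # Class 4: in the genome but none of the above.
    return 4
```

In a full annotation a signature can fall near two genes at once, which is presumably where secondary classes come from; this sketch ignores that case.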

  13. Arabidopsis signatures • Based on TIGR annotation (release 3.0, July 2002) • 355 genes lack potential Class 1 or 2 signatures (undetectable) • On average, there are 8.5 class 1 & 2 signatures per gene • 8422 genomic signatures have secondary classes due to overlap or near overlap of two genes in the TIGR annotation.

      Class    Description            # in genome    % of total
      1        sense exonic               203,174          24.0
      2        3’UTR, <500 bp              44,202           5.2
      3        anti-sense exonic          197,065          23.3
      4        inter-genic                288,109          34.0
      5        intronic                    60,817           7.2
      6        anti-sense intronic         57,845           6.8
      TOTAL                               851,212         100.5

  14. Core Arabidopsis MPSS libraries, sequenced by Lynx for Blake Meyers, U. of Delaware

      Library    Signatures sequenced    Distinct signatures
      Root                  3,645,414                 48,102
      Shoot                 2,885,229                 53,396
      Flower                1,791,460                 37,754
      Callus                1,963,474                 40,903
      Silique               2,018,785                 38,503
      TOTAL                12,304,362                133,377

  15. Genome-wide expression profiling: Arabidopsis • Panels: Chr. I, Chr. II, Chr. III, Chr. IV, Chr. V • Of the 29,084 gene models, 14,674 match unique, expressed signatures

  16. http://www.dbi.udel.edu/mpss • Query by • Sequence • Arabidopsis gene identifier • chromosomal position • BAC clone ID • MPSS signature • Library comparison • Site includes • Library and tissue information • FAQs and help pages

  17. Outline • Arabidopsis gene expression (MPSS) • Two evolutionary issues in the evolution of expression profiles: • Physical clustering of co-expressed genes • Divergence of duplicated genes

  18. Physical clustering of co-expression • Caenorhabditis elegans: Roy et al. (2002) Nature 418, 975; Lercher et al. (2003) Genome Research 13, 238 • Drosophila melanogaster: Boutanaev et al. (2002) Nature 420, 666; Spellman and Rubin (2002) J Biology 1, 5 • Homo sapiens: Caron et al. (2001) Science 291, 1289; Lercher et al. (2002) Nature Genetics 31, 180 • Saccharomyces cerevisiae: Cohen et al. (2000) Nature Genetics 26, 183; Hurst et al. (2002) Trends in Genetics 18, 604; Mannila et al. (2002) Bioinformatics 18, 482 • What are the proximate explanations? • shared cis-regulatory elements • chromatin packaging, etc. • What are the ultimate explanations? • Adaptive: greater transcriptional efficiency/accuracy? • Maladaptive: mutational rain chipping away at insulators and other mechanisms that over-ride regional controllers of gene expression?

  19. Measuring expression distance • Axes: library 1, library 2, library 3

  20. Clustering of tissue-specific expression • Chromosome 1 • Flower (red), Silique (violet), Leaf (green), Root (blue), Callus (white)

  21. Statistical tests of coexpression clustering • Measured median pairwise expression distance (MPED) in non-overlapping windows of 20 genes • Summed unique class 1 and 2 signatures for each gene • Only one gene within each tandemly arrayed family was counted • Out of 100 shuffles of gene order • Zero shuffles had as many windows with small MPED (less than 1.5) as the unshuffled data • Zero shuffles had as large a variance in MPED among windows as the unshuffled data
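The shuffle test described on this slide can be sketched as below. The window size (20 genes), threshold (1.5) and number of shuffles (100) come from the slide; everything else is an assumption made for illustration. In particular, the slides do not define the expression distance, so plain Euclidean distance between per-gene expression profiles stands in for it here, and the function names are hypothetical.

```python
import math
import random
from itertools import combinations
from statistics import median, pvariance


def expression_distance(p, q):
    # ASSUMPTION: Euclidean distance between profiles; the slides do not
    # specify the metric actually used.
    return math.dist(p, q)


def window_mpeds(profiles, window=20):
    """Median pairwise expression distance in each non-overlapping window."""
    mpeds = []
    for i in range(0, len(profiles) - window + 1, window):
        genes = profiles[i:i + window]
        mpeds.append(median(expression_distance(p, q)
                            for p, q in combinations(genes, 2)))
    return mpeds


def shuffle_test(profiles, n_shuffles=100, window=20, threshold=1.5, seed=0):
    """Compare the observed gene order against random gene orders.

    Returns the observed number of low-MPED windows, the observed variance
    of MPED among windows, and how many shuffles matched or exceeded each.
    """
    rng = random.Random(seed)
    observed = window_mpeds(profiles, window)
    obs_small = sum(m < threshold for m in observed)
    obs_var = pvariance(observed)

    ge_small = 0
    ge_var = 0
    shuffled = list(profiles)
    for _ in range(n_shuffles):
        rng.shuffle(shuffled)
        mpeds = window_mpeds(shuffled, window)
        if sum(m < threshold for m in mpeds) >= obs_small:
            ge_small += 1
        if pvariance(mpeds) >= obs_var:
            ge_var += 1
    return obs_small, obs_var, ge_small, ge_var
```

Here `profiles` would be a list of per-gene expression vectors in chromosomal order, with tandem family members already collapsed to one gene. A result of zero for both shuffle counts corresponds to the outcome reported on the slide.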

  22. Coexpression in Arabidopsis

  23. Coexpression in Arabidopsis

  24. Coexpression in Arabidopsis

  25. Selection and recombination • In regions of low recombination • deleterious mutations can hitch-hike to high frequency along with favorable ones • favorable mutations are kept at low frequency by linkage to deleterious ones • Therefore, the effectiveness of natural selection is causally related to recombination rate • Are clusters more concentrated in regions of • high recombination (i.e. are they adaptive) • low (i.e. are they maladaptive)?

  26. Measuring recombination rate Chromosome 1

  27. Co-expression is greater in low recombination regions

  28. Co-expression clusters • MPSS data provide evidence for clusters of co-expression among non-related genes in Arabidopsis • Co-expression is greater in regions of low recombination • Thus, co-expression clusters may be maladaptive, at least on average

  29. Outline • Arabidopsis gene expression (MPSS) • Two evolutionary issues in the evolution of expression profiles: • Physical clustering of co-expressed genes • Divergence of duplicated genes

  30. Divergence of duplicated genes Expression distance Age of duplication

  31. Duplicated genes in Arabidopsis

  32. Modes of gene duplication • Tandem (unequal crossing-over) • Dispersed (transposition) • Segmental (polyploidy)

  33. Divergence of duplicated genes • All gene families of size 2 in Arabidopsis were classified as ‘dispersed’, ‘segmental’ or ‘tandem’ • Expression distance was calculated for each • The number of silent (i.e. synonymous) substitutions per site was calculated for each (as a proxy for age since duplication)

  34. Divergence and mode of duplication

  35. Divergence of duplicated genes • Almost all expression divergence occurs during (or immediately following) duplication • Initial expression divergence is more extreme for tandem than dispersed duplicates • Tandem and dispersed duplicates with the most divergent expression profiles are quickly lost • Segmental duplicates plateau at a lower level of expression divergence than dispersed duplicates • The average divergence in relative expression level in each tissue is about 8-fold.

  36. Lessons learned • Clusters of co-expression in Arabidopsis may be largely the result of a rain of weakly deleterious mutations that homogenize the expression profiles of neighboring genes • Divergence in expression profile between duplicated genes is dependent on the nature of the mutation that gave rise to the duplication

  37. Thanks! • UNC Chapel Hill • Jianhua Hu • University of Delaware • Blake Meyers • NSF Plant Genome Research Program • DBI-01103267 (TJV) • DBI-0110528 (BCM)
