
Putting gene family evolution in its chromosomal context




  1. Putting gene family evolution in its chromosomal context • Todd Vision, Department of Biology, University of North Carolina at Chapel Hill

  2. Outline • Gene order rearrangement in plants • Chromosomal perspective • Gene family perspective • Gene duplication and functional divergence • Segmental duplications as a tool

  3. Chromosomal perspective • Biological importance • Clustering of gene function • Clustering of transcriptional activity • Applied importance • Conservation of gene order (synteny)

  4. Devos and Gale 2000 Plant Cell 12, 637

  5. Arabidopsis as a hub for plant comparative maps • Arumuganathan and Earle 1991 Plant Mol Biol Rep 9, 208.

  6. Arabidopsis paleopolyploidy • The Arabidopsis Genome Initiative 2000 Nature 408, 796

  7. Non-overlapping syntenies

  8. Blanc et al. 2003 Genome Res. 13, 137.

  9. Blanc and Wolfe 2004 Plant Cell 16, 1667.

  10. Tomato-Arabidopsis synteny • Bancroft 2001 TIG 17, 89, after Ku et al. 2000 PNAS 97, 9121.

  11. Rice-Arabidopsis microsynteny • Mayer et al. 2001 Genome Res. 11, 1167.

  12. Hidden syntenies • Simillion et al. 2002 PNAS 99, 13627.

  13. Interspecies comparison can reveal hidden syntenies • Vandepoele et al. 2002 TIG 18, 606.

  14. Simillion et al. 2004 Genome Res. 14, 1095

  15. From descriptive to predictive • Can we predict the gene content of homologous segments when markers are sparse? • Utility for QTL mapping • Prioritize candidate genes in a QTL region from a non-sequenced genome • Provide markers for fine-mapping

  16. Hidden Markov Models (HMM) • [Figure: hidden states 1 and 2 plus an end state; transition probabilities $t_{1,1}$, $t_{1,2}$, $t_{2,2}$, $t_{2,\mathrm{end}}$; emission probabilities $p_1(a)$, $p_1(b)$, $p_2(a)$, $p_2(b)$] • Observed states: a → b → a • Hidden states: 1 → 1 → 2 → end • Probability: $p_1(a)\,t_{1,1}\,p_1(b)\,t_{1,2}\,p_2(a)\,t_{2,\mathrm{end}}$
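The path probability on this slide is a running product of two terms for each observed symbol: one emission term and one transition term. Below is a minimal Python sketch. Every numeric value in `emit` and `trans` is made up for illustration. As on the slide, it assumes the path starts in state 1 with probability 1, so no initial-state term appears.

```python
# Hypothetical two-state HMM mirroring the slide: hidden states 1 and 2 plus
# a silent "end" state. All numeric values are invented for illustration.
emit = {  # emission probabilities p_k(symbol)
    1: {"a": 0.7, "b": 0.3},
    2: {"a": 0.2, "b": 0.8},
}
trans = {  # transition probabilities t_{k,l}
    1: {1: 0.6, 2: 0.4},
    2: {2: 0.9, "end": 0.1},
}


def path_probability(observed, hidden):
    """Joint probability of an observed sequence and one hidden path.

    `hidden` lists one state per observed symbol and finishes with "end";
    the first hidden state is taken as given (probability 1).
    """
    assert len(hidden) == len(observed) + 1 and hidden[-1] == "end"
    prob = 1.0
    for i, symbol in enumerate(observed):
        state = hidden[i]
        prob *= emit[state][symbol]          # p_state(symbol)
        prob *= trans[state][hidden[i + 1]]  # t_{state, next state}
    return prob


# The slide's example: observed a -> b -> a, hidden 1 -> 1 -> 2 -> end.
p = path_probability(["a", "b", "a"], [1, 1, 2, "end"])
print(p)
```

Summing this quantity over all hidden paths (the forward algorithm) would give the probability of the observations alone.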

  17. A gene content HMM • Observed states • a homologous gene is either observed or not • Hidden states • presence or absence of gene within a segment • Emission probabilities • A gene will be unobserved if it is not present • A gene may be unobserved even if it is present • Dependent on the density of the gene map • Transition probabilities • reflect conservation of gene content along the branches of a phylogeny
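The emission rules on this slide have two consequences. An unobserved gene is still ambiguous, while an observed gene must be present. Bayes' rule makes this concrete.

The sketch below rests on these assumptions:
- `detect` is a hypothetical probability that a present gene is actually placed on the map; it stands in for map density.
- `prior_present` is a stand-in for the probability that, in the full model, would come from the transition probabilities on the segment phylogeny.

```python
def emission_prob(observed: bool, present: bool, detect: float) -> float:
    """P(observation | hidden state) in a simplified gene content model.

    `detect` is a hypothetical detection probability reflecting map density.
    """
    if not present:
        return 0.0 if observed else 1.0  # an absent gene is never observed
    return detect if observed else 1.0 - detect  # a present gene may be missed


def posterior_present(observed: bool, prior_present: float, detect: float) -> float:
    """Bayes' rule: P(gene present | observation).

    `prior_present` is an assumed stand-in for the tree-based prior.
    """
    num = prior_present * emission_prob(observed, True, detect)
    den = num + (1.0 - prior_present) * emission_prob(observed, False, detect)
    return num / den


# A gene missing from a sparse map (low detect) is more likely to be present
# than one missing from a dense map (high detect).
for detect in (0.2, 0.9):
    print(detect, round(posterior_present(False, 0.5, detect), 3))
```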

  18. Transition probabilities and the segment phylogeny

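  19. [Figure: three transition models] • Loss (L): states P (present) and A (absent); P → P with probability 1-a, P → A with probability a; A → A with probability 1 • Loss-Gain (LG): states A1, P, A2; A1 → A1 with 1-b, A1 → P with b; P → P with 1-a, P → A2 with a; A2 → A2 with 1 • Multiple Loss-Gain (MLG): as LG, but with loss probabilities ai in place of a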
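The Loss-Gain (LG) diagram on this slide can be read as a three-state transition matrix over A1 (not yet gained), P (present) and A2 (lost). It has gain probability b and loss probability a.

The Python sketch below builds that matrix and propagates a state distribution. It treats a path of k branches as k discrete steps; that is an assumption for illustration, since the slide does not say how branch lengths enter the model.

The Loss (L) model is the same chain started in P, because A1 is then unreachable. MLG would replace the single a with several loss probabilities.

```python
def lg_matrix(a: float, b: float) -> list[list[float]]:
    """LG transition matrix; rows are from-states, columns to-states (A1, P, A2)."""
    return [
        [1.0 - b, b, 0.0],   # A1: gained with probability b
        [0.0, 1.0 - a, a],   # P: lost with probability a
        [0.0, 0.0, 1.0],     # A2: lost for good (absorbing)
    ]


def step(dist: list[float], matrix: list[list[float]]) -> list[float]:
    """One row-vector times matrix multiplication."""
    n = len(dist)
    return [sum(dist[i] * matrix[i][j] for i in range(n)) for j in range(n)]


def distribution_after(k: int, start: list[float], a: float, b: float) -> list[float]:
    """State distribution after k steps (assumed: one step per branch)."""
    m = lg_matrix(a, b)
    dist = list(start)
    for _ in range(k):
        dist = step(dist, m)
    return dist


# Starting in P reproduces the Loss model: P(still present) = (1 - a) ** k.
print(distribution_after(3, [0.0, 1.0, 0.0], a=0.1, b=0.2))
```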

  20. Estimating model parameters • Segment phylogeny • Each set of homologous genes is missing from some segments • Estimate an “averaged” distance matrix • Build tree with neighbor-joining and midpoint rooting • HMM parameter estimation • Loss rate(s) • Gain rate • Number of genes present at the root
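The segment tree on this slide is built from a distance matrix with neighbor-joining. Below is a textbook neighbor-joining sketch in Python that returns an unrooted Newick string.

How the talk's "averaged" matrix is computed is not described on the slides. The toy matrix here is an invented, perfectly additive example. Midpoint rooting is left out.

```python
from itertools import combinations


def neighbor_joining(dist):
    """Neighbor-joining on a symmetric distance matrix given as a dict of dicts.

    Returns an unrooted tree in Newick format with branch lengths.
    Midpoint rooting, the slide's second step, is not implemented here.
    """
    d = {x: dict(row) for x, row in dist.items()}  # working copy we can extend
    nodes = list(d)
    if len(nodes) < 3:
        raise ValueError("need at least three segments")
    while len(nodes) > 3:
        n = len(nodes)
        r = {x: sum(d[x][y] for y in nodes if y != x) for x in nodes}
        # Join the pair minimising Q(i, j) = (n - 2) d(i, j) - r(i) - r(j).
        i, j = min(combinations(nodes, 2),
                   key=lambda p: (n - 2) * d[p[0]][p[1]] - r[p[0]] - r[p[1]])
        li = 0.5 * d[i][j] + (r[i] - r[j]) / (2 * (n - 2))  # branch to i
        lj = d[i][j] - li                                   # branch to j
        new = f"({i}:{li:.3f},{j}:{lj:.3f})"
        d[new] = {}
        for k in nodes:
            if k not in (i, j):
                # Distance from the new internal node to every remaining node.
                d[new][k] = d[k][new] = 0.5 * (d[i][k] + d[j][k] - d[i][j])
        nodes = [k for k in nodes if k not in (i, j)] + [new]
    # Three nodes left: solve for the three branch lengths to the centre.
    x, y, z = nodes
    lx = 0.5 * (d[x][y] + d[x][z] - d[y][z])
    ly = 0.5 * (d[x][y] + d[y][z] - d[x][z])
    lz = 0.5 * (d[x][z] + d[y][z] - d[x][y])
    return f"({x}:{lx:.3f},{y}:{ly:.3f},{z}:{lz:.3f});"


# Invented additive distances between four hypothetical segments A-D.
toy = {
    "A": {"B": 3, "C": 5, "D": 6},
    "B": {"A": 3, "C": 6, "D": 7},
    "C": {"A": 5, "B": 6, "D": 7},
    "D": {"A": 6, "B": 7, "C": 7},
}
tree = neighbor_joining(toy)
print(tree)
```

Because the toy distances are additive, the sketch recovers the generating tree exactly. Real averaged distances would not be that clean.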

  21. Do parameter estimates converge? • LG model, n = 100 genes, no missing data • a1 = 0.1, a2 = 0.3 • 1000 replicates
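A much simpler simulation can imitate the convergence check on this slide. The sketch below is not the talk's LG-model experiment. It makes these simplifying assumptions:
- All n = 100 genes are present at the root.
- Genes are lost independently on two branches, with loss probabilities a1 = 0.1 and a2 = 0.3.
- Gain and missing data are ignored.

Each loss rate is estimated as the fraction of genes lost, which is the binomial maximum-likelihood estimate. This is repeated over 1000 replicates.

```python
import random
from statistics import mean, stdev


def simulate_branch_losses(n_genes: int, loss: float, rng: random.Random) -> int:
    """Genes lost on one branch when each is lost independently (no gain)."""
    return sum(rng.random() < loss for _ in range(n_genes))


def replicate_estimates(n_genes=100, losses=(0.1, 0.3), reps=1000, seed=1):
    """Per-branch loss-rate estimates over many simulated replicates."""
    rng = random.Random(seed)
    estimates = {a: [] for a in losses}
    for _ in range(reps):
        for a in losses:
            lost = simulate_branch_losses(n_genes, a, rng)
            estimates[a].append(lost / n_genes)  # binomial MLE of a
    return estimates


est = replicate_estimates()
for a, values in est.items():
    print(f"true a = {a}: mean estimate {mean(values):.3f}, sd {stdev(values):.3f}")
```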

  22. Accuracy of hidden state assignments • 5-segment phylogeny, a1 = 0.1, a2 = 0.3, … = 0.1, 24% gain

  23. A large multiplicon • 12 segments from rice and Arabidopsis • 56 sets of homologous genes • Vandepoele et al. 2003 Plant Cell 15, 2192.

  24. Self-validation test

  25. Probability of gene presence (8 longest segments) • Branch lengths scaled so that the longest branch is 1.0 • Estimate of a = 0.7

  26. Summary: gene content HMM • Multispecies comparative maps • Becoming more common • Most species only partially characterized • Usefulness also compromised by sparse synteny • Probabilistic models will allow us to move • from simple descriptions of the extent of synteny • to predictive tools that can guide further experiments

  27. Gene family perspective • Modes of duplication • Tandem (T) • Dispersed (D) • Segmental (S)
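The three duplication modes on this slide can be assigned to a gene pair with a few positional rules. The Python sketch below is a hypothetical rule set, not the one used in the talk.

It labels a pair tandem (T) when both genes sit on the same chromosome within `max_gap` gene positions. It labels the pair segmental (S) when it belongs to a precomputed set of pairs anchored in duplicated blocks. Everything else is dispersed (D). The gap threshold, gene names and block set are all invented.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gene:
    name: str
    chrom: str
    index: int  # rank of the gene along its chromosome


def duplication_mode(g1: Gene, g2: Gene, segmental_pairs: set, max_gap: int = 10) -> str:
    """Label a duplicate pair 'T' (tandem), 'S' (segmental) or 'D' (dispersed).

    Hypothetical rules: `max_gap` and `segmental_pairs` (pairs anchored in
    previously detected duplicated blocks) are illustrative choices.
    """
    if g1.chrom == g2.chrom and abs(g1.index - g2.index) <= max_gap:
        return "T"
    if frozenset((g1.name, g2.name)) in segmental_pairs:
        return "S"
    return "D"


# Hypothetical genes and one hypothetical segmental pair.
a = Gene("geneA", "1", 100)
b = Gene("geneB", "1", 103)
c = Gene("geneC", "2", 50)
e = Gene("geneE", "5", 10)
blocks = {frozenset(("geneA", "geneC"))}
for x, y in [(a, b), (a, c), (a, e)]:
    print(x.name, y.name, duplication_mode(x, y, blocks))
```

Checking tandem proximity first means a nearby pair that also falls inside a duplicated block is called T. The opposite precedence is an equally defensible choice.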

  28. A tale of two sisters: the ARF and the Aux/IAA gene families • Modulate whole plant response to auxin • Interact via dimerization • ARFs are transcription factors • Aux/IAAs bind and repress ARFs in the absence of auxin

  29. Diversification of ARFs • Remington et al. 2004 Plant Physiol. 135, 1738

  30. The chromosomal context • Remington et al. 2004 Plant Physiol. 135, 1738

  31. Diversification of the Aux/IAAs • Remington et al. 2004 Plant Physiol. 135, 1738

  32. Remington et al. 2004 Plant Physiol. 135, 1738

  33. Why the different patterns of diversification? • 12% (ARF) vs 40% (Aux/IAA) segmental duplications • Presumably reflects differential retention • Possible explanations • Dosage requirements • Coevolution with other interacting genes • Regional transcriptional regulation

  34. How typical is the Aux/IAA family? Cannon et al. 2004 BMC Plant Biology 4, 10.

  35. Segmental duplication of pathways? Blanc and Wolfe 2004 Plant Cell 16, 1679.

  36. Summary: gene family perspective • Chromosomal context can matter • Gene families differ in their patterns of duplicate gene proliferation • Presumably due to differential retention • Polyploidy • Qualitatively differs from other gene duplication modes • Divergence of whole pathways possible

  37. Functional divergence and chromosomal context • Do patterns of divergence (i.e., spatiotemporal expression) differ among T, D, and S duplicates?

  38. Duplicate pairs in yeast and human (Gu et al. 2002, Makova and Li 2003) • Approx. 50% of pairs diverge very rapidly • Proportion of divergent pairs increases with synonymous substitutions (Ks) • Less so with replacement changes (Ka) • Plateaus at Ka ~0.3 in human • In humans, distantly related pairs with conserved expression tend to be either ubiquitous or very tissue-specific

  39. Digital expression profiling • Massively Parallel Signature Sequencing (MPSS) • Count occurrences of 17-20 bp mRNA signatures • Cloning and sequencing are done on microbeads • Similar to Serial Analysis of Gene Expression (SAGE) • “Bar-code” counting reduces concerns of • cross-hybridization • probe affinity • background hybridization • Which enables • Accurate counts of low expression genes • Distinguishing expression profiles of duplicate genes

  40. MPSS technology • Clone 3’ ends of transcripts to microbeads • Sort by FACS and deposit in a channeled monolayer • Sequence 17-20 bp from 5’ end by hybridization • Brenner et al. 2000 PNAS 97, 1665.

  41. MPSS data: signature frequency
      Signature          Frequency
      GATCAATCGGACTTGTC          2
      GATCGTGCATCAGCAGT         53
      GATCCGATACAGCTTTG        212
      GATCTATGGGTATAGTC        349
      GATCCATCGTTTGGTGC        417
      GATCCCAGCAAGATAAC        561
      GATCCTCCGTCTTCACA        672
      GATCACTTCTCTCATTA        702
      GATCTACCAGAACTCGG        814
      ...                      ...
      GATCGGACCGATCGACT      2,935
      Total # of tags: >1,000,000
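Every signature in this table is 17 bases long and begins with GATC. Turning transcripts into a frequency table like this one can be sketched in Python.

The sketch assumes, for illustration, that a signature is the 17 bases starting at the GATC site closest to the transcript's 3' end. Transcripts with no such site, or too little sequence after it, yield no signature. The example reads are invented.

```python
from collections import Counter


def mpss_signature(transcript: str, site: str = "GATC", length: int = 17):
    """Signature starting at the 3'-most `site`, or None if there is none.

    Assumed convention for this sketch: `length` bases read from the GATC
    site nearest the 3' end, with the site itself included.
    """
    seq = transcript.upper()
    pos = seq.rfind(site)
    if pos == -1 or pos + length > len(seq):
        return None
    return seq[pos:pos + length]


def signature_counts(transcripts) -> Counter:
    """Tally signatures over a collection of sequenced transcripts."""
    counts = Counter()
    for t in transcripts:
        sig = mpss_signature(t)
        if sig is not None:
            counts[sig] += 1
    return counts


# Invented reads: two copies of one signature (one lowercase), one of
# another, and one read with no GATC site.
reads = [
    "AAAAGATCAATCGGACTTGTCTTTT",
    "aaaagatcaatcggacttgtctttt",
    "CCCGATCGTGCATCAGCAGTAAA",
    "ACGTACGT",
]
for sig, n in signature_counts(reads).most_common():
    print(sig, n)
```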

  42. Classifying signatures • Typical signatures (figure annotations): duplicated, expression may be from another site in the genome; potential alternative splicing or nested gene; potential alternative termination; anti-sense transcript or nested gene?; potential anti-sense transcript; potential un-annotated ORF • Triangles refer to colors used on our web page: • Class 1: in an exon, same strand as ORF • Class 2: within 500 bp after the stop codon, same strand as ORF • Class 3: anti-sense of ORF (like Class 1, but on the opposite strand) • Class 4: in genome but NOT class 1, 2, 3, 5 or 6 • Class 5: entirely within an intron, same strand • Class 6: entirely within an intron, anti-sense • Class 0: signatures found in the expression libraries but not the genome • Grey: potential signature NOT expressed
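The class definitions on this slide translate into a short decision procedure. Below is a simplified Python sketch for a single gene model with these assumptions:
- Coordinates are 1-based and inclusive.
- `exons` holds the coding exons of the ORF.
- `stop` is the final base of the stop codon in transcript orientation.
- A hit of `None` means the signature was not found in the genome.

The `GeneModel` structure and helper names are hypothetical. Real pipelines must also consider neighbouring genes and multiple genomic hits.

```python
from dataclasses import dataclass


@dataclass
class GeneModel:
    strand: str                    # "+" or "-"
    exons: list[tuple[int, int]]   # coding exons, 1-based inclusive, sorted
    stop: int                      # last stop-codon base in transcript orientation


def _inside(s: int, e: int, a: int, b: int) -> bool:
    """True if interval [s, e] lies entirely within [a, b]."""
    return a <= s and e <= b


def classify_signature(hit, gene: GeneModel, downstream: int = 500) -> int:
    """Assign a genomic signature hit (start, end, strand) to classes 0-6."""
    if hit is None:
        return 0                                   # in libraries, not in genome
    start, end, strand = hit
    same = strand == gene.strand
    if any(_inside(start, end, a, b) for a, b in gene.exons):
        return 1 if same else 3                    # exonic, sense / anti-sense
    introns = [(b + 1, a2 - 1) for (_, b), (a2, _) in zip(gene.exons, gene.exons[1:])]
    if any(_inside(start, end, a, b) for a, b in introns):
        return 5 if same else 6                    # intronic, sense / anti-sense
    if gene.strand == "+":
        window = (gene.stop + 1, gene.stop + downstream)
    else:                                          # minus strand: "after" = lower coordinates
        window = (gene.stop - downstream, gene.stop - 1)
    if same and _inside(start, end, *window):
        return 2                                   # within 500 bp after the stop codon
    return 4                                       # in genome, none of the above


gene = GeneModel("+", [(100, 200), (300, 400)], stop=400)
for hit in [(120, 136, "+"), (120, 136, "-"), (220, 236, "+"), (450, 466, "+"), None]:
    print(hit, classify_signature(hit, gene))
```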

  43. Core Arabidopsis MPSS libraries sequenced by Lynx for Blake Meyers, U. of Delaware
      Library       Signatures sequenced    Distinct signatures
      Root                     3,645,414                 48,102
      Shoot                    2,885,229                 53,396
      Flower                   1,791,460                 37,754
      Callus                   1,963,474                 40,903
      Silique                  2,018,785                 38,503
      TOTAL                   12,304,362                133,377

  44. http://www.dbi.udel.edu/mpss • Query by • Sequence • Arabidopsis gene identifier • chromosomal position • BAC clone ID • MPSS signature • Library comparison • Site includes • Library and tissue information • FAQs and help pages

  45. Genome-wide MPSS profile in Arabidopsis • [Figure panels: Chr. I, Chr. II, Chr. III, Chr. IV, Chr. V] • Of the 29,084 gene models, 17,849 match unambiguous, expressed class 1 and/or 2 signatures

  46. Dataset of duplicate pairs • Arabidopsis gene families of size 2 classified as • Dispersed (280) • Segmental (149) • Tandem (63) • For each pair • Measured similarity/distance in expression profile • Estimated silent (Ks) and replacement (Ka) changes

  47. Expression distance • [Figure axes: library 1, library 2, library 3]
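The slides do not state which expression distance was used. The Python sketch below shows one plausible choice, labelled as an assumption.

Each gene's signature counts are converted to transcripts per million, using the library sizes from the core-library table (slide 43). They are then rescaled into a relative profile across libraries. Two genes are compared by the Euclidean distance between their profiles, which ranges from 0 (identical shape) to sqrt(2) (expressed in disjoint libraries). The per-gene counts are invented.

```python
from math import sqrt

# Total signatures sequenced per library, taken from the core-library table.
LIBRARY_SIZES = {
    "root": 3_645_414, "shoot": 2_885_229, "flower": 1_791_460,
    "callus": 1_963_474, "silique": 2_018_785,
}


def tpm(counts: dict) -> dict:
    """Raw signature counts -> transcripts per million, library by library."""
    return {lib: 1e6 * c / LIBRARY_SIZES[lib] for lib, c in counts.items()}


def relative_profile(counts: dict) -> dict:
    """Share of a gene's normalized expression in each library.

    Assumes the gene is detected in at least one library (nonzero total).
    """
    norm = tpm(counts)
    total = sum(norm.values())
    return {lib: v / total for lib, v in norm.items()}


def expression_distance(counts1: dict, counts2: dict) -> float:
    """Euclidean distance between relative profiles (one assumed metric)."""
    p, q = relative_profile(counts1), relative_profile(counts2)
    return sqrt(sum((p[lib] - q[lib]) ** 2 for lib in LIBRARY_SIZES))


# Invented counts for a hypothetical duplicate pair.
gene1 = {"root": 120, "shoot": 30, "flower": 5, "callus": 0, "silique": 10}
gene2 = {"root": 2, "shoot": 60, "flower": 40, "callus": 1, "silique": 0}
print(round(expression_distance(gene1, gene2), 3))
```

Using relative profiles ignores overall expression level. A metric that keeps level information, such as a distance on log TPM, would be needed to study the asymmetry reported on the next slide.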

  48. Major findings • Many pairs are divergent in sequence but not expression and vice versa • Pairs have atypically high expression • Especially slowly evolving pairs • Divergence increases with Ka • Particularly among S duplicates! • Divergence tends to be highly asymmetric
