Putting gene family evolution in its chromosomal context Todd Vision Department of Biology University of North Carolina at Chapel Hill
Outline • Gene order rearrangement in plants • Chromosomal perspective • Gene family perspective • Gene duplication and functional divergence • Segmental duplications as a tool
Chromosomal perspective • Biological importance • Clustering of gene function • Clustering of transcriptional activity • Applied importance • Conservation of gene order (synteny)
Arabidopsis as a hub for plant comparative maps Arumuganathan and Earle 1991 Plant Mol Biol Rep 9, 208.
Arabidopsis paleopolyploidy The Arabidopsis Genome Initiative 2000 Nature 408, 796
Tomato-Arabidopsis synteny Bancroft 2001 TIG 17, 89 after Ku et al. 2000 PNAS 97, 9121.
Rice-Arabidopsis microsynteny Mayer et al. 2001 Genome Res. 11, 1167.
Hidden syntenies Simillion et al. 2002 PNAS 99, 13627.
Interspecies comparison can reveal hidden syntenies Vandepoele et al. 2002 TIG 18, 606.
From descriptive to predictive • Can we predict the gene content of homologous segments when markers are sparse? • Utility for QTL mapping • Prioritize candidate genes in a QTL region from a non-sequenced genome • Provide markers for fine-mapping
Hidden Markov Models (HMM) • Hidden states: 1, 2, end • Transition probabilities: $t_{1,1}, t_{1,2}, t_{2,2}, t_{2,\mathrm{end}}$ • Emission probabilities: $p_1(a), p_1(b), p_2(a), p_2(b)$ • Observed states: a -> b -> a • Hidden states: 1 -> 1 -> 2 -> end • Probability: $p_1(a)\,t_{1,1}\,p_1(b)\,t_{1,2}\,p_2(a)\,t_{2,\mathrm{end}}$
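The toy model on this slide can be made concrete in a few lines. The Python sketch below uses hypothetical numeric values for the transition and emission probabilities (the slide gives only symbols). `path_probability` multiplies emission and transition terms along one hidden path, exactly as in the slide's probability formula, and `forward` sums that quantity over all hidden paths (the forward algorithm). As in the slide's example, every path is assumed to start in state 1.

```python
# Toy two-state HMM: hidden states 1 and 2 plus a terminal "end" state,
# observed symbols "a" and "b". All numeric values are hypothetical.
trans = {
    (1, 1): 0.6, (1, 2): 0.4,        # t_{1,1}, t_{1,2}
    (2, 2): 0.7, (2, "end"): 0.3,    # t_{2,2}, t_{2,end}
}
emit = {
    1: {"a": 0.8, "b": 0.2},         # p_1(a), p_1(b)
    2: {"a": 0.3, "b": 0.7},         # p_2(a), p_2(b)
}

def path_probability(obs, path):
    """Joint probability of the observations and one hidden path ending in 'end'."""
    prob = 1.0
    for i, symbol in enumerate(obs):
        state, nxt = path[i], path[i + 1]
        # emit the symbol from the current state, then move to the next state
        prob *= emit[state][symbol] * trans.get((state, nxt), 0.0)
    return prob

def forward(obs, start=1):
    """Total probability of the observations, summed over all hidden paths."""
    alpha = {start: emit[start][obs[0]]}
    for symbol in obs[1:]:
        new = {}
        for s, p in alpha.items():
            for (frm, to), t in trans.items():
                if frm == s and to != "end":
                    new[to] = new.get(to, 0.0) + p * t * emit[to][symbol]
        alpha = new
    # finish by moving into the terminal state
    return sum(p * trans.get((s, "end"), 0.0) for s, p in alpha.items())
```

For the slide's example, `path_probability(["a", "b", "a"], [1, 1, 2, "end"])` evaluates $0.8 \cdot 0.6 \cdot 0.2 \cdot 0.4 \cdot 0.3 \cdot 0.3$; `forward` adds the only other admissible path, 1 -> 2 -> 2 -> end.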
A gene content HMM • Observed states • a homologous gene is either observed or not • Hidden states • presence or absence of gene within a segment • Emission probabilities • A gene will be unobserved if it is not present • A gene may be unobserved even if it is present • Dependent on the density of the gene map • Transition probabilities • reflect conservation of gene content along the branches of a phylogeny
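Because the hidden states sit at the nodes of a segment phylogeny rather than along a chain, the forward recursion becomes a pruning (sum-product) recursion over the tree. The Python sketch below is one way to evaluate such a model, under assumptions that are not on the slides: a loss-only model with no gain, a loss probability of 1 - exp(-a·t) on a branch of length t, a single detection probability `d` standing in for gene map density, and a root that carries the gene with probability `root_present`. All function names are hypothetical.

```python
import math

def transition(a, t):
    """Loss-only transitions on a branch of length t (states: 0 = present, 1 = absent).
    Assumption: loss probability is 1 - exp(-a * t); absence is absorbing."""
    loss = 1.0 - math.exp(-a * t)
    return [[1.0 - loss, loss], [0.0, 1.0]]

def emission(observed, d):
    """P(observation | state). d = hypothetical chance that a present gene is observed.
    observed=None means the segment contributes no data (masked)."""
    if observed is None:
        return [1.0, 1.0]
    if observed:
        return [d, 0.0]          # an absent gene can never be observed
    return [1.0 - d, 1.0]        # unobserved: present but missed, or truly absent

def partial(node, evidence, a):
    """P(data below node | node state). A node is a leaf name (str)
    or a list of (child, branch_length) pairs."""
    if isinstance(node, str):
        return evidence[node]
    vec = [1.0, 1.0]
    for child, t in node:
        child_vec = partial(child, evidence, a)
        T = transition(a, t)
        for s in range(2):
            vec[s] *= sum(T[s][c] * child_vec[c] for c in range(2))
    return vec

def prob_present(tree, observations, leaf, a, d, root_present=1.0):
    """Posterior probability that `leaf` carries the gene, given all observations."""
    evidence = {name: emission(obs, d) for name, obs in observations.items()}
    prior = [root_present, 1.0 - root_present]
    total = sum(p * l for p, l in zip(prior, partial(tree, evidence, a)))
    own = evidence[leaf]
    evidence[leaf] = [own[0], 0.0]   # clamp the leaf to "present"
    joint = sum(p * l for p, l in zip(prior, partial(tree, evidence, a)))
    return joint / total
```

Masking a leaf (`None`) and asking for its posterior is the kind of prediction a self-validation test would score against the hidden observation.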
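Loss (L) • P -> P: 1 - a; P -> A: a; A -> A: 1 (P = present, A = absent) • Loss-Gain (LG) • P -> P: 1 - a; P -> A1: a; A1 -> A1: 1 • A2 -> P: b; A2 -> A2: 1 - b • Multiple Loss-Gain (MLG) • As LG, but with multiple loss rates: P -> P: 1 - a_i; P -> A1: a_i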
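The three models can be written down as transition tables and checked for consistency. In this Python sketch, state names follow the slide (P, A, A1, A2); reading A1 as "lost" and A2 as "not yet gained" follows from the transitions shown. How MLG assigns a loss rate a_i is not shown on the slide, so `multiple_loss_gain_model` simply builds one LG table per rate; that choice, and all function names, are hypothetical.

```python
def loss_model(a):
    """L model: a present gene (P) is lost with probability a; absence (A) is absorbing."""
    return {"P": {"P": 1 - a, "A": a},
            "A": {"A": 1.0}}

def loss_gain_model(a, b):
    """LG model: P is lost to A1 (absorbing) with probability a;
    A2 gains the gene (moves to P) with probability b."""
    return {"P":  {"P": 1 - a, "A1": a},
            "A1": {"A1": 1.0},
            "A2": {"P": b, "A2": 1 - b}}

def multiple_loss_gain_model(loss_rates, b):
    """MLG model (hypothetical reading): one LG table per loss rate a_i."""
    return [loss_gain_model(a_i, b) for a_i in loss_rates]

def is_stochastic(model, tol=1e-12):
    """Every row of a transition table must sum to 1."""
    return all(abs(sum(row.values()) - 1.0) < tol for row in model.values())

def step(dist, model):
    """Propagate a state distribution across one transition."""
    out = {}
    for s, p in dist.items():
        for s2, t in model[s].items():
            out[s2] = out.get(s2, 0.0) + p * t
    return out
```

For example, `step({"A2": 1.0}, loss_gain_model(0.1, 0.2))` gives probability 0.2 of the gene appearing after one transition, i.e. the gain rate b.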
Estimating model parameters • Segment phylogeny • Each set of homologous genes is missing from some segments • Estimate an “averaged” distance matrix • Build tree with neighbor-joining and midpoint rooting • HMM parameter estimation • Loss rate(s) • Gain rate • Number of genes present at the root
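The tree-building step can be sketched in Python as follows. The slide does not say which per-family distance is averaged, so `averaged_distances` assumes a hypothetical input: one dictionary of pairwise segment distances per set of homologous genes, holding only pairs where both segments carry a member, averaged over the families that are present in both segments. `neighbor_joining` is the standard Saitou-Nei procedure; midpoint rooting is not shown.

```python
from itertools import combinations
from statistics import mean

def averaged_distances(segments, per_family):
    """Average each pairwise segment distance over the gene families shared by both
    segments. per_family: list of {(seg_i, seg_j): distance} (hypothetical format)."""
    d = {s: {} for s in segments}
    for i, j in combinations(segments, 2):
        vals = [fam[(i, j)] if (i, j) in fam else fam[(j, i)]
                for fam in per_family if (i, j) in fam or (j, i) in fam]
        if not vals:
            raise ValueError(f"no shared gene family for {i} and {j}")
        d[i][j] = d[j][i] = mean(vals)
    return d

def neighbor_joining(d):
    """Standard neighbor-joining on a symmetric distance dict.
    Returns unrooted tree edges as (node, node, branch_length)."""
    D = {i: dict(row) for i, row in d.items()}
    nodes = list(D)
    edges, counter = [], 0
    while len(nodes) > 2:
        n = len(nodes)
        r = {i: sum(D[i][j] for j in nodes if j != i) for i in nodes}
        # join the pair that minimizes the Q criterion
        _, i, j = min(((n - 2) * D[i][j] - r[i] - r[j], i, j)
                      for i, j in combinations(nodes, 2))
        li = 0.5 * D[i][j] + (r[i] - r[j]) / (2 * (n - 2))
        lj = D[i][j] - li
        u = f"node{counter}"
        counter += 1
        edges += [(u, i, li), (u, j, lj)]
        # distances from the new internal node to the remaining nodes
        D[u] = {}
        for k in nodes:
            if k not in (i, j):
                D[u][k] = D[k][u] = 0.5 * (D[i][k] + D[j][k] - D[i][j])
        nodes = [k for k in nodes if k not in (i, j)] + [u]
    a, b = nodes
    edges.append((a, b, D[a][b]))
    return edges
```

On an additive matrix (distances that fit a tree exactly), neighbor-joining recovers the generating branch lengths, which makes it a convenient sanity check before applying it to noisy averaged distances.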
Do parameter estimates converge? • LG model, n = 100 genes, no missing data • a1 = 0.1, a2 = 0.3 • 1000 replicates
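The idea behind a convergence check can be illustrated with a much simpler setup than the one on the slide. This Python sketch is not the LG simulation: it assumes a single branch under the loss-only model with no missing data, loses each of 100 genes with probability a = 0.1, and estimates a in each of 1000 replicates by its maximum-likelihood estimate, the fraction of genes lost. The function name and seed are hypothetical.

```python
import random
from statistics import mean, stdev

def simulate_loss_estimates(a, n_genes=100, replicates=1000, seed=1):
    """Simplified stand-in (hypothetical): one branch, loss-only model, no missing data.
    Returns one maximum-likelihood estimate of a (fraction of genes lost) per replicate."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(replicates):
        lost = sum(rng.random() < a for _ in range(n_genes))
        estimates.append(lost / n_genes)
    return estimates

est = simulate_loss_estimates(0.1)
print(f"mean = {mean(est):.3f}, sd = {stdev(est):.3f}")
```

Here the estimates should center on the true value with a spread near $\sqrt{a(1-a)/n} = 0.03$; the full LG model adds a gain rate and a tree, but the replicate-and-summarize logic is the same.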
Accuracy of hidden state assignments • 5-segment phylogeny, a1 = 0.1, a2 = 0.3, b = 0.1, 24% gain
A large multiplicon 12 segments from rice and arabidopsis 56 sets of homologous genes Vandepoele et al 2003 Plant Cell 15, 2192.
Self-validation test
Probability of gene presence (8 longest segments) • Branch lengths scaled so that the longest branch is 1.0 • Estimate of a = 0.7
Summary: gene content HMM • Multispecies comparative maps • Becoming more common • Most species only partially characterized • Usefulness also compromised by sparse synteny • Probabilistic models will allow us to move • from simple descriptions of the extent of synteny • to predictive tools that can guide further experiments
Gene family perspective • Modes of duplication • Tandem (T) • Dispersed (D) • Segmental (S)
A tale of two sisters: the ARF and the Aux/IAA gene families • Modulate whole plant response to auxin • Interact via dimerization • ARFs are transcription factors • Aux/IAAs bind and repress ARFs in the absence of auxin
Diversification of ARFs Remington et al. 2004 Plant Physiol 135, 1738.
The chromosomal context Remington et al. 2004 Plant Physiol 135, 1738.
Diversification of the Aux/IAAs Remington et al. 2004 Plant Physiol 135, 1738.
Why the different patterns of diversification? • 12% (ARF) vs 40% (Aux/IAA) segmental duplications • Presumably reflects differential retention • Possible explanations • Dosage requirements • Coevolution with other interacting genes • Regional transcriptional regulation
How typical is the Aux/IAA family? Cannon et al. 2004 BMC Plant Biology 4, 10.
Segmental duplication of pathways? Blanc and Wolfe 2004 Plant Cell 16, 1679.
Summary: gene family perspective • Chromosomal context can matter • Gene families differ in their patterns of duplicate gene proliferation • Presumably due to differential retention • Polyploidy • Qualitatively differs from other gene duplication modes • Divergence of whole pathways possible
Functional divergence and chromosomal context Do patterns of divergence (i.e., spatiotemporal expression) differ among T, D, and S duplicates?
Duplicate pairs in yeast and human (Gu et al. 2002, Makova and Li 2003) • Approximately 50% of pairs diverge very rapidly in expression • Proportion of divergent pairs increases with synonymous substitutions (Ks) • Less so with replacement substitutions (Ka) • Plateaus at Ka ~0.3 in human • In humans, distantly related pairs with conserved expression tend to be either ubiquitous or very tissue-specific
Digital expression profiling • Massively Parallel Signature Sequencing (MPSS) • Count occurrences of 17-20 bp mRNA signatures • Cloning and sequencing are done on microbeads • Similar to Serial Analysis of Gene Expression (SAGE) • “Bar-code” counting reduces concerns about • cross-hybridization • probe affinity • background hybridization • Which enables • Accurate counts of low-expression genes • Distinguishing expression profiles of duplicate genes
MPSS technology • Clone 3’ ends of transcripts to microbeads • Sort by FACS and deposit in a channeled monolayer • Sequence 17-20 bp from the 5’ end by hybridization Brenner et al. 2000 PNAS 97, 1665.
MPSS data
Signature            Frequency
GATCAATCGGACTTGTC    2
GATCGTGCATCAGCAGT    53
GATCCGATACAGCTTTG    212
GATCTATGGGTATAGTC    349
GATCCATCGTTTGGTGC    417
GATCCCAGCAAGATAAC    561
GATCCTCCGTCTTCACA    672
GATCACTTCTCTCATTA    702
GATCTACCAGAACTCGG    814
. . .
GATCGGACCGATCGACT    2,935
Total # of tags: >1,000,000
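Turning transcripts into signature counts can be sketched in Python. Every signature in the table begins with GATC, so this sketch assumes that a signature is the 17 bp starting at the 3'-most GATC of a transcript, and that transcripts whose last GATC lies too close to the end yield no signature; both rules, the constant names and the function names are assumptions, not the published MPSS protocol.

```python
from collections import Counter

SITE = "GATC"    # assumed anchoring site: every signature in the table starts with it
SIG_LEN = 17     # assumed signature length (the slides give 17-20 bp)

def signature(transcript):
    """Return the SIG_LEN-bp signature starting at the 3'-most GATC,
    or None if there is no site or too little sequence after it."""
    seq = transcript.upper()
    pos = seq.rfind(SITE)
    if pos == -1 or pos + SIG_LEN > len(seq):
        return None
    return seq[pos:pos + SIG_LEN]

def count_signatures(transcripts):
    """Tally signatures over a collection of (hypothetical) transcript sequences."""
    return Counter(s for s in map(signature, transcripts) if s is not None)
```

A `Counter` keeps the table's two columns directly: keys are signatures and values are frequencies, and `most_common()` sorts them by abundance.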
Classifying signatures (classes are color-coded on our web page) • Class 1: in an exon, same strand as ORF • Class 2: within 500 bp after stop codon, same strand as ORF • Class 3: anti-sense of ORF (like Class 1, but on opposite strand) • Class 4: in genome but NOT class 1, 2, 3, 5 or 6 • Class 5: entirely within intron, same strand • Class 6: entirely within intron, anti-sense • Grey: potential signature NOT expressed • Class 0: signatures found in the expression libraries but not the genome • Figure annotations: duplicated (expression may be from another site in the genome); potential alternative splicing or nested gene; potential alternative termination; anti-sense transcript or nested gene?; potential anti-sense transcript; potential un-annotated ORF
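The class definitions translate into a small decision procedure. This Python sketch simplifies heavily: it assumes a hypothetical `GeneModel` with sorted coding-exon coordinates (1-based, inclusive) for a single gene, treats the end of the last coding exon as the stop codon, and requires a signature to lie entirely inside one feature. Real annotation would also handle multiple overlapping genes and signatures that span exon junctions.

```python
from dataclasses import dataclass

@dataclass
class GeneModel:
    """Hypothetical gene model: sorted coding exons [(start, end)], 1-based inclusive."""
    exons: list
    strand: str  # "+" or "-"

def classify(sig_start, sig_end, sig_strand, gene, in_genome=True):
    """Assign a signature to classes 0-6 following the slide's definitions."""
    if not in_genome:
        return 0                                  # in libraries but not in the genome
    def inside(a, b):
        return a <= sig_start and sig_end <= b
    same = sig_strand == gene.strand
    introns = [(gene.exons[i][1] + 1, gene.exons[i + 1][0] - 1)
               for i in range(len(gene.exons) - 1)]
    if any(inside(s, e) for s, e in gene.exons):
        return 1 if same else 3                   # exonic: sense or anti-sense
    if any(inside(s, e) for s, e in introns):
        return 5 if same else 6                   # intronic: sense or anti-sense
    if same:
        # within 500 bp after the stop codon, on the ORF's strand
        if gene.strand == "+":
            stop = gene.exons[-1][1]
            if inside(stop + 1, stop + 500):
                return 2
        else:
            stop = gene.exons[0][0]
            if inside(stop - 500, stop - 1):
                return 2
    return 4                                      # in genome, none of the above
```

For a plus-strand gene with exons (100, 200) and (300, 400), a sense signature at 450-466 falls in the 500 bp window after the stop codon and is classed 2.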
Core Arabidopsis MPSS libraries, sequenced by Lynx for Blake Meyers, U. of Delaware
Library    Signatures sequenced    Distinct signatures
Root       3,645,414               48,102
Shoot      2,885,229               53,396
Flower     1,791,460               37,754
Callus     1,963,474               40,903
Silique    2,018,785               38,503
TOTAL      12,304,362              133,377
http://www.dbi.udel.edu/mpss • Query by • Sequence • Arabidopsis gene identifier • chromosomal position • BAC clone ID • MPSS signature • Library comparison • Site includes • Library and tissue information • FAQs and help pages
Genome-wide MPSS profile in Arabidopsis (chromosomes I-V) • Of the 29,084 gene models, 17,849 (about 61%) match unambiguous, expressed class 1 and/or 2 signatures
Dataset of duplicate pairs • Arabidopsis gene families of size 2 classified as • Dispersed (280) • Segmental (149) • Tandem (63) • For each pair • Measured similarity/distance in expression profile • Estimated silent (Ks) and replacement (Ka) substitutions
Expression distance (figure: expression profiles plotted with one axis per library: library 1, library 2, library 3)
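One way to turn the geometric picture into a number is sketched below in Python. The slide does not define the metric, so this is an assumption: raw signature counts are scaled to transcripts per million (TPM) using the library sizes from the core MPSS library table, each gene's TPM profile is rescaled to sum to 1, and the distance is Euclidean. Rescaling makes the distance reflect the pattern across libraries rather than overall expression level. Function and constant names are hypothetical, and a gene with no counts in any library would cause a division by zero.

```python
import math

# Total signatures sequenced per library, from the core MPSS library table.
LIBRARY_TOTALS = {"root": 3_645_414, "shoot": 2_885_229, "flower": 1_791_460,
                  "callus": 1_963_474, "silique": 2_018_785}

def tpm(counts):
    """Scale raw signature counts to transcripts per million in each library."""
    return {lib: 1e6 * c / LIBRARY_TOTALS[lib] for lib, c in counts.items()}

def expression_distance(counts1, counts2):
    """Euclidean distance between two genes' relative expression profiles
    (assumed metric). Libraries missing from a profile count as zero."""
    p, q = tpm(counts1), tpm(counts2)
    sp, sq = sum(p.values()), sum(q.values())
    return math.sqrt(sum((p.get(lib, 0.0) / sp - q.get(lib, 0.0) / sq) ** 2
                         for lib in LIBRARY_TOTALS))
```

Two duplicates with the same tissue pattern score 0 even if one is expressed at twice the level of the other; two genes expressed in entirely different single libraries score $\sqrt{2}$, the maximum.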
Major findings • Many pairs are divergent in sequence but not expression and vice versa • Pairs have atypically high expression • Especially slowly evolving pairs • Divergence increases with Ka • Particularly among S duplicates! • Divergence tends to be highly asymmetric