The Dynamics of Positive Selection on the Mammalian Tree
Carolin Kosiol, Cornell University <ck285@cornell.edu>
Joint with: Tomas Vinar, Rute Da Fonseca, Melissa Hubisz, Carlos Bustamante, Rasmus Nielsen and Adam Siepel
Positive selection in six mammalian genomes
[Figure: phylogeny of human, chimp, macaque, mouse, rat and dog; scale bar 0.05 subst/site]
• 6 high-quality genomes of eutherian mammals
• 16529 human / chimp / macaque / mouse / rat / dog orthologous genes
• 544 genes identified to be under positive selection using codon models
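Codon models
\[
Q_{ij} =
\begin{cases}
0 & \text{if } i, j \text{ differ by more than one nucleotide} \\
\pi_j & \text{if } i \to j \text{ is a synonymous transversion} \\
\kappa \pi_j & \text{if } i \to j \text{ is a synonymous transition} \\
\omega \pi_j & \text{if } i \to j \text{ is a nonsynonymous transversion} \\
\omega \kappa \pi_j & \text{if } i \to j \text{ is a nonsynonymous transition}
\end{cases}
\]
where
• κ: transition/transversion rate ratio
• π_j: equilibrium frequency of codon j
• ω: nonsynonymous/synonymous rate ratio
Interpretation of ω:
• ω < 1: purifying selection
• ω = 1: neutral evolution
• ω > 1: positive selection
(Goldman & Yang, 1994; Yang et al., 2000)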
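As an illustration of the rate matrix above, the following minimal Python sketch computes a single off-diagonal entry Q_ij. The function name `codon_rate`, the use of the standard genetic code, and the choice to return 0 for stop codons are assumptions made for this example, not details taken from the slides.

```python
# Standard genetic code in TCAG order (assumed here; the slides do not name the code table).
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
GENETIC_CODE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
# Transitions are purine<->purine (A, G) or pyrimidine<->pyrimidine (C, T).
TRANSITIONS = {frozenset("AG"), frozenset("CT")}


def codon_rate(codon_i, codon_j, kappa, omega, pi_j):
    """Off-diagonal rate Q_ij for the substitution codon_i -> codon_j."""
    # Assumption: stop codons are excluded from the state space, so their rate is 0.
    if "*" in (GENETIC_CODE[codon_i], GENETIC_CODE[codon_j]):
        return 0.0
    diffs = [(x, y) for x, y in zip(codon_i, codon_j) if x != y]
    # Only single-nucleotide changes have nonzero rate
    # (identical codons are diagonal entries, not handled here).
    if len(diffs) != 1:
        return 0.0
    rate = pi_j
    if frozenset(diffs[0]) in TRANSITIONS:
        rate *= kappa  # transition
    if GENETIC_CODE[codon_i] != GENETIC_CODE[codon_j]:
        rate *= omega  # nonsynonymous change
    return rate
```

For example, `codon_rate("TTT", "TTC", kappa=2.0, omega=0.5, pi_j=0.1)` is a synonymous transition (Phe to Phe, T to C) and evaluates to κπ_j.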
Branch-Site Likelihood Ratio Tests (LRTs)
• Based on continuous-time Markov models of codon evolution
• Compare a null model allowing for negative selection (ω<1) or neutral evolution (ω=1) with an alternative model additionally allowing for positive selection (ω>1)
• Both models allow ω to vary across sites
• Can have foreground branches with positive selection (PS) and background branches without
• Applied separately to each gene
(Nielsen & Yang, 1998; Yang & Nielsen, 2002)
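A likelihood ratio test of this kind can be sketched in a few lines of Python. The function name `branch_site_lrt` is hypothetical, and the chi-square reference distribution with 1 degree of freedom is an assumption made for this sketch; the slides do not state which null distribution was used.

```python
import math


def branch_site_lrt(lnl_null, lnl_alt):
    """Return (LRT statistic, p-value) from two maximized log-likelihoods.

    Assumption: the statistic is compared against a chi-square
    distribution with 1 degree of freedom.
    """
    # The alternative model contains the null, so the statistic is
    # clamped at 0 to absorb small numerical optimization errors.
    stat = max(0.0, 2.0 * (lnl_alt - lnl_null))
    # Survival function of chi-square(1): P(X > stat) = erfc(sqrt(stat / 2)).
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value
```

Because the test is applied separately to each gene, the resulting p-values would normally be corrected for multiple testing before calling a gene positively selected.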
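Branch and clade LRTs
[Figure: counts of genes identified by each LRT. Extracted labels and values, in slide order: 400; human, chimp, hominid, macaque: 10, 18, 7, 10; rodent branch, rodent clade, primate branch, primate clade: 56, 61, 21, 24]
• Total: 544 positively selected genes (PSGs) identified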
Co-evolution in complement immunity
[Figure: legend categories P<0.05 and FDR<0.05]
2^9 - 1 = 511 possible selection histories on the 9-branch mammalian phylogeny
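The count can be checked by enumeration. The sketch below assumes each history is encoded as a tuple of nine 0/1 flags (one per branch, 1 meaning positive selection on that branch) and that the all-zero history is the one excluded; the encoding and the function name are illustrative choices.

```python
from itertools import product

N_BRANCHES = 9  # an unrooted tree with 6 leaves has 2 * 6 - 3 = 9 branches


def selection_histories(n_branches=N_BRANCHES):
    """All 0/1 assignments to branches, excluding the history with no selection."""
    return [h for h in product((0, 1), repeat=n_branches) if any(h)]


histories = selection_histories()
print(len(histories))  # 511
```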
Why Bayesian Model Selection?
• Many of the 511 models may have very similar or identical likelihoods.
• The models are not nested.
• Bayesian analysis looks at the distribution of selection histories.
• Bayesian analysis allows “soft” (probabilistic) choices of selection histories.
• The prevalence of selection on individual branches and clades can be computed while accounting for uncertainty in the selection histories.
Bayesian Switching Model
• Two evolutionary modes: selected and non-selected
• Parameters describing the switching process:
  ν_{b,G}: probability that a gene gains positive selection on branch b
  ν_{b,L}: probability that a gene loses positive selection on branch b
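Bayesian Switching Model
• Let X = (X_1, …, X_N) be the alignment data, with X_i the alignment of the i-th gene.
• Let Z = (Z_1, …, Z_N) be the set of selection histories, with Z_i denoting the history of the i-th gene.
• Let ν be the set of switching parameters.
• Assume that genes are independent, in both their alignments X and their histories Z, and that X and ν are conditionally independent given Z. Thus,
\[
P(X, Z \mid \nu) = \prod_{i=1}^{N} P(X_i \mid Z_i)\, P(Z_i \mid \nu)
\]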
Mapping selection histories to switches (cont.)
[Figure: tree with branches labeled by state pairs (1,1), (1,1), (1,1), (1,1), (0,0), (0,1), (1,1)]
• Gain of positive selection (0,1): ν_{b,G}
• Absence of gain of positive selection (0,0): 1 - ν_{b,G}
• Loss of positive selection (1,0): ν_{b,L}
• Absence of loss of positive selection (1,1): 1 - ν_{b,L}
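The mapping from state pairs to switching probabilities can be written out as a short Python sketch. It assumes each pair is (state before branch b, state on branch b) with 1 meaning selected, and it takes the list of transitions as given; how the pairs are read off the tree, and how the first branch is treated, is not shown on the slides, so the function names and input layout here are hypothetical.

```python
def switch_probability(parent_state, branch_state, nu_gain, nu_loss):
    """Probability of one state pair given the gain/loss probabilities of a branch."""
    if parent_state == 0:
        # From non-selected: gain (0,1) or absence of gain (0,0).
        return nu_gain if branch_state == 1 else 1.0 - nu_gain
    # From selected: loss (1,0) or absence of loss (1,1).
    return nu_loss if branch_state == 0 else 1.0 - nu_loss


def history_probability(transitions):
    """Product of switching probabilities over all branches of one history.

    `transitions` is a list of (parent_state, branch_state, nu_gain, nu_loss)
    tuples, one per branch.
    """
    prob = 1.0
    for parent_state, branch_state, nu_gain, nu_loss in transitions:
        prob *= switch_probability(parent_state, branch_state, nu_gain, nu_loss)
    return prob
```

For instance, a history with one gain, one retained selection, one loss and one absence of gain, all with gain probability 0.1 and loss probability 0.2, has probability 0.1 × 0.8 × 0.2 × 0.9.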
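Putting everything together …
\[
P(X, Z, \nu) = P(\nu) \prod_{i=1}^{N} P(X_i \mid Z_i)\, P(Z_i \mid \nu)
\]
with
• P(ν): Beta distribution (α = 1, β = 9)
• P(X_i | Z_i): likelihoods from codon models assuming selection history Z_i
• P(Z_i | ν): product of the relevant switching probabilities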
Gibbs sampling
• The variables Z and ν are unobserved.
• We sample from their joint posterior distribution with a Gibbs sampler that alternates between two steps: sampling each Z_i conditional on X_i and the previously sampled ν, and sampling ν conditional on the previously sampled Z.
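The alternating scheme can be illustrated with a deliberately reduced toy model in Python. This sketch is not the sampler used in the study: it assumes a single switch per gene (Z_i is 0 or 1), a single switching probability ν with a Beta(1, 9) prior, and precomputed likelihoods P(X_i | Z_i) supplied as pairs. All names are hypothetical.

```python
import random


def gibbs_sampler(likelihoods, n_iter=2000, burn_in=500,
                  alpha=1.0, beta=9.0, seed=0):
    """Toy Gibbs sampler alternating between Z | X, nu and nu | Z.

    `likelihoods[i]` is the pair (P(X_i | Z_i = 0), P(X_i | Z_i = 1)).
    Returns the post-burn-in samples of nu.
    """
    rng = random.Random(seed)
    n = len(likelihoods)
    nu = alpha / (alpha + beta)  # start at the prior mean
    z = [0] * n
    nu_samples = []
    for it in range(n_iter):
        # Step 1: sample each Z_i given X_i and the current nu.
        for i, (lik0, lik1) in enumerate(likelihoods):
            w1 = lik1 * nu
            w0 = lik0 * (1.0 - nu)
            z[i] = 1 if rng.random() < w1 / (w0 + w1) else 0
        # Step 2: sample nu given Z (the Beta prior is conjugate here).
        k = sum(z)
        nu = rng.betavariate(alpha + k, beta + n - k)
        if it >= burn_in:
            nu_samples.append(nu)
    return nu_samples
```

In the full model, step 1 would draw each Z_i from the 511 selection histories and step 2 would update a gain and a loss probability per branch, but the alternation between the two conditionals is the same.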
Inferred Rates of Gain and Loss
[Figure: two panels, "gain" and "loss"]
Episodic selection on the mammalian tree
• Most genes appear to have switched between evolutionary modes multiple times.
• The posterior expected number of mode switches is 1.6 (0.6 gains, 1.0 losses).
• An expected 95% of PSGs have experienced at least one switch, and 53% at least two.
• These observations are qualitatively in agreement with Gillespie’s episodic molecular clock.
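Summaries of this kind can be computed directly from posterior samples. The Python sketch below assumes that, for one gene, each sample has already been reduced to a pair (number of gains, number of losses); the input layout and the function name are hypothetical, and the numbers in the usage note are toy values, not results from the study.

```python
def summarize_switches(samples):
    """Posterior summaries from a list of (n_gains, n_losses) pairs."""
    n = len(samples)
    totals = [gains + losses for gains, losses in samples]
    return {
        "expected_gains": sum(g for g, _ in samples) / n,
        "expected_losses": sum(l for _, l in samples) / n,
        "expected_switches": sum(totals) / n,
        # Fraction of samples with at least one / at least two switches.
        "p_at_least_one": sum(t >= 1 for t in totals) / n,
        "p_at_least_two": sum(t >= 2 for t in totals) / n,
    }
```

With the toy input `[(1, 1), (0, 1), (1, 2), (0, 0)]`, the expected number of switches is 1.5 (0.5 gains, 1.0 losses). Averaging the per-gene probabilities over all PSGs would give the expected fraction of genes with at least one or at least two switches.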
Inferred Number of Genes Under Positive Selection
[Figure: ranges shown on the figure, in extraction order: 119-162, 183-232, 32-62, 234-327, 219-257, 338-382, 318-360, 255-325, 357-426, 204-278, 213-292, 281-333]
Complement components C7 and C8B
• Components C7 and C8B encode proteases in the membrane attack complex.
• Differences in complement proteases are thought to explain certain differences in the immune responses of humans and rodents (Puente et al., 2003).
• Posterior probabilities (PP): C7: PP = 0.98; C8B: PP = 0.93
Glycoprotein hormones CGA
• CGA is the alpha subunit of chorionic gonadotropin, luteinizing hormone, follicle-stimulating hormone, and thyroid-stimulating hormone.
• The alpha subunits of the 4 hormones are identical; however, their beta chains are unique and confer biological specificity.
• Beta subunits CGB1 and CGB2 are thought to have originated from gene duplication in the common ancestor of humans and great apes.
• PP = 0.82
Summary and Future Work
• Bayesian analysis allows the study of patterns and the episodic nature of positive selection on the mammalian tree.
• The most probable selection histories can be identified for individual genes.
• Ideally, we would like to model mode switches in continuous time.
• Compare the functions of genes with high and low expected numbers of switches.
• Is the selection history predictive of function?
Resource http://compgen.bscb.cornell.edu/projects/mammal-psg/
Thanks
• Siepel Lab (Cornell): Adam Siepel, Tomas Vinar, Brona Brejova, Adam Diehl, Andre Luis Martins
• Bustamante Lab (Cornell): Carlos Bustamante, Adam Boyko, Adam Auton, Keyan Zhao, Abra Brisbin, Kasia Bryc, Jeremiah Degenhardt, Lin Li, Kirk Lohmueller, Weisha Michelle Zhu, Amit Indap
• Nielsen Lab (Berkeley): Rasmus Nielsen, Rute Da Fonseca
• NIH and NSF for funding