
John Marshall 1 , Professor Robert Weiss 2

John Marshall 1, Professor Robert Weiss 2. A Bayesian approach to inferring recent selective sweeps in West African Anopheles gambiae populations. 1 Department of Biomathematics, UCLA School of Medicine, Los Angeles CA 90095-1766 USA


Presentation Transcript


  1. John Marshall¹, Professor Robert Weiss²
     A Bayesian approach to inferring recent selective sweeps in West African Anopheles gambiae populations
     ¹ Department of Biomathematics, UCLA School of Medicine, Los Angeles CA 90095-1766 USA
     ² Department of Biostatistics, UCLA School of Public Health, Los Angeles CA 90095-1772 USA

  2. Using microsatellite alleles to detect recent selective sweeps
     Microsatellites:
     • Tandem repeats of short DNA segments, typically 1-5 bp in length
     • Alleles defined by number of repeats at a particular locus
     • Multiallelic → highly informative markers
     Factors affecting variance in microsatellite allele size:
     • Locus specific:
       • Microsatellite mutation rate (mainly due to ‘slippage’ during DNA replication)
     • Population specific:
       • Effective population size
       • Population-level events (migration, bottlenecks)
     • Population and locus specific:
       • Hitchhiking of a microsatellite allele to a selected gene

  3. The lnRV statistic
     • From population genetics, the variance in microsatellite allele size at a given locus (j) in a given population (i) is a function of the effective population size ($N_{e,i}$) and the microsatellite mutation rate ($\mu_j$)
     • Taking the ratio of the expected variances in microsatellite allele size for a pair of populations (i1 and i2) therefore removes the locus dependence
     • For a pair of populations (i1 and i2), the log ratio of variances (lnRV) can be calculated for each locus in a set (j = 1, 2, …, T)
     • Coalescent simulations have shown empirically that lnRV values are approximately normally distributed
     • A microsatellite near a selected locus is expected to have reduced variance, and hence an lnRV value that is an outlier from the otherwise normal distribution of lnRV values
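The calculation described on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the function names `ln_rv` and `standardize` and the input layout (a dict mapping locus name to a list of allele sizes) are assumptions made for the example. If the expected variance is proportional to the product of effective population size and mutation rate, as the slide implies, the mutation rate cancels in the ratio.

```python
import math
from statistics import mean, stdev, variance


def ln_rv(sizes_pop1, sizes_pop2):
    """Return lnRV per locus: the log of the ratio of allele-size variances.

    Each argument is a dict mapping locus name -> list of allele sizes
    (repeat counts) observed in one population. Both dicts are assumed
    to share the same loci, and every locus must have non-zero variance
    in both populations (otherwise the log ratio is undefined).
    """
    return {
        locus: math.log(variance(sizes_pop1[locus]) / variance(sizes_pop2[locus]))
        for locus in sizes_pop1
    }


def standardize(values):
    """Convert per-locus lnRV values to z-scores across loci.

    Because lnRV is approximately normal across neutral loci, a locus
    with |z| > 1.96 is a candidate outlier at the 5% level.
    """
    m = mean(values.values())
    s = stdev(values.values())
    return {locus: (v - m) / s for locus, v in values.items()}
```

In practice `ln_rv` would be applied to every locus shared by the two populations and `standardize` to the resulting dict; loci with strongly negative or positive z-scores point to reduced variance in the first or second population respectively.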

  4. Pros and cons of the lnRV statistic
     CONS:
     • Much information is lost when a set of allele size data at a particular locus for all individuals in a population is reduced to a single value
     • Only makes pair-wise comparisons
     • Difficult to extrapolate methodology to >2 populations
     • Inferences from pairs of populations are not carried over to other populations
     • Masking can occur when multiple outliers expand the confidence interval, so that none or only a subset of the outliers are detected
     PROS:
     • Easy and fast to calculate
     • Intuitive to understand
     • Can cope with a very large number of loci
     • Not sensitive to genetic drift, migration or inbreeding, since these processes affect all loci to the same extent and so are removed in the ratio calculation
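The masking problem listed among the cons can be shown with a small numerical sketch. The lnRV values below are invented for illustration, and `z_scores` and `flagged` are hypothetical helper names; the point is only that several outliers inflate the standard deviation enough to hide one another.

```python
from statistics import mean, stdev


def z_scores(values):
    """Standardize a list of lnRV values using their own mean and SD."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]


def flagged(values, threshold=1.96):
    """Return the indices of values whose |z| exceeds the threshold."""
    return [i for i, z in enumerate(z_scores(values)) if abs(z) > threshold]


# Fifteen made-up "neutral" lnRV values centred on zero.
neutral = [-1.0, -0.5, 0.0, 0.5, 1.0] * 3

# One selected locus with strongly reduced variance is detected.
print(flagged(neutral + [-4.0]))      # -> [15]

# Five equally extreme loci inflate the SD and mask each other.
print(flagged(neutral + [-4.0] * 5))  # -> []
```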

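  5. The Bayesian model
     Distribution of microsatellite allele sizes: [equation not preserved in the transcript]
     Mean components: [equation not preserved in the transcript]
     Variance components: [equation not preserved in the transcript]
     (i indexes population, j indexes locus, k indexes individual)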

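  6. Consistency between lnRV statistic and Bayesian ANOVA
     Bayesian ANOVA: [equation not preserved in the transcript]
     lnRV statistic: [equation not preserved in the transcript]
     Relative selection: [equation not preserved in the transcript]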
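One way to see why the two approaches can agree is to write the variance of allele size on a log scale. The parametrisation below is an assumption made for illustration: the symbols $\alpha$, $a_i$, $b_j$ and $c_{ij}$ are not taken from the slides and simply mirror the locus-specific, population-specific, and population-and-locus-specific factors listed on slide 2.

```latex
\begin{align}
\ln \sigma^2_{ij} &= \alpha + a_i + b_j + c_{ij} \\
\ln RV_j = \ln \frac{\sigma^2_{i_1 j}}{\sigma^2_{i_2 j}} &= (a_{i_1} - a_{i_2}) + (c_{i_1 j} - c_{i_2 j})
\end{align}
```

Under this sketch the locus term $b_j$ (which would absorb the mutation rate) cancels, the population difference $a_{i_1} - a_{i_2}$ is the same at every locus, and only the interaction difference $c_{i_1 j} - c_{i_2 j}$ varies from locus to locus, which is where a selective sweep would show up.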

  7. Bayesian statistics for detecting selective sweeps
     For a given locus j, the population with the smallest fractional reduction in allele size variance is denoted imax, and its variance component serves as the reference. Relative selection at locus j can then be measured relative to population imax. In the slide's example:
     • BnM has the largest variance-component value, so it is the least selected
     • BnB and SeB have the smallest values, so they are the most selected
     • The extent of selection can be measured by: [equation not preserved in the transcript]
     • And: [equation not preserved in the transcript]
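The exact measures from this slide were not preserved, so the following is only a hedged sketch of how such quantities could be computed from posterior samples. It assumes the MCMC output for one locus is a list of draws, each a dict mapping population to its variance component. The function name `relative_selection`, the ratio-to-reference measure, and the `threshold` parameter are all hypothetical. The population labels follow the slide, and the numbers in the usage example are invented.

```python
def relative_selection(draws, threshold=0.5):
    """Summarize relative selection at one locus from posterior draws.

    draws: list of dicts, one per posterior sample, mapping
           population name -> variance component at this locus.
    Returns (i_max, prob), where i_max is the population with the largest
    posterior-mean variance component (the least selected reference) and
    prob maps every other population to the posterior probability that
    its variance is below `threshold` times the reference variance.
    """
    pops = list(draws[0])
    n = len(draws)
    # Posterior mean of each population's variance component.
    post_mean = {p: sum(d[p] for d in draws) / n for p in pops}
    i_max = max(post_mean, key=post_mean.get)
    # Fraction of draws in which the variance ratio falls below threshold.
    prob = {
        p: sum(d[p] / d[i_max] < threshold for d in draws) / n
        for p in pops
        if p != i_max
    }
    return i_max, prob


# Two made-up posterior draws for three of the 1998 subpopulations.
example = [
    {"BnM": 4.0, "BnB": 1.0, "SeB": 3.0},
    {"BnM": 5.0, "BnB": 2.0, "SeB": 1.0},
]
print(relative_selection(example))  # -> ('BnM', {'BnB': 1.0, 'SeB': 0.5})
```

Computing the ratio draw by draw, rather than from posterior means, carries the posterior uncertainty through to the reported probability.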

  8. Pros and cons of Bayesian approach
     PROS:
     • Doesn’t reduce the data to summary statistics before analysis
     • Can be used to compare >2 populations at once
     • Inferences from one population are carried over to all others
     • Can cope with any number of selected loci without masking occurring
     • Supplies quantitative measures of the probability that selection has occurred
     • Can cope well with very small sample sizes
     CONS:
     • Can take a long time to converge
     • Sometimes requires a lot of computing power
     • Bayesian methods are more difficult to implement:
       • Require well-specified prior distributions
       • Require programming and the use of complicated software
     • Inferences are influenced to some degree by the subjective choice of prior distributions

  9. Microsatellite data for West African Anopheles gambiae populations
     • 1998 data set:
       • Allele size data collected at 21 microsatellite loci dispersed throughout the Anopheles gambiae genome
       • 5 subpopulations:
         • Bamako chromosomal form in the villages of Banambani and Selenkenyi
         • Mopti chromosomal form in the villages of Banambani and Selenkenyi
         • Savannah chromosomal form in the village of Banambani
     • 2003 data set:
       • Microsatellite allele size data collected at 12 microsatellite loci dispersed throughout Anopheles gambiae chromosome 3
       • Data taken for 12 subpopulations:
         • Mopti chromosomal form in the villages of Oure, Dire, Kondi, Nampala, Torkya and Banikane
         • Savannah chromosomal form in the villages of Oure, Gono, Kokouna, Pimperena, Soulouba and Madina Diasra

  10. Loci likely targeted by recent selective sweeps (1998 data set)
      Applying the Bayesian ANOVA model to the 1998 data set, there is evidence of selection (in order of magnitude) in: [figure not preserved in the transcript; surviving labels: loci 025 and 637, with a panel for locus 637]

  11. Loci likely targeted by recent selective sweeps (2003 data set)
      Applying the Bayesian ANOVA model to the 2003 data set, there is evidence of selection (in order of magnitude) in: [figure not preserved in the transcript; surviving labels: locus 119, with a panel for locus 119]

  12. Implications for recent selection in Anopheles gambiae genome
      1998 data set:
      • Strongest evidence for selection is for:
        • Locus 637 (chromosome 2) in Bamako form
        • Locus 038 (X chromosome) in Savannah form
      • Most selected loci are on chromosome 2
      • For a given chromosomal form collected at Banambani and Selenkenyi, selection seems to be evident in both locations
      • The same does not apply for a given location where multiple chromosomal forms are collected
      • Suggests there is more gene flow between these two villages than there is between chromosomal forms
      2003 data set:
      • Strongest evidence for selection is for:
        • Locus 119 (chromosome 3R) in Mopti form in Oure
        • Locus 127 (chromosome 3R) in Savannah form in Oure
      • Selected loci are dispersed throughout chromosome 3 (only chromosome 3 loci were analyzed in this data set)
      • This time there is very little correlation for given chromosomal forms collected at neighbouring locations
      • Possibly selection on chromosome 3 is weaker (1998 data set showed no selection on chromosome 3)
      [figure not preserved in the transcript; surviving labels: loci 093, 119, 577 and 059]
