
Evaluation of a New Tool for Use in Association Mapping: Structure

This document discusses the evaluation of a new software tool (Structure 2.0) for association mapping in population genetics. It covers the principles, methods, and advantages of using Bayesian inference and Markov Chain Monte Carlo (MCMC) techniques for population structure analysis. The text also explores strategies for grouping individuals and detecting admixture complications. Moreover, it presents detailed information on running the MCMC program and assessing parameters for population genetics studies.

Presentation Transcript


  1. Evaluation of a new tool for use in association mapping: Structure. Reinhard Simon, 2002/10/29

  2. Software: Structure 2.0, http://pritch.bsd.uchicago.edu. Pritchard JK, Stephens M, Donnelly P (2000): Inference of population structure using multilocus genotype data. Genetics, 155: 945-959

  3. Associations – the ideal (figure: cases vs. controls)

  4. Test for association. A diploid locus: Pearson's chi-square test

  5. Example: Contingency table
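The contingency table from this slide did not survive in the transcript. As a minimal sketch of the test named on slide 4, the following computes Pearson's chi-square statistic for a table of allele counts. The counts are hypothetical and chosen only for illustration; they are not the values from the presentation.

```python
def pearson_chi_square(table):
    """Return (chi2, df) for a contingency table given as a list of rows.

    Assumes every row total and column total is positive.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of group and allele
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, df


# Hypothetical allele counts (not from the slides).
# Rows: cases, controls. Columns: allele A, allele a.
table = [[60, 40],
         [40, 60]]
chi2, df = pearson_chi_square(table)
print(chi2, df)
```

With these made-up counts every expected cell is 50, so the statistic is 8.0 on 1 degree of freedom, which exceeds the usual 5% critical value of 3.84.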

  6. Associations – the less ideal (figure: cases vs. controls)

  7. Associations – simple admixture (figure: cases vs. controls)

  8. Associations – admixture complications (figure: cases vs. controls)

  9. Associations – admixture complications (figure: cases vs. controls). A high frequency of associated loci may indicate problems with the underlying population structure (= stratification).

  10. Associations – accounted for (figure: cases vs. controls)

  11. Questions • Is there stratification? • If so: - how many subpopulations are there? - which individual belongs to which subpopulation?

  12. Test for stratification – principle. Summarizing over all loci: • Xi is the chi-square statistic at the i-th locus • Null hypothesis: no allele-frequency differences between cases and controls at any locus • df equals the sum of the df at the individual loci (Pritchard, 1999)

  13. Test for stratification – ctd. Observations: • strong positive selection requires a larger number of loci • subgroup-specific markers decrease the number of loci needed (Pritchard, 1999)
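The formula for the summary statistic is missing from the transcript. A minimal sketch consistent with the bullets on slide 12 (sum the per-locus chi-square values and sum their degrees of freedom) could look as follows. The per-locus tables are hypothetical, and the function names are illustrative only.

```python
def chi_square(table):
    """Pearson chi-square (chi2, df) for one locus; assumes positive margins."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = sum(
        (obs - row_totals[i] * col_totals[j] / n) ** 2
        / (row_totals[i] * col_totals[j] / n)
        for i, row in enumerate(table)
        for j, obs in enumerate(row)
    )
    return chi2, (len(table) - 1) * (len(table[0]) - 1)


def stratification_statistic(tables):
    """Sum chi-square values and degrees of freedom over all loci."""
    total_chi2, total_df = 0.0, 0
    for table in tables:
        chi2, df = chi_square(table)
        total_chi2 += chi2
        total_df += df
    return total_chi2, total_df


# Hypothetical case/control allele counts at two unlinked marker loci.
loci = [
    [[60, 40], [40, 60]],  # locus 1: frequencies differ
    [[50, 50], [50, 50]],  # locus 2: no difference
]
total_chi2, total_df = stratification_statistic(loci)
print(total_chi2, total_df)
```

The summed statistic would then be compared against a chi-square distribution with the summed degrees of freedom; computing the p-value itself is left out here because it needs a chi-square CDF that the Python standard library does not provide.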

  14. How to group individuals? • Based on distance measures • Based on models

  15. Pairwise distance measures: Jaccard; Nei & Li; Sokal & Michener

  16. Model-based Bayesian inference • Bayesian statistics: uncertainty is modeled using probabilities • probability statements are made about model parameters. Advantages: • very general framework • assumptions are made explicit and are quantified

  17. Bayesian inference – how? • Bayesian inference centers on the posterior distribution p(theta|X), e.g. a genetic model of the distribution of allele frequencies • However, analytic evaluation is seldom possible ...
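The formulas on this slide were lost in extraction. The sketch below uses the standard textbook definitions of the three coefficients for binary (band present/absent) marker profiles, which may differ in notation from the slide. The two profiles are hypothetical, and a distance is taken as 1 minus the similarity.

```python
def band_counts(x, y):
    """Count shared presences (a), mismatches (b, c) and shared absences (d)."""
    a = sum(1 for xi, yi in zip(x, y) if xi and yi)
    b = sum(1 for xi, yi in zip(x, y) if xi and not yi)
    c = sum(1 for xi, yi in zip(x, y) if not xi and yi)
    d = sum(1 for xi, yi in zip(x, y) if not xi and not yi)
    return a, b, c, d


def jaccard(x, y):
    a, b, c, _ = band_counts(x, y)
    return a / (a + b + c)              # ignores shared absences


def nei_li(x, y):
    a, b, c, _ = band_counts(x, y)
    return 2 * a / (2 * a + b + c)      # shared presences weighted double


def sokal_michener(x, y):
    a, b, c, d = band_counts(x, y)
    return (a + d) / (a + b + c + d)    # simple matching, counts shared absences


# Hypothetical marker profiles (1 = band present, 0 = absent).
# Assumes at least one band is present in one of the two profiles.
x = [1, 1, 0, 0, 1, 0]
y = [1, 0, 0, 1, 1, 0]
similarities = (jaccard(x, y), nei_li(x, y), sokal_michener(x, y))
distances = tuple(1 - s for s in similarities)
print(similarities, distances)
```

The main design difference between the three is how shared absences are treated: Jaccard and Nei & Li ignore them, while Sokal & Michener counts them as agreement.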

  18. Bayesian inference - methods Alternatives: • Numerical evaluation • approximation • simulation, e.g. Markov Chain Monte Carlo Methods

  19. Simulation methods for Bayesian inference – general • Generate random samples from a probability distribution (e.g. normal) • Construct a histogram • If the sample is large enough, it can be used to estimate the mean, variance, ... • MCMC makes it possible to generate large samples from an arbitrary probability distribution

  20. Markov chain behaviour • Reaches an equilibrium (basic MCMC theorem) and • the present state depends only on the preceding one: “The future depends on the past only through the present.”

  21. MCMC – strengths • freedom in inference (e.g. simultaneous estimation, estimation of arbitrary functions of model parameters like ranks or threshold exceedance) • coherently integrates uncertainty • often the only available method for complex problems

  22. MCMC – contra • computationally intensive • often requires specialized software
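As a small illustration of the first three bullets, the sketch below draws a large sample from a standard normal distribution, bins it into a histogram and estimates the mean and variance from the sample. The sample size, seed and bin width are arbitrary choices, not values from the presentation.

```python
import math
import random
import statistics

rng = random.Random(42)                      # fixed seed for reproducibility
samples = [rng.gauss(0.0, 1.0) for _ in range(100_000)]

# Histogram: count samples per bin, keyed by the left edge of the bin.
bin_width = 0.5
histogram = {}
for s in samples:
    left_edge = math.floor(s / bin_width) * bin_width
    histogram[left_edge] = histogram.get(left_edge, 0) + 1

# Sample-based estimates of the distribution's mean and variance.
sample_mean = statistics.fmean(samples)
sample_var = statistics.variance(samples)
print(sample_mean, sample_var)
```

With a sample this large the estimates should land close to the true values 0 and 1; MCMC follows the same logic, but draws the sample from a distribution that cannot be sampled directly.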
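The equilibrium property can be made concrete with a toy example. The sketch below uses a hypothetical two-state chain (the transition probabilities are invented for illustration) and shows that the state distribution converges to the same equilibrium regardless of the starting state.

```python
def step(dist, transition):
    """Advance a probability distribution over states by one transition."""
    n = len(transition)
    return [sum(dist[i] * transition[i][j] for i in range(n)) for j in range(n)]


def run_chain(start, transition, n_steps=100):
    dist = start
    for _ in range(n_steps):
        # The next distribution depends only on the current one (Markov property).
        dist = step(dist, transition)
    return dist


# Hypothetical transition matrix: row i holds Pr(next state | current state i).
transition = [[0.9, 0.1],
              [0.5, 0.5]]

from_a = run_chain([1.0, 0.0], transition)   # start in state 0
from_b = run_chain([0.0, 1.0], transition)   # start in state 1
print(from_a, from_b)
```

For this matrix the equilibrium distribution is (5/6, 1/6), and both starting points reach it to within floating-point precision.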

  23. Inferring population structure. X = genotypes of sampled individuals. Unknown: Z = population of origin; P = allele frequencies in all populations; Q = proportion of genome that originates from population k. Pr(Z, P, Q|X) ∝ Pr(Z) * Pr(P) * Pr(Q) * Pr(X|Z,P,Q). Solution: using MCMC for Bayesian inference; simultaneous estimation of Q, Z and P.

  24. Basic MCMC algorithm – no admixture (Q). Initialize: random values for Z (pop), e.g. from Pr(z) = 1/k. Repeat for m = 1, 2, ...: 1. Sample P(m) from Pr(P|X, Z(m-1)) (estimate allele frequencies) 2. Sample Z(m) from Pr(Z|X, P(m)) (estimate population of origin for each individual)

  25. Basic MCMC algorithm – with admixture (Q). Initialize: random values for Z (pop), e.g. from Pr(z) = 1/k. Repeat for m = 1, 2, ...: 1. Sample P(m), Q(m) from Pr(P, Q|X, Z(m-1)) (estimate allele frequencies) 2. Sample Z(m) from Pr(Z|X, P(m), Q(m)) 3. Update alpha (Dirichlet parameter for the degree of admixture)
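The two sampling steps of the no-admixture algorithm can be sketched as a small Gibbs sampler. This is a simplified illustration, not the Structure implementation: it assumes diploid individuals, biallelic loci coded as the number of copies of one allele, and a uniform Beta(1, 1) prior on allele frequencies. The function name and the toy data are hypothetical.

```python
import math
import random


def gibbs_no_admixture(X, K, n_iter=200, seed=0):
    """X[i][l] = copies (0, 1 or 2) of allele 1 in individual i at locus l."""
    rng = random.Random(seed)
    n, n_loci = len(X), len(X[0])
    Z = [rng.randrange(K) for _ in range(n)]          # initialize: Pr(z) = 1/K
    for _ in range(n_iter):
        # Step 1: sample P from Pr(P | X, Z), one Beta draw per pop and locus.
        P = []
        for k in range(K):
            members = [X[i] for i in range(n) if Z[i] == k]
            row = []
            for l in range(n_loci):
                ones = sum(g[l] for g in members)
                zeros = 2 * len(members) - ones
                p = rng.betavariate(1 + ones, 1 + zeros)
                row.append(min(max(p, 1e-12), 1 - 1e-12))  # keep log() finite
            P.append(row)
        # Step 2: sample Z from Pr(Z | X, P), individual by individual.
        for i in range(n):
            log_w = [
                sum(X[i][l] * math.log(P[k][l])
                    + (2 - X[i][l]) * math.log(1 - P[k][l])
                    for l in range(n_loci))
                for k in range(K)
            ]
            top = max(log_w)
            weights = [math.exp(v - top) for v in log_w]   # avoid underflow
            Z[i] = rng.choices(range(K), weights=weights)[0]
    return Z, P


# Hypothetical toy data: 10 individuals fixed for allele 1, 10 fixed for allele 0.
X = [[2] * 8 for _ in range(10)] + [[0] * 8 for _ in range(10)]
Z, P = gibbs_no_admixture(X, K=2)
print(Z)
```

On data this cleanly separated, the sampler should put the first ten and the last ten individuals into different clusters; which cluster gets label 0 or 1 is arbitrary. A real analysis would also discard a burn-in period and summarize Z over many iterations rather than reading off the final state.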
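A sketch of the admixture variant, under simplifying assumptions that are not taken from the slides: diploid individuals, biallelic loci, a Beta(1, 1) prior on allele frequencies, population of origin tracked per allele copy, and a symmetric Dirichlet prior on Q with a single alpha. The alpha update is a plain Metropolis step with a uniform prior on (0, 10) and a proposal SD of 0.025, mirroring the settings listed on slide 36. All function names and the toy data are hypothetical.

```python
import math
import random


def sample_dirichlet(params, rng):
    """Dirichlet draw via normalized gamma variates (clamped away from 0)."""
    draws = [rng.gammavariate(a, 1.0) for a in params]
    total = sum(draws)
    return [max(d / total, 1e-12) for d in draws]


def update_alpha(alpha, Q, rng, sd=0.025, alpha_max=10.0):
    """Metropolis step for alpha under a uniform prior on (0, alpha_max)."""
    proposal = rng.gauss(alpha, sd)
    if not 0.0 < proposal < alpha_max:
        return alpha

    def log_density(a):   # log Pr(Q | alpha) for a symmetric Dirichlet
        k = len(Q[0])
        return sum(math.lgamma(k * a) - k * math.lgamma(a)
                   + (a - 1) * sum(math.log(q) for q in row) for row in Q)

    accept = math.log(1.0 - rng.random()) < log_density(proposal) - log_density(alpha)
    return proposal if accept else alpha


def gibbs_admixture(X, K, n_iter=100, alpha=1.0, seed=0):
    """X[i][l] = pair of alleles (each 0 or 1) of individual i at locus l."""
    rng = random.Random(seed)
    n, n_loci = len(X), len(X[0])
    copies = [(i, l, a) for i in range(n) for l in range(n_loci) for a in range(2)]
    Z = {c: rng.randrange(K) for c in copies}      # origin of each allele copy
    for _ in range(n_iter):
        # Step 1a: allele frequencies P given X and Z.
        P = []
        for k in range(K):
            row = []
            for l in range(n_loci):
                own = [X[i][l][a] for i in range(n) for a in range(2)
                       if Z[(i, l, a)] == k]
                p = rng.betavariate(1 + sum(own), 1 + len(own) - sum(own))
                row.append(min(max(p, 1e-12), 1 - 1e-12))
            P.append(row)
        # Step 1b: admixture proportions Q given Z.
        Q = []
        for i in range(n):
            counts = [0] * K
            for l in range(n_loci):
                for a in range(2):
                    counts[Z[(i, l, a)]] += 1
            Q.append(sample_dirichlet([alpha + c for c in counts], rng))
        # Step 2: origin of each allele copy given X, P and Q.
        for (i, l, a) in copies:
            w = [Q[i][k] * (P[k][l] if X[i][l][a] == 1 else 1 - P[k][l])
                 for k in range(K)]
            Z[(i, l, a)] = rng.choices(range(K), weights=w)[0]
        # Step 3: update alpha.
        alpha = update_alpha(alpha, Q, rng)
    return Q, P, alpha


# Hypothetical toy data: 3 individuals homozygous for allele 1, 3 for allele 0.
X = [[(1, 1)] * 5 for _ in range(3)] + [[(0, 0)] * 5 for _ in range(3)]
Q, P, alpha = gibbs_admixture(X, K=2)
print(Q, alpha)
```

P and Q are drawn separately in step 1 because, given Z, they are conditionally independent in this simplified model: P depends on the alleles assigned to each population, Q only on how many copies of each individual were assigned to each population.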

  26. Program – parameters: MCMC

  27. Program – parameters: Q

  28. Program – parameters: P

  29. Program – parameters: Z, K

  30. Program – data types • marker: SNP, microsatellites, AFLP, RFLP, ... (biallelic) • ploidy: >1 • optional extra information: • prior knowledge on groups (e.g. geographic location) • genetic map location of markers

  31. Program – data format

  32. Example – S.t. tuberosum vs andigena Other: first 30 genotypes from tuberosum, next 20 genotypes from andigena

  33. Example – S.t. tuberosum vs andigena PNA:

  34. Example – S.t. tuberosum vs andigena PNA: estimation of k (table columns: simulation #, k, Pr(k))

  35. Example – S.t. tuberosum vs andigena PNA: assignment. 1 = tbr; 2 = adg. Genotypes #31-#3: adg from India; genotype #49: adg from Ecuador

  36. Example – S.t. tuberosum vs andigena Parameter change: allow admixture. Ancestry Model Info: Use Admixture Model * Infer Alpha * Initial Value of ALPHA (Dirichlet Parameter for Degree of Admixture): 1.0 * Use Same Alpha for all Populations * Use a Uniform Prior for Alpha ** Maximum Value for Alpha: 10.0 ** SD of Proposal for Updating Alpha: 0.025. Frequency Model Info: Allele Frequencies are Independent among Pops * Infer LAMBDA ** Use a Uniform Lambda for All Population ** Initial Value of Lambda: 1.0

  37. Example – S.t. tuberosum vs andigena Parameter change: allow admixture

  38. Example – S.t. tuberosum vs andigena Parameter change: allow admixture

  39. Example – S.t. tuberosum vs andigena Parameter change: allow admixture

  40. Example – andigena

  41. Example – andigena: data

  42. Example – andigena K = 2

  43. Example – andigena K = 3

  44. Example – andigena K = 3

  45. Example – andigena: genetic distance K = 3

  46. Example – andigena: geographic distribution - 1 K = 3

  47. Example – andigena: geographic distribution - 2 K = 3

  48. Example – andigena: geographic distribution - 3 K = 3

  49. Example – I. batatas

  50. Example – I. batatas: settings
