1 / 49

CSE280b: Population Genetics

Explore the concepts of population genetics, Hardy-Weinberg equilibrium, and linkage disequilibrium in disease gene mapping. Learn about simulating population data and the Wright-Fisher model of evolution. Understand the coalescent theory and its application in analyzing genetic data.

Download Presentation

CSE280b: Population Genetics

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. CSE280b: Population Genetics Vineet Bafna/Pavel Pevzner www.cse.ucsd.edu/classes/sp05/cse291 Vineet Bafna

  2. Review • Hardy Weinberg Equilibrium • Linkage Equlibrium Vineet Bafna

  3. Recombination and Linkage Equilibrium • In a freely mixing population, an individual chromosome randomly chooses its parent from the available pool • With unimpeded recombination (Linkage Equilibrium), the individual freely chooses its two parent chromsomes, and then freely chooses alleles from the parents • What is the probability of seeing the allele <1 1>? 0 1 0 1 0 1 0 1 1 0 1 1 1 0 0 0 Vineet Bafna

  4. Measures of LD • Consider two bi-allelic sites with alleles marked with 0 and 1 • Define • P00 = Pr[Allele 0 in locus 1, and 0 in locus 2] • P0* = Pr[Allele 0 in locus 1] • Linkage equilibrium if P00 = P0* P*0 • D = abs(P00 - P0* P*0) = abs(P01 - P0* P*1) = … Vineet Bafna

  5. LD over time • With random mating, and fixed recombination rate r between the sites, Linkage Disequilibrium will disappear over time • Let D(t) = LD at time t • P(t)00 = (1-r) P(t-1)00 + r P(t-1)0* P(t-1)*0 • D(t) =P(t)00 - P(t)0* P(t)*0 = P(t)00 - P(t-1)0* P(t-1)*0 • (HW) • D(t) =(1-r) D(t-1) =(1-r)t D(0) Vineet Bafna

  6. LD over distance • Assumption • Recombination rate increases linearly with distance • Let r be the (constant) recombination rate per bp per generation D(t) = P(t)00- P0* P*0 = (1-r)d P(t-1)00+ [1- (1-r)d ] P0* P*0 - P0* P*0 = (1-r)d D(t-1) • The assumption is reasonable, but recombination rates vary from region to region, adding to complexity Vineet Bafna

  7. LD and disease mapping • Consider a mutation that is causal for a disease. • The goal of disease gene mapping is to discover which gene (locus) carries the mutation. • Consider every polymorphism, and check: • There might be too many polymorphisms • Multiple mutations (even at a single locus) that lead to the same disease • Instead, consider a dense sample of polymorphisms that span the genome Vineet Bafna

  8. LD can be used to map disease genes • LD decays with distance from the disease allele. • By plotting LD, one can short list the region containing the disease gene. LD D N N D D N 0 1 1 0 0 1 Vineet Bafna

  9. LD and disease gene mapping problems • Marker density? • Complex diseases • Population sub-structure Vineet Bafna

  10. Population Genetics • Often we look at these equilibria (Linkage/HW) and their deviations in specific populations • These deviations offer insight into evolution. • However, what is Normal? • A combination of empirical (simulation) and theoretical insight helps distinguish between expected and unexpected. Vineet Bafna

  11. Topic 2: Simulating population data • We described various population genetic concepts (HW, LD), and their applicability • The values of these parameters depend critically upon the population assumptions. • What if we do not have infinite populations • No random mating (Ex: geographic isolation) • Sudden growth • Bottlenecks • Ad-mixture • It would be nice to have a simulation of such a population to test various ideas. How would you do this simulation? Vineet Bafna

  12. Wright Fisher Model of Evolution • Fixed population size from generation to generation • Random mating Vineet Bafna

  13. Coalescent model • Insight 1: • Separate the genealogy from allelic states (mutations) • First generate the genealogy (who begat whom) • Assign an allelic state (0) to the ancestor. Drop mutations on the branches. Vineet Bafna

  14. Coalescent model • Insight 1: • Assign an allelic state (0) to the ancestor. Drop mutations on the branches. • The mutations are proportional to the branch length. • Each site (locus) mutates at most once. • At the end, drop any fixed sites • How efficient is this? How many generations do we need to simulate today’s population? 0 0 0 0 0 Loci 0 1 1 0 1 0 0 1 1 1 0 1 1 0 1 0 0 0 0 1 Loci Individuals Vineet Bafna

  15. Coalescent theory • Insight 2: • Much of the genealogy is irrelevant, because it disappears. • Better to go backwards Vineet Bafna

  16. Coalescent • Note that in the Wright Fischer model, the population is freely mixing, and constant size N • One way to think about it is that each individual in the current generation selects a parent uniformly at random from the N individuals in the previous generation. • When two individuals choose the same parent, they coalesce. • Once they coalesce, they stay together. We continue until only one individual is left. • Note that this only gives a random topology with labeled leaves. The only thing of interest is branch length (number of generations to MRCA) 1 2 3 4 Vineet Bafna

  17. Coalescent theory (Kingman) • Input • (Fixed population (N individuals), random mating) • Consider 2 individuals. • Probability that they coalesce in the previous generation (have the same parent)= • Probability that they do not coalesce after t generations= Vineet Bafna

  18. Coalescent theory • is time in units of N generations • Consider k individuals. • Probability that no pair coalesces after 1 generation • Probability that no pair coalesces after t generations Vineet Bafna

  19. Coalescent approximation • Insight 3: • Topology is independent of coalescent times • If you have n individuals, generate a random binary topology • Iterate (until one individual) • Pick a pair at random, and coalesce • Insight 4: • To generate coalescent times, there is no need to go back generation by generation. Generate n random variables to get the n coalescence times. Vineet Bafna

  20. Coalescent approximation • At any step, there are 1 <= k <= n individuals • To generate time to coalesce (k to k-1 individuals) • Pick a number from exponential distribution with rate k(k-1)/2 • Mean time to coalescence (in units of N generations) = 2/(k(k-1)) Vineet Bafna

  21. Mean time to coalesce • If there are k individuals, the Probability for a coalescence in one generation is • k(k-1)/2N • Expected time to coalesce = 2N/k(k-1) Vineet Bafna

  22. Typical coalescents • 4 random examples with n=6 (Note that we do not need to specify N. Why?) • Expected time to coalesce to 1 node? Vineet Bafna

  23. Coalescent properties • Expected time for the last step • The last step is half of the total time to coalesce • Studying larger number of individuals does not change numbers tremendously • EX: Number of mutations in a population is proportional to the total branch length of the tree • E(Ttot) =1 Vineet Bafna

  24. Variants (exponentially growing populations) • If the population is growing exponentially, the branch lengths become similar, or even star-like. Why? • With appropriate scaling of time, the same process can be extended to various scenarios: male-female, hermaphrodite, segregation, migration, etc. Vineet Bafna

  25. Simulating population data • Generate a coalescent (Topology + Branch lengths) • For each branch length, drop mutations with rate  • Generate sequence data • Note that the resulting sequence is a perfect phylogeny. • Given such sequence data, can you reconstruct the coalescent tree? (Only the topology, not the branch lengths) • Also, note that all pairs of positions are correlated (should have high LD). Vineet Bafna

  26. Coalescent with Recombination • An individual may have one parent, or 2 parents Vineet Bafna

  27. ARG: Coalescent with recombination • Given: mutation rate , recombination rate , population size 2N (diploid), sample size n. • How can you generate the ARG (topology+branch lengths) efficiently? • How will you generate sequences for n individuals? • Given sequence data, can you reconstruct the ARG (topology) Vineet Bafna

  28. Recombination • Define r as the probability of recombining per generation. • Note that the parameter is a value which will be defined later • Assume k individuals in a generation. The following might happen: • An individual arises because of a recombination event between two individuals (It will have 2 parents). • Two individuals coalesce. • Neither (Each individual has a distinct parent). • Multiple events (low probability). Vineet Bafna

  29. Recombination • We ignore the case of multiple (> 1) events in one generation • Pr (No recombination) = 1-kr • Pr (No coalescence) • Consider scaled time in units of 2N generations. Thus the number of individuals increase with rate kr2N, and decrease with rate • The value 2rN is usually small, and therefore, the process will ultimately coalesce to a single individual (MRCA) Vineet Bafna

  30. ARG What is the flaw in this procedure? • Let k = n, • Define • Iterate until k= 1 • Choose time from an exponential distribution with rate • Pick event as recombination with probability • If event is recombination, choose an individual to recombine, and a position, else choose a pair to coalesce. • Update k, and continue Vineet Bafna

  31. Ancestral Recombination Graph Vineet Bafna

  32. Simulating sequences on the ARG • Generate topology and branch lengths as before • For each recombination, generate a position. • Next generate mutations at random on branch lengths • For a mutation, select a position as well. Vineet Bafna

  33. Review: Coalescent theory applications • Coalescent simulations allow us to test various hypothesis. The coalescent/ARG is usually not inferred, unlike in phylogenies. Vineet Bafna

  34. Coalescent theory: example • Ex: ~1400bp at Sod locus in Dros. • 10 taxa • 5 were identical. The other 5 had 55 mutations. • Q: Is this a chance event, or is there selection for this haplotype. Vineet Bafna

  35. Coalescent application • 10000 coalescent simulations were performed on 10 taxa. • 55 mutations on the coalescent branches • Count the number of times 5 lineages are identical • The event happened in 1.1% of the cases. • Conclusion: selection, or some other mechanism explains this data. Vineet Bafna

  36. Coalescent example: Out of Africa hypothesis • Looking at lineage specific mutations might help discard the candelabra model. How? • How do we decide between the multi-regional and Out-of-Africa model? How do we decide if the ancestor was African? Vineet Bafna

  37. Human Samples • We look at data from human samples • Gabriel et al. Science 2002. • 3 populations were sampled at multiple regions spanning the genome • 54 regions (Average size 250Kb) • SNP density 1 over 2Kb • 90 Individuals from Nigeria (Yoruban) • 93 Europeans • 42 Asian • 50 African American Vineet Bafna

  38. Population specific recombination • D’ was used as the measure between SNP pairs. • SNP pairs were classified in one of the following • Strong LD • Strong evidence for recombination • Others (13% of cases) • This roughly favors out-of-africa. A Coalescent simulation can help give confidence values on this. Gabriel et al., Science 2002 Vineet Bafna

  39. Recombination events and  • Given , n, can you compute the expected number of recombination events? • It can be shown that E(n, ) =  log (n) • Questions that people are interested in • Given a set of sequences from a population, compute the recombination rate  • Given a population reconstruct the most likely history (as an ancestral recombination graph) Vineet Bafna

  40. Re-constructing history without the coalescent Vineet Bafna

  41. An algorithm for constructing a perfect phylogeny • We will consider the case where 0 is the ancestral state, and 1 is the mutated state. This will be fixed later. • In any tree, each node (except the root) has a single parent. • It is sufficient to construct a parent for every node. • In each step, we add a column and refine some of the nodes containing multiple children. • Stop if all columns have been considered. Vineet Bafna

  42. Inclusion Property • For any pair of columns i,j • i < j if and only if i1 j1 • Note that if i<j then the edge containing i is an ancestor of the edge containing i i j Vineet Bafna

  43. Example r A B C D E 1 2 3 4 5 A 1 1 0 0 0 B 0 0 1 0 0 C 1 1 0 1 0 D 0 0 1 0 1 E 1 0 0 0 0 Initially, there is a single clade r, and each node has r as its parent Vineet Bafna

  44. Sort columns • Sort columns according to the inclusion property (note that the columns are already sorted here). • This can be achieved by considering the columns as binary representations of numbers (most significant bit in row 1) and sorting in decreasing order 1 2 3 4 5 A 1 1 0 0 0 B 0 0 1 0 0 C 1 1 0 1 0 D 0 0 1 0 1 E 1 0 0 0 0 Vineet Bafna

  45. Add first column • In adding column i • Check each edge and decide which side you belong. • Finally add a node if you can resolve a clade 1 2 3 4 5 A 1 1 0 0 0 B 0 0 1 0 0 C 1 1 0 1 0 D 0 0 1 0 1 E 1 0 0 0 0 r u B D A C E Vineet Bafna

  46. Adding other columns • Add other columns on edges using the ordering property 1 2 3 4 5 A 1 1 0 0 0 B 0 0 1 0 0 C 1 1 0 1 0 D 0 0 1 0 1 E 1 0 0 0 0 r 1 3 E 2 B 5 4 D A C Vineet Bafna

  47. Unrooted case • Switch the values in each column, so that 0 is the majority element. • Apply the algorithm for the rooted case Vineet Bafna

  48. Vineet Bafna

  49. Vineet Bafna

More Related