CSE280b: Population Genetics

CSE280b: Population Genetics. Vineet Bafna/Pavel Pevzner. www.cse.ucsd.edu/classes/sp05/cse291. Population Genetics. Individuals in a species (population) are phenotypically different. Often these differences are inherited (genetic). Studying these differences is important!


Presentation Transcript


  1. CSE280b: Population Genetics Vineet Bafna/Pavel Pevzner www.cse.ucsd.edu/classes/sp05/cse291 Vineet Bafna

  2. Population Genetics
     • Individuals in a species (population) are phenotypically different.
     • Often these differences are inherited (genetic).
     • Studying these differences is important!
     • Q: How predictive are these differences?

  3. EX: Population Structure
     [Figure: genetic clusters by region: Oceania, Eurasia, East Asia, America, Africa]
     • 377 locations (loci) were sampled in 1000 people from 52 populations.
     • 6 genetic clusters were obtained, 5 of which corresponded to major geographic regions (Rosenberg et al., Science 2002)
     • Genetic differences can predict ethnicity.

  4. Scope of these lectures
     • Basic terminology
     • Key principles
       • Sources of variation
       • HW equilibrium
       • Linkage
     • Coalescent theory
     • Recombination/Ancestral Recombination Graph
     • Haplotypes/Haplotype phasing
     • Population sub-structure
     • Structural polymorphisms
     • Medical genetics basis: Association mapping/pedigree analysis

  5. Alleles
     • Genotype: genetic makeup of an individual
     • Allele: a specific variant at a location
     • The notion of alleles predates the concepts of gene and DNA.
     • Initially, alleles referred to variants that described a measurable phenotype (round/wrinkled seed)
     • Now, an allele might be a nucleotide on a chromosome, with no measurable phenotype.
     • Humans are diploid: they have 2 copies of each chromosome.
     • They may be heterozygous or homozygous at a location
     • Other organisms (plants) have higher forms of ploidy.
     • Additionally, some sites might have 2 allelic forms, or even many allelic forms.

  6. What causes variation in a population?
     • Mutations (may lead to SNPs)
     • Recombinations
     • Other genetic events (gene conversion)
     • Structural Polymorphisms

  7. Single Nucleotide Polymorphisms
     Infinite Sites Assumption: Each site mutates at most once

     00000101011
     10001101001
     01000101010
     01000000011
     00011110000
     00101100110

  8. Short Tandem Repeats

     Sequence                    Repeats
     GCTAGATCATCATCATCATTGCTAG   4
     GCTAGATCATCATCATTGCTAGTTA   3
     GCTAGATCATCATCATCATCATTGC   5
     GCTAGATCATCATCATTGCTAGTTA   3
     GCTAGATCATCATCATTGCTAGTTA   3
     GCTAGATCATCATCATCATCATTGC   5
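
The repeat counts on this slide can be reproduced with a short script. This is a minimal sketch: the function name `max_tandem_repeats` and the choice of `ATC` as the repeat unit are assumptions made for illustration (`CAT` or `TCA` would give the same counts on these sequences).

```python
import re

def max_tandem_repeats(seq: str, unit: str) -> int:
    """Return the largest number of back-to-back copies of `unit` in `seq`."""
    # Each match is one maximal run of the unit, e.g. "ATCATCATC".
    runs = re.findall(f"(?:{re.escape(unit)})+", seq)
    return max((len(run) // len(unit) for run in runs), default=0)

# The six sequences shown on the slide.
reads = [
    "GCTAGATCATCATCATCATTGCTAG",
    "GCTAGATCATCATCATTGCTAGTTA",
    "GCTAGATCATCATCATCATCATTGC",
    "GCTAGATCATCATCATTGCTAGTTA",
    "GCTAGATCATCATCATTGCTAGTTA",
    "GCTAGATCATCATCATCATCATTGC",
]

counts = [max_tandem_repeats(read, "ATC") for read in reads]
print(counts)
```

Collecting such counts over several loci gives one vector of lengths per individual.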

  9. STR can be used as a DNA fingerprint
     • Consider a collection of regions with variable length repeats.
     • Variable length repeats will lead to variable length DNA
     • Vector of lengths is a fingerprint
     [Figure: matrix of repeat counts with axes "individuals" and "loci"; values as extracted: 4 2 3 3 5 1 3 2 3 1 5 3]

  10. Recombination
      [Figure: haplotypes 00000000 and 11111111 recombine to give 00011111]

  11. Gene Conversion
      • Gene conversion versus crossover
      • Hard to distinguish in a population

  12. Structural polymorphisms
      • Large scale structural changes (deletions/insertions/inversions) may occur in a population.

  13. Topic 1: Basic Principles
      • In a ‘stable’ population, the distribution of alleles obeys certain laws
        • Not really, and the deviations are interesting
      • HW Equilibrium
        • Due to mixing in a population
      • Linkage (dis)-equilibrium
        • Due to recombination

  14. Hardy Weinberg equilibrium
      • Consider a locus with 2 alleles, A, a
      • p (respectively, q) is the frequency of A (resp. a) in the population
      • 3 Genotypes: AA, Aa, aa
      • Q: What is the frequency of each genotype?
      • If various assumptions are satisfied (such as random mating, no natural selection), then
        • $P_{AA} = p^2$
        • $P_{Aa} = 2pq$
        • $P_{aa} = q^2$

  15. Hardy Weinberg: why?
      • Assumptions:
        • Diploid
        • Sexual reproduction
        • Random mating
        • Bi-allelic sites
        • Large population size, …
      • Why? Each individual randomly picks his two chromosomes. Therefore, Prob(Aa) = pq + qp = 2pq, and so on.
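
As a quick numerical check of the Hardy Weinberg frequencies, the sketch below computes the three genotype frequencies from p and then recovers the allele frequency in the next generation. The function name `hardy_weinberg` and the value p = 0.7 are illustrative choices, not part of the lecture.

```python
def hardy_weinberg(p: float) -> dict[str, float]:
    """Genotype frequencies at a bi-allelic locus under random mating."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("allele frequency must lie in [0, 1]")
    q = 1.0 - p
    # Each individual draws two chromosomes independently.
    return {"AA": p * p, "Aa": 2 * p * q, "aa": q * q}

freqs = hardy_weinberg(0.7)

# Frequency of allele A among the chromosomes of these genotypes:
# every AA carries two copies, every Aa carries one.
next_p = freqs["AA"] + freqs["Aa"] / 2

print(freqs, next_p)
```

The value of `next_p` equals the starting p, which is one way to see that allele frequencies stay constant from generation to generation under these assumptions.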

  16. Hardy Weinberg: Generalizations
      • Multiple alleles $A_1, \ldots, A_k$ with frequencies $p_1, \ldots, p_k$
      • By HW, $P_{A_i A_i} = p_i^2$ and $P_{A_i A_j} = 2 p_i p_j$ for $i \neq j$
      • Multiple loci?

  17. Hardy Weinberg: Implications
      • The allele frequency does not change from generation to generation. Why?
      • It is observed that 1 in 10,000 Caucasians have the disease phenylketonuria. The disease mutation(s) are all recessive. What fraction of the population carries the disease allele?
      • Males are 100 times more likely to have the "red" type of color blindness than females. Why?
      • Conclusion: While the HW assumptions are rarely satisfied, the principle is still important as a baseline assumption, and significant deviations are interesting.
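
The two questions on this slide reduce to short Hardy Weinberg calculations. The sketch below works them through under stated assumptions: phenylketonuria is treated as a single recessive allele with frequency q, and the color-blindness trait is assumed to be X-linked recessive (males carry one X chromosome, females two). The variable names are illustrative.

```python
import math

# Autosomal recessive disease: affected individuals are aa, so q^2 = incidence.
incidence = 1 / 10_000
q = math.sqrt(incidence)          # frequency of the disease allele
p = 1 - q
carrier_freq = 2 * p * q          # heterozygous (Aa) carriers

# X-linked recessive trait (assumption): males show it with frequency q_x,
# females with frequency q_x^2, so the male/female ratio is 1 / q_x.
q_x = 1 / 100
male_to_female_ratio = q_x / q_x**2

print(f"q = {q:.4f}, carriers = {carrier_freq:.4f}")
print(f"male/female ratio = {male_to_female_ratio:.0f}")
```

Under these assumptions roughly 2% of the population are carriers, even though only 1 in 10,000 is affected.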

  18. Recombination
      [Figure: haplotypes 00000000 and 11111111 recombine to give 00011111]

  19. What if there were no recombinations?
      • Life would be simpler
      • Each individual sequence would have a single parent (even for higher ploidy)
      • The relationship is expressed as a tree.

  20. The Infinite Sites Assumption
      [Figure: tree rooted at 00000000; a mutation at site 3 gives 00100000; mutations at sites 5 and 8 on separate branches give 00101000 and 00100001]
      • The different sites are linked. A 1 in position 8 implies 0 in position 5, and vice versa.
      • Some phenotypes could be linked to the polymorphisms
      • Some of the linkage is "destroyed" by recombination

  21. Infinite sites assumption and Perfect Phylogeny
      • Each site is mutated at most once in the history.
      • All descendants must carry the mutated value, and all others must carry the ancestral value
      [Figure: tree with mutation i on one edge; sequences below that edge have 1 in position i, all others have 0 in position i]

  22. Perfect Phylogeny
      • Assume an evolutionary model in which no recombination takes place, only mutation.
      • The evolutionary history is explained by a tree in which every mutation is on an edge of the tree. All the species in one sub-tree contain a 0, and all species in the other contain a 1. Such a tree is called a perfect phylogeny.

  23. The 4-gamete condition
      • A column i partitions the set of species into two sets, i0 and i1
      • A column is homogeneous w.r.t. a set of species if it has the same value for all species in the set. Otherwise, it is heterogeneous.
      • EX: i is heterogeneous w.r.t. {A,D,E}

        Species  i
        A        0
        B        0
        C        0
        D        1
        E        1
        F        1

        i0 = {A,B,C}, i1 = {D,E,F}

  24. 4 Gamete Condition
      • There exists a perfect phylogeny if and only if, for all pairs of columns (i,j), j is homogeneous w.r.t. i0 or w.r.t. i1.
      • Equivalent to: there exists a perfect phylogeny if and only if, for all pairs of columns (i,j), the 4 rows (0,0), (0,1), (1,0), (1,1) do not all occur.
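
The pairwise test stated above is easy to run on a 0/1 matrix. The following is a minimal sketch; the function name `four_gamete_violations` and the example rows are illustrative (the first four rows are haplotypes that fit a tree with single mutations at sites 3, 5 and 8, and the fifth row is a hypothetical recombinant added to trigger a violation).

```python
from itertools import combinations

def four_gamete_violations(rows):
    """Return the column pairs (i, j) in which all four gametes occur."""
    if not rows:
        return []
    n_cols = len(rows[0])
    violations = []
    for i, j in combinations(range(n_cols), 2):
        # Distinct (value in column i, value in column j) combinations.
        gametes = {(row[i], row[j]) for row in rows}
        if len(gametes) == 4:
            violations.append((i, j))
    return violations

tree_like = ["00000000", "00100000", "00101000", "00100001"]
with_recombinant = tree_like + ["00101001"]

print(four_gamete_violations(tree_like))         # no violating pair expected
print(four_gamete_violations(with_recombinant))  # columns are 0-indexed
```

An empty result means every pair of columns passes, so a perfect phylogeny exists for those rows. The check costs time proportional to the number of rows times the square of the number of columns.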

  25. 4-gamete condition: proof
      • (only if) Every perfect phylogeny satisfies the 4-gamete condition
        • Depending on the edge on which mutation j occurs, either i0 or i1 must be homogeneous w.r.t. j.
      • (if) If the 4-gamete condition is satisfied, does a perfect phylogeny exist?
      [Figure: tree with mutations i and j on edges, splitting the species into i0 and i1]

  26. Handling recombination
      • A tree is not sufficient, as a sequence may have 2 parents
      • Recombination leads to loss of correlation between columns

  27. Linkage (Dis)-equilibrium (LD)
      • Consider sites A & B
      • Case 1: No recombination
        • Each new individual chromosome chooses a parent from the existing ‘haplotype’
      • Haplotypes (A, B): (0,1), (0,1), (0,0), (0,0), (1,0), (1,0), (1,0), (1,0), (1,0)

  28. Linkage (Dis)-equilibrium (LD)
      • Consider sites A & B
      • Case 2: diploidy and recombination
        • Each new individual chooses a parent from the existing alleles
      • Haplotypes (A, B): (0,1), (0,1), (0,0), (0,0), (1,0), (1,0), (1,0), (1,0), (1,1)

  29. Linkage (Dis)-equilibrium (LD)
      • Consider sites A & B
      • Haplotypes (A, B): (0,1), (0,1), (0,0), (0,0), (1,0), (1,0), (1,0), (1,0)
      • Case 1: No recombination
        • Each new individual chooses a parent from the existing ‘haplotype’
        • Pr[(A,B) = (0,1)] = 0.25
        • Linkage disequilibrium
      • Case 2: Extensive recombination
        • Each new individual simply chooses an allele from either site
        • Pr[(A,B) = (0,1)] = 0.125
        • Linkage equilibrium

  30. LD
      • In the absence of recombination,
        • Correlation between columns
        • The joint probability Pr[A=a, B=b] is different from P(a)P(b)
      • With extensive recombination
        • Pr(a,b) = P(a)P(b)

  31. Measures of LD
      • Consider two bi-allelic sites with alleles marked with 0 and 1
      • Define
        • $P_{00}$ = Pr[Allele 0 in locus 1, and 0 in locus 2]
        • $P_{0*}$ = Pr[Allele 0 in locus 1]
      • Linkage equilibrium if $P_{00} = P_{0*} P_{*0}$
      • $D = |P_{00} - P_{0*} P_{*0}| = |P_{01} - P_{0*} P_{*1}| = \ldots$
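
The measure D can be computed directly from a list of two-site haplotypes. This is a minimal sketch; the function name `ld_d` is an illustrative choice, and the first sample is the set of eight (A, B) haplotypes from slide 29.

```python
from collections import Counter

def ld_d(haplotypes):
    """D = |P00 - P0* P*0| for a list of (allele at locus 1, allele at locus 2)."""
    n = len(haplotypes)
    counts = Counter(haplotypes)
    p00 = counts[(0, 0)] / n
    p0_any = sum(1 for a, _ in haplotypes if a == 0) / n   # P0*
    p_any0 = sum(1 for _, b in haplotypes if b == 0) / n   # P*0
    return abs(p00 - p0_any * p_any0)

# Haplotypes (A, B) from slide 29.
slide_sample = [(0, 1), (0, 1), (0, 0), (0, 0), (1, 0), (1, 0), (1, 0), (1, 0)]
d_slide = ld_d(slide_sample)

# A sample in which the two sites are independent.
balanced = [(0, 0), (0, 1), (1, 0), (1, 1)]
d_balanced = ld_d(balanced)

print(d_slide, d_balanced)
```

For the slide 29 sample, P00 = 0.25 while P0* P*0 = 0.5 × 0.75 = 0.375, so D = 0.125; the balanced sample gives D = 0.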

  32. LD over time
      • With random mating and a fixed recombination rate r between the sites, linkage disequilibrium will disappear
      • Let D(t) = LD at time t
      • $P^{(t)}_{00} = (1-r)\,P^{(t-1)}_{00} + r\,P^{(t-1)}_{0*}\,P^{(t-1)}_{*0}$
      • $D(t) = P^{(t)}_{00} - P^{(t)}_{0*}P^{(t)}_{*0} = P^{(t)}_{00} - P^{(t-1)}_{0*}P^{(t-1)}_{*0}$ (HW)
      • $D(t) = (1-r)\,D(t-1) = (1-r)^t\,D(0)$
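
The recurrence D(t) = (1-r) D(t-1) can be iterated to see how quickly LD decays. The sketch below is illustrative only: the function name `ld_trajectory` and the values D(0) = 0.125 and r = 0.1 are assumptions chosen for the example.

```python
import math

def ld_trajectory(d0: float, r: float, generations: int) -> list[float]:
    """Iterate D(t) = (1 - r) * D(t - 1), starting from D(0) = d0."""
    values = [d0]
    for _ in range(generations):
        values.append((1 - r) * values[-1])
    return values

trajectory = ld_trajectory(d0=0.125, r=0.1, generations=50)

# Number of generations after which D has halved: (1 - r)^t = 1/2.
half_life = math.log(0.5) / math.log(1 - 0.1)

print(trajectory[:5], half_life)
```

Because D shrinks by a constant factor each generation, the halving time depends only on r, not on the starting value.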

  33. LD over distance
      • Assumption
        • Recombination rate increases linearly with distance
        • LD decays exponentially with distance.
      • The assumption is reasonable, but recombination rates vary from region to region, adding to complexity
      • This simple fact is the basis of disease association mapping.

  34. LD and disease mapping
      • Consider a mutation that is causal for a disease.
      • The goal of disease gene mapping is to discover which gene (locus) carries the mutation.
      • Consider every polymorphism, and check:
        • There might be too many polymorphisms
        • Multiple mutations (even at a single locus) that lead to the same disease
      • Instead, consider a dense sample of polymorphisms that span the genome

  35. LD can be used to map disease genes
      • LD decays with distance from the disease allele.
      • By plotting LD, one can shortlist the region containing the disease gene.
      [Figure: LD plot; individuals labeled D N N D D N with marker alleles 0 1 1 0 0 1]

  36. LD and disease gene mapping problems
      • Marker density?
      • Complex diseases
      • Population sub-structure

  37. Population Genetics
      • Often we look at these equilibria (Linkage/HW) and their deviations in specific populations
      • These deviations offer insight into evolution.
      • However, what is normal?
      • A combination of empirical (simulation) and theoretical insight helps distinguish between expected and unexpected.

  38. Topic 2: Simulating population data
      • We described various population genetic concepts (HW, LD), and their applicability
      • The values of these parameters depend critically upon the population assumptions.
        • What if we do not have infinite populations?
        • No random mating (Ex: geographic isolation)
        • Sudden growth
        • Bottlenecks
        • Ad-mixture
      • It would be nice to have a simulation of such a population to test various ideas. How would you do this simulation?

  39. Wright Fisher Model of Evolution
      • Fixed population size from generation to generation
      • Random mating
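
A forward simulation of this model fits in a few lines. The sketch below assumes a haploid population in which every child copies the allele of one parent chosen uniformly at random; the function name `wright_fisher`, the population of 20 chromosomes, and the seed are illustrative choices.

```python
import random

def wright_fisher(population, generations, rng):
    """Return the list of generations, starting with `population`."""
    history = [list(population)]
    for _ in range(generations):
        parents = history[-1]
        # Fixed size: one child per slot, each picking a random parent.
        children = [rng.choice(parents) for _ in parents]
        history.append(children)
    return history

rng = random.Random(0)                 # fixed seed for reproducibility
start = [0] * 10 + [1] * 10            # two alleles at frequency 0.5
history = wright_fisher(start, 30, rng)

allele_freqs = [sum(gen) / len(gen) for gen in history]
print(allele_freqs)
```

The allele frequency drifts from generation to generation, and once one allele is lost it cannot return because no mutation is modeled here.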

  40. Coalescent model
      • Insight 1:
        • Separate the genealogy from allelic states (mutations)
        • First generate the genealogy (who begat whom)
        • Assign an allelic state (0) to the ancestor. Drop mutations on the branches.

  41. Coalescent theory
      • Insight 2:
        • Much of the genealogy is irrelevant, because it disappears.
        • Better to go backwards

  42. Coalescent theory (Kingman)
      • Input: fixed population (N individuals), random mating
      • Consider 2 individuals.
      • Probability that they coalesce in the previous generation (have the same parent) = $1/N$
      • Probability that they do not coalesce after t generations = $(1 - 1/N)^t \approx e^{-t/N}$

  43. Coalescent theory
      • $\tau = t/N$ is time in units of N generations
      • Consider k individuals.
      • Probability that no pair coalesces after 1 generation $\approx 1 - \binom{k}{2}/N$
      • Probability that no pair coalesces after t generations $\approx e^{-\binom{k}{2}\tau}$
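
The quality of the continuous-time approximation can be checked numerically. The sketch below assumes a haploid Wright Fisher population in which each of k lineages picks its parent uniformly among N individuals, so that all k remain distinct for one generation with probability (1 - 1/N)(1 - 2/N)…(1 - (k-1)/N). The function names and the values N = 10,000 and t = 5,000 are illustrative.

```python
import math

def prob_no_coalescence(k: int, n_pop: int, t: int) -> float:
    """Probability that k lineages stay distinct for t generations (assumed model)."""
    per_generation = 1.0
    for i in range(1, k):
        per_generation *= 1 - i / n_pop
    return per_generation ** t

def coalescent_approx(k: int, n_pop: int, t: int) -> float:
    """Exponential approximation with time measured in units of N generations."""
    tau = t / n_pop
    return math.exp(-k * (k - 1) / 2 * tau)

for k in (2, 5):
    exact = prob_no_coalescence(k, 10_000, 5_000)
    approx = coalescent_approx(k, 10_000, 5_000)
    print(k, exact, approx)
```

For large N the two values agree closely, which is what justifies drawing waiting times from an exponential distribution instead of stepping back one generation at a time.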

  44. Coalescent approximation
      • Insight 3:
        • Topology is independent of coalescent times
        • If you have n individuals, generate a random binary topology
          • Iterate (until one individual)
          • Pick a pair at random, and coalesce
      • Insight 4:
        • To generate coalescent times, there is no need to go back generation by generation
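
The "pick a pair at random, and coalesce" loop can be sketched as follows. The representation is an assumption made for illustration: leaves are integers 0..n-1 and an internal node is a tuple of its two children; `random_topology` and `leaves` are hypothetical helper names.

```python
import random

def random_topology(n, rng):
    """Build a random binary tree over leaves 0..n-1 by merging random pairs."""
    lineages = list(range(n))
    while len(lineages) > 1:
        # Choose two distinct positions among the current lineages.
        i, j = rng.sample(range(len(lineages)), 2)
        merged = (lineages[i], lineages[j])
        lineages = [x for idx, x in enumerate(lineages) if idx not in (i, j)]
        lineages.append(merged)
    return lineages[0]

def leaves(tree):
    """List the leaf labels of a nested-tuple tree."""
    if isinstance(tree, tuple):
        return leaves(tree[0]) + leaves(tree[1])
    return [tree]

tree = random_topology(6, random.Random(1))
print(tree)
```

Each pass through the loop removes two lineages and adds one, so n individuals need exactly n - 1 merges.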

  45. Coalescent approximation
      • At any step, there are 1 <= k <= n individuals
      • To generate time to coalesce (k to k-1 individuals)
        • Pick a number from exponential distribution with rate k(k-1)/2
        • Mean time to coalescence = 2/(k(k-1))
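
Drawing the waiting times is a direct use of the exponential distribution. In this minimal sketch the function name `coalescent_times` and the seed are illustrative; `random.Random.expovariate` takes the rate as its argument, so its mean is 1/rate.

```python
import random

def coalescent_times(n, rng):
    """Waiting times for k = n, n-1, ..., 2 lineages (units of N generations)."""
    times = []
    for k in range(n, 1, -1):
        rate = k * (k - 1) / 2
        times.append(rng.expovariate(rate))
    return times

rng = random.Random(42)
times = coalescent_times(6, rng)
print(times, sum(times))

# Monte Carlo check of the mean for the last step (k = 2), where 2/(k(k-1)) = 1.
samples = [coalescent_times(6, rng)[-1] for _ in range(20_000)]
mean_last_step = sum(samples) / len(samples)
print(mean_last_step)
```

The sum of the drawn times is the time to the most recent common ancestor of the sample.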

  46. Typical coalescents
      • 4 random examples with n=6 (Note that we do not need to specify N. Why?)
      • Expected time to coalesce?

  47. Coalescent properties
      • Expected time for the last step
      • The last step is half of the total time to coalesce
      • Studying larger number of individuals does not change numbers tremendously
      • EX: Number of mutations in a population is proportional to the total branch length of the tree
      • E(Ttot) =1
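
These properties can be explored with exact arithmetic, using only the mean waiting time 2/(k(k-1)) for k lineages from slide 45. The function names below are illustrative; the total branch length counts each waiting time once for every lineage alive during it.

```python
from fractions import Fraction

def expected_tmrca(n):
    """Expected time until n lineages reach one ancestor (units of N generations)."""
    return sum(Fraction(2, k * (k - 1)) for k in range(2, n + 1))

def expected_total_branch_length(n):
    """Expected sum of all branch lengths: k lineages persist for each waiting time."""
    return sum(k * Fraction(2, k * (k - 1)) for k in range(2, n + 1))

for n in (2, 6, 100):
    tmrca = expected_tmrca(n)
    last_step_share = Fraction(1) / tmrca   # the k = 2 step has mean 1
    print(n, float(tmrca), float(last_step_share),
          float(expected_total_branch_length(n)))
```

The expected time to the common ancestor equals 2(1 - 1/n), which stays below 2 however large the sample, so the last step (mean 1) accounts for roughly half of it, while the total branch length grows only logarithmically with n.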

  48. Variants (exponentially growing populations)
      • If the population is growing exponentially, the branch lengths become similar, or even star-like. Why?
      • With appropriate scaling of time, the same process can be extended to various scenarios: male-female, hermaphrodite, segregation, migration, etc.

  49. Simulating population data
      • Generate a coalescent (Topology + Branch lengths)
      • For each branch length, drop mutations at a given rate
      • Generate sequence data
      • Note that the resulting sequences admit a perfect phylogeny.
      • Given such sequence data, can you reconstruct the coalescent tree? (Only the topology, not the branch lengths)
      • Also, note that all pairs of positions are correlated (should have high LD).
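
The three steps above can be combined into one short simulation. This is a sketch under stated assumptions, not the course's implementation: mutations fall on each branch as a Poisson process with a hypothetical rate `mu` per unit of branch length (it must be positive), every mutation creates a new site (infinite sites), and the ancestral state is 0. The function name `simulate_haplotypes` is illustrative.

```python
import random

def simulate_haplotypes(n, mu, rng):
    """Return n strings of 0/1, one column per mutation on the coalescent tree."""
    # Each lineage: (set of sampled leaves below it, time at which it started).
    lineages = [(frozenset([i]), 0.0) for i in range(n)]
    columns = []          # each column = set of leaves carrying the mutation
    now = 0.0

    def drop_mutations(leaf_set, length):
        # Poisson process: exponential gaps between mutations along the branch.
        position = rng.expovariate(mu)
        while position < length:
            columns.append(leaf_set)
            position += rng.expovariate(mu)

    while len(lineages) > 1:
        k = len(lineages)
        now += rng.expovariate(k * (k - 1) / 2)      # time until next merge
        i, j = rng.sample(range(k), 2)               # pair that coalesces
        (set_a, start_a), (set_b, start_b) = lineages[i], lineages[j]
        drop_mutations(set_a, now - start_a)
        drop_mutations(set_b, now - start_b)
        lineages = [x for idx, x in enumerate(lineages) if idx not in (i, j)]
        lineages.append((set_a | set_b, now))

    return ["".join("1" if leaf in col else "0" for col in columns)
            for leaf in range(n)]

haplotypes = simulate_haplotypes(6, 2.0, random.Random(1))
for h in haplotypes:
    print(h)
```

Every column marks the leaves below a single branch, so any two columns are either nested or disjoint and no pair of columns can show all four gametes.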

  50. Coalescent with Recombination
      • An individual may have one parent, or 2 parents
