
Selection

Explore the concept of neutral evolution and its consequences in genetics, including estimating mutation rates, evolutionary reconstruction, and the role of neutral alleles. Gain insights into the molecular view of selection and the structure of DNA.

bmunford


Presentation Transcript


  1. Selection Vineet Bafna

  2. Neutral evolution • Early assumption: most variation is mildly deleterious. • Early genotyping surveys quickly revealed that the number of variable regions was far too large for all of them to be deleterious. • Kimura suggested that most alleles are selectively neutral. • The presence of neutrally evolving alleles changes the landscape of genetics.

  3. Consequence of neutral theory • An important consequence of the neutral theory is that mutations in a region occur at some fixed rate • So, the number of mutations between two species in a certain region is an estimate of the evolutionary time between them • Basis of phylogenetic reconstruction

  4. High level view of evolutionary reconstruction • Take a population sample (say human) limited to neutrally evolving mutations • Use the population sample to estimate the rate of mutation $\mu$ • Compute the number of mutations in an orthologous sample in another species. • Use the rate and mutational distance to estimate time of divergence [Figure labels: human, human, chimp]
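As a rough illustration of the last step, the sketch below assumes a strict molecular clock with mutations accumulating independently on both lineages, so that the expected number of differences is d = 2 * mu * T. That relation, the function name, and the numbers are assumptions for illustration and are not taken from the slides.

```python
def divergence_time(num_differences, mu_region):
    """Estimate the number of generations since two species split.

    Assumes d = 2 * mu * T: mutations arise at rate mu_region per
    generation on each of the two lineages since the split.
    """
    if mu_region <= 0:
        raise ValueError("mu_region must be positive")
    return num_differences / (2.0 * mu_region)


# Hypothetical numbers: 120 differences in a region with mu = 1e-4 per generation.
generations = divergence_time(120, 1e-4)
print(f"about {generations:.0f} generations")
```

This ignores variation that was already present in the ancestral population, so it should be read as a first approximation only.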

  5. Estimating mutation rates • Given a population sample, can you estimate the mutation rate? • Recall that $\theta = 4N\mu$ • $\mu$: number of mutations per generation in the genomic region considered • As we do not know $N$, we end up estimating $\theta$ • $\theta$ can be divided by the size of the genomic region being considered to get the population-scaled mutation rate per bp.

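  6. Watterson’s estimate • Let $S$ be the number of mutations in a sample of size $n$. Recall that • $E(S) = \mu\,E(T_{tot})$ • $E(S) = \mu \sum_{k=2}^{n} 2N \cdot \frac{2}{k-1} = 4N\mu \sum_{k=2}^{n} \frac{1}{k-1} \approx 4N\mu\,(\gamma + \ln(n-1))$, where $\gamma \approx 0.577$ is Euler’s constant • Watterson’s estimate • $\theta_W = S_n / (\gamma + \ln(n-1))$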
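A minimal sketch of Watterson's estimate. By default it divides by the exact harmonic sum 1 + 1/2 + ... + 1/(n-1); the `exact=False` option uses the logarithmic approximation instead. The function and parameter names are illustrative.

```python
import math

EULER_GAMMA = 0.5772156649015329


def harmonic(m):
    """Return 1 + 1/2 + ... + 1/m."""
    return sum(1.0 / i for i in range(1, m + 1))


def watterson_theta(num_segregating_sites, n, exact=True):
    """Watterson's estimate of theta from S segregating sites in n sequences."""
    if n < 2:
        raise ValueError("need at least two sequences")
    if exact:
        denom = harmonic(n - 1)
    else:
        # Approximation of the harmonic sum: gamma + ln(n - 1)
        denom = EULER_GAMMA + math.log(n - 1)
    return num_segregating_sites / denom


# Hypothetical sample: 11 segregating sites among 4 sequences.
print(watterson_theta(11, 4), watterson_theta(11, 4, exact=False))
```

For small samples the two denominators differ noticeably, so the exact sum is the safer default.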

  7. Tajima’s estimate of $\theta$ • Define $\pi_{ij}$ = heterozygosity between two individuals $i$ and $j$ • Note: heterozygosity = # differing sites = Hamming distance • Example: i: 0 1 0 0 0 0 1 1 0, j: 0 0 0 0 0 0 1 1 1, so $\pi_{ij} = 2$ • Average heterozygosity can be empirically computed from a sample of $n$ individuals as $k = \binom{n}{2}^{-1} \sum_{i<j} \pi_{ij}$
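The pairwise computation can be sketched as follows, assuming each individual is given as a string of 0/1 alleles of equal length. The function names are illustrative.

```python
from itertools import combinations


def hamming(a, b):
    """Number of sites at which two equal-length haplotypes differ."""
    if len(a) != len(b):
        raise ValueError("haplotypes must have the same length")
    return sum(x != y for x, y in zip(a, b))


def average_heterozygosity(haplotypes):
    """Mean Hamming distance over all unordered pairs (Tajima's estimate)."""
    pairs = list(combinations(haplotypes, 2))
    if not pairs:
        raise ValueError("need at least two haplotypes")
    return sum(hamming(a, b) for a, b in pairs) / len(pairs)


# The two individuals from the slide differ at two sites.
print(hamming("010000110", "000000111"))
```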

  8. Estimating average heterozygosity • Assuming an underlying coalescent model of evolution, what is the average heterozygosity? • Q: Given 2 randomly picked individuals, what is the expected time to coalescence? • A: $2N$ generations • Q: Given 2 individuals, what is the expected number of mutations in the lineages connecting them? • A: $\mu \cdot 2 \cdot 2N = 4N\mu = \theta$ • Therefore, the average heterozygosity $k$ is an estimate (Tajima’s estimate) of $\theta$

  9. Basic principles of selection • More offspring are produced than can survive • Different offspring have different levels of ‘fitness’ • ‘Fit’ individuals are more likely to survive and pass on their genotypes

  10. Molecular view of selection • Mutations arise at random in a population. • If a mutation is deleterious, it is quickly eliminated. • If a mutation is advantageous, it is quickly driven to fixation. • If it is neutral (doesn’t change fitness), it persists at intermediate frequencies until random genetic drift eventually fixes or eliminates it.

  11. Neutral alleles • Using data of neutral alleles, we can make evolutionary inferences • While most alleles are selectively neutral, not all are. • How can we decide if an allele is neutral? To answer this, we need to learn a bit of biology

  12. Life begins with the cell • A cell is the smallest structural unit of an organism that is capable of independent functioning • All cells have some common features • They have various compartments, and molecules that act within these compartments

  13. All life depends on 3 critical molecules • Proteins • Form enzymes, send signals to other cells, regulate gene activity. • Form the body’s major components (e.g. hair, skin, etc.). • DNA • Holds information on how the cell works • RNA • Transfers short pieces of information to different parts of the cell • Provides templates from which proteins are synthesized

  14. The molecules of Life and Bioinformatics • DNA/RNA are long chains of nucleotides (4 types) • Proteins are long chains of amino acids (20 types) • DNA, RNA, and proteins can all be represented as strings! • DNA/RNA are strings over a 4-letter alphabet (A, C, G, T/U). • Protein sequences are strings over a 20-letter alphabet. • This allows us to store and query them as text.

  15. DNA • DNA is the only inherited molecule. It must have all the ‘instructions’ for making all proteins. • When cells divide and differentiate to form tissues, different proteins must be active in different cells. DNA must contain the instructions for activating/deactivating the production of these proteins. • DNA is packaged into a genome • Specific regions on the genome have the code/instruction for a specific (set of) protein(s). • What do we call these regions?

  16. Vineet Bafna

  17. DNA structure • Watson and Crick identified the structure of DNA in 1953. • Established DNA as a double-stranded molecule with a helical structure (double helix) • Complementary base pairs form hydrogen bonds that stabilize the molecule

  18. Transcription • DNA is a double stranded molecule • During transcription, the two strands separate, and a copy is made of the gene • The copied form is RNA (T is changed to U) http://fig.cox.miami.edu/~cmallery/150/gene/c7.17.7b.transcription.jpg http://www.geneticengineering.org/chemis/Chemis-NucleicAcid/Graphics/Transcription.gif

  19. Transcription and translation • The transcribed messenger RNA leaves the nucleus and goes to the cytoplasm. • The ribosomal machinery reads the transcript and produces a protein • Each nucleotide triplet maps to exactly one amino acid

  20. Translation • The ribosomal machinery reads mRNA. • Each triplet is translated into a single amino acid until a STOP codon is encountered. • There is also a special signal where translation starts, usually at the ATG (M) codon.

  21. The genetic code • Each triplet is translated into a single amino acid until a STOP codon is encountered. • There is also a special signal where translation starts, usually at the ATG (M) codon. • Given a DNA sequence, how many ways can you translate it?
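One way to explore the closing question is to translate a sequence in every reading frame. The sketch below assumes the standard genetic code and that translation may begin at any of the three offsets on either strand, which gives six frames. The helper names are illustrative, and input is expected to be uppercase A, C, G, T only.

```python
from itertools import product

# Standard genetic code in compact form; '*' marks a STOP codon.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def reverse_complement(dna):
    """Return the reverse complement of an uppercase DNA string."""
    return dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(dna, offset=0):
    """Translate complete codons starting at the given offset (no STOP handling)."""
    codons = (dna[i:i + 3] for i in range(offset, len(dna) - 2, 3))
    return "".join(CODON_TABLE[c] for c in codons)


def six_frames(dna):
    """Translations for offsets 0, 1, 2 on the forward and reverse strands."""
    rc = reverse_complement(dna)
    return [translate(strand, off) for strand in (dna, rc) for off in range(3)]


for frame in six_frames("ATGGCCTAA"):
    print(frame)
```

The sketch translates straight through STOP codons so that every frame is visible; a real gene finder would instead look for a start codon followed by an open reading frame.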

  22. Project sign up • Please sign up for the projects • First presentation will be Feb 5, 7.

  23. Neutral alleles • Now that we know some molecular biology... • How can we detect neutral alleles? • 4-fold degenerate sites in DNA (codon positions where any of the four bases encodes the same amino acid) should be selectively neutral
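A minimal sketch for locating 4-fold degenerate sites, assuming the standard genetic code and a coding sequence that starts in frame at position 0. A third codon position counts as 4-fold degenerate when all four bases at that position give the same amino acid. The function names are illustrative.

```python
from itertools import product

# Standard genetic code in compact form; '*' marks a STOP codon.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def fourfold_prefixes():
    """Two-base codon prefixes whose third position is 4-fold degenerate."""
    prefixes = ("".join(p) for p in product(BASES, repeat=2))
    return sorted(
        p for p in prefixes
        if len({CODON_TABLE[p + b] for b in BASES}) == 1
    )


def fourfold_sites(coding_dna):
    """0-based positions of 4-fold degenerate third-codon bases (frame 0)."""
    prefixes = set(fourfold_prefixes())
    return [
        i + 2
        for i in range(0, len(coding_dna) - 2, 3)
        if coding_dna[i:i + 2] in prefixes
    ]


print(fourfold_prefixes())
print(fourfold_sites("ATGGCCTAA"))
```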

  24. Tests for neutrality • Neutral alleles can be identified and are very useful in computing genetic/evolutionary parameters (mutation rate, recombination rate…) • However, not all mutations are selectively neutral • Also, such mutations might switch from being neutral to advantageous/deleterious

  25. Recent adaptive selection • Many adults are lactose intolerant • Consumption of milk-products leads to indigestion/sickness

  26. Hypothesis • It is possible that lactose intolerance was not a disease. • In warm climates, milk products were not consumed • In colder climates (lack of food), the ability to digest milk products conferred a selective advantage • Finnish people are less likely to be lactose intolerant than Asian people. • The mutation conferring tolerance is likely under selection for Finnish people. • How can we detect such non-neutrally evolving regions in a population sample?

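  27. Two estimates of mutation rate • Let $S$ be the number of mutations in a sample of size $n$. Recall that • $E(S) = \mu\,E(T_{tot})$ • $E(S) = \mu \sum_{k=2}^{n} 2N \cdot \frac{2}{k-1} = 4N\mu \sum_{k=2}^{n} \frac{1}{k-1} \approx 4N\mu\,(\gamma + \ln(n-1))$, with $\gamma \approx 0.577$ (Euler’s constant) • Watterson’s estimate • $\theta_W = S_n / (\gamma + \ln(n-1))$ • Tajima’s estimate. Let $\pi_{ij}$ be the heterozygosity between individuals $i$ and $j$. The average heterozygosity $k$ is an estimate of $\theta$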

  28. Tajima’s D statistic • Tajima proposed the difference of the two as a test of selection (Tajima’s D statistic) • Tajima’s $D \approx k - \theta_W$ • The actual statistic involves a normalization • Under neutral evolution, the expected value of $D$ is approximately 0 • What do we expect under positive selection?

  29. Tajima’s D under selection? • Under positive selection, there is a loss in average heterozygosity? • $D \approx k - \theta_W < 0$ • Under balancing selection, there should be a gain in average heterozygosity? • $D > 0$
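The sign behaviour described above can be checked on toy data. The sketch below computes D with the normalization constants from Tajima's 1989 paper; the slides only say that a normalization exists, so those constants are supplied here as an assumption. It uses the exact harmonic sum rather than the logarithmic approximation, and the sample data are hypothetical.

```python
from itertools import combinations
from math import sqrt


def tajimas_d(haplotypes):
    """Tajima's D for equal-length haplotype strings (needs n >= 4)."""
    n = len(haplotypes)
    if n < 4:
        raise ValueError("need at least four haplotypes")
    # S: number of columns (sites) that show more than one allele.
    seg = sum(len(set(col)) > 1 for col in zip(*haplotypes))
    if seg == 0:
        raise ValueError("no segregating sites")
    # k: average number of pairwise differences.
    pairs = list(combinations(haplotypes, 2))
    k = sum(sum(a != b for a, b in zip(x, y)) for x, y in pairs) / len(pairs)
    # Normalization constants (Tajima 1989).
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    return (k - seg / a1) / sqrt(e1 * seg + e2 * seg * (seg - 1))


# Excess of rare variants (every mutation is a singleton): D should be negative.
rare = ["10000", "01000", "00100", "00010", "00001"]
# Two common, divergent haplotypes: D should be positive.
balanced = ["1100", "1100", "0011", "0011"]
print(tajimas_d(rare), tajimas_d(balanced))
```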

  30. When does Tajima’s D fail? • When the population is growing, what will happen to average heterozygosity? • What happens when the selection event is a recent one?

  31. Test of recent selection

  32. Malarial resistance • Two genes have been implicated in resistance to the malarial parasite Plasmodium falciparum • Glucose-6-phosphate dehydrogenase (G6PD) • A common variant G6PD-202A confers partial protection against malaria • Likewise, TNFSF5-726C is a variant associated with protection against malaria. • Sabeti et al. describe a test for identifying regions under selection, and apply it to these loci

  33. The EHH test • G6PD • A core region of 15kb was identified, and 11 SNPs genotyped • The core region was dense and had high LD (genealogy could be identified)

  34. Extending core haplotypes • As you add distant SNPs, the haplotypes begin to decay (reduce in frequency). • For each core haplotype, do the EHH test • Define EHH(d): probability that two randomly chosen chromosomes with the core haplotype are identical at distance d • Clearly EHH will decay due to mutation and recombination • Claim: if the core haplotype is under selection, it will decay less than other haplotypes. [Figure: six copies of the core haplotype CGCGGACCGCC]
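The definition of EHH(d) can be sketched as below. The data layout is an assumption made for illustration: each chromosome is a `(core, flank)` pair, where `flank` lists the alleles at SNPs ordered by increasing distance from the core on one side, and distance is measured in number of markers rather than base pairs.

```python
from collections import Counter
from math import comb


def ehh(chromosomes, core, num_markers):
    """Probability that two distinct chromosomes carrying `core` are
    identical over the first `num_markers` flanking SNPs."""
    flanks = [flank[:num_markers] for c, flank in chromosomes if c == core]
    n = len(flanks)
    if n < 2:
        raise ValueError("need at least two chromosomes with this core")
    # Count pairs that share the same extended haplotype.
    same = sum(comb(count, 2) for count in Counter(flanks).values())
    return same / comb(n, 2)


# Hypothetical sample: four chromosomes carry core "A", one carries "B".
sample = [("A", "000"), ("A", "000"), ("A", "001"), ("A", "010"), ("B", "111")]
for d in range(4):
    print(d, ehh(sample, "A", d))
```

In this toy sample EHH starts at 1 and drops as more distant markers are included, which is the decay the slide describes.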

  35. Decay in EHH • High values of EHH indicate selection • Note that EHH decays due to mutation as well as recombination. • Mutation rates are different in different regions. • How do we choose cut-offs for the EHH statistic?

  36. EHH test at • Decay of the 9 core haplotypes of G6PD region. • Only one core haplotype (CH8) shows selection • The other haplotypes serve as control

  37. Relative EHH • Define relative EHH: • EHH of core haplotype / (average EHH of all other haplotypes) • Plot shows relative EHH for the 9 core haplotypes and simulated data

  38. EHH • The EHH test helps in identifying recent positive selection. • Sabeti’s paper claims that for this data set, the other statistics don’t work as well. Can this be tested? • Can you suggest where the test might fail to detect recent positive selection?

  39. What would happen for balancing selection? • Could it be that if you have 2 strong haplotypes, they would decrease each other’s relative EHH? • What about the impact of a single ‘misplaced’ mutation? • If a single misplaced mutation can have such an effect, what would happen with a few of them? • As only a handful of examples are known for positive selection, false negatives are harder to quantify.

  40. A proposal • Consider a case with recent positive selection, or even balancing selection. • The presence of new mutations on 6, and on 7 and 8, reduces the EHH value. • However, the true signal is still there, in the presence of a long branch with 5 mutations

  41. A combinatorial formulation • In other words, is there a ‘large’ subset of individuals, and a ‘large’ subset of sites, on which those individuals are identical? • What is the formulation for ‘balancing selection’?

  42. Combinatorial formulation • Given $n$ individuals and $m$ sites, determine if there exist at least $n_1$ individuals and $m_1 < m$ sites such that the $n_1$ individuals are identical when restricted to the $m_1$ sites • Is the problem NP-hard? Yes, for many natural variants.
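To make the decision problem concrete, here is a brute-force sketch that tries every subset of `m1` sites and checks whether some group of at least `n1` individuals agrees on all of them. It is exponential in the number of sites and is meant only to pin down the formulation, not to be a practical algorithm. The function name and data are hypothetical.

```python
from collections import Counter
from itertools import combinations


def has_identical_block(haplotypes, n1, m1):
    """True if some n1 individuals are identical on some set of m1 sites."""
    if not haplotypes:
        return False
    m = len(haplotypes[0])
    for sites in combinations(range(m), m1):
        # Group individuals by their alleles restricted to the chosen sites.
        counts = Counter(tuple(h[s] for s in sites) for h in haplotypes)
        if max(counts.values()) >= n1:
            return True
    return False


haps = ["0101", "0111", "0100", "1010"]
print(has_identical_block(haps, 3, 2))
print(has_identical_block(haps, 3, 3))
```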

  43. Linear programming • Many combinatorial optimization problems can be formulated naturally as (integer) linear programs • Today we will explore this algorithmic paradigm (and related ones) using selection as an example • Caveat: if the original formulation is biologically wrong, any results will be meaningless. • If the original formulation is technically wrong, it may change the complexity of the problem. • Finally, for a specific formulation, there may be multiple approaches to solving it.

  44. Generic Linear Programming

  45. A linear objective function • Note that the objective is to maximize $c^T x$ • Consider the equation $c^T x = c_0$ • It defines a hyperplane

  46. Geometry of dot-product • Dot product? • What is $\|x\|_2$? • What is $x/\|x\|_2$? • What is $x^T y$? [Figure: vectors $x = (x_1, x_2)$ and $y$]

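  47. Dot Product • Let $c$ be a unit vector. • $\|c\| = 1$ • Recall that • $c^T x = \|x\| \cos\theta$, where $\theta$ is the angle between $c$ and $x$ • What is $c^T x$ if $x$ is orthogonal (perpendicular) to $c$? [Figure: $x$ at angle $\theta$ to $c$, with projection $c^T x = \|x\| \cos\theta$]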

  48. Hyperplane • Find the unit vector that is perpendicular (normal) to the hyperplane • How can we define a hyperplane $L$?

  49. Points on the hyperplane • Consider a hyperplane $L$ defined by unit vector $c$, and distance $c_0$ from the origin • Notes: • For all $x \in L$, $x^T c$ must be the same, $x^T c = c_0$ • For any two points $x_1, x_2 \in L$, • $(x_1 - x_2)^T c = 0$ [Figure: normal vector $c$ with points $x_1$ and $x_2$ on $L$]
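A small numerical check of these two notes, assuming a 2-D example where the hyperplane is a line. The helper names and the chosen numbers are illustrative.

```python
import math


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def unit(u):
    """Scale u to length 1."""
    length = math.sqrt(dot(u, u))
    return [a / length for a in u]


def point_on_line(c, c0, t):
    """Point c0*c + t*d on {x : c.x = c0}, where d is perpendicular to unit vector c."""
    d = [-c[1], c[0]]
    return [c0 * ci + t * di for ci, di in zip(c, d)]


c = unit([3.0, 4.0])   # unit normal (0.6, 0.8)
c0 = 2.0               # distance of the line from the origin
x1 = point_on_line(c, c0, 1.5)
x2 = point_on_line(c, c0, -0.7)
difference = [a - b for a, b in zip(x1, x2)]

# Both dot products should equal c0, and the difference should be orthogonal to c.
print(dot(c, x1), dot(c, x2), dot(difference, c))
```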

  50. Back to LP? • Remember the goal is to maximize $c^T x$ • Geometrically, this is equivalent to moving the hyperplane along the orthogonal axis (the direction of $c$). [Figure: parallel hyperplanes $c^T x = c_0$ and $c^T x = c_1$ with normal $c$]
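As a toy illustration of this picture, the sketch below assumes the feasible region is a bounded polygon given by a hypothetical list of its vertices. For a bounded region, a linear objective attains its maximum at a vertex, so sliding the hyperplane in the direction of c until it last touches the region amounts to picking the best vertex. This is not a general LP solver.

```python
def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def maximize_over_vertices(c, vertices):
    """Return the vertex maximizing c.x, together with the objective value."""
    best = max(vertices, key=lambda v: dot(c, v))
    return best, dot(c, best)


# Hypothetical feasible region: the unit square.
square = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
best, value = maximize_over_vertices((1.0, 2.0), square)
print(best, value)
```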
