Explore the concept of neutral evolution and its consequences in genetics, including estimating mutation rates, evolutionary reconstruction, and the role of neutral alleles. Gain insights into the molecular view of selection and the structure of DNA.
Selection Vineet Bafna
Neutral evolution • Early assumption: most variation is mildly deleterious. • Early genotyping surveys quickly revealed that the number of variable regions was far too large for all of them to be deleterious. • Kimura suggested that most alleles are selectively neutral. • The presence of neutrally evolving alleles changes the landscape of genetics.
Consequence of neutral theory • An important consequence of the neutral theory is that mutations in a region accumulate at a roughly fixed rate • So, the number of mutations between two species in a certain region is an estimate of the evolutionary time between them • This is the basis of phylogenetic reconstruction
High level view of evolutionary reconstruction • Take a population sample (say human) limited to neutrally evolving mutations • Use the population sample to estimate the rate of mutation • Compute the number of mutations in an orthologous sample in another species. • Use the rate and mutational distance to estimate time of divergence • [Figure: tree with leaves human, human, chimp]
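The last step can be sketched in a few lines. This is a minimal illustration, assuming a constant neutral rate and that mutations accumulate independently on both lineages since the split, so the expected per-bp distance is d = 2 * mu * t. The function name, parameter names, and all numbers are hypothetical and are not taken from the slides.

```python
def divergence_time(num_differences, region_length_bp, mu_per_bp_per_gen):
    """Estimate generations since two lineages split.

    Assumption: neutral mutations accumulate at a constant rate on BOTH
    lineages, so expected per-bp distance d = 2 * mu * t.
    """
    d = num_differences / region_length_bp   # observed per-bp distance
    return d / (2.0 * mu_per_bp_per_gen)     # solve d = 2 * mu * t for t


# Hypothetical inputs: 120 differences in a 10 kb orthologous region,
# with an estimated rate of 2e-8 mutations per bp per generation.
generations = divergence_time(120, 10_000, 2e-8)
print(round(generations))
```

The estimate is only as good as the neutrality assumption: sites under selection would distort both the rate and the distance.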
Estimating mutation rates • Given a population sample, can you estimate the mutation rate? • Recall that $\theta = 4N\mu$ • $\mu$: number of mutations per generation in the genomic region considered • As we do not know $N$, we end up estimating $\theta$ • $\theta$ can be divided by the size of the genomic region being considered to get the population-scaled mutation rate per bp.
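Watterson’s estimate • Let $S$ be the number of mutations in a sample of $n$ sequences. Recall that • $E(S) = \mu\, E(T_{tot})$ • $E(S) = 2N\mu \sum_{k=2}^{n} \frac{2}{k-1} = 4N\mu \sum_{k=1}^{n-1} \frac{1}{k} \approx 4N\mu\,(\gamma + \ln(n-1))$, where $\gamma \approx 0.577$ is Euler's constant • Watterson’s estimate • $\theta_W = S_n / (\gamma + \ln(n-1))$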
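A minimal sketch of Watterson's estimate on a toy sample. It assumes haplotypes are given as equal-length 0/1 strings and that every variable column corresponds to exactly one mutation (the infinite-sites assumption). It uses the exact harmonic sum rather than the logarithmic approximation. The function name and the sample are hypothetical.

```python
def watterson_theta(haplotypes):
    """Watterson's estimate: S divided by the harmonic sum 1 + 1/2 + ... + 1/(n-1)."""
    n = len(haplotypes)
    # A column is a segregating site if more than one allele appears in it.
    s = sum(1 for column in zip(*haplotypes) if len(set(column)) > 1)
    harmonic = sum(1.0 / k for k in range(1, n))
    return s / harmonic


# Hypothetical sample of n = 3 haplotypes over 9 sites (2 segregating sites).
sample = ["010000110", "000000111", "000000110"]
print(watterson_theta(sample))
```

Dividing the result by the number of sites gives a per-site value of theta.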
Tajima’s estimate of $\theta$ • Define $\pi_{ij}$ = heterozygosity between two individuals $i$ and $j$ • Note: heterozygosity = # differing sites = Hamming distance • Example: i: 0 1 0 0 0 0 1 1 0, j: 0 0 0 0 0 0 1 1 1, so $\pi_{ij} = 2$ • Average heterozygosity can be empirically computed from a sample of $n$ individuals as $\theta_k = \binom{n}{2}^{-1} \sum_{i<j} \pi_{ij}$
Estimating Average heterozygosity • Assuming an underlying coalescent model of evolution, what is the average heterozygosity? • Q: Given 2 randomly picked individuals, what is the expected time to coalescence? • A: $2N$ • Q: Given 2 individuals what is the expected number of mutations in the lineages connecting them? • A: $2\mu \cdot 2N = 4N\mu = \theta$ • Therefore, the average heterozygosity $\theta_k$ is an estimate (Tajima’s estimate) of $\theta$
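The average heterozygosity can be computed directly as the mean Hamming distance over all pairs. The sketch below assumes haplotypes are equal-length strings. The function names and the third haplotype in the sample are hypothetical. The first two haplotypes are the pair shown in the example above, which differ at 2 sites.

```python
from itertools import combinations


def hamming(a, b):
    """Number of sites at which two haplotypes differ."""
    return sum(x != y for x, y in zip(a, b))


def average_heterozygosity(haplotypes):
    """Mean pairwise Hamming distance over all unordered pairs (Tajima's estimate)."""
    pairs = list(combinations(haplotypes, 2))
    return sum(hamming(a, b) for a, b in pairs) / len(pairs)


sample = ["010000110", "000000111", "000000110"]
print(hamming(sample[0], sample[1]))     # the pair from the slide differs at 2 sites
print(average_heterozygosity(sample))
```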
Basic principles of selection • More offspring are produced than can survive • Different offspring have different levels of ‘fitness’ • ‘Fit’ individuals are more likely to survive and pass on their genotypes
Molecular view of selection • Mutations arise at random in a population. • If a mutation is deleterious, it is quickly eliminated. • If a mutation is advantageous, it is quickly driven to fixation. • If it is neutral (doesn’t change fitness), it stays at intermediate frequencies in the population until it is eventually fixed or eliminated by random genetic drift.
Neutral alleles • Using data of neutral alleles, we can make evolutionary inferences • While most alleles are selectively neutral, not all of them are. • How can we decide if an allele is neutral? To answer this, we need to learn a bit of biology
Life begins with the cell • A cell is the smallest structural unit of an organism that is capable of independent functioning • All cells have some common features • They have various compartments, and molecules that act within these compartments
All life depends on 3 critical molecules • Proteins • Form enzymes, send signals to other cells, regulate gene activity. • Form the body’s major components (e.g. hair, skin, etc.). • DNA • Holds information on how the cell works • RNA • Acts to transfer short pieces of information to different parts of the cell • Provides templates for protein synthesis
The molecules of Life and Bioinformatics • DNA/RNA are long chains of nucleotides (4 types) • Proteins are also long chains of amino-acids (20 types) • DNA, RNA, and Proteins can all be represented as strings! • DNA/RNA are strings over a 4 letter alphabet (A,C,G,T/U). • Protein sequences are strings over a 20 letter alphabet. • This allows us to store and query them as text.
DNA • DNA is the only inherited molecule. It must have all the ‘instructions’ for making all proteins. • When cells divide and differentiate to form tissues, different proteins must be active in different cells. DNA must contain the instructions for activating/deactivating the production of these proteins. • DNA is packaged into a genome • Specific regions on the genome have the code/instruction for a specific (set of) protein(s). • What do we call these regions?
DNA structure • Watson and Crick identified the structure of DNA in 1953. • Established DNA as a double-stranded molecule with a helical structure (double helix) • Complementary base-pairs form hydrogen bonds that stabilize the molecule
Transcription • DNA is a double-stranded molecule • During transcription, the two strands separate, and a copy is made of the gene • The copied form is RNA (T is changed to U) • Figures: http://fig.cox.miami.edu/~cmallery/150/gene/c7.17.7b.transcription.jpg http://www.geneticengineering.org/chemis/Chemis-NucleicAcid/Graphics/Transcription.gif
Transcription and translation • The transcribed messenger RNA leaves the nucleus and goes to the cytoplasm. • The ribosomal machinery reads the transcript and produces a protein • Each nucleotide triplet (codon) maps to a single amino-acid, although several different triplets can map to the same amino-acid
Translation • The ribosomal machinery reads mRNA. • Each triplet is translated into a unique amino-acid until the STOP codon is encountered. • There is also a special signal where translation starts, usually at the ATG (M) codon.
The genetic code • Given a DNA sequence, how many ways can you translate it?
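One common answer to the question above is six reading frames: three offsets on the given strand and three on its reverse complement. The sketch below assumes the standard genetic code and an input containing only A, C, G, T. Stop codons are written as `*` rather than ending translation, and no start signal is required. All function names are hypothetical.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (TTT, TTC, TTA, ...).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def reverse_complement(dna):
    """Complement each base, then reverse the string."""
    return dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(dna):
    """Translate complete triplets from position 0; '*' marks a stop codon."""
    codons = (dna[i:i + 3] for i in range(0, len(dna) - 2, 3))
    return "".join(CODON_TABLE[c] for c in codons)


def six_frame_translations(dna):
    """Three offsets on the forward strand, then three on the reverse complement."""
    strands = (dna, reverse_complement(dna))
    return [translate(strand[offset:]) for strand in strands for offset in range(3)]


for frame in six_frame_translations("ATGGCCTAA"):
    print(frame)
```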
Project sign up • Please sign up for the projects • First presentation will be Feb 5, 7.
Neutral alleles • Now that we know some molecular biology... • How can we detect neutral alleles? • Four-fold degenerate sites in DNA should be selectively neutral
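A four-fold degenerate site is a third codon position where any of the four bases encodes the same amino-acid. In the standard genetic code this holds for the eight codon families listed below. The sketch assumes an in-frame coding sequence starting at position 0. The function and constant names are hypothetical.

```python
# First two bases of the codon families in which the third base is free:
# Ala (GC), Arg (CG), Gly (GG), Leu (CT), Pro (CC), Ser (TC), Thr (AC), Val (GT).
FOURFOLD_PREFIXES = {"GC", "CG", "GG", "CT", "CC", "TC", "AC", "GT"}


def fourfold_degenerate_sites(coding_sequence):
    """Return 0-based positions of third-codon bases that are four-fold degenerate."""
    return [
        i + 2
        for i in range(0, len(coding_sequence) - 2, 3)
        if coding_sequence[i:i + 2] in FOURFOLD_PREFIXES
    ]


# Codons: ATG, GCC, AAA, GGT -> the third bases of GCC and GGT qualify.
print(fourfold_degenerate_sites("ATGGCCAAAGGT"))
```

Mutations at these positions leave the protein unchanged, which is why they are candidates for neutral sites.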
Tests for neutrality • Neutral alleles can be identified and are very useful in computing genetic/evolutionary parameters (mutation rate, recombination rate…) • However, not all mutations are selectively neutral • Also, such mutations might switch from being neutral to advantageous/deleterious
Recent adaptive selection • Many adults are lactose intolerant • Consumption of milk-products leads to indigestion/sickness
Hypothesis • It is possible that lactose intolerance was not a disease. • In warm climates, milk-products were not consumed • In colder climates (lack of food), the ability to digest milk products conferred a selective advantage • Finnish people are less likely to be lactose intolerant than Asian people. • The mutation conferring tolerance is likely under selection in the Finnish population. • How can we detect such non-neutrally evolving regions in a population sample?
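Two estimates of mutation rate • Let $S$ be the number of mutations in a sample of $n$ sequences. Recall that • $E(S) = \mu\, E(T_{tot})$ • $E(S) = 2N\mu \sum_{k=2}^{n} \frac{2}{k-1} = 4N\mu \sum_{k=1}^{n-1} \frac{1}{k} \approx 4N\mu\,(\gamma + \ln(n-1))$, where $\gamma \approx 0.577$ is Euler's constant • Watterson’s estimate • $\theta_W = S_n / (\gamma + \ln(n-1))$ • Tajima’s estimate. Let $\pi_{ij}$ be the heterozygosity between individuals $i$ and $j$. The average heterozygosity $\theta_k$ is an estimate of $\theta$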
Tajima’s D statistic • Tajima proposed the difference of the two estimates as a test of selection (Tajima’s D statistic) • Tajima’s $D \approx \theta_k - \theta_W$ • The actual statistic involves a normalization • Under neutral evolution, the expected value of $D$ is approximately 0 • What do we expect under positive selection?
Tajima’s D under selection? • Under positive selection, there is a loss in average heterozygosity? • $D \approx \theta_k - \theta_W < 0$ • Under balancing selection, there should be a gain in average heterozygosity? • $D > 0$
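The sign behavior can be checked on toy data. The sketch below computes only the unnormalized difference between the average heterozygosity and Watterson's estimate. The normalization used by the actual statistic is omitted. The two samples are hypothetical: one contains only singleton variants (an excess of rare alleles), the other splits into two equally frequent haplotypes (an excess of intermediate-frequency alleles).

```python
from itertools import combinations


def theta_w(haplotypes):
    """Watterson's estimate: segregating sites over the harmonic sum."""
    n = len(haplotypes)
    s = sum(1 for column in zip(*haplotypes) if len(set(column)) > 1)
    return s / sum(1.0 / k for k in range(1, n))


def theta_k(haplotypes):
    """Average heterozygosity: mean pairwise Hamming distance."""
    pairs = list(combinations(haplotypes, 2))
    return sum(sum(a != b for a, b in zip(x, y)) for x, y in pairs) / len(pairs)


def tajima_difference(haplotypes):
    """Unnormalized numerator of Tajima's D (assumption: no variance scaling)."""
    return theta_k(haplotypes) - theta_w(haplotypes)


rare_variants = ["1000", "0100", "0010", "0001"]   # every mutation is a singleton
two_clades = ["11", "11", "00", "00"]              # two haplotypes at 50% each

print(tajima_difference(rare_variants))   # negative
print(tajima_difference(two_clades))      # positive
```

Rare alleles add to the count of segregating sites but contribute little to pairwise differences, which pulls the difference below zero. Intermediate-frequency alleles do the opposite.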
When does Tajima’s D fail? • When the population is growing, what will happen to average heterozygosity? • What happens when the selection event is a recent one?
Test of recent selection
Malarial resistance • Two genes have been implicated in resistance to the malarial parasite Plasmodium falciparum • Glucose-6-phosphate dehydrogenase (G6PD) • A common variant G6PD-202A confers partial protection against malaria • Likewise, TNFSF5-726C is a variant associated with protection against malaria. • Sabeti et al. describe a test for identifying regions under selection, and test them on these loci
The EHH test • G6PD • A core region of 15kb was identified, and 11 SNPs genotyped • The core region was dense and had high LD (genealogy could be identified)
Extending core haplotypes • As you add distant SNPs, the haplotypes begin to decay (reduce in frequency). • For each core haplotype, do the EHH test • Define EHH(d): probability that two randomly chosen chromosomes with the core haplotype are identical at distance d • Clearly, EHH will decay due to mutations and recombinations • Claim: if the core haplotype is under selection, it will decay less than other haplotypes. • [Figure: six chromosomes carrying the core haplotype CGCGGACCGCC]
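The definition of EHH(d) can be sketched as a counting exercise: group the carriers of a core haplotype by their extended haplotype and count the pairs that fall in the same group. The sketch makes several assumptions that are not in the slides: chromosomes are equal-length strings of marker alleles, the core occupies a fixed index range, extension is to the right only, and distance d is measured in markers rather than base pairs. The function name and sample data are hypothetical.

```python
from collections import Counter
from math import comb


def ehh(chromosomes, core_start, core_end, core, d):
    """Probability that two random carriers of `core` agree on the core plus d markers."""
    carriers = [c for c in chromosomes if c[core_start:core_end] == core]
    n = len(carriers)
    if n < 2:
        return 0.0  # no pair of carriers to compare
    # Group carriers by their extended haplotype (core plus d markers to the right).
    groups = Counter(c[core_start:core_end + d] for c in carriers)
    identical_pairs = sum(comb(size, 2) for size in groups.values())
    return identical_pairs / comb(n, 2)


# Hypothetical sample: the first four chromosomes carry core "AA" at markers 0-1.
sample = ["AAGT", "AAGT", "AAGC", "AACC", "TTGT"]
for distance in range(3):
    print(distance, ehh(sample, 0, 2, "AA", distance))
```

The value starts at 1 for d = 0 and can only decrease as more markers are included, which matches the decay described above.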
Decay in EHH • High values of EHH indicate selection • Note that EHH decays due to both mutation and recombination. • Mutation rates are different in different regions. • How do we choose cut-offs for the EHH statistic?
EHH test at • Decay of the 9 core haplotypes of G6PD region. • Only one core haplotype (CH8) shows selection • The other haplotypes serve as control
Relative EHH • Define: relative EHH: • EHH of core haplotype / (average EHH of all other haplotypes) • Plot shows relative EHH for the 9 core haplotypes and simulated data
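Relative EHH is a simple ratio once the per-haplotype EHH values at a given distance are known. The sketch below takes those values as input. The function name, the haplotype labels other than CH8, and all numeric values are hypothetical and are not data from the paper.

```python
def relative_ehh(ehh_by_core, core):
    """EHH of one core haplotype divided by the mean EHH of all the others."""
    others = [value for name, value in ehh_by_core.items() if name != core]
    mean_others = sum(others) / len(others)
    return ehh_by_core[core] / mean_others


# Hypothetical EHH values at one fixed distance from the core.
ehh_at_distance = {"CH1": 0.2, "CH2": 0.2, "CH8": 0.8}
print(relative_ehh(ehh_at_distance, "CH8"))
```

Using the other haplotypes in the same region as the denominator controls for local mutation and recombination rates. The ratio is undefined when every other haplotype has EHH 0, a case this sketch does not handle.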
EHH The EHH test helps in identifying recent positive selection. Sabeti’s paper claims that for this data set, the other statistics don’t work as well. Can this be tested? Can you suggest where the test might fail to detect recent positive selection?
What would happen for balancing selection? Could it be that if you have 2 strong haplotypes, they would decrease each other’s relative EHH? What about the impact of a single ‘misplaced’ mutation? If a single misplaced mutation can have such an effect, what would happen with a few of them? As only a handful of examples are known for positive selection, false negatives are harder to quantify.
A proposal Consider a case with recent positive selection, or even balancing selection. The presence of new mutations on 6, and 7, 8 reduces the EHH frequency. However, the true signal is still there in the presence of a long branch with 5 mutations
A combinatorial formulation In other words, is there a ‘large’ subset of individuals, and a ‘large’ subset of sites that are identical? What is the formulation for balancing selection?
Combinatorial formulation Given n individuals, m sites, determine if there exist at least n1 individuals, and m1 < m sites such that the n1 individuals are identical when restricted to the m1 sites. Is the problem NP-hard? Yes, for many natural variants.
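The decision problem can be stated as a brute-force check. This sketch only pins down the formulation: it tries every subset of m1 sites, so its running time grows exponentially with the number of sites, in line with the hardness remark above. The function name and sample data are hypothetical.

```python
from collections import Counter
from itertools import combinations


def has_identical_block(haplotypes, n1, m1):
    """True if some m1 sites exist on which at least n1 individuals are identical."""
    num_sites = len(haplotypes[0])
    for sites in combinations(range(num_sites), m1):
        # Project every individual onto the chosen sites and count equal projections.
        projections = Counter(tuple(h[s] for s in sites) for h in haplotypes)
        if max(projections.values()) >= n1:
            return True
    return False


sample = ["0101", "0111", "0100", "1010"]
print(has_identical_block(sample, n1=3, m1=2))   # sites 0 and 1 agree for 3 individuals
print(has_identical_block(sample, n1=3, m1=3))
```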
Linear programming Many combinatorial optimization problems can be formulated naturally as (integer) linear programs. Today we will explore this algorithmic paradigm (and related ones) using selection as an example. Caveat: if the original formulation is biologically wrong, any results will be meaningless. If the original formulation is technically wrong, it may change the complexity of the problem. Finally, for a specific formulation, there may be multiple approaches to solving it.
Generic Linear Programming
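The body of this slide did not survive extraction. For orientation, one common textbook form of a linear program is shown below. It is an assumption that the lecture used this form. The constraint matrix $A$ and right-hand side $b$ are notation introduced here, while the objective $c^T x$ matches the following slides.

```latex
\begin{aligned}
\text{maximize}\quad   & c^{T} x \\
\text{subject to}\quad & A x \le b, \\
                       & x \ge 0 .
\end{aligned}
```

An integer linear program adds the requirement that some or all entries of $x$ be integers.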
A linear objective function • Note that the objective is to maximize $c^T x$ • Consider the equation $c^T x = c_0$ • It defines a hyperplane
Geometry of dot-product • Dot product? • What is $\|x\|_2$? • What is $x/\|x\|_2$? • What is $x^T y$? • [Figure: vectors $x = (x_1, x_2)$ and $y$]
Dot Product • Let $c$ be a unit vector. • $\|c\| = 1$ • Recall that • $c^T x = \|x\| \cos\theta$, where $\theta$ is the angle between $c$ and $x$ • What is $c^T x$ if $x$ is orthogonal (perpendicular) to $c$?
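A small numeric check of the identity above, using plain tuples as vectors. The function names and example vectors are hypothetical.

```python
import math


def dot(u, v):
    """Dot product of two vectors given as equal-length tuples."""
    return sum(a * b for a, b in zip(u, v))


def norm(u):
    """Euclidean length of a vector."""
    return math.sqrt(dot(u, u))


c = (1.0, 0.0)            # unit vector: norm(c) == 1
x = (3.0, 4.0)            # length 5
cos_angle = dot(c, x) / (norm(c) * norm(x))

print(dot(c, x))               # length of the projection of x onto c
print(norm(x) * cos_angle)     # same value, computed as ||x|| cos(angle)
print(dot(c, (0.0, 2.0)))      # a vector orthogonal to c gives 0
```

For a unit vector c, the dot product is the signed length of the projection of x onto c.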
Hyperplane • Find the unit vector that is perpendicular (normal) to the hyperplane • How can we define a hyperplane L?
Points on the hyperplane • Consider a hyperplane $L$ defined by unit vector $c$, and distance $c_0$ • Notes: • For all $x \in L$, $x^T c$ must be the same, $x^T c = c_0$ • For any two points $x_1, x_2 \in L$, • $(x_1 - x_2)^T c = 0$
Back to LP? • Remember the goal is to maximize $c^T x$ • Geometrically, this is equivalent to moving the hyperplane along the orthogonal axis. • [Figure: parallel hyperplanes $c^T x = c_0$ and $c^T x = c_1$ with normal vector $c$]
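The sliding-hyperplane picture can be illustrated on a small two-dimensional example. When the feasible region is a bounded polygon, the hyperplane can be pushed in direction c until it last touches the region, and that contact includes a corner, so it suffices to compare the objective at the corners. The sketch assumes the corners are already known. The region, the objective, and the function name are hypothetical.

```python
def maximize_over_vertices(c, vertices):
    """Return (best_vertex, best_value) for the linear objective c^T x."""
    def objective(v):
        return sum(ci * vi for ci, vi in zip(c, v))

    best = max(vertices, key=objective)
    return best, objective(best)


# Hypothetical feasible region: x >= 0, y >= 0, x + y <= 4, x <= 3.
corners = [(0, 0), (3, 0), (3, 1), (0, 4)]
best_vertex, best_value = maximize_over_vertices((1, 2), corners)
print(best_vertex, best_value)
```

Practical solvers do not list all corners, since their number can grow exponentially with the dimension.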