Lecture 10: Linkage Analysis III

amato

Presentation Transcript


  1. Lecture 10: Linkage Analysis III Date: 9/26/02 • Revisit segregation ratio distortion. • Haplotype coding. • Three-point analysis. • Multipoint analysis.

  2. Additive Segregation Ratio Distortion • Systematic genotype classification error occurs. • Power and estimates of recombination fraction are unaffected by additive distortion in the backcross configuration. • Estimates of recombination fraction are not affected for F2, but the false positive rate increases.

  3. Additive Segregation - Backcross • Suppose the frequency of genotype Aa is increased because a fraction u of aa genotypes are misclassified. • Similarly, assume the frequency of genotype Bb is independently increased by fraction v. • We need to recalculate the expected frequencies under the new model with additional parameters u and v.

  4. Additive Segregation – Backcross (contd)

  5. Additive Segregation – Backcross (contd) • The number of unknown parameters equals the number of degrees of freedom. • Use Bailey’s method to find the MLEs of the parameters (θ, u, v).

  6. Bailey’s Method • Set the expected frequencies equal to the observed proportions and solve the system of equations for the unknown parameters. When the number of parameters equals the degrees of freedom, the solutions are the MLEs. • Example: Suppose you observe 5 successes from a Binomial(10, p) distribution. Then the MLE of p is 5/10 = 0.5.

  7. Additive Segregation – Backcross (contd) • What do you notice about the MLE for the recombination fraction? • Is the MLE for the recombination fraction biased?

  8. Additive Segregation – F2-CC

  9. Penetrance Distortion - Backcross • Selection, penetrance, and linkage to selected markers can all result in penetrance distortion, so it is quite common. • Suppose (100 × u)% of genotype aa is misclassified as Aa. Similarly, and independently, assume that (100 × v)% of genotype bb is misclassified as Bb.

  10. Penetrance Distortion - Backcross

  11. Penetrance Distortion - Backcross • Is the estimate for recombination fraction biased? • The power to detect linkage is decreased.

  12. Cost of Assuming Non-Distortion Model • The estimate for recombination fraction is biased. By how much?

  13. Overall Impact of Segregation Distortion

  14. First Project • This slide marks the end of the material that will be needed to complete the first project.

  15. Linkage Analysis for Multiple Loci • The haplotype is the sequence of alleles along one of the chromosomes in an individual. • In multipoint linkage analysis we are not concerned with the alleles at each locus, but rather with their parental origin.

  16. Recoding Haplotypes • Suppose there are k loci. Recode each haplotype as a string of k-1 0’s and 1’s. • If the ith position is 0, the (i+1)th locus is not recombinant with respect to the ith locus. • If the ith position is 1, the (i+1)th locus is recombinant with respect to the ith locus.
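
The recoding rule on this slide can be sketched in a few lines of Python. This is only an illustration: the function name `recode_haplotype` and the input format (one parental-origin label per locus, such as "M" or "P") are assumptions, not something the lecture specifies.

```python
def recode_haplotype(origins):
    """Recode parental origins at k loci into a string of k-1 digits.

    `origins` is a hypothetical input format: one label per locus (for
    example "M" or "P") giving the parental origin of the allele there.
    Digit i is "1" if locus i+1 is recombinant with respect to locus i
    (the origin switches between them) and "0" otherwise.
    """
    return "".join(
        "1" if left != right else "0"
        for left, right in zip(origins, origins[1:])
    )


print(recode_haplotype("MMPP"))  # 010
print(recode_haplotype("MPMP"))  # 111
print(recode_haplotype("PPPP"))  # 000
```

Complementary haplotypes such as "MMPP" and "PPMM" map to the same recoded string, which is why k loci give 2^(k-1) recoded classes rather than 2^k.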

  17. Recoding Haplotypes (contd)

  18. Recoding Haplotypes (contd)

  19. Recoded Haplotypes and Recombination Fractions

  20. Sample Problem • Calculate the probabilities of the four haplotype classes (i.e. g00, g10, g01, g11) when θAB = 0.1 and θBC = 0.2 and θAC is unknown. Assume the Sturt map function with L = 1.

  21. Plan of Attack • Transform recombination fractions to genetic map units using the inverse map function. • Sum the genetic map units to obtain length of AC interval. • Calculate the recombination fraction between AC using the map function. • Solve the set of simultaneous equations for the haplotype frequencies.
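
The four steps can be sketched as a short Python function. Note the substitution: the sample problem asks for the Sturt map function, but this sketch plugs in the Haldane map function instead, because it has a simple closed-form inverse. All function names here are illustrative assumptions; any other map function and its inverse could be passed in the same way.

```python
import math


def haldane(m):
    """Haldane map function: map distance in Morgans -> recombination fraction."""
    return 0.5 * (1.0 - math.exp(-2.0 * m))


def haldane_inverse(r):
    """Inverse Haldane map function: recombination fraction -> Morgans."""
    return -0.5 * math.log(1.0 - 2.0 * r)


def three_point_haplotype_probs(r_ab, r_bc, map_fn, inverse_fn):
    """Haplotype class probabilities for loci A-B-C under a given map function."""
    # Steps 1 and 2: convert both recombination fractions to map units and add.
    m_ac = inverse_fn(r_ab) + inverse_fn(r_bc)
    # Step 3: recombination fraction for the outer pair A-C.
    r_ac = map_fn(m_ac)
    # Step 4: solve the linear system
    #   r_ab = g10 + g11,  r_bc = g01 + g11,  r_ac = g10 + g01,
    #   g00 + g10 + g01 + g11 = 1.
    g11 = (r_ab + r_bc - r_ac) / 2.0
    g10 = r_ab - g11
    g01 = r_bc - g11
    g00 = 1.0 - g10 - g01 - g11
    return {"00": g00, "10": g10, "01": g01, "11": g11}


probs = three_point_haplotype_probs(0.1, 0.2, haldane, haldane_inverse)
for hap_class, p in probs.items():
    print(hap_class, round(p, 4))
```

Under Haldane (no interference) the double-recombinant class is simply 0.1 × 0.2 = 0.02, which gives a quick check on the output. With the Sturt function the same steps apply, but the numbers will differ.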

  22. Step 1

  23. Step 2

  24. Step 3

  25. Step 4

  26. Phase Known Three Point Analysis • When all gametes in the sample are fully informative, the likelihood is simple: it is multinomial over the four haplotype classes. • How would you test for interference?
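
One possible answer to the interference question is a likelihood ratio test, sketched below. It compares the unrestricted multinomial fit with a no-interference fit in which recombination in the two intervals is independent. This is an illustrative choice, not necessarily the test the lecture had in mind, and the function name and the counts used are hypothetical.

```python
import math


def interference_lrt(n00, n10, n01, n11):
    """Likelihood ratio test of no interference from phase-known gamete counts.

    Returns (G, p_value), where G is referred to a chi-square distribution
    with 1 degree of freedom (3 free class probabilities versus 2
    recombination fractions).
    """
    total = n00 + n10 + n01 + n11
    counts = {"00": n00, "10": n10, "01": n01, "11": n11}
    # MLEs of the two recombination fractions under no interference.
    r_ab = (n10 + n11) / total
    r_bc = (n01 + n11) / total
    expected = {
        "00": (1 - r_ab) * (1 - r_bc),
        "10": r_ab * (1 - r_bc),
        "01": (1 - r_ab) * r_bc,
        "11": r_ab * r_bc,
    }
    g_stat = 0.0
    for hap_class, n in counts.items():
        if n > 0:  # empty classes contribute nothing to the statistic
            g_stat += 2.0 * n * math.log(n / (total * expected[hap_class]))
    g_stat = max(g_stat, 0.0)  # guard against tiny negative rounding error
    # Upper tail of a chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(g_stat / 2.0))
    return g_stat, p_value


# Hypothetical counts with fewer double recombinants than independence predicts.
g_stat, p_value = interference_lrt(720, 90, 185, 5)
print(g_stat, p_value)
```

A large G with a deficit of double recombinants points to positive interference. The test needs a reasonably large sample for the chi-square approximation to hold.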

  27. Multipoint Analysis – A Difficulty • Suppose there are k loci. • How many haplotypes are possible? • How many recombination fractions are there?

  28. Recombination Value • Definition: The recombination value of a set of intervals is the probability of an odd number of crossovers occurring in the intervals. • How many sets of intervals are there?
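
The definition can be made concrete with a small sketch that computes the recombination value of every nonempty set of intervals from the recoded haplotype class probabilities. A recoded 1 marks a recombinant interval (an odd number of crossovers in it), so a set of intervals has an odd number of crossovers exactly when an odd number of its intervals are recombinant. The function name and the example probabilities are assumptions made for illustration.

```python
from itertools import combinations


def recombination_values(hap_probs):
    """Recombination value for every nonempty set of intervals.

    `hap_probs` maps recoded haplotype strings (k-1 characters of "0"/"1")
    to probabilities. Intervals are indexed from 0. The result has
    2**(k-1) - 1 entries, one per nonempty set of intervals.
    """
    n_intervals = len(next(iter(hap_probs)))
    values = {}
    for size in range(1, n_intervals + 1):
        for intervals in combinations(range(n_intervals), size):
            values[intervals] = sum(
                p
                for hap, p in hap_probs.items()
                if sum(hap[i] == "1" for i in intervals) % 2 == 1
            )
    return values


# Hypothetical three-locus class probabilities (two intervals).
example = {"00": 0.72, "10": 0.08, "01": 0.18, "11": 0.02}
for intervals, value in recombination_values(example).items():
    print(intervals, round(value, 4))
```

For a single interval the recombination value is the ordinary recombination fraction of adjacent loci. For the set containing both intervals it is the recombination fraction between the two outer loci.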

  29. Sample Problem – Four Point Analysis • Suppose loci A, B, C, and D are in syntenic order and θAB = 0.1, θBC = 0.2, and θCD = 0.3. • What are the probabilities of the haplotype classes given the Kosambi map function?

  30. The Linear Equations

  31. Multipoint Likelihood • Can be written in terms of the 2^(k-1) - 1 recombination values or haplotype frequencies. • Can be reparameterized as k-1 recombination fractions and 2^(k-1) - k interference parameters. • Then tests for interference are possible. • An alternative is to assume a map function, possibly with unknown parameters, which constrains the gamete probabilities to be functions of the k-1 recombination fractions.

  32. Multilocus-Infeasible Map Functions • Kosambi, Carter-Falconer, and Felsenstein map functions are multilocus-infeasible because they can produce negative gametic frequencies. • The Morgan, Haldane, Sturt and generalized map functions are multilocus-feasible. • Haldane is most often used for its simplicity except when linkage is tight, e.g. m << 0.5.

  33. Map Building • How many possible orders are there for k loci? • 10 loci can be ordered in over 1 million ways (10!/2 = 1,814,400). • The solution is to generate a small number of probable orders and then analyze these few in depth.

  34. Stepwise Approximate Ordering • Use likelihood analysis to order a few markers, say l. • Add each additional marker one at a time by considering all l+1 positions for it. Choose the location that results in the highest likelihood. • Number of likelihood evaluations when starting from two ordered markers: 3 + 4 + 5 + ... + k = (k-2)(k+3)/2.
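
A quick arithmetic check makes the saving visible. The sketch below counts the distinct orders of k loci (an order and its reverse are the same map, so k!/2) and compares that with the number of likelihood evaluations in the stepwise scheme, assuming the procedure starts from two ordered markers and tries every slot for each new marker. The function names are illustrative.

```python
import math


def number_of_orders(k):
    """Distinct orders of k loci, counting an order and its reverse once."""
    return math.factorial(k) // 2 if k > 1 else 1


def stepwise_evaluations(k):
    """Evaluations when markers 3..k are added one at a time: 3 + 4 + ... + k."""
    return sum(range(3, k + 1))


for k in (3, 5, 10):
    closed_form = (k - 2) * (k + 3) // 2
    print(k, number_of_orders(k), stepwise_evaluations(k), closed_form)
```

For 10 loci this is 52 evaluations instead of 1,814,400 exhaustive orders, at the cost of possibly missing the best order, which is why a final refinement step is still needed.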

  35. Pairwise Approximate Ordering • Run two-point linkage analysis on all pairs of loci to obtain recombination fraction estimates. • Apply multidimensional scaling (a multivariate exploratory analysis) to these estimates to find approximate orders.

  36. Final Step – Perfecting Order • Test the likelihood of various reorderings of neighboring groups of loci. • If a tested order has higher likelihood, keep it. • etc...

  37. Disease Mapping • Condition on an ordering of all markers except the disease locus. • Calculate a multilocus log-likelihood for each possible position x of the disease locus; call this l_x. • Calculate the location score 2(l_x - l_∞) at point x, where l_∞ is the log-likelihood with the disease locus unlinked to the other markers.

  38. Disease Mapping • Can also calculate multipoint LOD scores by dividing location scores by 2ln(10). • Plot the location score or multipoint LOD score against position x. The peak is the likely position of the disease locus, and if the peak exceeds some cutoff criterion, linkage to that region is significant.
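
The relationship between the two scores can be sketched as follows. The log-likelihood profile is made up for illustration, the function names are assumptions, and the LOD cutoff of 3 is only the conventional choice, since the slide does not fix a threshold.

```python
import math


def location_score(loglik_at_x, loglik_unlinked):
    """Location score: twice the natural log-likelihood difference."""
    return 2.0 * (loglik_at_x - loglik_unlinked)


def multipoint_lod(loglik_at_x, loglik_unlinked):
    """Multipoint LOD score: location score divided by 2 ln(10)."""
    return location_score(loglik_at_x, loglik_unlinked) / (2.0 * math.log(10.0))


# Hypothetical profile: map position in cM -> natural log-likelihood.
profile = {0: -104.1, 5: -101.3, 10: -97.8, 15: -99.6, 20: -103.0}
loglik_unlinked = -105.2  # hypothetical log-likelihood with the locus unlinked
lod_cutoff = 3.0          # assumed conventional threshold, not from the slide

lods = {x: multipoint_lod(ll, loglik_unlinked) for x, ll in profile.items()}
peak = max(lods, key=lods.get)
print(peak, lods[peak], lods[peak] > lod_cutoff)
```

Dividing by 2 ln(10) just undoes the factor of 2 and converts natural logarithms to base 10, so both curves peak at the same position.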

  39. Multipoint vs. Single Point Disease Mapping • Multipoint analysis uses information from every sampled individual, even those who are homozygous at a single marker. • A single marker can only provide information about crossovers on one side of the disease gene. • The more markers, the sharper the peak. • The disease gene is ultimately mapped to the smallest interval in which no crossover between marker and disease gene is observed in the entire sample.

  40. Sample Size • Assuming no interference, crossovers occur at a mean rate of 1 per Morgan, so the distance to the next crossover is exponentially distributed. • With n sampled individuals, the pooled crossover rate is n per Morgan. • Therefore, the expected distance to the nearest crossover on either side of the disease locus is 1/n. • The length of the interval containing the disease gene follows a gamma distribution with mean 2/n. • Example: You want to localize the disease gene to 1 cM = 1/100 M. Setting 2/n ≤ 1/100 gives n ≥ 200.
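
The sample-size arithmetic generalizes to any target resolution. A minimal sketch, working in centimorgans (the mean interval of 2/n Morgans is 200/n cM); the function names are illustrative, and the calculation uses only the mean interval length from this slide, ignoring its variability.

```python
import math


def expected_interval_cM(n):
    """Mean length (cM) of the crossover-free interval around the disease gene."""
    return 200.0 / n


def required_sample_size(target_cM):
    """Smallest n whose mean interval length is at most target_cM."""
    return math.ceil(200.0 / target_cM)


print(required_sample_size(1))    # 200
print(required_sample_size(0.5))  # 400
print(expected_interval_cM(200))  # 1.0
```

Because the interval length is gamma distributed, a given sample will often yield an interval longer than the mean, so in practice this n is a lower bound rather than a guarantee.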

  41. Summary • Modeling of segregation distortion and its impact on linkage analysis. • Haplotype coding. • The use of map functions. • Overview of likelihood formulation for multipoint analysis.
