Nonparametric Linkage Analysis Tutorial #10 by Ma’ayan Fishelson
Classical Method of Linkage Analysis
• The classical method was parametric linkage analysis (the Lod-score method).
• This method has been very successful in mapping both Mendelian disease genes and DNA markers.
• In order to use this method, the transmission mode of all loci analyzed needs to be specified: allele frequencies, penetrance probabilities, number of loci affecting the trait, etc.
Why Is Non-Parametric Linkage Analysis Needed?
• When the disease model is complex (non-Mendelian disease), it is difficult to estimate penetrance values and allele frequencies.
• It is possible that there are several loci influencing the disease, some more influential than others, some dominant and some recessive.
• It is possible that different modes of transmission operate in different families (heterogeneity).
• The argument is that if the transmission model is specified incorrectly, the results produced by the Lod-score method are invalid. Therefore, this method should not be used when analyzing a disease with an unknown inheritance mode.
Proposed Solution
• A variety of methods for testing for linkage have been developed.
• These methods are termed nonparametric or “model-free”.
• When using these methods, the parameters defining the transmission model don’t need to be specified.
Affected Sib-Pair Analysis
[Pedigree figure: father (1) with genotype 1/3, mother (2) with genotype 4/6; sibs (3) and (4) both with genotype 1/6]
Both siblings inherited the 1 allele from the father (1) and the 6 allele from the mother (2). What is the probability of this according to Mendelian segregation rules?
Average No. of Shared alleles Between Sib-Pairs Under the Null Hypothesis
• Assume the parental genotypes are (g1,g2) and (g3,g4).
• Assume sib1 has genotype (g1,g3).
• Pr(sib2 has allele g1) = Pr(sib2 has allele g2) = 0.5
• Pr(sib2 has allele g3) = Pr(sib2 has allele g4) = 0.5
• Pr(sib2 has (g1,g3)) = Pr(IBD=2) = 0.5² = 0.25
• Pr(sib2 has (g2,g4)) = Pr(IBD=0) = 0.5² = 0.25
• Pr(sib2 has (g1,g4)) = Pr(sib2 has (g2,g3)) = 0.5² = 0.25
• Pr(IBD=1) = Pr(sib2 has (g1,g4)) + Pr(sib2 has (g2,g3)) = 0.5
Average No. of Shared alleles Between Sib-Pairs
• According to Mendelian segregation rules, the probability that a pair of siblings would share:
  • Both marker alleles: 0.25
  • One marker allele: 0.5
  • No marker allele: 0.25
• The average number of shared alleles is: (0.25 * 2) + (0.5 * 1) + (0.25 * 0) = 1
• However, if the marker is linked to the disease locus, the overall amount of sharing between affected sibs will be increased!
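The null sharing probabilities can be checked by brute-force enumeration. The following is a minimal sketch: sib1 is fixed as (g1,g3), and the four equally likely transmissions to sib2 are counted. The function name and the index encoding of alleles are illustrative choices, not part of the tutorial.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

def sib_pair_ibd_distribution():
    """Null IBD distribution for a sib pair, by enumerating sib2's inheritance."""
    # Sib1 is fixed as (g1, g3): paternal allele index 0, maternal allele index 0.
    sib1 = (0, 0)
    dist = Counter()
    # Sib2 gets paternal allele 0 or 1 and maternal allele 0 or 1,
    # each of the four combinations with probability 1/4.
    for pat, mat in product((0, 1), repeat=2):
        ibd = int(pat == sib1[0]) + int(mat == sib1[1])
        dist[ibd] += Fraction(1, 4)
    return dict(dist)

dist = sib_pair_ibd_distribution()
mean_shared = sum(k * p for k, p in dist.items())
print(dist)
print(mean_shared)
```

Exact fractions are used so that the result can be compared directly with 0.25, 0.5, 0.25 and with the mean of 1 shared allele.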
Affected Sib-Pair Analysis
• Main idea: if a marker is linked to a disease locus, the same marker allele will be inherited by two siblings who are both ill more often than expected by chance (i.e., more often than if the loci were unlinked).
• Close to the disease locus (θ << 0.5): affected sibs share on average more than 1 allele IBD if the genetic component is strong.
• Far away (θ = 0.5): affected sibs share on average 1 marker allele IBD.
[Figure: position of the disease locus]
Identity-by-Descent (IBD) and Identity-by-State (IBS)
• IBD: Two alleles are IBD if they have the same ancestral origin.
• IBS: Two alleles are IBS if they are of the same type.
[Pedigree figure: father (1) with genotype 1/4, mother (2) with genotype 2/4; sib (3) with genotype 4/2, sib (4) with genotype 4/1]
The 2 sibs have 1 allele in common (4), but each one got it from a different parent.
IBS-count: 1
IBD-count: 0
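Counting IBS alleles needs only the two genotypes, which a short sketch makes concrete. The helper name `ibs_count` and the representation of a genotype as a pair of allele labels are assumptions made for illustration.

```python
from collections import Counter

def ibs_count(genotype_a, genotype_b):
    """Number of alleles (0, 1 or 2) that two genotypes share identical by state."""
    # Multiset intersection: each allele copy can be matched at most once.
    shared = Counter(genotype_a) & Counter(genotype_b)
    return sum(shared.values())

# The two sibs from the pedigree above: genotypes 4/2 and 4/1.
print(ibs_count((4, 2), (4, 1)))
```

The multiset intersection ensures that a homozygote such as 1/1 compared with 1/2 counts one shared allele, not two.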
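IBS ↔ IBD
[Pedigree figure: father (1) with genotype 1/3, mother (2) with genotype 1/2; sibs (3) and (4) both with genotype 1/1]
IBS = ? IBD = ?
Both children got allele 1 from the father (1) and allele 1 from the mother (2): 2 alleles IBS, 2 alleles IBD.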
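IBS ↔ IBD
[Pedigree figure: father (1) with genotype 1/1, mother (2) with genotype 2/3; sib (3) with genotype 1/2, sib (4) with genotype 1/3]
IBS = ? IBD = ?
Both children got allele 1 from the father (1), but we don’t know if they got the same copy of it: 1 allele IBS, 1 or 0 alleles IBD.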
IBS ↔ IBD
[Pedigree figure: father (1) with genotype 1/2, mother (2) with genotype 1/2; sibs (3) and (4) both with genotype 1/2]
IBS = ? IBD = ?
Both children got allele 1 and allele 2, but we don’t know from which parent they got each allele: 2 alleles IBS, 0 or 2 alleles IBD.
IBS = 2
IBD = 0 with probability 0.5, 1 with probability 0, 2 with probability 0.5
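The IBD answers for the pedigrees above can be reproduced by enumerating all 16 equally likely transmission patterns and keeping those consistent with the observed sib genotypes. This is only a sketch for fully typed nuclear families with unordered genotypes; the function name and argument layout are illustrative.

```python
from fractions import Fraction
from itertools import product

def ibd_distribution(father, mother, sib1, sib2):
    """P(IBD = 0, 1, 2) for two sibs, given typed parents and unordered sib genotypes."""
    weights = {0: 0, 1: 0, 2: 0}
    # Each sib receives paternal copy 0 or 1 and maternal copy 0 or 1.
    for p1, m1, p2, m2 in product((0, 1), repeat=4):
        g1 = sorted((father[p1], mother[m1]))
        g2 = sorted((father[p2], mother[m2]))
        # Keep only the patterns that reproduce the observed genotypes.
        if g1 == sorted(sib1) and g2 == sorted(sib2):
            weights[int(p1 == p2) + int(m1 == m2)] += 1
    total = sum(weights.values())
    if total == 0:
        raise ValueError("sib genotypes are inconsistent with the parents")
    return {k: Fraction(v, total) for k, v in weights.items()}

# Both parents 1/2, both sibs 1/2: IBD is 0 or 2, never 1.
print(ibd_distribution((1, 2), (1, 2), (1, 2), (1, 2)))
# Father 1/1, mother 2/3, sibs 1/2 and 1/3: IBD is 0 or 1.
print(ibd_distribution((1, 1), (2, 3), (1, 2), (1, 3)))
```

Because all transmission patterns are equally likely a priori, conditioning on the genotypes reduces to counting the consistent patterns.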
IBD & IBS Probabilities
• IBD sharing probabilities: 0.25, 0.5 and 0.25.
• IBS sharing probabilities: depend on the allele frequencies of the marker, since allele pairs are more likely to be IBS for a common allele than for a rare one.
Shortcomings of IBS-based Analyses
• Have less power than IBD-based analyses, because parental information isn’t available.
• May be biased if incorrect marker allele frequencies are used.
IBD Sharing – Binomial Distribution
• Assume a sib-pair and some locus.
• The probability is 0.5 that the two sibs received the same allele (IBD) from their father. The same is true for the allele received from the mother.
• Let N be the total number of alleles shared IBD by the sib-pair. N can be viewed as the number of successes in two experiments, where success means that a parent passes the same allele to both sibs.
• The probability of success is 0.5.
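Viewed this way, N follows a Binomial(2, 0.5) distribution, which a few lines can confirm. The function name and its default arguments below are illustrative.

```python
from math import comb

def ibd_null_pmf(k, n_parents=2, p_share=0.5):
    """P(N = k) when N ~ Binomial(n_parents, p_share)."""
    return comb(n_parents, k) * p_share**k * (1 - p_share)**(n_parents - k)

pmf = [ibd_null_pmf(k) for k in range(3)]
print(pmf)  # [0.25, 0.5, 0.25]
```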
IBD Sharing + ASP
• We now wish to compute the conditional probability of N given that the two sibs are affected and the locus is the disease locus.
• The desired probabilities are:
  • z0 = P(N=0|ASP)
  • z1 = P(N=1|ASP)
  • z2 = P(N=2|ASP)
• These probabilities depend on the inheritance model of the disease (disease allele frequency, penetrance values, …).
P( N=k): k = 0, 1, 2 ? • Possible parental disease locus genotypes: • The corresponding genotypes under the assumptions of HWE and independence between the parents:
P(N=k): k = 0, 1, 2 ?
Assume a genetic model of a recessive disease with full penetrance: f = (0,0,1).
Explanation: both affected sibs must have 2 disease alleles, and these must be of different parental origin…
P(N=k): k = 0, 1, 2 ?
In a similar way, with P = 0.001:
P(N=0) = 0.000
P(N=1) = 0.002
P(N=2) = 0.998
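These values can be reproduced numerically under some explicit assumptions: that P = 0.001 denotes the disease allele frequency (an interpretation, not stated on the slide), that parental genotypes follow HWE with independent parents, and that the penetrance vector f is indexed by the number of disease alleles. The sketch below enumerates parental alleles and transmissions; all names are illustrative.

```python
from itertools import product

def asp_ibd_probabilities(p, penetrance):
    """(z0, z1, z2) = P(N = k | both sibs affected) at the disease locus.

    p          : disease allele frequency (assumed meaning of the slide's P)
    penetrance : (f0, f1, f2) = P(affected | 0, 1, 2 disease alleles)
    """
    freq = {1: p, 0: 1.0 - p}        # 1 = disease allele, 0 = normal allele
    joint = [0.0, 0.0, 0.0]          # joint[k] = P(N = k and both sibs affected)
    # Parental alleles are independent draws under HWE and independent parents.
    for f_a, f_b, m_a, m_b in product((0, 1), repeat=4):
        parents = freq[f_a] * freq[f_b] * freq[m_a] * freq[m_b]
        father, mother = (f_a, f_b), (m_a, m_b)
        # Transmission indices for sib 1 and sib 2; each pattern has probability 1/16.
        for p1, m1, p2, m2 in product((0, 1), repeat=4):
            d1 = father[p1] + mother[m1]     # disease alleles carried by sib 1
            d2 = father[p2] + mother[m2]     # disease alleles carried by sib 2
            n_ibd = int(p1 == p2) + int(m1 == m2)
            joint[n_ibd] += parents / 16 * penetrance[d1] * penetrance[d2]
    total = sum(joint)
    return tuple(x / total for x in joint)

z = asp_ibd_probabilities(p=0.001, penetrance=(0, 0, 1))
print([round(v, 3) for v in z])
```

Under these assumptions the rounded result agrees with the values on the slide, and setting the penetrance to (1, 1, 1) recovers the null distribution (0.25, 0.5, 0.25).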
P(N=k): k = 0, 1, 2 ?
• Assume now that the locus x is not the disease locus…
• Now (z0, z1, z2) depend on how closely linked the locus x is to the disease locus.
• If x is unlinked to the disease locus, the fact that the sib-pair is affected gives no extra information.
IBD Probabilities as a Function of the Distance from the Disease Locus
At the disease locus: z1 = 0.15 and z2 = 0.8.
Conditional IBD probabilities approach the values under H0 (z1 = 0.5, z2 = 0.25) as the distance from the disease locus increases.
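The decay toward the null values can be sketched with a standard two-locus argument: for each parent, the sharing status of the two sibs is the same at the marker and at the disease locus with probability ψ = θ² + (1 - θ)². The code assumes no interference and equal recombination fractions in both parents, and takes z0 = 0.05 as the remainder of the values quoted above. Function and variable names are illustrative.

```python
def marker_ibd_probabilities(z_disease, theta):
    """IBD distribution at a marker at recombination fraction theta from the disease locus."""
    psi = theta**2 + (1 - theta)**2   # per-parent sharing status unchanged between loci
    phi = 1 - psi                     # per-parent sharing status flipped
    # transition[i][j] = P(marker IBD = j | disease-locus IBD = i)
    transition = [
        [psi**2, 2 * psi * phi, phi**2],
        [psi * phi, psi**2 + phi**2, psi * phi],
        [phi**2, 2 * psi * phi, psi**2],
    ]
    return tuple(
        sum(z_disease[i] * transition[i][j] for i in range(3)) for j in range(3)
    )

z_disease = (0.05, 0.15, 0.80)   # z1 and z2 as quoted above; z0 = 1 - z1 - z2
for theta in (0.0, 0.05, 0.1, 0.2, 0.5):
    z = marker_ibd_probabilities(z_disease, theta)
    print(theta, [round(v, 3) for v in z])
```

At θ = 0 the marker distribution equals the disease-locus distribution, and at θ = 0.5 every row of the transition matrix collapses to (0.25, 0.5, 0.25).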
ASP Analysis
• Collect affected sib-pairs.
• Genotype all 4 members of each sib-ship.
• Estimate the conditional IBD probabilities Ψ = (z0, z1, z2).
• Compare with the IBD probabilities under the null hypothesis of no linkage: zH0 = (0.25, 0.5, 0.25).
ASP Analysis – one approach
• Idea: compare the expected and observed numbers of pairs sharing 0, 1, and 2 alleles IBD.
• Test for linkage by computing the chi-squared statistic:
  χ² = (O2 - E2)²/E2 + (O1 - E1)²/E1 + (O0 - E0)²/E0
• O2, O1, and O0 are the observed numbers of sib-pairs sharing 2, 1, or 0 alleles.
• E2, E1, and E0 are the expected numbers of sib-pairs sharing 2, 1, or 0 alleles.
• Note: there are 2 degrees of freedom.
ASP Example
In a study of diabetes mellitus, 119 sib-pairs are ascertained. Researchers have genotyped a candidate locus (FGF3) and would like to determine if there is linkage between IDDM and FGF3. After genotyping, they are able to determine that 20 of the sib-pairs share 0 alleles IBD, 59 sib-pairs share 1 allele IBD, and 40 sib-pairs share 2 alleles IBD. Determine if there is significant evidence for linkage between IDDM and FGF3.
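As a worked sketch, the chi-squared statistic that compares observed and expected counts of pairs sharing 0, 1, and 2 alleles IBD (2 degrees of freedom) can be applied to these counts. The function name is illustrative; the p-value uses the closed form exp(-χ²/2) for the upper tail of a chi-squared distribution with 2 degrees of freedom.

```python
from math import exp

def asp_chi2_counts(o0, o1, o2):
    """Chi-squared test of IBD counts against the null proportions (0.25, 0.5, 0.25)."""
    n = o0 + o1 + o2
    observed = (o0, o1, o2)
    expected = (0.25 * n, 0.5 * n, 0.25 * n)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    p_value = exp(-chi2 / 2)   # upper tail of chi-squared with 2 degrees of freedom
    return chi2, p_value

chi2, p_value = asp_chi2_counts(20, 59, 40)
print(round(chi2, 2), round(p_value, 3))
```

For these data the statistic is about 6.73 with a p-value of about 0.035, which falls below the conventional 0.05 threshold.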
ASP Analysis – 2nd approach
• Idea: compare the observed proportion of alleles shared IBD to the expected proportion of 50%.
• Test for linkage by computing the chi-squared statistic:
  χ² = (2·O2 + O1 - N)²/N + (2·O0 + O1 - N)²/N
  where there are N sib-pairs, 2·O2 + O1 is the number of shared alleles, and 2·O0 + O1 is the number of unshared alleles.
• Note: there is 1 degree of freedom.
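The same example counts (20, 59, and 40 pairs sharing 0, 1, and 2 alleles) can be run through this second statistic. This is a minimal sketch with an illustrative function name; the p-value uses erfc(sqrt(χ²/2)), the upper tail of a chi-squared distribution with 1 degree of freedom.

```python
from math import erfc, sqrt

def asp_chi2_mean_sharing(o0, o1, o2):
    """1-df chi-squared test comparing shared and unshared allele counts."""
    n = o0 + o1 + o2            # number of sib-pairs
    shared = 2 * o2 + o1        # alleles shared IBD, summed over pairs
    unshared = 2 * o0 + o1      # alleles not shared IBD, summed over pairs
    chi2 = (shared - n) ** 2 / n + (unshared - n) ** 2 / n
    p_value = erfc(sqrt(chi2 / 2))   # upper tail of chi-squared with 1 degree of freedom
    return shared, unshared, chi2, p_value

shared, unshared, chi2, p_value = asp_chi2_mean_sharing(20, 59, 40)
print(shared, unshared, round(chi2, 2), round(p_value, 4))
```

Each pair contributes 2 alleles, so under the null hypothesis both the shared and the unshared counts are expected to equal N.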
Question
• Siblings share 0, 1, or 2 alleles IBD at a marker locus with probabilities p0 = 0.25, p1 = 0.5, and p2 = 0.25, respectively, under the null hypothesis of no linkage. What is the corresponding IBD distribution for father and son?
  • p0 = 0, p1 = 1, and p2 = 0;
  • p0 = 0, p1 = 0.5, and p2 = 0.5;
  • p0 = 0.25, p1 = 0.5, and p2 = 0.25;
  • p0 = 0.5, p1 = 0.5, and p2 = 0;
Extended Sib-Pair Analysis
• If one of the affected sibs or their parents is untyped at the specific locus, the probability of each possible genotype for this person at this locus can be computed, and the IBD value can be computed based on these probabilities.
• P(i alleles IBD) is computed as the sum of the probabilities of all the configurations of the family in which i alleles are IBD for the relative pair examined.
Affected Pedigree Member (APM) Method
• Aims to detect increased allele-sharing (IBS) between all affected members of a pedigree.
• What is the IBS status of the 2 indicated individuals? The 2 indicated individuals are IBS for two alleles.
• What is their IBD status? The 2 indicated individuals are IBD for neither one of the alleles.
APM Method – cont.
• Zij: a similarity statistic between a relative pair.
• Assume the genotype of the 1st relative is (A1,A2) and the genotype of the 2nd relative is (B1,B2). The general similarity statistic is computed as follows:
APM Method – Taking Gene Frequencies into Account
• It is more meaningful that two relatives share a rare allele than a common one.
• Therefore, weights (based on the allele frequencies) are added. Possible weight functions:
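To illustrate the idea of a frequency-weighted IBS similarity, here is a hedged sketch. It assumes a form commonly used in the APM literature: the average, over the four allele pairs (Aa, Bb), of an indicator that the two alleles are IBS, multiplied by a weight f(p) of the shared allele's frequency p, with f(p) = 1, 1/sqrt(p), or 1/p as candidate weights. The exact formula and weight functions used in this tutorial are not reproduced here, so both the form and the example frequencies are assumptions.

```python
from math import sqrt

def apm_similarity(genotype_a, genotype_b, allele_freq, weight=lambda p: 1.0):
    """Weighted IBS similarity between two relatives (assumed form, see text above)."""
    total = 0.0
    for a in genotype_a:
        for b in genotype_b:
            if a == b:                          # the two alleles are IBS
                total += weight(allele_freq[a])
    return total / 4                            # average over the 4 allele pairs

freqs = {1: 0.6, 2: 0.3, 3: 0.1}   # hypothetical marker allele frequencies
pair = ((1, 3), (1, 3))
print(apm_similarity(*pair, freqs))
print(apm_similarity(*pair, freqs, weight=lambda p: 1 / sqrt(p)))
print(apm_similarity(*pair, freqs, weight=lambda p: 1 / p))
```

With the inverse-frequency weights, sharing the rare allele 3 contributes more to the statistic than sharing the common allele 1, which is the motivation stated above.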
APM Method - Weaknesses
• The actual null distribution may be slightly skewed, while the asymptotic null distribution is normal.
• The method is very sensitive to misspecification of allele frequencies.
• Because IBS status is used and not IBD sharing, genotype information in unaffected individuals is ignored, and therefore its power is low compared to linkage methods that use IBD status.
SimAPM
• Marker genotypes in the affected individuals are simulated conditional on the marker genotypes in the unaffected individuals, to determine a conditional empirical null distribution.
• Solves the first two shortcomings of the APM method.
• The third shortcoming still exists.
SimIBD
• Uses the same principles as SimAPM, except that it measures IBD sharing (and not IBS sharing).
[Pedigree figure: seven individuals (1 to 7); marker genotypes shown: 1/2, 3/3, 1/3, 2/3, 2/3, 1/3, 2/3]
• Individuals 5, 6 & 7 are affected with the disease.
• Individuals 5 & 7 share 2 alleles IBS. What about IBD sharing? IBD = 1.
• What about individuals 5 & 6?
Recursive Algorithm for Computing P(ix≡jy | G) – Boundary Conditions
The algorithm is based on 3 boundary conditions and 2 recurrence rules.
• Boundary condition 1: If X and Y are two distinct individual founders, then:
• Boundary condition 2: If X1 and X2 are the two different alleles of founder X, then:
• Boundary condition 3: If Xi is the single allele of person X, then:
Recursive Algorithm for Computing P(ix≡jy | G) – Recurrence Rules 1
The recurrence rules replace each allele of the child B with her parents’ alleles F1, F2, M1, and M2.
• Recurrence Rule 1: Given 2 individuals A and B with A ≠ B, then:
  otherwise:
Recursive Algorithm for Computing P(ix≡jy | G) – Recurrence Rules 2
• Recurrence Rule 2: If B1 and B2 are two alleles in the same person, then if allele B1 is not IBS to B2, then:
  otherwise: