Linkage Studies of Qualitative Traits. Li Jin (李晋)
Sib Pair Analysis • Introduction to IBD/IBS • The affected sib pair method
Introduction to IBD/IBS • IBD: An allele is shared by two family members (e.g. siblings, uncle-niece), and it can be inferred that this allele was transmitted from a common ancestor. Depending on the pedigree structure, this common ancestor may be a parent, grandparent, great-grandparent, etc. • IBS: An allele is shared by two family members (e.g. siblings, uncle-niece). Although the two copies are of the same type, they may or may not have been inherited from a common ancestral chromosome.
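Identical by descent (IBD) means that a DNA region or allele shared by the offspring derives from a common ancestor. • Identical by state (IBS) considers only the similarity of genetic markers or alleles between family members, regardless of whether they derive from a common ancestor; genotyping the parental alleles is not required.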
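Because IBS depends only on the observed allele types, it can be counted directly from two genotypes. The following is a minimal sketch; the function name `ibs_count` and the tuple representation of a genotype are illustrative assumptions, not part of the slides.

```python
from collections import Counter

def ibs_count(genotype_a, genotype_b):
    """Return how many alleles (0, 1 or 2) two genotypes share identical by state.

    Each genotype is an unordered pair of allele labels, e.g. (1, 2).
    """
    # Multiset intersection: an allele counts once per copy present in both genotypes.
    shared = Counter(genotype_a) & Counter(genotype_b)
    return sum(shared.values())

print(ibs_count((1, 2), (1, 2)))  # 2
print(ibs_count((1, 2), (1, 3)))  # 1
print(ibs_count((1, 1), (2, 3)))  # 0
```

IBD cannot be computed this way: it needs the parental genotypes (or a probability model for them) to decide whether matching alleles are copies of the same ancestral allele.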
IBD/IBS • [Pedigree figure: parents 1/2 and 1/3; both sibs 1/2.] IBD = 2, IBS = 2
IBD/IBS • [Pedigree figure: parents 1/1 and 1/2; both sibs 1/2.] IBD = ?, IBS = 2
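IBD/IBS • [Three pedigree figures, each with parents 1/2 and 1/3; the sib genotypes shown are 1/2, 1/2, 1/3, 1/3, 1/2, 1/1.] • First pedigree: IBD = 0, IBS = 1 • Second pedigree: IBD = 1, IBS = 1 • Third pedigree: IBD = 2, IBS = 2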
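IBD/IBS • [Pedigree figure: parents 1/2 and 1/2; both sibs 1/2.] IBD = ?, IBS = 2
[Pedigree figures: the same family drawn with the possible parental origins of each sib's alleles.] • With probability 1/2, IBD = 2 • With probability 1/2, IBD = 0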
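Missing Data • [Pedigree figure: genotypes 1/2, 2/2, 2/3.]
[Pedigree figures: mother 2/3, father typed only as 1/?, sibs 1/3 and 2/3; redrawn with the father as 1/2 and as 1/3.] • If the father is 1/2: P(0) = 0, P(1) = 1, P(2) = 0 • If the father is 1/3: P(0) = 1, P(1) = 0, P(2) = 0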
For our population: • Population frequency of allele 2 = 0.25 • Population frequency of allele 3 = 0.50 • Compute the relative frequencies of the two possible genotypes: P(Father has genotype 1/2) = 0.25/(0.50 + 0.25) = 0.333; P(Father has genotype 1/3) = 0.50/(0.50 + 0.25) = 0.667 • Multiply the probability of each possible genotype by the IBD probabilities for that genotype: P(0 alleles IBD) = 0.333(0) + 0.667(1) = 0.667; P(1 allele IBD) = 0.333(1) + 0.667(0) = 0.333; P(2 alleles IBD) = 0.333(0) + 0.667(0) = 0
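The weighting over the father's possible genotypes can be sketched in a few lines. This is only an illustration of the arithmetic on this slide; the function name and the dictionary layout are hypothetical, and the IBD vectors per genotype are taken from the two cases shown for the pedigree with missing data.

```python
def ibd_with_untyped_allele(allele_freqs, ibd_given_allele):
    """Average IBD probabilities over the possible identities of an untyped allele.

    allele_freqs: candidate allele -> population frequency.
    ibd_given_allele: candidate allele -> (P(0), P(1), P(2)) if that allele is the true one.
    """
    total = sum(allele_freqs.values())
    # Relative frequency of each candidate genotype, e.g. 0.25 / (0.25 + 0.50).
    weights = {allele: freq / total for allele, freq in allele_freqs.items()}
    ibd = [0.0, 0.0, 0.0]
    for allele, weight in weights.items():
        for k in range(3):
            ibd[k] += weight * ibd_given_allele[allele][k]
    return weights, ibd

# Father is 1/?: the unknown allele is 2 (genotype 1/2) or 3 (genotype 1/3).
freqs = {2: 0.25, 3: 0.50}
ibd_cases = {2: (0, 1, 0), 3: (1, 0, 0)}
weights, ibd = ibd_with_untyped_allele(freqs, ibd_cases)
print([round(v, 3) for v in ibd])  # [0.667, 0.333, 0.0]
```

The sketch weights the candidate genotypes by allele frequency only, as the slide does; it does not reweight by the likelihood of the children's genotypes, which happens to be equal for the two candidates in this pedigree.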
IBD/IBS Allele Sharing: Extended Pedigrees • [Pedigree figure: genotypes 1/2, 1/3, 1/2, 1/2, 3/3, 1/3.] • Uncle-niece: 1 allele IBD
IBD/IBS Allele Sharing: Extended Pedigrees • [Pedigree figure: genotypes 1/2, 1/2, 3/3, 1/3.] • Uncle-niece: 1 allele IBS
Affected Sib-Pair Data • Parents and their affected (and unaffected) offspring are ascertained • Sibships with multiple affected individuals can be ascertained • Only the children are phenotyped • Unaffected offspring are especially useful when one or more parents cannot be ascertained • Unaffected sibs can also be used in the analysis: allele sharing is compared between affected and unaffected sibling(s) • Problem with using unaffected siblings: reduced penetrance
Affected Sib Pair (ASP) design • [Pedigree figure: parents 1/2 and 2/3; affected sibs 1/3 and 2/3.] • Look at the inheritance of marker alleles by two affected sibs. If the disease is "related" to the marker, expect similar genotypes in the sibs. • Measure genotype similarity by the number of alleles in sib 2 that are copies of the same parental allele as in sib 1 (shared IBD = identical by descent). • Without linkage, expect the proportions of sib pairs sharing 0, 1 or 2 alleles IBD to be in the ratio 1:2:1. • Each parent passes either the same allele (shared IBD) or a different allele (no sharing) to the two offspring. Without linkage, expect 50% of parents to transmit alleles that are shared IBD. With linkage, this proportion is expected to exceed 50%. • In the figure, one allele ("3") is shared and one is not shared ("2" ≠ "1").
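The 1:2:1 expectation under no linkage follows from enumerating the equally likely transmissions, and a short sketch can confirm it. The function name and the 0/1 labelling of each parent's two alleles are illustrative choices, not notation from the slides.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

def null_ibd_distribution():
    """Enumerate IBD sharing for a sib pair when transmissions are independent.

    Each sib receives allele 0 or 1 from the father and allele 0 or 1 from
    the mother, every combination being equally likely under no linkage.
    """
    counts = Counter()
    for pat1, mat1, pat2, mat2 in product((0, 1), repeat=4):
        # One allele is shared IBD for each parent who transmitted the same allele twice.
        ibd = int(pat1 == pat2) + int(mat1 == mat2)
        counts[ibd] += 1
    total = sum(counts.values())
    return {k: Fraction(counts[k], total) for k in (0, 1, 2)}

dist = null_ibd_distribution()
print(dist[0], dist[1], dist[2])  # 1/4 1/2 1/4
```

The expected number of alleles shared is 0(1/4) + 1(1/2) + 2(1/4) = 1, that is, half of the two alleles, which is the 50% sharing per parent mentioned above.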
Various forms of Sib Pair analyses • Goodness-of-fit test: test the observed numbers of pairs with IBD = 0, 1, 2 against their expected proportions (chi-square with 2 df). • Mean test: determine the mean number of alleles shared among affected siblings and test for a significant increase over the expected value of 1. • Proportion test: based on the proportion of affected sib pairs sharing two alleles. • Multi-marker approaches: the Genehunter 2.0 program (previously implemented in Mapmaker/sibs) uses information from all markers on a chromosome to obtain information on IBD sharing.
Goodness of Fit • [Pedigree figure: parents 1/2 and 3/4; sibs 1/3 and 1/4; alleles IBD = 1.] • z0, z1, z2: probabilities that an ASP shares 0, 1, or 2 alleles IBD • Under no linkage: z0 = 1/4, z1 = 1/2, z2 = 1/4
Goodness of Fit • Fully penetrant recessive disease (no phenocopies): z0 = z1 = 0, z2 = 1 • Fully penetrant dominant disease (no phenocopies): z0 = 0, z1 = z2 = 1/2 • Carry out a goodness-of-fit test for the observed proportions zi, i = 0, 1, 2 (2 df)
Goodness of Fit • Let n = total number of affected sib pairs, and let the numbers of pairs with 0, 1, and 2 alleles IBD be n0, n1, n2 (n = n0 + n1 + n2). The test statistic is $\chi^2 = \sum_{i=0}^{2} \frac{(n_i - e_i)^2}{e_i}$, where e0, e1, e2 are n/4, n/2 and n/4, respectively.
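As a sketch of the computation, the following function evaluates Pearson's chi-square, the sum over i of (n_i − e_i)²/e_i with e = (n/4, n/2, n/4), for a set of IBD counts. The function name is hypothetical; the counts used are those of the worked example in this deck (10, 46 and 81 pairs sharing 0, 1 and 2 alleles).

```python
def asp_goodness_of_fit(n0, n1, n2):
    """Pearson chi-square (2 df) for sib-pair IBD counts against the 1:2:1 null."""
    n = n0 + n1 + n2
    observed = (n0, n1, n2)
    expected = (n / 4, n / 2, n / 4)  # e0, e1, e2 under no linkage
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

chi2 = asp_goodness_of_fit(10, 46, 81)
print(round(chi2, 1))  # 88.4
```

The statistic is compared with a chi-square distribution on 2 degrees of freedom; note that it is two-sided and does not by itself check that the deviation is in the direction of excess sharing.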
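Mean Test • Tests the mean number of alleles shared IBD against the expected null value of 1 (under no linkage, expect 50% allele sharing). • Under the null, each pair shares 0, 1 or 2 alleles with probabilities 1/4, 1/2, 1/4, so the number shared has mean 1 and variance 1/2. • Normal distribution, one-tailed test: $Z = \frac{(n_1 + 2 n_2) - n}{\sqrt{n/2}}$, where n = number of sib pairs, n1 = number of pairs sharing 1 allele IBD, and n2 = number of pairs sharing 2 alleles IBD.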
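The "two-allele" test is based on the proportion of affected sib pairs that share two marker alleles; its test statistic is $Z_2 = \frac{n_2/n - 1/4}{\sqrt{3/(16n)}}$. Under the null hypothesis, both of the above test statistics are asymptotically standard normal; the null hypothesis is rejected when the test statistic is greater than or equal to 3.72.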
Total pairs: 137
Alleles or haplotypes IBD     0     1     2
Observed                     10    46    81
Expected                     34    69    34

Test               Statistic   Degrees of freedom
Goodness of fit    88.4        2
Proportions        9.22        136
Means              8.58        136
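The "Means" and "Proportions" values in this table can be reproduced from the observed counts. The sketch below assumes the usual normal-approximation forms, (n1 + 2·n2 − n)/sqrt(n/2) for the mean test and (n2/n − 1/4)/sqrt(3/(16n)) for the proportion test; these formulas are not legible in the extracted slides, so they are stated here as assumptions that happen to match the tabulated numbers. Function names are illustrative.

```python
import math

def mean_test(n0, n1, n2):
    """Mean test: alleles shared IBD versus the null mean of 1 per pair (variance 1/2)."""
    n = n0 + n1 + n2
    return (n1 + 2 * n2 - n) / math.sqrt(n / 2)

def proportion_test(n0, n1, n2):
    """Proportion test: fraction of pairs sharing 2 alleles versus the null value 1/4."""
    n = n0 + n1 + n2
    return (n2 / n - 0.25) / math.sqrt(3 / (16 * n))

def one_sided_p(z):
    """Upper-tail probability of a standard normal deviate."""
    return 0.5 * math.erfc(z / math.sqrt(2))

print(round(mean_test(10, 46, 81), 2))        # 8.58
print(round(proportion_test(10, 46, 81), 2))  # 9.22
```

Both values far exceed the 3.72 rejection threshold quoted for these tests; `one_sided_p(3.72)` is approximately 0.0001.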
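Advantages of the ASP method: 1. It does not depend on the mode of inheritance; 2. The computation is relatively simple; 3. Sib-pair data are relatively easy to obtain. • Disadvantages: its power is low and, except in some special cases, the recombination fraction is not estimated.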
Risch's Recurrence Risk Model for Relatives • Single-locus model • Assume that a single locus with n alleles underlies disease susceptibility. Enumerate the alleles as g1, g2, ..., gn. • Let the population frequency of gi be ti for i = 1, ..., n. • Let fij be the penetrance of genotype gigj.
Define the random variable X1 to be 1 if individual 1 is affected, and 0 if unaffected; similarly, define X2 for a related individual 2 of type R. If the Hardy-Weinberg law is assumed to hold, the population prevalence is given by $K = E(X_1) = \sum_{i=1}^{n} \sum_{j=1}^{n} t_i t_j f_{ij}$ • Define $K_R = E(X_2 \mid X_1 = 1)$ to be the recurrence risk for a type R relative of an affected individual.
The probability that a proband and a type R relative are both affected is $K \times K_R = E(X_1 X_2) = \mathrm{Cov}(X_1, X_2) + K^2$ • Thus, $K_R = K + (1/K)\,\mathrm{Cov}(X_1, X_2)$
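A small numerical sketch may help make the single-locus quantities concrete for the case R = sibling. It computes the prevalence K as the sum of t_i·t_j·f_ij and obtains E(X1X2) by conditioning on the number of alleles the sibs share IBD at the disease locus (probabilities 1/4, 1/2, 1/4). That conditioning route, and all function and variable names, are assumptions made for illustration rather than something shown on the slides.

```python
def prevalence(t, f):
    """K = sum_i sum_j t_i t_j f_ij under Hardy-Weinberg proportions."""
    n = len(t)
    return sum(t[i] * t[j] * f[i][j] for i in range(n) for j in range(n))

def sib_recurrence_risk(t, f):
    """K_S = E(X1 X2) / K for a sib pair, conditioning on IBD sharing at the locus."""
    n = len(t)
    K = prevalence(t, f)
    # IBD = 2: both sibs carry the same genotype g_i g_j.
    both_ibd2 = sum(t[i] * t[j] * f[i][j] ** 2 for i in range(n) for j in range(n))
    # IBD = 1: one shared allele g_i, the other allele drawn independently for each sib.
    both_ibd1 = sum(t[i] * sum(t[j] * f[i][j] for j in range(n)) ** 2 for i in range(n))
    # IBD = 0: the two genotypes are independent draws from the population.
    both_ibd0 = K ** 2
    joint = 0.25 * both_ibd2 + 0.5 * both_ibd1 + 0.25 * both_ibd0
    return joint / K

# Hypothetical example: fully penetrant recessive disease, disease-allele frequency 0.1.
t = [0.9, 0.1]
f = [[0.0, 0.0], [0.0, 1.0]]
K = prevalence(t, f)
K_S = sib_recurrence_risk(t, f)
cov = K * (K_S - K)  # rearranged from K_R = K + Cov(X1, X2) / K
print(round(K, 4), round(K_S, 4))  # 0.01 0.3025
```

For this example the sib risk equals (1 + q)²/4 with q = 0.1, the familiar value for a recessive trait, and the ratio K_S/K is about 30.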
Two-Locus Models • Assume that two unlinked loci are involved in disease susceptibility; again I allow for an arbitrary number of alleles and genotypes at each locus. • Denote the genotypes at the first locus by Gi, i = 1, ..., n, with corresponding population frequencies pi, and those at the second locus by Hj, j = 1, ..., m, with corresponding population frequencies qj.
For a pair of relatives of a certain type R, define Tkl as the conditional probability that the relative has genotype l given that the proband has genotype k (i.e., the genotype transition probability). • Let wij be the penetrance of genotype GiHj.
hence, an n x m matrix W of penetrances can be defined. K, KR, and are as defined previously. Then
Multiplicative Model (multiplicative interaction-effect model) • The first two-locus model I consider is a multiplicative model. • The n x m matrix W can be determined by n + m parameters. • Assume that values x1, ..., xn and y1, ..., ym can be defined such that the penetrance wij = xiyj.
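The factorization wij = xiyj can be illustrated numerically. The sketch below builds W from the n + m parameters and checks that the prevalence, taken here as the sum of p_i·q_j·w_ij, splits into a product of one factor per locus. It assumes that genotypes at the two unlinked loci are independent in the population; the function names and the numbers are hypothetical.

```python
def penetrance_matrix(x, y):
    """Build the n x m matrix W with w_ij = x_i * y_j."""
    return [[xi * yj for yj in y] for xi in x]

def two_locus_prevalence(p, q, w):
    """K = sum_i sum_j p_i q_j w_ij, assuming independent genotypes at the two loci."""
    return sum(p[i] * q[j] * w[i][j] for i in range(len(p)) for j in range(len(q)))

# Hypothetical genotype frequencies and penetrance factors.
p = [0.81, 0.18, 0.01]   # frequencies of G1, G2, G3
x = [0.1, 0.1, 1.0]      # factors x_i for locus 1
q = [0.5, 0.5]           # frequencies of H1, H2
y = [0.2, 0.8]           # factors y_j for locus 2

W = penetrance_matrix(x, y)
K = two_locus_prevalence(p, q, W)
K1 = sum(pi * xi for pi, xi in zip(p, x))  # locus-1 factor
K2 = sum(qj * yj for qj, yj in zip(q, y))  # locus-2 factor
print(round(K, 4), round(K1 * K2, 4))  # 0.0545 0.0545
```

With 3 x 2 genotype classes, W has six entries but is fixed by the five values in x and y, which is the parameter saving the slide refers to.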
Other references
1) Pericak-Vance MA, Bebout JL, Gaskell PC, et al. (1991): Linkage studies in familial Alzheimer’s disease: Evidence for chromosome 19 linkage. Am J Hum Genet 48:1034-1050. (ML method)
2) Weeks DE, Lange K (1988): The affected-pedigree-member method of linkage analysis. Am J Hum Genet 42:315-326. (APM)
3) Davis S, Schroeder M, Goldin LR, Weeks DE (1996): Nonparametric simulation-based statistics for detecting linkage in general pedigrees. Am J Hum Genet 58:867-880. (SimIBD)
4) Kruglyak L, Daly MJ, Reeve-Daly MP, Lander ES (1996): Parametric and nonparametric linkage analysis: A unified multipoint approach. Am J Hum Genet 58:1347-1363. (NPL)
5) Commenges D (1994): Robust genetic linkage analysis based on a score test of homogeneity: The weighted pairwise correlation statistic. Genet Epidemiol 11:189-200. (WPC)
Affected Relative Pairs Methods • When the underlying genetic model cannot be specified. • Missing marker data • Extended pedigrees • Discrete phenotypes • Computational consideration • Unavailability of suitable (better) package
General Idea of ARP • to use procedures that rely less heavily on specification of the genetic model • to use statistical approaches that are less sensitive to departures from the normal distribution • to employ large-sample theory • to gain simplicity at the expense of some loss of power • to make the choice of "no choice"
Maximum Likelihood (Lod Score) • Affected Pedigree Member (APM) • SimIBD Analysis • NPL Analysis • WPC Analysis
Maximum Likelihood (Lod Score) • to approximate a nonparametric method by minimizing some effects, for example by modifying the penetrance parameters • to use only part of the data, for example affected individuals only • to assume that the affected phenotype is much more certainly associated with presence of the trait allele than the unaffected phenotype is with its absence
Usefulness • In single gene disorders when the age at onset, or other penetrance function cannot be well specified • When a major gene is suspected, but cannot be proven • When large or complex pedigrees are being studied, even if the mode of inheritance is unknown • When multipoint analysis is being contemplated, since other ARP methods (below) may be limited to two-point analysis
Affected Pedigree Member (APM) • Tests whether affected relative pairs of a given type deviate from the expected distribution of identity-by-state (IBS) sharing, not IBD • IBS: matching alleles are of the same size, not necessarily of the same origin
The APM test statistics Three common weighting functions
The overall statistics over all families where • The overall APM statistics for a given family
Advantages • No need to specify an underlying genetic model for the trait • Includes affected relatives other than just siblings • Speed • Can be combined with multipoint analysis
Disadvantages • Reliance on IBS rather than IBD • Sensitivity of the APM statistics to marker allele frequencies • Throws away potential information • Inflates the false positive (Type I) error rate • Unfair weighting scheme