Linkage Studies of Qualitative Traits

Li Jin (李晋)

Sib Pair Analysis • Introduction to IBD/IBS • Affected sib-pair method

Introduction to IBD/IBS • IBD (identical by descent): An allele is shared by two family members (e.g. siblings, uncle-niece), and it can be inferred that both copies were transmitted from a common ancestor. Depending on the pedigree structure, this common ancestor may be a parent, grandparent, great-grandparent, etc. • IBS (identical by state): An allele is shared by two family members (e.g. siblings, uncle-niece). Although the two copies are of the same type, they may or may not derive from the same ancestral chromosome.

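Identical by descent (IBD) means that a DNA segment or allele shared by offspring derives from a common ancestor. • Identical by state (IBS) considers only the similarity of genetic markers or alleles between pedigree members, regardless of whether they derive from a common ancestor, and does not require genotyping the parents.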
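The IBS definition above needs only the two genotypes, so it can be sketched in a few lines. This is a minimal illustration, assuming each genotype is given as an unordered pair of allele labels; the function name `ibs_count` is hypothetical.

```python
from collections import Counter

def ibs_count(genotype_a, genotype_b):
    """Number of alleles (0, 1, or 2) that two diploid genotypes share by state."""
    # Multiset intersection keeps each allele as often as it occurs in both genotypes.
    shared = Counter(genotype_a) & Counter(genotype_b)
    return sum(shared.values())

print(ibs_count((1, 2), (1, 3)))  # genotypes 1/2 and 1/3
print(ibs_count((1, 2), (1, 2)))  # genotypes 1/2 and 1/2
```

No parental genotypes enter the calculation, which is exactly why IBS cannot say whether the matching alleles come from the same ancestral chromosome.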

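[Figure: IBD/IBS example. Nuclear family with genotypes 1/2, 1/3, 1/2, 1/2; the sib pair has IBD = 2, IBS = 2.]

[Figure: IBD/IBS example. Nuclear family with genotypes 1/1, 1/2, 1/2, 1/2; the sib pair has IBS = 2, but IBD cannot be read off directly (IBD = ?).]

[Figure: IBD/IBS examples. Three nuclear families with alleles 1, 2, and 3, showing sib pairs with IBD = 0, IBS = 1; IBD = 1, IBS = 1; and IBD = 2, IBS = 2.]

[Figure: IBD/IBS example. All four family members have genotype 1/2; IBS = 2, and IBD is left to be determined.]

[Figure: the same family with every member 1/2, showing the possible transmission patterns. With probability 1/2 the sibs share IBD = 2, and with probability 1/2 they share IBD = 0.]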
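The pedigree examples above can be checked by brute force. The sketch below enumerates every way each sib could have received one chromosome from each parent, keeps the transmissions consistent with the observed genotypes, and tabulates the IBD count. It assumes a fully typed nuclear family, no genotyping error, and equally likely meioses; the function name is hypothetical.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

def sib_ibd_distribution(father, mother, sib1, sib2):
    """P(IBD = 0, 1, 2) for two sibs given all four unordered genotypes."""
    def transmissions(child):
        # (paternal copy index, maternal copy index) pairs that yield the child's genotype.
        return [(i, j) for i, j in product(range(2), repeat=2)
                if sorted((father[i], mother[j])) == sorted(child)]

    counts = Counter()
    for (f1, m1), (f2, m2) in product(transmissions(sib1), transmissions(sib2)):
        # Sharing is judged on the parental chromosome copy, not on the allele label.
        counts[(f1 == f2) + (m1 == m2)] += 1
    total = sum(counts.values())
    if total == 0:
        raise ValueError("genotypes are inconsistent with Mendelian transmission")
    return {k: Fraction(counts[k], total) for k in (0, 1, 2)}

# Every family member 1/2: IBD is 0 or 2, each with probability 1/2.
print(sib_ibd_distribution((1, 2), (1, 2), (1, 2), (1, 2)))
```

Tracking copy indices rather than allele labels is the design choice that lets a homozygous parent contribute ambiguity, as in the 1/1 by 1/2 family.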

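Missing Data [Figure: pedigree with genotypes 1/2, 2/2, 2/3.]

[Figure: missing-data example. One parent is typed only as 1/?, so the family is redrawn with that parent as 1/2 and as 1/3. One completion gives IBD probabilities P(0) = 0, P(1) = 1, P(2) = 0; the other gives P(0) = 1, P(1) = 0, P(2) = 0.]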

For our population: • Population frequency of allele 2 = 0.25 • Population frequency of allele 3 = 0.50 • Compute the relative frequencies of the two possible genotypes: P(Father has genotype 1/2) = 0.25/(0.25 + 0.50) = 0.333; P(Father has genotype 1/3) = 0.50/(0.25 + 0.50) = 0.667 • Multiply the probability of each possible genotype by the IBD probabilities for that genotype: P(0 alleles IBD) = 0.333(0) + 0.667(1) = 0.667; P(1 allele IBD) = 0.333(1) + 0.667(0) = 0.333; P(2 alleles IBD) = 0.333(0) + 0.667(0) = 0
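The weighting step above is a small mixture calculation. The sketch below reproduces it with the slide's allele frequencies and per-genotype IBD probabilities; the helper name `mix_ibd` and the dictionary layout are illustrative choices, not part of the original material.

```python
def mix_ibd(genotype_probs, ibd_given_genotype):
    """Average the IBD distributions over the possible genotypes of an untyped parent."""
    return [sum(genotype_probs[g] * ibd_given_genotype[g][k] for g in genotype_probs)
            for k in range(3)]

freq = {2: 0.25, 3: 0.50}                    # allele frequencies from the slide
total = freq[2] + freq[3]
father = {"1/2": freq[2] / total,            # relative frequency of each candidate genotype
          "1/3": freq[3] / total}
ibd = {"1/2": (0, 1, 0),                     # (P(0), P(1), P(2)) if the father is 1/2
       "1/3": (1, 0, 0)}                     # (P(0), P(1), P(2)) if the father is 1/3

p0, p1, p2 = mix_ibd(father, ibd)
print(round(p0, 3), round(p1, 3), round(p2, 3))
```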

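IBD/IBS Allele Sharing: Extended Pedigrees [Figure: uncle-niece pair in a pedigree with genotypes 1/2, 1/3, 1/2, 1/2, 3/3, 1/3; the pair shares 1 allele IBD.]

IBD/IBS Allele Sharing: Extended Pedigrees [Figure: uncle-niece pair in a pedigree with genotypes 1/2, 1/2, 3/3, 1/3; the pair shares 1 allele IBS.]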

Affected Sib-Pair Data • Parents and their affected (and unaffected) offspring are ascertained • Sibships with multiple affected individuals can be ascertained • Only the children are phenotyped • Unaffected offspring are especially useful when one or more parents cannot be ascertained • Unaffected sibs can also be used in the analysis: allele sharing is compared between affected and unaffected siblings • Problem with using unaffected siblings: reduced penetrance (an unaffected sib may still carry the susceptibility genotype)

Affected Sib Pair (ASP) design [Figure: parents 1/2 and 2/3; affected sibs 1/3 and 2/3.] • Look at the inheritance of marker alleles by two affected sibs. If the disease locus is linked to the marker, expect similar genotypes in the sibs. • Measure genotype similarity by the number of alleles in sib 2 that are copies of the same parental allele as in sib 1 (shared IBD = identical by descent). • Without linkage, expect sibships to share 0, 1, or 2 alleles IBD in the ratio 1:2:1. • Each parent passes either the same allele (shared IBD) or a different allele (no sharing) to the two offspring. Without linkage, 50% of parental transmissions are expected to be shared IBD; with linkage, this proportion is expected to exceed 50%. • In the figure, one allele (“3”) is shared and one is not (“2” ≠ “1”).
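The 1:2:1 ratio quoted above follows from counting the 16 equally likely ways two sibs can each receive one of two chromosome copies from each parent. A minimal enumeration, assuming no linkage so that all transmissions are independent and equally likely (the function name is hypothetical):

```python
from collections import Counter
from fractions import Fraction
from itertools import product

def null_ibd_distribution():
    """IBD sharing distribution for a sib pair when the marker is unlinked to disease."""
    counts = Counter()
    # f1, m1: copies sib 1 receives from father and mother; f2, m2: same for sib 2.
    for f1, m1, f2, m2 in product(range(2), repeat=4):
        counts[(f1 == f2) + (m1 == m2)] += 1
    return {k: Fraction(counts[k], 16) for k in (0, 1, 2)}

print(null_ibd_distribution())  # probabilities 1/4, 1/2, 1/4
```

Each parent contributes a shared allele with probability 1/2, independently of the other parent, which is the same fact expressed per transmission.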

Various forms of sib pair analysis • Goodness-of-fit test: test the fit of the observed counts of pairs with IBD = 0, 1, 2 to the null proportions (chi-square with 2 df). • Mean test: determine the mean number of alleles shared IBD among affected sib pairs and test for a significant increase over the expected value of 1. • Proportion test: test whether the proportion of affected sib pairs sharing 2 alleles IBD exceeds the null value of 1/4. • Multi-marker approaches: the Genehunter 2.0 program (previously implemented in Mapmaker/Sibs) uses information from all markers on a chromosome to obtain information on IBD sharing.

Goodness of Fit [Figure: parents 1/2 and 3/4; sibs 1/3 and 1/4; alleles IBD = 1.] • $z_0$, $z_1$, $z_2$: probabilities that an ASP shares 0, 1, or 2 alleles IBD • Under no linkage, $z_0 = 1/4$, $z_1 = 1/2$, $z_2 = 1/4$

Goodness of Fit • Fully penetrant recessive disease (no phenocopies): $z_0 = z_1 = 0$, $z_2 = 1$ • Fully penetrant dominant disease (no phenocopies): $z_0 = 0$, $z_1 = z_2 = 1/2$ • Carry out a goodness-of-fit test on the observed proportions $z_i$, $i = 0, 1, 2$ (2 df)

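Goodness of Fit • Let $n$ be the total number of affected sib pairs, and let $n_0$, $n_1$, $n_2$ be the numbers of pairs sharing 0, 1, and 2 alleles IBD ($n = n_0 + n_1 + n_2$). • The test statistic is $\chi^2 = \sum_{i=0}^{2} (n_i - e_i)^2 / e_i$, where $e_0$, $e_1$, $e_2$ are $n/4$, $n/2$, and $n/4$, respectively.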
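As a sketch of the goodness-of-fit calculation, the function below compares observed IBD counts with the null expectations n/4, n/2, n/4 using the standard Pearson chi-square form. The function name is hypothetical; the counts 10, 46, 81 are the observed values from the 137-pair example in this deck.

```python
def asp_goodness_of_fit(n0, n1, n2):
    """Pearson chi-square (2 df) for IBD counts against the 1:2:1 null ratio."""
    n = n0 + n1 + n2
    observed = (n0, n1, n2)
    expected = (n / 4, n / 2, n / 4)   # e0, e1, e2 under no linkage
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

chi2 = asp_goodness_of_fit(10, 46, 81)
print(round(chi2, 1))  # about 88.4, the value reported for this example
```

Using the unrounded expectations (34.25, 68.5, 34.25) rather than 34, 69, 34 is what reproduces the reported 88.4.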

Mean Test • Tests the mean number of alleles shared IBD against the expected null value of 1 (under no linkage, expect 50% allele sharing). • Normal distribution, one-tailed test: $Z = (n_1 + 2n_2 - n) / \sqrt{n/2}$, where $n$ = number of sib pairs, $n_1$ = number of pairs sharing 1 allele IBD, and $n_2$ = number of pairs sharing 2 alleles IBD.

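The “two-allele” (proportion) test is based on the proportion of affected sib pairs that share two marker alleles IBD. Its test statistic is $Z_2 = (n_2 - n/4) / \sqrt{3n/16}$, where $n_2$ is the number of pairs sharing 2 alleles IBD among $n$ pairs. • Under the null hypothesis, both the mean test and the two-allele test are asymptotically standard normal; the null hypothesis is rejected when the test statistic is greater than or equal to 3.72 (a one-sided tail probability of about 0.0001).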
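The mean and two-allele tests can be sketched as one-sided z statistics. The forms below are an assumption: they use the null variances 1/2 (number of alleles shared per pair) and 3/16 (indicator of sharing two alleles), which reproduce the values 8.58 and 9.22 reported for the 137-pair example in this deck. Function names are hypothetical.

```python
from math import sqrt

def asp_mean_test(n0, n1, n2):
    """Z for the mean number of alleles shared IBD against the null mean of 1."""
    n = n0 + n1 + n2
    return (n1 + 2 * n2 - n) / sqrt(n / 2)      # null variance of the sum is n/2

def asp_proportion_test(n0, n1, n2):
    """Z for the proportion of pairs sharing 2 alleles IBD against the null 1/4."""
    n = n0 + n1 + n2
    return (n2 - n / 4) / sqrt(3 * n / 16)      # binomial variance n * (1/4) * (3/4)

z_mean = asp_mean_test(10, 46, 81)
z_prop = asp_proportion_test(10, 46, 81)
print(round(z_mean, 2), round(z_prop, 2))
print(z_mean >= 3.72, z_prop >= 3.72)           # compare with the 3.72 threshold
```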

Alleles or haplotypes IBD | 0 | 1 | 2 | Total pairs
Observed | 10 | 46 | 81 | 137
Expected | 34 | 69 | 34 | 137

Test | Statistic | Degrees of freedom
Goodness of fit | 88.4 | 2
Proportions | 9.22 | 136
Means | 8.58 | 136

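Advantages of the ASP method: 1. it does not depend on the mode of inheritance; 2. the computation is relatively simple; 3. sib-pair data are relatively easy to obtain. • Disadvantages: power is relatively low, and, except in some special cases, the recombination fraction is not estimated.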

Risch's recurrence-risk model for relatives • Single-locus model • Assume that a single locus with $n$ alleles underlies disease susceptibility. Enumerate the alleles as $g_1, g_2, \ldots, g_n$. • Let the population frequency of $g_i$ be $t_i$ for $i = 1, \ldots, n$. • Let $f_{ij}$ be the penetrance of genotype $g_i g_j$.

Define the random variable $X_1$ to be 1 if individual 1 is affected and 0 if unaffected; similarly, define $X_2$ for a related individual 2 of type R. • If the Hardy-Weinberg law is assumed to hold, the population prevalence is given by $K = E(X_1) = \sum_{i=1}^{n} \sum_{j=1}^{n} t_i t_j f_{ij}$. • Define $K_R = E(X_2 \mid X_1 = 1)$ to be the recurrence risk for a type R relative of an affected individual.

The probability that a proband and a type R relative are both affected is $K \times K_R = E(X_1 X_2) = \mathrm{Cov}(X_1, X_2) + K^2$ • Thus, $K_R = K + (1/K)\,\mathrm{Cov}(X_1, X_2)$
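A small numerical sketch can make the identity $K_R = K + \mathrm{Cov}(X_1, X_2)/K$ concrete. It takes monozygotic twins as the relative type, because their joint affection probability needs no IBD bookkeeping: the sketch assumes the twins share a genotype and are affected independently given that genotype. The allele frequencies and penetrances are hypothetical illustration values.

```python
def prevalence(t, f):
    """K = sum_i sum_j t_i t_j f_ij under Hardy-Weinberg proportions."""
    n = len(t)
    return sum(t[i] * t[j] * f[i][j] for i in range(n) for j in range(n))

def mz_twin_recurrence_risk(t, f):
    """K_R for MZ twins via K_R = K + Cov(X1, X2) / K."""
    n = len(t)
    K = prevalence(t, f)
    # E(X1 X2): identical genotypes, affection independent given the genotype (assumption).
    joint = sum(t[i] * t[j] * f[i][j] ** 2 for i in range(n) for j in range(n))
    cov = joint - K ** 2
    return K + cov / K

t = [0.9, 0.1]                      # hypothetical allele frequencies
f = [[0.0, 0.0], [0.0, 1.0]]        # fully penetrant recessive, no phenocopies
print(prevalence(t, f), mz_twin_recurrence_risk(t, f))
```

For this fully penetrant recessive example the prevalence is the homozygote frequency and the co-twin of an affected individual is always affected, so the recurrence risk is 1.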

Two-Locus Models • Assume that two unlinked loci are involved in disease susceptibility; again, an arbitrary number of alleles and genotypes is allowed at each locus. • Denote the genotypes at the first locus by $G_i$, $i = 1, \ldots, n$, with corresponding population frequencies $p_i$, and those at the second locus by $H_j$, $j = 1, \ldots, m$, with corresponding population frequencies $q_j$.

For a pair of relatives of a certain type R, define $T_{kl}$ as the conditional probability that the relative has genotype $l$ given that the proband has genotype $k$ (i.e., the genotype transition probability). • Let $w_{ij}$ be the penetrance of genotype $G_i H_j$.

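Hence, an $n \times m$ matrix $W$ of penetrances can be defined. $K$ and $K_R$ are as defined previously. Then, taking genotypes at the two unlinked loci to be independent in the population, $K = \sum_{i=1}^{n} \sum_{j=1}^{m} p_i q_j w_{ij}$.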

Multiplicative Model • The first two-locus model considered is a multiplicative model. • Assume that values $x_1, \ldots, x_n$ and $y_1, \ldots, y_m$ can be defined such that the penetrance $w_{ij} = x_i y_j$. • The $n \times m$ matrix $W$ is then determined by $n + m$ parameters.
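One consequence of the multiplicative form is that the prevalence factors into a product of single-locus terms. The sketch below builds $W$ from $x$ and $y$ and checks this numerically, assuming genotypes at the two unlinked loci are independent in the population. All genotype frequencies and penetrance factors are hypothetical illustration values, and the function names are not from the original material.

```python
def penetrance_matrix(x, y):
    """W with entries w_ij = x_i * y_j (n x m, built from n + m parameters)."""
    return [[xi * yj for yj in y] for xi in x]

def two_locus_prevalence(p, q, W):
    """K = sum_i sum_j p_i q_j w_ij, assuming independent genotypes at the two loci."""
    return sum(p[i] * q[j] * W[i][j] for i in range(len(p)) for j in range(len(q)))

p = [0.81, 0.18, 0.01]      # hypothetical genotype frequencies, locus 1
q = [0.64, 0.32, 0.04]      # hypothetical genotype frequencies, locus 2
x = [0.1, 0.1, 0.8]         # hypothetical penetrance factors, locus 1
y = [0.05, 0.5, 0.5]        # hypothetical penetrance factors, locus 2

W = penetrance_matrix(x, y)
K = two_locus_prevalence(p, q, W)
K_factored = sum(pi * xi for pi, xi in zip(p, x)) * sum(qj * yj for qj, yj in zip(q, y))
print(abs(K - K_factored) < 1e-12)
```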

Affected Relative Pair Analysis

Other references 1) Pericak-Vance MA, Bebout JL, Gaskell PC, et al. (1991): Linkage studies in familial Alzheimer's disease: Evidence for chromosome 19 linkage. Am J Hum Genet 48:1034-1050. (ML method) 2) Weeks DE, Lange K (1988): The affected-pedigree-member method of linkage analysis. Am J Hum Genet 42:315-326. (APM) 3) Davis S, Schroeder M, Goldin LR, Weeks DE (1996): Nonparametric simulation-based statistics for detecting linkage in general pedigrees. Am J Hum Genet 58:867-880. (SimIBD) 4) Kruglyak L, Daly MJ, Reeve-Daly MP, Lander ES (1996): Parametric and nonparametric linkage analysis: A unified multipoint approach. Am J Hum Genet 58:1347-1363. (NPL) 5) Commenges D (1994): Robust genetic linkage analysis based on a score test of homogeneity: The weighted pairwise correlation statistic. Genet Epidemiol 11:189-200. (WPC)

Affected Relative Pair Methods • When the underlying genetic model cannot be specified • Missing marker data • Extended pedigrees • Discrete phenotypes • Computational considerations • Unavailability of suitable (better) packages

General Idea of ARP • to use procedures that rely less heavily on specification of the genetic model • to use statistical approaches that are less sensitive to departures from the normal distribution • to employ large-sample theory • to gain simplicity at the expense of power • to make a choice of "no choice"

Maximum Likelihood (Lod Score) • Affected Pedigree Member (APM) • SimIBD Analysis • NPL Analysis • WPC Analysis

Maximum Likelihood (Lod Score) • to approximate a nonparametric method by minimizing some model effects, for example by modifying the penetrance parameters • to use only part of the data, for example affected individuals only • to assume that the affected phenotype is much more certainly associated with the presence of the trait allele than the unaffected phenotype is with its absence

Usefulness • In single gene disorders when the age at onset, or other penetrance function cannot be well specified • When a major gene is suspected, but cannot be proven • When large or complex pedigrees are being studied, even if the mode of inheritance is unknown • When multipoint analysis is being contemplated, since other ARP methods (below) may be limited to two-point analysis

Affected Pedigree Member (APM) • Tests whether affected relative pairs of a given kind deviate from the expected distribution of identity by state (IBS), not IBD • IBS: matching alleles are of the same size (type), not necessarily of the same origin

The APM test statistic • Three common weighting functions of the marker allele frequency $p$: $f(p) = 1$, $f(p) = 1/\sqrt{p}$, and $f(p) = 1/p$

The overall statistic over all families • The overall APM statistic for a given family
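Since the formulas on these slides did not survive, the following is only a sketch of an IBS-based similarity score in the spirit of the APM method, not a reproduction of the original statistic. It assumes the pair score is the average, over the four allele comparisons between two affected relatives, of a frequency weight applied to alleles that match in state, and that a family score sums over all affected pairs. The weighting functions, function names, and allele frequencies are illustrative assumptions, and the standardization across families is omitted.

```python
from itertools import combinations
from math import sqrt

# Weighting functions of the marker allele frequency p (assumed forms).
WEIGHTS = {
    "1": lambda p: 1.0,
    "1/sqrt(p)": lambda p: 1.0 / sqrt(p),
    "1/p": lambda p: 1.0 / p,
}

def pair_statistic(genotype_i, genotype_j, allele_freq, weight):
    """Weighted IBS similarity averaged over the four allele comparisons."""
    score = 0.0
    for a in genotype_i:
        for b in genotype_j:
            if a == b:                          # alleles match in state
                score += weight(allele_freq[a])
    return score / 4.0

def family_statistic(affected_genotypes, allele_freq, weight):
    """Sum of the pair statistic over all pairs of affected relatives."""
    return sum(pair_statistic(gi, gj, allele_freq, weight)
               for gi, gj in combinations(affected_genotypes, 2))

freq = {1: 0.5, 2: 0.25, 3: 0.25}               # hypothetical allele frequencies
affected = [(1, 2), (1, 2), (1, 3)]             # hypothetical affected relatives
print(family_statistic(affected, freq, WEIGHTS["1"]))
print(family_statistic(affected, freq, WEIGHTS["1/p"]))
```

Weights that grow as p shrinks give more credit to sharing a rare allele, which is also why such a score is sensitive to misspecified allele frequencies.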

Advantages • No need to specify an underlying genetic model for the trait • Includes affected relatives other than just siblings • Speed • Can be combined with multipoint analysis

Disadvantages • Reliance on IBS rather than IBD • Sensitivity of the APM statistic to marker allele frequencies • Throws away potential information • Inflated false-positive (Type I) error rate • Unfair weighting scheme
