330 likes | 421 Views
Regression-based linkage analysis. Shaun Purcell, Pak Sham. ( X – Y ) 2 = 2(1 – r ) – 2 Q ( – 0.5) + . (HE-SD). ^. Penrose (1938) quantitative trait locus linkage for sib pair data Simple regression-based method squared pair trait difference
E N D
Regression-based linkage analysis Shaun Purcell, Pak Sham.
(X – Y)2 = 2(1 – r) – 2Q( – 0.5) + (HE-SD) ^ • Penrose (1938) • quantitative trait locus linkage for sib pair data • Simple regression-based method • squared pair trait difference • proportion of alleles shared identical by descent
= -2Q Haseman-Elston regression (X - Y)2 IBD 0 1 2
- - + + + - + + + - - - Sib 2 Sib 1 Expected sibpair allele sharing
Squared differences (SD) - - + + + - + + + - - - Sib 2 Sib 1
(X + Y)2 = 2(1 + r) + 2Q( – 0.5) + (HE-SS) ^ Sums versus differences • Wright (1997), Drigalenko (1998) • phenotypic difference discards sib-pair QTL linkage information • squared pair trait sum provides extra information for linkage • independent of information from HE-SD
- - + + + - + + + - - - Sib 2 Sib 1 Squared sums (SS)
SD and SS - - + + + - + + + - - - Sib 2 Sib 1
New dependent variable to increase power • mean corrected cross-product (HE-CP) • other extensions • > 2 sibs in a sibship multiple trait loci and epistasis • multivariate multiple markers • binary traits other relative classes
SD + SS ( = CP) - - + + + - + + + - - - Sib 2 Sib 1
HE-CP • Propose a weighting scheme Xu et al • With residual sibling correlation • HE-CP in power, HE-SD in power
Clarify the relative efficiencies of existing HE methods • Demonstrate equivalence between a new HE method and variance components methods • Show application to the selection and analysis of extreme, selected samples
(X – Y)2 = 2(1 – r) – 2Q( – 0.5) + HE-SD (X + Y)2 = 2(1 + r) + 2Q( – 0.5) + HE-SS XY = r+ Q( – 0.5) + HE-CP Haseman-Elston regressions
NCP per sibpair Variance of Dependent Dependent (X – Y)2 8(1 – r )2 (X + Y)2 8(1 + r )2 XY 1 + r2 NCPs for H-E regressions
Weighted H-E • Squared-sums and squared-differences • orthogonal components in the population • Optimal weighting • inverse of their variances
Weighted H-E • A function of • square of QTL variance • marker informativeness • complete information = 0.0125 • sibling correlation • Equivalent to variance components • to second-order approximation • Rijsdijk et al (2000)
Combining into one regression • New dependent variable : • a linear combination of • squared-sum • and squared-difference • weighted by the population sibling correlation:
HE-COM - - + + - + + - - - Sib 2 Sib 1
Simulation • Single QTL simulated • accounts for 10% of trait variance • 2 equifrequent alleles; additive gene action • assume complete IBD information at QTL • Residual variance • shared and nonshared components • residual sibling correlation : 0 to 0.5 • 10,000 sibling pairs • 100 replicates • 1000 under the null
Sample selection • A sib-pairs’ squared mean-corrected DV is proportional to its expected NCP • Equivalent to variance-components based selection scheme • Purcell et al, (2000)
Sibship NCP 1.6 1.4 1.2 1 0.8 0.6 0.4 0.2 0 4 3 2 1 0 -4 -3 Sib 2 trait -1 -2 -1 -2 0 1 -3 2 Sib 1 trait 3 -4 4 Sample selection
Analysis of selected samples • 500 (5%) most informative pairs selected r = 0.05 r = 0.60
Variance-based weighting scheme • SD and SS weighted in proportion to the inverse of their variances • Implemented as an iterative estimation procedure • loses simple regression-based framework
Product of pair values corrected for the family mean • for sibs 1 and 2 from the j th family, • Adjustment for high shared residual variance • For pairs, reduces to HE-SD
Conclusions • Advantages • Efficient • Robust • Easy to implement • Future directions • Weight by marker informativeness • Extension to general pedigrees