Family-Based Association Tests “If you cannot get rid of the family skeleton, you may as well make it dance” (G.B. Shaw)
Outline • Overview • Trios: Transmission Disequilibrium Test (TDT) • Discordant sibships: Conditional logistic regression • General Pedigree: FBAT test • Comparisons and extensions
Family-based designs • Discordant sibpairs, sibships • Affected offspring and their parents • Trios (2 parents, child) common design • Complex nuclear families • Extended pedigrees • Leftovers from linkage (next lecture)
Family-based vs. Case-control • Completely robust to population substructure (case-control: adjusting for PCs/AIMs now does well in practice) • Robust to HWE failure (case-control: test for HWE in controls) • More powerful for very rare, highly penetrant diseases (e.g., arguments coming back for sequencing) (case-control: more powerful in most other situations) • Pseudo-controls (e.g., longevity study…), but much harder to recruit (esp. late-onset diseases; children generally not difficult) (case-control: more careful selection of good controls, sort of)
Family-based vs. Case-control (cont.) • Detect genotyping error (Mendel error) (case-control: cryptics, maybe) • More complex analysis, but doable (case-control: standard regression methods)
Mendel’s laws • Recall the playing cards example... • One allele from each parent for each gene • Many family based tests based on this, rather than estimating allele frequencies (case-control)
Mendelian transmission: Ex • E.g., parents are Aa, Aa: • P(offspring=AA | Mother=Aa, Father=Aa)=? • P(offspring=Aa | Mother=Aa, Father=Aa)=? • P(offspring=aa | Mother=Aa, Father=Aa)=?
Mendelian transmission: Ex • E.g., parents are Aa, Aa: • P(offspring=AA | Mother=Aa, Father=Aa)=1/4 • P(offspring=Aa | Mother=Aa, Father=Aa)=1/2 • P(offspring=aa | Mother=Aa, Father=Aa)=1/4 Conditioning on parents...
Mendelian transmission: Ex • E.g., parents are AA, Aa: • P(offspring=AA | Mother=AA, Father=Aa)=? • P(offspring=Aa | Mother=AA, Father=Aa)=? • P(offspring=aa | Mother=AA, Father=Aa)=?
Mendelian transmission: Ex • E.g., parents are AA, Aa: • P(offspring=AA | Mother=AA, Father=Aa)=1/2 • P(offspring=Aa | Mother=AA, Father=Aa)=1/2 • P(offspring=aa | Mother=AA, Father=Aa)=0
Mendelian transmission: Ex • E.g., parents are AA, AA: • P(offspring=AA | Mother=AA, Father=AA)=? • P(offspring=Aa | Mother=AA, Father=AA)=? • P(offspring=aa | Mother=AA, Father=AA)=?
Mendelian transmission: Ex • E.g., parents are AA, AA: • P(offspring=AA | Mother=AA, Father=AA)=1 • P(offspring=Aa | Mother=AA, Father=AA)=0 • P(offspring=aa | Mother=AA, Father=AA)=0 Homozygous parents are “non-informative” (no variation in offspring’s conditional genotype distribution)
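The three worked examples above all follow from one rule: each parent transmits one of its two alleles with probability 1/2, independently of the other parent. A minimal sketch of that rule, where the function name `offspring_distribution` and the two-character genotype strings are illustrative choices rather than anything from the lecture:

```python
from fractions import Fraction
from itertools import product


def offspring_distribution(mother, father):
    """Return {genotype: probability} for a child, given parental genotypes such as 'Aa'."""
    dist = {"AA": Fraction(0), "Aa": Fraction(0), "aa": Fraction(0)}
    # Each of the 2 x 2 (maternal allele, paternal allele) pairs has probability 1/4.
    for m_allele, f_allele in product(mother, father):
        # Sorting puts 'A' before 'a', so 'aA' and 'Aa' count as the same genotype.
        genotype = "".join(sorted(m_allele + f_allele))
        dist[genotype] += Fraction(1, 4)
    return dist


het_by_het = offspring_distribution("Aa", "Aa")
hom_by_het = offspring_distribution("AA", "Aa")
hom_by_hom = offspring_distribution("AA", "AA")
```

For two homozygous parents the returned distribution puts all its mass on one genotype, which is the sense in which such parents are non-informative.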
Outline • Overview • Trios: Transmission Disequilibrium Test (TDT) • Discordant sibships: Conditional logistic regression • General Pedigree: FBAT test • Comparisons and extensions
Trios: Transmission Disequilibrium Test (TDT) • Test based on transmissions from parents to offspring • Assumptions • Parents’ and offspring genotypes known • Dichotomous phenotype (though see Q-TDT for quantitative traits), only affected offspring • Count transmissions from heterozygous parents and compare to expected transmissions • Expectation comes from Mendel’s laws of segregation (previous slides), not from a control group • Tests for over/under-transmission of alleles in cases (intuition…) • Conditional test, conditioning on • offspring affection status • parental genotypes (conditions out allele frequencies, which is what case-control testing is based on) Spielman et al., AJHG 1993
Trios: Transmission Disequilibrium Test (TDT) • w AA parents (transmit one A, do not transmit other A) • z aa parents (transmit one a, do not transmit other a) • x Aa parents that transmit A, do not transmit a • y Aa parents that transmit a, do not transmit A • 2×2 table of parental alleles (transmitted allele, non-transmitted allele): (A, A) = w; (A, a) = x; (a, A) = y; (a, a) = z
Possible Parental Configurations • AA-AA, AA-Aa, AA-aa, Aa-AA, Aa-Aa, Aa-aa, aa-AA, aa-Aa, aa-aa • Configurations that differ only in parent order are symmetric for what we will do next (e.g., AA-Aa == Aa-AA) • Six distinct configurations
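The count of six can be checked by enumerating unordered pairs of parental genotypes. A small sketch; the variable names are illustrative:

```python
from itertools import combinations_with_replacement

genotypes = ["AA", "Aa", "aa"]

# Unordered pairs: AA-Aa and Aa-AA are treated as the same configuration.
configs = ["-".join(pair) for pair in combinations_with_replacement(genotypes, 2)]

# A configuration is informative only if at least one parent is heterozygous.
informative = [c for c in configs if "Aa" in c.split("-")]
```

Of the six configurations, three contain at least one heterozygous parent (AA-Aa, Aa-Aa, Aa-aa).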
Both parents homozygous • AA-AA | AA • Offspring genotype is deterministic, no variation, not informative!
Both parents homozygous • aa-aa | aa • Offspring genotype is deterministic, no variation, not informative!
Both parents homozygous • AA-aa | Aa • Offspring genotype is deterministic, no variation, not informative!
One parent heterozygous • Variation from one parent • AA-Aa | AA, Aa (Pr = .5, .5)
One parent heterozygous • Variation from one parent • Aa-aa | Aa, aa (Pr = .5, .5)
Both parents heterozygous • Variation from both parents • Aa-Aa | AA, Aa, aa (each parent transmits A or a with Pr = .5, .5)
Trios: Transmission Disequilibrium Test (TDT) • w AA parents (transmit one A, do not transmit other A) • z aa parents (transmit one a, do not transmit other a) • x Aa parents that transmit A, do not transmit a • y Aa parents that transmit a, do not transmit A • 2×2 table of parental alleles (transmitted allele, non-transmitted allele): (A, A) = w; (A, a) = x; (a, A) = y; (a, a) = z
Transmission Disequilibrium Test (TDT) • No variation in w or z (recall homozygous parents are non-informative) • $(x-y)^2/(x+y) \sim \chi^2_1$; it is just a special case of McNemar’s test • Think of it as testing: is there an excess of the A allele in the affected offspring beyond what Mendel's laws would produce?
Transmission Disequilibrium Test (TDT) • Example from the text: Insulin Dependent Diabetes Mellitus (IDDM), 94 families, 78 parents transmit allele A, 46 transmit allele a • $(78-46)^2/(78+46) = 8.26$, p-value = 0.004 (Spielman et al., 1993)
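The arithmetic in this example can be reproduced with a few lines. This is a minimal sketch using only the standard library; the function name `tdt` is an illustrative choice, and the p-value uses the identity that the upper tail of a chi-square with 1 degree of freedom equals erfc(sqrt(stat / 2)):

```python
import math


def tdt(x, y):
    """TDT statistic and p-value.

    x: heterozygous parents transmitting A; y: heterozygous parents transmitting a.
    """
    if x + y == 0:
        raise ValueError("no heterozygous parents: the TDT is undefined")
    stat = (x - y) ** 2 / (x + y)
    # Upper tail of chi-square with 1 df: P(Z^2 > stat) = erfc(sqrt(stat / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


stat, p_value = tdt(78, 46)
print(round(stat, 2), round(p_value, 3))
```

With the counts from the slide (78 and 46), this gives a statistic of about 8.26 and a p-value of about 0.004, matching the values quoted above.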
Limitations of TDT • Only affected offspring • Only dichotomous phenotypes • Bi-allelic markers • Additive genetic model • No missing parents • Incorporating siblings assumes no linkage (more next time) • Can’t do multiple markers, multiple phenotypes
Key features of the TDT • Random variable in analysis is offspring genotype • Parental genotypes fixed • Trait fixed (condition on affected offspring)
Outline • Overview • Trios: Transmission Disequilibrium Test (TDT) • Discordant sibships: Conditional logistic regression • General Pedigree: FBAT test • Extra
Discordant sibships • Conditional logistic regression • $P(Y_1 = 1 \mid Y_1 + Y_2 = 1, g_1, g_2, \ldots)$ • Matching each sib together, conditions on the fact that they have discordant phenotypes • Standard model for disease as in logistic regression, just matching based on family strata • Can also use FBAT framework • Similar power for main effects • Greater power for GxE (Witte, AJE 1999; Chatterjee et al., Gen Epi 2005; Hoffmann et al., Biometrics 2011) • You will go through an example in the homework
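For a discordant sib pair, the conditional probability on this slide has a simple closed form when disease follows a logistic model with a family-specific intercept: the intercept cancels and only the genotype difference remains. The sketch below assumes a single genotype covariate per sib (for example an additive allele count) and a log odds ratio `beta`; both names, and the function names, are illustrative and not from the lecture:

```python
import math


def prob_sib1_affected(g1, g2, beta):
    """P(Y1 = 1 | Y1 + Y2 = 1, g1, g2) under a logistic model with family strata.

    exp(a + beta*g1) / (exp(a + beta*g1) + exp(a + beta*g2)) simplifies to the
    expression below, so the family intercept a never has to be estimated.
    """
    return 1.0 / (1.0 + math.exp(-beta * (g1 - g2)))


def conditional_log_likelihood(pairs, beta):
    """pairs: list of (g_affected, g_unaffected), one tuple per discordant sib pair."""
    return sum(math.log(prob_sib1_affected(ga, gu, beta)) for ga, gu in pairs)
```

Pairs in which both sibs carry the same genotype contribute log(1/2) for every value of `beta`, so they carry no information about the genetic effect, in the same way that homozygous parents are uninformative in the TDT.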
Outline • Overview • Trios: Transmission Disequilibrium Test (TDT) • Discordant sibships: Conditional logistic regression • General Pedigree: FBAT test • Comparisons and extensions
FBAT: More general methodology • Maintains general principles of TDT • Other genetic models (dominant, recessive, …) • Additional siblings, extended pedigrees, missing parents • Multiple markers (haplotypes) • Test statistic intuition: covariance between offspring trait and genotype
FBAT: Extending TDT to more general families • For the moment, assume parents are genotyped • Let i index across families, j offspring • Score test based on $f(\{\text{offspring genotype}\}_{ij} \mid \text{trait}_{ij}, \text{parents}_i)$, using Mendel’s laws and Bayes’ rule • $U = \sum_{i,j} (\text{trait}_{ij} - \text{offset}) \times (\{\text{offspring genotype}\}_{ij} - E[\{\text{offspring genotype}\}_{ij} \mid \text{parents}_i])$ • Assume trait is continuous or binary • Assume offset is mean (continuous) or population prevalence (dichotomous) • Condition on parents (avoids specification of allele distribution) • Condition on offspring phenotypes (avoids specification of trait distribution)
FBAT: Extending the TDT to more general families (cont.) • $U = \sum_{i,j} (\text{trait}_{ij} - \text{offset}) \times (\{\text{offspring genotype}\}_{ij} - E[\{\text{offspring genotype}\}_{ij} \mid \text{parents}_i])$ • Intuition: Like a sample covariance between trait and genotype • $Z_{FBAT} = U/\sqrt{\mathrm{var}(U)} \sim N(0,1)$
FBAT: Extending the TDT to more general families (cont.) • $U = \sum_{i,j} (\text{trait}_{ij} - \text{offset}) \times (\{\text{offspring genotype}\}_{ij} - E[\{\text{offspring genotype}\}_{ij} \mid \text{parents}_i])$ • Let $o_{ij} = \{\text{offspring genotype}\}_{ij}$ • Let $P_i = \text{parents}_i$ • $E[o_{ij} \mid P_i] = X(AA)\,P(o_{ij}=AA \mid P_i) + X(Aa)\,P(o_{ij}=Aa \mid P_i) + X(aa)\,P(o_{ij}=aa \mid P_i)$ • Essentially using Mendel’s laws, as we calculated earlier
FBAT computations • X = additive coding (number of A alleles) • Parents AA, aa (AA-aa | Aa): E(X|P) = 2*P(AA|P) + 1*P(Aa|P) + 0*P(aa|P) = 2*0 + 1*1 + 0*0 = 1 • Child: X = 1, Pr(X) = 1, X-E(X|P) = 0 • Uninformative families still contribute nothing! • Parents Aa, Aa (Aa-Aa | AA, Aa, aa): E(X|P) = 0*(1/4) + 1*(1/2) + 2*(1/4) = 1 • Child: • X = 0, Pr(X) = 1/4, X-E(X|P) = -1 • X = 1, Pr(X) = 1/2, X-E(X|P) = 0 • X = 2, Pr(X) = 1/4, X-E(X|P) = +1 • (Over/under-transmissions)
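The conditional expectation used here, together with the conditional variance that enters var(U), can be computed by enumerating the four equally likely transmissions. A minimal sketch, with X taken as the number of A alleles in the offspring; the function name `conditional_moments` is illustrative:

```python
from fractions import Fraction
from itertools import product


def conditional_moments(mother, father):
    """Return (E[X | P], Var[X | P]), where X = number of A alleles in the offspring."""
    maternal = [1 if allele == "A" else 0 for allele in mother]
    paternal = [1 if allele == "A" else 0 for allele in father]
    # Four equally likely (maternal allele, paternal allele) combinations.
    counts = [m + f for m, f in product(maternal, paternal)]
    mean = Fraction(sum(counts), 4)
    variance = Fraction(sum(c * c for c in counts), 4) - mean ** 2
    return mean, variance


uninformative = conditional_moments("AA", "aa")
both_het = conditional_moments("Aa", "Aa")
one_het = conditional_moments("AA", "Aa")
```

A family with zero conditional variance, such as AA-aa, adds nothing to either U or var(U), which is the formal version of "uninformative families contribute nothing".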
Seem familiar? FBAT=TDT • If Y=affection status (1=affected, 0=unaffected), offset=0, then FBAT==TDT • Similarly conditional logistic regression roughly equivalent to TDT in terms of power for main effects
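The equivalence can be checked numerically. The sketch below assumes trios with both parents genotyped, one affected offspring per family, trait = 1, offset = 0, and additive coding of the A allele; under those assumptions each heterozygous parent contributes +1/2 or -1/2 to U and 1/4 to var(U), so Z² reduces to (x-y)²/(x+y). The function name and the tuple layout are illustrative, and the example trios are made-up inputs, not data from the lecture:

```python
def fbat_vs_tdt(trios):
    """Compare Z_FBAT squared with the TDT statistic.

    trios: list of (n_AA_parents, n_Aa_parents, offspring_A_count),
    one affected offspring per family.
    """
    u = 0.0        # FBAT score with trait = 1 and offset = 0
    var_u = 0.0    # variance of U under Mendelian transmission
    x = y = 0      # TDT counts: heterozygous parents transmitting A / a
    for n_AA, n_het, child in trios:
        expected = n_AA + 0.5 * n_het      # E[X | P]
        u += child - expected
        var_u += 0.25 * n_het              # each heterozygous parent adds 1/4
        a_from_het = child - n_AA          # A alleles that came from heterozygous parents
        x += a_from_het
        y += n_het - a_from_het
    return u * u / var_u, (x - y) ** 2 / (x + y)


# Hypothetical trios, for illustration only.
example_trios = [(1, 1, 2), (1, 1, 1), (0, 2, 2), (0, 2, 1), (0, 1, 1), (0, 1, 0), (2, 0, 2)]
z_squared, tdt_stat = fbat_vs_tdt(example_trios)
```

Both returned values agree for these inputs; the last trio has two AA parents and changes neither statistic.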
FBAT offset for dichotomous traits • If all offspring are affected, then it does not matter • For rare diseases, affected most informative • For more common, can get some information from unaffecteds • Population prevalence, allows one to gain a little information from unaffecteds
Offset choice (power as a function of the offset; figures from Lange and Laird (2002)) • Panel 1: disease prevalence K = 0.05, allele frequency of the disease gene p = 0.05, attributable fraction of the disease due to carrying at least one disease gene AF = 0.3, significance level $\alpha = 10^{-4}$ and sample size 100 • Panel 2: disease prevalence K = 0.3, allele frequency of the disease gene p = 0.143, attributable fraction of the disease due to carrying at least one disease gene AF = 0.25, significance level $\alpha = 0.01$ and sample size 100
FBAT offset for continuous traits • The trait mean • (Optimal choice is E(Y), depends on ascertainment) • Residual from the trait adjusted for covariates • e.g., regress BMI on gender, use residual • Suppose Y is your phenotype of interest, Z a covariate • Linear regression $Y = \beta_0 + \beta_1 Z$ • Compute residual $R = Y - (\hat\beta_0 + \hat\beta_1 Z)$ • Use R as trait in FBAT
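The residual step can be sketched with ordinary least squares on a single covariate. This is only an illustration using the standard library; the function name `residual_trait` and the BMI and gender values are hypothetical, and a real analysis would normally use a regression routine that handles several covariates:

```python
def residual_trait(y, z):
    """Residuals R = Y - (b0 + b1 * Z) from the least-squares fit of y on one covariate z."""
    n = len(y)
    z_bar = sum(z) / n
    y_bar = sum(y) / n
    szz = sum((zi - z_bar) ** 2 for zi in z)
    if szz == 0:
        raise ValueError("covariate has no variation")
    b1 = sum((zi - z_bar) * (yi - y_bar) for zi, yi in zip(z, y)) / szz
    b0 = y_bar - b1 * z_bar
    return [yi - (b0 + b1 * zi) for yi, zi in zip(y, z)]


# Hypothetical values, for illustration only.
bmi = [22.0, 27.5, 24.0, 30.0]
gender = [0, 1, 0, 1]
residuals = residual_trait(bmi, gender)
```

With a binary covariate the fitted values are the two group means, so each residual is the offspring's deviation from the mean of their own group; these residuals are then used as the trait in the FBAT statistic.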
Continuous vs. Dichotomous trait • Modeling as continuous trait -- more powerful • With highly selected traits, dichotomizing may be preferable • Using mean for offset is a poor choice here • Results very sensitive to offset choice • Dichotomizing will lose power compared to best offset choice
Offset general comments • A very poor offset choice gives poor power • More complicated, slightly more efficient offsets are also available
Childhood asthma management program (CAMP) example • 696 trios • bi-allelic locus in IL13 gene • five groups of 22 quantitative phenotypes
Can also do a multi-marker (gene-based) test... DeMeo Gen Epi, 2006
Obesity GWAS example • BMI follow-up for 24 years • 86,604 SNPs • 694 participants • One of the first GWAS successes
GWAS example uses clever screening approach, longitudinal phenotype data...
Obesity example: Screening based on “conditional mean model” • Prioritizes SNPs based on modeling X imputed from parental genotypes (PBAT software) • f(X,P)=f(X|P)f(P) • Screening not robust to population substructure, but later testing is (so doesn’t matter)