
Family-Based Association Tests




  1. Family-Based Association Tests Thomas Hoffmann

  2. “If you cannot get rid of the family skeleton, you may as well make it dance” (G.B. Shaw)

  3. Outline • Overview • Transmission Disequilibrium Test (TDT) • FBAT test framework (stands for Family-Based Association Test) • Other approaches for family-based tests • including your homework!

  4. Family-based designs • Discordant sibpairs, sibships • Affected offspring and their parents • Trios (2 parents, child) common design • Complex nuclear families • Extended pedigrees • Leftovers from linkage

  5. Family-based vs. Case-control

  6. Family-based vs. Case-control • Family-based: completely robust to population substructure; robust to HWE failure; more powerful for very rare, highly penetrant diseases (e.g., arguments coming back for sequencing); pseudo-controls (e.g., longevity study…), but much harder to recruit (esp. late-onset diseases; children generally not difficult) • Case-control: adjusting for PCs/AIMs now does well in practice; test for HWE in controls; more powerful in most other situations; more careful selection of good controls

  7. Family-based vs. Case-control • Family-based: detect genotyping error (Mendel error); more complex analysis (but doable) • Case-control: cryptics, maybe; standard regression methods

  8. Mendel’s laws • Recall the playing cards example... • One allele from each parent for each gene • Many family based tests based on this, rather than estimating allele frequencies (case-control)

  9. Mendelian transmission: Ex • E.g., parents are Aa, Aa: • P(offspring=AA | Mother=Aa, Father=Aa)=? • P(offspring=Aa | Mother=Aa, Father=Aa)=? • P(offspring=aa | Mother=Aa, Father=Aa)=?

  10. Mendelian transmission: Ex • E.g., parents are Aa, Aa: • P(offspring=AA | Mother=Aa, Father=Aa)=1/4 • P(offspring=Aa | Mother=Aa, Father=Aa)=1/2 • P(offspring=aa | Mother=Aa, Father=Aa)=1/4 Conditioning on parents...

  11. Mendelian transmission: Ex • E.g., parents are AA, Aa: • P(offspring=AA | Mother=AA, Father=Aa)=? • P(offspring=Aa | Mother=AA, Father=Aa)=? • P(offspring=aa | Mother=AA, Father=Aa)=?

  12. Mendelian transmission: Ex • E.g., parents are AA, Aa: • P(offspring=AA | Mother=AA, Father=Aa)=1/2 • P(offspring=Aa | Mother=AA, Father=Aa)=1/2 • P(offspring=aa | Mother=AA, Father=Aa)=0

  13. Mendelian transmission: Ex • E.g., parents are AA, AA: • P(offspring=AA | Mother=AA, Father=AA)=? • P(offspring=Aa | Mother=AA, Father=AA)=? • P(offspring=aa | Mother=AA, Father=AA)=?

  14. Mendelian transmission: Ex • E.g., parents are AA, AA: • P(offspring=AA | Mother=AA, Father=AA)=1 • P(offspring=Aa | Mother=AA, Father=AA)=0 • P(offspring=aa | Mother=AA, Father=AA)=0 Homozygote parents are “non-informative” (no variation in offspring’s conditional genotype distribution)
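
The three worked examples above can be reproduced by enumerating the four equally likely allele pairs a child can receive. The sketch below is only an illustration: the function name `offspring_distribution` and the two-letter string encoding of genotypes are assumptions made here, not something from the slides.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def offspring_distribution(mother, father):
    """Return {genotype: probability} for a child given two parental genotypes.

    Genotypes are two-character strings such as "AA", "Aa", or "aa"
    (a hypothetical encoding). Each parent transmits one of its two
    alleles with probability 1/2, independently of the other parent.
    """
    counts = Counter()
    for m_allele, f_allele in product(mother, father):
        # Sort the pair so that "aA" and "Aa" count as the same genotype.
        child = "".join(sorted(m_allele + f_allele))
        counts[child] += 1
    total = sum(counts.values())
    return {genotype: Fraction(n, total) for genotype, n in counts.items()}


for parents in [("Aa", "Aa"), ("AA", "Aa"), ("AA", "AA")]:
    print(parents, offspring_distribution(*parents))
```

Genotypes that cannot occur are simply absent from the returned dictionary, so `dist.get("aa", 0)` gives 0 for AA × Aa parents. For AA × AA parents the distribution has a single entry, which is what makes such parents non-informative.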

  15. Outline • Overview • Transmission Disequilibrium Test (TDT) • FBAT test framework (stands for Family-Based Association Test) • Other approaches for family-based tests • including your homework!

  16. Transmission Disequilibrium Test (TDT) • Test based on transmissions from parents to offspring • Assumptions • Parents’ and offspring genotypes known • Dichotomous phenotype (though there is a Q-TDT for quantitative traits), only affected offspring • Count transmissions from heterozygote parents, and compare to expected transmissions • Expected transmissions come from Mendel’s laws of segregation (previous slides), not from a control group • Test for over/under-transmission of alleles in cases (intuition…) • Conditional test • Conditions on offspring affection status • Conditions on parental genotypes (conditions out allele frequencies, which is what case-control testing is based on) Spielman et al., AJHG 1993

  17. Transmission Disequilibrium Test (TDT) • w AA parents (transmit one A, do not transmit other A) • z aa parents (transmit one a, do not transmit other a) • x Aa parents that transmit A, do not transmit a • y Aa parents that transmit a, do not transmit A • 2×2 table of transmitted (rows) by non-transmitted (columns) parental allele: transmitted A: w (non-transmitted A), x (non-transmitted a); transmitted a: y (non-transmitted A), z (non-transmitted a)

  18. Transmission Disequilibrium Test (TDT) • No variation in w or z (recall homozygous parents are non-informative) • $(x-y)^2/(x+y) \sim \chi^2_1$; it’s just a special case of McNemar’s test on the table of transmitted vs. non-transmitted parental alleles

  19. Transmission Disequilibrium Test (TDT) • Example from the text: 94 families, 78 parents transmit allele A, 46 transmit allele a • $(78-46)^2/(78+46) = 8.26$, p-value = 0.004 • [Table: transmitted vs. non-transmitted parental allele]
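
The arithmetic in this example can be checked with a few lines of standard-library Python. This is a minimal sketch: the function name `tdt` is an assumption made here, and the p-value uses the identity that a 1-degree-of-freedom chi-square tail probability equals `erfc(sqrt(s / 2))`.

```python
import math


def tdt(x, y):
    """TDT statistic and its 1-df chi-square p-value.

    x: number of heterozygous parents transmitting allele A
    y: number of heterozygous parents transmitting allele a
    """
    stat = (x - y) ** 2 / (x + y)
    # For 1 degree of freedom, P(chi-square > s) = erfc(sqrt(s / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


stat, p_value = tdt(78, 46)
print(f"TDT = {stat:.2f}, p = {p_value:.3f}")
```

Only the counts from heterozygous parents enter the statistic; homozygous parents (w and z) never appear.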

  20. Limitations of TDT • Only affected offspring • Only dichotomous phenotypes • Bi-allelic markers • Additive genetic model • No missing parents • Incorporating siblings assumes no linkage (more next time) • Can’t do multiple markers, multiple phenotypes

  21. Key features of the TDT • Random variable in analysis is offspring genotype • Parental genotypes fixed • Trait fixed (condition on affected offspring)

  22. Outline • Overview • Transmission Disequilibrium Test (TDT) • FBAT test framework (stands for Family-Based Association Test) • Other approaches for family-based tests • including your homework!

  23. FBAT: More general methodology • Maintains general principles of TDT (previous slide) • Other genetic models (dominant, recessive, …) • Additional siblings, extended pedigrees, missing parents • Multiple markers (haplotypes) • Test statistic intuition: covariance between offspring trait and genotype

  24. FBAT: Extending TDT to more general families • For the moment, assume parents are genotyped • Let i index families and j index offspring within a family • Score test of $f(\{\text{offspring genotype}\}_{ij} \mid \text{trait}_{ij}, \text{parents}_i)$, using Mendel’s laws and Bayes’ rule • $U = \sum_{i,j} (\text{trait}_{ij} - \text{offset}) \times (\{\text{offspring genotype}\}_{ij} - E[\{\text{offspring genotype}\}_{ij} \mid \text{parents}_i])$ • Assume trait is continuous or binary • Assume offset is mean (continuous) or population prevalence (dichotomous) • Condition on parents (avoid specification of allele distribution) • Condition on offspring phenotypes (avoid specification of trait distribution)

  25. FBAT: Extending the TDT to more general families (cont.) • $U = \sum_{i,j} (\text{trait}_{ij} - \text{offset}) \times (\{\text{offspring genotype}\}_{ij} - E[\{\text{offspring genotype}\}_{ij} \mid \text{parents}_i])$ • Intuition: Like a sample covariance between trait and genotype • $Z_{FBAT} = U/\sqrt{\mathrm{var}(U)} \sim N(0,1)$ • (Can also use permutation-based p-values)

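  26. FBAT: Extending the TDT to more general families (cont.) • $U = \sum_{i,j} (\text{trait}_{ij} - \text{offset}) \times (\{\text{offspring genotype}\}_{ij} - E[\{\text{offspring genotype}\}_{ij} \mid \text{parents}_i])$ • Let $o_{ij} = \{\text{offspring genotype}\}_{ij}$ • Let $P_i = \text{parents}_i$ • $E[o_{ij} \mid P_i] = X(AA)\,P(o_{ij}=AA \mid P_i) + X(Aa)\,P(o_{ij}=Aa \mid P_i) + X(aa)\,P(o_{ij}=aa \mid P_i)$, where X(·) is the numeric coding of a genotype • Essentially using Mendel’s laws, as we calculated earlier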

  27. FBAT computations • X = additive coding (number of A alleles) • Parents AA, BB: E(X|P) = 2*P(AA|P) + 1*P(AB|P) + 0*P(BB|P) = 2*0 + 1*1 + 0*0 = 1 • Child: X = 1, Pr(X) = 1, X - E(X|P) = 0 • Parents AB, AB: E(X|P) = 2*(1/4) + 1*(1/2) + 0*(1/4) = 1 • Child: X = 0, Pr(X) = 1/4, X - E(X|P) = -1; X = 1, Pr(X) = 1/2, X - E(X|P) = 0; X = 2, Pr(X) = 1/4, X - E(X|P) = +1 • (Over/under-transmissions)
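
As a rough check of these computations, the table of X, Pr(X), and X - E(X|P) can be generated by enumeration. The helper name `additive_table` and the convention that X counts copies of allele "A" are assumptions for this sketch.

```python
from fractions import Fraction
from itertools import product


def additive_table(parent1, parent2, counted="A"):
    """Return E[X | parents] and rows of (X, Pr(X | parents), X - E[X | parents]).

    Genotypes are two-letter strings such as "AB" (a hypothetical encoding);
    X is the additive coding, i.e. the number of copies of `counted`.
    """
    prob = {}
    for allele1, allele2 in product(parent1, parent2):
        x = (allele1 + allele2).count(counted)
        # Each of the four transmitted allele pairs has probability 1/4.
        prob[x] = prob.get(x, Fraction(0)) + Fraction(1, 4)
    expected = sum(x * p for x, p in prob.items())
    rows = [(x, p, x - expected) for x, p in sorted(prob.items())]
    return expected, rows


for parents in [("AA", "BB"), ("AB", "AB")]:
    expected, rows = additive_table(*parents)
    print(parents, "E(X|P) =", expected)
    for x, p, deviation in rows:
        print("  X =", x, " Pr =", p, " X - E(X|P) =", deviation)
```

The deviations in the last column are the over- and under-transmissions that enter the FBAT statistic; for AA × BB parents the only possible deviation is 0.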

  28. Special case when FBAT=TDT • Y=affection status (1=affected, 0=unaffected), offset=0 • Then, if only affected offspring, • $U = \sum_{i,j} (\text{trait}_{ij} - \text{offset}) \times (\{\text{offspring genotype}\}_{ij} - E[\{\text{offspring genotype}\}_{ij} \mid \text{parents}_i]) = \sum_{i,j} (\{\text{offspring genotype}\}_{ij} - E[\{\text{offspring genotype}\}_{ij} \mid \text{parents}_i])$ • $Z_{FBAT}^2 = \chi^2_{TDT}$

  29. FBAT - Informative Families • $U = \sum_{i,j} (\text{trait}_{ij} - \text{offset}) \times (\{\text{offspring genotype}\}_{ij} - E[\{\text{offspring genotype}\}_{ij} \mid \text{parents}_i])$ • $Z_{FBAT} = U/\sqrt{\mathrm{var}(U)} \sim N(0,1)$ • Two homozygous parents contribute nothing • Double heterozygous parents contribute -1, 0, 1 to U and variance of 1/2 • Single heterozygous parents contribute -1/2, 1/2 to U and variance of 1/4
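
The per-family contributions listed above can be used to check numerically that the squared FBAT statistic matches the TDT for affected-only trios with offset 0. The sketch below rests on several assumptions: one offspring per family, both parents genotyped, additive coding, Mendel-consistent genotypes, and var(U) taken as the sum of the squared trait weights times the Mendelian variance of X given the parents. The function names and the toy trios are hypothetical.

```python
import math


def fbat_trios(trios, offset=0.0):
    """U, var(U), and Z for trios given as (mother_x, father_x, child_x, trait).

    Genotypes are coded additively as the number of A alleles (0, 1, or 2).
    """
    u = 0.0
    var_u = 0.0
    for mother_x, father_x, child_x, trait in trios:
        weight = trait - offset
        # Each parent transmits A with probability (number of A alleles) / 2.
        p_m, p_f = mother_x / 2, father_x / 2
        u += weight * (child_x - (p_m + p_f))
        var_u += weight ** 2 * (p_m * (1 - p_m) + p_f * (1 - p_f))
    return u, var_u, u / math.sqrt(var_u)


def tdt_counts(trios):
    """Count A (x) and a (y) transmissions from heterozygous parents."""
    x = y = 0
    for mother_x, father_x, child_x, _trait in trios:
        n_het = (mother_x == 1) + (father_x == 1)
        n_hom_a = (mother_x == 2) + (father_x == 2)
        # A alleles in the child not explained by AA parents came from Aa parents.
        from_het = child_x - n_hom_a
        x += from_het
        y += n_het - from_het
    return x, y


# Hypothetical affected-only trios: (mother, father, child, trait).
trios = [(1, 1, 2, 1), (1, 2, 2, 1), (1, 0, 0, 1), (1, 1, 1, 1), (2, 1, 2, 1)]
u, var_u, z = fbat_trios(trios)
x, y = tdt_counts(trios)
print(z ** 2, (x - y) ** 2 / (x + y))
```

In this setting U reduces to (x - y)/2 and var(U) to (x + y)/4, so the two printed values agree.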

  30. FBAT offset for dichotomous traits • If all offspring are affected, then it does not matter • For rare diseases, affected most informative • For more common, can get some information from unaffecteds • Population prevalence, allows one to gain a little information from unaffecteds

  31. Offset choice • Setting 1: disease prevalence K = 0.05, allele frequency of the disease gene p = 0.05, attributable fraction of the disease due to carrying at least one disease gene AF = 0.3, significance level $\alpha = 10^{-4}$, and sample size 100 • Setting 2: disease prevalence K = 0.3, allele frequency of the disease gene p = 0.143, attributable fraction of the disease due to carrying at least one disease gene AF = 0.25, significance level $\alpha = 0.01$, and sample size 100 • Lange and Laird (2002)

  32. Offset choice

  33. FBAT offset for continuous traits • The trait mean • (Optimal choice is E(Y), depends on ascertainment) • Residual from the trait adjusted for covariates • e.g., regress BMI on gender, use residual • [technically PBAT uses linear regression even for dichotomous traits]

  34. Continuous vs. Dichotomous trait • Modeling as continuous trait -- more powerful • With highly selected traits, dichotomizing may be preferable • Using mean for offset is a poor choice here • Results very sensitive to offset choice • Dichotomizing will lose power compared to best offset choice

  35. Covariate adjustment • Suppose Y is your phenotype of interest • Z covariate • Linear regression $Y = \beta_0 + \beta_1 Z$ • Compute residual $R = Y - (\beta_0 + \beta_1 Z)$ using the fitted coefficients • Use R as trait in FBAT
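
The residual step can be sketched with ordinary least squares for a single covariate. The function name `residuals` and the BMI/sex values below are made up purely for illustration; in practice any regression routine would do.

```python
def residuals(y, z):
    """Residuals R = Y - (b0 + b1 * Z) from simple least-squares regression."""
    n = len(y)
    mean_y = sum(y) / n
    mean_z = sum(z) / n
    s_zz = sum((zi - mean_z) ** 2 for zi in z)
    s_zy = sum((zi - mean_z) * (yi - mean_y) for zi, yi in zip(z, y))
    b1 = s_zy / s_zz           # fitted slope
    b0 = mean_y - b1 * mean_z  # fitted intercept
    return [yi - (b0 + b1 * zi) for yi, zi in zip(y, z)]


# Hypothetical data: BMI as the phenotype Y, sex (0/1) as the covariate Z.
bmi = [22.0, 27.5, 24.0, 30.0]
sex = [0, 1, 0, 1]
adjusted = residuals(bmi, sex)
print(adjusted)
```

With a binary covariate the residuals are just deviations from each group's mean, and they sum to zero, so using them as the FBAT trait amounts to a covariate-adjusted trait with offset 0.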

  36. Offset general comments • Very poor choice -- poor power • More complicated slightly more efficient offsets are also available

  37. Extend to multiple alleles or multiple traits • Multiple alleles, X is a vector • Multiple traits, (trait - offset), trait is a vector • So $U = \sum_{i,j} (\text{trait}_{ij} - \text{offset}) \times (\{\text{offspring genotype}\}_{ij} - E[\{\text{offspring genotype}\}_{ij} \mid \text{parents}_i])$ becomes a vector • Var(U) just becomes a covariance matrix, giving a multi-degree-of-freedom $\chi^2$ test: $U^T \mathrm{var}(U)^{-1} U$

  38. Childhood asthma management program (CAMP) example • 696 trios • bi-allelic locus in il13 • six groups of 22 quantitative phenotypes

  39. DeMeo Gen Epi, 2006

  40. DeMeo Gen Epi, 2006

  41. Haplotypes • Test haplotypes • Alternative to multi-marker test • Haplotype: testing phased sets of markers • Methods of testing • Test one haplotype vs. all others • Variable for each haplotype • Although $2^k$ haplotypes are possible for k bi-allelic markers, far fewer occur in practice • Phasing much easier when you have parents • Aside: generally considered specially in pre-phasing part of imputation (Delaneau, 2011, SHAPE-IT)

  42. Obesity GWAS example • BMI follow-up for 24 years • 86,604 SNPs • 694 participants • One of the first GWAS successes

  43. GWAS example uses clever screening approach, longitudinal phenotype data...

  44. Obesity example: Longitudinal phenotype

  45. Obesity example: Screening based on “conditional mean model” • Prioritizes SNPs based on modeling X imputed from parental genotypes (PBAT software) • f(X,P)=f(X|P)f(P) • Screening not robust to population substructure, but later testing is (so doesn’t matter)

  46. Obesity example: Results

  47. Screening based on “conditional power”... • Started by analyzing only the “top k” (Lange et al • Criticized for not looking at all SNPs, and in practice... • Prior distribution for type I error (Ionita-Laza et al, AJHG 2007) • Bayesian (Naylor et al, Gen Epi 2010)

  48. Extending to missing parents, extended pedigrees • Take-home message: It can be done • Dudbridge 2008: Use a model for the missing parent, not completely robust to population substructure • Rabinowitz and Laird 2000: Very technical, complicated: Condition on sufficient statistic for parental Mendelian transmissions (i.e., family genotype configuration) • Be careful: you cannot just fill in parental genotypes even when they can be inferred, e.g., if mom is AA, dad is missing, and offspring are (AA, Aa), even though this means that the dad is Aa, you cannot just fill that in and use the usual Mendelian transmissions (because the offspring were used to infer it) • Lots of confusion on this in the field, with a series of papers getting incorrectly significant results

  49. X chromosome • Same as before, but distribution of X depends on sex of offspring • Males have only one X chromosome • Fathers non-informative • Power for X generally much lower • Aside: data formatting -- generally all males coded as homozygous • Aside: test homozygosity on X to check “genotype” gender of an individual
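
The dependence on offspring sex can be illustrated by enumerating X-linked transmissions: a daughter receives the father's single X allele plus one of the mother's two, while a son receives only one of the mother's. The function name and encoding below are assumptions for this sketch, and sons are counted here as carrying 0 or 1 copies rather than being coded as homozygous.

```python
from fractions import Fraction


def x_linked_distribution(mother, father_allele, child_sex):
    """Return {number of A alleles: probability} for an X-linked marker.

    mother: two-letter genotype such as "Aa"; father_allele: "A" or "a";
    child_sex: "F" or "M" (hypothetical encodings).
    """
    dist = {}
    for m_allele in mother:
        if child_sex == "F":
            # Daughters always inherit the father's only X allele.
            alleles = m_allele + father_allele
        else:
            # Sons inherit their single X from the mother.
            alleles = m_allele
        x = alleles.count("A")
        dist[x] = dist.get(x, Fraction(0)) + Fraction(1, 2)
    return dist


print(x_linked_distribution("Aa", "A", "F"))
print(x_linked_distribution("Aa", "A", "M"))
print(x_linked_distribution("AA", "a", "F"))
```

The father's allele only shifts the daughter's count by a constant, so all variation comes from a heterozygous mother, which is why fathers are non-informative.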

  50. Further extensions • Gene-environment interaction (also joint test) • Gene-gene interaction • Longitudinal traits • Survival analysis • Haplotypes (easier to phase if have parents), though often use multi-marker tests • ...
