Robust and powerful sibpair test for rare variant association Sebastian Zöllner University of Michigan
Acknowledgements Keng-Han Lin Matthew Zawistowski Mark Reppell
Rare Variants: Why Do We Care? • GWAS have been successful. • Only some heritability is explained by common variants. • Uncommon coding variants (MAF 0.5%-5%) explain less. • Rare variants could explain some of the 'missing' heritability. • Better risk prediction. • Rare variants may identify new genes. • Rare exonic variants may be easier to annotate functionally and to interpret.
Burden/Dispersion Tests • Testing individual variants is infeasible: • Limited power due to the small number of observations. • Multiple testing correction. • Alternative: joint test. • Burden test (CMAT, Collapsing, WSS) • Dispersion test (SKAT, C-alpha)
Challenges of Rare Variant Analysis • Gene-based tests have low power. • Nelson et al. (2010) estimated that 10,000 cases and 10,000 controls are required for 80% power in half of the genes. • Large sample sizes are required. • More heterogeneous samples => danger of stratification. • Stratification may differ from that of common variants in magnitude and pattern.
Stratification in European Populations • (202 genes, n=900/900, MAF < 1%, nonsense/nonsynonymous variants)
Variant Abundance across Populations • A gradient in diversity from Southern to Northern Europe. • [Figure: expected number of variants per kb against sample size, for the populations African-American, Southern Asia, South-Eastern Europe, South-Western Europe, Western Europe, Central Europe, North-Western Europe, Eastern Europe, Northern Europe, Finland.]
Allele Sharing • Measure of rare variant diversity. • Probability that two carriers of the minor allele are from different populations (normalized). • [Figure, three panels: median EU-EU 0.71, 0.86, and 0.98.]
General Evaluation of Stratification • Select 2 populations. • Select mixing parameter r. • Sample 30 variants from the 202 genes. • Calculate inflation based on observed frequency differences.
Inflation by Mixture Proportion • (Zawistowski et al. 2014)
Family-based Test against Stratification • If multiple affected family members are collected, it may be more powerful to sequence all family members. • Family-based tests can be robust against stratification. • TDT-type tests are potentially inefficient. • How to leverage low frequency? • Low-frequency risk variants should be more common in cases. • And even more common on chromosomes shared among many cases.
Family Test • Consider affected sibpairs. • Estimate IBD sharing. • Compare the number of rare variants on shared (solid) and non-shared (blank) chromosomes. • Any aggregate test can be applied. • [Figure: sibpair chromosomes for IBD sharing S=0, S=2, S=1.]
Basic Properties • Twice as many non-shared as shared chromosomes. • The null hypothesis determines the test: • Shared alleles : Non-shared alleles = 1 : 2 tests for linkage or association. • Shared alleles : Non-shared alleles = Shared chromosomes : Non-shared chromosomes tests for association only.
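The 1:2 ratio can be checked with a short calculation. This is a sketch assuming the standard IBD probabilities for full sibs at a locus with no linkage to disease (an assumption not stated on the slide); S is the number of chromosomes the pair shares IBD, and the four chromosomes carried by the two sibs split into S shared and 4 - 2S non-shared ones.

```latex
\begin{align*}
P(S=0) &= \tfrac14, \qquad P(S=1) = \tfrac12, \qquad P(S=2) = \tfrac14, \\
E[\text{shared chromosomes}] &= E[S] = 0\cdot\tfrac14 + 1\cdot\tfrac12 + 2\cdot\tfrac14 = 1, \\
E[\text{non-shared chromosomes}] &= E[4 - 2S] = 4 - 2 = 2 .
\end{align*}
```

Near a risk locus, affected sibpairs share more than one chromosome on average, so a test against the fixed 1:2 ratio also picks up linkage, whereas a test against the observed chromosome ratio does not.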
Haplotypes Not Required • IBD sharing is known. • Phase is not needed to identify shared variants. • Except in one case: IBD 1 and both sibs are heterozygous. • Configuration 1: the variant is on the shared chromosome (+1 shared). • Configuration 2: the variant is on both non-shared chromosomes (+2 non-shared). • Under the null, the probability of configuration 2 is the allele frequency. • Under the alternative, we need to use multiple imputation.
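The counting rule above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name and arguments are hypothetical, chromosomes are assumed to carry the minor allele independently with probability `maf`, and the double-heterozygote case is filled with its expectation under the null instead of the multiple imputation the slide calls for under the alternative. For the ambiguous case, configuration 1 has probability proportional to p(1-p)^2 and configuration 2 to p^2(1-p), so P(configuration 2) = p.

```python
def count_shared_nonshared(g1, g2, ibd, maf):
    """Return (shared, non_shared) minor-allele counts at one site for one sibpair.

    g1, g2: minor-allele counts (0, 1 or 2) of the two sibs.
    ibd:    number of chromosomes shared IBD (0, 1 or 2), assumed known.
    maf:    minor-allele frequency, used only for the ambiguous case.
    """
    if ibd == 0:
        # Four distinct chromosomes, none shared.
        return 0.0, float(g1 + g2)
    if ibd == 2:
        # Both sibs carry the same two chromosomes.
        if g1 != g2:
            raise ValueError("genotypes must match when ibd == 2")
        return float(g1), 0.0
    if ibd != 1:
        raise ValueError("ibd must be 0, 1 or 2")
    if abs(g1 - g2) == 2:
        raise ValueError("genotypes (2, 0) are impossible when ibd == 1")
    if (g1, g2) == (1, 1):
        # Ambiguous without phase: +1 shared (configuration 1) or
        # +2 non-shared (configuration 2). Null expectation, P(config 2) = maf.
        return 1.0 - maf, 2.0 * maf
    # A homozygous sib implies the shared chromosome carries the allele.
    shared = 1.0 if max(g1, g2) == 2 else 0.0
    # The shared copy appears once in each sib, so remove it twice.
    non_shared = float(g1 + g2) - 2.0 * shared
    return shared, non_shared
```

Summing these pairs over sites and sibpairs gives the two totals that an aggregate test compares.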
Evaluation of Internal Control • Assume chromosome sharing status is known for each sibpair. • Count rare variants; impute sharing status for double heterozygotes. • Compare the number of rare variants between shared and non-shared chromosomes with a chi-squared test (burden style). • [Figure: S=0, S=2, S=1.]
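One way to set up such a burden-style comparison is a 1-df goodness-of-fit test of the "association only" null, in which the variants split between shared and non-shared chromosomes in proportion to the chromosome counts. The sketch below is an assumption about the form of the test, not the authors' code; the function and argument names are hypothetical.

```python
import math

def internal_control_chisq(var_shared, var_nonshared, chrom_shared, chrom_nonshared):
    """Chi-squared test (1 df) that rare variants are spread over shared and
    non-shared chromosomes in proportion to the number of such chromosomes."""
    total_var = var_shared + var_nonshared
    frac_shared = chrom_shared / (chrom_shared + chrom_nonshared)
    exp_shared = total_var * frac_shared
    exp_nonshared = total_var - exp_shared
    stat = ((var_shared - exp_shared) ** 2 / exp_shared
            + (var_nonshared - exp_nonshared) ** 2 / exp_nonshared)
    # Survival function of a chi-squared variable with 1 df:
    # P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value
```

For example, `internal_control_chisq(60, 60, 1000, 2000)` compares 60 variants on each class of chromosome against an expected 40:80 split and returns a statistic of about 15.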
Enriching Based on Familial Risk • [Figure: three designs compared: classic case-control, selected cases, and internal control (sibpairs with S=0, S=2, S=1).]
Stratification • Consider 2 populations. • p=0.01 in pop1, p=0.05 in pop2. • 1000 sibpairs for internal control design. • 1000 cases, 1000 controls for selected cases. • 1000 cases and 1000 controls for case-control. • Sample cases from pop1 with proportion . • Test for association with α=0.05.
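The effect of this setup on a plain case-control comparison can be sketched numerically. The code below is an illustration under stated assumptions, not the simulation behind the slides: the mixing proportions are free parameters (`case_prop_pop1` and `control_prop_pop1` are hypothetical names, and how controls were sampled is not given here), there is no causal effect, and the test is the usual two-sample allele-frequency chi-squared with 2N chromosomes per group.

```python
def mixture_freq(prop_pop1, p1=0.01, p2=0.05):
    """Allele frequency in a sample drawn from pop1 with proportion prop_pop1."""
    return prop_pop1 * p1 + (1.0 - prop_pop1) * p2

def stratification_ncp(case_prop_pop1, control_prop_pop1,
                       n_cases=1000, n_controls=1000, p1=0.01, p2=0.05):
    """Approximate expected 1-df chi-squared shift caused by ancestry
    differences alone (no true association)."""
    p_case = mixture_freq(case_prop_pop1, p1, p2)
    p_ctrl = mixture_freq(control_prop_pop1, p1, p2)
    p_bar = (n_cases * p_case + n_controls * p_ctrl) / (n_cases + n_controls)
    # Variance of the frequency difference, two chromosomes per individual.
    var = p_bar * (1.0 - p_bar) * (1.0 / (2 * n_cases) + 1.0 / (2 * n_controls))
    return (p_case - p_ctrl) ** 2 / var
```

With 80% of cases but 50% of controls from pop1, `stratification_ncp(0.8, 0.5)` is about 6.1, above the 3.84 threshold for α=0.05, even though neither population has a risk effect. When both groups have the same ancestry mix the value is 0.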
Evaluating Study Designs • Realistic rare variant models are unknown: • Typical allele frequency • Number of risk variants/gene • Typical effect size • Distribution of effect sizes • Identifiability of risk variants • Goal: Create a model that summarizes these unknowns into • Summed allele frequency • Mean effect size • Variance of effect size
Basic Genetic Model • Assume many loci carrying risk variants. • Risk alleles at multiple loci each increase the risk by a factor independently. • Frequency of risk variant: • Independent cases • On shared chromosome
Effect Size Model • Relative risk is sampled from a distribution f with mean μ and variance σ². • Simplifications: • Each risk variant occurs only once in the population. • Each risk variant is on its own haplotype. • Then the risk in a random case is
Effect in Sib-pairs • To calculate the probability of having an affected sib-pair, we condition on sharing S. • For S>0, the probability depends on σ². E.g. (S=2):
Analytic Power Analysis • Select μ, σ² and cumulative frequency f. • Calculate allele frequency in cases/controls, P(R|A). • Calculate allele frequency in shared/non-shared chromosomes. => Non-centrality parameter of the χ² distribution.
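The last step, turning a non-centrality parameter into power, can be sketched as below. This assumes a test with 1 degree of freedom, which fits a burden-style comparison but is not stated on the slide; the function name is hypothetical. It uses the fact that a non-central χ² variable with 1 df and non-centrality λ is distributed as (Z + √λ)² for standard normal Z.

```python
from statistics import NormalDist

def power_chisq_1df(ncp, alpha=0.05):
    """Power of a 1-df chi-squared test with non-centrality parameter ncp."""
    norm = NormalDist()
    z_crit = norm.inv_cdf(1.0 - alpha / 2.0)  # two-sided normal critical value
    shift = ncp ** 0.5
    # P((Z + shift)^2 > z_crit^2) = P(Z > z_crit - shift) + P(Z < -z_crit - shift)
    return (1.0 - norm.cdf(z_crit - shift)) + norm.cdf(-z_crit - shift)
```

With `ncp=0` the function returns the type I error rate α, and a non-centrality of roughly 7.85 corresponds to about 80% power at α=0.05.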
[Figure: conventional case-control, internal control, and selected cases designs compared across minor allele frequency.]
Gene-Gene Interaction • Gene-gene interaction affects power in families. • For broad range of interaction models, consider two-locus model. • G now has alleles g1,g2. The joint effect is • We compare the effect of while adjusting L and G to maintain marginal risk.
Conclusions • Stratification is a strong confounder for rare variant tests. • Family-based association methods are robust to stratification. • Comparing rare variants between shared and non-shared chromosomes is substantially more powerful than case-control designs. • All family-based methods/samples depend on the model of gene-gene interaction. Under antagonistic interaction, power can be lower than in a population sample.
Questions? Thank you for your attention