Sequential Kernel Association Tests for the Combined Effect of Rare and Common Variants

Journal club (Nov/13), SH Lee

Introduction
• Sequence data
• Rare and unidentified variants
• Groupwise association tests
• Omnibus tests
• Burden test, CMC test, SKAT test
• Up-weighting for rare variants, down-weighting for common variants
• Rare/common variants tested separately

Introduction
• This study develops a joint test of rare and common variants
• Combines burden/SKAT tests for rare/common variants
• Can be applied to
• whole exome sequencing + GWAS
• deep resequencing of GWAS loci
• In principle it can analyse all variants, including rare, low-frequency and common variants
• Simulation (type I error, power)
• Real data: Crohn's disease (CD) and autism

Materials and Methods: definition of rare/common
• Fixed MAF thresholds: <0.01 rare, 0.01-0.05 low frequency, >0.05 common
• Or a sample-size-dependent threshold: MAF < 1/sqrt(2n) rare, MAF > 1/sqrt(2n) common
• n = 500: rare if MAF < 0.031
• n = 10000: rare if MAF < 0.007
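The sample-size-dependent cut-off can be sketched in a few lines of Python. The function names are hypothetical, and assigning a variant whose MAF equals the threshold exactly to the common group is an assumption, since the slide only gives strict inequalities.

```python
import math


def rare_maf_threshold(n_individuals: int) -> float:
    """Return the sample-size-dependent MAF cut-off 1/sqrt(2n)."""
    if n_individuals <= 0:
        raise ValueError("n_individuals must be positive")
    return 1.0 / math.sqrt(2 * n_individuals)


def partition_variants(mafs, n_individuals):
    """Split variant indices into (rare, common) by the 1/sqrt(2n) cut-off.

    Assumption: a variant sitting exactly on the threshold is called common.
    """
    threshold = rare_maf_threshold(n_individuals)
    rare = [j for j, p in enumerate(mafs) if p < threshold]
    common = [j for j, p in enumerate(mafs) if p >= threshold]
    return rare, common


if __name__ == "__main__":
    for n in (500, 10000):
        print(n, round(rare_maf_threshold(n), 4))
    print(partition_variants([0.001, 0.02, 0.3], 500))
```

With n = 500 the cut-off is roughly 0.03, and with n = 10000 roughly 0.007, in line with the values on the slide.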

Materials and Methods
• Testing for the overall effect of rare and common variants
• Burden test for rare variants
• SKAT test for common variants
• Weighted-sum statistics
• Fisher's method of combining the p-values

Weighted-sum statistics
• Within a region (e.g. a gene) having m variants
• g(·) is a linear or logistic link function
• α holds the covariate effects
• X is the n x m genotype matrix
• β is the vector of regression coefficients, treated as a random variable
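The model equation from this slide is not reproduced in the text. A sketch consistent with the symbol definitions is given below. The intercept \alpha_0, the covariate vector Z_i, the weights w_j, and the convention that the variance of \beta_j is w_j^2 \tau are assumptions taken from the usual SKAT-style formulation, not from the slide.

```latex
g(\mu_i) = \alpha_0 + \alpha^{\top} Z_i + \beta^{\top} X_i, \qquad i = 1, \dots, n,
\\
\beta_j \sim \left(0,\; w_j^{2}\,\tau\right), \quad j = 1, \dots, m,
\qquad H_0 : \tau = 0 .
```

Here \mu_i is the mean phenotype of individual i and X_i is the i-th row of X. Under this formulation, testing for no association in the region reduces to testing a single variance component.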

Weighted sum score test (Variance component score test)
• Take the first derivative of the log-likelihood with respect to the variance component τ
• The p-value is obtained from a scaled chi-square distribution, κχ²_ν, where κ is the scale parameter and ν is the degrees of freedom
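The slide text does not say how κ and ν are obtained. One common choice, shown here as an assumption, is to match the first two null moments of the score statistic Q to those of κχ²_ν. The weight matrix W and the fitted null means \hat{\mu} are notation introduced for this sketch.

```latex
Q = (y - \hat{\mu})^{\top} X W^{2} X^{\top} (y - \hat{\mu}),
\qquad W = \operatorname{diag}(w_1, \dots, w_m),
\\
\mathrm{E}[\kappa \chi^2_\nu] = \kappa \nu, \quad
\mathrm{Var}[\kappa \chi^2_\nu] = 2 \kappa^{2} \nu
\;\;\Longrightarrow\;\;
\kappa = \frac{\mathrm{Var}(Q)}{2\,\mathrm{E}(Q)}, \qquad
\nu = \frac{2\,\mathrm{E}(Q)^{2}}{\mathrm{Var}(Q)} .
```

The p-value is then P(\chi^2_\nu > Q / \kappa), with E(Q) and Var(Q) evaluated under the null model.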

Weighted sum score test (Variance component score test)
• Wu et al. (2010) AJHG 86: 929
• Liu et al. (2008) BMC Bioinformatics 8: 292
• Lin (1997) Biometrika 84: 309
• White (1982) Econometrica 50: 1

Weighted sum score test (Variance component score test)
• ρ: the correlation between the regression coefficients
• If they are perfectly correlated (ρ = 1), the coefficients are all the same after weighting, so the variants should be collapsed before running the regression, i.e. the burden test
• If the regression coefficients are unrelated to each other (ρ = 0), one should use SKAT
• Lee et al. (2012) AJHG 91: 224
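The role of ρ can be illustrated with a small Python sketch. It assumes the statistic takes the form Q_ρ = (1 - ρ) Q_SKAT + ρ Q_burden, built from per-variant scores S_j = Σ_i x_ij (y_i - μ̂_i). The function names and the toy data are hypothetical, and p-value computation is left out.

```python
def score_vector(genotypes, residuals):
    """Per-variant scores S_j = sum_i x_ij * r_i (genotypes is n x m)."""
    m = len(genotypes[0])
    return [sum(row[j] * r for row, r in zip(genotypes, residuals))
            for j in range(m)]


def q_burden(scores, weights):
    """Burden form: collapse first, then square (rho = 1)."""
    return sum(w * s for w, s in zip(weights, scores)) ** 2


def q_skat(scores, weights):
    """SKAT form: square each weighted score, then sum (rho = 0)."""
    return sum((w * s) ** 2 for w, s in zip(weights, scores))


def q_rho(scores, weights, rho):
    """Interpolate between SKAT (rho = 0) and burden (rho = 1)."""
    if not 0.0 <= rho <= 1.0:
        raise ValueError("rho must lie in [0, 1]")
    return (1.0 - rho) * q_skat(scores, weights) + rho * q_burden(scores, weights)


if __name__ == "__main__":
    # Toy data: 3 individuals, 2 variants; residuals are y - fitted null mean.
    genotypes = [[0, 1], [1, 0], [2, 1]]
    residuals = [0.5, -0.5, 0.0]
    scores = score_vector(genotypes, residuals)
    weights = [1.0, 1.0]
    for rho in (0.0, 0.5, 1.0):
        print(rho, q_rho(scores, weights, rho))
```

In the toy data the two variants have scores of opposite sign, so they cancel in the burden form but not in the SKAT form. This is the situation in which SKAT is expected to do better.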

Burden-C, SKAT-C
• Combined test statistic for rare and common variants
• Weights: beta(p, 1, 25) for rare variants, beta(p, 0.5, 0.5) for common variants
• Partitioning into rare and common variants
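The two weighting schemes can be evaluated directly from the beta density. The sketch below assumes that p is the minor allele frequency and that the density value itself is used as the weight w_j (some formulations use its square). The function names are hypothetical.

```python
import math


def beta_density(p: float, a: float, b: float) -> float:
    """Beta(a, b) probability density evaluated at p, for 0 < p < 1."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return math.exp(log_norm + (a - 1) * math.log(p) + (b - 1) * math.log(1 - p))


def rare_weight(maf: float) -> float:
    """beta(p, 1, 25): strongly up-weights the rarest variants."""
    return beta_density(maf, 1.0, 25.0)


def common_weight(maf: float) -> float:
    """beta(p, 0.5, 0.5): a much flatter weight across common MAFs."""
    return beta_density(maf, 0.5, 0.5)


if __name__ == "__main__":
    for maf in (0.001, 0.01, 0.03):
        print("rare", maf, round(rare_weight(maf), 3))
    for maf in (0.05, 0.2, 0.5):
        print("common", maf, round(common_weight(maf), 3))
```

The beta(p, 1, 25) weight falls off quickly as MAF grows, which is why applying it to common variants would effectively remove them from the test. The beta(p, 0.5, 0.5) weight varies far less over the common range.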

Other methods
• Burden-A, SKAT-A: adaptive combination of rare/common variants, searching over φ for the minimum p-value
• Burden-F, SKAT-F: Fisher's combination method
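Fisher's combination method, as used in the -F variants, can be sketched as follows. The statistic -2 Σ log p_i is referred to a chi-square distribution with 2k degrees of freedom, which has a closed-form tail for even degrees of freedom. The method assumes the p-values are independent. The function name is hypothetical.

```python
import math


def fisher_combined_p(p_values):
    """Combine independent p-values with Fisher's method.

    X = -2 * sum(log p_i) follows chi-square with 2k df under the null.
    For even df the survival function is exp(-x/2) * sum_{i<k} (x/2)^i / i!.
    """
    if not p_values or any(not 0.0 < p <= 1.0 for p in p_values):
        raise ValueError("p-values must be non-empty and lie in (0, 1]")
    k = len(p_values)
    half_stat = -sum(math.log(p) for p in p_values)  # equals X / 2
    term = 1.0
    total = 1.0
    for i in range(1, k):
        term *= half_stat / i
        total += term
    return math.exp(-half_stat) * total


if __name__ == "__main__":
    # e.g. a p-value from the rare-variant test and one from the common-variant test
    print(fisher_combined_p([0.01, 0.01]))
```

For two p-values the result simplifies to p1 * p2 * (1 - log(p1 * p2)), which gives a quick sanity check.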

Simulation
• Sequence data on 10,000 haplotypes for a 1 Mb region
• Calibrated model for the European population
• A region of 5 or 25 kb was sampled at random, and data were simulated for 1000-5000 individuals
• Proportion of cases in the sample is 0.5

Disease model

Methods

Type I error
• The type I error rates of the proposed methods agree with expectation

Power (separation cut-off)
• Using the Burden-C test
• Power with different separation cut-offs
• 1/sqrt(2n) is used in the remaining analyses

Power (proposed methods)
• Power for 8 different tests
• The proposed combination tests outperform the other tests

Power
• Rare/common causal variants (models 1, 2, 3, 6)
• The combination methods perform better

Power
• Common causal variants (model 5): the combination methods perform better
• Rare causal variants (model 4): the combination methods perform similarly

Power (proposed methods)
• The proposed combination methods outperform CMC for all 6 disease models
• The proposed combination methods outperform the original SKAT for all 6 disease models

Power
• For models 1-4, which include only risk variants
• SKAT is better than Burden when the proportion of risk variants is small (10%)
• Burden is better than SKAT when the proportion of risk variants is large (30%)

Power
• Models 1-3, which include both rare and common risk variants
• SKAT-F is better than Burden-F regardless of the proportion of risk variants
• Model 5, which includes only common risk variants
• SKAT is better than Burden regardless of the proportion of risk variants

Power
• Adaptive tests (SKAT-A, Burden-A) perform worse than SKAT-C and Burden-C
• Results for a region of size 5 kb were similar

Real data
• CD NOD2 sequence data
• 453 cases, 103 controls
• 60 single nucleotide variants (9 of them with MAF > 0.05)
• Because only pooled frequency counts were available for each variant, the sequencing data were simulated
• Autism LRP2 sequencing data
• 430 cases, 379 controls

Real data
• The combination methods are more powerful than the others

Discussion
• The proposed combination methods
• Partitioning rare/common
• Powerful approach
• Better than CMC (rare/common partitioning)
• Better than the original Burden and SKAT tests
• Can be extended to family-based designs

Discussion
• T1D HLA region
• SKAT (2.7e-43)
• Wald test (6.7e-49)
• Likelihood ratio test (8.9e-221)
• LD between regions
• Multiple different components within a region

Thanks

Linear SKAT vs individual variant test statistics
• The linear SKAT statistic (lower) and the individual variant test statistics (upper) are equivalent
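The equations this slide refers to are not reproduced in the text. A sketch of the equivalence, assuming a linear weighted kernel with weights w_j and fitted null means \hat{\mu}_i (notation introduced here), is:

```latex
Q_{\mathrm{SKAT}}
= (y - \hat{\mu})^{\top} X W^{2} X^{\top} (y - \hat{\mu})
= \sum_{j=1}^{m} w_j^{2} S_j^{2},
\qquad
S_j = \sum_{i=1}^{n} x_{ij}\,(y_i - \hat{\mu}_i),
```

where W = \operatorname{diag}(w_1, \dots, w_m) and S_j is the score statistic for testing variant j on its own. The linear SKAT statistic is therefore a weighted sum of squared single-variant score statistics.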

Three disease models for power comparison
