Association Analysis of Rare Genetic Variants

Qunyuan Zhang, Division of Statistical Genomics. Course M21-621: Computational Statistical Genetics.

Presentation Transcript


  1. Association Analysis of Rare Genetic Variants. Qunyuan Zhang, Division of Statistical Genomics. Course M21-621: Computational Statistical Genetics

  2. Rare Variants
      • Low allele frequency: usually less than 1%
      • Low power: for most analyses, due to little variation in the observations
      • High false positive rate: for some model-based analyses, due to sparse data, unstable or biased parameter estimates, and inflated significance (p-values that are too small)

  3. An Example of Low Power. Jonathan C. Cohen, et al. Science 305, 869 (2004)

  4. An Example of High False Positive Rate (Q-Q plots from GWAS data, unpublished). Panels: N≈2500, MAF>0.03; N≈2500, MAF<0.03; N=50000, MAF<0.03, bootstrapped; N≈2500, MAF<0.03, permuted

  5. Three Levels of Rare Variant Data
      • Level 1: Individual-level
      • Level 2: Summarized over subjects
      • Level 3: Summarized over both subjects and variants

  6. Level 1: Individual-level

  7. Level 2: Summarized over subjects (by group). Jonathan C. Cohen, et al. Science 305, 869 (2004)

  8. Level 3: Summarized over subjects (by group) and variants (usually by gene)

  9. Methods For Level 3 Data

  10. Single-variant Test vs. Total Frequency Test (TFT). Jonathan C. Cohen, et al. Science 305, 869 (2004)

  11. What we have learned …
      • A single-variant test of rare variants has very low power to detect association, because the allele frequency is extremely low (usually < 0.01)
      • Testing the collective effect of a set of rare variants may increase power (sum test, collective test, group test, collapsing test, burden test …)
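
The gain from pooling can be seen in a small numerical sketch. The counts are hypothetical (five rare variants, each seen in 3 of 500 cases and 0 of 500 controls), `fisher_one_sided` is an illustrative helper name, and the pooled test assumes no subject carries more than one variant. A one-sided Fisher exact test is used here as one simple way to compare frequencies; it is not necessarily the test used in the cited study.

```python
from math import comb

def fisher_one_sided(a, b, n_case, n_ctrl):
    """P(X >= a) when a + b carriers fall at random among cases and controls."""
    total = a + b
    denom = comb(n_case + n_ctrl, total)
    upper = min(total, n_case)
    tail = sum(comb(n_case, x) * comb(n_ctrl, total - x)
               for x in range(a, upper + 1))
    return tail / denom

# Hypothetical carrier counts for 5 rare variants (cases, controls).
case_counts = [3, 3, 3, 3, 3]
ctrl_counts = [0, 0, 0, 0, 0]
n_case = n_ctrl = 500

# Single-variant test: each variant is tested alone.
p_single = fisher_one_sided(case_counts[0], ctrl_counts[0], n_case, n_ctrl)

# Total frequency test: carriers are summed over variants first.
p_total = fisher_one_sided(sum(case_counts), sum(ctrl_counts), n_case, n_ctrl)

print(p_single, p_total)
```

With these counts no single variant reaches conventional significance, while the pooled count does, which is the pattern the slide describes.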

  12. Methods For Level 2 Data
      • Allowing different sample sizes for different variants
      • Different variants can be weighted differently

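  13. CAST: A cohort allelic sums test. Morgenthaler and Thilly, Mutation Research 615 (2007) 28–56
      S: variant number; N: sample size
      Under H0: $\frac{S_{\text{cases}}}{2N_{\text{cases}}} - \frac{S_{\text{controls}}}{2N_{\text{controls}}} = 0$
      $T = S_{\text{cases}} - S_{\text{controls}} \frac{N_{\text{cases}}}{N_{\text{controls}}} = S_{\text{cases}} - S^{*}_{\text{controls}}$
      (S can be calculated variant by variant and can be weighted differently; the final $T = \sum_i W_i S_i$)
      $Z = T / \sqrt{\mathrm{Var}(T)} \sim N(0, 1)$
      $\mathrm{Var}(T) = \mathrm{Var}(S_{\text{cases}} - S^{*}_{\text{controls}}) = \mathrm{Var}(S_{\text{cases}}) + \mathrm{Var}(S^{*}_{\text{controls}}) = \mathrm{Var}(S_{\text{cases}}) + \mathrm{Var}(S_{\text{controls}}) \left[\frac{N_{\text{cases}}}{N_{\text{controls}}}\right]^{2}$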
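
A minimal sketch of the CAST statistic on this slide, for readers who want to see the arithmetic. Two things are assumptions rather than part of the slide: Var(S) is approximated by S itself (a Poisson approximation for rare counts), and the weighted form is read as a weighted sum of per-variant differences. The function name `cast_z` and the counts are hypothetical.

```python
from math import sqrt, erfc

def cast_z(s_case, s_ctrl, n_case, n_ctrl, weights=None):
    """Return (T, Z, one-sided p) for per-variant allele counts."""
    w = weights if weights is not None else [1.0] * len(s_case)
    ratio = n_case / n_ctrl  # rescales control counts to the case sample size
    # T = sum_i w_i * (S_i(cases) - S_i*(controls))
    t = sum(wi * (sc - su * ratio) for wi, sc, su in zip(w, s_case, s_ctrl))
    # ASSUMPTION: Var(S) ~= S (Poisson approximation for rare variants).
    var_t = sum(wi ** 2 * (sc + su * ratio ** 2)
                for wi, sc, su in zip(w, s_case, s_ctrl))
    z = t / sqrt(var_t)
    p_upper = 0.5 * erfc(z / sqrt(2.0))  # P(N(0,1) >= z)
    return t, z, p_upper

# Hypothetical counts for three variants, 500 cases and 500 controls.
t, z, p = cast_z([4, 3, 5], [1, 0, 2], n_case=500, n_ctrl=500)
print(t, z, p)
```

Because T is a signed sum, variants with opposite effect directions cancel each other, which is the single-direction limitation noted later in the slides.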

  14. C-alpha. PLoS Genetics 7(3): e1001322 (2011). Effect direction problem

  15. C-alpha

  16. QQ Plots of Existing Methods (under the null). Panels: EFT, TFT, CAST, C-alpha
      • EFT and C-alpha: inflated with false positives
      • TFT and CAST: no inflation, but they assume a single effect direction
      • Objective: more general, more powerful methods …

  17. More Generalized Methods For Level 2 Data

  18. Structure of Level 2 data: one table per variant (variant 1, variant 2, variant 3, …, variant i, …, variant k). Strategy: instead of testing the total frequency/number, we test the randomness of all tables.

  19. Exact Probability Test (EPT)
      1. Calculate the probability of each table based on the hypergeometric distribution
      2. Calculate the log-transformed joint probability (L) for all k tables
      3. Enumerate all possible tables and their L scores
      4. Calculate the p-value, P = Prob.( )
      ASHG Meeting 1212, Zhang
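
The four steps can be sketched in a few lines for a small number of variants. The rejection region is an assumption, because the slide's formula did not survive extraction: here the p-value is taken as the total probability of all table sets whose joint log-probability L is no larger than the observed one. The names `table_probs` and `ept_pvalue` and the counts are hypothetical, and full enumeration is only feasible for small k.

```python
from itertools import product
from math import comb, exp, log

def table_probs(m, n_case, n_ctrl):
    """Hypergeometric probabilities of seeing a of m carriers among cases."""
    denom = comb(n_case + n_ctrl, m)
    lo, hi = max(0, m - n_ctrl), min(m, n_case)
    return {a: comb(n_case, a) * comb(n_ctrl, m - a) / denom
            for a in range(lo, hi + 1)}

def ept_pvalue(case_counts, totals, n_case, n_ctrl):
    dists = [table_probs(m, n_case, n_ctrl) for m in totals]      # step 1
    l_obs = sum(log(d[a]) for d, a in zip(dists, case_counts))    # step 2
    p_value = 0.0
    for combo in product(*(d.items() for d in dists)):            # step 3
        l_score = sum(log(prob) for _, prob in combo)
        # ASSUMPTION: a table set is "as extreme" when L <= L_observed.
        if l_score <= l_obs + 1e-9:
            p_value += exp(l_score)                               # step 4
    return p_value

# Hypothetical data: variants 1 and 3 seen only in cases, variant 2 only
# in controls (opposite direction), 100 cases and 100 controls.
p = ept_pvalue(case_counts=[3, 0, 2], totals=[3, 3, 2], n_case=100, n_ctrl=100)
print(p)
```

Since the score depends only on how improbable each table is, a variant enriched in controls contributes as much evidence as one enriched in cases.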

  20. Likelihood Ratio Test (LRT), based on the binomial distribution. ASHG Meeting 1212, Zhang
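
The slide gives only the distribution, so the following is one plausible reading rather than the author's exact formulation: for each variant, allele counts in cases and controls are treated as binomial, the null hypothesis is a shared frequency, and the statistic is twice the summed log-likelihood difference. The function names and counts are hypothetical; `n_case` and `n_ctrl` here denote numbers of alleles.

```python
from math import log

def xlogy(x, y):
    """x * log(y), defined as 0 when x == 0 so empty cells are handled."""
    return 0.0 if x == 0 else x * log(y)

def binom_loglik(count, n, p):
    """Binomial log-likelihood without the constant binomial coefficient."""
    return xlogy(count, p) + xlogy(n - count, 1.0 - p)

def lrt_statistic(case_counts, ctrl_counts, n_case, n_ctrl):
    stat = 0.0
    for a, b in zip(case_counts, ctrl_counts):
        p0 = (a + b) / (n_case + n_ctrl)  # shared frequency under H0
        ll_null = binom_loglik(a, n_case, p0) + binom_loglik(b, n_ctrl, p0)
        ll_alt = (binom_loglik(a, n_case, a / n_case)
                  + binom_loglik(b, n_ctrl, b / n_ctrl))
        stat += 2.0 * (ll_alt - ll_null)
    return stat

# Hypothetical counts: one variant enriched in cases, one in controls.
stat = lrt_statistic([3, 0], [0, 3], n_case=200, n_ctrl=200)
print(stat)
```

Under standard asymptotics such a statistic would be referred to a chi-square distribution with one degree of freedom per variant, but with sparse counts that approximation may be poor, so a permutation or exact reference distribution may be safer.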

  21. Q-Q Plots of EPT and LRT (under the null). Panels: EPT, N=500; LRT, N=500; LRT, N=3000; EPT, N=3000

  22. Power Comparison (significance level = 0.00001). Variant proportion: positive causal 80%, neutral 20%, negative causal 0%. Axes: power vs. sample size

  23. Power Comparison (significance level = 0.00001). Variant proportion: positive causal 60%, neutral 20%, negative causal 20%. Axes: power vs. sample size

  24. Power Comparison (significance level = 0.00001). Variant proportion: positive causal 40%, neutral 20%, negative causal 40%. Axes: power vs. sample size

  25. Methods For Level 1 Data
      • Including covariates
      • Extended to quantitative traits
      • Better control for population structure
      • More sophisticated models

  26. Collapsing (C) test. Li and Leal, The American Journal of Human Genetics 2008 (83): 311–321. Step 1; Step 2: logit(y) = a + b*X + e (logistic regression)
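
A minimal sketch of the two steps, assuming step 1 collapses each subject's rare-variant genotypes into a single 0/1 indicator X (1 if the subject carries any rare allele). With one binary predictor and no covariates, the logistic regression estimates have a closed form from the 2x2 table, which is used here in place of an iterative fit. The function names and data are hypothetical.

```python
from math import log

def collapse(genotypes):
    """Step 1: X = 1 if a subject carries at least one rare allele."""
    return [1 if any(g > 0 for g in row) else 0 for row in genotypes]

def logistic_fit_binary(x, y):
    """Step 2: MLE of logit(P(y=1)) = a + b*x for binary x, no covariates.

    Uses the closed form from the 2x2 table; all four cells must be non-zero.
    """
    n = {(i, j): 0 for i in (0, 1) for j in (0, 1)}
    for xi, yi in zip(x, y):
        n[(xi, yi)] += 1
    a = log(n[(0, 1)] / n[(0, 0)])      # log odds of disease for non-carriers
    b = log(n[(1, 1)] / n[(1, 0)]) - a  # log odds ratio for carriers
    return a, b

# Hypothetical genotypes: rows are subjects, columns are rare variants.
genotypes = [[1, 0, 0], [0, 1, 0], [0, 0, 1], [0, 0, 0],   # cases
             [0, 0, 0], [0, 0, 0], [0, 0, 0], [0, 1, 0]]   # controls
y = [1, 1, 1, 1, 0, 0, 0, 0]

x = collapse(genotypes)
a, b = logistic_fit_binary(x, y)
print(x, a, b)
```

With covariates or a quantitative trait the closed form no longer applies and a regular regression fit would be needed.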

  27. Variant Collapsing

  28. WSS

  29. WSS

  30. WSS

  31. Weighted Sum Test
      • Collapsing test (Li & Leal, 2008): wi = 1, and s = 1 if s > 1
      • Weighted-sum test (Madsen & Browning, 2009): wi calculated from the allele frequency in the control group
      • aSum, adaptive sum test (Han & Pan, 2010): wi = -1 if b < 0 and p < 0.1, otherwise wi = 1
      • KBAC (Liu and Leal, 2010): wi = left-tail p-value
      • RBT (Ionita-Laza et al, 2011): wi = log-scaled probability
      • PWST, p-value weighted sum test (Zhang et al., 2011): wi = rescaled left-tail p-value, incorporating both significance and direction
      • EREC (Lin et al, 2011): wi = estimated effect size
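
As one concrete instance of the list above, the sketch below computes Madsen and Browning style weights, where each variant's frequency is estimated in controls with a small pseudo-count and the weight is the resulting binomial standard deviation, so rarer variants count more in a subject's score. The function names and the toy genotypes are hypothetical, and the rank-based test that follows in the published method is omitted.

```python
from math import sqrt

def madsen_browning_weights(genotypes, is_case):
    """w_i = sqrt(n * q_i * (1 - q_i)), q_i = (m_i^U + 1) / (2 * n^U + 2)."""
    n = len(genotypes)                          # all genotyped subjects
    n_u = sum(1 for c in is_case if not c)      # unaffected subjects
    weights = []
    for i in range(len(genotypes[0])):
        # Minor alleles of variant i observed in unaffected subjects.
        m_u = sum(row[i] for row, c in zip(genotypes, is_case) if not c)
        q = (m_u + 1) / (2 * n_u + 2)
        weights.append(sqrt(n * q * (1 - q)))
    return weights

def weighted_scores(genotypes, weights):
    """Per-subject score: minor-allele counts divided by the weights."""
    return [sum(g / w for g, w in zip(row, weights)) for row in genotypes]

# Hypothetical data: 2 controls then 2 cases, 2 variants.
genotypes = [[0, 1], [0, 0], [1, 1], [1, 0]]
is_case = [False, False, True, True]

weights = madsen_browning_weights(genotypes, is_case)
scores = weighted_scores(genotypes, weights)
print(weights, scores)
```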

  32. When there are only causal (+) variants … Collapsing (Li & Leal, 2008) works well, power increased

  33. When there are causal (+) and non-causal (.) variants … Collapsing still works, power reduced

  34. When there are causal (+), non-causal (.) and causal (-) variants … Power of the collapsing test drops significantly

  35. P-value Weighted Sum Test (PWST). The left-tail p-value, rescaled to [-1, 1], is used as the weight

  36. P-value Weighted Sum Test (PWST). Power of the collapsing test is retained even when there are bidirectional effects

  37. PWST: Q-Q Plots Under the Null. Direct test: inflation of type I error. Corrected by permutation test (permutation of phenotype)
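
When weights or directions are learned from the same data that are tested, the reference distribution has to be rebuilt by permuting the phenotype and recomputing the whole statistic each time. The sketch below shows that mechanism with a deliberately simple data-driven statistic (the sum over variants of the absolute case-control carrier difference); it is a stand-in for illustration, not the PWST statistic itself. All names and data are hypothetical.

```python
import random

def direction_free_stat(carriers, is_case):
    """Sum over variants of |carriers in cases - carriers in controls|."""
    total = 0
    for variant in carriers:  # one list of 0/1 carrier flags per variant
        diff = sum((1 if c else -1) * x for x, c in zip(variant, is_case))
        total += abs(diff)    # the sign (direction) is learned from the data
    return total

def permutation_pvalue(carriers, is_case, n_perm=999, seed=1):
    rng = random.Random(seed)
    observed = direction_free_stat(carriers, is_case)
    labels = list(is_case)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(labels)   # permute phenotype, keep genotypes fixed
        if direction_free_stat(carriers, labels) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)

# Hypothetical data: 20 cases then 20 controls; variant 1 is carried by
# 8 cases, variant 2 by 8 controls (opposite direction).
is_case = [True] * 20 + [False] * 20
carriers = [[1] * 8 + [0] * 32,
            [0] * 20 + [1] * 8 + [0] * 12]

p = permutation_pvalue(carriers, is_case)
print(p)
```

Plain label shuffling assumes subjects are exchangeable, which does not hold for related individuals.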

  38. Generalized Linear Mixed Model (GLMM) & Weighted Sum Test (WST)

  39. GLMM & WST
      • Y: quantitative trait or logit(binary trait)
      • α: intercept
      • β: regression coefficient of the weighted sum
      • m: number of RVs to be collapsed
      • wi: weight of variant i
      • gi: genotype (recoded) of variant i
      • Σ wi gi: weighted sum (WS)
      • X: covariate(s), such as population structure variable(s)
      • τ: fixed effect(s) of X
      • Z: design matrix corresponding to γ
      • γ: random polygenic effects for individual subjects, γ ~ N(0, G), G = 2σ²K, where K is the kinship matrix and σ² the additive polygenic genetic variance
      • ε: residual
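
The model equation itself did not survive in the transcript. Reading the symbol definitions on this slide together, it presumably has the following form; this is a reconstruction from those definitions, not a copy of the original slide.

```latex
Y = \alpha + \beta \sum_{i=1}^{m} w_i g_i + X\tau + Z\gamma + \varepsilon,
\qquad \gamma \sim N(0,\, 2\sigma^{2}K)
```

Under this reading, the association test for the variant set is a test of β = 0, with relatedness absorbed by the random effect γ.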

  40. Weight
      • Based on allele frequency: binary (0, 1) or continuous, fixed or variable threshold
      • Based on functional annotation/prediction: SIFT, PolyPhen, etc.
      • Based on sequencing quality (coverage, mapping quality, genotyping quality, etc.)
      • Data-driven: using both genotype and phenotype data, learning weights from data or adaptive selection, permutation test
      • Any combination …

  41. Application 1: Family Data. Adjusting for relatedness in family data for non-data-driven tests of rare variants. Unadjusted vs. adjusted model; the adjusted model includes γ ~ N(0, 2σ²K)

  42. Q-Q Plots of –log10(P) under the Null (From Zhang et al, 2011, BMC Proc.)
      • Li & Leal's collapsing test, ignoring family structure: inflation of type-1 error
      • Li & Leal's collapsing test, modeling family structure via GLMM: inflation is corrected

  43. Application 2: Permuting Family Data. MMPT: Mixed Model-based Permutation Test. Adjusting for relatedness in family data for data-driven permutation tests of rare variants. Figure labels: permuted; non-permuted, subject IDs fixed; γ ~ N(0, 2σ²K)

  44. Q-Q Plots under the Null. Permutation test, ignoring family structure: inflation of type-1 error. Panels: WSS, aSum, PWST, SPWST (From Zhang et al, 2011, IGES Meeting)

  45. Q-Q Plots under the Null. Mixed model-based permutation test (MMPT), modeling family structure: inflation corrected. Panels: WSS, aSum, PWST, SPWST (From Zhang et al, 2011, IGES Meeting)

  46. Burden Test vs. Non-burden Test
      • Burden test: T-test, Likelihood Ratio Test, F-test, score test, …
      • Non-burden test: SKAT (sequence kernel association test)

  47. SKAT: sequence kernel association test
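
For orientation, the SKAT variance-component score statistic with a weighted linear kernel can be written as Q = Σ_j w_j S_j², where S_j = Σ_i g_ij (y_i − μ̂_i) is the score for variant j. The sketch below computes Q under an intercept-only null model, which is a simplifying assumption; the p-value step (a mixture of chi-square distributions) is omitted. The function names and toy data are hypothetical.

```python
def skat_q(genotypes, y, weights):
    """Q = sum_j w_j * S_j**2 with S_j = sum_i g_ij * (y_i - mu_hat)."""
    # ASSUMPTION: intercept-only null model, so mu_hat is the trait mean.
    mu_hat = sum(y) / len(y)
    resid = [yi - mu_hat for yi in y]
    q = 0.0
    for j, w in enumerate(weights):
        s_j = sum(row[j] * r for row, r in zip(genotypes, resid))
        q += w * s_j ** 2   # squaring removes the effect direction
    return q

def beta_1_25_weights(mafs):
    """Commonly used default: sqrt(w_j) = Beta(MAF_j; 1, 25) density."""
    return [(25.0 * (1.0 - maf) ** 24) ** 2 for maf in mafs]

# Hypothetical data: 4 subjects, 2 variants, binary trait.
genotypes = [[1, 0], [0, 1], [1, 0], [0, 0]]
y = [1, 0, 1, 0]

q_flat = skat_q(genotypes, y, weights=[1.0, 1.0])
print(q_flat)
```

Because each variant's score is squared before summing, variants with opposite effect directions do not cancel, unlike in a burden statistic.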

  48. Extension of SKAT to Family Data. Model terms labeled on the slide: kinship matrix; polygenic heritability of the trait; residual. Han Chen et al., 2012, Genetic Epidemiology

  49. Other problems
      • Missing genotypes & imputation
      • Genotyping errors & QC (family consistency, sequence review)
      • Population stratification
      • Inherited variants and de novo mutations
      • Family data & linkage information
      • Variant validation and association validation
      • Public databases
      • And more …