Beyond GWAS Thomas Hoffmann
Recap: Where's the heritability? Visscher, AJHG 2011
Outline • Multiple testing • Gene-environment interaction • Gene-gene interaction • Rare variants • Pharmacogenetics, Pharmacogenomics
Multiple testing • Recall we are testing ~1 Million markers, more or less • Several strategies to adjust the p-values for doing so many tests • Bonferroni • False Discovery Rate (FDR) • Permutation
Multiple testing - Bonferroni • Bonferroni adjustment • Per-test significance threshold: 0.05/M, where M is the number of tests (i.e., the number of markers) • Most widely used in practice
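As a quick illustration of the Bonferroni arithmetic, here is a minimal sketch. The value of `n_markers` is an assumed round figure standing in for "~1 million markers"; the variable names are illustrative only.

```python
import math

alpha = 0.05            # family-wise significance level used on the slide
n_markers = 1_000_000   # assumption: round figure for "~1 million markers"

# Bonferroni: divide the overall level by the number of tests
bonferroni_threshold = alpha / n_markers
print(f"Per-marker significance threshold: {bonferroni_threshold:.1e}")
```

With one million markers this gives a per-marker threshold of about 5e-8.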
Multiple testing - FDR • False Discovery Rate (FDR) limits the expected proportion of false positives among the results declared significant • Use some procedure to solve for parameters in the table (e.g., Simes)
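One common FDR procedure is the Benjamini-Hochberg step-up rule, which builds on the Simes inequality. Whether this is exactly the procedure the slide refers to is an assumption. The sketch below uses made-up p-values, and the function name `benjamini_hochberg` is illustrative.

```python
import numpy as np

def benjamini_hochberg(pvalues, q=0.05):
    """Return a boolean array marking which tests are rejected at FDR level q."""
    p = np.asarray(pvalues, dtype=float)
    m = p.size
    order = np.argsort(p)                      # indices of p-values, smallest first
    thresholds = q * np.arange(1, m + 1) / m   # step-up thresholds q*k/m
    below = p[order] <= thresholds
    rejected = np.zeros(m, dtype=bool)
    if below.any():
        k = np.max(np.nonzero(below)[0])       # largest rank meeting its threshold
        rejected[order[: k + 1]] = True        # reject that rank and all smaller p-values
    return rejected

# Hypothetical p-values for 10 tests (not from any real study)
pvals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
fdr_rejected = benjamini_hochberg(pvals, q=0.05)
bonferroni_rejected = np.asarray(pvals) <= 0.05 / len(pvals)
print(fdr_rejected.sum(), bonferroni_rejected.sum())
```

On these toy values the FDR rule rejects two tests while Bonferroni rejects one, which illustrates why FDR control is usually the less conservative choice.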
Multiple testing - Permutation • Many of the tested genotype markers are correlated with each other (in LD), and so the tests are correlated • Bonferroni adjusts as if they were completely independent • Permutation will be more powerful, but… • [max(T) in plink, --mperm]
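Multiple testing - Permutation • Suppose we compute the $\chi^2$ statistic for each marker (recall the 2x3 table of case/control by genotype) • We can order these from smallest to largest, e.g., $\chi^2_{(1)} \le \chi^2_{(2)} \le \dots \le \chi^2_{(M)}$ • Permute case-control labels, recompute the order statistics: $\chi^{2*}_{(1)} \le \chi^{2*}_{(2)} \le \dots \le \chi^{2*}_{(M)}$ • Repeat T times • Permutation p-value for $\chi^2_{(k)}$ = $\#\{\text{permuted } \chi^{2*}_{(k)} \ge \chi^2_{(k)}\} / T$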
Multiple testing - Permutation (continued) • … So, feasible, but more computation (1000x) • Regression similar idea, but… • Not quite as simple as permuting case-control labels, when there are covariates • More like permuting residuals
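The permutation procedure described on these slides can be sketched as follows. This is a minimal illustration on simulated data, not the plink implementation; the function names, sample sizes, and allele frequency are all assumptions.

```python
import numpy as np

def chi2_2x3(genotype, status):
    """Pearson chi-square for the 2x3 table of case/control status by genotype (0/1/2)."""
    table = np.zeros((2, 3))
    for s in (0, 1):
        for g in (0, 1, 2):
            table[s, g] = np.sum((status == s) & (genotype == g))
    expected = table.sum(axis=1, keepdims=True) * table.sum(axis=0, keepdims=True) / table.sum()
    mask = expected > 0   # skip empty genotype classes to avoid dividing by zero
    return np.sum((table[mask] - expected[mask]) ** 2 / expected[mask])

def permutation_pvalues(genotypes, status, n_perm=200, seed=0):
    """Compare each observed order statistic with its permuted counterpart."""
    rng = np.random.default_rng(seed)
    n_markers = genotypes.shape[1]
    observed = np.sort([chi2_2x3(genotypes[:, j], status) for j in range(n_markers)])
    exceed = np.zeros(n_markers)
    for _ in range(n_perm):
        permuted = rng.permutation(status)   # shuffle case-control labels
        stats = np.sort([chi2_2x3(genotypes[:, j], permuted) for j in range(n_markers)])
        exceed += stats >= observed          # count, per rank k, permuted >= observed
    return observed, exceed / n_perm

# Simulated toy data: 100 individuals, 20 markers, no true association
rng = np.random.default_rng(1)
genotypes = rng.binomial(2, 0.3, size=(100, 20))
status = np.repeat([0, 1], 50)
observed, perm_p = permutation_pvalues(genotypes, status)
```

Because the labels are shuffled jointly across all markers, the LD structure among markers is preserved in every permuted data set, which is what lets the permutation approach account for correlated tests.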
Summary: Multiple testing • Most people just use Bonferroni correction • Other methods more powerful • Laird comments (text for the course) • “Given the many false positive findings in the history of genetic association studies, one rather errs on being too conservative.”
Outline • Multiple testing • Gene-environment interaction • Gene-gene interaction • Rare variants • Pharmacogenetics, Pharmacogenomics
PKU: Phenylketonuria • Mental retardation and seizures • 1/15,000 live births • 1/100,000 in Finland • 1/2,600 in Turkey
PKU: Phenylketonuria • Causes: • Mutations in phenylalanine hydroxylase (PAH) • Dietary phenylalanine • Both are necessary • Neither is sufficient
Gene-Environment Interaction
Strata | Cases | Controls
Gene (G+), Environment (E+) | a | b
Gene (G+), No Environment (E-) | c | d
No Gene (G-), Environment (E+) | e | f
No Gene (G-), No Environment (E-) | g | h
Gene-Environment Interaction
Strata | Cases | Controls | Odds Ratio (OR)
G+E+ | a | b | ah / bg
G+E- | c | d | ch / dg
G-E+ | e | f | eh / fg
G-E- | g | h | 1
Multiplicative Effects
Strata | Absolute Risk | Odds Ratio
G+E+ | 45 | 15
G+E- | 15 | 5
G-E+ | 9 | 3
G-E- | 3 | 1
Annotations: E+ multiplies by x3, G+ multiplies by x5
P(Diseased|G,E) = P(G+) P(E+) p00
Multiplicative Effects • $OR_{\text{Interaction}} = OR_{G+E+} / (OR_{G+E-} \times OR_{G-E+})$ • If $OR_{\text{Interaction}} = 1$, multiplicative effects • Example: $OR_{\text{Interaction}} = 15 / (5 \times 3) = 1$
Factor V Leiden Mutations, Oral Contraceptive Use, and Venous Thrombosis
Strata | Cases | Controls | OR
G+E+ | 25 | 2 | 34.7
G+E- | 10 | 4 | 6.9
G-E+ | 84 | 63 | 3.7
G-E- | 36 | 100 | Reference
Total | 155 | 169 |
Vandenbroucke et al., The Lancet 1994
Evidence for G-E Interaction
Strata | OR
G+E+ | 34.7
G+E- | 6.9
G-E+ | 3.7
G-E- | Ref
$OR_{\text{Interaction}} = 34.7 / (6.9 \times 3.7) \approx 1.4$
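The odds ratios and the interaction odds ratio for the Factor V Leiden example can be recomputed from the case and control counts on the slide. A minimal sketch, with illustrative variable names:

```python
# (cases, controls) per stratum, taken from the Factor V Leiden slide
counts = {
    "G+E+": (25, 2),
    "G+E-": (10, 4),
    "G-E+": (84, 63),
    "G-E-": (36, 100),   # reference stratum
}

ref_cases, ref_controls = counts["G-E-"]

# OR for each stratum relative to G-E-: (cases * ref_controls) / (controls * ref_cases)
odds_ratio = {
    stratum: (cases * ref_controls) / (controls * ref_cases)
    for stratum, (cases, controls) in counts.items()
}

# Interaction on the multiplicative scale: joint OR over the product of the single-factor ORs
or_interaction = odds_ratio["G+E+"] / (odds_ratio["G+E-"] * odds_ratio["G-E+"])

for stratum, value in odds_ratio.items():
    print(f"{stratum}: OR = {value:.1f}")
print(f"OR interaction = {or_interaction:.2f}")
```

Using unrounded odds ratios the interaction comes out at 1.35, in line with the value of roughly 1.4 obtained from the rounded figures on the slide.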
Testing for GxE in regression • $P(Y=1 \mid g, E) = \beta_0 + \beta_g X(g) + \beta_e E + \beta_{ge} X(g) E$ • E could also be continuous... • Tricky! Scale dependent • What if we modeled E differently, e.g., $\log(E)$, or added in $E^2$, etc.? • Can model $X(g) = (I_{g=AA}, I_{g=AB})$ • Tricky! Statistical interaction ≠ biological interaction
Joint test of G, GxE • $P(Y=1 \mid g, E) = \beta_0 + \beta_g X(g) + \beta_e E + \beta_{ge} X(g) E$ • Joint test, $H_0: \beta_g = 0, \beta_{ge} = 0$ • Don’t have to worry about model misspecification
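The joint test of G and GxE can be sketched as a 2-degree-of-freedom likelihood ratio test. The example below makes several assumptions that are not on the slide: a logistic link, additive coding of X(g) as the minor-allele count, a binary E, and simulated data with made-up effect sizes. The helper `fit_logistic` is a hypothetical Newton-Raphson fitter written for this illustration.

```python
import numpy as np

def fit_logistic(X, y, n_iter=50):
    """Fit logistic regression by Newton-Raphson; return (coefficients, log-likelihood)."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        grad = X.T @ (y - p)
        hess = X.T @ (X * (p * (1.0 - p))[:, None])
        step = np.linalg.solve(hess, grad)
        beta += step
        if np.max(np.abs(step)) < 1e-10:
            break
    p = 1.0 / (1.0 + np.exp(-X @ beta))
    loglik = np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))
    return beta, loglik

# Simulated data (all effect sizes are made up for illustration)
rng = np.random.default_rng(0)
n = 2000
g = rng.binomial(2, 0.3, size=n).astype(float)   # X(g): minor-allele count
e = rng.binomial(1, 0.4, size=n).astype(float)   # binary exposure
eta = -1.0 + 0.2 * g + 0.5 * e + 0.4 * g * e
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-eta))).astype(float)

ones = np.ones(n)
X_full = np.column_stack([ones, g, e, g * e])    # intercept, G, E, GxE
X_null = np.column_stack([ones, e])              # H0: beta_g = 0 and beta_ge = 0

beta_full, ll_full = fit_logistic(X_full, y)
_, ll_null = fit_logistic(X_null, y)

lr = 2.0 * (ll_full - ll_null)   # likelihood ratio statistic, 2 degrees of freedom
p_joint = np.exp(-lr / 2.0)      # chi-square survival function for exactly 2 df
print(f"LR = {lr:.2f}, joint p-value = {p_joint:.3g}")
```

The closed form `exp(-lr / 2)` applies only because the test has exactly two degrees of freedom; with genotype indicators for AA and AB the test would have more, and a general chi-square routine would be needed.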
Kraft, Human Heredity 2007 • If $OR_{ge} = 1$, then G is most powerful • Case-only, don’t worry about • [Figure: power of the tests of g; ge; and g, ge jointly]
Outline • Multiple testing • Gene-environment interaction • Gene-gene interaction • Rare variants • Pharmacogenetics, Pharmacogenomics
Gene-gene interaction • Similar to gene-environment interaction, in terms of scale, etc. • epistasis: gene-gene interaction • compositional epistasis: effect of one genetic factor is masked unless another one is also present
Gene-gene interaction • $P(Y=1 \mid g_1, g_2) = \beta_0 + \beta_1 X(g_1) + \beta_2 X(g_2) + \beta_{12} X(g_1) X(g_2)$ • Usually test when $g_1$ is from one gene, and $g_2$ from another gene • plink: --fast-epistasis • feasible: “4.5 billion two-locus tests generated from a 100K data set took just over 24 hours to run” (http://pngu.mgh.harvard.edu/~purcell/plink/)
Gene-Gene Interaction Models Marchini et al. Nature Genetics 2005
GWAS of Psoriasis Strange et al. Nature Genetics 2010
Gene-Gene Interaction Strange et al. Nature Genetics 2010
Outline • Multiple testing • Gene-environment interaction • Gene-gene interaction • Rare variants • Pharmacogenetics, Pharmacogenomics
Minor Allele Frequency • Common: MAF > 0.05 • Less common: 0.01 < MAF < 0.05 • Rare: MAF < 0.01 • SNP: MAF > 0.01 (Single Nucleotide Polymorphism) • SNV: MAF < 0.01 (Single Nucleotide Variant)
Rare variants • Previous GWAS focused on chips designed for MAF > 0.05 (most powered for MAF > 0.10) • Sequencing • New exome arrays • How do we analyze them?
Analysis of rare variants • One-at-a-time analysis • Multi-marker tests • Cohort Allelic Sums Test (CAST) • Combined multivariate and collapsing (CMC) • Various weighted sum statistics • More flexible methods...
One-at-a-time analysis • Standard univariate test we’ve been talking about • Univariate analysis will have low power unless the sample size is very large • Example: MAF = (76 + 131) / [76 + 131 + 2*(9621 + 8109)] = 0.0058 (Nejentsev et al., Science 2009)
Standard Multi-marker tests • Evaluate multiple rare variants simultaneously in a single model • $\operatorname{logit}(P(Y=1 \mid X)) = \alpha + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_M x_M$ • $H_0: \beta_1 = \beta_2 = \dots = \beta_M = 0$ • Standard approach (likelihood ratio, score test) may have difficulty fitting the model due to sparse data • (Recap: one of the approaches we brought up last time to analyze groups of common variants also)
Cohort Allelic Sums Test (CAST) • Collapsing method: group rare variants, e.g., within a gene • Assumes same effect size of each variant in a group, $\operatorname{logit}(P(Y=1 \mid X)) = \alpha + \beta \sum_{k=1}^{M} x_k$ • Like regressing count of number of minor alleles across multiple loci • Cohen et al., Science 2004; Morgenthaler Mut Res 2007
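The collapsing step can be illustrated with a small sketch. The genotype matrix and case-control labels below are hypothetical toy values. Both a minor-allele count and a carrier indicator are computed, since either can serve as the collapsed predictor; which one a given analysis uses is an assumption to check against the method being applied.

```python
import numpy as np

# Hypothetical genotypes: rows = individuals, columns = rare variants in one gene,
# entries = minor-allele counts (0, 1, or 2)
genotypes = np.array([
    [0, 1, 0, 0],
    [0, 0, 0, 0],
    [1, 0, 0, 1],
    [0, 0, 0, 0],
    [0, 0, 1, 0],
    [0, 0, 0, 0],
])
status = np.array([1, 1, 1, 0, 0, 0])   # 1 = case, 0 = control

burden = genotypes.sum(axis=1)          # collapsed predictor: sum over k of x_k
carrier = (burden > 0).astype(int)      # alternative: carries at least one rare allele

mean_burden_cases = burden[status == 1].mean()
mean_burden_controls = burden[status == 0].mean()
print(burden, carrier, mean_burden_cases, mean_burden_controls)
```

The single column `burden` would then replace the M separate variant columns in the regression, so only one effect size is estimated for the whole group.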
Combined multivariate and Collapsing (CMC) • Combines the previous two approaches, but simultaneously models rare and common variants • Rare variants collapsed together per MAF, and treated as a single variant • $\operatorname{logit}(P(Y=1 \mid X)) = \alpha + \sum_{k \in \text{common variants}} \beta_k x_k + \beta_{\text{rare}} \sum_{k=1}^{M} x_k$
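A minimal sketch of how the CMC predictors might be assembled: common variants keep their own columns and rare variants are summed into one. The function name `cmc_design`, the toy genotypes, and the MAF threshold of 0.2 (chosen only because the toy sample is tiny) are assumptions. The genotype entries are assumed to be minor-allele counts.

```python
import numpy as np

def cmc_design(genotypes, maf_threshold):
    """Keep common variants as separate columns; collapse rare variants into one sum."""
    maf = genotypes.mean(axis=0) / 2.0   # allele frequency from minor-allele counts
    rare = maf < maf_threshold
    common_cols = genotypes[:, ~rare]
    rare_sum = genotypes[:, rare].sum(axis=1, keepdims=True)
    return np.hstack([common_cols, rare_sum]), rare

# Hypothetical toy genotypes: 5 individuals x 3 variants
genotypes = np.array([
    [1, 0, 0],
    [2, 1, 0],
    [1, 0, 0],
    [0, 0, 1],
    [1, 0, 0],
])

design, is_rare = cmc_design(genotypes, maf_threshold=0.2)
print(is_rare)   # which variants were collapsed
print(design)    # columns: each common variant, then the collapsed rare sum
```

In this toy example the first variant (MAF 0.5) stays as its own column and the other two (MAF 0.1 each) are collapsed, giving a two-column design matrix to which an intercept would be added before fitting.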
Other rare variant approaches • Many, many other rare variant methods out there • Different assumptions (or lack thereof) on how rare variants affect disease, e.g., how smoothed together, prior knowledge,… • Little data yet to determine which methods work best, but coming -- exome chips, etc. • No consensus on best approach
Summary: Rare variants • Need to aggregate rare variants for increased efficiency • Difficult to choose aggregation a priori, more data-driven approaches may be more useful
Outline • Multiple testing • Gene-environment interaction • Gene-gene interaction • Rare variants • Pharmacogenetics, Pharmacogenomics
What is Pharmacogenetics? • The study of the role of inheritance in the individual variation in drug response • Efficacy • Toxicity
Adverse Drug Reactions are common Phillips et al. JAMA 2001
Pharmacodynamics • How a drug acts • Drug target
Pharmacokinetics • How a drug is processed • ADME: Absorption, Distribution, Metabolism, Excretion • Drug levels (dosage) • Efficacy • Toxicity
Drug levels in the body • Plasma concentration • Metabolic ratio • Compare blood vs. urine • Probe drug • Can be measured over time