Beyond GWAS

Thomas Hoffmann

Recap: Where's the heritability? Visscher, AJHG 2011

Outline
• Multiple testing
• Gene-environment interaction
• Gene-gene interaction
• Rare variants
• Pharmacogenetics, pharmacogenomics

Multiple testing
• Recall that we are testing roughly 1 million markers
• Several strategies adjust the p-values for doing so many tests:
  • Bonferroni
  • False Discovery Rate (FDR)
  • Permutation

Multiple testing - Bonferroni
• Bonferroni adjustment: per-test threshold 0.05/M, where M is the number of tests (i.e., the number of markers)
• For M = 1 million, this gives 5 × 10^-8
• Most widely used in practice
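As a minimal sketch in plain Python (the helper names are chosen here for illustration), the Bonferroni rule simply compares every p-value with α/M:

```python
def bonferroni_threshold(m, alpha=0.05):
    """Per-test significance threshold alpha / M for M tests."""
    return alpha / m


def bonferroni_hits(pvalues, alpha=0.05):
    """Indices of tests whose p-value is at or below the Bonferroni threshold."""
    threshold = bonferroni_threshold(len(pvalues), alpha)
    return [i for i, p in enumerate(pvalues) if p <= threshold]


# About 1 million markers gives the familiar genome-wide threshold of 5e-8
print(bonferroni_threshold(1_000_000))
print(bonferroni_hits([0.001, 0.02, 0.3]))  # threshold here is 0.05 / 3
```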

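Multiple testing - FDR
• The False Discovery Rate (FDR) is the expected proportion of false positives among the tests declared significant; FDR procedures limit this proportion rather than the chance of any false positive
• Use a procedure to decide which tests to reject (e.g., the Simes-based Benjamini-Hochberg step-up procedure)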
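A sketch of the Benjamini-Hochberg step-up procedure, assuming that is the Simes-based procedure the slide has in mind: sort the M p-values, find the largest rank k with p_(k) ≤ kq/M, and reject the k smallest. The function name and the example p-values are made up for illustration.

```python
def benjamini_hochberg(pvalues, q=0.05):
    """Indices rejected by the Benjamini-Hochberg step-up procedure at FDR level q."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # ranks 1..m by p-value
    k_max = 0
    for rank, i in enumerate(order, start=1):
        # Step-up: keep the largest rank whose p-value is under its line k*q/m
        if pvalues[i] <= rank * q / m:
            k_max = rank
    return sorted(order[:k_max])


pvals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
print(benjamini_hochberg(pvals))
```

With these ten hypothetical p-values, Bonferroni (0.05/10 = 0.005) would reject only the first test, while Benjamini-Hochberg at q = 0.05 rejects the first two.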

Multiple testing - Permutation
• Many of the tested genotype markers are correlated with each other (in LD), so the tests are correlated
• Bonferroni adjusts as if the tests were completely independent
• Permutation will therefore be more powerful, but…
• [max(T) permutation in PLINK: --mperm]

Multiple testing - Permutation
• Suppose we compute the $\chi^2$ statistic for each marker (recall the 2x3 table of case/control status by genotype)
• Order these from smallest to largest: $\chi^2_{(1)} \le \chi^2_{(2)} \le \dots \le \chi^2_{(M)}$
• Permute the case/control labels and recompute the order statistics: $\chi^{2*}_{(1)} \le \chi^{2*}_{(2)} \le \dots \le \chi^{2*}_{(M)}$
• Repeat $T$ times
• $p^{\text{perm}}_{(k)} = \#\{\text{permutations with } \chi^{2*}_{(k)} \ge \chi^2_{(k)}\} / T$
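The steps above can be sketched in plain Python as follows. This follows the slide's order-statistic comparison, not PLINK's max(T) code; genotypes are assumed coded 0/1/2, labels 1 = case and 0 = control, and the function names and simulated data are hypothetical.

```python
import random


def chi2_2x3(genotypes, labels):
    """Pearson chi-square for the 2x3 case/control-by-genotype table."""
    table = [[0, 0, 0], [0, 0, 0]]
    for g, y in zip(genotypes, labels):
        table[y][g] += 1
    n = len(labels)
    rows = [sum(r) for r in table]
    cols = [table[0][j] + table[1][j] for j in range(3)]
    stat = 0.0
    for i in range(2):
        for j in range(3):
            expected = rows[i] * cols[j] / n
            if expected > 0:  # skip genotype classes nobody carries
                stat += (table[i][j] - expected) ** 2 / expected
    return stat


def ordered_stats(markers, labels):
    """Chi-square statistics for all markers, sorted smallest to largest."""
    return sorted(chi2_2x3(g, labels) for g in markers)


def permutation_pvalues(markers, labels, T=1000, seed=1):
    """p_(k) = fraction of T label permutations whose k-th order statistic >= observed."""
    rng = random.Random(seed)
    observed = ordered_stats(markers, labels)
    exceed = [0] * len(observed)
    perm = list(labels)
    for _ in range(T):
        rng.shuffle(perm)  # permute case/control labels
        permuted = ordered_stats(markers, perm)
        for k, (obs, star) in enumerate(zip(observed, permuted)):
            if star >= obs:
                exceed[k] += 1
    return observed, [c / T for c in exceed]


# Toy data: 9 random markers plus 1 marker perfectly associated with case status
rng = random.Random(0)
labels = [1] * 50 + [0] * 50
markers = [[rng.choice([0, 1, 2]) for _ in labels] for _ in range(9)]
markers.append([2 if y == 1 else 0 for y in labels])
observed, pvals = permutation_pvalues(markers, labels, T=200)
```

Every permutation recomputes all M statistics, which is where the extra computation comes from.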

Multiple testing - Permutation (continued)
• … So permutation is feasible, but requires much more computation (1000x)
• Regression uses a similar idea, but…
• It is not as simple as permuting case/control labels when there are covariates
• It is more like permuting residuals

Summary: Multiple testing
• Most people just use the Bonferroni correction
• Other methods are more powerful
• Laird comments (text for the course):
“Given the many false positive findings in the history of genetic association studies, one rather errs on being too conservative.”

Outline
• Multiple testing
• Gene-environment interaction
• Gene-gene interaction
• Rare variants
• Pharmacogenetics, pharmacogenomics

PKU: Phenylketonuria
• Mental retardation and seizures
• 1/15,000 live births
• 1/100,000 in Finland
• 1/2,600 in Turkey

Phenylalanine Metabolism

PKU: Phenylketonuria
Causes:
• Mutations in phenylalanine hydroxylase (PAH)
• Dietary phenylalanine
Both are necessary; neither is sufficient.

PKU: Gene-Environment Interaction

Gene-Environment Interaction
Strata                                  Cases   Controls
Gene (G+), Environment (E+)             a       b
Gene (G+), No Environment (E-)          c       d
No Gene (G-), Environment (E+)          e       f
No Gene (G-), No Environment (E-)       g       h

Gene-Environment Interaction
Strata   Cases   Controls   Odds Ratio (OR)
G+E+     a       b          ah / bg
G+E-     c       d          ch / dg
G-E+     e       f          eh / fg
G-E-     g       h          1 (reference)

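Multiplicative Effects
Strata   Absolute Odds   Odds Ratio
G+E+     45              15
G+E-     15              5
G-E+     9               3
G-E-     3               1
Each factor multiplies the odds by a constant: E+ multiplies the odds by 3 (3 → 9, 15 → 45) and G+ by 5 (3 → 15, 9 → 45), so odds(G+E+) = 5 × 3 × odds(G-E-) = 45.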

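Multiplicative Effects
• $\mathrm{OR}_{\text{interaction}} = \mathrm{OR}_{G+E+} / (\mathrm{OR}_{G+E-} \times \mathrm{OR}_{G-E+})$
• If $\mathrm{OR}_{\text{interaction}} = 1$, the effects are multiplicative
• Example: $\mathrm{OR}_{\text{interaction}} = 15 / (5 \times 3) = 1$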

Factor V Leiden Mutations, Oral Contraceptive Use, and Venous Thrombosis
Strata   Cases   Controls   OR
G+E+     25      2          34.7
G+E-     10      4          6.9
G-E+     84      63         3.7
G-E-     36      100        Reference
Total    155     169
Vandenbroucke et al., The Lancet 1994

Evidence for G-E Interaction
Strata   OR
G+E+     34.7
G+E-     6.9
G-E+     3.7
G-E-     Reference
$\mathrm{OR}_{\text{interaction}} = 34.7 / (6.9 \times 3.7) = 1.4$
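A minimal sketch that recomputes the stratum odds ratios (ah/bg, ch/dg, eh/fg) and the interaction OR directly from the Factor V Leiden counts; the helper name is illustrative. Carrying full precision gives an interaction OR of 1.35, consistent with the 1.4 obtained from the rounded stratum ORs.

```python
def stratum_or(cases, controls, ref_cases, ref_controls):
    """Odds ratio of one stratum vs. the G-E- reference, e.g. (a*h)/(b*g) for G+E+."""
    return (cases * ref_controls) / (controls * ref_cases)


# Factor V Leiden (G) x oral contraceptive use (E): stratum -> (cases, controls)
counts = {"G+E+": (25, 2), "G+E-": (10, 4), "G-E+": (84, 63), "G-E-": (36, 100)}
ref_cases, ref_controls = counts["G-E-"]

ors = {s: stratum_or(*counts[s], ref_cases, ref_controls)
       for s in ("G+E+", "G+E-", "G-E+")}
# Multiplicative-scale interaction: 1 means the joint OR equals the product of the single-factor ORs
or_interaction = ors["G+E+"] / (ors["G+E-"] * ors["G-E+"])

for s, value in ors.items():
    print(f"{s}: OR = {value:.1f}")
print(f"OR interaction = {or_interaction:.2f}")
```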

Testing for GxE in regression
• $\operatorname{logit} P(Y=1 \mid g, E) = \beta_0 + \beta_g X(g) + \beta_e E + \beta_{ge} X(g) E$
• E could also be continuous...
• Tricky! Scale dependent
• What if we modeled E differently, e.g., log(E), or added an $E^2$ term?
• Can model $X(g) = (I_{g=AA}, I_{g=AB})$
• Tricky! Statistical interaction ≠ biological interaction
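To make the scale dependence concrete, here is a small sketch reusing the absolute odds from the multiplicative-effects example. The same four numbers show no interaction on the log-odds (multiplicative) scale but a nonzero interaction contrast on an additive scale; applying the additive contrast to odds rather than risks is an assumption made only for illustration.

```python
import math

# Absolute odds from the multiplicative example: (G, E) -> odds of disease
odds = {(1, 1): 45, (1, 0): 15, (0, 1): 9, (0, 0): 3}


def interaction_contrast(transform):
    """Contrast f(G+E+) - f(G+E-) - f(G-E+) + f(G-E-) on the scale defined by f."""
    f = {k: transform(v) for k, v in odds.items()}
    return f[(1, 1)] - f[(1, 0)] - f[(0, 1)] + f[(0, 0)]


log_scale = interaction_contrast(math.log)          # multiplicative (log-odds) scale
additive_scale = interaction_contrast(lambda x: x)  # additive scale, applied to odds

print(f"log-odds scale: {log_scale:.6f}")
print(f"additive scale: {additive_scale}")
```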

Joint test of G, GxE
• $\operatorname{logit} P(Y=1 \mid g, E) = \beta_0 + \beta_g X(g) + \beta_e E + \beta_{ge} X(g) E$
• Joint test, $H_0: \beta_g = 0, \beta_{ge} = 0$
• Don't have to worry about model misspecification
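A minimal sketch of the joint 2-df likelihood-ratio test, assuming binary G (carrier vs. non-carrier) and binary E and reusing the Factor V Leiden counts. It fits grouped logistic regressions by Newton-Raphson in plain Python; in practice one would use a statistics package, and the helper names here are illustrative. Because a χ² variable with 2 degrees of freedom has survival function exp(-x/2), no special-function library is needed.

```python
import math


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting (small systems)."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def probs(beta, rows):
    """Fitted P(Y=1) for each row under the logistic model."""
    return [1.0 / (1.0 + math.exp(-sum(b * x for b, x in zip(beta, row)))) for row in rows]


def fit_logistic(rows, cases, totals, max_iter=100, tol=1e-10):
    """Grouped (binomial) logistic regression by Newton-Raphson; returns (beta, log-likelihood)."""
    k = len(rows[0])
    beta = [0.0] * k
    for _ in range(max_iter):
        p = probs(beta, rows)
        # Score X^T (y - n p) and information X^T W X with W = n p (1 - p)
        score = [sum(r[j] * (y - n * q) for r, y, n, q in zip(rows, cases, totals, p))
                 for j in range(k)]
        info = [[sum(r[i] * r[j] * n * q * (1 - q) for r, n, q in zip(rows, totals, p))
                 for j in range(k)] for i in range(k)]
        step = solve(info, score)
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            break
    p = probs(beta, rows)
    loglik = sum(y * math.log(q) + (n - y) * math.log(1 - q)
                 for y, n, q in zip(cases, totals, p))
    return beta, loglik


# Strata as (g, E, cases, controls)
strata = [(1, 1, 25, 2), (1, 0, 10, 4), (0, 1, 84, 63), (0, 0, 36, 100)]
cases = [s[2] for s in strata]
totals = [s[2] + s[3] for s in strata]
full_rows = [[1, g, e, g * e] for g, e, _, _ in strata]  # beta0, beta_g, beta_e, beta_ge
null_rows = [[1, e] for _, e, _, _ in strata]            # H0: beta_g = beta_ge = 0

beta_full, ll_full = fit_logistic(full_rows, cases, totals)
_, ll_null = fit_logistic(null_rows, cases, totals)
lr = 2 * (ll_full - ll_null)
p_joint = math.exp(-lr / 2)  # chi-square (2 df) tail probability
print(f"LR = {lr:.2f}, p = {p_joint:.2e}, exp(beta_ge) = {math.exp(beta_full[3]):.2f}")
```

With four strata the full model is saturated, so $\exp(\hat\beta_{ge})$ reproduces the interaction OR computed directly from the table (1.35 before rounding).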

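Kraft, Human Heredity 2007
[Figure: power of the tests of $\beta_g$, $\beta_{ge}$, and the joint test of $\beta_g, \beta_{ge}$]
• If $\mathrm{OR}_{ge} = 1$, the test of G alone is most powerful
• Case-only analysis: no need to worry about $\beta_g$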

Gene-gene interaction
• Similar to gene-environment interaction in terms of scale dependence, etc.
• Epistasis: gene-gene interaction
• Compositional epistasis: the effect of one genetic factor is masked unless another one is also present

Gene-gene interaction
• $\operatorname{logit} P(Y=1 \mid g_1, g_2) = \beta_0 + \beta_1 X(g_1) + \beta_2 X(g_2) + \beta_{12} X(g_1) X(g_2)$
• Usually tested with $g_1$ from one gene and $g_2$ from another gene
• PLINK: --fast-epistasis
• Feasible: “4.5 billion two-locus tests generated from a 100K data set took just over 24 hours to run” (http://pngu.mgh.harvard.edu/~purcell/plink/)

Gene-Gene Interaction Models Marchini et al. Nature Genetics 2005

GWAS of Psoriasis Strange et al. Nature Genetics 2010

Gene-Gene Interaction Strange et al. Nature Genetics 2010

Minor Allele Frequency
• Common: MAF > 0.05
• Less common: 0.01 < MAF < 0.05
• Rare: MAF < 0.01
• SNP (Single Nucleotide Polymorphism): MAF > 0.01
• SNV (Single Nucleotide Variant): MAF < 0.01

Rare variants
• Previous GWAS focused on chips designed for MAF > 0.05 (best powered for MAF > 0.10)
• Sequencing
• New exome arrays
• How do we analyze them?

Analysis of rare variants
• One-at-a-time analysis
• Multi-marker tests
• Cohort Allelic Sums Test (CAST)
• Combined multivariate and collapsing (CMC)
• Various weighted sum statistics
• More flexible methods...

One-at-a-time analysis
• The standard univariate test we've been talking about
• Univariate analysis has low power unless the sample size is very large
• Example: MAF = (76 + 131) / [76 + 131 + 2*(9621 + 8109)] = 0.0058 (Nejentsev et al., Science 2009)
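A small sketch that reproduces the slide's MAF arithmetic and classifies the result with the cut-offs from the Minor Allele Frequency slide. The slide's inequalities are strict, so placing values exactly at 0.05 or 0.01 in the lower class is an assumption of this illustrative helper.

```python
def maf_class(maf):
    """Classify a minor allele frequency using the slide's cut-offs (boundaries go to the lower class)."""
    if maf > 0.05:
        return "common"
    if maf > 0.01:
        return "less common"
    return "rare"


# MAF exactly as written on the slide (Nejentsev et al., Science 2009)
maf = (76 + 131) / (76 + 131 + 2 * (9621 + 8109))
print(f"MAF = {maf:.4f} -> {maf_class(maf)}")
```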

Standard multi-marker tests
• Evaluate multiple rare variants simultaneously in a single model
• $\operatorname{logit} P(Y=1 \mid X) = \alpha + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_M x_M$
• $H_0: \boldsymbol{\beta} = 0$, i.e., $\beta_1 = \dots = \beta_M = 0$
• Standard approaches (likelihood ratio, score test) may have difficulty fitting the model because the data are sparse
• (Recap: this is also one of the approaches we brought up last time for analyzing groups of common variants)

Cohort Allelic Sums Test (CAST)
• Collapsing method: group rare variants, e.g., within a gene
• Assumes the same effect size for each variant in a group: $\operatorname{logit} P(Y=1 \mid X) = \alpha + \beta \sum_{k=1}^{M} x_k$
• Like regressing on the count of minor alleles across multiple loci
Cohen et al., Science 2004; Morgenthaler, Mut Res 2007
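A minimal sketch of the collapsing step behind the CAST model: for each person, sum the minor-allele counts over the variants in the group that fall below a MAF cut-off, giving the single predictor $\sum_k x_k$. It assumes genotypes are coded 0/1/2 as counts of the minor allele; the function names, the 0.01 cut-off default, and the toy data are illustrative, and the logistic fit itself is left to any standard package.

```python
def allele_freqs(geno):
    """Frequency of the counted (assumed minor) allele per variant; geno[i][k] in {0, 1, 2}."""
    n = len(geno)
    return [sum(row[k] for row in geno) / (2 * n) for k in range(len(geno[0]))]


def cast_burden(geno, maf_cutoff=0.01):
    """Per-person count of minor alleles summed over variants with MAF below the cut-off."""
    freqs = allele_freqs(geno)
    rare = [k for k, f in enumerate(freqs) if f < maf_cutoff]
    return [sum(row[k] for k in rare) for row in geno]


# Toy gene: variant 0 is common, variants 1 and 2 each have a single carrier
geno = [[i % 2, 1 if i == 3 else 0, 1 if i == 7 else 0] for i in range(100)]
burden = cast_burden(geno)
```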

Combined Multivariate and Collapsing (CMC)
• Combines the previous two approaches, modeling rare and common variants simultaneously
• Rare variants (defined by a MAF threshold) are collapsed together and treated as a single variant
• $\operatorname{logit} P(Y=1 \mid X) = \alpha + \sum_{k \in \text{common}} \beta_k x_k + \beta_{\text{rare}} \sum_{k \in \text{rare}} x_k$
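A sketch of how the CMC predictors could be assembled: each common variant keeps its own column and all rare variants are collapsed into one column. Following the slide's formula, the collapsed column is a sum of minor-allele counts; some descriptions of CMC instead use an indicator of carrying any rare variant. Genotype coding (0/1/2 minor-allele counts), the cut-off, the function name, and the toy data are assumptions for illustration.

```python
def cmc_design(geno, maf_cutoff=0.01):
    """Rows of CMC-style predictors: common variants individually, plus one collapsed rare column."""
    n, m = len(geno), len(geno[0])
    freqs = [sum(row[k] for row in geno) / (2 * n) for k in range(m)]
    common = [k for k in range(m) if freqs[k] >= maf_cutoff]
    rare = [k for k in range(m) if freqs[k] < maf_cutoff]
    return [[row[k] for k in common] + [sum(row[k] for k in rare)] for row in geno]


# Toy gene: variant 0 is common, variants 1 and 2 each have a single carrier
geno = [[i % 2, 1 if i == 3 else 0, 1 if i == 7 else 0] for i in range(100)]
design = cmc_design(geno)  # columns: variant 0, collapsed rare variants
```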

Other rare variant approaches
• Many, many other rare variant methods are out there
• Different assumptions (or lack thereof) about how rare variants affect disease, e.g., how they are smoothed together, use of prior knowledge, …
• Little data yet to determine which methods work best, but more is coming (exome chips, etc.)
• No consensus on the best approach

Summary: Rare variants
• Need to aggregate rare variants for increased efficiency
• Difficult to choose the aggregation a priori; more data-driven approaches may be more useful

What is Pharmacogenetics?
The study of the role of inheritance in individual variation in drug response:
• Efficacy
• Toxicity

Adverse Drug Reactions are common (Phillips et al., JAMA 2001)

Pharmacodynamics: how a drug acts
• Drug target

Pharmacokinetics: how a drug is processed
• ADME: Absorption, Distribution, Metabolism, Excretion
• Drug levels (dosage) → efficacy, toxicity

Drug levels in the body
• Plasma concentration
• Metabolic ratio
• Compare blood vs. urine
• Probe drug
• Can be measured over time

Standard TPMT Dosing

Drug Exposure and Toxicity

Genotype-Specific TPMT Dosing

Drug Exposure and Toxicity
