Topic #7 Single-Locus Association Studies: Case-Control Studies. University of Wisconsin Genetic Analysis Workshop June 2011. Outline. Case-Control Study: Two-allele, single locus model Alternative Tests for Association Quantitative Outcomes: Two-allele, single locus model
Topic #7: Single-Locus Association Studies: Case-Control Studies • University of Wisconsin Genetic Analysis Workshop • June 2011
Outline • Case-Control Study: • Two-allele, single locus model • Alternative Tests for Association • Quantitative Outcomes: • Two-allele, single locus model • Alternative Tests • Multiple Testing (Topic #8):
Tests (Models) of Association • Genotype: Distribution of the 3 genotypes differs between the two groups (unstructured alternative); standard χ² on 2 df • Recessive: Relative frequency of A1A1 differs between the two groups; χ² on 1 df • Dominant: Relative frequency of A2A2 differs between the two groups (i.e., non-carriers vs. carriers of at least one A1); χ² on 1 df
Case-Control Example: Genotype Test • Test #1: Compare genotype frequencies in cases and controls • Test: χ²(2 df) = 27.7; p < 10⁻⁵
Case-Control Example: Recessive Test • Test #2: Rate of E4/E4 in cases and controls • Test: χ²(1 df) = 5.46; p = .019
Case-Control Example: Dominant Test • Test #3: Compare rate of +/+ in cases and controls • Test: χ²(1 df) = 27.3; p < 10⁻⁶
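The three tests above are all Pearson χ² tests on the same 2 x 3 genotype table, collapsed in different ways. The following is a minimal sketch of that idea; the genotype counts are hypothetical (the workshop's APOE tables are not reproduced in this text), and the function names `pearson_chi2` and `chi2_pvalue` are illustrative only.

```python
import math

def pearson_chi2(table):
    """Pearson chi-square statistic for a table of counts (list of rows)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    return stat

def chi2_pvalue(stat, df):
    """Upper-tail p-value; closed forms are used for 1 and 2 df only."""
    if df == 1:
        return math.erfc(math.sqrt(stat / 2))
    if df == 2:
        return math.exp(-stat / 2)
    raise ValueError("this sketch supports only 1 or 2 df")

# Hypothetical counts (NOT the workshop data), ordered A1A1, A1A2, A2A2.
cases = [30, 120, 150]
controls = [15, 90, 195]

# Genotype test: full 2 x 3 table, 2 df.
genotype = pearson_chi2([cases, controls])
# Recessive test: A1A1 vs. everything else, 1 df.
recessive = pearson_chi2([[cases[0], cases[1] + cases[2]],
                          [controls[0], controls[1] + controls[2]]])
# Dominant test: A2A2 vs. everything else, 1 df.
dominant = pearson_chi2([[cases[0] + cases[1], cases[2]],
                         [controls[0] + controls[1], controls[2]]])

for name, stat, df in [("genotype", genotype, 2),
                       ("recessive", recessive, 1),
                       ("dominant", dominant, 1)]:
    print(f"{name}: chi2({df} df) = {stat:.2f}, p = {chi2_pvalue(stat, df):.3g}")
```

The p-value helper can also be used to check the statistics reported on the slides, for example `chi2_pvalue(5.46, 1)` for the recessive test.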
Simple Association: More Tests • Trend (Cochran-Armitage): Regress proportion of cases on # of risk alleles (here E4) • Allele Test: Count alleles rather than individuals (assumes HWE) • Case-only design: Test whether cases are in HWE • Logistic Model
Trend Test (Cochran & Armitage) • Test #4: Cochran-Armitage Trend Test • Test: χ²(1 df) = 25.3; p < 10⁻⁵
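As a sketch of how the trend statistic is obtained, the function below uses the standard closed form with allele-count scores 0, 1, 2, which equals N times the squared correlation between case status and score. The counts are hypothetical, not the workshop data, and `trend_chi2` is an illustrative name.

```python
def trend_chi2(cases, controls, scores=(0, 1, 2)):
    """Cochran-Armitage trend chi-square (1 df).

    cases/controls: genotype counts ordered by number of risk alleles.
    """
    totals = [a + b for a, b in zip(cases, controls)]
    n = sum(totals)            # all individuals
    r = sum(cases)             # number of cases
    sx = sum(x * t for x, t in zip(scores, totals))
    sxx = sum(x * x * t for x, t in zip(scores, totals))
    sxy = sum(x * c for x, c in zip(scores, cases))
    numerator = n * (n * sxy - r * sx) ** 2
    denominator = r * (n - r) * (n * sxx - sx ** 2)
    return numerator / denominator

# Hypothetical counts ordered by 0, 1, 2 copies of the risk allele.
cases = [150, 120, 30]
controls = [195, 90, 15]
trend = trend_chi2(cases, controls)
print(f"trend chi2(1 df) = {trend:.2f}")
```

Unlike the allele test, this statistic is computed on individuals rather than alleles, so it does not rely on HWE.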
Allele Test • Test #5: Allele Frequency Comparison • Test: χ²(1 df) = 26.8; p < 10⁻⁶ • Reference: Sasieni, P. D. (1997). From genotypes to genes: Doubling the sample size. Biometrics, 53(4), 1253-1261.
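A minimal sketch of the allele test: each individual contributes two alleles, and a 2 x 2 χ² is computed on allele counts rather than genotype counts. The counts below are hypothetical and `allele_chi2` is an illustrative name.

```python
def allele_chi2(cases, controls):
    """Allelic chi-square (1 df) from genotype counts.

    cases/controls: counts ordered by 0, 1, 2 copies of the risk allele.
    """
    def allele_counts(genotypes):
        risk = genotypes[1] + 2 * genotypes[2]
        other = 2 * genotypes[0] + genotypes[1]
        return risk, other

    a, b = allele_counts(cases)      # risk / other alleles in cases
    c, d = allele_counts(controls)   # risk / other alleles in controls
    n = a + b + c + d                # total alleles = 2 x individuals
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))

# Hypothetical counts ordered by 0, 1, 2 copies of the risk allele.
cases = [150, 120, 30]
controls = [195, 90, 15]
allelic = allele_chi2(cases, controls)
print(f"allelic chi2(1 df) = {allelic:.2f}")
```

Because the sample size is doubled by counting alleles, the test is only valid when the two alleles within a person are independent, which is the HWE assumption noted earlier.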
Case-Only Design: Test for HWE • Test #6: Departure from HWE • Test: χ²(1 df) = 0.4; p = .52 (little power for multiplicative model)
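The case-only test can be sketched as a goodness-of-fit χ² comparing observed genotype counts in cases with the counts expected under HWE at the observed allele frequency. The counts are hypothetical and `hwe_chi2` is an illustrative name.

```python
def hwe_chi2(genotypes):
    """Chi-square (1 df) for departure from Hardy-Weinberg equilibrium.

    genotypes: counts ordered by 0, 1, 2 copies of the allele of interest.
    """
    n = sum(genotypes)
    p = (genotypes[1] + 2 * genotypes[2]) / (2 * n)   # allele frequency
    q = 1.0 - p
    expected = [n * q * q, 2 * n * p * q, n * p * p]
    return sum((obs - exp) ** 2 / exp for obs, exp in zip(genotypes, expected))

# Hypothetical case genotype counts (NOT the workshop data).
cases = [150, 120, 30]
hwe = hwe_chi2(cases)
print(f"HWE chi2(1 df) = {hwe:.3f}")
```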
And there are more: A 7th (and later 8th!) test • Log-additive/logistic:
Advantage of Logistic Framework • Can easily accommodate covariates • Can accommodate alternative models (e.g., dominant or recessive models) with dummy variables • Test of H0: b_i = 0 is very nearly the same as the allele test
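For reference, one common way to write the log-additive logistic model is sketched below. The symbols are assumptions for illustration: X_G is the number of risk alleles (0, 1, 2) carried by an individual with genotype G, Z_k are optional covariates, and b_0, b_1, c_k are regression coefficients; the slide's own notation may differ.

```latex
\log\frac{P(\text{case}\mid G, Z)}{1 - P(\text{case}\mid G, Z)}
  = b_0 + b_1 X_G + \sum_k c_k Z_k ,
\qquad X_G \in \{0, 1, 2\}
```

Under this model each additional risk allele multiplies the odds of being a case by e^{b_1}, and the test of association is the test of H0: b_1 = 0. Replacing X_G with a 0/1 dummy variable gives the dominant or recessive version.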
Genetic Determinants of Human Ageing and Longevity Project • Aim: • Identify genetic variants associated with extreme longevity • Basic Design: • 1200 cases (Danish 1905 cohort) and 800 controls (MADT) • Candidate-gene approach: 168 genes • Genotyping: • 1536 SNPs using Illumina’s GoldenGate array
Summary of plink Data Cleaning of GCLC_Clean • Start: 1200 Cases, 800 Controls, 13 SNPs • Eliminate: • 293 individuals (203 cases/90 controls) with > 10% missing genotypes • 1 SNP eliminated because > 10% missing • 1 SNP eliminated for failing HWE at p < .001 • 1 SNP eliminated due to low MAF • Final sample: 997 cases, 710 controls and 13 SNPs in GCLC
plink Implementation of Association Tests • Basic association test (allelic): plink --file gclc_clean --assoc (generates plink.assoc)
plink Association Output (plink.assoc)
CHR  SNP         BP        A1  F_A      F_U      A2  CHISQ    P          OR
6    rs7742367   53469235  G   0.169    0.1472   A   2.906    0.08826    1.178
6    rs670548    53474948  G   0.3507   0.39     A   5.504    0.01898    0.8447  *
6    rs661603    53478066  G   0.4626   0.4085   A   9.752    0.001791   1.246   *
6    rs16883912  53481730  A   0.1093   0.09437  G   1.988    0.1585     1.177
6    rs572496    53485578  A   0.5005   0.4458   G   9.952    0.001607   1.246   *
6    rs617066    53491877  A   0.3296   0.2648   G   16.52    4.82e-005  1.365   *
6    rs2100375   53493434  A   0.3539   0.3077   G   7.936    0.004846   1.232   *
6    rs531557    53497954  T   0.4769   0.433    A   6.421    0.01128    1.194   *
6    rs16883966  53505685  G   0.05308  0.0346   A   6.506    0.01075    1.564   *
6    rs4712035   53509062  C   0.1745   0.1711   G   0.06685  0.796      1.024
6    rs2397147   53509546  G   0.432    0.388    A   6.608    0.01015    1.2     *
6    rs534957    53514310  G   0.3258   0.338    C   0.5596   0.4544     0.9464
6    rs675908    53521259  G   0.3246   0.3383   A   0.7009   0.4025     0.94
* Nominally significant at p < .05 (highlighted on the original slide)
plink Implementation of Association Tests • Basic association test (allelic): plink --file gclc_clean --assoc (generates plink.assoc) • Genetic model based tests (genotype, trend, domin, recess): plink --file gclc_clean --model (generates plink.model)
Association ‘Model’ Tests for 13 GCLC SNPs • Highlighted in red: nominally significant at p < .05 • Highlighted in blue: significant after Bonferroni correction, p < .004 (i.e., .05/13)
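The Bonferroni threshold can be illustrated with a short sketch. The model-test table is not reproduced in this text, so the example applies the same .05/13 threshold to the allelic p-values from the plink.assoc output shown earlier; the variable names are illustrative.

```python
# Allelic-test p-values copied from the plink.assoc output for the 13 GCLC SNPs.
p_values = {
    "rs7742367": 0.08826, "rs670548": 0.01898, "rs661603": 0.001791,
    "rs16883912": 0.1585, "rs572496": 0.001607, "rs617066": 4.82e-05,
    "rs2100375": 0.004846, "rs531557": 0.01128, "rs16883966": 0.01075,
    "rs4712035": 0.796, "rs2397147": 0.01015, "rs534957": 0.4544,
    "rs675908": 0.4025,
}

alpha = 0.05
threshold = alpha / len(p_values)   # Bonferroni: .05 / 13

nominal = [snp for snp, p in p_values.items() if p < alpha]
bonferroni = [snp for snp, p in p_values.items() if p < threshold]

print(f"Bonferroni threshold = {threshold:.4f}")
print("nominally significant:", nominal)
print("significant after Bonferroni:", bonferroni)
```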
Low Frequency SNPs • Within the 13 GCLC SNPs, rs16883966 had MAF < .05 (.049 in Danish 1905 and .037 in MADT) • For this SNP, plink was unable to compute the test statistic for the Genotype, Dominant, and Recessive models because of low cell frequencies (cell counts below 5)
plink Implementation of Association Tests • Basic association test (allelic): plink --file gclc_clean --assoc (generates plink.assoc) • Genetic model based tests (genotype, trend, domin, recess): plink --file gclc_clean --model (generates plink.model) • Fisher exact test (the 8th!): plink --file gclc_clean --fisher (generates plink.assoc.fisher) • Logistic: plink --file gclc_clean --logistic (generates plink.assoc.logistic)
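Genotypic Values • [Figure: number line of genotypic values for genotypes A1A1, A1A2, A2A2 with means u11, u12, u22; the two homozygotes lie at -a and a, and the heterozygote lies at d] • d is the dominance parameter; when d = 0, the locus is additive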
Additive Genetic Variance Note: d contributes to additive variance whenever q is not equal to .5
Dominance Genetic Variance Note: There is dominance variance only when d is not 0
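The two notes above follow from the standard single-locus variance decomposition, sketched here under an assumed convention (the slides' own labeling may differ): the allele whose homozygote has value +a has frequency p, the other allele has frequency q = 1 - p, and the heterozygote has value d.

```latex
\alpha = a + d\,(q - p), \qquad
V_A = 2pq\,\alpha^{2} = 2pq\,\bigl[a + d\,(q - p)\bigr]^{2}, \qquad
V_D = (2pq\,d)^{2}
```

Here α is the average effect of allele substitution. The d term drops out of V_A only when p = q = .5, and V_D vanishes exactly when d = 0.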
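Complete Additivity • Slope of regression line = a • Additive genetic variance = regression variance • [Figure: genotypic value plotted against allele count (0, 1, 2)]
Partial Dominance • Slope of regression line = α, the average effect of allele substitution (equal to a only when d = 0 or the two allele frequencies are equal) • Additive genetic variance = regression variance • Dominance variance = residual variance • [Figure: genotypic value plotted against allele count (0, 1, 2)]
Complete Dominance • Slope of regression line = α, the average effect of allele substitution • Additive genetic variance = regression variance • Dominance variance = residual variance • [Figure: genotypic value plotted against allele count (0, 1, 2)]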
Some Conclusions • Dominance effects contribute to additive genetic variance • Even with complete Mendelian dominance, additive variance typically exceeds dominance variance (exception would be overdominance)
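These conclusions can be checked numerically with a small sketch. It assumes the standard parameterization in which the allele whose homozygote has value +a has frequency p and the heterozygote has value d; `variance_components` is an illustrative name, not part of any package.

```python
def variance_components(p, a, d):
    """Additive and dominance variance for a two-allele locus.

    p: frequency of the allele whose homozygote has genotypic value +a
    a: half the difference between the two homozygotes
    d: heterozygote value relative to the homozygote midpoint
    """
    q = 1.0 - p
    alpha = a + d * (q - p)          # average effect of allele substitution
    v_additive = 2 * p * q * alpha ** 2
    v_dominance = (2 * p * q * d) ** 2
    return v_additive, v_dominance

# Complete dominance (d = a) across a range of allele frequencies.
for p in (0.1, 0.3, 0.5, 0.7, 0.9):
    v_a, v_d = variance_components(p, a=1.0, d=1.0)
    print(f"p = {p:.1f}: V_A = {v_a:.3f}, V_D = {v_d:.3f}")

# Overdominance (d > a) at p = .5: dominance variance now exceeds additive.
v_a_over, v_d_over = variance_components(0.5, a=1.0, d=2.0)
```

With complete dominance, additive variance exceeds dominance variance in this sketch unless the dominant allele is common (p above 2/3).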
Power Calculation in Quanto for Quantitative Trait • In a study of 1000 unrelated individuals, what is our power to detect a single locus effect? • Strength of genetic effect (R2g) • Risk allele frequency?
Quanto G Power Calculation • Outcome/Design: • Continuous Independent Individuals • Hypothesis: • Gene Only • Gene: • Allele Frequency .10 to .90 by .20 • Additive model • Outcome Model: • R2g = .001 to .019 by .002 • Power: • Sample Size = 1000 to 1000 by 0 • Type I error rate = .05, two-sided • Calculate:
Computed Power for N = 1000 (Minor Allele = Risk Allele) • [Figure: power plotted against % variance accounted for]
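A rough version of this power calculation can be sketched with a normal approximation to the 1 df test of the regression slope. This is an assumption-laden approximation, not Quanto's algorithm, so its numbers may differ slightly from Quanto's output; `power_additive_qt` is an illustrative name.

```python
from statistics import NormalDist

def power_additive_qt(n, r2, alpha=0.05):
    """Approximate power to detect a locus explaining a fraction r2 of the
    trait variance in n unrelated individuals (two-sided test, 1 df)."""
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)
    # Square root of the noncentrality parameter n * r2 / (1 - r2).
    shift = (n * r2 / (1 - r2)) ** 0.5
    return z.cdf(shift - z_crit) + z.cdf(-shift - z_crit)

# Same grid as the Quanto settings: R2g = .001 to .019 by .002, N = 1000.
for k in range(10):
    r2g = 0.001 + 0.002 * k
    print(f"R2g = {r2g:.3f}: power = {power_additive_qt(1000, r2g):.2f}")
```

In this approximation allele frequency enters only through R2g: for a fixed proportion of variance explained under the additive model, power does not depend further on the allele frequency.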
Association with a Quantitative Phenotype • Genotype: 10 SNP markers in the COMT gene, including rs4680 • Sample: 7235 participants in MCTFR longitudinal research • Phenotype: General externalizing composite (overall mean ~ 0.0, SD ~ .36) • Command: plink --bfile comt --pheno ext.dat --mpheno 2 --missing-phenotype -99.0 --assoc --qt-means
Output: plink.qassoc
CHR  SNP        BP        NMISS  BETA       SE        R2          T        P
22   rs4646312  18328337  7233   -0.003598  0.006141  4.747e-005  -0.5859  0.558
22   rs165656   18328863  7232   -0.01252   0.005983  0.0006056   -2.093   0.03637  *
22   rs165722   18329013  7235   -0.01346   0.005974  0.0007017   -2.254   0.02424  *
22   rs2239393  18330428  7233   -0.003556  0.006125  4.662e-005  -0.5806  0.5615
22   500437     18330763  7232   -0.004062  0.006127  6.079e-005  -0.663   0.5074
22   rs4680     18331271  7234   -0.01358   0.005973  0.0007139   -2.273   0.02305  *
22   rs4646316  18332132  7235   -0.002434  0.007201  1.58e-005   -0.3381  0.7353
22   rs165774   18332561  7235   0.009351   0.006543  0.0002823   1.429    0.153
22   rs174699   18334458  7235   -0.0124    0.01288   0.0001281   -0.9626  0.3358
22   rs165599   18336781  7233   -0.004997  0.006435  8.337e-005  -0.7765  0.4375
* Nominally significant at p < .05 (highlighted on the original slide)
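The columns of this output are linked by simple regression identities, which the sketch below checks for the rs4680 row. It assumes the statistics come from a simple linear regression of phenotype on allele count, so that T = BETA / SE and R2 = T² / (T² + NMISS - 2); `qassoc_check` is an illustrative name.

```python
def qassoc_check(beta, se, nmiss):
    """Recompute T and R2 from BETA, SE and NMISS for one SNP."""
    t = beta / se
    r2 = t * t / (t * t + (nmiss - 2))   # residual df = NMISS - 2
    return t, r2

# Values copied from the rs4680 row of the plink.qassoc output.
t_stat, r2 = qassoc_check(beta=-0.01358, se=0.005973, nmiss=7234)
print(f"T = {t_stat:.3f}, R2 = {r2:.7f}")
```

The recomputed values agree with the reported T and R2 up to rounding of the printed BETA and SE. The small R2 shows that even the nominally significant SNPs explain well under 0.1% of the phenotypic variance.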