Molecular & Genetic Epi 217. Association Studies: Direct. John Witte
Harry Potter’s Pedigree… (figure: pedigree with Muggle vs. Wizard / Witch status for Vernon Dursley, Petunia Dursley, Lily Evans, James Potter, Harry Potter and Dudley Dursley)
Squib: Argus Filch. What happened to Filch? What is the ‘best’ study design for detecting wizardry genes?
Association Studies • Use of association studies is rapidly expanding, reflecting a number of laudable properties, including their: • Ease, since one need not collect large pedigrees; and • Potential for being more powerful than conventional linkage-based approaches.
Linkage vs. Association Risch & Merikangas, Science 1996
Association Study Approaches • Candidate genes: • Functional • All common variants • All common variants in genome (GWAS) • All variants in genome (sequencing) • Expensive • Rare variants
Direct and Indirect Association (figure panels: Direct Association; Indirect Association) • The ability to undertake indirect association depends on the LD / correlation between measured and unmeasured variants (i.e., ‘tagging’ and ‘coverage’).
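As a rough illustration of ‘tagging’, the sketch below computes r² between a measured and an unmeasured SNP from genotype dosages (0, 1 or 2 copies of an allele). It is a minimal sketch, not a standard tool: it uses the squared genotype correlation as a stand-in for haplotype-based LD r², and the 0.8 cut-off in the hypothetical helper `is_tagged` is an assumption (a commonly used value), not something the slide specifies.

```python
from statistics import mean

def genotype_r2(snp_a, snp_b):
    """Squared Pearson correlation between two SNPs coded as dosages (0, 1, 2).

    Genotype-based r^2 only approximates haplotype r^2 when phase is unknown.
    """
    if len(snp_a) != len(snp_b):
        raise ValueError("SNPs must be measured on the same individuals")
    ma, mb = mean(snp_a), mean(snp_b)
    cov = sum((a - ma) * (b - mb) for a, b in zip(snp_a, snp_b))
    var_a = sum((a - ma) ** 2 for a in snp_a)
    var_b = sum((b - mb) ** 2 for b in snp_b)
    if var_a == 0 or var_b == 0:
        return 0.0  # monomorphic SNP: correlation undefined, treat as untagged
    return cov * cov / (var_a * var_b)

def is_tagged(measured, unmeasured, threshold=0.8):
    """Hypothetical helper: does the measured SNP 'cover' the unmeasured one?"""
    return genotype_r2(measured, unmeasured) >= threshold
```

A perfectly correlated pair (in either direction) gives r² = 1, so one SNP can stand in for the other; an uncorrelated pair gives r² = 0 and offers no coverage.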
Control Selection • A critical aspect of association studies is that controls should be selected from the cases’ source population. • That is, controls should be individuals who, had they developed the disease, would have been included as cases.
Population Stratification (figure: a sub-population variable associated with both the gene and the disease) • Confounding bias that may occur if one’s sample is composed of sub-populations with different: • allele frequencies; and • disease rates. • Cases are more likely than controls to arise from the sub-population with the higher baseline disease rate. • Cases and controls will therefore have different allele frequencies regardless of whether the locus is causal.
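A small worked example of this confounding, with hypothetical numbers: two equally sized sub-populations with risk-allele frequencies 0.5 and 0.1 and disease rates 20% and 5%, at a locus that has no effect on disease within either sub-population. The function name and inputs are illustrative only.

```python
def allele_freqs_by_status(subpops):
    """Pool sub-populations and return (case allele freq, control allele freq).

    subpops: list of (population share, risk-allele frequency, disease rate).
    Within each sub-population the locus is unrelated to disease, so any
    case-control difference in the pooled sample is pure confounding.
    """
    case_w = [share * rate for share, _, rate in subpops]
    ctrl_w = [share * (1 - rate) for share, _, rate in subpops]
    freqs = [p for _, p, _ in subpops]
    case_freq = sum(w * p for w, p in zip(case_w, freqs)) / sum(case_w)
    ctrl_freq = sum(w * p for w, p in zip(ctrl_w, freqs)) / sum(ctrl_w)
    return case_freq, ctrl_freq

case_freq, ctrl_freq = allele_freqs_by_status([(0.5, 0.5, 0.20), (0.5, 0.1, 0.05)])
```

Here 80% of cases but only about 46% of controls come from the high-risk sub-population, so the pooled case allele frequency is 0.42 versus about 0.28 in controls. If the two disease rates are made equal, the difference disappears.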
Example of Population Stratification Cardon & Palmer, 2003
Family-Based Association Studies (figure panels: Siblings; Parents; Cousins)
Transmission Disequilibrium Test (TDT) • Transmitted alleles vs. non-transmitted alleles (figure: a trio with parents M1M2 and M2M2 and an M1M2 child)
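TDT • Transmitted alleles vs. non-transmitted alleles • $\mathrm{TDT} = \frac{(n_{12} - n_{21})^2}{n_{12} + n_{21}}$, where $n_{12}$ counts heterozygous (M1M2) parents transmitting M1 and not M2, and $n_{21}$ counts those transmitting M2 and not M1 • Asymptotically $\chi^2$ with 1 degree of freedom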
TDT for this one trio: $\mathrm{TDT} = \frac{(1 - 0)^2}{1 + 0} = 1$, p-value = 0.32
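The TDT statistic and its asymptotic p-value can be sketched in a few lines. For a $\chi^2$ with 1 d.f., the upper-tail probability equals `erfc(sqrt(x / 2))`, so only the standard library is needed. The function name `tdt` is illustrative.

```python
import math

def tdt(n12, n21):
    """Transmission disequilibrium test from heterozygous-parent transmissions.

    n12: transmissions of M1 (not M2); n21: transmissions of M2 (not M1).
    Returns (statistic, asymptotic chi-square 1 d.f. p-value).
    """
    if n12 + n21 == 0:
        raise ValueError("no informative (heterozygous-parent) transmissions")
    stat = (n12 - n21) ** 2 / (n12 + n21)
    # Survival function of chi-square with 1 d.f.: P(X > x) = erfc(sqrt(x / 2))
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value

stat, p_value = tdt(1, 0)  # the single trio above
```

This reproduces the trio result (TDT = 1, p ≈ 0.32). With only one informative transmission the $\chi^2$ approximation is very rough; it becomes reasonable only with many trios.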
Comparison of Designs • Case-sibling and case-cousin designs can be less efficient than population-based designs (the TDT, which uses parents, can be more efficient). Relative efficiency (population-based = 100%):
Design            Rare recessive, high risk   Common, low risk   Rare dominant, high risk
Population-based  100%                        100%               100%
Case-sibling      69%                         51%                50%
Case-cousin       97%                         88%                88%
TDT               231%                        102%               101%
Witte et al. Am J Epidemiol 1999 • Further, family-based designs can require more recruitment effort. • How about extending the designs to include unrelateds?
Genomic Control • Use population-based design, but incorporate into analysis genomic information to adjust for population stratification. • Genomic control: adjust test statistics for outliers due to population stratification. • Use unlinked genetic markers.
Genomic Control • For the gene(s) of interest, alter the test statistic(s) from the case-control comparison: $\chi^2_{\text{new}} = \chi^2 / \lambda$, where $\lambda = \operatorname{mean}(\chi^2_1, \dots, \chi^2_k)$ or $\lambda = \operatorname{median}(\chi^2_1, \dots, \chi^2_k) / 0.456$ (0.456 is approximately the median of a $\chi^2$ with 1 d.f.), and $1, \dots, k$ index the $\chi^2$ tests for the unlinked markers. (Devlin & Roeder, 1999; Reich & Goldstein, 2000) • That is, one decreases the test statistic by a factor ($\lambda$) that reflects stratification in the population. • Nowadays, analyses commonly adjust for principal components reflecting genetic variability.
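The genomic-control adjustment can be sketched directly from the formulas for $\lambda$ and $\chi^2_{\text{new}}$. One assumption is added that the slide does not state: $\lambda$ is floored at 1 (a common convention) so that a statistic is never inflated. The function names and example values are hypothetical.

```python
from statistics import mean, median

def genomic_control_lambda(null_chi2, use_median=True):
    """Inflation factor lambda from 1 d.f. chi-square tests at unlinked markers."""
    if use_median:
        # 0.456 approximates the median of a chi-square with 1 d.f.
        lam = median(null_chi2) / 0.456
    else:
        lam = mean(null_chi2)
    # Assumption of this sketch: floor lambda at 1 so statistics are never inflated.
    return max(lam, 1.0)

def gc_adjust(chi2, lam):
    """Genomic-control-adjusted statistic: chi2_new = chi2 / lambda."""
    return chi2 / lam

null_stats = [0.2, 0.5, 0.912, 1.4, 3.0]  # hypothetical unlinked-marker statistics
lam = genomic_control_lambda(null_stats)
```

With these values the median is 0.912, so $\lambda = 2$ and a candidate-gene statistic of 8.0 is deflated to 4.0. The median-based estimate is less sensitive than the mean to a few truly associated markers among the "null" set.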
Principal Components: Genetic Matching of Controls. Luca et al. AJHG 2008
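Continuum of Association Study Designs (figure: a subpopulation variable associated with both gene and disease) • Designs range from population-based, through “ethnicity”-matched and structured association, to family-based. • The trade-off runs from bias (population stratification, at the population-based end) to efficiency (overmatching, at the family-based end): • Sharing of genes & environment • Efficiency • Also, recruitment issues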
Candidate Gene Studies • Selection of candidates: linkage regions? Biological support? • “I am interested in a candidate gene and have samples ready to study. What SNPs do I genotype?”
Candidate Gene: Where do I Start? • Location: What chromosome? What position on the chromosome? • Exons/UTR: How many exons? UTR regions? • Size: How large is the gene? • Use the UCSC Genome Browser.
SNP Picking: Things to Consider • Validation: What is the quality of the SNPs? • Informativeness: Are these SNPs informative in my population? How common are they? Location? • Potentially Functional: Do these SNPs have a potential biological impact? Missense variants? • Previously Associated: Have previous studies found SNPs in the candidate gene associated with the outcome?
MTHFR Summary • Chromosome 1: 11,780,053-11,800,381 • Size: 20,329 bp • Exons: 12 • Potentially Functional: 5 missense variants, of which 3 have MAF > 5% • Previously Associated: 3 (C677T, A1298C, A2756G)
102 SNPs across MTHFR: Too Many SNPs to Genotype! (figure: MTHFR SNPs in the UCSC Genome Browser, http://genome.ucsc.edu/cgi-bin/hgGateway)
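One simple way to shrink such a list, following the ‘tagging’ idea, is greedy pairwise selection: repeatedly pick the SNP that captures (r² ≥ threshold) the most SNPs not yet captured. This is a minimal sketch, not the algorithm of any particular tool; the function name, the `frozenset`-keyed r² dictionary, the 0.8 threshold and the SNP names are all hypothetical.

```python
def greedy_tag_snps(snps, r2, threshold=0.8):
    """Greedy pairwise tag-SNP selection.

    snps: list of SNP names (order is used to break ties).
    r2: dict mapping frozenset({a, b}) -> pairwise r^2; missing pairs count as 0.
    Returns tag SNPs such that every SNP has r^2 >= threshold with some tag
    (each SNP trivially tags itself).
    """
    def tags(a, b):
        return a == b or r2.get(frozenset((a, b)), 0.0) >= threshold

    untagged = set(snps)
    chosen = []
    while untagged:
        # SNP capturing the most still-untagged SNPs; max() keeps the first on ties
        best = max(snps, key=lambda s: sum(tags(s, t) for t in untagged))
        chosen.append(best)
        untagged -= {t for t in untagged if tags(best, t)}
    return chosen

pairs = {
    frozenset(("snpA", "snpB")): 0.90,
    frozenset(("snpB", "snpC")): 0.85,
    frozenset(("snpA", "snpC")): 0.50,
}
tags_chosen = greedy_tag_snps(["snpA", "snpB", "snpC", "snpD"], pairs)
```

Here snpB alone covers snpA, snpB and snpC, and snpD (correlated with nothing) must be genotyped itself, so two SNPs stand in for four. In practice the r² values would come from a reference panel for the study population.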
Analysis • Simple chi-square test comparing genotype frequencies between cases and controls (2 d.f.) • Called a ‘model-free’ or co-dominant analysis
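The 2 d.f. genotype test is a Pearson $\chi^2$ on a 2 x 3 table. A convenient fact keeps this dependency-free: for 2 d.f. the upper-tail probability is exactly `exp(-x / 2)`. The counts below are hypothetical, and the sketch assumes every genotype class is observed at least once (otherwise an expected count is zero).

```python
import math

def genotype_chi2(cases, controls):
    """Pearson chi-square on a 2 x 3 table of genotype counts (2 d.f.).

    cases, controls: counts for genotypes (GG, GT, TT).
    Returns (statistic, p-value).
    """
    col_totals = [a + b for a, b in zip(cases, controls)]
    n = sum(col_totals)
    stat = 0.0
    for row in (cases, controls):
        row_total = sum(row)
        for observed, col_total in zip(row, col_totals):
            expected = row_total * col_total / n
            stat += (observed - expected) ** 2 / expected
    # Survival function of chi-square with 2 d.f. is exp(-x / 2)
    return stat, math.exp(-stat / 2)

stat, p_value = genotype_chi2((100, 80, 20), (120, 70, 10))
```

For these hypothetical counts the statistic is about 5.82 with p ≈ 0.055; no assumption about which genotype carries the risk is needed.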
Genetic Model • Genotype ORs, with GG as the reference: GG = 1; GT = $r$; TT = $R$ • ORs depend on the genetic model (assuming positive association): • $R = r = 1$: not a risk allele • $R > r = 1$: recessive • $R = r > 1$: dominant • $R = r^2 > 1$: log additive
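To see which pattern data resemble, the genotype-specific ORs $r$ and $R$ can be estimated from case-control counts, each relative to GG. This is a sketch with hypothetical counts; it assumes no zero cells, and sampling noise means the equalities above never hold exactly in real data.

```python
def genotype_odds_ratios(cases, controls):
    """Odds ratios for GT (r) and TT (R) relative to GG.

    cases, controls: counts for (GG, GT, TT); all cells assumed non-zero.
    """
    gg_case, gt_case, tt_case = cases
    gg_ctrl, gt_ctrl, tt_ctrl = controls
    r = (gt_case * gg_ctrl) / (gt_ctrl * gg_case)
    R = (tt_case * gg_ctrl) / (tt_ctrl * gg_case)
    return r, R

r, R = genotype_odds_ratios((100, 80, 20), (120, 70, 10))
```

Here $r \approx 1.37$ and $R = 2.4$; comparing $R$ with $r$ and with $r^2 \approx 1.88$ gives an informal sense of the model, while the formal comparison uses the tests on the next slide.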
Tests of Association • If the genetic model is known: • Collapse genotypes into a 2x2 table (1 d.f. test) • Trend test for log additive • (Use logistic regression) • We rarely know the genetic model: • Use all three models (dom, rec, log additive) • Compare the fit of each with the co-dominant (2 d.f.) model (LR test) • Cannot use an LR test to compare the models with each other, as they are not nested • Is the model with the best fit and smallest P the best?
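The 1 d.f. tests listed above can be sketched without a regression fit: dominant and recessive codings collapse the genotypes into 2x2 tables, and the Cochran-Armitage trend test (scores 0, 1, 2) targets the log-additive model. These are score-type tests, not the logistic-regression LR tests the slide mentions; the function names and counts are hypothetical.

```python
import math

def chi2_sf_1df(stat):
    """Upper-tail p-value of a chi-square with 1 d.f."""
    return math.erfc(math.sqrt(stat / 2))

def chi2_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for [[a, b], [c, d]]."""
    n = a + b + c + d
    stat = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    return stat, chi2_sf_1df(stat)

def dominant_test(cases, controls):
    """Carriers of T (GT + TT) vs. GG; counts are (GG, GT, TT)."""
    return chi2_2x2(cases[1] + cases[2], cases[0], controls[1] + controls[2], controls[0])

def recessive_test(cases, controls):
    """TT vs. GG + GT."""
    return chi2_2x2(cases[2], cases[0] + cases[1], controls[2], controls[0] + controls[1])

def trend_test(cases, controls, scores=(0, 1, 2)):
    """Cochran-Armitage trend test (log-additive coding), 1 d.f."""
    r1, r2 = sum(cases), sum(controls)
    n = r1 + r2
    col = [a + b for a, b in zip(cases, controls)]
    t = sum(w * (a * r2 - b * r1) for w, a, b in zip(scores, cases, controls))
    var = (r1 * r2 / n) * (
        sum(w * w * c * (n - c) for w, c in zip(scores, col))
        - 2 * sum(scores[i] * scores[j] * col[i] * col[j]
                  for i in range(len(col)) for j in range(i + 1, len(col)))
    )
    stat = t * t / var
    return stat, chi2_sf_1df(stat)

cases, controls = (100, 80, 20), (120, 70, 10)
dom, rec, trend = (dominant_test(cases, controls), recessive_test(cases, controls),
                   trend_test(cases, controls))
```

For these counts the dominant, recessive and trend statistics are about 4.04, 3.60 and 5.63. Choosing among them after looking at the data is a multiple-testing issue, which is part of the point of the slide’s closing question.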
Analysis of Rare Variants One-at-a-time analysis Multi-marker tests Cohort Allelic Sums Test (CAST) Combined Multivariate & Collapsing (CMC) Weighted Sum Statistic
1. One-at-a-time Analysis • Low power unless sample size is very large. Nejentsev…Todd. Science 2009;324:387.
2. Multi-marker Tests • Evaluate multiple rare variants simultaneously in a single model. For example, … • May have difficulty fitting the model due to sparse data.
3. Cohort Allelic Sums Test (CAST) • Collapsing method: group rare variants (e.g., within a gene). • Assumes homogeneity of effects within groups. Cohen et al., Science 2004;305:869. Morgenthaler Mut Res 2007;615:28.
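A minimal CAST-style sketch: each person is classified as carrying at least one rare variant in the group or none, and carrier counts are compared between cases and controls. Using a Pearson $\chi^2$ (no continuity correction) on the resulting 2x2 table is an assumption of this sketch; with small counts an exact test may be preferable. The function name and data are hypothetical.

```python
import math

def cast_test(case_genotypes, control_genotypes):
    """CAST-style collapsing test.

    Each person's entry is a list of rare-allele counts (0, 1, 2) at the rare
    variants of one group (e.g., one gene). People are collapsed to
    'carries any rare variant' vs. 'carries none'.
    """
    def carriers(group):
        return sum(1 for person in group if any(count > 0 for count in person))

    a = carriers(case_genotypes)
    b = len(case_genotypes) - a
    c = carriers(control_genotypes)
    d = len(control_genotypes) - c
    n = a + b + c + d
    # Pearson chi-square on the 2x2 carrier table (assumes no empty margins)
    stat = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    return stat, math.erfc(math.sqrt(stat / 2))

cases = [[1, 0, 0], [0, 1, 0], [0, 0, 2], [1, 1, 0]] + [[0, 0, 0]] * 6
controls = [[0, 1, 0]] + [[0, 0, 0]] * 9
stat, p_value = cast_test(cases, controls)
```

Four of ten cases but only one of ten controls carry a rare variant, giving $\chi^2 = 2.4$. Collapsing pools the sparse counts, but, as the slide notes, it implicitly assumes the variants act in the same direction.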
4. Combined Multivariate & Collapsing (CMC) • Combines 2 & 3: simultaneously models rare & common variants. • Rare variants are collapsed together per MAF and treated as a single variant. Li & Leal, AJHG 2008;83:311.
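The collapsing step of CMC can be sketched as building one covariate row per person: common variants are kept individually and rare variants are replaced by a single ‘carries any’ indicator. The single MAF threshold of 0.01 and the allele-count coding of common variants are assumptions (the slide’s “per MAF” grouping may use several bins), and the multivariate test that the rows feed into is not shown.

```python
def cmc_covariates(genotypes, mafs, rare_threshold=0.01):
    """Build CMC-style covariates for each person.

    genotypes: per person, minor-allele counts at each variant.
    mafs: minor-allele frequency of each variant (same order).
    Returns rows of [common-variant counts..., rare-carrier indicator].
    """
    rare_idx = [i for i, f in enumerate(mafs) if f < rare_threshold]
    common_idx = [i for i, f in enumerate(mafs) if f >= rare_threshold]
    rows = []
    for g in genotypes:
        # all rare variants collapsed into one 0/1 'single variant'
        rare_indicator = 1 if any(g[i] > 0 for i in rare_idx) else 0
        rows.append([g[i] for i in common_idx] + [rare_indicator])
    return rows

rows = cmc_covariates(
    genotypes=[[0, 1, 0, 2], [1, 0, 0, 0], [0, 0, 0, 1]],
    mafs=[0.005, 0.20, 0.003, 0.30],
)
```

Each row now has one column per common variant plus one collapsed rare column, so the model stays small even when a gene contains many rare variants.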
5. Weighted Sum Statistic • Reflects the number of rare variants among controls: fewer observed -> more contribution to the genetic score. • Calculate a variance-weighted ‘genetic score’ for the jth person: $s_j = \sum_i I_{ij} / w_i$, where $I_{ij} \in \{0, 1, 2\}$ is the number of mutations at variant $i$ in person $j$, and $w_i = \sqrt{n_i q_i (1 - q_i)}$ is the standard deviation of the number of variants, with $n_i$ the number of individuals genotyped at variant $i$ and $q_i$ estimated from controls. • Rank individuals based on $s_j$. • Test whether the observed ranking among cases departs from that expected under the null hypothesis of no association. • Good power for rare alleles, though not for common causal alleles. Browning, AJHG 2009
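A minimal sketch of the weighted sum statistic: compute $s_j$, sum the ranks of the cases, and judge that sum against case/control label permutations. Assumptions not on the slide: no missing genotypes (so $n_i$ is the total sample size), an add-one adjustment $q_i = (m_i + 1)/(2 n_U + 2)$ from control counts so $w_i$ is never zero, a one-sided test for higher case scores, and a seeded permutation count of 1000. All names and data are hypothetical.

```python
import math
import random

def wss_scores(genotypes, is_case):
    """Weighted-sum genetic score s_j = sum_i I_ij / w_i for every person j."""
    n = len(genotypes)  # n_i: assumes no missing genotypes
    n_variants = len(genotypes[0])
    controls = [g for g, case in zip(genotypes, is_case) if not case]
    weights = []
    for i in range(n_variants):
        m_u = sum(g[i] for g in controls)
        # q_i from controls; add-one adjustment keeps 0 < q_i < 1 (assumption)
        q = (m_u + 1) / (2 * len(controls) + 2)
        weights.append(math.sqrt(n * q * (1 - q)))
    return [sum(g[i] / weights[i] for i in range(n_variants)) for g in genotypes]

def case_rank_sum(scores, is_case):
    """Sum of the ranks (1-based, ties averaged) of the cases' scores."""
    order = sorted(range(len(scores)), key=lambda j: scores[j])
    ranks = [0.0] * len(scores)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and scores[order[end + 1]] == scores[order[start]]:
            end += 1
        for k in range(start, end + 1):
            ranks[order[k]] = (start + end) / 2 + 1
        start = end + 1
    return sum(r for r, case in zip(ranks, is_case) if case)

def wss_permutation_p(genotypes, is_case, n_perm=1000, seed=1):
    """One-sided permutation p-value: are cases ranked higher than expected?"""
    observed = case_rank_sum(wss_scores(genotypes, is_case), is_case)
    rng = random.Random(seed)
    labels = list(is_case)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(labels)
        # weights depend on who the controls are, so recompute them each time
        if case_rank_sum(wss_scores(genotypes, labels), labels) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)

genotypes = [[1, 0], [0, 1], [1, 1], [0, 1], [0, 0], [0, 0], [0, 0], [1, 0]]
is_case = [True] * 4 + [False] * 4
rank_sum = case_rank_sum(wss_scores(genotypes, is_case), is_case)
p_value = wss_permutation_p(genotypes, is_case)
```

Variant 2 is never seen in controls, so it gets the smaller $w_i$ and contributes more to each score, which is the “fewer observed, more contribution” idea. The cases’ rank sum here is 25.5 out of a possible 26.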
Sidenote • Counsyl genetic testing