Multiple Comparisons & Measures of LD Jess Paulus, ScD January 29, 2013
Today’s topics • Multiple comparisons • Measures of Linkage disequilibrium • D’ and r2 • r2 and power
Multiple testing & significance thresholds • Concern about multiple testing • Standard thresholds (p<0.05) will lead to a large number of “significant” results • Vast majority of which are false positives • Various approaches to handling this statistically
Possible Errors in Statistical Inference • Columns: unobserved truth in the population (Ha: SNP prevents DM, or H0: no association); rows: what is observed in the sample • Reject H0 (conclude SNP prevents DM): if Ha is true, a true positive (1 − β); if H0 is true, a false positive, Type I error (α) • Fail to reject H0 (conclude no association): if Ha is true, a false negative, Type II error (β); if H0 is true, a true negative (1 − α)
Probability of Errors • α, also known as the "level of significance": the probability of a Type I error, i.e., rejecting the null hypothesis when it is in fact true (false positive); typically set at 5% • p value: the probability, assuming the null hypothesis is true, of obtaining a result as extreme as or more extreme than the one observed in your study
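To make the p value definition concrete, here is a minimal sketch assuming the test statistic is approximately standard normal under H0 (as for many large-sample tests). The function name `two_sided_p_from_z` is hypothetical; it computes the probability of a statistic at least as extreme as the observed one in either direction.

```python
import math

def two_sided_p_from_z(z: float) -> float:
    """Two-sided p value P(|Z| >= |z|) for Z ~ N(0, 1) under H0.

    Uses the identity P(|Z| >= x) = erfc(x / sqrt(2)).
    """
    return math.erfc(abs(z) / math.sqrt(2.0))

# A statistic of |z| = 1.96 sits right at the conventional alpha = 0.05 cutoff.
print(round(two_sided_p_from_z(1.96), 3))  # 0.05
```

A p value below α leads to rejecting H0; if H0 is actually true, that rejection is a Type I error.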
Type I Error (α) in Genetic and Molecular Research • If none of the SNPs is truly associated, a genome-wide association scan of 500,000 SNPs is expected to yield, by chance alone: • 25,000 false positives using α = 0.05 • 5,000 false positives using α = 0.01 • 500 false positives using α = 0.001
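The counts above are expected values: each null test falls below α with probability α, so the expected number of false positives is m × α. A minimal sketch (the function name is hypothetical), assuming every one of the m SNPs is null:

```python
def expected_false_positives(m: int, alpha: float) -> float:
    """Expected number of tests with p < alpha when all m null hypotheses are true.

    Each valid null p value is Uniform(0, 1), so each test is a false positive
    with probability alpha; by linearity of expectation the total is m * alpha
    (this holds whether or not the tests are independent).
    """
    return m * alpha

m = 500_000
for alpha in (0.05, 0.01, 0.001):
    print(alpha, round(expected_false_positives(m, alpha)))
# 0.05 25000
# 0.01 5000
# 0.001 500
```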
Multiple Comparisons Problem • The multiple comparisons (or "multiple testing") problem arises when a set, or family, of statistical inferences is considered simultaneously • The more tests are performed, the more likely it is that at least one Type I error occurs • Several statistical techniques have been developed to adjust for multiple comparisons, e.g., the Bonferroni adjustment
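The growth in Type I errors can be quantified as the family-wise error rate (FWER), the probability of at least one false positive. The sketch below assumes the m tests are independent and all nulls are true; the function name is hypothetical.

```python
def family_wise_error_rate(m: int, alpha: float) -> float:
    """P(at least one Type I error) among m independent tests at level alpha,
    when every null hypothesis is true: 1 - (1 - alpha) ** m."""
    return 1.0 - (1.0 - alpha) ** m

for m in (1, 10, 100):
    print(m, round(family_wise_error_rate(m, 0.05), 3))
# 1 0.05
# 10 0.401
# 100 0.994
```

With only 100 independent tests at α = 0.05, at least one false positive is almost certain; correlated tests (e.g., SNPs in LD) inflate the FWER less than this formula suggests.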
Adjusting alpha • Standard Bonferroni correction • Test each SNP at the α* = α/m1 level • where m1 = number of markers tested • Assuming m1 = 500,000, the Bonferroni-corrected threshold is α* = 0.05/500,000 = 1 × 10⁻⁷ • Conservative when the tests are correlated (e.g., SNPs in LD) • Permutation or simulation procedures may increase power by accounting for the correlation between tests
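A minimal sketch of the standard Bonferroni correction described above; both function names are hypothetical. It does not model correlation between tests, which is why it is conservative for SNPs in LD; permutation procedures are not shown.

```python
def bonferroni_threshold(alpha: float, m: int) -> float:
    """Per-test significance level alpha* = alpha / m."""
    return alpha / m

def bonferroni_significant(p_values, alpha=0.05):
    """Indices of tests whose p value falls below the Bonferroni threshold."""
    threshold = bonferroni_threshold(alpha, len(p_values))
    return [i for i, p in enumerate(p_values) if p < threshold]

print(f"{bonferroni_threshold(0.05, 500_000):.0e}")  # 1e-07
# Three tests -> threshold 0.05 / 3 ≈ 0.0167, so only tests 0 and 2 pass.
print(bonferroni_significant([0.001, 0.02, 0.0001]))  # [0, 2]
```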
Measures of LD
Haplotype definition • Haplotype: an ordered sequence of alleles at a subset of loci along a chromosome • Moving from examining single genetic markers to sets of markers
Measures of linkage disequilibrium • Basic data: table of haplotype frequencies • Example: 16 chromosomes typed at two loci (alleles A/a and G/g)
Haplotype counts (N = 16)
         G    g   Total
A        8    2    10
a        0    6     6
Total    8    8    16
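The basic data can be built by tallying haplotypes. This sketch assumes each chromosome is recorded as a two-character string (locus 1 allele, then locus 2 allele); the 16 strings below are transcribed from the example figure, and the helper name is hypothetical.

```python
from collections import Counter

# One string per chromosome: first character = allele at locus 1 (A/a),
# second character = allele at locus 2 (G/g).
haplotypes = ["AG", "AG", "ag", "ag", "ag", "Ag", "AG", "AG",
              "AG", "AG", "ag", "Ag", "ag", "ag", "AG", "AG"]

def haplotype_counts(haps):
    """Count each of the four possible haplotypes; unobserved ones get 0."""
    counts = Counter(haps)
    return {h: counts.get(h, 0) for h in ("AG", "Ag", "aG", "ag")}

counts = haplotype_counts(haplotypes)
n = sum(counts.values())
freqs = {h: c / n for h, c in counts.items()}
print(counts)  # {'AG': 8, 'Ag': 2, 'aG': 0, 'ag': 6}
print(freqs)   # {'AG': 0.5, 'Ag': 0.125, 'aG': 0.0, 'ag': 0.375}
```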
D’ and r² are the most common measures • Both measure the correlation between alleles at two loci • D’ (D prime) • Ranges from 0 [no LD] to 1 [complete LD] (reported as the absolute value |D’|) • r² (r squared) • Also ranges from 0 to 1 • Is the squared correlation between the alleles at the two loci on the same chromosome
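D • The linkage disequilibrium coefficient (D) is the deviation of the observed frequency of a haplotype from the frequency expected if the two loci were independent: $D = p_{AG} - p_A p_G$ • If two alleles are in LD, D ≠ 0 • Linkage equilibrium means D = 0 • The possible range of D depends on the allele frequencies (|D| ≤ 0.25), so D itself cannot equal 1; complete dependency between the loci corresponds to |D’| = 1, where D’ = D/D_max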
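A minimal sketch of the definition of D, using the frequencies from the 16-chromosome example (8 AG haplotypes, 10 A alleles and 8 G alleles out of 16). The function name is hypothetical.

```python
def linkage_disequilibrium_d(p_AG: float, p_A: float, p_G: float) -> float:
    """D = p_AG - p_A * p_G: observed haplotype frequency minus the
    frequency expected if the two loci were independent."""
    return p_AG - p_A * p_G

print(linkage_disequilibrium_d(8 / 16, 10 / 16, 8 / 16))  # 0.1875
# Linkage equilibrium: haplotype frequency equals the product of allele frequencies.
print(linkage_disequilibrium_d(0.25, 0.5, 0.5))           # 0.0
```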
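Worked example: 16 haplotypes (8 AG, 2 Ag, 0 aG, 6 ag), so n_A = 10, n_a = 6, n_G = 8, n_g = 8
• $r^2 = \frac{(n_{AG} n_{ag} - n_{Ag} n_{aG})^2}{n_A n_a n_G n_g} = \frac{(8 \times 6 - 2 \times 0)^2}{10 \times 6 \times 8 \times 8} = \frac{2304}{3840} = 0.6$
• Because D > 0, $D' = \frac{D}{D_{\max}} = \frac{n_{AG} n_{ag} - n_{Ag} n_{aG}}{\min(n_A n_g,\ n_a n_G)} = \frac{8 \times 6 - 2 \times 0}{\min(10 \times 8,\ 6 \times 8)} = \frac{48}{48} = 1$
• The aG haplotype is never observed, so D’ = 1 (complete LD) even though r² is only 0.6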
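The worked example can be reproduced from the haplotype counts. This sketch assumes both loci are polymorphic and uses Lewontin's normalization for D’ (D_max depends on the sign of D); the function name `ld_measures` is hypothetical.

```python
def ld_measures(n_AG: int, n_Ag: int, n_aG: int, n_ag: int):
    """Return (D, D', r^2) for two biallelic loci from haplotype counts."""
    n = n_AG + n_Ag + n_aG + n_ag
    p_AG = n_AG / n
    p_A = (n_AG + n_Ag) / n          # frequency of allele A at locus 1
    p_G = (n_AG + n_aG) / n          # frequency of allele G at locus 2
    p_a, p_g = 1.0 - p_A, 1.0 - p_G
    if min(p_A, p_a, p_G, p_g) == 0:
        raise ValueError("both loci must be polymorphic")
    d = p_AG - p_A * p_G
    # D_max is the largest |D| allowed by the allele frequencies, for D's sign.
    if d > 0:
        d_max = min(p_A * p_g, p_a * p_G)
    else:
        d_max = min(p_A * p_G, p_a * p_g)
    d_prime = d / d_max              # negative when D < 0; |D'| is usually reported
    r2 = d * d / (p_A * p_a * p_G * p_g)
    return d, d_prime, r2

d, d_prime, r2 = ld_measures(n_AG=8, n_Ag=2, n_aG=0, n_ag=6)
print(round(d_prime, 3), round(r2, 3))  # 1.0 0.6
```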
r² and power • r² is directly related to study power • The lower the r² between a marker and the causal variant, the larger the sample size required to detect the association through that marker • r² × N is the "effective sample size" • If a marker M and causal gene G are in LD, then a study of N cases and controls that genotypes M (but not G) has approximately the same power to detect an association as a study of r² × N cases and controls that genotypes G directly
r² and power • Example: • N = 1000 (500 cases and 500 controls) • r² = 0.4 • Had you genotyped the causal gene directly, you would need a total of only N = 0.4 × 1000 = 400 (200 cases and 200 controls) for the same power
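The effective-sample-size rule works in both directions: r² × N gives the equivalent direct study, and N / r² gives the marker-study size needed to match a direct study. A minimal sketch with hypothetical function names, treating the r² × N relationship as the approximation the slides describe:

```python
def effective_sample_size(n_total: float, r2: float) -> float:
    """Size of a direct (causal-variant) study with the same power as an
    n_total study of a marker in LD r^2 with the causal variant."""
    return r2 * n_total

def required_marker_sample_size(n_direct: float, r2: float) -> float:
    """Approximate total N needed at the marker to match a direct study of n_direct."""
    return n_direct / r2

print(round(effective_sample_size(1000, 0.4)))       # 400
print(round(required_marker_sample_size(400, 0.4)))  # 1000
```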