Statistical Power and Meta-analysis
Pak Sham
International Workshop on Statistical Genetic Methods for Human Complex Traits
March 8, 2017
Significance testing evaluates evidence for association • Question: Does the genotype at a particular locus have an effect on the phenotype? • Calculate a test statistic (e.g. chi-squared) from sample data on phenotype and genotype, then convert the statistic (e.g. by referring to a chi-squared distribution) to a p-value. • If the p-value is below a certain critical value (e.g. 0.05 when testing a single hypothesis, 5×10⁻⁸ in GWAS), then a “significant” result is reported, meaning that there is evidence for an effect
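The pipeline above (statistic → p-value → threshold) can be sketched in Python using the standard 1-df chi-squared allelic test on a 2×2 table. The counts and function names here are illustrative, not from the slides:

```python
import math

def chi2_pvalue_df1(x):
    # survival function of a 1-df chi-squared: P(X > x) = erfc(sqrt(x/2))
    return math.erfc(math.sqrt(x / 2.0))

def allelic_test(a, b, c, d):
    # 2x2 allele-count table: rows = cases/controls, columns = alleles A/a
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    return chi2, chi2_pvalue_df1(chi2)

# hypothetical counts: 240/160 A/a alleles in cases, 200/200 in controls
chi2, p = allelic_test(240, 160, 200, 200)
significant_at_005 = p < 0.05   # single-hypothesis threshold
significant_gwas = p < 5e-8     # genome-wide threshold
```

Note how the same statistic can be "significant" at 0.05 yet far from the genome-wide threshold — the two critical values answer different multiple-testing situations.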
How is effect size defined? • The effect size of a binary (0,1) factor on a trait can be defined as • For quantitative trait - mean trait difference between the 2 groups • For dichotomous trait (e.g. disease status) - log(odds ratio) between the 2 groups • Odds ratio can be estimated from both cohort and case-control data (whereas risk ratio can be estimated only from cohort data) • A log(odds ratio) of 0 represents no effect
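For the dichotomous-trait case above, the log(odds ratio) and its standard error can be computed directly from a 2×2 table. This is a minimal sketch with hypothetical counts; the Woolf standard error shown is one common choice:

```python
import math

def log_odds_ratio(a, b, c, d):
    # 2x2 counts: a = cases with allele, b = controls with allele,
    #             c = cases without,    d = controls without
    ln_or = math.log((a * d) / (b * c))
    se = math.sqrt(1/a + 1/b + 1/c + 1/d)  # Woolf's standard error of ln(OR)
    return ln_or, se

ln_or, se = log_odds_ratio(30, 70, 20, 80)  # hypothetical counts
```

A balanced table gives ln(OR) = 0, matching the slide's "no effect" reference point.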
Complication: dominance • Definition of effect size is complicated for a diploid locus because of the possibility of dominance interaction • For example, the effect of the allele inherited from the father may depend on which allele is inherited from the mother. • The additive effect is defined as the average effect of an allele, averaged over the possible values of the other allele in the genotype: Effect = (mAA − mAa)·P(A) + (mAa − maa)·P(a) • Basic association analysis tests for additive effects • Further association analyses may test for dominance and epistatic interactions
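The averaging in the additive-effect formula is simple enough to write out directly. The genotype means and allele frequency below are hypothetical values chosen to show the role of dominance:

```python
def additive_effect(m_AA, m_Aa, m_aa, p):
    # average effect of allele A, weighting each substitution by the
    # frequency of the "other" allele: P(A) = p, P(a) = 1 - p
    return (m_AA - m_Aa) * p + (m_Aa - m_aa) * (1 - p)

# purely additive locus: heterozygote midway between homozygotes
no_dom = additive_effect(2.0, 1.0, 0.0, p=0.3)    # same at any p
# complete dominance of A: heterozygote equals the AA homozygote
full_dom = additive_effect(2.0, 2.0, 0.0, p=0.3)  # depends on p
```

With no dominance the additive effect is frequency-independent; with dominance it shifts with the allele frequency, which is why the additive effect is defined as an average.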
Two types of errors in significance testing

                               True state: H0     True state: H1
Not significant                Correct            Type 2 error
Significant (H0 rejected)      Type 1 error       Correct
Type 2 error probability and statistical power • Type 2 error probability (β) = probability of not rejecting H0 given that H1 is true • Statistical power = 1 − β = probability of rejecting H0 given that H1 is true • But: how to define H1? H1 can range from a tiny to a huge difference from H0 (the effect size can range from very close to 0 to very far from 0) • The bigger the effect size, the higher the statistical power
Other determinants of statistical power Type 1 error rate • The more stringent (smaller) we set the critical p-value for rejecting H0, the lower the statistical power Sample size • The larger the sample, the higher the statistical power
Importance of adequate statistical power • To not miss a real effect • To reduce the problem of non-replication of significant findings • Two main reasons for non-replication • Under-powered replication study (Type 2 error) • Original result being false positive (Type 1 error) • Inadequate statistical power contributes to high false positive report rate (proportion of significant results that are false positives)
Both type 1 and type 2 error rates affect the false positive report rate • Of n tests, nπ0 are under H0 and n(1−π0) are under H1 • A = nπ0(1−α): not significant under H0 (true negatives) • B = nπ0α: significant under H0 (false positives) • C = n(1−π0)β: not significant under H1 (false negatives) • D = n(1−π0)(1−β): significant under H1 (true positives) • What is the false positive report rate? B/(B+D)
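The B/(B+D) calculation above reduces to a one-line function (n cancels). The π0, α and β values below are hypothetical, chosen to show how a low prior probability plus low power inflates the rate:

```python
def false_positive_report_rate(pi0, alpha, beta):
    # pi0: prior proportion of tested hypotheses that are truly null
    # B = pi0 * alpha          : expected fraction of false positives
    # D = (1 - pi0) * (1-beta) : expected fraction of true positives
    B = pi0 * alpha
    D = (1 - pi0) * (1 - beta)
    return B / (B + D)

# 99% null hypotheses, alpha = 0.05, 50% power: most "hits" are false
fprr_lenient = false_positive_report_rate(pi0=0.99, alpha=0.05, beta=0.5)
# same prior and power, but a genome-wide alpha: very few hits are false
fprr_strict = false_positive_report_rate(pi0=0.99, alpha=5e-8, beta=0.5)
```

This is the quantitative version of the slide's point: stringent type 1 error control and adequate power both push the false positive report rate down.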
Reasons for high false positive report rate • Low prior probability of association • Appropriate prioritization of variants according to functional annotation may increase prior probability of association • Inadequate control of type 1 error rate • Type 1 error rate should be sufficiently stringent to take account of multiple testing • Inadequate statistical power • Sample size should be large enough for complex disorders where genetic effects are likely to be small
Critical assumption: effect size • Problem: usually we do not know what the true effect size is, if an effect is present. • Can we make the statistical power higher simply by setting a larger effect size? • Unfortunately, setting a larger effect size in a power calculation doesn’t make the true effect size any larger. Critical question: What is a realistic effect size?
How to set the effect size • Replication study • Effect size of original study (with downward adjustment for winner’s curse if original study involved multiple testing) • Original study • Typical effect sizes found by previous studies of similar phenotypes and similar genetic variants • Often desirable to consider a range of plausible effect sizes and present results in tables or graphs
Illustrative sample size plot • [Figure: sample size curves for OR = 1.2, 1.3, 1.5 and 2.0; Wang et al. (2005)]
What is the winner’s curse? • Suppose 100 independent SNPs on a SNP chip have identical allele frequency and effect size, such that each has 1% power to reach critical genome-wide significance in a particular study • The probability that at least one SNP achieves genome-wide significance is 1 − (0.99)^100 ≈ 0.63 • The estimated effect size of the most significant SNP is expected to be much greater than its true effect size (i.e. biased upwards) • A replication study with identical design and sample size has only a 1% chance of replicating this SNP at the same genome-wide level of significance • Power calculation based on the effect size estimate of the original study will be grossly over-optimistic
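The 1 − (0.99)^100 calculation, and the upward bias of the top hit, can both be checked numerically. The simulation is a deliberately simplified sketch: 100 SNPs sharing a hypothetical true effect of 0.05 with standard error 0.05, and "most significant" approximated by the largest estimate:

```python
import random

# probability that at least one of 100 SNPs, each with 1% power, is significant
p_any_hit = 1 - 0.99 ** 100   # ~0.63

# toy simulation of winner's curse bias in the top hit's estimate
random.seed(42)
true_beta, se, n_snps, n_reps = 0.05, 0.05, 100, 200
top = []
for _ in range(n_reps):
    estimates = [random.gauss(true_beta, se) for _ in range(n_snps)]
    top.append(max(estimates))     # most extreme estimate in this "study"
mean_top = sum(top) / n_reps       # far above true_beta
```

The average top estimate lands several standard errors above the true effect, which is exactly why plugging it into a replication power calculation is over-optimistic.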
Statistical power is related to effect size estimation • Statistical power is the probability that the test statistic exceeds a critical value. • The (chi-squared) test statistic is approximately • (Estimated effect size)2/(Variance of estimate) • The expected values of the test statistic under H1 and under H0 differ by • (True effect size)2/(Variance of estimate) • This quantity is known as the non-centrality parameter (NCP) • 1/(Variance of estimate) is known as Fisher’s statistical “information”
Power calculation via NCP • The inputs to a power calculation — sample size N, effect size e, allele frequency p and type 1 error rate α — determine the NCP, which in turn determines power • The NCP is linearly related to N, e² and p(1−p) • NCP is often a convenient intermediate step in calculating power • The relationship between NCP and power is monotonic but non-linear
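The NCP → power step for a 1-df test can be done with only the normal CDF, since a non-central 1-df chi-squared statistic is (Z + √NCP)² with Z standard normal. This sketch uses a bisection-based normal quantile so it needs no external libraries:

```python
import math

def norm_cdf(x):
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

def norm_ppf(q):
    # inverse normal CDF by bisection (sufficient precision for illustration)
    lo, hi = -10.0, 10.0
    for _ in range(100):
        mid = (lo + hi) / 2
        if norm_cdf(mid) < q:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def power_1df(ncp, alpha):
    # 1-df non-central chi-squared statistic is (Z + delta)^2, delta = sqrt(NCP),
    # so power = P(Z > z_crit - delta) + P(Z < -z_crit - delta)
    z_crit = norm_ppf(1 - alpha / 2)
    delta = math.sqrt(ncp)
    return (1 - norm_cdf(z_crit - delta)) + norm_cdf(-z_crit - delta)
```

Sanity checks: NCP = 0 returns power = α, and NCP ≈ 7.85 gives the textbook 80% power at α = 0.05. Because the NCP is linear in N, doubling N doubles the NCP — but power rises non-linearly, as the slide notes.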
Power and NCP (df=1) • [Figure: power plotted against NCP for a 1-df test, one curve per α = 0.01, 0.001, 0.0001, 0.00001, 0.000001, 0.0000001]
Example: Quantitative phenotype Linear regression model: Y = α + βX + ε X is genotype, coded as 0, 1, 2 H0: β=0, usually t-test or F-test In large samples, t ≈ Normal, F ≈ Chi-squared
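The regression test described above fits in a few lines of ordinary least squares. The toy genotype/phenotype vectors are hypothetical, and the t statistic is the usual slope-over-standard-error:

```python
import math

def regress_additive(genotypes, phenotypes):
    # OLS regression of phenotype Y on genotype dosage X coded 0, 1, 2
    n = len(genotypes)
    mx = sum(genotypes) / n
    my = sum(phenotypes) / n
    sxx = sum((x - mx) ** 2 for x in genotypes)
    sxy = sum((x - mx) * (y - my) for x, y in zip(genotypes, phenotypes))
    beta = sxy / sxx
    alpha = my - beta * mx
    ss_res = sum((y - alpha - beta * x) ** 2
                 for x, y in zip(genotypes, phenotypes))
    se = math.sqrt(ss_res / (n - 2) / sxx)  # standard error of beta
    return beta, beta / se                  # slope and t statistic for H0: beta = 0

# hypothetical toy data: phenotype rises roughly 0.45 per allele
beta, t = regress_additive([0, 0, 1, 1, 2, 2], [1.0, 1.2, 1.4, 1.6, 1.9, 2.1])
```

In a real GWAS sample n is large, so this t statistic is treated as approximately normal and t² as a 1-df chi-squared, per the slide.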
Power loss from indirect association • NCP also simplifies power calculation when the test SNP does not have a direct effect on the phenotype but is in LD with one that does (indirect association) • If the LD between the test SNP and the causal SNP has magnitude r², then the NCP at the test SNP is equal to the NCP at the causal SNP attenuated by a factor of r² • In other words, the sample size to achieve equivalent power is increased by a factor of 1/r² Sham et al, Am J Hum Genet 2000
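The two statements above — NCP attenuation by r² and sample-size inflation by 1/r² — are one-liners, written out here only to make the bookkeeping explicit (the numbers in the calls are hypothetical):

```python
def ncp_at_tag_snp(ncp_causal, r2):
    # NCP at a tag SNP in LD r^2 with the causal SNP is attenuated by r^2
    return r2 * ncp_causal

def equivalent_sample_size(n_causal, r2):
    # NCP is linear in N, so matching power at the tag SNP needs N / r^2
    return n_causal / r2

ncp_tag = ncp_at_tag_snp(10.0, 0.8)        # 10 at the causal SNP -> 8 at the tag
n_needed = equivalent_sample_size(1000, 0.8)  # 1000 -> 1250 individuals
```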
Power gain from extreme phenotypic selection • Under a polygenic model, selecting individuals with extreme (very low or very high) phenotypic values for genotyping can improve study efficiency • NCP_S / NCP_P = Var_S / Var_P (S = selected sample, P = full population)
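Assuming a standard-normal phenotype (an assumption of this sketch, not stated on the slide), the variance ratio Var_S/Var_P for two-tail selection follows from the truncated-normal identity E[X² | X > a] = 1 + a·φ(a)/(1 − Φ(a)):

```python
import math

SQRT2PI = math.sqrt(2 * math.pi)

def norm_pdf(x):
    return math.exp(-x * x / 2) / SQRT2PI

def norm_cdf(x):
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

def variance_ratio_two_tail(fraction):
    # genotype the top `fraction` and bottom `fraction` of a standard normal
    # phenotype; return Var_S / Var_P using
    # E[X^2 | |X| > a] = 1 + a * pdf(a) / fraction, where P(X > a) = fraction
    lo, hi = 0.0, 10.0
    for _ in range(100):          # bisection for the selection threshold a
        mid = (lo + hi) / 2
        if 1 - norm_cdf(mid) > fraction:
            lo = mid
        else:
            hi = mid
    a = (lo + hi) / 2
    return 1 + a * norm_pdf(a) / fraction

ratio = variance_ratio_two_tail(0.1)  # genotype the extreme 10% in each tail
```

Genotyping only the extreme 10% tails roughly triples the NCP per genotyped individual, which is the efficiency gain the slide refers to.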
A simple genetic power calculation tool Genetic power calculator http://pngu.mgh.harvard.edu/~purcell/gpc/ Purcell, Cherny and Sham, Bioinformatics, 2003
What is meta-analysis? • Literally means analysis of analyses, sometimes known as “quantitative literature review” • Multiple studies have tried to answer a question, but none is large enough to provide a definitive answer • The studies may collectively contain enough information to provide a definitive answer, if only their data can be combined. • However, it may be very laborious (or impossible) to obtain, combine and analyze the raw data from all the studies • Fortunately, most of the relevant information in the studies is captured in the usually reported summary statistics, e.g. test statistic, p-value, effect size estimate and its standard error • Meta-analysis combines such summary statistics to address the research question
Need for meta-analysis in complex disease genetics • Complex disorders are polygenic with many variants each contributing a small effect • Most individual studies are under-powered • Meta-analysis of summary statistics from individual studies offers a way of enhancing power and producing robust (i.e. replicable) association results
Inverse-variance method The most common method of meta-analysis is to combine the effect size estimates (e.g. lnOR), weighting each estimate by the inverse of its variance. The overall estimate, b, is given by b = (b1/v1 + b2/v2 + b3/v3 + ….) / (1/v1 + 1/v2 + 1/v3 + ….) The variance of this overall estimate is given by v = 1 / (1/v1 + 1/v2 + 1/v3 + ….) An overall chi-squared test statistic (1 df) is given by b² / v
Weighted β: b = Σᵢ wᵢbᵢ / Σᵢ wᵢ, where wᵢ = 1/vᵢ
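The inverse-variance method above translates directly into code. The two ln(OR) estimates and variances are hypothetical:

```python
def inverse_variance_meta(betas, variances):
    # fixed-effect inverse-variance meta-analysis of per-study estimates
    weights = [1.0 / v for v in variances]
    b = sum(w * bi for w, bi in zip(weights, betas)) / sum(weights)
    v = 1.0 / sum(weights)
    chi2 = b * b / v   # 1-df chi-squared statistic for the overall effect
    return b, v, chi2

# hypothetical ln(OR) estimates and their variances from two studies
b, v, chi2 = inverse_variance_meta([0.2, 0.3], [0.01, 0.04])
```

Note that the more precise study (variance 0.01) pulls the combined estimate toward its own value, and the combined variance is smaller than either study's alone — the power gain from meta-analysis.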
Meta-analysis based on p-values • Unfortunately some studies may report p-values but not effect size estimates • One possible solution is to calculate estimated effect sizes (and standard errors) from p-values (this requires knowledge of allele frequencies) and then proceed with the inverse-variance method • Another solution is to perform meta-analysis based on the p-values directly
Known direction of effect • If, in addition to the p-value, the direction of each effect is available (i.e. whether the variant allele increases or decreases risk, relative to the reference allele), then an approach based on combination of signed normal test statistics is feasible • First convert each p-value to a chi-squared statistic, using the inverse chi-squared distribution function • Then take the positive square root of each chi-squared statistic (Z), and flip the sign to negative if the variant allele decreases risk • An overall normal test statistic is obtained by combining the signed test statistics, weighting each statistic by the square root of its sample size, and dividing by the square root of the total sample size
Weighted Z: Z = Σᵢ Zᵢ√Nᵢ / √(Σᵢ Nᵢ)
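Assuming each study's signed Z has already been derived from its p-value and direction (the conversion step described above), the combination is a weighted sum. The Z scores and sample sizes here are hypothetical:

```python
import math

def weighted_z_meta(z_scores, sample_sizes):
    # sample-size-weighted combination of signed Z statistics:
    # Z = sum(Z_i * sqrt(N_i)) / sqrt(sum(N_i))
    num = sum(z * math.sqrt(n) for z, n in zip(z_scores, sample_sizes))
    return num / math.sqrt(sum(sample_sizes))

z = weighted_z_meta([2.0, 1.5], [1000, 1000])  # two equal-sized studies
```

Two same-direction signals reinforce each other; opposite-direction signals of equal weight cancel, which is why the direction of effect must be known for this method.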
Unknown direction of effect • Fisher’s method: sum the quantities −2·ln(pᵢ), each a 2-df chi-squared statistic, and compare the total to a chi-squared distribution with 2k df • Correlation of p-values from the two methods ≈ 0.99 • Chi-squared statistics can be weighted by sample size • Some power loss is expected compared to the inverse-variance or weighted Z methods
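Fisher's method is a short computation; since the combined statistic always has an even number of degrees of freedom, its p-value also has a closed form. The two input p-values are hypothetical:

```python
import math

def fishers_method(p_values):
    # Fisher's combined statistic: X = -2 * sum(ln p_i) ~ chi-squared, 2k df
    stat = -2.0 * sum(math.log(p) for p in p_values)
    return stat, 2 * len(p_values)

def chi2_sf_even_df(x, df):
    # survival function of a chi-squared with even df (closed form):
    # P(X > x) = exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!, with k = df/2
    k = df // 2
    term, total = 1.0, 1.0
    for i in range(1, k):
        term *= (x / 2) / i
        total += term
    return math.exp(-x / 2) * total

stat, df = fishers_method([0.05, 0.05])  # two studies, directions unknown
p_combined = chi2_sf_even_df(stat, df)
```

Two studies each at p = 0.05 combine to roughly p ≈ 0.017 — stronger than either alone, but note the slide's caveat about power loss relative to the directional methods.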
Why not random effects? • The above methods represent “fixed effect” (FE) meta-analysis, which assumes a single effect size β underlying all the data • Random effects (RE) meta-analysis allows β to differ in different studies / populations • Since variation in β is likely to exist, why is FE meta-analysis generally preferred in genetics? • H0: β = 0 for all populations • H0: E(β) = 0 across populations • The first H0 is more appropriate, but the RE model is designed to test the second H0 Han & Eskin, 2011, AJHG
Some practical issues • Most of the work in a meta-analysis is the collection and preparation of the summary statistics in a form that can be combined • For the summary statistics to be comparable, it is necessary, as far as possible, to ensure: • Uniform phenotype definition • Common set of SNPs (by imputation if necessary) • Consistent calling of the alleles (no flipping) • Same coding scheme for genotypes (e.g. 0, 1, 2) • Uniform analysis method, e.g. logistic regression allowing for principal components
Unambiguous alleles ATCTGGT[A/C]CTCCAT TAGACCA[T/G]GAGGTA • A is equivalent to T • C is equivalent to G • No ambiguity across datasets, even if some studies label the two alleles as A/C and other studies label them as T/G
Ambiguous alleles • An annoying problem: ATCTGGT[A/T]CTCCAT TAGACCA[T/A]GAGGTA • Allele A in one study may be labeled as T in another • G/C SNPs have the same problem
Resolving ambiguous alleles • Two ambiguous SNP types: A/T and G/C • Flip alleles if the probe sequence is complementary to the reference sequence: http://www.well.ox.ac.uk/~wrayner/strand/ • Allele flipping is suspected if there is heterogeneity in • allele frequency – e.g. the “same” allele having frequency 0.1 in some datasets but 0.9 in others • linkage disequilibrium – the “same” allele being in positive LD with a nearby allele in some datasets, but negative LD with the same nearby allele in others • effect size – the “same” allele having a positive effect in some datasets but a negative effect in others
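The strand logic from the last three slides — complementing alleles, detecting the ambiguous A/T and G/C cases, and matching allele pairs across datasets — can be sketched as:

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}

def is_strand_ambiguous(a1, a2):
    # A/T and C/G SNPs: the complement is the same allele pair, so a
    # strand flip cannot be detected from the allele labels alone
    return COMPLEMENT[a1] == a2

def same_snp(a1, a2, b1, b2):
    # allele pairs match directly, or after complementing one dataset's strand
    direct = {a1, a2} == {b1, b2}
    flipped = {COMPLEMENT[a1], COMPLEMENT[a2]} == {b1, b2}
    return direct or flipped
```

An A/C SNP labelled T/G elsewhere is recognized as the same SNP after complementing; for A/T and C/G SNPs this check is uninformative, and the frequency/LD/effect-direction heterogeneity checks from the slide are needed instead.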