Adapted from: Wulff HR, Andersen B, Brandenhoff P, Guttler F (1987): What do doctors know about statistics? Statistics i

Suppose we conduct a t-test of the difference between two means and obtain a p-value < .05. Does this mean:
• There is less than a 5% chance that the results are due to chance.
• If there really is no difference between the population means, there is less than a 5% chance of obtaining a difference this large or larger.
• There is a 95% chance that if the study is repeated, the result will be replicated.
• There is a 95% chance that there is a real difference between the two population means.
Adapted from: Wulff HR, Andersen B, Brandenhoff P, Guttler F (1987): What do doctors know about statistics? Statistics in Medicine 6:3-10

What is a p-value? The probability, computed under the assumption that the null hypothesis is true, of obtaining a test statistic (data) that departs from what the null hypothesis predicts as much as or more than the observed test statistic (data).
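To make the definition concrete, here is a minimal sketch in Python. It swaps the t-test for an exact permutation test, because a permutation null (group labels are exchangeable) can be enumerated directly for small samples. The function name `exact_permutation_p_value` and the example data are illustrative assumptions, not part of the original slides.

```python
from itertools import combinations

def exact_permutation_p_value(group_a, group_b):
    """Two-sided exact permutation p-value for a difference in means.

    Null model (an assumption of this sketch): group labels are
    exchangeable, so every split of the pooled data into groups of
    the observed sizes is equally likely.
    """
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    n_b = len(pooled) - n_a
    total = sum(pooled)

    def abs_mean_diff(sum_a):
        # Absolute difference in group means, given the sum of group A.
        return abs(sum_a / n_a - (total - sum_a) / n_b)

    observed = abs_mean_diff(sum(group_a))
    n_splits = 0
    n_extreme = 0
    for idx in combinations(range(len(pooled)), n_a):
        n_splits += 1
        # Count splits that depart as much as or more than the observed one
        # (small tolerance guards against floating-point ties).
        if abs_mean_diff(sum(pooled[i] for i in idx)) >= observed - 1e-12:
            n_extreme += 1
    return n_extreme / n_splits

# Hypothetical data: two clearly separated groups of five values each.
p = exact_permutation_p_value([1, 2, 3, 4, 5], [11, 12, 13, 14, 15])
print(p)
```

The returned value is the fraction of equally likely relabelings whose difference is at least as large as the observed one. It is a statement about the data given the null, not about the probability that the null is true.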

Which Null Hypotheses are Meaningful and Testable? Those that precisely specify a probability model for the data.

A Perspective
• We study: Samples (Data)
• We wish to obtain knowledge about: Populations (Nature)

Gene Family-Based Hypothesis Testing
• Sketch of Typical (outmoded and inappropriate) Approach:
  • For Genes 1 to K, define a vector, R, of length K that contains the values of a categorical variable denoting group membership.
  • For Genes 1 to K, define a vector, C, of length K that contains the values of a binary variable denoting whether or not the gene was ‘significant’ or ‘interesting’ by some standard.
  • Conduct some frequentist significance test for an association between R and C.
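The typical approach sketched above can be illustrated as follows. This is a sketch only: it reduces R to membership in a single family versus all other genes and uses Fisher's exact test as the "frequentist significance test", both of which are assumptions made for illustration. The helper names and the toy data are hypothetical. Note that the test treats genes as independent observations and discards everything about each gene except the binary flag, which is part of why the slides call this approach outmoded.

```python
from math import comb

def contingency_table(R, C, family):
    """Cross-tabulate family membership (from R) against the binary flag C."""
    table = [[0, 0], [0, 0]]
    for r, flag in zip(R, C):
        row = 0 if r == family else 1   # in the family vs. rest of genome
        col = 0 if flag else 1          # 'significant' vs. not
        table[row][col] += 1
    return table

def fisher_exact_two_sided(table):
    """Two-sided Fisher exact p-value for a 2x2 table.

    Sums the hypergeometric probabilities of all tables with the same
    margins that are no more probable than the observed table.
    """
    (a, b), (c, d) = table
    row1, row2 = a + b, c + d
    col1 = a + c
    n = row1 + row2

    def prob(x):
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    return sum(prob(x) for x in range(lo, hi + 1)
               if prob(x) <= p_obs * (1 + 1e-9))

# Hypothetical toy data for K = 8 genes.
R = ["family_c"] * 4 + ["other"] * 4
C = [1, 1, 1, 1, 0, 0, 0, 0]
table = contingency_table(R, C, "family_c")
p = fisher_exact_two_sided(table)
print(table, p)
```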

The Independence Issue: A Real Example

Gene Family-Based Hypothesis Testing
• Which Null Hypothesis is Being Tested?
  • None of the genes in family c are differentially expressed (associated, methylated, etc.).
  • The proportion of genes in family c that are differentially expressed is equal to the proportion of genes in the remainder of the genome that are differentially expressed (beware of ‘anti-Bayesian’ element).
  • The proportion of genes in family c that are differentially expressed to an extent greater than a specified threshold is equal to the proportion of genes in the remainder of the genome that are differentially expressed.
• Note: These can all be subsumed under the general:
  • H0:

Union-Intersection vs Intersection-Union Tests
Union-Intersection
• The compound hypothesis is rejected if any one of the individual hypotheses is rejected.
• A multiplicity adjustment procedure is required to control the type I error rate.
• The rejection region for this test is the union of the rejection regions corresponding to the individual tests.
Intersection-Union
• The compound hypothesis is rejected only if all of the individual hypotheses are rejected.
• An overall type I error rate of α is maintained without multiplicity adjustment.
• The rejection region for this test is the intersection of the rejection regions corresponding to the individual tests.

When P << N, methods are well established (e.g., multiple regression). When P >> N, optimal methods are not yet clear and methods are not yet well established. Bayesian methods involving posterior probabilities in place of p-values may be especially useful.
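The two decision rules for compound hypotheses can be sketched in a few lines of Python. The slides do not name a multiplicity adjustment, so the Bonferroni correction used for the union-intersection rule here is an assumed choice; the function names and p-values are likewise hypothetical.

```python
def union_intersection_reject(p_values, alpha=0.05):
    """Reject the compound null if ANY individual test rejects.

    Bonferroni (an assumed choice of adjustment) compares each p-value
    with alpha / k, so rejecting when the smallest p-value passes keeps
    the overall type I error rate at or below alpha.
    """
    k = len(p_values)
    return min(p_values) <= alpha / k

def intersection_union_reject(p_values, alpha=0.05):
    """Reject the compound null only if ALL individual tests reject.

    Each test is run at level alpha with no adjustment, so the decision
    depends on the largest p-value.
    """
    return max(p_values) <= alpha

# Hypothetical p-values from three individual tests.
p_values = [0.001, 0.04, 0.20]
ui = union_intersection_reject(p_values)
iu = intersection_union_reject(p_values)
print(ui, iu)
```

With these values the union-intersection rule rejects, because one test is significant even after adjustment, while the intersection-union rule does not, because one test fails to reject.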

What assumptions are being made?
• Normality?
• Exchangeability?
• Independence?
• Other?
• Non-Parametric: Non-Panacea (Cohen, J.)
• Asymptotic ≠ Exact

Major Issues to Ask About in Selecting a Method for Gene Family or Pathway Testing
• What is the null?
• Does the method assume that all components (e.g., SNPs or gene expression levels) are independent?
• Is the method ‘anti-Bayesian’?
• Does the method use the continuity of information (not simply significant or not)?
