z-test and t-test
Xuhua Xia
xxia@uottawa.ca
http://dambe.bio.uottawa.ca
Properties of a Normal Distribution
68.27% of the measurements lie within $\mu \pm \sigma$, 95.44% within $\mu \pm 2\sigma$, 99.73% within $\mu \pm 3\sigma$, 50% within $\mu \pm 0.67\sigma$, 95% within $\mu \pm 1.96\sigma$, 97.5% within $\mu \pm 2.24\sigma$, 99% within $\mu \pm 2.58\sigma$, 99.5% within $\mu \pm 2.81\sigma$, and 99.9% within $\mu \pm 3.29\sigma$.
Given $\mu = 70$ kg and $\sigma = 10$ kg for a normal distribution (of body weight), what is the probability of a body weight of 40 kg belonging to the population?
The normal deviate: $z = (x - \mu)/\sigma$
Standard deviation and standard error of the mean: $s = \sqrt{\sum (x - \bar{x})^2/(n - 1)}$, $s_{\bar{x}} = s/\sqrt{n}$
The standard deviate pertaining to the normal distribution of means: $z = (\bar{x} - \mu)/\sigma_{\bar{x}}$
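The coverage percentages and the body-weight question above can be checked numerically. The following is a minimal sketch using Python's standard `statistics.NormalDist`; the helper name `coverage` is an assumption made for illustration and is not part of the slides.

```python
from statistics import NormalDist

std_normal = NormalDist()  # standard normal: mean 0, standard deviation 1


def coverage(k: float) -> float:
    """Probability that a normal measurement lies within k standard deviations of the mean."""
    return std_normal.cdf(k) - std_normal.cdf(-k)


# Multipliers listed on the slide.
for k in (0.67, 1.0, 1.96, 2.0, 2.24, 2.58, 2.81, 3.0, 3.29):
    print(f"within {k:4.2f} standard deviations: {100 * coverage(k):6.2f}%")

# Body-weight question from the slide: mu = 70 kg, sigma = 10 kg, x = 40 kg.
mu, sigma, x = 70.0, 10.0, 40.0
z = (x - mu) / sigma           # normal deviate
p_lower = std_normal.cdf(z)    # probability of a weight of 40 kg or less
print(f"z = {z:.2f}, P(X <= 40 kg) = {p_lower:.5f}")
```

A weight of 40 kg is three standard deviations below the mean, so only a small fraction of the population is expected to be that light or lighter.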
The z-score
The government has certain regulations on commercial products. Suppose that packages of sugar labeled as 2 kg should have a mean weight of 2 kg and a standard deviation of 0.10 kg. If a package of sugar labeled 2 kg that you bought from a store weighs 1.82 kg, what is the z-score? Can you present the package as evidence that the manufacturer has violated the government regulation?
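As a rough illustration of the sugar-package question, the sketch below computes the z-score and the one-tailed probability of a package at least this light, assuming package weights are normally distributed. The function name `z_score` is hypothetical.

```python
from statistics import NormalDist


def z_score(x: float, mu: float, sigma: float) -> float:
    """Number of standard deviations by which x departs from the mean."""
    return (x - mu) / sigma


# Values from the slide: labeled mean 2 kg, standard deviation 0.10 kg, observed 1.82 kg.
z = z_score(1.82, 2.0, 0.10)

# One-tailed probability of a package this light or lighter, if the regulation is met.
p_one_tailed = NormalDist().cdf(z)

print(f"z = {z:.2f}, one-tailed p = {p_one_tailed:.4f}")
```

Whether one light package counts as evidence depends on the significance level and on whether a one-tailed or two-tailed criterion is adopted; the code only supplies the numbers for that judgment.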
Normal Distribution
[Figure: histogram of the body weight of 10,000 adult men (mean = 70 kg, standard deviation = 10 kg); x-axis: body weight, y-axis: frequency.]
Frequency Distribution of Means
[Figure: histogram of sample means; x-axis: body weight, y-axis: frequency.]
Darwin’s Breeding Experiment
Is the mean difference significantly larger than 0?
Wrong method, assuming a normal distribution: $\bar{D} = 20.933$; $s_D = 37.744$; $n = 15$; $z = \bar{D}/(s_D/\sqrt{n}) = 2.148$. Therefore, the mean difference is significantly larger than zero, i.e., inbreeding does reduce seed production.
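The normal-distribution calculation that this slide labels as the wrong method can be reproduced from the three summary values. This is a sketch only; it assumes that 20.933 is the mean paired difference and 37.744 is the standard deviation of the differences, and the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

# Summary values from the slide (interpretation assumed, see the text above).
mean_diff = 20.933   # mean of the paired differences
sd_diff = 37.744     # standard deviation of the paired differences
n = 15               # number of pairs

se = sd_diff / sqrt(n)                 # standard error of the mean difference
z = mean_diff / se                     # treated as a standard normal deviate
p_upper = 1.0 - NormalDist().cdf(z)    # one-tailed probability under normality

print(f"SE = {se:.3f}, z = {z:.3f}, one-tailed p = {p_upper:.4f}")
```

Treating the ratio as a normal deviate ignores the extra uncertainty from estimating the standard deviation with only 15 pairs, which is the small-sample problem the deck turns to next.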
Problem of Small Samples
I may premise that if we took by chance a dozen or score of men belonging to two nations and measured them, it would I presume be very rash to form any judgment from such small numbers on their (the nation’s) average heights. But the case is somewhat different with my … plants, as they were exactly of the same age, were subjected from first to last to the same conditions, and were descended from the same parents.
-- Darwin, quoted in Fisher’s The design of experiments.
William S. Gosset & t Distribution
The t distribution is wider and flatter than the normal distribution.
[Figure: the normal distribution and the t distribution overlaid; x-axis: body weight, y-axis: frequency.]
t distribution
• The t distribution depends on the degrees of freedom (DF). For Darwin’s data with a sample size of 15, DF = 15 - 1 = 14.
• For the t distribution with DF = 14, we expect 95% of the observations to fall within the range of mean ± 2.145 STD.
• Remember that for a normal distribution, 95% of the observations are expected to fall within the range of $\mu \pm 1.96\sigma$.
• For the paired-sample t-test with the null hypothesis being Mean1 = Mean2 (or MeanD = 0): $t = \mathrm{Mean}_D / \mathrm{SE}_D$, where $\mathrm{SE}_D = \mathrm{STD}_D/\sqrt{n}$.
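A paired-sample t statistic for Darwin's data can be sketched from the summary values quoted earlier in the deck (mean difference 20.933, standard deviation 37.744, n = 15, taken here as assumptions about what those numbers denote). The critical value 2.145 for DF = 14 is the one given on this slide; it is hard-coded rather than computed.

```python
from math import sqrt

mean_diff = 20.933    # mean of the paired differences (from the deck)
sd_diff = 37.744      # standard deviation of the differences (from the deck)
n = 15
df = n - 1            # degrees of freedom for a paired-sample t-test
t_critical = 2.145    # two-tailed 5% critical value for DF = 14, as quoted on the slide

se_diff = sd_diff / sqrt(n)
t = mean_diff / se_diff

# 95% confidence interval for the mean difference under the t distribution.
lower = mean_diff - t_critical * se_diff
upper = mean_diff + t_critical * se_diff

print(f"t = {t:.3f} with DF = {df}; critical value = {t_critical}")
print(f"95% CI for the mean difference: ({lower:.3f}, {upper:.3f})")
```

The statistic exceeds the critical value only narrowly, and the lower confidence limit sits just above zero, so the evidence is much weaker than a comparison against 1.96 would suggest.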
T-Test
• T-Test can be used to test
  • the difference in mean between two samples (paired or unpaired),
  • a sample mean against a mean of a known population (e.g., the concentration of a medicine set as a standard by the government),
  • whether a single individual observation belongs to a sample with sample size larger than one.
• The normal distribution and the Student’s t distribution. Why should the statistic t take into consideration both the mean difference and the variance?
• How to apply the test using Excel or SAS.
• The assumptions.
• Alternative methods: Wilcoxon rank-sum test or Mann-Whitney U test.
The Essence of the t Statistic
[Figure: pairs of samples plotted on number lines; panel labels: "Same variance, smaller mean difference" and "Same mean difference, larger variance".]
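The two comparisons named in the figure can be illustrated numerically. The sketch below uses made-up summary values (mean differences of 12 and 6, variances of 16 and 64, seven observations per sample); none of these numbers come from the slides, and `t_from_summary` is a hypothetical helper.

```python
from math import sqrt


def t_from_summary(mean_diff: float, variance: float, n: int) -> float:
    """t statistic for two samples of equal size n that share the same variance."""
    se_diff = sqrt(variance * (2.0 / n))   # standard error of the difference in means
    return mean_diff / se_diff


# Hypothetical scenarios, chosen only to show the direction of the effect.
baseline = t_from_summary(mean_diff=12.0, variance=16.0, n=7)
smaller_difference = t_from_summary(mean_diff=6.0, variance=16.0, n=7)
larger_variance = t_from_summary(mean_diff=12.0, variance=64.0, n=7)

print(f"baseline t                    = {baseline:.3f}")
print(f"same variance, smaller diff   = {smaller_difference:.3f}")
print(f"same diff, larger variance    = {larger_variance:.3f}")
```

Halving the mean difference or quadrupling the variance shrinks t by the same factor, which is why the statistic must account for both quantities.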
More on variance and SE Two independent variables: x1, x2 sampled from two normal distributions
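The formulas for this slide did not survive extraction. As a hedged reminder of the standard results that usually accompany this setup (not necessarily the exact expressions the slide showed), for independent $x_1$ and $x_2$ with variances $\sigma_1^2$ and $\sigma_2^2$ and sample sizes $n_1$ and $n_2$:

```latex
\begin{align*}
\operatorname{Var}(x_1 - x_2) &= \sigma_1^2 + \sigma_2^2 \\
\mathrm{SE}_{\bar{x}_1 - \bar{x}_2} &= \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}} \\
s_p^2 &= \frac{(n_1 - 1)\,s_1^2 + (n_2 - 1)\,s_2^2}{n_1 + n_2 - 2} \\
t &= \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{s_p^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}
\end{align*}
```

The pooled variance $s_p^2$ assumes the two populations share a common variance; the symbols $s_1^2$, $s_2^2$, and $s_p^2$ are introduced here for illustration.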
Computation for unpaired t-test
DF = (7 - 1) + (7 - 1) = 12
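A pooled-variance unpaired t-test for two samples of seven observations can be sketched as follows. The data values are hypothetical (the slide's own data did not survive extraction), and `unpaired_t` is an illustrative helper, not a library function.

```python
from math import sqrt
from statistics import mean, variance


def unpaired_t(sample1, sample2):
    """Pooled-variance two-sample t statistic and its degrees of freedom."""
    n1, n2 = len(sample1), len(sample2)
    df = (n1 - 1) + (n2 - 1)
    # Pooled variance: weighted average of the two sample variances.
    pooled_var = ((n1 - 1) * variance(sample1) + (n2 - 1) * variance(sample2)) / df
    se_diff = sqrt(pooled_var * (1.0 / n1 + 1.0 / n2))
    t = (mean(sample1) - mean(sample2)) / se_diff
    return t, df


# Hypothetical samples, seven observations each.
group_a = [12, 14, 16, 18, 20, 22, 24]
group_b = [6, 8, 10, 12, 14, 16, 18]

t, df = unpaired_t(group_a, group_b)
print(f"t = {t:.3f}, DF = {df}")
```

The resulting t would then be compared with the critical value of the t distribution for DF = 12.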
Paired-sample t-test
Using blocks to reduce confounding environmental factors (everything else being equal except for the treatment effect) in evaluating the protein content of two wheat varieties.
[Figure: two alternative field layouts, each divided into Block 1 to Block 4, with every plot assigned to variety 1 or variety 2.]
How should we allocate the two crop varieties to the plots? What comparison would be fair?
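One common answer to the allocation question is to grow both varieties in every block and randomize their positions within each block. The sketch below illustrates that idea only; the function `allocate`, the default of four plots per block, and the seed are assumptions, not details taken from the slide.

```python
import random


def allocate(blocks=4, plots_per_block=4, varieties=(1, 2), seed=None):
    """Give each block equal numbers of every variety, in random order within the block."""
    rng = random.Random(seed)
    layout = []
    for _ in range(blocks):
        plots = list(varieties) * (plots_per_block // len(varieties))
        rng.shuffle(plots)          # randomize positions within this block only
        layout.append(plots)
    return layout


layout = allocate(seed=1)           # seed fixed only to make the example repeatable
for i, plots in enumerate(layout, start=1):
    print(f"Block {i}: {plots}")
```

With this design, the two varieties are compared within each block, and the within-block differences are what a paired-sample t-test would analyze.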
The Wilcoxon-Mann-Whitney Test
• Statistical significance tests can be grouped into
  • Parametric tests, e.g., t-test, ANOVA
  • Non-parametric tests, e.g., Wilcoxon-Mann-Whitney test, sign test, runs test.
When to Use Non-parametric Tests
• Parametric tests depend on the assumed probability distributions, e.g., normal distribution, t distribution, etc., and can give misleading results when the assumptions are violated.
• Non-parametric tests are called distribution-free tests and can be used in cases where the parametric tests are inappropriate.
• Parametric tests are more powerful than their non-parametric counterparts when the underlying assumptions are met.
Wilcoxon-Mann-Whitney Test
• The Wilcoxon-Mann-Whitney test is the non-parametric equivalent of the unpaired (two-sample) t-test.
• The original data are rank-transformed before applying the test.
• The test statistic is U.
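The rank transformation and the U statistic can be sketched in a few lines of plain Python. The helper names and the sample values below are hypothetical, and the sketch computes U only; obtaining a p-value would require a U table or a statistics package.

```python
def rank_with_ties(values):
    """Return 1-based ranks, giving tied values the average of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    return ranks


def mann_whitney_u(sample1, sample2):
    """Return (U1, U2) computed from the rank sum of the first sample."""
    n1, n2 = len(sample1), len(sample2)
    ranks = rank_with_ties(list(sample1) + list(sample2))
    r1 = sum(ranks[:n1])                   # rank sum of sample 1 in the pooled data
    u1 = r1 - n1 * (n1 + 1) / 2.0
    u2 = n1 * n2 - u1
    return u1, u2


# Hypothetical data for illustration.
u1, u2 = mann_whitney_u([1.1, 2.3, 3.8, 4.0], [3.1, 4.5, 5.2, 6.7])
print(f"U1 = {u1}, U2 = {u2}")
```

U1 counts the pairs in which an observation from the first sample exceeds one from the second (with ties counted as one half), so U1 + U2 always equals n1 × n2.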