Understanding Simple Comparative Experiments in Statistical Inference

Learn the basics of Simple Comparative Experiments, hypothesis testing, and probability distributions for statistical inference. Get insights into Portland Cement Formulation and graphical data representations.

rnash

Presentation Transcript


  1. Chapter 2: Simple Comparative Experiments (SCE) • Simple comparative experiments: experiments that compare two conditions (treatments) • The hypothesis testing framework • The two-sample t-test • Checking assumptions, validity

  2. Portland Cement Formulation (page 23) • The average tension bond strengths of the two formulations differ by what seems a nontrivial amount. • It is not obvious that this difference is large enough to imply that the two formulations really are different. • The difference may be due to sampling fluctuation, and the two formulations may really be identical. • Possibly another two samples would give the opposite result, with the strength of the modified mortar (MM) exceeding that of the unmodified mortar (UM). • Hypothesis testing can be used to assist in comparing these formulations. • Hypothesis testing allows the comparison to be made on objective terms, with knowledge of the risks associated with reaching the wrong conclusion.

  3. Graphical View of the Data: Dot Diagram, Fig. 2-1, p. 24 • The response variable is a random variable • A random variable may be: • Discrete • Continuous

  4. Box Plots, Fig. 2-3, p. 26 • A box plot displays the minimum, maximum, lower and upper quartiles, and the median • Histogram
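
The five values a box plot displays can be computed directly. The sketch below uses made-up strength readings (not the data from the text) and the "inclusive" quartile convention of Python's `statistics.quantiles`; other software may use a slightly different quartile rule.

```python
import statistics

# Hypothetical readings, invented for illustration only.
strengths = [16.2, 16.5, 16.6, 16.8, 16.9, 17.0, 17.1, 17.3, 17.4, 17.9]

def five_number_summary(values):
    """Return (min, lower quartile, median, upper quartile, max)."""
    q1, q2, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return min(values), q1, q2, q3, max(values)

summary = five_number_summary(strengths)
print(summary)
```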

  5. Probability Distributions • The probability structure of a random variable, y, is described by its probability distribution. • If y is discrete, p(y) is the probability function of y (Fig. 2-4a) • If y is continuous, f(y) is the probability density function of y (Fig. 2-4b)

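  6. Probability Distributions: Properties of Probability Distributions • y discrete: $0 \le p(y_j) \le 1$ for all values of $y_j$; $P(y = y_j) = p(y_j)$; $\sum_{\text{all } y_j} p(y_j) = 1$ • y continuous: $f(y) \ge 0$; $P(a \le y \le b) = \int_a^b f(y)\,dy$; $\int_{-\infty}^{\infty} f(y)\,dy = 1$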

  7. Probability Distributions: Mean, Variance, and Expected Values

  8. Probability Distributions: Basic Properties • $E(c) = c$ • $E(y) = \mu$ • $E(cy) = c\,E(y) = c\mu$ • $V(c) = 0$ • $V(y) = \sigma^2$ • $V(cy) = c^2 V(y) = c^2 \sigma^2$

  9. Probability Distributions: Basic Properties • $E(y_1 + y_2) = E(y_1) + E(y_2) = \mu_1 + \mu_2$ • $\mathrm{Cov}(y_1, y_2) = E[(y_1 - \mu_1)(y_2 - \mu_2)]$ • Covariance is a measure of the linear association between $y_1$ and $y_2$. • $E(y_1 y_2) = E(y_1)\,E(y_2) = \mu_1 \mu_2$ when $y_1$ and $y_2$ are independent
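
These properties can be checked numerically on a small discrete distribution. The sketch below assumes a fair six-sided die as the distribution of y and an arbitrary constant c = 3; both are illustrative choices, not taken from the slides.

```python
# Hypothetical discrete distribution: a fair six-sided die.
pmf = {y: 1 / 6 for y in range(1, 7)}

def expect(g, pmf):
    """E[g(y)]: sum of g(y) * p(y) over all values of y."""
    return sum(g(y) * p for y, p in pmf.items())

def variance(g, pmf):
    """V[g(y)] = E[(g(y) - E[g(y)])^2]."""
    mean = expect(g, pmf)
    return expect(lambda y: (g(y) - mean) ** 2, pmf)

c = 3.0
mu = expect(lambda y: y, pmf)          # E(y)
sigma2 = variance(lambda y: y, pmf)    # V(y)
e_cy = expect(lambda y: c * y, pmf)    # should equal c * mu
v_cy = variance(lambda y: c * y, pmf)  # should equal c**2 * sigma2
v_c = variance(lambda y: c, pmf)       # variance of a constant is 0

# Two independent dice: the joint probability is the product of the marginals,
# so E(y1 * y2) should equal E(y1) * E(y2).
e_y1y2 = sum(a * b * pa * pb for a, pa in pmf.items() for b, pb in pmf.items())
```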

  10. Sampling and Sampling Distributions • The objective of statistical inference is to draw conclusions about a population using a sample from that population. • Random sampling: when a sample of size n is drawn from a population of N elements, each of the $N!/[(N-n)!\,n!]$ possible samples has an equal probability of being chosen. • Statistic: any function of the observations in a sample that does not contain unknown parameters. • The sample mean and the sample variance are both statistics.

  11. Properties of sample mean and variance • The sample mean is a point estimator of the population mean $\mu$ • The sample variance is a point estimator of the population variance $\sigma^2$ • A point estimator should be unbiased: its long-run average should equal the parameter being estimated. • An unbiased estimator should also have minimum variance: the minimum variance point estimator has a variance smaller than that of any other estimator of the same parameter.
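
Unbiasedness can be verified exactly on a tiny example by averaging an estimator over every possible sample. The sketch below assumes a made-up six-value population and independent draws of size n = 2 (sampling with replacement, so the observations are i.i.d.); it compares the n - 1 divisor with the biased n divisor.

```python
import itertools
import statistics

population = [1, 2, 3, 4, 5, 6]   # hypothetical population
n = 2                             # sample size

mu = statistics.fmean(population)
sigma2 = statistics.pvariance(population)   # population variance

# Every equally likely i.i.d. sample of size n.
samples = list(itertools.product(population, repeat=n))

# Long-run (here: exact) averages of the estimators over all samples.
mean_of_ybar = statistics.fmean([statistics.fmean(s) for s in samples])
mean_of_s2 = statistics.fmean([statistics.variance(s) for s in samples])       # divisor n - 1
mean_of_biased = statistics.fmean([statistics.pvariance(s) for s in samples])  # divisor n

print(mu, mean_of_ybar)
print(sigma2, mean_of_s2, mean_of_biased)
```

The average of the n-divisor estimator comes out at $\sigma^2 (n-1)/n$, which is why the sample variance divides by n - 1.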

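  12. Degrees of freedom • The divisor (n-1) in the sample variance, $S^2 = SS/(n-1)$ with $SS = \sum_{i=1}^{n} (y_i - \bar{y})^2$, is called the number of degrees of freedom (DOF) of the sum of squares. • The number of DOF of a sum of squares is equal to the number of independent elements in the sum of squares • Because $\sum_{i=1}^{n} (y_i - \bar{y}) = 0$, only (n-1) of the n elements are independent, implying that SS has (n-1) DOF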
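
A short numerical check of the constraint behind the n - 1 degrees of freedom, using made-up observations: the deviations from the sample mean sum to zero, so the last deviation is fixed once the other n - 1 are known.

```python
import statistics

y = [16.2, 16.9, 17.4, 17.0, 16.5]   # hypothetical observations
ybar = statistics.fmean(y)
deviations = [yi - ybar for yi in y]

# The deviations sum to zero (up to floating-point rounding) ...
total = sum(deviations)
# ... so the last deviation is determined by the first n - 1.
last_from_others = -sum(deviations[:-1])

ss = sum(d * d for d in deviations)   # sum of squares, n - 1 DOF
s2 = ss / (len(y) - 1)                # sample variance
```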

  13. The normal and other sampling distributions • Normal distribution • y is distributed normally with mean $\mu$ and variance $\sigma^2$: $y \sim N(\mu, \sigma^2)$ • Standard normal distribution: $\mu = 0$ and $\sigma^2 = 1$

  14. Central Limit Theorem • If $y_1, y_2, \dots, y_n$ is a sequence of n independent and identically distributed random variables with $E(y_i) = \mu$ and $V(y_i) = \sigma^2$ (both finite) and $x = y_1 + y_2 + \dots + y_n$, then $z_n = \dfrac{x - n\mu}{\sqrt{n\sigma^2}}$ has an approximate N(0,1) distribution in the sense that, if $F_n(z)$ is the distribution function of $z_n$ and $F(z)$ is the distribution function of the N(0,1) random variable, then $\lim_{n \to \infty} F_n(z)/F(z) = 1$
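
The theorem can be illustrated by simulation. The sketch below assumes Uniform(0, 1) observations (mean 1/2, variance 1/12), n = 30, and a fixed random seed, all chosen only for illustration; it standardizes the sum as $(x - n\mu)/\sqrt{n\sigma^2}$ and compares the empirical distribution function with the standard normal one.

```python
import math
import random
from statistics import NormalDist

rng = random.Random(0)        # fixed seed so the run is repeatable
n, reps = 30, 5000
mu, sigma2 = 0.5, 1 / 12      # mean and variance of Uniform(0, 1)

def z_n():
    """One standardized sum of n uniform observations."""
    x = sum(rng.random() for _ in range(n))
    return (x - n * mu) / math.sqrt(n * sigma2)

zs = [z_n() for _ in range(reps)]

def empirical_cdf(z):
    """Fraction of simulated z_n values at or below z."""
    return sum(v <= z for v in zs) / reps

for z in (-1.0, 0.0, 1.0):
    print(z, round(empirical_cdf(z), 3), round(NormalDist().cdf(z), 3))
```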

  15. Chi-Square or $\chi^2$ distribution • If $z_1, z_2, \dots, z_k$ are normally and independently distributed random variables with mean 0 and variance 1, NID(0,1), then the random variable $x = z_1^2 + z_2^2 + \dots + z_k^2$ follows the chi-square distribution with k DOF.

  16. Chi-Square or $\chi^2$ distribution • The distribution is asymmetric (skewed), with mean $\mu = k$ and variance $\sigma^2 = 2k$ • Percentage points: Appendix III • Fig. 2-6

  17. Chi-Square or $\chi^2$ distribution • If $y_1, y_2, \dots, y_n$ is a random sample from $N(\mu, \sigma^2)$, then • $SS/\sigma^2 = \sum_{i=1}^{n} (y_i - \bar{y})^2 / \sigma^2$ is distributed as chi-square with n-1 DOF

  18. Chi-Square or $\chi^2$ distribution • If the observations in the sample are $NID(\mu, \sigma^2)$, then the distribution of $S^2$ is $[\sigma^2/(n-1)]\,\chi^2_{n-1}$ • Thus, the sampling distribution of the sample variance is a constant times the chi-square distribution if the population is normally distributed
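
A simulation sketch of this result, assuming a hypothetical normal population with mean 10 and standard deviation 2 and samples of size n = 5: the scaled quantity $(n-1)S^2/\sigma^2$ should behave like a chi-square variable with n - 1 DOF, so its average should be near n - 1 and its variance near 2(n - 1).

```python
import random
import statistics

rng = random.Random(1)    # fixed seed so the run is repeatable
n, reps = 5, 5000
mu, sigma = 10.0, 2.0     # hypothetical normal population

def scaled_ss():
    """(n - 1) * S^2 / sigma^2 for one simulated sample of size n."""
    sample = [rng.gauss(mu, sigma) for _ in range(n)]
    return (n - 1) * statistics.variance(sample) / sigma ** 2

draws = [scaled_ss() for _ in range(reps)]
mean_draws = statistics.fmean(draws)     # compare with k = n - 1
var_draws = statistics.variance(draws)   # compare with 2k = 2(n - 1)
print(mean_draws, var_draws)
```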

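  19. t distribution with k DOF • If $z$ and $\chi^2_k$ are independent standard normal and chi-square random variables, the random variable $t_k = \dfrac{z}{\sqrt{\chi^2_k / k}}$ • follows the t distribution with k DOF, with density $f(t) = \dfrac{\Gamma[(k+1)/2]}{\sqrt{k\pi}\,\Gamma(k/2)} \cdot \dfrac{1}{[(t^2/k) + 1]^{(k+1)/2}}$, $-\infty < t < \infty$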

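  20. t distribution with k DOF • $\mu = 0$ and $\sigma^2 = k/(k-2)$ for $k > 2$ • As $k \to \infty$, the t distribution becomes the standard normal • If $y_1, y_2, \dots, y_n$ is a random sample from $N(\mu, \sigma^2)$, then $t = \dfrac{\bar{y} - \mu}{S/\sqrt{n}}$ is distributed as t with n-1 DOF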

  21. F distribution • If $\chi^2_u$ and $\chi^2_v$ are two independent chi-square random variables with u and v DOF, then the ratio $F_{u,v} = \dfrac{\chi^2_u / u}{\chi^2_v / v}$ • follows the F distribution with u numerator DOF and v denominator DOF

  22. F distribution • Consider two independent normal populations with common variance $\sigma^2$. If $y_{11}, y_{12}, \dots, y_{1n_1}$ is a random sample of $n_1$ observations from the first population and $y_{21}, y_{22}, \dots, y_{2n_2}$ is a random sample of $n_2$ observations from the second population, then $S_1^2 / S_2^2 \sim F_{n_1 - 1,\, n_2 - 1}$

  23. The Hypothesis Testing Framework • Statistical hypothesis testing is a useful framework for many experimental situations • We will use a procedure known as the two-sample t-test

  24. The Hypothesis Testing Framework • Sampling from a normal distribution • Statistical hypotheses: $H_0: \mu_1 = \mu_2$ and $H_1: \mu_1 \ne \mu_2$

  25. Estimation of Parameters

  26. Summary Statistics (p. 36) • Formulation 1: “New recipe” • Formulation 2: “Original recipe”

  27. How the Two-Sample t-Test Works:

  28. How the Two-Sample t-Test Works:

  29. How the Two-Sample t-Test Works: • Values of $t_0$ that are near zero are consistent with the null hypothesis • Values of $t_0$ that are very different from zero are consistent with the alternative hypothesis • $t_0$ is a “distance” measure: how far apart the averages are, expressed in standard deviation units • Notice the interpretation of $t_0$ as a signal-to-noise ratio
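
A minimal sketch of the pooled statistic, $t_0 = (\bar{y}_1 - \bar{y}_2) / (S_p \sqrt{1/n_1 + 1/n_2})$ with $S_p^2 = [(n_1-1)S_1^2 + (n_2-1)S_2^2]/(n_1+n_2-2)$. The function name `pooled_t` and the two small samples are made up for illustration; they are not the cement data from the text.

```python
import math
import statistics

def pooled_t(sample1, sample2):
    """Return (t0, degrees of freedom) for the pooled two-sample t-test."""
    n1, n2 = len(sample1), len(sample2)
    ybar1, ybar2 = statistics.fmean(sample1), statistics.fmean(sample2)
    s1_sq, s2_sq = statistics.variance(sample1), statistics.variance(sample2)
    # Pooled variance: weighted average of the two sample variances.
    sp_sq = ((n1 - 1) * s1_sq + (n2 - 1) * s2_sq) / (n1 + n2 - 2)
    # Signal (difference in averages) divided by noise (its standard error).
    t0 = (ybar1 - ybar2) / math.sqrt(sp_sq * (1 / n1 + 1 / n2))
    return t0, n1 + n2 - 2

# Hypothetical data, invented for illustration.
a = [10.1, 9.8, 10.4, 10.0, 9.7]
b = [10.6, 10.9, 10.3, 10.8, 10.4]

t0, dof = pooled_t(a, b)
print(round(t0, 3), dof)
```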

  30. The Two-Sample (Pooled) t-Test

  31. The Two-Sample (Pooled) t-Test • $t_0 = -2.20$ • So far, we haven’t really done any “statistics” • We need an objective basis for deciding how large the test statistic $t_0$ really is • In 1908, W. S. Gosset derived the reference distribution for $t_0$ … called the t distribution • Tables of the t distribution: text, page 606

  32. The Two-Sample (Pooled) t-Test • $t_0 = -2.20$ • A value of $t_0$ between -2.101 and 2.101 is consistent with equality of means • It is possible for the means to be equal and $t_0$ to exceed 2.101 or fall below -2.101, but it would be a “rare event”; observing such a value leads to the conclusion that the means are different • Could also use the P-value approach

  33. Use of P-value in Hypothesis testing • P-value: the smallest level of significance that would lead to rejection of the null hypothesis $H_0$ • It is customary to call the test statistic significant when $H_0$ is rejected. Therefore, the P-value is the smallest level $\alpha$ at which the data are significant.

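  34. The Two-Sample (Pooled) t-Test • $t_0 = -2.20$ • The P-value is the probability, when the null hypothesis of equal means is true, of obtaining a test statistic at least as extreme as the observed $t_0$; it is the risk of wrongly rejecting the null hypothesis if we reject at this value (it measures the rareness of the event) • The P-value in our problem is P = 0.042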
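
The two-sided P-value can be approximated without tables by integrating the t density numerically. The sketch below is one possible approach (Simpson's rule, standard library only), not the method used by the text or by Minitab; the helper names are made up. It assumes 18 DOF, which matches the critical value 2.101 quoted earlier. Because $t_0 = -2.20$ is itself rounded, the result should land close to, but not necessarily exactly on, the 0.042 reported above.

```python
import math

def t_pdf(t, k):
    """Density of the t distribution with k degrees of freedom."""
    log_c = (math.lgamma((k + 1) / 2) - math.lgamma(k / 2)
             - 0.5 * math.log(k * math.pi))
    return math.exp(log_c) * (1 + t * t / k) ** (-(k + 1) / 2)

def two_sided_p_value(t0, k, steps=2000):
    """P(|T| >= |t0|) for T ~ t_k, via Simpson's rule on [0, |t0|]."""
    b = abs(t0)
    h = b / steps                         # steps must be even
    total = t_pdf(0.0, k) + t_pdf(b, k)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, k)
    central_half = total * h / 3          # P(0 <= T <= |t0|)
    return 1.0 - 2.0 * central_half       # both tails, by symmetry

p = two_sided_p_value(-2.20, 18)
print(round(p, 4))
```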

  35. Minitab Two-Sample t-Test Results

  36. Checking Assumptions – The Normal Probability Plot • Assumptions • Equal variance • Normality • Procedure: • Rank the observations in the sample in ascending order. • Plot the ordered observations against their observed cumulative frequency (j-0.5)/n, where j is the rank. • If the plotted points deviate significantly from a straight line, the hypothesized model is not appropriate.
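
The plotting coordinates can be sketched as follows. The sample is made up, and the function name `normal_plot_points` is hypothetical. In addition to the cumulative frequency (j - 0.5)/n, the sketch computes the matching standard normal quantile, which is what a normal probability scale encodes; plotting the ordered observations against these quantiles on ordinary axes gives an equivalent picture.

```python
from statistics import NormalDist

def normal_plot_points(sample):
    """Return (ordered value, cumulative frequency, normal quantile) triples."""
    ordered = sorted(sample)                 # step 1: rank in ascending order
    n = len(ordered)
    points = []
    for j, y in enumerate(ordered, start=1):
        freq = (j - 0.5) / n                 # observed cumulative frequency
        z = NormalDist().inv_cdf(freq)       # standard normal quantile
        points.append((y, freq, z))
    return points

# Hypothetical observations, invented for illustration.
points = normal_plot_points([17.1, 16.5, 16.9, 17.4, 16.2, 17.0, 16.8, 17.9])
for y, freq, z in points:
    print(y, round(freq, 4), round(z, 3))
```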

  37. Checking Assumptions – The Normal Probability Plot

  38. Checking Assumptions – The Normal Probability Plot • The mean is estimated as the 50th percentile on the probability plot. • The standard deviation is estimated as the difference between the 84th and 50th percentiles. • The assumption of equal population variances is simply verified by comparing the slopes of the two straight lines in Fig. 2-11. • We will use t-tests without extensive concern about the assumption of normality

  39. Importance of the t-Test • Provides an objective framework for simple comparative experiments • Could be used to test all relevant hypotheses in a two-level factorial design, because all of these hypotheses involve the mean response at one “side” of the cube versus the mean response at the opposite “side” of the cube

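  40. Confidence Intervals (see p. 43) • Hypothesis testing gives an objective statement concerning the difference in means, but it doesn’t specify “how different” they are • General form of a confidence interval: $L \le \theta \le U$, where $P(L \le \theta \le U) = 1 - \alpha$ • The $100(1-\alpha)\%$ confidence interval on the difference in two means, where $S_p$ is the pooled standard deviation: $\bar{y}_1 - \bar{y}_2 - t_{\alpha/2,\,n_1+n_2-2}\, S_p \sqrt{1/n_1 + 1/n_2} \le \mu_1 - \mu_2 \le \bar{y}_1 - \bar{y}_2 + t_{\alpha/2,\,n_1+n_2-2}\, S_p \sqrt{1/n_1 + 1/n_2}$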
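
A sketch of the interval on the difference in two means: the observed difference plus or minus a t critical value times the pooled standard error. The function name and data are made up; the critical value is passed in from a t table (2.306 is the upper 2.5% point for 8 DOF, matching the two hypothetical samples of five).

```python
import math
import statistics

def diff_means_ci(sample1, sample2, t_crit):
    """Confidence interval on mu1 - mu2 using the pooled variance.

    t_crit is the upper alpha/2 point of t with n1 + n2 - 2 DOF,
    looked up in a table by the caller.
    """
    n1, n2 = len(sample1), len(sample2)
    diff = statistics.fmean(sample1) - statistics.fmean(sample2)
    sp_sq = ((n1 - 1) * statistics.variance(sample1)
             + (n2 - 1) * statistics.variance(sample2)) / (n1 + n2 - 2)
    half_width = t_crit * math.sqrt(sp_sq * (1 / n1 + 1 / n2))
    return diff - half_width, diff + half_width

# Hypothetical data, invented for illustration.
a = [10.1, 9.8, 10.4, 10.0, 9.7]
b = [10.6, 10.9, 10.3, 10.8, 10.4]

lower, upper = diff_means_ci(a, b, t_crit=2.306)   # 95% interval, 8 DOF
print(round(lower, 3), round(upper, 3))
```

An interval that excludes zero agrees with rejecting the hypothesis of equal means at the same level.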

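  41. Hypothesis testing • When the two population variances cannot be assumed equal, the test statistic becomes $t_0 = \dfrac{\bar{y}_1 - \bar{y}_2}{\sqrt{S_1^2/n_1 + S_2^2/n_2}}$ • This statistic is not distributed exactly as t. • The distribution of $t_0$ is well approximated by t if we use $v = \dfrac{(S_1^2/n_1 + S_2^2/n_2)^2}{\dfrac{(S_1^2/n_1)^2}{n_1 - 1} + \dfrac{(S_2^2/n_2)^2}{n_2 - 1}}$ as the DOF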
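
A sketch of the unequal-variance statistic and its approximate DOF, assuming the common Welch-Satterthwaite form with n - 1 divisors (some older references use a slightly different expression). The function name `welch_t` and the data are made up for illustration.

```python
import math
import statistics

def welch_t(sample1, sample2):
    """Return (t0, approximate DOF) without assuming equal variances."""
    n1, n2 = len(sample1), len(sample2)
    v1 = statistics.variance(sample1) / n1   # S1^2 / n1
    v2 = statistics.variance(sample2) / n2   # S2^2 / n2
    t0 = (statistics.fmean(sample1) - statistics.fmean(sample2)) / math.sqrt(v1 + v2)
    # Welch-Satterthwaite approximation; generally not an integer.
    dof = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t0, dof

# Hypothetical data, invented for illustration.
a = [10.1, 9.8, 10.4, 10.0, 9.7]
b = [10.6, 10.9, 10.3, 10.8, 10.4]

t0, dof = welch_t(a, b)
print(round(t0, 3), round(dof, 2))
```

The approximate DOF always falls between min(n1, n2) - 1 and n1 + n2 - 2, so it never exceeds the DOF of the pooled test.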

  42. Hypothesis testing • When the variances $\sigma_1^2$ and $\sigma_2^2$ are known, the test statistic becomes $z_0 = \dfrac{\bar{y}_1 - \bar{y}_2}{\sqrt{\sigma_1^2/n_1 + \sigma_2^2/n_2}}$ • If both populations are normal, or if the sample sizes are large enough, the distribution of $z_0$ is N(0,1) if the null hypothesis is true. Thus, the critical region would be found using the normal distribution rather than the t. • We would reject $H_0$ if $|z_0| > z_{\alpha/2}$, where $z_{\alpha/2}$ is the upper $\alpha/2$ percentage point of the standard normal distribution

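  43. Hypothesis testing • The $100(1-\alpha)$ percent confidence interval: $\bar{y}_1 - \bar{y}_2 - z_{\alpha/2} \sqrt{\sigma_1^2/n_1 + \sigma_2^2/n_2} \le \mu_1 - \mu_2 \le \bar{y}_1 - \bar{y}_2 + z_{\alpha/2} \sqrt{\sigma_1^2/n_1 + \sigma_2^2/n_2}$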

  44. Hypothesis testing: Comparing a single mean to a specified value • The hypotheses are $H_0: \mu = \mu_0$ and $H_1: \mu \ne \mu_0$ • If the population is normal with known variance, or if the population is non-normal but the sample size is large enough, then the hypothesis may be tested by direct application of the normal distribution. • Test statistic: $z_0 = \dfrac{\bar{y} - \mu_0}{\sigma/\sqrt{n}}$ • If $H_0$ is true, then the distribution of $z_0$ is N(0,1). Therefore, $H_0$ is rejected if $|z_0| > z_{\alpha/2}$
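
A sketch of the one-sample z-test with known standard deviation: $z_0 = (\bar{y} - \mu_0)/(\sigma/\sqrt{n})$, rejected when $|z_0|$ exceeds the upper $\alpha/2$ point of N(0,1). The function name, the sample, and the values of $\mu_0$ and $\sigma$ are all made up for illustration.

```python
import math
from statistics import NormalDist, fmean

def one_sample_z(sample, mu0, sigma, alpha=0.05):
    """Return (z0, critical value, reject H0?, two-sided P-value)."""
    n = len(sample)
    z0 = (fmean(sample) - mu0) / (sigma / math.sqrt(n))
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)        # upper alpha/2 point
    p_value = 2 * (1 - NormalDist().cdf(abs(z0)))
    return z0, z_crit, abs(z0) > z_crit, p_value

# Hypothetical measurements with an assumed known sigma of 0.3.
sample = [10.1, 9.8, 10.4, 10.0, 9.7, 10.3, 10.2, 9.9, 10.5]
z0, z_crit, reject, p_value = one_sample_z(sample, mu0=10.0, sigma=0.3)
print(round(z0, 3), round(z_crit, 3), reject, round(p_value, 4))
```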

  45. Hypothesis testing: Comparing a single mean to a specified value • The value of $\mu_0$ is usually determined in one of three ways: • From past evidence, knowledge, or experimentation • The result of some theory or model describing the situation under study • The result of contractual specifications

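  46. Hypothesis testing: Comparing a single mean to a specified value • If the variance of the population is unknown, we must assume that the population is normally distributed. • Test statistic: $t_0 = \dfrac{\bar{y} - \mu_0}{S/\sqrt{n}}$ • $H_0$ is rejected if $|t_0| > t_{\alpha/2,\,n-1}$ • The $100(1-\alpha)$ percent confidence interval: $\bar{y} - t_{\alpha/2,\,n-1}\, S/\sqrt{n} \le \mu \le \bar{y} + t_{\alpha/2,\,n-1}\, S/\sqrt{n}$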

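  47. The paired comparison problem: The two tip hardness experiment • Statistical model: $y_{ij} = \mu_i + \beta_j + \varepsilon_{ij}$, $i = 1, 2$; $j = 1, \dots, n$ • jth paired difference: $d_j = y_{1j} - y_{2j}$ • Expected value of the paired difference: $\mu_d = E(d_j) = \mu_1 - \mu_2$ • Testing hypothesis: $H_0: \mu_d = 0$ and $H_1: \mu_d \ne 0$ • Test statistic: $t_0 = \dfrac{\bar{d}}{S_d/\sqrt{n}}$, where $\bar{d}$ and $S_d$ are the sample mean and standard deviation of the differences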
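
A sketch of the paired statistic: take the difference within each pair, then test whether the mean difference is zero using $t_0 = \bar{d}/(S_d/\sqrt{n})$ with n - 1 DOF. The function name and the readings are made up; they are not the tip hardness data from the text.

```python
import math
import statistics

def paired_t(first, second):
    """Return (t0, DOF) for the paired t-test on matched observations."""
    d = [a - b for a, b in zip(first, second)]   # paired differences
    n = len(d)
    dbar = statistics.fmean(d)
    s_d = statistics.stdev(d)                    # divisor n - 1
    return dbar / (s_d / math.sqrt(n)), n - 1

# Hypothetical readings on five specimens, each measured with both tips.
tip1 = [7, 3, 5, 4, 8]
tip2 = [6, 4, 4, 3, 6]

t0, dof = paired_t(tip1, tip2)
print(round(t0, 3), dof)
```

Differencing removes the specimen-to-specimen effect, at the cost of having n - 1 rather than 2n - 2 degrees of freedom.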

  48. The paired comparison problem: The two tip hardness experiment • Randomized block design • Block: a relatively homogeneous experimental unit • The block represents a restriction on complete randomization because the treatment combinations are only randomized within the block

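  49. Inferences about the variability of normal distributions • $H_0: \sigma^2 = \sigma_0^2$ and $H_1: \sigma^2 \ne \sigma_0^2$ • Test statistic: $\chi_0^2 = \dfrac{SS}{\sigma_0^2} = \dfrac{(n-1)S^2}{\sigma_0^2}$ • $H_0$ is rejected if $\chi_0^2 > \chi^2_{\alpha/2,\,n-1}$ or $\chi_0^2 < \chi^2_{1-\alpha/2,\,n-1}$ • The $100(1-\alpha)$ percent confidence interval: $\dfrac{(n-1)S^2}{\chi^2_{\alpha/2,\,n-1}} \le \sigma^2 \le \dfrac{(n-1)S^2}{\chi^2_{1-\alpha/2,\,n-1}}$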

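  50. Inferences about the variability of normal distributions • Comparing the variances of two normal populations, $H_0: \sigma_1^2 = \sigma_2^2$ and $H_1: \sigma_1^2 \ne \sigma_2^2$ • Test statistic: $F_0 = S_1^2 / S_2^2$ • $H_0$ is rejected if $F_0 > F_{\alpha/2,\,n_1-1,\,n_2-1}$ or $F_0 < F_{1-\alpha/2,\,n_1-1,\,n_2-1}$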
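
The two variance test statistics are simple to compute: $(n-1)S^2/\sigma_0^2$ for a single variance against a specified value, and the ratio $S_1^2/S_2^2$ for two variances. The sketch below uses made-up data and a made-up $\sigma_0^2$; the critical values still have to come from chi-square and F tables.

```python
import statistics

def chi_square_stat(sample, sigma0_sq):
    """(n - 1) * S^2 / sigma0^2, compared with chi-square on n - 1 DOF."""
    n = len(sample)
    return (n - 1) * statistics.variance(sample) / sigma0_sq

def f_ratio(sample1, sample2):
    """S1^2 / S2^2, compared with F on (n1 - 1, n2 - 1) DOF."""
    return statistics.variance(sample1) / statistics.variance(sample2)

# Hypothetical data, invented for illustration.
a = [10.1, 9.8, 10.4, 10.0, 9.7]
b = [10.6, 10.9, 10.3, 10.8, 10.4]

chi0 = chi_square_stat(a, sigma0_sq=0.05)
f0 = f_ratio(a, b)
print(round(chi0, 3), round(f0, 3))
```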
