Learn essential statistical concepts, distributions, and sampling methods for industrial applications. Explore means, variances, and inferences in randomized and paired comparison designs. Understand how to formulate and analyze data for better decision-making.
Some Basic Statistical Concepts • Dr. Tai-Yue Wang • Department of Industrial and Information Management • National Cheng Kung University • Tainan, TAIWAN, ROC
Outline • Introduction • Basic Statistical Concepts • Inferences about the Differences in Means, Randomized Designs • Inferences about the Differences in Means, Paired Comparison Designs • Inferences about the Variances of Normal Distributions
Introduction • Formulation of a cement mortar • Original formulation and modified formulation • 10 samples for each formulation • One factor: formulation • Two formulations: two treatments, that is, two levels of the factor "formulation"
Introduction Results:
Introduction Dot diagram
Basic Statistical Concepts • Lessons from the example above • Run: each of the observations above • Noise (experimental error, or simply error): the differences among the individual runs • Statistical error: arises from variation that is uncontrolled and generally unavoidable • The presence of error means that the response variable is a random variable • A random variable can be discrete or continuous
Basic Statistical Concepts • Describing sample data • Graphical descriptions • Dot diagram: central tendency, spread • Box plot • Histogram
Basic Statistical Concepts • Discrete vs continuous
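Basic Statistical Concepts • Probability distribution • Discrete: $0 \le p(y_j) \le 1$ for all values $y_j$, $P(y = y_j) = p(y_j)$, and $\sum_{\text{all } y_j} p(y_j) = 1$ • Continuous: $f(y) \ge 0$, $P(a \le y \le b) = \int_a^b f(y)\,dy$, and $\int_{-\infty}^{\infty} f(y)\,dy = 1$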
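Basic Statistical Concepts • Probability distribution • Mean: a measure of its central tendency, $\mu = \int_{-\infty}^{\infty} y\,f(y)\,dy$ for continuous $y$ and $\mu = \sum_{\text{all } y} y\,p(y)$ for discrete $y$ • Expected value: the long-run average value, $\mu = E(y)$
Basic Statistical Concepts • Probability distribution • Variance: the variability or dispersion of a distribution, $\sigma^2 = V(y) = E[(y - \mu)^2]$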
Basic Statistical Concepts • Probability distribution • Properties: $c$ is a constant • $E(c) = c$ • $E(y) = \mu$ • $E(cy) = cE(y) = c\mu$ • $V(c) = 0$ • $V(y) = \sigma^2$ • $V(cy) = c^2\sigma^2$ • $E(y_1 + y_2) = \mu_1 + \mu_2$
Basic Statistical Concepts • Probability distribution • Properties: $c$ is a constant • $V(y_1 + y_2) = V(y_1) + V(y_2) + 2\,\mathrm{Cov}(y_1, y_2)$ • $V(y_1 - y_2) = V(y_1) + V(y_2) - 2\,\mathrm{Cov}(y_1, y_2)$ • If $y_1$ and $y_2$ are independent, $\mathrm{Cov}(y_1, y_2) = 0$ and $E(y_1 y_2) = E(y_1)E(y_2) = \mu_1\mu_2$ • $E(y_1/y_2)$ is not necessarily equal to $E(y_1)/E(y_2)$
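The variance rules for sums and differences can be checked numerically. The sketch below is only an illustration: the data are simulated, the seed and the 0.5 coupling coefficient are arbitrary assumptions, and `sample_var` and `sample_cov` are hypothetical helper names. It uses the fact that the same identities hold exactly for sample variances and covariances computed with the same divisor.

```python
import random


def sample_var(xs):
    """Sample variance with divisor n - 1."""
    n = len(xs)
    m = sum(xs) / n
    return sum((x - m) ** 2 for x in xs) / (n - 1)


def sample_cov(xs, ys):
    """Sample covariance with divisor n - 1."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)


random.seed(0)  # arbitrary seed, only for reproducibility
y1 = [random.gauss(0.0, 1.0) for _ in range(1000)]
# y2 is deliberately built from y1 so that the two are correlated
y2 = [0.5 * a + random.gauss(0.0, 1.0) for a in y1]

cov12 = sample_cov(y1, y2)
var_sum = sample_var([a + b for a, b in zip(y1, y2)])
var_diff = sample_var([a - b for a, b in zip(y1, y2)])
rhs_sum = sample_var(y1) + sample_var(y2) + 2 * cov12
rhs_diff = sample_var(y1) + sample_var(y2) - 2 * cov12

print(var_sum, rhs_sum)    # the two numbers agree up to rounding error
print(var_diff, rhs_diff)  # likewise for the difference
```

Because y2 depends on y1, the covariance term is clearly non-zero here; dropping it would make the two sides disagree.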
Basic Statistical Concepts • Sampling and sampling distribution • Random sample: if the population contains $N$ elements and a sample of $n$ of them is to be selected, the sample is random when each of the $N!/[(N-n)!\,n!]$ possible samples has an equal probability of being chosen • Random sampling: the procedure above • Statistic: any function of the observations in a sample that does not contain unknown parameters
Basic Statistical Concepts • Sampling and sampling distribution • Sample mean: $\bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i$ • Sample variance: $S^2 = \frac{1}{n-1}\sum_{i=1}^{n} (y_i - \bar{y})^2$
Basic Statistical Concepts • Sampling and sampling distribution • Estimator: a statistic that corresponds to an unknown parameter • Estimate: a particular numerical value of an estimator • Point estimators: $\bar{y}$ for $\mu$ and $S^2$ for $\sigma^2$ • Properties of the sample mean and variance as estimators: • The point estimator should be unbiased • An unbiased estimator should have minimum variance
Basic Statistical Concepts • Sampling and sampling distribution • Sum of squares, SS, in $S^2 = SS/(n-1)$ • The sum of squares, SS, can be defined as $SS = \sum_{i=1}^{n} (y_i - \bar{y})^2$
Basic Statistical Concepts • Sampling and sampling distribution • Degrees of freedom, $v$: the number of independent elements in a sum of squares, as in $E(SS/v) = \sigma^2$ • For $SS = \sum_{i=1}^{n} (y_i - \bar{y})^2$, the degrees of freedom are $v = n - 1$
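As a small worked illustration of the sample mean, sum of squares, degrees of freedom, and sample variance, the sketch below uses five made-up observations (they are hypothetical and are not the mortar data) and cross-checks the result against the standard library.

```python
import statistics

# Hypothetical observations, chosen only to keep the arithmetic easy to follow
y = [2.0, 4.0, 4.0, 5.0, 7.0]

n = len(y)
ybar = sum(y) / n                       # sample mean
ss = sum((yi - ybar) ** 2 for yi in y)  # corrected sum of squares
df = n - 1                              # degrees of freedom of SS
s2 = ss / df                            # sample variance = SS / (n - 1)

print(ybar, ss, df, s2)
# statistics.variance also divides by n - 1, so it should match s2
print(statistics.variance(y))
```

Only n - 1 of the deviations from the mean are free to vary because they must sum to zero, which is why the divisor is n - 1 rather than n.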
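Basic Statistical Concepts • Sampling and sampling distribution • Normal distribution, $N(\mu, \sigma^2)$: $f(y) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-(y-\mu)^2/(2\sigma^2)}$, $-\infty < y < \infty$
Basic Statistical Concepts • Sampling and sampling distribution • Standard normal distribution, $z$: a normal distribution with $\mu = 0$ and $\sigma^2 = 1$; if $y \sim N(\mu, \sigma^2)$, then $z = (y - \mu)/\sigma \sim N(0, 1)$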
Basic Statistical Concepts • Sampling and sampling distribution • Central Limit Theorem: if $y_1, y_2, \ldots, y_n$ is a sequence of $n$ independent and identically distributed random variables with $E(y_i) = \mu$ and $V(y_i) = \sigma^2$, and $x = y_1 + y_2 + \cdots + y_n$, then the limiting form of the distribution of $z_n = \frac{x - n\mu}{\sqrt{n\sigma^2}}$ as $n \to \infty$ is the standard normal distribution
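The Central Limit Theorem can be illustrated with a short simulation. This is only a sketch under stated assumptions: the summands are uniform on (0, 1), so μ = 0.5 and σ² = 1/12, and n = 30, the number of replications, and the seed are arbitrary choices. The standardized sum should then have mean near 0, variance near 1, and about 95% of its values within ±1.96.

```python
import math
import random

random.seed(1)         # arbitrary seed, only for reproducibility
n = 30                 # number of terms in each sum (assumed)
reps = 20000           # number of simulated sums (assumed)
mu = 0.5               # mean of a uniform(0, 1) variable
sigma2 = 1.0 / 12.0    # variance of a uniform(0, 1) variable


def standardized_sum():
    """Return (x - n*mu) / sqrt(n*sigma2) for x = y1 + ... + yn."""
    x = sum(random.random() for _ in range(n))
    return (x - n * mu) / math.sqrt(n * sigma2)


z = [standardized_sum() for _ in range(reps)]
z_mean = sum(z) / reps
z_var = sum((v - z_mean) ** 2 for v in z) / (reps - 1)
frac_within = sum(1 for v in z if abs(v) <= 1.96) / reps

print(z_mean, z_var, frac_within)
```

The individual summands are far from normal, yet the standardized sum already behaves much like a standard normal variable at n = 30.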
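Basic Statistical Concepts • Sampling and sampling distribution • Chi-square, $\chi^2$, distribution: if $z_1, z_2, \ldots, z_k$ are normally and independently distributed random variables with mean 0 and variance 1, NID(0, 1), the random variable $x = z_1^2 + z_2^2 + \cdots + z_k^2$ follows the chi-square distribution with $k$ degrees of freedom
Basic Statistical Concepts • Sampling and sampling distribution • Chi-square distribution, example: if $y_1, y_2, \ldots, y_n$ is a random sample from the $N(\mu, \sigma^2)$ distribution, then $\frac{SS}{\sigma^2} = \frac{\sum_{i=1}^{n}(y_i - \bar{y})^2}{\sigma^2} \sim \chi^2_{n-1}$ • Sample variance from NID($\mu$, $\sigma^2$): $S^2 = \frac{SS}{n-1}$, so $S^2 \sim \frac{\sigma^2}{n-1}\,\chi^2_{n-1}$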
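The chi-square example can be checked by simulation. The sketch below assumes arbitrary values μ = 10, σ = 2, and n = 10 (none of them come from the slides) and relies on two known facts about a chi-square variable with k degrees of freedom: its mean is k and its variance is 2k. If SS/σ² behaves like chi-square with n - 1 = 9 degrees of freedom, the simulated mean should be close to 9 and the simulated variance close to 18.

```python
import random

random.seed(2)          # arbitrary seed, only for reproducibility
mu, sigma = 10.0, 2.0   # hypothetical population parameters
n = 10                  # sample size (assumed)
reps = 20000            # number of simulated samples (assumed)


def scaled_ss():
    """Draw one normal sample and return SS / sigma^2."""
    y = [random.gauss(mu, sigma) for _ in range(n)]
    ybar = sum(y) / n
    ss = sum((yi - ybar) ** 2 for yi in y)
    return ss / sigma ** 2


vals = [scaled_ss() for _ in range(reps)]
mean_val = sum(vals) / reps
var_val = sum((v - mean_val) ** 2 for v in vals) / (reps - 1)

print(mean_val, var_val)  # expected to be near n - 1 = 9 and 2(n - 1) = 18
```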
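Basic Statistical Concepts • Sampling and sampling distribution • t distribution: if $z$ and $\chi^2_k$ are independent standard normal and chi-square random variables, respectively, the random variable $t_k = \frac{z}{\sqrt{\chi^2_k / k}}$ follows the t distribution with $k$ degrees of freedom
Basic Statistical Concepts • Sampling and sampling distribution • pdf of the t distribution: $f(t) = \frac{\Gamma[(k+1)/2]}{\sqrt{k\pi}\,\Gamma(k/2)} \cdot \frac{1}{[(t^2/k) + 1]^{(k+1)/2}}$, $-\infty < t < \infty$ • Mean $\mu = 0$ and variance $\sigma^2 = k/(k-2)$ for $k > 2$
Basic Statistical Concepts • Sampling and sampling distribution • If $y_1, y_2, \ldots, y_n$ is a random sample from $N(\mu, \sigma^2)$, the quantity $t = \frac{\bar{y} - \mu}{S/\sqrt{n}}$ is distributed as t with $n - 1$ degrees of freedom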
Basic Statistical Concepts • Sampling and sampling distribution • F distribution: if $\chi^2_u$ and $\chi^2_v$ are two independent chi-square random variables with $u$ and $v$ degrees of freedom, respectively, then the ratio $F_{u,v} = \frac{\chi^2_u / u}{\chi^2_v / v}$ follows the F distribution with $u$ numerator degrees of freedom and $v$ denominator degrees of freedom
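Basic Statistical Concepts • Sampling and sampling distribution • pdf of the F distribution: $h(x) = \frac{\Gamma\!\left(\frac{u+v}{2}\right)\left(\frac{u}{v}\right)^{u/2} x^{(u/2)-1}}{\Gamma\!\left(\frac{u}{2}\right)\Gamma\!\left(\frac{v}{2}\right)\left[\left(\frac{u}{v}\right)x + 1\right]^{(u+v)/2}}$, $0 < x < \infty$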
Basic Statistical Concepts • Sampling and sampling distribution • F distribution, example: suppose we have two independent normal populations with common variance $\sigma^2$. If $y_{11}, y_{12}, \ldots, y_{1n_1}$ is a random sample of $n_1$ observations from the first population and $y_{21}, y_{22}, \ldots, y_{2n_2}$ is a random sample of $n_2$ observations from the second population, then $\frac{S_1^2}{S_2^2} \sim F_{n_1-1,\,n_2-1}$
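As a rough illustration of the variance ratio, the sketch below forms S1²/S2² from the two sample standard deviations (0.316 and 0.248, each with n = 10) that appear in the computer output later in this deck. It only computes the ratio and its degrees of freedom; no tail probability is evaluated, and the variable names are illustrative.

```python
# Sample standard deviations and sizes as reported in the deck's computer output
s1, n1 = 0.316, 10
s2, n2 = 0.248, 10

f0 = s1 ** 2 / s2 ** 2    # ratio of the two sample variances
df_num = n1 - 1           # numerator degrees of freedom
df_den = n2 - 1           # denominator degrees of freedom

print(f0, df_num, df_den)
```

Under the common-variance assumption, this ratio would be compared with the F distribution with 9 and 9 degrees of freedom.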
The Hypothesis Testing Framework • Statistical hypothesis testing is a useful framework for many experimental situations • Origins of the methodology date from the early 1900s • We will use a procedure known as the two-sample t-test
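Two-Sample-t-Test • Suppose we have two independent normal populations: $y_{11}, y_{12}, \ldots, y_{1n_1}$ is a random sample of $n_1$ observations from the first population and $y_{21}, y_{22}, \ldots, y_{2n_2}$ is a random sample of $n_2$ observations from the second population
Two-Sample-t-Test • A model for the data: $y_{ij} = \mu_i + \varepsilon_{ij}$, $i = 1, 2$, $j = 1, 2, \ldots, n_i$, where $\varepsilon_{ij}$ is a random error, $\varepsilon_{ij} \sim \mathrm{NID}(0, \sigma_i^2)$
Two-Sample-t-Test • Sampling from a normal distribution • Statistical hypotheses: $H_0: \mu_1 = \mu_2$ versus $H_1: \mu_1 \ne \mu_2$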
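Two-Sample-t-Test • $H_0$ is called the null hypothesis and $H_1$ is called the alternative hypothesis • One-sided vs two-sided hypothesis • Type I error: the null hypothesis is rejected when it is true; $\alpha = P(\text{type I error})$ • Type II error: the null hypothesis is not rejected when it is false; $\beta = P(\text{type II error})$
Two-Sample-t-Test • Power of the test: $\text{Power} = 1 - \beta = P(\text{reject } H_0 \mid H_0 \text{ is false})$ • Probability of type I error, $\alpha$ = significance level • $1 - \alpha$ = confidence level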
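The meaning of the significance level can be illustrated by simulation. In the sketch below both samples are drawn from the same normal population, so the null hypothesis is true by construction and every rejection is a type I error. The population mean 17.0 and standard deviation 0.3 are arbitrary assumptions; the cutoff 2.101 is the two-sided 5% value for 18 degrees of freedom quoted later in the deck; `pooled_t` is a hypothetical helper name.

```python
import math
import random

random.seed(3)     # arbitrary seed, only for reproducibility
n1 = n2 = 10       # sample sizes, as in the mortar example
t_crit = 2.101     # two-sided 5% critical value for 18 df (quoted in the deck)
reps = 5000        # number of simulated experiments (assumed)


def pooled_t(y1, y2):
    """Pooled two-sample t statistic computed from raw observations."""
    m1, m2 = sum(y1) / len(y1), sum(y2) / len(y2)
    ss1 = sum((v - m1) ** 2 for v in y1)
    ss2 = sum((v - m2) ** 2 for v in y2)
    sp = math.sqrt((ss1 + ss2) / (len(y1) + len(y2) - 2))
    return (m1 - m2) / (sp * math.sqrt(1 / len(y1) + 1 / len(y2)))


rejections = 0
for _ in range(reps):
    # Both samples come from the same population, so H0 is true
    a = [random.gauss(17.0, 0.3) for _ in range(n1)]
    b = [random.gauss(17.0, 0.3) for _ in range(n2)]
    if abs(pooled_t(a, b)) > t_crit:
        rejections += 1

type1_rate = rejections / reps
print(type1_rate)  # expected to be close to alpha = 0.05
```

Shifting the mean of one population in the simulation would turn the same rejection fraction into an estimate of the power, 1 - β.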
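Two-Sample-t-Test • Two-sample t-test • Hypotheses: $H_0: \mu_1 = \mu_2$ versus $H_1: \mu_1 \ne \mu_2$ • Test statistic: $t_0 = \frac{\bar{y}_1 - \bar{y}_2}{S_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}$, where $S_p^2 = \frac{(n_1 - 1)S_1^2 + (n_2 - 1)S_2^2}{n_1 + n_2 - 2}$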
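The pooled test statistic can be computed directly from summary statistics. The sketch below is a minimal illustration using the means, standard deviations, and sample sizes reported in the computer output later in this deck; the function name `pooled_t_from_summary` is hypothetical.

```python
import math


def pooled_t_from_summary(ybar1, s1, n1, ybar2, s2, n2):
    """Return (t0, pooled standard deviation, degrees of freedom)."""
    df = n1 + n2 - 2
    sp2 = ((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / df  # pooled variance
    sp = math.sqrt(sp2)
    t0 = (ybar1 - ybar2) / (sp * math.sqrt(1 / n1 + 1 / n2))
    return t0, sp, df


# Summary statistics as reported in the deck's computer output
t0, sp, df = pooled_t_from_summary(16.760, 0.316, 10, 17.040, 0.248, 10)
print(t0, sp, df)
```

With these inputs the result should agree, up to rounding, with the reported T-Value of -2.20, Pooled StDev of 0.2840, and DF of 18.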
Example --Summary Statistics Formulation 2 “Original recipe” Formulation 1 “New recipe”
Two-Sample-t-Test: How the Two-Sample t-Test Works • Values of t0 that are near zero are consistent with the null hypothesis • Values of t0 that are very different from zero are consistent with the alternative hypothesis • t0 is a “distance” measure: how far apart the averages are, expressed in standard deviation units • Notice the interpretation of t0 as a signal-to-noise ratio
Two-Sample-t-Test • P-value: the smallest level of significance that would lead to rejection of the null hypothesis • Computer application:
Two-Sample T-Test and CI
Sample   N    Mean     StDev   SE Mean
1        10   16.760   0.316   0.10
2        10   17.040   0.248   0.078
Difference = mu (1) - mu (2)
Estimate for difference: -0.280
95% CI for difference: (-0.547, -0.013)
T-Test of difference = 0 (vs not =): T-Value = -2.20  P-Value = 0.041  DF = 18
Both use Pooled StDev = 0.2840
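The reported P-value can be reproduced approximately without a statistics package. The sketch below is one possible approach, not the method used by the software above: it integrates the t density numerically with Simpson's rule and takes the two-sided tail area. The function names `t_pdf` and `two_sided_p` and the number of integration steps are illustrative assumptions.

```python
import math


def t_pdf(t, k):
    """Density of the t distribution with k degrees of freedom."""
    c = math.gamma((k + 1) / 2) / (math.sqrt(k * math.pi) * math.gamma(k / 2))
    return c * (1 + t * t / k) ** (-(k + 1) / 2)


def two_sided_p(t0, k, steps=2000):
    """Two-sided P-value: 1 - P(-|t0| <= T <= |t0|), via Simpson's rule."""
    b = abs(t0)
    h = b / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, k) + t_pdf(b, k)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, k)
    central = 2 * total * h / 3  # area between -|t0| and |t0| (by symmetry)
    return 1 - central


p = two_sided_p(-2.20, 18)  # T-Value and DF taken from the output above
print(p)
```

The result should be close to the reported P-Value of 0.041; since it is below 0.05, the null hypothesis of equal means would be rejected at the 5% level.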
William Sealy Gosset (1876–1937) • Gosset's interest in barley cultivation led him to speculate that design of experiments should aim not only at improving the average yield, but also at breeding varieties whose yield was insensitive (robust) to variation in soil and climate • Developed the t-test (1908) • Gosset was a friend of both Karl Pearson and R.A. Fisher, an achievement, for each had a monumental ego and a loathing for the other • Gosset was a modest man who cut short an admirer with the comment that “Fisher would have discovered it all anyway.”
The Two-Sample (Pooled) t-Test • t0 = -2.20 • So far, we haven’t really done any “statistics” • We need an objective basis for deciding how large the test statistic t0 really is • In 1908, W. S. Gosset derived the reference distribution for t0, called the t distribution • Tables of the t distribution: see textbook appendix
The Two-Sample (Pooled) t-Test • t0 = -2.20 • A value of t0 between –2.101 and 2.101 is consistent with equality of means (±2.101 is the two-sided 5% critical value of t with 18 degrees of freedom) • It is possible for the means to be equal and t0 still to fall below –2.101 or above 2.101, but that would be a “rare event”, so such a value leads to the conclusion that the means are different • Could also use the P-value approach
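The critical-value decision, and the matching 95% confidence interval, can be sketched as follows. The inputs are the summary statistics reported in the deck's computer output and the critical value 2.101 quoted above; the variable names are illustrative.

```python
import math

# Summary statistics and critical value as quoted in the deck
ybar1, s1, n1 = 16.760, 0.316, 10
ybar2, s2, n2 = 17.040, 0.248, 10
t_crit = 2.101  # two-sided 5% critical value, 18 degrees of freedom

sp = math.sqrt(((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / (n1 + n2 - 2))
se_diff = sp * math.sqrt(1 / n1 + 1 / n2)  # standard error of the difference
diff = ybar1 - ybar2
t0 = diff / se_diff

reject_h0 = abs(t0) > t_crit               # True means "means differ"
ci = (diff - t_crit * se_diff, diff + t_crit * se_diff)

print(t0, reject_h0)
print(ci)
```

The interval should match the reported 95% CI of (-0.547, -0.013) up to rounding. It excludes zero exactly when |t0| exceeds 2.101, so the two approaches give the same conclusion.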