PTTE 434 Quality Organization & Management J. R. Wixson - Instructor

Ch 10: Basic Concepts of Statistics and Probability

Class Objectives • Learn about the standard normal distribution • Discuss descriptive and inferential statistics • Learn how to calculate proportions under the normal curve. • Discuss sampling distributions • Learn how to calculate sample size from a normal distribution • Discuss Hypothesis Testing 2 approaches: • Classical method • P-value method

Chapter Overview • Statistical Fundamentals • Process Control Charts • Some Control Chart Concepts • Process Capability • Other Statistical Techniques in Quality Management

Statistical Fundamentals • Statistical Thinking • Is a decision-making skill demonstrated by the ability to draw conclusions based on data. • Why Do Statistics Sometimes Fail in the Workplace? • Regrettably, statistical tools often do not produce the desired result. Why is this so? Many firms fail to implement quality control in a substantive way.

Statistical Fundamentals • Reasons for Failure of Statistical Tools • Lack of knowledge about the tools; therefore, tools are misapplied. • General disdain for all things mathematical creates a natural barrier to the use of statistics. • Cultural barriers in a company make the use of statistics for continual improvement difficult. • Statistical specialists have trouble communicating with managerial generalists.

Statistical Fundamentals • Reasons for Failure of Statistical Tools (continued) • Statistics generally are poorly taught, emphasizing mathematical development rather than application. • People have a poor understanding of the scientific method. • Organizations lack patience in collecting data. All decisions have to be made “yesterday.”

Statistical Fundamentals • Reasons for Failure of Statistical Tools (continued) • Statistics are viewed as something to buttress an already-held opinion rather than a method for informing and improving decision making. • Most people don’t understand random variation, which results in too much process tampering.

Statistical Fundamentals • Understanding Process Variation • Random variation is centered around a mean and occurs with a consistent amount of dispersion. • This type of variation cannot be controlled. Hence, we refer to it as “uncontrolled variation.” • The statistical tools discussed in this chapter are not designed to detect random variation.

Statistical Fundamentals • Understanding Process Variation (cont.) • Nonrandom or “special cause” variation results from some event. The event may be a shift in a process mean or some unexpected occurrence. • Process Stability • Means that the variation we observe in the process is random variation. To determine process stability we use process charts.

Statistical Fundamentals • Sampling Methods • To ensure that processes are stable, data are gathered in samples. • Random samples. Randomization is useful because it ensures independence among observations. To randomize means to sample in such a way that every piece of product has an equal chance of being selected for inspection. • Systematic samples. Systematic samples have some of the benefits of random samples without the difficulty of randomizing.

Statistical Fundamentals • Sampling Methods • To ensure that processes are stable, data are gathered in samples (continued) • Sampling by Rational Subgroup. A rational subgroup is a group of data that is logically homogeneous; variation within the data can provide a yardstick for setting limits on the standard variation between subgroups.
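The random and systematic sampling schemes described above can be sketched in a few lines of Python. This is only an illustration: the lot of 100 serial numbers, the function names, and the choice of a random starting point for the systematic sample are assumptions, not something the slides specify. The sketch also assumes the sample size is no larger than the lot.

```python
import random

def random_sample(items, n, seed=None):
    """Simple random sample: every item has an equal chance of selection."""
    rng = random.Random(seed)
    return rng.sample(items, n)

def systematic_sample(items, n, seed=None):
    """Systematic sample: random start, then every k-th item (assumes n <= len(items))."""
    k = len(items) // n            # sampling interval
    rng = random.Random(seed)
    start = rng.randrange(k)       # random start within the first interval
    return [items[start + i * k] for i in range(n)]

parts = list(range(1, 101))        # hypothetical lot of 100 serial numbers
random_pick = random_sample(parts, 10, seed=1)
systematic_pick = systematic_sample(parts, 10, seed=1)
print(random_pick)
print(systematic_pick)
```

The systematic version only needs one random number, which is why it is easier to carry out on a production line than full randomization.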

Sampling Distributions

Sampling Distributions • If you compute the mean of a sample of 10 numbers, the value you obtain will not equal the population mean exactly; by chance it will be a little bit higher or a little bit lower. • If you sampled sets of 10 numbers over and over again (computing the mean for each set), you would find that some sample means come much closer to the population mean than others. Some would be higher than the population mean and some would be lower. • Imagine sampling 10 numbers and computing the mean over and over again, say about 1,000 times, and then constructing a relative frequency distribution of those 1,000 means.

Sampling Distributions • The distribution of means is a very good approximation to the sampling distribution of the mean. • The sampling distribution of the mean is a theoretical distribution that is approached as the number of samples in the relative frequency distribution increases. • With 1,000 samples, the relative frequency distribution is quite close; with 10,000 it is even closer. • As the number of samples approaches infinity, the relative frequency distribution approaches the sampling distribution
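The thought experiment of sampling 10 numbers over and over can be tried directly. The following is a minimal sketch, assuming a hypothetical population of 10,000 roughly normal scores (mean near 50, standard deviation near 10); the population and the helper name `sample_means` are illustrative only.

```python
import random
import statistics

rng = random.Random(0)

# Hypothetical population (an assumption for illustration)
population = [rng.gauss(50, 10) for _ in range(10_000)]
pop_mean = statistics.fmean(population)

def sample_means(pop, sample_size, num_samples, rng):
    """Draw num_samples samples of sample_size and return the mean of each."""
    return [statistics.fmean(rng.sample(pop, sample_size))
            for _ in range(num_samples)]

means = sample_means(population, 10, 1000, rng)

print("population mean:     ", round(pop_mean, 2))
print("mean of sample means:", round(statistics.fmean(means), 2))
print("SD of sample means:  ", round(statistics.stdev(means), 2))
```

Increasing `num_samples` (not `sample_size`) makes the relative frequency distribution of `means` a closer approximation to the sampling distribution of the mean for samples of size 10.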

Sampling Distributions • The sampling distribution of the mean for a sample size of 10 was just an example; there is a different sampling distribution for other sample sizes. • Also, keep in mind that the relative frequency distribution approaches a sampling distribution as the number of samples increases, not as the sample size increases since there is a different sampling distribution for each sample size.

Sampling Distributions • A sampling distribution can also be defined as the relative frequency distribution that would be obtained if all possible samples of a particular sample size were taken. • For example, the sampling distribution of the mean for a sample size of 10 would be constructed by computing the mean for each of the possible ways in which 10 scores could be sampled from the population and creating a relative frequency distribution of these means. • Although these two definitions may seem different, they are actually the same: Both procedures produce exactly the same sampling distribution.
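For a very small population, the "all possible samples" definition can be carried out exactly. The sketch below assumes a hypothetical population of four scores and sampling with replacement (the slides do not say which sampling scheme is meant); under that assumption the variance of the sample means equals the population variance divided by the sample size.

```python
from collections import Counter
from itertools import product
import statistics

population = [2, 4, 6, 8]   # hypothetical tiny population
N = 2                       # sample size

# Every possible ordered sample of size N, drawn with replacement (assumption)
all_means = [statistics.fmean(s) for s in product(population, repeat=N)]

# Relative frequency distribution of the sample means
rel_freq = {m: c / len(all_means) for m, c in sorted(Counter(all_means).items())}

mean_of_means = statistics.fmean(all_means)
var_of_means = statistics.pvariance(all_means)

print(rel_freq)
print(mean_of_means, statistics.fmean(population))
print(var_of_means, statistics.pvariance(population) / N)
```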

Sampling Distributions • Statistics other than the mean have sampling distributions too. The sampling distribution of the median is the distribution that would result if the median instead of the mean were computed in each sample. • Students often define "sampling distribution" as the sampling distribution of the mean. That is a serious mistake. • Sampling distributions are very important since almost all inferential statistics are based on sampling distributions.
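To see that the median has its own sampling distribution, the same repeated-sampling idea can be applied to both statistics at once. This is a sketch under assumed conditions (a hypothetical, roughly normal population and 2,000 samples of size 10); it is not a result taken from the slides.

```python
import random
import statistics

rng = random.Random(7)
population = [rng.gauss(50, 10) for _ in range(10_000)]  # hypothetical population

means, medians = [], []
for _ in range(2000):
    sample = rng.sample(population, 10)
    means.append(statistics.fmean(sample))      # one draw from the sampling distribution of the mean
    medians.append(statistics.median(sample))   # one draw from the sampling distribution of the median

print("mean of means:  ", round(statistics.fmean(means), 2),
      " SD:", round(statistics.stdev(means), 2))
print("mean of medians:", round(statistics.fmean(medians), 2),
      " SD:", round(statistics.stdev(medians), 2))
```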

Sampling Distribution of the mean • The sampling distribution of the mean is a very important distribution. In later chapters you will see that it is used to construct confidence intervals for the mean and for significance testing. • Given a population with a mean of μ and a standard deviation of σ, the sampling distribution of the mean has a mean of μ and a standard deviation of σ/√N, where N is the sample size. • The standard deviation of the sampling distribution of the mean is called the standard error of the mean. It is designated by the symbol σ_M.
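The standard error of the mean, σ/√N, is easy to tabulate. The short sketch below uses a hypothetical population standard deviation of 10 and a handful of sample sizes chosen for illustration.

```python
import math

def standard_error(sigma, n):
    """Standard error of the mean: sigma / sqrt(N)."""
    return sigma / math.sqrt(n)

sigma = 10  # hypothetical population standard deviation
for n in (1, 4, 25, 100):
    print(n, standard_error(sigma, n))
```

Quadrupling the sample size only halves the standard error, because N enters through its square root.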

Sampling Distribution of the mean • Note that the spread of the sampling distribution of the mean decreases as the sample size increases. An example of the effect of sample size is shown above. Notice that the mean of the distribution is not affected by sample size.

Spread A variable's spread is the degree to which scores on the variable differ from each other. If every score on the variable were about equal, the variable would have very little spread. There are many measures of spread. The distributions on the right side of this page have the same mean but differ in spread: The distribution on the bottom is more spread out. Variability and dispersion are synonyms for spread.

Standard normal distribution • The standard normal distribution is a normal distribution with a mean of 0 and a standard deviation of 1. Normal distributions can be transformed to standard normal distributions by the formula: z = (X - μ)/σ • X is a score from the original normal distribution, μ is the mean of the original normal distribution, and σ is the standard deviation of the original normal distribution.

Standard normal distribution • A z score always reflects the number of standard deviations above or below the mean a particular score is. • For instance, if a person scored a 70 on a test with a mean of 50 and a standard deviation of 10, then they scored 2 standard deviations above the mean. Converting the test scores to z scores, an X of 70 would be: z = (70 - 50)/10 = 2 • So, a z score of 2 means the original score was 2 standard deviations above the mean. Note that the z distribution will only be a normal distribution if the original distribution (X) is normal.
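The z-score conversion in the test-score example can be written as a one-line function. The function name `z_score` is just an illustrative choice.

```python
def z_score(x, mean, sd):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mean) / sd

print(z_score(70, 50, 10))   # 2.0: two standard deviations above the mean
print(z_score(40, 50, 10))   # -1.0: one standard deviation below the mean
```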

Applying the formula Applying the formula will always produce a transformed variable with a mean of zero and a standard deviation of one. However, the shape of the distribution will not be affected by the transformation. If X is not normal then the transformed distribution will not be normal either. One important use of the standard normal distribution is for converting between scores from a normal distribution and percentile ranks. Areas under portions of the standard normal distribution are shown to the right. About .68 (.34 + .34) of the distribution is between -1 and 1 while about .95 of the distribution is between -2 and 2.
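Both claims on this slide can be checked numerically. The sketch below standardizes a small, deliberately skewed data set (hypothetical values) to show that the result has mean 0 and standard deviation 1 without becoming normal, and then uses Python's `statistics.NormalDist` to compute the areas under the standard normal curve.

```python
import statistics
from statistics import NormalDist

def standardize(scores):
    """Convert raw scores to z scores using the data's own mean and SD."""
    mean = statistics.fmean(scores)
    sd = statistics.pstdev(scores)      # population SD of the scores
    return [(x - mean) / sd for x in scores]

skewed = [1, 1, 2, 2, 3, 4, 6, 9, 15, 30]   # hypothetical, clearly non-normal data
z = standardize(skewed)
print(round(statistics.fmean(z), 6), round(statistics.pstdev(z), 6))

std_normal = NormalDist()                   # mean 0, standard deviation 1
within_1 = std_normal.cdf(1) - std_normal.cdf(-1)
within_2 = std_normal.cdf(2) - std_normal.cdf(-2)
print(round(within_1, 4), round(within_2, 4))
```

The z scores of the skewed data keep the same long right tail as the raw scores; only the location and scale change.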

Area under a portion of the normal curve - Example 1 If a test is normally distributed with a mean of 60 and a standard deviation of 10, what proportion of the scores are above 85? From the Z table, it is calculated that .9938 of the scores are less than or equal to a score 2.5 standard deviations above the mean. It follows that only 1-.9938 = .0062 of the scores are above a score 2.5 standard deviations above the mean. Therefore, only .0062 of the scores are above 85.
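Example 1 can be reproduced without a printed z table. This sketch uses `statistics.NormalDist` from the Python standard library in place of the table lookup; the variable names are illustrative.

```python
from statistics import NormalDist

test = NormalDist(mu=60, sigma=10)   # the test in Example 1

z = (85 - 60) / 10                   # 2.5 standard deviations above the mean
below_85 = test.cdf(85)              # proportion at or below 85
above_85 = 1 - below_85              # proportion above 85

print(z, round(below_85, 4), round(above_85, 4))
```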

Example 2 • Suppose you wanted to know the proportion of students receiving scores between 70 and 80 on the same test (mean of 60, standard deviation of 10). The approach is to figure out the proportion of students scoring below 80 and the proportion below 70. • The difference between the two proportions is the proportion scoring between 70 and 80. • First, the calculation of the proportion below 80. Since 80 is 20 points above the mean and the standard deviation is 10, 80 is 2 standard deviations above the mean. • The z table is used to determine that .9772 of the scores are below a score 2 standard deviations above the mean.

Example 2 Cont’d • Next, calculate the proportion below 70. Since 70 is 10 points above the mean, it is 1 standard deviation above the mean, and the z table shows that .8413 of the scores are below it. • To calculate the proportion between 70 and 80, subtract the proportion below 70 from the proportion below 80. That is .9772 - .8413 = .1359. • Therefore, only 13.59% of the scores are between 70 and 80.
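The "proportion below the upper score minus proportion below the lower score" approach generalizes to any interval. A minimal sketch, assuming the test in Example 2 has a mean of 60 and a standard deviation of 10; `proportion_between` is a hypothetical helper name.

```python
from statistics import NormalDist

def proportion_between(low, high, mean, sd):
    """Proportion of a normal distribution falling between low and high."""
    dist = NormalDist(mean, sd)
    return dist.cdf(high) - dist.cdf(low)

below_80 = NormalDist(60, 10).cdf(80)   # z = 2
below_70 = NormalDist(60, 10).cdf(70)   # z = 1
between = proportion_between(70, 80, 60, 10)

print(round(below_80, 4), round(below_70, 4), round(between, 4))
```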

Example 3 • Assume a test is normally distributed with a mean of 100 and a standard deviation of 15. What proportion of the scores would be between 85 and 105? • The solution to this problem is similar to the solution to the last one. The first step is to calculate the proportion of scores below 85. • Next, calculate the proportion of scores below 105. Finally, subtract the first result from the second to find the proportion scoring between 85 and 105.

Example 3 Begin by calculating the proportion below 85. 85 is one standard deviation below the mean: z = (85 - 100)/15 = -1. Using the z-table with the value of -1 for z, the area below -1 (or 85 in terms of the raw scores) is .1587. Do the same for 105: z = (105 - 100)/15 = .333.

Example 3 The z-table shows that the proportion scoring below .333 (105 in raw scores) is .6304. The difference is .6304 - .1587 = .4717. So .4717 of the scores are between 85 and 105. Go to: http://davidmlane.com/hyperstat/z_table.html for Z table.
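Example 3 is a useful place to see how table rounding affects the answer. The sketch below mimics a table lookup by rounding z to three decimals and each area to four decimals (these rounding conventions are assumptions about how the table is used), then compares the result with the unrounded calculation.

```python
from statistics import NormalDist

std_normal = NormalDist()        # standard normal: mean 0, SD 1
mean, sd = 100, 15

# Table-style lookup: round z to 3 decimals and each area to 4 decimals
z_low = round((85 - mean) / sd, 3)     # -1.0
z_high = round((105 - mean) / sd, 3)   # 0.333
area_low = round(std_normal.cdf(z_low), 4)
area_high = round(std_normal.cdf(z_high), 4)
table_style = area_high - area_low

# Unrounded calculation directly on the raw-score distribution
exact = NormalDist(mean, sd).cdf(105) - NormalDist(mean, sd).cdf(85)

print(z_low, z_high, area_low, area_high)
print(round(table_style, 4), round(exact, 4))
```

The two answers differ only in the fourth decimal place, which is the kind of discrepancy to expect whenever z is rounded before the lookup.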

[Figure series: relative frequency distribution of sample means built from 5, 10, 15, 20, 100, 1,000, and 10,000 samples]

Standard Error in Relation to Sample Size Notice that the graph is consistent with the formulas. If σ_M = 10 for a sample size of 1, then σ_M should be equal to 2 for a sample size of 25 (10/√25 = 2). When s is used as an estimate of σ, the estimated standard error of the mean is s_M = s/√N. The standard error of the mean is used in the computation of confidence intervals and significance tests for the mean.
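When σ is unknown, the estimated standard error s/√N is computed from the sample itself. A minimal sketch with a hypothetical sample of eight measurements; `statistics.stdev` uses the N - 1 denominator, which is the usual choice when s estimates σ.

```python
import math
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]      # hypothetical sample of 8 measurements

n = len(data)
s = statistics.stdev(data)           # sample standard deviation (N - 1 denominator)
estimated_se = s / math.sqrt(n)      # estimated standard error of the mean

print(n, round(s, 3), round(estimated_se, 3))
```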

Figure 11.3 Width of confidence interval versus number of tests. (Vertical axis: 95 percent upper and lower confidence limits, from -60 to 60. Horizontal axis: N, number of tests, from 0 to 100.)

SEE TABLE 10.6 Summary of common probability distributions.

Central Limit Theorem The central limit theorem states that given a distribution with a mean μ and variance σ², the sampling distribution of the mean approaches a normal distribution with a mean (μ) and a variance σ²/N as N, the sample size, increases. Go to Central Limit Demonstration: http://oak.cats.ohiou.edu/~wallacd1/ssample.html

Central Limit Theorem • The central limit theorem also implies that the larger the sample size, the more nearly normal the sampling distribution of the mean will be. • Thus, the sampling distribution of the mean becomes increasingly normal in shape as the sample size N increases. • For a sufficiently large sample size, the sampling distribution of the mean will be approximately normal regardless of the shape of the population distribution. • Whether the population distribution is normal, positively or negatively skewed, unimodal or bimodal in shape, the sampling distribution of the mean will approach a normal shape.

Central Limit Theorem (Cont’d) • In the following example we start out with a uniform distribution. The sampling distribution of the mean, however, will contain variability in the mean values we obtain from sample to sample. Thus, the sampling distribution of the mean will have a normal shape, even though the population distribution does not. Notice that because we are taking a sample of values from all parts of the population, the mean of the samples will be close to the center of the population distribution.
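The uniform-population case can be simulated as a rough check on the central limit theorem. This is a sketch under assumed settings (a uniform population on 0 to 1, a sample size of 30, and 5,000 samples); none of these values come from the slides. For that population the mean is 0.5 and the variance is 1/12, so the standard error should be close to √(1/12/N), and roughly 68 percent of the sample means should fall within one standard error of 0.5 if the sampling distribution is close to normal.

```python
import math
import random
import statistics

rng = random.Random(42)
N = 30                # sample size (assumption)
num_samples = 5000    # number of samples (assumption)

# Each sample is N draws from a uniform(0, 1) population
means = [statistics.fmean([rng.random() for _ in range(N)])
         for _ in range(num_samples)]

expected_se = math.sqrt((1 / 12) / N)     # sigma / sqrt(N) for uniform(0, 1)
observed_se = statistics.stdev(means)
within_1se = sum(abs(m - 0.5) <= expected_se for m in means) / num_samples

print(round(statistics.fmean(means), 4))
print(round(expected_se, 4), round(observed_se, 4))
print(round(within_1se, 3))
```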
