Economics 173 Business Statistics. Lectures 5 & 6, Summer 2001. Professor J. Petry.
Chapter 11 Inference About the Description of a Single Population
11.1 Introduction
• In this chapter we apply the approach developed earlier for making statistical inferences about populations:
• Identify the parameter to be estimated or tested.
• Specify the parameter's estimator and its sampling distribution.
• Construct an interval estimator or perform a test.
We will develop techniques to estimate and test three population parameters.
• The expected value $\mu$
• The variance $\sigma^2$
• The population proportion $p$ (for qualitative data)
Examples
• A bank conducts a survey to estimate the number of times customers will actually use ATMs.
• A random sample of processing times is taken to test the mean and the variance of production time on a production line.
11.2 Inference About a Population Mean When the Population Standard Deviation Is Unknown
• Recall that when $\sigma$ is known, the statistic $z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}}$ is normally distributed if the sample is drawn from a normal population, and approximately normally distributed if the population is not normal but the sample is sufficiently large.
• When $\sigma$ is unknown, we use its point estimator $s$, and the $z$ statistic is replaced by the $t$ statistic $t = \frac{\bar{x} - \mu}{s/\sqrt{n}}$.
• When the sampled population is normally distributed, the statistic $t$ is Student t distributed.
• The "degrees of freedom", a function of the sample size, determine how spread out the distribution is (compared to the normal distribution).
• The t distribution is mound-shaped and symmetrical around zero.
[Figure: two t distributions centered at 0, one with d.f. = n1 and one with d.f. = n2, where n1 < n2]
Probability calculations for the t distribution
• The t table provides critical values for various probabilities of interest.
• The probabilities that appear in Table 4 of Appendix B have the form $P(t > t_{A,\text{d.f.}}) = A$.
• For a given number of degrees of freedom and a predetermined right-hand tail probability $A$, the entry in the table is the corresponding $t_A$.
• These values are used in computing interval estimates and performing hypothesis tests.
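[Figure: t table excerpt with columns $t_{.100}$, $t_{.05}$, $t_{.025}$, $t_{.01}$, $t_{.005}$; the right-tail area beyond $t_A$ is shown for $A = .05$]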
Testing the population mean when the population standard deviation is unknown
• If the population is normally distributed, the test statistic for $\mu$ when $\sigma$ is unknown is $t = \frac{\bar{x} - \mu}{s/\sqrt{n}}$.
• This statistic is Student t distributed with $n - 1$ degrees of freedom.
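The t statistic for a single mean, $t = (\bar{x} - \mu)/(s/\sqrt{n})$, can be sketched in a few lines of Python. This is a minimal illustration only: the function name `one_sample_t` and the five data values are invented for the example and do not come from the course files.

```python
import math
import statistics

def one_sample_t(sample, mu0):
    """Return (t, degrees_of_freedom) for testing H0: mu = mu0."""
    n = len(sample)
    xbar = statistics.mean(sample)
    s = statistics.stdev(sample)  # sample standard deviation (divides by n - 1)
    t = (xbar - mu0) / (s / math.sqrt(n))
    return t, n - 1

# Hypothetical data, invented only to exercise the function.
sample = [12.0, 15.0, 11.0, 14.0, 13.0]
t, df = one_sample_t(sample, mu0=12.0)
print(f"t = {t:.3f} with {df} degrees of freedom")
```

The resulting t is then compared with the critical value $t_{\alpha, n-1}$ read from the t table.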
Example 11.1 Trainee productivity
• In order to determine the number of workers required to meet demand, the productivity of newly hired trainees is studied.
• It is believed that trainees can process and distribute more than 450 packages per hour within one week of hiring.
• Can we conclude that this belief is correct, based on productivity observations of 50 trainees? See file XM11-01.
Solution
• The problem objective is to describe the population of the number of packages processed in one hour.
• The data are quantitative.
• The hypotheses are $H_0: \mu = 450$ and $H_1: \mu > 450$.
• The test statistic is $t = \frac{\bar{x} - \mu}{s/\sqrt{n}}$, with d.f. = $n - 1 = 49$.
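Solving by hand
• The rejection region is $t > t_{\alpha, n-1}$.
• $t_{\alpha, n-1} = t_{.05,49}$, which is approximately 1.676.
• From the data in file XM11-01 we compute the sample mean $\bar{x}$ and the sample standard deviation $s$.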
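• The test statistic is $t = \frac{\bar{x} - \mu}{s/\sqrt{n}} = 1.89$.
• Since 1.89 > 1.676, we reject the null hypothesis in favor of the alternative.
• There is sufficient evidence to infer that the mean productivity of trainees one week after being hired is greater than 450 packages per hour at the .05 significance level.
[Figure: t distribution with the rejection region to the right of 1.676; the test statistic 1.89 falls inside it]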
• The p-value of the test is .0323. Since .0323 < .05, we reject the null hypothesis in favor of the alternative.
• There is sufficient evidence to infer that the mean productivity of trainees one week after being hired is greater than 450 packages per hour at the .05 significance level.
[Figure: the p-value .0323 compared with the significance level .05]
Estimating the population mean when the population standard deviation is unknown
• Confidence interval estimator of $\mu$ when $\sigma$ is unknown: $\bar{x} \pm t_{\alpha/2} \frac{s}{\sqrt{n}}$, with d.f. = $n - 1$.
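As a rough sketch, the interval estimator for the mean (sample mean plus or minus the t critical value times $s/\sqrt{n}$) can be computed as below. The function name and the summary statistics are hypothetical, chosen only for illustration; the critical value $t_{.025,24} = 2.064$ is read from a standard t table and passed in by hand.

```python
import math

def mean_confidence_interval(xbar, s, n, t_crit):
    """Interval xbar +/- t_crit * s / sqrt(n); t_crit is looked up with n - 1 d.f."""
    half_width = t_crit * s / math.sqrt(n)
    return xbar - half_width, xbar + half_width

# Hypothetical summary statistics (not taken from any course data file).
lower, upper = mean_confidence_interval(xbar=10.0, s=2.0, n=25, t_crit=2.064)
print(f"95% interval: ({lower:.4f}, {upper:.4f})")
```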
Example 11.2
• An investor is trying to estimate the return on investment in companies that won quality awards last year.
• A random sample of 50 such companies is selected, and the return the investor would have earned by investing in them is calculated.
• Construct a 95% confidence interval for the mean return.
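Solution
• The problem objective is to describe the population of annual returns from buying shares of quality award-winners.
• The data are quantitative.
Solving by hand
• From the data we determine $\bar{x}$ and $s$, and substitute them into $\bar{x} \pm t_{.025,49} \frac{s}{\sqrt{50}}$, where $t_{.025,49} \approx 2.01$.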
Checking the required conditions
• We need to check that the population is normally distributed, or at least not extremely non-normal.
• There are statistical methods to test for normality (to be introduced later).
• For now, we can plot a histogram of the data set.
[Figures: a histogram for XM11-01 (packages); a histogram for XM11-02 (returns)]
11.3 Inference About a Population Variance
• Sometimes we are interested in making inferences about the variability of processes.
• Examples:
• The consistency of a production process for quality control purposes.
• Investors use variance as a measure of risk.
• To draw inferences about variability, the parameter of interest is $\sigma^2$.
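• The sample variance $s^2$ is an unbiased, consistent and efficient point estimator for $\sigma^2$.
• The statistic $\chi^2 = \frac{(n-1)s^2}{\sigma^2}$ has a distribution called chi-squared with $n - 1$ degrees of freedom, if the population is normally distributed.
[Figure: chi-squared densities for d.f. = 1, d.f. = 5 and d.f. = 10]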
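The $\chi^2$ table
• $\chi^2_A$ is the value with right-tail area $A$; $\chi^2_{1-A}$ is the value with right-tail area $1 - A$ (left-tail area $A$).
• Example: for $A = .01$ (so $1 - A = .99$) and 10 degrees of freedom, $\chi^2_{.01,10} = 23.2093$.
[Figure: $\chi^2$ table excerpt with columns $\chi^2_{.995}$, $\chi^2_{.990}$, $\chi^2_{.975}$, $\chi^2_{.010}$, $\chi^2_{.005}$]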
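Estimating the population variance
• From the probability statement $P(\chi^2_{1-\alpha/2} < \chi^2 < \chi^2_{\alpha/2}) = 1 - \alpha$ we have, by substituting $\chi^2 = \frac{(n-1)s^2}{\sigma^2}$ and solving for $\sigma^2$:
$\frac{(n-1)s^2}{\chi^2_{\alpha/2}} < \sigma^2 < \frac{(n-1)s^2}{\chi^2_{1-\alpha/2}}$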
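The interval for the variance divides $(n-1)s^2$ by the two chi-squared critical values, with the larger critical value giving the lower limit. A minimal sketch follows; the function name, the sample size of 11 and the sample variance of 4.0 are hypothetical, and the critical values for 10 degrees of freedom are approximate table values entered by hand.

```python
def variance_confidence_interval(ss, chi2_upper, chi2_lower):
    """Interval for sigma^2.

    ss         -- (n - 1) * s^2
    chi2_upper -- chi^2_{alpha/2} (the larger critical value)
    chi2_lower -- chi^2_{1 - alpha/2} (the smaller critical value)
    """
    return ss / chi2_upper, ss / chi2_lower

# Hypothetical sample: n = 11, s^2 = 4.0, so (n - 1) * s^2 = 40.
# Approximate table values for 10 d.f.: chi^2_.025 = 20.48, chi^2_.975 = 3.247.
lower, upper = variance_confidence_interval(ss=40.0, chi2_upper=20.48, chi2_lower=3.247)
print(f"95% interval for the variance: ({lower:.3f}, {upper:.3f})")
```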
Example 11.3 (operations management application)
• A container-filling machine is believed to fill 1-liter containers so consistently that the variance of the fills will be less than 1 cc (.001 liter).
• To test this belief, a random sample of 25 one-liter fills was taken and the results recorded.
• The data are provided in file XM11-03.
• Do these data support the belief that the variance is less than 1 cc at the 5% significance level?
Solution
• The problem objective is to describe the population of 1-liter fills from a filling machine.
• The data are quantitative, and we are interested in the variability of the fills.
• The complete test is: $H_0: \sigma^2 = 1$, $H_1: \sigma^2 < 1$. The alternative hypothesis states what we want to prove, namely that the process is consistent.
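Solving by hand
• Note that $(n-1)s^2 = \sum (x_i - \bar{x})^2 = \sum x_i^2 - \frac{(\sum x_i)^2}{n}$.
• From the sample (the data are recorded as fill in cc minus 1000 to avoid rounding) we can calculate $\sum x_i = -3.6$ and $\sum x_i^2 = 21.3$.
• Then $(n-1)s^2 = 21.3 - (-3.6)^2/25 = 20.8$, so the test statistic is $\chi^2 = \frac{(n-1)s^2}{\sigma^2} = 20.8/1 = 20.8$.
• The rejection region is $\chi^2 < \chi^2_{1-\alpha, n-1} = \chi^2_{.95,24} = 13.8484$.
• Since 20.8 is not less than 13.8484, there is insufficient evidence to reject the hypothesis that the variance is equal to 1 cc in favor of the hypothesis that it is smaller.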
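[Figure: chi-squared distribution with $\alpha = .05$ in the left tail and $1 - \alpha = .95$ to the right; the rejection region lies below 13.8484 and the test statistic 20.8 falls outside it. Do not reject the null hypothesis.]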
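The arithmetic of Example 11.3 can be checked with a short script. It uses only the sums and the critical value quoted on the slides; the variable names are illustrative.

```python
n = 25
sum_x = -3.6     # sum of (fill in cc - 1000), as given in the example
sum_x2 = 21.3    # sum of squared (fill in cc - 1000)
sigma2_0 = 1.0   # variance under H0

# Shortcut formula: (n - 1) * s^2 = sum(x^2) - (sum(x))^2 / n
ss = sum_x2 - sum_x ** 2 / n
chi2_stat = ss / sigma2_0

chi2_critical = 13.8484  # chi^2_{.95,24} from the chi-squared table
reject_h0 = chi2_stat < chi2_critical  # left-tail test

print(f"chi-squared statistic = {chi2_stat:.4f}, reject H0: {reject_h0}")
```

The unrounded statistic is 20.7816, which the slides report as 20.8.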
11.4 Inference About a Population Proportion
• When the population consists of qualitative or categorical data, the only inference we can make is about the proportion of occurrence of a certain value.
• The parameter $p$ was used before to calculate probabilities using the binomial distribution.
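Statistic and sampling distribution
• The statistic employed is the sample proportion $\hat{p} = x/n$, where $x$ is the number of successes in a sample of size $n$.
• Under certain conditions [$np > 5$ and $n(1-p) > 5$], $\hat{p}$ is approximately normally distributed, with $\mu = p$ and $\sigma^2 = p(1-p)/n$.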
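• Test statistic for $p$: $z = \frac{\hat{p} - p}{\sqrt{p(1-p)/n}}$
• Interval estimator for $p$ ($1 - \alpha$ confidence level): $\hat{p} \pm z_{\alpha/2} \sqrt{\hat{p}(1-\hat{p})/n}$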
Example 11.5 (marketing application)
• For a new newspaper to be financially viable, it has to capture at least 12% of the Toronto market.
• In a survey conducted among 400 randomly selected prospective readers, 58 participants indicated they would subscribe to the newspaper if its cost did not exceed $20 a month.
• Can the publisher conclude that the proposed newspaper will be financially viable at the 10% significance level?
Solution
• The problem objective is to describe the population of newspaper readers in Toronto.
• The responses to the survey are qualitative.
• The parameter to be tested is $p$.
• The hypotheses are: $H_0: p = .12$, $H_1: p > .12$. The alternative hypothesis states what we want to prove, namely that the newspaper is financially viable.
Solving by hand
• The rejection region is $z > z_\alpha = z_{.10} = 1.28$.
• The sample proportion is $\hat{p} = 58/400 = .145$.
• The value of the test statistic is $z = \frac{\hat{p} - p}{\sqrt{p(1-p)/n}} = \frac{.145 - .12}{\sqrt{.12(1 - .12)/400}} = 1.54$.
• The p-value is $P(Z > 1.54) = .0618$.
• Since 1.54 > 1.28, there is sufficient evidence to reject the null hypothesis in favor of the alternative hypothesis. At the 10% significance level we can argue that more than 12% of Toronto's readers will subscribe to the new newspaper.
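The hand calculation for Example 11.5 can be reproduced with Python's standard library, using `statistics.NormalDist` in place of the normal table. This is only a sketch with illustrative variable names. Because it keeps the unrounded z of about 1.539, its p-value comes out slightly above the .0618 obtained from the rounded z = 1.54.

```python
import math
from statistics import NormalDist

n, x = 400, 58   # survey size and number who would subscribe
p0 = 0.12        # proportion under H0
alpha = 0.10     # significance level

p_hat = x / n
z = (p_hat - p0) / math.sqrt(p0 * (1 - p0) / n)

std_normal = NormalDist()                # mean 0, standard deviation 1
z_crit = std_normal.inv_cdf(1 - alpha)   # right-tail critical value
p_value = 1 - std_normal.cdf(z)          # right-tail p-value
reject_h0 = z > z_crit

print(f"p_hat = {p_hat:.3f}, z = {z:.2f}, z_crit = {z_crit:.2f}, "
      f"p-value = {p_value:.4f}, reject H0: {reject_h0}")
```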
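Example 11.6 (marketing application)
• In a survey of 2000 TV viewers at 11:40 p.m. on a certain night, 226 indicated they watched "The Tonight Show".
• Estimate the number of TVs tuned to "The Tonight Show" on a typical night, if there are 100 million potential television sets. Use a 95% confidence level.
Solution
• The sample proportion is $\hat{p} = 226/2000 = .113$.
• The interval estimate is $\hat{p} \pm z_{\alpha/2} \sqrt{\hat{p}(1-\hat{p})/n} = .113 \pm 1.96 \sqrt{.113(.887)/2000} = .113 \pm .014$, that is, from .099 to .127.
• Multiplying by 100 million sets, we estimate that between 9.9 million and 12.7 million TVs are tuned to the show.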
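The interval estimate for Example 11.6 can be sketched as follows, again with `statistics.NormalDist` standing in for the normal table. Variable names are illustrative, and the script uses the unrounded critical value (about 1.96).

```python
import math
from statistics import NormalDist

n, x = 2000, 226           # viewers surveyed and viewers watching the show
confidence = 0.95
total_sets = 100_000_000   # potential television sets

p_hat = x / n
z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # z_{alpha/2}
half_width = z * math.sqrt(p_hat * (1 - p_hat) / n)

lower_p, upper_p = p_hat - half_width, p_hat + half_width
lower_sets, upper_sets = lower_p * total_sets, upper_p * total_sets

print(f"proportion: ({lower_p:.3f}, {upper_p:.3f})")
print(f"sets: ({lower_sets:,.0f}, {upper_sets:,.0f})")
```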
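Selecting the Sample Size to Estimate the Proportion
• The interval estimator for the proportion is $\hat{p} \pm z_{\alpha/2} \sqrt{\hat{p}(1-\hat{p})/n}$.
• Thus, if we wish to estimate the proportion to within $W$, we can write $W = z_{\alpha/2} \sqrt{\hat{p}(1-\hat{p})/n}$.
• The required sample size is $n = \left( \frac{z_{\alpha/2} \sqrt{\hat{p}(1-\hat{p})}}{W} \right)^2$.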
Example
• Suppose we want to estimate the proportion of customers who prefer our company's brand to within .03 with 95% confidence.
• Find the sample size needed to guarantee that this requirement is met.
Solution
• $W = .03$; $1 - \alpha = .95$, therefore $\alpha/2 = .025$, so $z_{.025} = 1.96$.
• Since the sample has not yet been taken, the sample proportion is still unknown. We proceed using one of the following two methods.
Method 1:
• There is no knowledge about the value of $\hat{p}$.
• Let $\hat{p} = .5$, which results in the largest possible $n$ needed for a $1 - \alpha$ confidence interval: $n = \left( \frac{1.96 \sqrt{.5(1 - .5)}}{.03} \right)^2 = 1{,}068$ (rounded up).
• If the sample proportion does not equal .5, the actual $W$ will be narrower than .03.
Method 2:
• There is some idea about the value of $\hat{p}$.
• Use that value of $\hat{p}$ to calculate the sample size.
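Both methods amount to plugging a value of the sample proportion into the same sample-size formula. A minimal sketch, assuming a hypothetical helper named `sample_size_for_proportion`; the prior guess of .2 used for Method 2 is invented for illustration and is not a value from the lecture.

```python
import math
from statistics import NormalDist

def sample_size_for_proportion(width, confidence, p_guess=0.5):
    """Smallest n that estimates a proportion to within `width`.

    p_guess = 0.5 is the conservative choice (Method 1); any other value
    represents prior knowledge about the proportion (Method 2).
    """
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # z_{alpha/2}
    n = (z * math.sqrt(p_guess * (1 - p_guess)) / width) ** 2
    return math.ceil(n)  # round up so the width requirement is met

n_conservative = sample_size_for_proportion(0.03, 0.95)             # Method 1
n_with_guess = sample_size_for_proportion(0.03, 0.95, p_guess=0.2)  # Method 2, hypothetical guess

print(n_conservative, n_with_guess)
```

Rounding up rather than to the nearest integer keeps the achieved width at or below the target.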
Chapter 12 Inference About the Comparison of Two Populations
12.1 Introduction
• A variety of techniques are presented whose objective is to compare two populations.
• We are interested in:
• The difference between two means.
• The ratio of two variances.
• The difference between two proportions.
12.2 Inference About the Difference Between Two Means: Independent Samples
• Two random samples are drawn from the two populations of interest.
• Because we are interested in the difference between the two means, we shall build the statistic $\bar{x}$ for each sample (and support the analysis with the statistic $s^2$ as well).
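The Sampling Distribution of $\bar{x}_1 - \bar{x}_2$
• $\bar{x}_1 - \bar{x}_2$ is normally distributed if the (original) population distributions are normal.
• $\bar{x}_1 - \bar{x}_2$ is approximately normally distributed if the (original) populations are not normal, but the sample sizes are large.
• The expected value of $\bar{x}_1 - \bar{x}_2$ is $\mu_1 - \mu_2$.
• The variance of $\bar{x}_1 - \bar{x}_2$ is $\sigma_1^2/n_1 + \sigma_2^2/n_2$.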
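• If the sampling distribution of $\bar{x}_1 - \bar{x}_2$ is normal or approximately normal we can write: $Z = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\sigma_1^2/n_1 + \sigma_2^2/n_2}}$
• $Z$ can be used to build a test statistic or a confidence interval for $\mu_1 - \mu_2$.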
• In practice, the $Z$ statistic is hardly ever used, because the population variances are not known.
• Instead, we construct a $t$ statistic using the sample variances ($s_1^2$ and $s_2^2$).
Two cases are considered when producing the t statistic.
• The two unknown population variances are equal.
• The two unknown population variances are not equal.
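Case I: The two variances are equal
• Calculate the pooled variance estimate by: $s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}$
• Example: $s_1^2 = 25$; $s_2^2 = 30$; $n_1 = 10$; $n_2 = 15$. Then $s_p^2 = \frac{(10 - 1)(25) + (15 - 1)(30)}{10 + 15 - 2} = 28.04$.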
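• Construct the t statistic as follows: $t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{s_p^2 \left( \frac{1}{n_1} + \frac{1}{n_2} \right)}}$, with d.f. = $n_1 + n_2 - 2$.
• Perform a hypothesis test: $H_0: \mu_1 - \mu_2 = 0$ against $H_1: \mu_1 - \mu_2 > 0$, or $< 0$, or $\neq 0$.
• Or build an interval estimate: $(\bar{x}_1 - \bar{x}_2) \pm t_{\alpha/2} \sqrt{s_p^2 \left( \frac{1}{n_1} + \frac{1}{n_2} \right)}$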
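The equal-variances case can be sketched in Python as below. The pooled variance uses the example values ($s_1^2 = 25$, $s_2^2 = 30$, $n_1 = 10$, $n_2 = 15$); the two sample means, 52.0 and 48.0, and the function names are hypothetical, introduced only so that a t statistic can be computed.

```python
import math

def pooled_variance(s1_sq, n1, s2_sq, n2):
    """Pooled estimate of the common variance of two populations."""
    return ((n1 - 1) * s1_sq + (n2 - 1) * s2_sq) / (n1 + n2 - 2)

def pooled_t(xbar1, xbar2, s1_sq, n1, s2_sq, n2, diff0=0.0):
    """Return (t, degrees_of_freedom) for H0: mu1 - mu2 = diff0."""
    sp_sq = pooled_variance(s1_sq, n1, s2_sq, n2)
    std_error = math.sqrt(sp_sq * (1 / n1 + 1 / n2))
    return (xbar1 - xbar2 - diff0) / std_error, n1 + n2 - 2

sp_sq = pooled_variance(25, 10, 30, 15)  # 645 / 23, about 28.04

# Hypothetical sample means, invented for illustration only.
t, df = pooled_t(52.0, 48.0, 25, 10, 30, 15)
print(f"pooled variance = {sp_sq:.2f}, t = {t:.3f}, d.f. = {df}")
```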
Run a hypothesis test as needed, or build an interval estimate.
Example 12.1
• Do people who eat high-fiber cereal for breakfast consume, on average, fewer calories for lunch than people who do not eat high-fiber cereal for breakfast?
• A sample of 150 people was randomly drawn. Each person was identified as a consumer or a non-consumer of high-fiber cereal.
• For each person the number of calories consumed at lunch was recorded.