Review #2 Chapter 9 Chapter 10 Chapter 11 Chapter 12
Chapter 9 • A statistic is a random variable describing a characteristic of a random sample. • Sample mean • Sample variance • We use statistic values in inferential statistics (making inferences about population characteristics from sample characteristics). • Statistics have distributions of their own.
The Central Limit Theorem • The distribution of the sample mean is normal if the parent distribution is normal. • The distribution of the sample mean approaches the normal distribution for sufficiently large samples ($n \ge 30$), even if the parent distribution is not normal. • The parameters of the sampling distribution of the mean are: • Mean: $\mu_{\bar{x}} = \mu$ • Standard deviation: $\sigma_{\bar{x}} = \sigma/\sqrt{n}$
Problem 1 • Given a normal population whose mean is 50 and whose standard deviation is 5, • Find the probability that a random sample of 4 has a mean between 49 and 52. • Answer: $P(49 < \bar{x} < 52) = P(-.4 < Z < .8) = .1554 + .2881 = .4435$
Problem 2 • Find the probability that a random sample of 16 has a mean between 49 and 52. • Answer
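The two sample-mean probabilities (n = 4 and n = 16, same population with mean 50 and standard deviation 5) can be checked numerically. The following is a minimal sketch using Python's standard library; the helper name `prob_mean_between` is made up for this illustration and is not part of the original slides.

```python
from math import sqrt
from statistics import NormalDist


def prob_mean_between(mu, sigma, n, low, high):
    """P(low < sample mean < high) when the parent population is normal."""
    # The sample mean is normal with mean mu and standard deviation sigma/sqrt(n).
    sampling_dist = NormalDist(mu, sigma / sqrt(n))
    return sampling_dist.cdf(high) - sampling_dist.cdf(low)


p_n4 = prob_mean_between(50, 5, 4, 49, 52)    # z runs from -0.4 to 0.8
p_n16 = prob_mean_between(50, 5, 16, 49, 52)  # z runs from -0.8 to 1.6
print(f"n = 4:  {p_n4:.4f}")
print(f"n = 16: {p_n16:.4f}")
```

The larger sample concentrates the sample mean around 50, so the same interval captures more probability.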
Problem 2 • The amount of time per day spent by adults watching TV is normally distributed with $\mu = 6$ and $\sigma = 1.5$ hours. • What is the probability that a randomly selected adult watches TV for more than 7 hours a day? • Answer: • What is the probability that 5 adults watch TV for an average of 7 or more hours a day? • Answer:
Problem 2 • Additional question • What is the probability that the total TV watching time of the five adults sampled will exceed 28 hours? • Answer:
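The three TV-watching questions differ only in which distribution is used: the population itself for one adult, and the sampling distribution of the mean for five adults. A sketch under the stated values (mean 6, standard deviation 1.5, n = 5); the variable names are illustrative only. The question about the total is converted to one about the mean, since a total above 28 hours for five adults is the same event as a mean above 28/5 = 5.6 hours.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 6.0, 1.5, 5

one_adult = NormalDist(mu, sigma)                # distribution of a single adult
mean_of_five = NormalDist(mu, sigma / sqrt(n))   # distribution of the mean of 5

p_single = 1 - one_adult.cdf(7)          # one adult watches more than 7 hours
p_mean = 1 - mean_of_five.cdf(7)         # the mean of 5 adults is 7 hours or more
p_total = 1 - mean_of_five.cdf(28 / n)   # total > 28 hours, i.e. mean > 5.6 hours

print(f"P(X > 7)      = {p_single:.4f}")
print(f"P(mean > 7)   = {p_mean:.4f}")
print(f"P(total > 28) = {p_total:.4f}")
```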
Sampling distribution of the sample proportion • In a sample of size n, if $np > 5$ and $n(1-p) > 5$, then the sample proportion $\hat{p} = x/n$ is approximately normally distributed with the following parameters: • Mean: $\mu_{\hat{p}} = p$ • Standard deviation: $\sigma_{\hat{p}} = \sqrt{p(1-p)/n}$
Problem 3 • A commercial by a household appliance manufacturer claims that fewer than 5% of all of its products require a service call in the first year. • A survey of 400 households that recently purchased the manufacturer's products was conducted to check the claim.
Problem 3 • Assuming the manufacturer is right, what is the probability that more than 10% of the surveyed households require a service call within the first year? If indeed 10% of the sampled households reported a call for service within the first year, what does it tell you about the manufacturer's claim?
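One way to approach Problem 3 is to take the boundary of the claim, p = .05, as the true proportion (an assumption made here, since the claim only says "less than 5%") and use the normal approximation for the sample proportion. A minimal sketch:

```python
from math import sqrt
from statistics import NormalDist

p, n = 0.05, 400          # assumed true proportion (boundary of the claim), sample size
assert n * p > 5 and n * (1 - p) > 5   # conditions for the normal approximation

se = sqrt(p * (1 - p) / n)             # standard deviation of the sample proportion
z = (0.10 - p) / se                    # standardized value of a 10% sample proportion
p_more = 1 - NormalDist().cdf(z)       # P(sample proportion > .10)

print(f"standard error = {se:.4f}, z = {z:.2f}, probability = {p_more:.2e}")
```

A sample proportion of 10% lies more than four standard errors above .05, so observing it would cast serious doubt on the claim.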
Chapter 10 • A population parameter can be estimated by a point estimator and by an interval estimator. • A confidence interval with confidence level $1-\alpha$ is an interval estimator that covers the estimated parameter $100(1-\alpha)\%$ of the time. • Confidence intervals are constructed using sampling distributions.
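Confidence interval of the mean • We use the central limit theorem to build the following confidence interval: $\bar{x} \pm z_{\alpha/2}\,\sigma/\sqrt{n}$ • [Figure: standard normal curve with area $1-\alpha$ between $-z_{\alpha/2}$ and $z_{\alpha/2}$, and area $\alpha/2$ in each tail]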
Problem 4 • How many classes do university students miss each semester? A survey of 100 students was conducted. (see Missed Classes) • Assuming the standard deviation of the number of classes missed is 2.2, estimate the mean number of classes missed per student. • Use 99% confidence level.
Problem 4 • Solution • $1-\alpha = .99$, $\alpha = .01$, $\alpha/2 = .005$, $z_{\alpha/2} = z_{.005} = 2.575$ • $\bar{x} \pm z_{\alpha/2}\,\sigma/\sqrt{n} = 10.21 \pm 2.575\,(2.2/\sqrt{100}) = 10.21 \pm .57$ • LCL = 9.64, UCL = 10.78
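The interval for Problem 4 can be reproduced with a short helper. This is a sketch, with `z_interval` a hypothetical function name; the sample mean 10.21 is the value reported on the slide.

```python
from math import sqrt
from statistics import NormalDist


def z_interval(xbar, sigma, n, confidence):
    """Confidence interval for the mean when sigma is known."""
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)   # z_{alpha/2}
    half_width = z * sigma / sqrt(n)
    return xbar - half_width, xbar + half_width


lcl, ucl = z_interval(xbar=10.21, sigma=2.2, n=100, confidence=0.99)
print(f"LCL = {lcl:.2f}, UCL = {ucl:.2f}")
```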
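Selecting the sample size • The shorter the confidence interval, the more accurate the estimate. • We can, therefore, limit the half-width of the interval (the maximum error of the estimate) to W, and get $W = z_{\alpha/2}\,\sigma/\sqrt{n}$ • From here we have $n = (z_{\alpha/2}\,\sigma/W)^2$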
Problem 5 • An operations manager wants to estimate the average amount of time needed by a worker to assemble a new electronic component. • Sigma is known to be 6 minutes. • The required estimate accuracy is within 20 seconds. • Find the sample size for confidence levels of 90% and 95%.
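Problem 5 • Solution: $\sigma = 6$ min; $W = 20$ sec $= 1/3$ min; $n = (z_{\alpha/2}\,\sigma/W)^2$ • $1-\alpha = .90$: $z_{\alpha/2} = z_{.05} = 1.645$, so $n = (1.645 \cdot 6/(1/3))^2 = 876.75$, rounded up to 877 • $1-\alpha = .95$: $z_{\alpha/2} = z_{.025} = 1.96$, so $n = (1.96 \cdot 6/(1/3))^2 = 1244.68$, rounded up to 1245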
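The sample-size calculation can be sketched as follows, assuming W is the maximum allowed error (the half-width of the interval) and that n is rounded up to the next whole observation. The function name `sample_size` is illustrative.

```python
from math import ceil
from statistics import NormalDist


def sample_size(sigma, bound, confidence):
    """Smallest n for which z_{alpha/2} * sigma / sqrt(n) <= bound."""
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return ceil((z * sigma / bound) ** 2)


n90 = sample_size(sigma=6, bound=1 / 3, confidence=0.90)  # 20 seconds = 1/3 minute
n95 = sample_size(sigma=6, bound=1 / 3, confidence=0.95)
print(f"90% confidence: n = {n90}")
print(f"95% confidence: n = {n95}")
```

Raising the confidence level from 90% to 95% while keeping the same error bound requires a noticeably larger sample.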
Chapter 11 • Hypothesis tests • In hypothesis tests we hypothesize on a value of a population parameter, and test to see if there is sufficient evidence to support our belief. • The structure of a hypothesis test • Formulate two hypotheses. • $H_0$: The null hypothesis, the one we try to reject in favor of … • $H_1$: The alternative hypothesis, the one we try to prove. • Define a significance level $\alpha$.
Hypothesis tests • The significance level is the probability of erroneously rejecting the null hypothesis: $\alpha = P(\text{reject } H_0 \text{ when } H_0 \text{ is true})$ • Sample from the population and calculate a statistic that indicates whether or not the parameter value defined under $H_1$ is more probable. • We shall test the population mean assuming the standard deviation is known.
Problem 6 • A machine is set so that the average diameter of the ball bearings it produces is .50 inch. In a sample of 100 ball bearings the mean diameter was .51 inch. Assuming the standard deviation is .05 inch, can we conclude at the 5% significance level that the mean diameter is not .50 inch?
Problem 6 • The population studied is the ball-bearing diameters. • We hypothesize on the population mean. • A good point estimator for the population mean is the sample mean. • We use the distribution of the sample mean to build a sample statistic to test whether $\mu = .50$ inch.
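Problem 6 • Solution • Define the hypotheses: • $H_0: \mu = .50$ • $H_1: \mu \ne .50$ • Define a rejection region. Note that this is a two-tail test because of the inequality in $H_1$. • The probability of a type I error is $\alpha = .05$, split into $\alpha/2 = .025$ in each tail.
Problem 6 • Critical Z: $z_{\alpha/2} = z_{.025} = 1.96$ (obtained from the Z-table). • Build a rejection region: $Z_{sample} > z_{\alpha/2}$ or $Z_{sample} < -z_{\alpha/2}$, that is, $Z_{sample} > 1.96$ or $Z_{sample} < -1.96$. • Calculate the value of the sample Z statistic and compare it to the critical value: $Z_{sample} = (\bar{x} - \mu_0)/(\sigma/\sqrt{n}) = (.51 - .50)/(.05/\sqrt{100}) = 2$ • Since 2 > 1.96, there is sufficient evidence to reject $H_0$ in favor of $H_1$ at the 5% significance level.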
Problem 6 • We can perform the test in terms of the mean value. • Let us find the critical mean values for rejection: • $\bar{x}_{L1} = \mu_0 + z_{.025}\,\sigma/\sqrt{n} = .50 + 1.96\,(.05/\sqrt{100}) = .5098$ • $\bar{x}_{L2} = \mu_0 - z_{.025}\,\sigma/\sqrt{n} = .50 - 1.96\,(.05/\sqrt{100}) = .4902$ • Since .51 > .5098, there is sufficient evidence to reject the null hypothesis at the 5% significance level.
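Both versions of the Problem 6 test (standardized statistic and critical mean values) can be sketched in a few lines. The variable names are illustrative; the numbers are those given in the problem.

```python
from math import sqrt
from statistics import NormalDist

mu0, sigma, n, xbar, alpha = 0.50, 0.05, 100, 0.51, 0.05

se = sigma / sqrt(n)                          # standard deviation of the sample mean
z_sample = (xbar - mu0) / se                  # standardized test statistic
z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # two-tail critical value, about 1.96

# The same rejection region expressed in terms of the sample mean.
lower, upper = mu0 - z_crit * se, mu0 + z_crit * se

reject = abs(z_sample) > z_crit               # equivalent to xbar outside (lower, upper)
print(f"z = {z_sample:.2f}, critical z = {z_crit:.2f}")
print(f"critical means: {lower:.4f} and {upper:.4f}")
print("reject H0" if reject else "do not reject H0")
```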
Problem 7 • The average annual return on investment for American banks was found to be 10.2% with a standard deviation of 0.8%. • It is believed that banks that exercise comprehensive planning do better. • A sample of 26 banks that conducted comprehensive planning provided the following result: Mean return = 10.5%. • Can we infer that the belief about bank performance is supported at the 10% significance level by this sample result?
Problem 7 • The population tested is the “annual rate of return.” • $H_0: \mu = 10.2$ • $H_1: \mu > 10.2$ • Let us perform the test with the p-value method: • $P(\bar{x} > 10.5 \mid \mu = 10.2) = P(Z > (10.5 - 10.2)/[.8/\sqrt{26}]) = P(Z > 1.91) = 1 - .9719 = .0281$ • Since .0281 < .10 we reject the null hypothesis at the 10% significance level.
Problem 7 • Note the equivalence between the standardized (rejection region) method and the p-value method. • $P(Z > z_{.10}) = .10$, so $z_{.10} = 1.28$. • Since 1.91 > 1.28, the sample statistic falls in the rejection region, which is equivalent to the p-value .0281 being smaller than .10. • Run the test with Data Analysis Plus. See data in Return.
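The equivalence of the two methods can be checked numerically. This is a sketch with illustrative variable names; because it does not round z to 1.91 before the table lookup, the p-value it prints is marginally smaller than the table-based .0281.

```python
from math import sqrt
from statistics import NormalDist

mu0, sigma, n, xbar, alpha = 10.2, 0.8, 26, 10.5, 0.10

z_sample = (xbar - mu0) / (sigma / sqrt(n))
p_value = 1 - NormalDist().cdf(z_sample)    # right-tail test: P(Z > z_sample)
z_crit = NormalDist().inv_cdf(1 - alpha)    # z_{.10}, about 1.28

reject_by_p = p_value < alpha               # p-value method
reject_by_z = z_sample > z_crit             # rejection region method

print(f"z = {z_sample:.2f}, p-value = {p_value:.4f}, critical z = {z_crit:.2f}")
print("both methods agree:", reject_by_p == reject_by_z)
```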
Type II Error • A type II error occurs when $H_0$ is erroneously not rejected. • The probability of a type II error is called $\beta$: $\beta = P(\text{do not reject } H_0 \text{ when } H_1 \text{ is true})$ • To calculate $\beta$: • $H_1$ specifies an actual parameter value (not a range of values). Example: $H_0: \mu = 100$; $H_1: \mu = 110$ • The critical value is expressed in original terms (not in standard terms).
Problem 7a • What is the probability you’ll believe the mean return in problem 7 is 10.2% while actually it’s 10.6%, if the sample provided a mean return of 10.5%?
Problem 7a • Solution • The two hypotheses are: $H_0: \mu = 10.2$; $H_1: \mu = 10.6$ • $H_0$ is not rejected (we believe $\mu = 10.2$) if the sample mean is less than a critical value. • Therefore, the probability required is: $\beta = P(\bar{x} < \bar{x}_{cr} \mid \mu = 10.6)$.
Problem 7a • The critical value is (recall, this problem was a case of a right-hand-tail test, with 10% significance level): $\bar{x}_{cr} = \mu_0 + z_{.10}\,\sigma/\sqrt{n} = 10.2 + 1.28\,(.8/\sqrt{26}) = 10.4$ • $\beta = P(\bar{x} < 10.4 \mid \mu = 10.6) = P(Z < (10.4 - 10.6)/[.8/\sqrt{26}]) = P(Z < -1.27) = .102$
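The type II error calculation has two steps: find the critical sample mean under $H_0$, then evaluate the probability of falling below it under the alternative mean. A minimal sketch with illustrative variable names:

```python
from math import sqrt
from statistics import NormalDist

mu0, mu1 = 10.2, 10.6            # mean under H0 and the specific alternative
sigma, n, alpha = 0.8, 26, 0.10

se = sigma / sqrt(n)

# Step 1: critical value in original units for the right-tail test.
x_crit = mu0 + NormalDist().inv_cdf(1 - alpha) * se

# Step 2: probability of not rejecting H0 when the true mean is mu1.
beta = NormalDist(mu1, se).cdf(x_crit)

print(f"critical mean = {x_crit:.2f}, beta = {beta:.3f}")
```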
Chapter 12 • Generally, the standard deviation is unknown the same way the mean may be unknown. • When the standard deviation is unknown, we need to change the test statistic from “Z” to “t”. • We shall test three population parameters: • Mean • Variance • Proportion
Testing the mean (unknown variance) • Replace the statistic Z with “t”: $t = (\bar{x} - \mu)/(s/\sqrt{n})$, with n-1 degrees of freedom. • The original distribution must be normal (or at least mound shaped).
Problem 8 • A federal agency inspects packages to determine if the contents are at least as great as advertised. • A random sample of (i) 5, (ii) 50 containers whose packaging states that the weight is 8.04 ounces was drawn. (See Content). • From the sample results… • Can we conclude that the average weight does not meet the weight stated? (use $\alpha = .05$). • Estimate the mean weight of all containers with 99% confidence. • What assumption must be met?
Problem 8 • Solution • We hypothesize on the mean weight. • $H_0: \mu = 8.04$ • $H_1: \mu < 8.04$ • (i) n = 5. For small samples let us solve manually. Assume the sample was: 8.07, 8.03, 7.99, 7.95, 7.94 • The rejection region: $t < -t_{\alpha,n-1} = -t_{.05,5-1} = -2.132$ • To find $t_{sample}$ we need the sample mean and standard deviation: • Mean = (8.07 + … + 7.94)/5 = 7.996 • Std. Dev. = $\{[(8.07-7.996)^2 + \dots + (7.94-7.996)^2]/4\}^{1/2} = .0546$
Problem 8 • The t sample is calculated as follows: $t_{sample} = (\bar{x} - \mu_0)/(s/\sqrt{n}) = (7.996 - 8.04)/(.0546/\sqrt{5}) = -1.80$ • Since -1.80 > -2.132 the sample statistic does not fall into the rejection region ($t < -2.132$). There is insufficient evidence to conclude that the mean weight is smaller than 8.04 at the 5% significance level.
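The small-sample t test can be recomputed directly from the five assumed observations. In this sketch the critical value 2.132 is taken from the t-table as quoted in the text rather than computed, since the standard library has no t distribution; the variable names are illustrative.

```python
from math import sqrt
from statistics import mean, stdev

sample = [8.07, 8.03, 7.99, 7.95, 7.94]   # the five assumed observations
mu0 = 8.04                                # stated weight under H0
t_crit = 2.132                            # t(.05, 4) from the t-table

xbar = mean(sample)
s = stdev(sample)                         # sample standard deviation (divides by n - 1)
t_sample = (xbar - mu0) / (s / sqrt(len(sample)))

reject = t_sample < -t_crit               # left-tail rejection region
print(f"mean = {xbar:.3f}, s = {s:.4f}, t = {t_sample:.2f}")
print("reject H0" if reject else "do not reject H0")
```

The statistic stays above -2.132, so with only five observations the data do not show that the mean weight falls short of the stated 8.04 ounces.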
Problem 8 • (ii) n = 50. To calculate the sample statistics we use Excel, “Descriptive statistics” from the Tools>Data analysis menu. From the sample we obtain: Mean = 8.02; Std. Dev. = .04 • $1-\alpha = .99$, $\alpha = .01$, $\alpha/2 = .005$, $t_{.005,50-1}$ = about 2.678 from the t-table • The confidence interval is calculated by $\bar{x} \pm t_{\alpha/2,n-1}\,s/\sqrt{n} = 8.02 \pm 2.678\,(.04/\sqrt{50}) = 8.02 \pm .015$, or LCL = 8.005, UCL = 8.035
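The 99% interval for the n = 50 sample follows from the reported summary statistics. A short sketch, again taking the critical value (about 2.678) from the t-table as quoted in the text instead of computing it:

```python
from math import sqrt

xbar, s, n = 8.02, 0.04, 50   # summary statistics reported for the sample
t_crit = 2.678                # approximate t(.005, 49) from the t-table

half_width = t_crit * s / sqrt(n)
lcl, ucl = xbar - half_width, xbar + half_width
print(f"{xbar} +/- {half_width:.3f}: LCL = {lcl:.3f}, UCL = {ucl:.3f}")
```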
Problem 8 • Comments • Check whether it appears that the distribution is normal
Using Excel • To obtain an exact value for ‘t’ use the TINV function: =TINV(0.01,49) returns 2.6799535. • .01 is the two-tail probability and 49 is the degrees of freedom.
Problem 8 • In our example recall: • $H_0: \mu = 8.04$ • $H_1: \mu < 8.04$ • The p-value = .000187 < .05 • There is sufficient evidence to reject $H_0$ in favor of $H_1$. Note: $t = (8.018 - 8.04)/[.0403/\sqrt{50}] \approx -3.82 < -t_{.05,49} = -1.676$
Inference about the population Variance • The following statistic is $\chi^2$ (chi-squared) distributed with n-1 degrees of freedom: $\chi^2 = (n-1)s^2/\sigma^2$ • We use this relationship to test and estimate the variance.
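Inference about the population Variance • The hypotheses tested are: $H_0: \sigma^2 = \sigma_0^2$ against $H_1: \sigma^2 > \sigma_0^2$, $H_1: \sigma^2 < \sigma_0^2$, or $H_1: \sigma^2 \ne \sigma_0^2$ • The rejection region is, respectively: $\chi^2 > \chi^2_{\alpha,n-1}$; $\chi^2 < \chi^2_{1-\alpha,n-1}$; $\chi^2 > \chi^2_{\alpha/2,n-1}$ or $\chi^2 < \chi^2_{1-\alpha/2,n-1}$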
Problem 9 • A random sample of 100 observations was taken from a normal population. The sample standard deviation was 29.76. • Can we infer at the 2.5% significance level that the population variance is less than $30^2 = 900$? • Estimate the population variance with 90% confidence.
Problem 9 • Solution: • $H_0: \sigma^2 = 30^2$ • $H_1: \sigma^2 < 30^2$ • Rejection region: $\chi^2 < \chi^2_{1-\alpha,n-1}$ • $\chi^2 = (n-1)s^2/\sigma_0^2 = (100-1)(29.76)^2/30^2 = 97.42$ • $\chi^2_{1-\alpha,n-1} = \chi^2_{.975,100-1}$ = about 73.36 • Since 97.42 > 73.36, the statistic does not fall into the rejection region; there is insufficient evidence at the 2.5% significance level to infer that the variance is smaller than $30^2$. • For the confidence interval look at page 370.
Using Excel • We can get an exact value of the probability $P(\chi^2_{d.f.} > \chi^2)$ for a given $\chi^2$ and known d.f. This makes it possible to determine the p-value. • Use the CHIDIST function: =CHIDIST(chi-squared value, d.f.) • For example: =CHIDIST(97.42,99) = .526. That is: $P(\chi^2_{99} > 97.42) = .526$ • In our example we had a left-hand-tail rejection region. The p-value is calculated based on the $\chi^2$ value (97.42): $P(\chi^2_{99} < 97.42) = 1 - .526 = .474$
Using Excel • We can get the exact $\chi^2$ value for which $P(\chi^2_{d.f.} > \chi^2) = \alpha$, for any given probability $\alpha$ and known d.f. • Use the CHIINV function: =CHIINV(probability, d.f.) • For example: =CHIINV(.025,99) = 128.4219. That is: $P(\chi^2_{99} > 128.4219) = .025$
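Outside Excel, the two chi-squared lookups can be approximated with the standard library alone. The sketch below uses the Wilson-Hilferty normal approximation, which is a substitute technique chosen here for illustration and not the algorithm Excel uses; the function names `chi2_upper_tail` and `chi2_critical` are hypothetical. For 99 degrees of freedom the approximation agrees with the CHIDIST and CHIINV values quoted above to about two or three decimal places.

```python
from math import sqrt
from statistics import NormalDist


def chi2_upper_tail(x, df):
    """Approximate P(chi-squared with df d.f. > x), the quantity CHIDIST returns."""
    c = 2 / (9 * df)
    z = ((x / df) ** (1 / 3) - (1 - c)) / sqrt(c)   # Wilson-Hilferty transform
    return 1 - NormalDist().cdf(z)


def chi2_critical(alpha, df):
    """Approximate value with upper-tail area alpha, the quantity CHIINV returns."""
    c = 2 / (9 * df)
    z = NormalDist().inv_cdf(1 - alpha)
    return df * (1 - c + z * sqrt(c)) ** 3


chi2_sample = (100 - 1) * 29.76 ** 2 / 30 ** 2      # statistic from Problem 9
upper = chi2_upper_tail(chi2_sample, 99)
print(f"chi-squared = {chi2_sample:.2f}")
print(f"P(chi2_99 > {chi2_sample:.2f}) = {upper:.3f}")
print(f"left-tail p-value = {1 - upper:.3f}")
print(f"value with upper-tail area .025: {chi2_critical(0.025, 99):.2f}")
```

A left-tail p-value near .47 is far above .025, so the sample gives no grounds for rejecting $H_0$ in a left-tail test.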
Inference about a population proportion • The test and the confidence interval are based on the approximately normal distribution of the sample proportion, if $np > 5$ and $n(1-p) > 5$. • For the confidence interval of p we have: $\hat{p} \pm z_{\alpha/2}\sqrt{\hat{p}(1-\hat{p})/n}$, where $\hat{p} = x/n$ • For the hypothesis test, we run a Z test.
Problem 10 • A consumer protection group ran a survey of 400 dentists to check a claim that 4 out of 5 dentists recommend ingredients included in a certain toothpaste. • The survey results are as follows: 71 – No; 329 – Yes • At the 5% significance level, can the consumer group infer that the claim is true?
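Problem 10 • Solution • The two hypotheses are: • $H_0: p = .8$ • $H_1: p > .8$ • The rejection region: $Z > z_{\alpha} = z_{.05} = 1.645$ • The sample proportion is $\hat{p} = 329/400 = .8225$, and the sample statistic is Z = 1.18. • Since 1.18 < 1.645 the consumer group cannot confirm the claim at the 5% significance level.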
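The Z statistic for Problem 10 depends on which proportion is used in the standard error. The sketch below computes both common variants: the one based on the hypothesized p = .8, and the one based on the sample proportion. The reported value of 1.18 appears to match the second; that reading is an inference from the arithmetic, not something stated in the slides. Either way the statistic stays below the critical value.

```python
from math import sqrt
from statistics import NormalDist

yes, n, p0, alpha = 329, 400, 0.8, 0.05

p_hat = yes / n                                # sample proportion, .8225
z_crit = NormalDist().inv_cdf(1 - alpha)       # right-tail critical value, about 1.645

# Standard error computed under H0 (p = .8).
z_null_se = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)

# Standard error computed from the sample proportion.
z_sample_se = (p_hat - p0) / sqrt(p_hat * (1 - p_hat) / n)

print(f"p_hat = {p_hat:.4f}, critical z = {z_crit:.3f}")
print(f"z with H0 standard error     = {z_null_se:.3f}")
print(f"z with sample standard error = {z_sample_se:.3f}")
print("reject H0" if z_null_se > z_crit else "do not reject H0")
```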