Study guide for Stat 31 Midterm 2 on hypothesis testing, t-distribution, and paired samples. Includes examples and Excel computations.
Midterm 2, Tuesday, April 12 • Bring one 8.5 x 11 sheet of formulas • Study by re-working HW • Material: HW 5 - HW 10 (due Thurs.) • Extra Office Hours: Monday, April 11, 10:00 – 12:00, 1:30 – 3:30 • Pick up HW 10 that day
Stat 31, Section 1, Last Time • Hypothesis Testing Terminology: • Type I – II Error • Specificity & Sensitivity • Deeper inference • t distribution to handle unknown $\sigma$ • Approximating $\sigma$ by $s$ increases variation • TDIST & TINV • Problems with EXCEL
t - Distribution Application 2: Hypothesis Tests Idea: Calculate P-values using TDIST
t – Distribution Hypo Testing E.g. Old Textbook Example 7.26 For the above DDT poisoning example, suppose that the mean “absolutely refractory period” is known to be 1.3. DDT poisoning should slow nerve recovery, and so increase this period. Do the data give good evidence for this supposition?
t – Distribution Hypo Testing E.g. Old Textbook Example 7.26 Let $\mu$ = population mean absolutely refractory period for poisoned rats. $H_0: \mu = 1.3$ vs. $H_A: \mu > 1.3$ (from before)
t – Distribution Hypo Testing E.g. Old Textbook Example 7.26 P-value = P{what was seen, or more conclusive | $H_0$–$H_A$ boundary}
t – Distribution Hypo Testing E.g. Old Textbook Example 7.26 From Class Example 25, part 2: https://www.unc.edu/~marron/UNCstat31-2005/Stat31Eg25.xls P-value = 0.003 Interpretation: very strong evidence, for either yes-no or gray-level
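t – Distribution Hypo Testing Variations: • For “opposite direction” hypotheses: P-value = $P\{T \le t\}$, where the observed statistic $t = (\bar{x} - \mu_0)/(s/\sqrt{n})$ is negative. Then use symmetry, $P\{T \le t\} = P\{T \ge -t\}$, i.e. put $-t$ into TDIST.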
t – Distribution Hypo Testing Variations: • For 2-sided hypotheses: Use 2-tailed version of TDIST.
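The three TDIST variants above (upper tail, lower tail by symmetry, 2-tailed) can be sketched in Python. This is only a rough stand-in for the Excel calls: the helper names `t_pdf`, `t_upper_tail` and `p_values` are made up for illustration, the tail area comes from simple numerical integration, and the four sample values are illustrative placeholders, not the class data. Only the null value 1.3 comes from the example.

```python
import math
import statistics

def t_pdf(x: float, df: int) -> float:
    """Density of Student's t distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)

def t_upper_tail(x: float, df: int, steps: int = 2000) -> float:
    """P{T > x} for x >= 0 (the role of TDIST(x, df, 1)), by Simpson's rule on [0, x]."""
    if x < 0:
        raise ValueError("x must be >= 0; use symmetry for a negative t")
    h = x / steps
    total = t_pdf(0.0, df) + t_pdf(x, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3  # half the area lies above 0; subtract the part in [0, x]

def p_values(t_stat: float, df: int):
    """Return (P for H_A: mu > mu0, P for H_A: mu < mu0, 2-sided P)."""
    tail = t_upper_tail(abs(t_stat), df)           # area beyond |t| in one tail
    p_greater = tail if t_stat >= 0 else 1 - tail
    p_less = 1 - p_greater                         # opposite direction, by symmetry
    return p_greater, p_less, 2 * tail

# Illustrative placeholder sample (NOT the class data); mu0 = 1.3 is from the example.
sample = [1.6, 1.7, 1.8, 1.9]
mu0 = 1.3
n = len(sample)
t_stat = (statistics.mean(sample) - mu0) / (statistics.stdev(sample) / math.sqrt(n))
p_greater, p_less, p_two_sided = p_values(t_stat, n - 1)
print(f"t = {t_stat:.3f}, d.f. = {n - 1}")
print(f"upper-tail P = {p_greater:.4f}, lower-tail P = {p_less:.4f}, 2-sided P = {p_two_sided:.4f}")
```

Note that the helper refuses a negative input, which forces the "put $-t$ in" symmetry step to be explicit.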
t – Distribution Hypo Testing HW: 7.5 7.13 a e, 7.14 a c, Interpret P-values: (i) yes-no (ii) gray-level
t – Distribution Variation: Paired Differences Have Treatment 1: $X_1, \dots, X_n$ Treatment 2: $Y_1, \dots, Y_n$ Idea: Apply above methods to the differences: $D_i = X_i - Y_i$
Paired Samples E.g. Old Textbook 7.32 (now 7.39): Researchers studying Vitamin C in a product were concerned about loss of Vitamin C during shipment and storage. They marked a collection of bags at the factory, and measured the vitamin C. 5 months later, in Haiti, they found the same bags, and again measured the Vitamin C.
Paired Samples E.g. Old Textbook 7.32 (now 7.39): The data are the two Vitamin C measurements, made for each bag. a. Set up hypotheses to examine the question of interest. b. Perform the significance test, and summarize the result. c. Find 95% CIs for the factory mean, the Haiti mean, and the mean change.
Paired Samples E.g. Old Textbook 7.32 (now 7.39): a. The sample average difference gives some evidence that the factory value is bigger. Is it strong evidence? Let $\mu$ = mean of the Difference: Haiti – Factory. $H_0: \mu = 0$ vs. $H_A: \mu < 0$ (1-sided, from “idea of loss”)
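Paired Samples E.g. Old Textbook 7.32 (now 7.39): b. P-value = $P\{T \le t\}$, where $t$ is the observed (negative) t statistic of the differences. But recall how TDIST works: TDIST($x$, d.f., 1) = $P\{T > x\}$, for $x \ge 0$ only. So compute with: TDIST($-t$, $n-1$, 1)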
Paired Samples E.g. Old Textbook 7.32 (now 7.39): • Excel Computation: Class Example 25, Part 3 https://www.unc.edu/~marron/UNCstat31-2005/Stat31Eg25.xls P-value = $1.87 \times 10^{-5}$ Interpretation: very strong evidence, either yes-no or gray level
Paired Samples Variations: • EXCEL function TTEST is useful here Notes: • Type is paired (discuss others later) • Get same answer from swapping Array 1 and Array 2 (check these in class example)
Paired Samples Variations: • Can also use: Tools → Data Analysis → T-test Paired to give detailed results, e.g. d.f. = 26 (others we haven’t learned yet)
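The paired test reduces to a one-sample t test on the per-bag differences, which can be sketched in Python as follows. The six pairs of measurements are hypothetical (the class data are in the spreadsheet and are not reproduced here), and the helpers `t_pdf` and `t_upper_tail` are made-up names that integrate the t density numerically in place of TDIST.

```python
import math
import statistics

def t_pdf(x: float, df: int) -> float:
    """Density of Student's t distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)

def t_upper_tail(x: float, df: int, steps: int = 2000) -> float:
    """P{T > x} for x >= 0, by Simpson's rule on [0, x]."""
    if x < 0:
        raise ValueError("x must be >= 0; use symmetry for a negative t")
    h = x / steps
    total = t_pdf(0.0, df) + t_pdf(x, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3

# Hypothetical Vitamin C measurements for six marked bags (NOT the class data).
factory = [45, 47, 52, 44, 50, 48]
haiti = [41, 43, 47, 42, 44, 45]

# One difference per bag, Haiti - Factory; pairing is what makes this valid.
diffs = [h - f for h, f in zip(haiti, factory)]
n = len(diffs)
df = n - 1
t_stat = statistics.mean(diffs) / (statistics.stdev(diffs) / math.sqrt(n))

# H_A: mu < 0 (loss). For a negative t, feed -t to the upper-tail helper (symmetry).
if t_stat <= 0:
    p_value = t_upper_tail(-t_stat, df)
else:
    p_value = 1 - t_upper_tail(t_stat, df)

print(f"mean difference = {statistics.mean(diffs):.2f}, t = {t_stat:.3f}, d.f. = {df}")
print(f"1-sided P-value = {p_value:.5f}")
```

Swapping the roles of the two lists flips the sign of every difference and of $t$, so the tail area beyond $|t|$ is unchanged.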
Paired Samples E.g. Old Textbook 7.32 (now 7.39): • Confidence Intervals See Class Example 25, Part 3c https://www.unc.edu/~marron/UNCstat31-2005/Stat31Eg25.xls Margin of error $m$ = TINV(0.05, $n-1$) $\cdot\, s/\sqrt{n}$ (same as above, but NORMINV → TINV) So CI has endpoints: $\bar{x} \pm m$
Paired Sampling Caution This requires paired sampling, not “just 2 samples”. In particular, need different methods if have “just 2 samples of different bags” (marking of bags in pairs was critical) E.g. TTEST: other types… (not covered here, but be aware)
Paired Sampling CIs HW: 7.31 7.40 (a. For each student, randomly choose which is 1st b. H0: mu = 0 HA: mu > 0 c. 0.0039, strongly sign’t) 7.41 7.43
And now for something completely different … One day there was a fire in a wastebasket in the Dean's office and in rushed a physicist, a chemist, and a statistician…..
And now for something completely different … The physicist immediately starts to work on how much energy would have to be removed from the fire to stop the combustion. The chemist works on which reagent would have to be added to the fire to prevent oxidation.
And now for something completely different … While they are doing this, the statistician is setting fires to all the other wastebaskets in the office… "What are you doing?" they demanded.
And now for something completely different … "Well to solve the problem, obviously you need a large sample size" the statistician replies.
Inference for proportions Sec. 8.2: A deeper look (already know some basics, but there are some fine points worth a deeper look) Recall: Counts: $X \sim Bi(n, p)$ Sample Proportions: $\hat{p} = X/n$
Inference for proportions Calculate prob’s with BINOMDIST, but note no BINOMINV, so instead use Normal Approximation Revisit Class Example 18 https://www.unc.edu/~marron/UNCstat31-2005/Stat31Eg18.xls
Inference for proportions Recall Normal Approximation to Binomial: For $X \sim Bi(n, p)$: $X$ is approximately $N(np, \sqrt{np(1-p)})$ and $\hat{p}$ is approximately $N(p, \sqrt{p(1-p)/n})$ So use NORMINV (and often NORMDIST)
Inference for proportions Main problem: don’t know $p$ Solution: Depends on context: CIs or hypothesis tests Different from Normal, since mean and sd are linked, with both depending on $p$, instead of separate $\mu$ and $\sigma$.
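Inference for proportions Case 1: Margin of Error and CIs: A central area of 95% puts area 0.975 to the left of the upper cutoff. So: $m$ = NORMINV(0.975, 0, $\sqrt{p(1-p)/n}$)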
Inference for proportions Case 1: Margin of Error and CIs: Continuing problem: Unknown $p$ Solution 1: “Best Guess” Replace $p$ by $\hat{p}$
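Inference for proportions Solution 2: “Conservative” Idea: make sd (and thus m) as large as possible (makes no sense for Normal) The function $p(1-p)$ has zeros at 0 & 1 and its max at $p = 1/2$
Inference for proportions Solution 2: “Conservative” Can check by calculus that $p(1-p) \le 1/4$, so $\sqrt{p(1-p)/n} \le 1/(2\sqrt{n})$ Thus $m \le$ NORMINV(0.975, 0, $1/(2\sqrt{n})$)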
Inference for proportions Example: Old Text Problem 8.8 (now 8.10) Power companies spend time and money trimming trees to keep branches from falling on lines. Chemical treatment can stunt tree growth, but too much may kill the tree. In an experiment on 216 trees, 41 died. Give a 99% CI for the proportion expected to die from this treatment.
Inference for proportions Example: Old Text Problem 8.8 (now 8.10) Solution: Class example 26, part 1 https://www.unc.edu/~marron/UNCstat31-2005/Stat31Eg26.xls Note: Conservative much bigger (left end even < 0) Since Big gap So may pay substantial price for being “safe”
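Both intervals for the tree example can be sketched in Python from the counts in the problem statement (216 trees, 41 died, 99% confidence). This is not the class spreadsheet: it assumes the best-guess sd is $\sqrt{\hat{p}(1-\hat{p})/n}$ and the conservative sd is $1/(2\sqrt{n})$, and it uses `statistics.NormalDist().inv_cdf` in the role of NORMINV. The variable names are illustrative.

```python
import math
from statistics import NormalDist

n, deaths = 216, 41                  # from the problem statement
conf_level = 0.99

p_hat = deaths / n                   # sample proportion
z = NormalDist().inv_cdf(1 - (1 - conf_level) / 2)   # 0.995 quantile of N(0, 1)

# NORMINV(q, 0, sd) equals z * sd, so each margin of error is z times an sd.
m_best = z * math.sqrt(p_hat * (1 - p_hat) / n)      # "best guess": plug in p_hat
m_cons = z * 0.5 / math.sqrt(n)                      # "conservative": worst case p = 1/2

ci_best = (p_hat - m_best, p_hat + m_best)
ci_cons = (p_hat - m_cons, p_hat + m_cons)

print(f"p_hat = {p_hat:.4f}")
print(f"best guess:   m = {m_best:.4f}, CI = ({ci_best[0]:.4f}, {ci_best[1]:.4f})")
print(f"conservative: m = {m_cons:.4f}, CI = ({ci_cons[0]:.4f}, {ci_cons[1]:.4f})")
```

Under these assumptions the conservative interval is the wider one, but its left endpoint stays above 0 (about 0.10), so the spreadsheet behind the slide's "left end even < 0" remark may have been set up differently.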
Inference for proportions HW: 8.1, 8.9 Do both best-guess and conservative CIs 8.13
Request for Review I was wondering if in class today you can go over the problem C16 a little more in depth. I am confused sometimes on how to choose the gray level, and when to dispute the claims. I thought we were supposed to say evidence is strong when P-value is < .05 but it seemed that the answers given did not go by that standard. Just a little class time would help. Thank you, Lindsay Holdren
Hypothesis Testing HW: C16 For each of the problems: a. A box label claims that on average boxes contain 40 oz. A random sample of 12 boxes shows on average 39 oz., with s = 2.2. Should we dispute the claim?
Hypothesis Testing b. We know from long experience that Farmer A’s pigs average 570 lbs. A sample of 16 pigs from Farmer B averages 590 lbs, with an SD of 110. Is it safe to say B’s pigs are heavier on average? c. Same as (b) except “lighter on average”. d. Same as (b) except that B’s average is 630 lbs.
Hypothesis Testing Do: (i) Define the population mean of interest. (ii) Formulate H+, H0, and H-, in terms of mu. (iii) Give the P-values for both H+ and H-. (a. 0.942, 0.058, b. 0.234, 0.766, c. 0.234, 0.766, d. 0.015, 0.985) (iv) Give a yes-no answer to the questions. (a. H- don’t dispute b. H- not safe c. H- not safe d. H- safe)
Hypothesis Testing (v) Give a gray level answer to the questions. (a. H- moderate evidence against b. H- no strong evidence c. H- seems to go other way d. H- strong evidence, almost very strong)
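The pairs of P-values quoted for C16 can be checked with a short Python sketch. The quoted numbers appear to match Normal tail areas with s used in place of σ (the kind of result NORMDIST gives); this is an inference from the numbers, since the slides do not say which Excel function was used. The function name `tail_areas` is made up for illustration, and part c is omitted because it uses the same data as part b.

```python
import math
from statistics import NormalDist

def tail_areas(xbar: float, mu0: float, s: float, n: int):
    """Return (upper, lower) Normal tail areas for the observed sample mean.

    upper = P{Xbar >= xbar | mu = mu0}, lower = P{Xbar <= xbar | mu = mu0},
    treating the sample sd s as if it were the population sd (an assumption).
    """
    z = (xbar - mu0) / (s / math.sqrt(n))
    lower = NormalDist().cdf(z)
    return 1 - lower, lower

# (xbar, mu0, s, n) taken from the problem statements
problems = {
    "a": (39, 40, 2.2, 12),     # cereal boxes
    "b": (590, 570, 110, 16),   # Farmer B's pigs
    "d": (630, 570, 110, 16),   # same, with average 630
}

results = {part: tail_areas(*args) for part, args in problems.items()}
for part, (upper, lower) in results.items():
    print(f"{part}: upper tail = {upper:.3f}, lower tail = {lower:.3f}")
```

Each part gives two tail areas that sum to 1, one for each direction of alternative, which is why every quoted pair does the same.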
Review of C16 Note: this problem is about the preliminary H-, H0, H+ approach to hypothesis testing. • (i) Mean is average weight of all boxes made by company. (ii)
Review of C16 H- P-value = 0.942 very large, so no evidence for H-, either yes-no, or gray level. H+ P-value = 0.058 > 0.05, so “no strong evidence” by yes-no But only slightly above 0.05, so call this “moderately strong evidence” from gray level viewpoint