Welcome to BUAD 310. Instructor: Kam Hamidieh Lecture 16, Monday March 24, 2014. Agenda & Announcement. Today: Chapter 17 You’ll have to read a few slides (27+) and parts of the chapter on your own. Reading: All of Chapter 17
Welcome to BUAD 310 Instructor: Kam Hamidieh Lecture 16, Monday March 24, 2014
Agenda & Announcement • Today: • Chapter 17 • You’ll have to read a few slides (27+) and parts of the chapter on your own. • Reading: All of Chapter 17 • Note: Homework 4 is due this Wednesday, March 26, at 5 PM. No extensions will be given. • Next Time: Chapter 18! BUAD 310 - Kam Hamidieh
From Last Time • Hypothesis testing with population mean μ: • Hypothesis testing with population proportion p: • The logic: assume H0 true, compute test stat, p-value, check for compatibility of data with H0 • Note, you can also create confidence intervals for μ and p.
From Last Time • Types of errors: • Type I = H0 true but you reject H0 • Type II = Ha true but you fail to reject H0 • α is the maximum probability of making a Type I error that we are willing to tolerate. • Final practical decisions must consider: • Consequences of the possible errors • Practical versus statistical significance
About P-Value • The p-value does not tell us the probability that the null hypothesis is true. • The p-value is computed by assuming that the null is true. • If the p-value is low, your results are not consistent with the assumption that the null is true. • When it comes to decision time, we use the cutoff α. • “Smaller p-values provide stronger evidence against the null.”
Not a fan of P-Values? You’re not alone! • A scathing criticism of hypothesis testing and p-values: http://www.phil.vt.edu/dmayo/personal_website/Schmidt_Hunter_Eight_Common_But_False_Objections.pdf • A more balanced view: http://www.ncbi.nlm.nih.gov/pubmed/10937333
Today…the same! • We will have a new parameter: • Test statistics
Today…same thing! • The confidence intervals will have the form: Point Estimate ± (margin of error)
Example, Two Sample t-test A marketing team designed a promotional web page to increase online sales. Out of a total of 100 visitors to the company’s website, 55 were assigned to the old page and 45 to the new page. The assignments were done at random. We have the following sales data in dollars: Does the new page generate (statistically) significantly higher sales? Let α = 5%.
Example, Two Sample t-test • Population 1: all the possible spending amounts at the new site; μnew = mean customer spending, new site • Population 2: all the possible spending amounts at the old site; μold = mean customer spending, old site • Independent random samples: • Random sample from new site: x̄new = sample mean sales from new site • Random sample from old site: x̄old = sample mean sales from old site • x̄new - x̄old = point estimate of μnew - μold
Example (Continued) Define: μold = mean customer spending at the old site; μnew = mean customer spending at the new site. We want to determine whether the means of the populations represented by two independent samples of a quantitative variable (amount spent) differ. We can form our hypotheses as follows: H0: μnew - μold ≤ 0 (μnew ≤ μold) Ha: μnew - μold > 0 (μnew > μold)
Example (Continued) The test statistic: t = (x̄new - x̄old - 0) / √(s²new/nnew + s²old/nold), where snew and sold are the sample standard deviations. The distribution of the test statistic is approximately a t distribution with df determined by software (or, if done by hand, use the smaller of nnew - 1 and nold - 1: min(44, 54) = 44).
Example (YouTube Videos will be posted.) StatCrunch:
Example (Continued) From software: P-Value ≈ 0.0042 We reject the null hypothesis since the p-value is less than α. We have sufficient evidence from our data that the new site increases the mean sales.
In the Business World… An interesting article: http://www.wired.com/business/2012/04/ff_abtesting Look here too: https://www.optimizely.com/
Two Sample Test in General Suppose a random sample of size n1 is drawn from a normal population with mean μ1, and an independent random sample of size n2 is drawn from a normal population with mean μ2. To perform your hypothesis test with null value D0 (that is, H0: μ1 - μ2 = D0), compute the two sample t statistic: t = (x̄1 - x̄2 - D0) / √(s1²/n1 + s2²/n2) and use p-values from a t distribution where the df is approximated by the software (or, if done by hand, use the smaller of n1 - 1 and n2 - 1).
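A minimal Python sketch of the two sample t statistic described on this slide may help. It assumes the unpooled (unequal variance) form, which is what the "smaller of n1 - 1 and n2 - 1" rule goes with, and it assumes that the software df is the Welch-Satterthwaite approximation; the slides do not name the formula StatCrunch uses. The function name and the summary statistics are hypothetical and are not the lecture's sales data.

```python
import math

def two_sample_t(mean1, sd1, n1, mean2, sd2, n2, d0=0.0):
    """Unpooled two-sample t statistic plus two choices of df.

    d0 is the null value of mu1 - mu2 (D0 on the slide).
    """
    v1 = sd1 ** 2 / n1          # estimated variance of the first sample mean
    v2 = sd2 ** 2 / n2          # estimated variance of the second sample mean
    se = math.sqrt(v1 + v2)     # standard error of (mean1 - mean2)
    t = ((mean1 - mean2) - d0) / se

    # By-hand rule from the slide: the smaller of n1 - 1 and n2 - 1.
    df_by_hand = min(n1 - 1, n2 - 1)

    # Assumption: Welch-Satterthwaite df, a common software approximation.
    df_welch = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df_by_hand, df_welch

# Hypothetical summary statistics, for illustration only.
t, df_hand, df_soft = two_sample_t(mean1=52.0, sd1=12.0, n1=36,
                                   mean2=45.0, sd2=10.0, n2=25)
print(f"t = {t:.3f}, df by hand = {df_hand}, df (Welch) = {df_soft:.1f}")
```

The by-hand df is never larger than the Welch df, so the by-hand rule gives a more conservative (larger) p-value.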
Some Comments • Theory tells us that the random samples must be from normal populations. However, we will stick with the guidelines from Slide 25, Lecture 13. • You can always use graphical tools such as boxplots and histograms to check your data. • In general, the independence condition is met as long as subjects or items were randomly assigned to the two groups.
CI for Difference of Two Means Under the same conditions as the two sample t-test (see slide 15), the (1-α)100% confidence interval for μ1 - μ2 is: (x̄1 - x̄2) ± tα/2 √(s1²/n1 + s2²/n2), where tα/2 comes from a t distribution with df determined by the software (or, if done by hand, use the smaller of n1 - 1 and n2 - 1).
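As a rough illustration of the "point estimate ± margin of error" form for two means, the sketch below computes the interval from summary statistics. The critical value is passed in by the caller, as it would be when reading a t table by hand. The function name and all numbers are hypothetical; 2.064 is the usual t-table value for 95% confidence with df = 24.

```python
import math

def diff_means_ci(mean1, sd1, n1, mean2, sd2, n2, t_crit):
    """Confidence interval for mu1 - mu2 using the unpooled standard error.

    t_crit is t_{alpha/2} for the chosen df (looked up by the caller).
    """
    se = math.sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)
    margin = t_crit * se
    estimate = mean1 - mean2
    return estimate - margin, estimate + margin

# Hypothetical summary statistics; df by hand = min(36 - 1, 25 - 1) = 24.
lower, upper = diff_means_ci(mean1=52.0, sd1=12.0, n1=36,
                             mean2=45.0, sd2=10.0, n2=25,
                             t_crit=2.064)
print(f"95% CI for mu1 - mu2: ({lower:.2f}, {upper:.2f})")
```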
CI for Previous Example From StatCrunch, the 95% CI (new - old) after some rounding is: (21, 139) We are 95% confident that the mean spending on the new site is between $21 and $139 higher than on the old site. (What would you get if you do this by hand?)
In Class Exercise 1 Exposure to dust at work can lead to lung disease later in life. One study (this was an actual study!) measured the workplace exposure of two groups of tunnel construction workers: indoor workers vs. outdoor workers. The subjects were chosen at random. The data is summarized below: The measurement units are milligram years per cubic meter. Use the t-table: since we are doing this by hand, use df = 100. If you have your laptops, try using StatCrunch. • Create a 95% confidence interval for the difference in the mean exposures. • Conduct a hypothesis test to see if the exposures for these two groups differ. (Use α = 5%) • Comments?
Relationship between CI and Testing • When dealing with a mean (or means), a two-sided test at level α can be carried out directly from a confidence interval at (1-α)100%. • More specifically, given H0: μ = μ0 vs. Ha: μ ≠ μ0: • Reject H0 at α level ↔ μ0 not in (1-α)100% CI • Fail to reject H0 at α level ↔ μ0 in (1-α)100% CI • Look at previous problem!
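The rule on this slide can be written as a one-line check. This is only a sketch: the function name and the interval used are hypothetical, and it applies to a two-sided test paired with a CI at the matching confidence level.

```python
def reject_two_sided(ci, mu0):
    """Two-sided test at level alpha read off a (1 - alpha)100% CI.

    Returns True (reject H0: mu = mu0) exactly when mu0 lies outside the CI.
    """
    lower, upper = ci
    return not (lower <= mu0 <= upper)

# Hypothetical 95% CI for a mean, for illustration only.
ci_95 = (10.0, 14.0)
print(reject_two_sided(ci_95, 12.0))  # 12 is inside the interval
print(reject_two_sided(ci_95, 15.0))  # 15 is outside the interval
```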
Examples The p-value for a two sided test of H0: μ = 30 is 0.033: • If α = 5%, does the 95% CI include the value 30? No! Since we would reject H0 => 30 not in 95% CI. • If α = 1%, does the 99% CI include the value 30? Yes! Since we fail to reject H0 => 30 in 99% CI.
In Class Exercise 2 A 95% CI for a population mean is (53, 62). • Can you reject H0: μ = 58 at α = 5% against a two sided hypothesis? Explain. • Can you reject H0: μ = 63 at α = 5% against a two sided hypothesis? Explain.
Some Terminology & Concepts • Think of the webpage example as an “experiment”. • You can think of the group that was sent to the new site as the treatment group. • The group that was sent to the old site is called the control group. • We are comparing the treatment and control groups. • Randomization: This is the process by which subjects (or things in our study) get assigned to different groups at random.
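Random assignment like that in the webpage example could be sketched as follows. The visitor IDs, the function name, and the seed are hypothetical; the slides do not say how the marketing team actually carried out the assignment.

```python
import random

def randomize(subjects, n_treatment, seed=None):
    """Randomly split subjects into a treatment group and a control group."""
    rng = random.Random(seed)    # seeded only so the split is reproducible
    shuffled = list(subjects)    # copy, so the caller's list is not modified
    rng.shuffle(shuffled)
    return shuffled[:n_treatment], shuffled[n_treatment:]

# 100 hypothetical visitor IDs: 45 go to the new page, 55 stay on the old page.
visitors = list(range(1, 101))
treatment, control = randomize(visitors, n_treatment=45, seed=310)
print(len(treatment), len(control))
```

Because every visitor is equally likely to land in either group, characteristics such as day of the week or time of visit tend to balance out between the groups.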
Why Randomize? • Goal: make the two groups as similar as possible. • If the groups are similar in all aspects, and if we detect a difference based on our inference, then we can say that the difference is due to the condition we applied.
Randomization • Suppose we had not randomized in our example of webpage sales: • You pick your 45 subjects for the new site on Saturday. • Then you pick your 55 subjects for the old site on Wednesday. • See any problems with this? • The day of the week is a lurking variable! It is a variable you have not thought about but that can explain the difference between the groups.
Example of Paired T-Test The data for this example come from a study of a diet. The diet consists mostly of protein and animal fats, restricting the carbohydrate intake. Triglyceride values (mg/100 ml) are given for the male participants both before the diet and at the end of a period of time following the diet. High levels of triglycerides have been linked to many diseases. We think that this diet will reduce the triglyceride levels. Set α = 0.05.
Subject  Baseline  Final
1        159       194
2        93        122
3        130       158
4        174       154
5        148       93
6        148       90
7        85        101
8        180       99
9        92        183
10       89        82
11       204       100
12       182       104
13       110       72
14       88        108
15       134       110
16       84        81
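Example of Paired Data • For each person we have a baseline value, a final value, and their difference (final - baseline): • Person 1: baseline 159, final 194, difference 35 • Person 2: baseline 93, final 122, difference 29 • ... • Person 16: baseline 84, final 81, difference -3 • We are interested in the differences!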
Paired T-Tests • The term paired data means that the data have been observed in pairs. • You have two sets of data but they are dependent. • Generally you are interested in the differences between the two sets of data. • To perform inference, you just compute the difference within each pair and perform a one sample t-test on those differences. It is that simple! • The assumptions are the same as for the one sample t-test.
Example Continued • Define the population parameter of interest: μd = population mean difference in the triglyceride level of the diet participants (final - baseline) • Our hypotheses: H0: μd ≥ 0 Ha: μd < 0 • Note the use of the subscript d to emphasize difference.
Example Continued • Sample mean of the differences: d̄ ≈ -15.56 (computed from the 16 final - baseline differences). It is a point estimate for the population mean difference μd = μfinal - μbase. • Sample sd of the differences: s_d ≈ 51.79
Example • Our test statistic in this case: t = (d̄ - 0) / (s_d/√n) ≈ -1.20 • Our observed mean difference is 1.20 standard errors below the null (hypothesized) value of 0.
Example • Under H0, the test statistic has a t-distribution with df = 16 - 1 = 15. We write T ~ t, df = 15. • We have: P-Value = P(T ≤ -1.20). • The exact software value is 0.1244. (Done this many times!)
Example • Our p-value is large (> 0.05) at 0.1244. • So we fail to reject the null hypothesis. We do not have a statistically significant result. • The data do not give sufficient evidence that the diet reduces the mean triglyceride level.
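The paired analysis can be reproduced from the baseline and final values in the data table. The sketch below uses only the Python standard library, so the lower-tail t probability is approximated by numerically integrating the t density with Simpson's rule; that integration is a stand-in chosen here, not how StatCrunch computes it. The helper names are hypothetical.

```python
import math

baseline = [159, 93, 130, 174, 148, 148, 85, 180, 92, 89, 204, 182, 110, 88, 134, 84]
final = [194, 122, 158, 154, 93, 90, 101, 99, 183, 82, 100, 104, 72, 108, 110, 81]

def t_pdf(x, df):
    """Density of the t distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)

def t_cdf(t, df, steps=2000):
    """P(T <= t): Simpson's rule on [0, |t|], then symmetry about 0."""
    a = abs(t)
    h = a / steps                      # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3               # approximates P(0 <= T <= |t|)
    return 0.5 + area if t >= 0 else 0.5 - area

# Differences are final - baseline, matching the definition of mu_d.
diffs = [f - b for b, f in zip(baseline, final)]
n = len(diffs)
d_bar = sum(diffs) / n
s_d = math.sqrt(sum((d - d_bar) ** 2 for d in diffs) / (n - 1))
t_stat = (d_bar - 0) / (s_d / math.sqrt(n))
p_value = t_cdf(t_stat, df=n - 1)      # lower tail, since Ha: mu_d < 0

print(f"d_bar = {d_bar:.2f}, s_d = {s_d:.2f}, t = {t_stat:.2f}, p = {p_value:.3f}")
```

The p-value should come out close to the 0.1244 quoted in the lecture; small differences are expected because the lecture rounds the test statistic to -1.20.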
Confidence Intervals • The (1-α)100% confidence intervals will be just like the confidence intervals for the one sample t tests: d̄ ± tα/2 (s_d/√n) • Here, with 95% confidence and df = 15: -15.56 ± 2.131 (51.79/√16), or about (-43.16, 12.03). (Note: 2.131 was obtained from StatCrunch but the t table gives the same value.) Zero is in this interval; the difference is not statistically significant.
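A short sketch of this interval, computed from the 16 final - baseline differences in the triglyceride data. The function name is hypothetical, and the critical value 2.131 is simply the one quoted on the slide.

```python
import math

# final - baseline for the 16 participants
diffs = [35, 29, 28, -20, -55, -58, 16, -81, 91, -7, -104, -78, -38, 20, -24, -3]

def mean_ci(values, t_crit):
    """One-sample t interval: mean ± t_crit * s / sqrt(n)."""
    n = len(values)
    mean = sum(values) / n
    s = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    margin = t_crit * s / math.sqrt(n)
    return mean - margin, mean + margin

lo, hi = mean_ci(diffs, t_crit=2.131)   # 2.131: value quoted on the slide, df = 15
print(f"95% CI for the mean difference: ({lo:.2f}, {hi:.2f})")
print("Zero inside the interval:", lo <= 0 <= hi)
```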
Some Comments • Again, nothing new: just perform a one sample t test on the differences. • Could we have performed a two-sample t test? Yes, but we would need to make sure the two groups are independent and similar. • Sometimes paired t-tests are not possible: for example, testing to see if a new additive to a concrete mixture improves maximum load.
In Class Exercise 3 Suppose you want to compare prices on two websites. For concreteness, let’s assume you want to compare textbook prices: amazon.com vs. bn.com. Clearly (?) you can’t obtain a list of all the items and make a full comparison. Assume that you can obtain a random selection of the same items from the two websites. How would you go about making the comparison?