
Two-Sample Problems – Means



  1. Two-Sample Problems – Means • Comparing two (unpaired) populations • Assume: 2 SRSs, independent samples, Normal populations • Make an inference for the difference of their means, μ1 − μ2 • Sample from population 1: mean x̄1, standard deviation s1, size n1 • Sample from population 2: mean x̄2, standard deviation s2, size n2

  2. S.E. – standard error of the difference in the two-sample process • Confidence Interval: estimate ± margin of error, built from the S.E. • Significance Test: H0: μ1 = μ2, using a t statistic built from the same S.E. (see the formulas below)
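
The formulas on this slide did not transcribe. In the unpooled (Welch) form that the calculator procedures below use by default, they are:

\[
\text{S.E.} = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}, \qquad
\text{C.I.: } (\bar{x}_1 - \bar{x}_2) \pm t^{*}\,\text{S.E.}, \qquad
t = \frac{(\bar{x}_1 - \bar{x}_2) - 0}{\text{S.E.}}
\]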

  3. Using the Calculator • Confidence Interval: • On calculator: STAT, TESTS, 0:2-SampTInt… • Given data, need to enter: list locations, C-Level • Given stats, need to enter, for each sample: x̄, s, n, and then C-Level • Select input (Data or Stats), enter appropriate info, then Calculate

  4. Using the Calculator • Significance Test: • On calculator: STAT, TESTS, 4:2-SampTTest… • Given data, need to enter: list locations, Ha • Given stats, need to enter, for each sample: x̄, s, n, and then Ha • Select input (Data or Stats), enter appropriate info, then Calculate or Draw • Output: test stat, p-value
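
For readers working outside the TI calculator, here is a minimal Python sketch of the same computations. It is not part of the original slides; the function name and the use of NumPy/SciPy are my own choices, and it assumes the unpooled (Welch) procedure.

```python
import numpy as np
from scipy import stats

def two_sample_t(x1, x2, conf=0.95):
    """Two-sample t interval and test (unpooled / Welch), mirroring
    2-SampTInt and 2-SampTTest with Pooled: No."""
    x1, x2 = np.asarray(x1, float), np.asarray(x2, float)
    n1, n2 = len(x1), len(x2)
    diff = x1.mean() - x2.mean()
    v1, v2 = x1.var(ddof=1) / n1, x2.var(ddof=1) / n2
    se = np.sqrt(v1 + v2)                                   # standard error
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))  # Welch df
    t_star = stats.t.ppf(1 - (1 - conf) / 2, df)
    ci = (diff - t_star * se, diff + t_star * se)           # confidence interval
    t_stat, p_value = stats.ttest_ind(x1, x2, equal_var=False)  # significance test
    return ci, t_stat, p_value
```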

  5. Ex 1. Is one model of camp stove any different from another at boiling water, at the 5% significance level? • Model 1: • Model 2:

  6. Ex 2. Is there evidence that children get more REM sleep than adults, at the 1% significance level? • Children: • Adults:

  7. Ex 3. Create a 98% C.I. for estimating the mean difference in petal lengths (in cm) for two species of iris. • Iris virginica: • Iris setosa:

  8. Ex 4. Is one species of iris any different from another in petal length, at the 2% significance level? • Iris virginica: • Iris setosa:

  9. Two-Sample Problems – Proportions • Make an inference for the difference of proportions, p1 − p2 • Sample from population 1: x1 successes out of n1, so p̂1 = x1/n1 • Sample from population 2: x2 successes out of n2, so p̂2 = x2/n2

  10. Using the Calculator • Confidence Interval: estimate ± margin of error • On calculator: STAT, TESTS, B:2-PropZInt… • Need to enter: x1, n1, x2, n2, and C-Level • Enter appropriate info, then Calculate

  11. Using the Calculator • Significance Test: • On calculator: STAT, TESTS, 6:2-PropZTest… • Need to enter: x1, n1, x2, n2, and then Ha • Enter appropriate info, then Calculate or Draw • Output: test stat, p-value
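
The same interval and test can be done by hand or in Python. The sketch below is not from the slides; the function name is mine, and it follows the usual conventions (unpooled standard error for the interval, pooled for the test, as 2-PropZInt and 2-PropZTest do).

```python
import numpy as np
from scipy import stats

def two_prop_z(x1, n1, x2, n2, conf=0.95):
    """Two-proportion z interval and two-sided z test,
    mirroring 2-PropZInt and 2-PropZTest."""
    p1, p2 = x1 / n1, x2 / n2
    diff = p1 - p2
    # Interval: unpooled standard error
    se_ci = np.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    z_star = stats.norm.ppf(1 - (1 - conf) / 2)
    ci = (diff - z_star * se_ci, diff + z_star * se_ci)
    # Test: pooled proportion under H0: p1 = p2
    p_pool = (x1 + x2) / (n1 + n2)
    se_test = np.sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
    z = diff / se_test
    p_value = 2 * stats.norm.sf(abs(z))   # two-sided; halve for a one-sided Ha
    return ci, z, p_value
```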

  12. Ex 5. Create a 95% C.I. for the difference in proportions of eggs hatched. • Nesting boxes apart/hidden: • Nesting boxes close/visible:

  13. Ex 6. Split 1100 potential voters into two groups: those who get a reminder to register and those who do not. Of the 600 who got reminders, 332 registered. Of the 500 who got no reminders, 248 registered. Is there evidence at the 1% significance level that the proportion of potential voters who registered was greater in the group that received reminders? • Group 1 (reminder): x1 = 332, n1 = 600 • Group 2 (no reminder): x2 = 248, n2 = 500

  14. Ex 6. (continued)
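
The worked solution on this slide did not transcribe. Carrying the one-sided pooled two-proportion z test through by hand gives, approximately:

\[
\hat{p}_1 = \tfrac{332}{600} \approx 0.553, \quad
\hat{p}_2 = \tfrac{248}{500} = 0.496, \quad
\hat{p} = \tfrac{580}{1100} \approx 0.527,
\]
\[
z = \frac{0.553 - 0.496}{\sqrt{0.527(1 - 0.527)\left(\tfrac{1}{600} + \tfrac{1}{500}\right)}} \approx 1.90,
\qquad p\text{-value} \approx 0.029.
\]

Since 0.029 > 0.01, we fail to reject H0 at the 1% level: the data do not give sufficiently strong evidence that the reminder group registered at a higher rate.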

  15. Ex 7. “Can people be trusted?” Among 250 people aged 18–25, 45 said “yes”. Among 280 people aged 35–45, 72 said “yes”. Does this indicate that the proportion of trusting people is higher in the older group? Use a significance level of α = .05. • Group 1 (18–25): x1 = 45, n1 = 250 • Group 2 (35–45): x2 = 72, n2 = 280

  16. Ex 7. (continued)
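
Again the worked solution did not transcribe; by hand, the one-sided pooled two-proportion z test gives approximately:

\[
\hat{p}_1 = \tfrac{45}{250} = 0.180, \quad
\hat{p}_2 = \tfrac{72}{280} \approx 0.257, \quad
\hat{p} = \tfrac{117}{530} \approx 0.221,
\]
\[
z = \frac{0.257 - 0.180}{\sqrt{0.221(1 - 0.221)\left(\tfrac{1}{250} + \tfrac{1}{280}\right)}} \approx 2.14,
\qquad p\text{-value} \approx 0.016.
\]

Since 0.016 < 0.05, we reject H0: there is evidence that the proportion of trusting people is higher in the older group.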

  17. Scatterplots & Correlation Each individual in the population/sample has two characteristics measured, instead of one. Goal: be able to make accurate predictions for one variable in terms of the other, based on a data set of paired values.

  18. Variables • Explanatory (independent) variable, x, is used to predict a response. • Response (dependent) variable, y, will be the outcome from a study or experiment. • Examples: height vs. weight, age vs. memory, temperature vs. sales

  19. Scatterplots • Plot of paired values helps to determine if a relationship exists. • Ex: variables – height (in), weight (lb) • [Scatterplot of height vs. weight]

  20. Scatterplots - Features • Direction: negative, positive • Form: line, parabola, wave (sine) • Strength: how closely the points follow the pattern • [Height vs. weight scatterplot revisited: identify its direction, form, and strength]

  21. Scatterplots – Temp vs. Oil Used • [Scatterplot of temperature vs. oil used: identify its direction, form, and strength]

  22. Correlation • Correlation, r, measures the strength of the linear relationship between two variables. • r > 0: positive direction • r < 0: negative direction • Close to +1: strong positive linear relationship • Close to −1: strong negative linear relationship • Close to 0: little or no linear relationship
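
The slide does not show a formula for r; the standard definition, for paired data (x_i, y_i), is:

\[
r = \frac{1}{n-1} \sum_{i=1}^{n} \left(\frac{x_i - \bar{x}}{s_x}\right)\!\left(\frac{y_i - \bar{y}}{s_y}\right)
\]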

  23. [Match each correlation to its scatterplot: r = .85, −.02, .13, −.79]

  24. Lines - Review • y = a + bx • a: intercept – the value of y when x = 0 • b: slope – the change in y for each one-unit increase in x

  25. Regression If the form of a scatterplot seems linear, use a linear model, or regression line, to describe how the response variable y changes as the explanatory variable x changes. Regression models are often used to predict the value of the response variable for a given value of the explanatory variable.

  26. Least-Squares Regression Line • The line that best fits the data: ŷ = a + bx • where the slope b and intercept a are computed from the data (formulas below)
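
The formulas did not transcribe; the standard least-squares coefficients, in terms of the correlation and the sample means and standard deviations, are:

\[
b = r\,\frac{s_y}{s_x}, \qquad a = \bar{y} - b\,\bar{x}
\]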

  27. Example • Fat and calories for 11 fast food chicken sandwiches • Fat: • Calories:

  28. Example • Fat and calories for 11 fast food chicken sandwiches • [Scatterplot of calories vs. fat]

  29. Example-continued • What is the slope and what does it mean? • What is the intercept and what does it mean? • How many calories would you predict a sandwich with 40 grams of fat has?

  30. Why “Least-squares”? • The least-squares line is the line that minimizes the sum of the squared residuals. • Residual: difference between the actual and the predicted value of y, residual = y − ŷ
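
Written out (implied by the slide but not shown), the least-squares line minimizes:

\[
\sum_{i=1}^{n} \left(y_i - \hat{y}_i\right)^2 = \sum_{i=1}^{n} \left(y_i - (a + b x_i)\right)^2
\]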

  31. Scatterplots – Residuals • To double-check the appropriateness of a linear regression model, plot the residuals against the explanatory variable. • No unusual pattern in the residual plot means the linear model fits well.

  32. Other things to look for • Squared correlation, r², gives the percent of the variation in y explained by the regression line. • Chicken data:

  33. Other things to look for • Influential observations: points whose removal would substantially change the regression line (often points extreme in x). • Prediction vs. Causation: x and y are linked (associated) somehow, but we don’t say “x causes y to occur”. Other forces may be causing the relationship (lurking variables).

  34. Extrapolation: using the regression line for a prediction outside the range of values of the explanatory variable; such predictions are often unreliable.

  35. On calculator • Set up: 2nd 0 (Catalog), x⁻¹ (D), scroll down to “DiagnosticOn”, Enter, Enter • Scatterplots: 2nd Y= (Stat Plot), 1, On, select type and list locations for x values and y values; then ZOOM, 9 (ZoomStat) • Regression: STAT, CALC, 8:LinReg(a+bx), then list location for x, list location for y, Enter • Graph: Y=, enter the line into Y1
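
Outside the calculator, the same scatterplot, correlation, and regression line can be produced with a few lines of Python. This sketch is mine, not part of the slides; it uses scipy.stats.linregress and matplotlib as illustrative choices, and the data values are placeholders.

```python
import matplotlib.pyplot as plt
from scipy import stats

# Paired data: x explanatory, y response.  These values are placeholders
# for illustration only, not the fat/calorie data from the slides.
x = [20, 25, 28, 31, 34, 38]
y = [400, 450, 470, 510, 540, 600]

fit = stats.linregress(x, y)            # least-squares line  y-hat = a + b*x
print("intercept a:", fit.intercept)
print("slope b:", fit.slope)
print("correlation r:", fit.rvalue)
print("r squared:", fit.rvalue ** 2)

plt.scatter(x, y)                                            # scatterplot
plt.plot(x, [fit.intercept + fit.slope * xi for xi in x])    # regression line
plt.xlabel("x (explanatory)")
plt.ylabel("y (response)")
plt.show()
```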

  36. Examples:

  37. Contingency Tables Making comparisons between two categorical variables • Contingency tables summarize all outcomes • Row variable: one row for each possible value • Column variable: one column for each possible value • Each cell (i, j) gives the number of individuals with value i of the row variable and value j of the column variable.

  38. Info from the table • # who are over 25 and make under $15,000: • % who are over 25 and make under $15,000: • % who are over 25: • % of the over 25 who make under $15,000:

  39. Marginal Distributions • Look to margins of tables for individual variable’s distribution • Marginal distribution for age: • Marginal distribution for income:

  40. Conditional Distributions • Look at one variable’s distribution given another • How does income vary over the different age groups? • Consider each age group as a separate population and compute relative frequencies:

  41. Independence Revisited • Two variables are independent if knowledge of one does not affect the chances of the other. • In terms of contingency tables, this means that the conditional distribution of one variable is (almost) the same for all values of the other variable. • In the age/income example, the conditionals are not even close. These variables are not independent; there is some association between age and income.

  42. Test for Independence Is there an association between two variables? • H0: The two variables are independent (not associated) • Ha: The two variables are not independent (associated) Assuming independence: • Expected number in each cell (i, j): (% of value i for variable 1) × (% of value j for variable 2) × (sample size) = (row i total × column j total) / (grand total)

  43. Example of Computing Expected Values Expected number in cell (A, +):

  44. Chi-square statistic To measure the difference between the observed table and the expected table, we use the chi-square test statistic (see below), where the summation runs over each cell in the table. • The X² distribution is skewed right, with df = (r – 1)(c – 1) • Right-tailed test
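
The statistic itself did not transcribe; the standard form is:

\[
X^2 = \sum_{\text{cells}} \frac{(\text{observed} - \text{expected})^2}{\text{expected}}
\]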

  45. Test for Independence – Steps • State the variables being tested • State hypotheses: H0, the null hypothesis: variables independent; Ha, the alternative: variables not independent • Compute the test statistic: if the null hypothesis is true, where does the sample fall? Test stat = X²-score • Compute the p-value: what is the probability of seeing a test stat as extreme (or more extreme) as that? • Conclusion: small p-values give strong evidence against H0.

  46. ST – on the calculator • On calculator: STAT, TESTS, C:χ²-Test • Observed: [A], Expected: [B] • Enter the observed counts into matrix A, then perform the test with Calculate or Draw • To enter the observed counts into matrix A: 2nd, x⁻¹ (Matrix), EDIT, 1:[A], change dimensions, enter info in each cell • Output: test stat, p-value, df
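
In Python the whole procedure (expected counts, test statistic, df, p-value) is one call. This sketch is mine, not from the slides; the 2×2 table shown is a made-up illustration, not the blood-type or marijuana data from the examples that follow.

```python
import numpy as np
from scipy.stats import chi2_contingency

# Observed contingency table (rows = one variable, columns = the other).
# These counts are placeholders for illustration only.
observed = np.array([[30, 20],
                     [45, 55]])

# For 2x2 tables SciPy applies Yates' continuity correction by default;
# correction=False gives the plain X^2 statistic the calculator reports.
chi2, p_value, df, expected = chi2_contingency(observed, correction=False)
print("test stat:", chi2)
print("p-value:", p_value)
print("df:", df)
print("expected counts:\n", expected)
```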

  47. Ex. Test whether blood type and Rh factor are independent at the 5% significance level.

  48. Ex. Test whether age and stance on marijuana legalization are associated.

  49. Additional Examples
