
Statistics 200

Statistics 200. Objectives: • Review hypothesis testing for a relationship between two categorical variables in a population. • Extend techniques beyond the 2x2 table case. • Predict how larger samples will influence the chi-square statistic. Lecture #27, Tuesday, November 29, 2016

marquittal

Presentation Transcript


  1. Statistics 200 Objectives: • Review hypothesis testing for a relationship between two categorical variables in a population. • Extend techniques beyond the 2x2 table case. • Predict how larger samples will influence chi-square statistic. Lecture #27 Tuesday, November 29, 2016 Textbook: 15.1

  2. Motivating Example: Have you ever missed class because of alcohol? We asked a random sample of PSU students the following questions: 1. Have you ever missed class because you were drinking the night before? 2. What year are you in school? a. Freshman b. Sophomore c. Junior d. Senior What kind of variables are these? Quantitative or categorical?

  3. 2 Way Contingency Tables Summarizing responses: Generic notation: • (explanatory) × (response) table Example: a (4 × 2) table implies: • explanatory variable: 4 levels • response variable: 2 levels

  4. Have you ever missed class because of alcohol? Which is the explanatory variable? Options: year in school, or missed class or not. Student response variable: Y / N answer k = 4 samples Samples are… dependent or independent? This is a 4 x 2 table

  5. Null Hypothesis: H0 Starting Position: Nothing is happening Null with 2 way tables: In the population there is: • No relationship between the two variables • No difference in the groups Statistical hypotheses never include statements about the sample. We are never trying to prove the null is true

  6. Alternative Hypothesis: Ha Challenging Position: Something is happening Goal: hope the data support the alternative Alternative with 2 way tables: In the population, there is: • some relationship between the two variables • some difference in the groups

  7. We can state hypotheses two ways Null Hypothesis: H0: In the population, no relationship between school year and the likelihood of missing class because of alcohol. H0: p1 = p2 = p3 = p4 Alternative Hypothesis: Ha: In the population, a relationship exists between school year and the likelihood of missing class because of alcohol Ha: at least one pi differs

  8. Do descriptive results show support for Ha? Row Percents Yes – those percents look different for the 4 groups
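
The row-percent table on this slide did not survive extraction. As a rough sketch, the percents can be recomputed from the observed counts listed in the Minitab output on slide 13; the variable names here are illustrative, not from the lecture.

```python
# Observed (Yes, No) counts per school year, taken from slide 13.
counts = {"Fr": (15, 85), "So": (25, 75), "Jr": (30, 70), "Se": (50, 50)}

# Row percent = share of each row's total that answered "Yes".
row_pct_yes = {
    year: 100 * yes / (yes + no)
    for year, (yes, no) in counts.items()
}

for year, pct in row_pct_yes.items():
    print(f"{year}: {pct:.1f}% Yes, {100 - pct:.1f}% No")
```

The "Yes" percents rise from 15% for freshmen to 50% for seniors, which is the descriptive difference the slide refers to.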

  9. We quantify evidence for the alternative using… the chi-square statistic • Next step: Find the chi-square statistic using sample data • This statistic measures the difference in the observed and expected counts in the contingency table. • Calculate using sample data

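  10. Expected count = (row total × column total) / (total n), e.g. (100 × 120) / 400 = 30 and (100 × 280) / 400 = 70 Chi-square contribution = (observed – expected)² / expected, e.g. (15 – 30)² / 30 = 7.5 and (85 – 70)² / 70 ≈ 3.2 Get the chi-square statistic by summing contributions from all the cells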
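
A minimal sketch of this slide's arithmetic for the freshman row, assuming the standard expected-count and contribution formulas and using the counts from slide 13. The dictionary names are hypothetical.

```python
# Freshman row of the 4 x 2 table (counts from slide 13).
row_total = 100                          # freshmen sampled
col_totals = {"Yes": 120, "No": 280}     # column totals
n = 400                                  # total sample size
observed = {"Yes": 15, "No": 85}

# Expected count = (row total * column total) / n
expected = {k: row_total * col_totals[k] / n for k in observed}

# Contribution = (observed - expected)^2 / expected
contribution = {
    k: (observed[k] - expected[k]) ** 2 / expected[k] for k in observed
}

print(expected)
print({k: round(v, 3) for k, v in contribution.items()})
```

The rounded contributions, 7.5 and 3.214, match the two values shown for the freshman row in the Minitab output.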

  11. Check the necessary conditions for using a chi-square test (page 603) • Are all expected counts at least 1? • Are 80% or more of the expected counts at least 5? If the answer is yes to both, we’re safe; the sample is large enough to make the test work well.
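
The two conditions can be checked mechanically. The helper below is a hypothetical sketch (its name and signature are not from the lecture) applied to the expected counts of the school-year example, where every row has expected counts 30 and 70.

```python
def chi_square_conditions_ok(expected_counts):
    """Return True if all expected counts are >= 1 and at least 80% are >= 5."""
    flat = [e for row in expected_counts for e in row]
    all_at_least_1 = all(e >= 1 for e in flat)
    share_at_least_5 = sum(e >= 5 for e in flat) / len(flat)
    return all_at_least_1 and share_at_least_5 >= 0.80


# Expected counts for the 4 x 2 school-year table (slide 13).
expected = [[30, 70], [30, 70], [30, 70], [30, 70]]
print(chi_square_conditions_ok(expected))

# A made-up table that fails: one expected count is below 1.
print(chi_square_conditions_ok([[0.5, 10], [6, 7]]))
```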

  12. How many df for the chi-square statistic? The chi-square statistic follows a chi-square distribution with a specific number of degrees of freedom (df). df = (r – 1)(c – 1) r = # row variable categories c = # column variable categories For our example, df = A. 1 B. 2 C. 3 D. 4
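
As a quick check of the df formula, here is a small sketch; the function name is illustrative. It is evaluated for the three table shapes that appear in this lecture.

```python
def degrees_of_freedom(r, c):
    """df = (r - 1)(c - 1) for an r x c contingency table."""
    return (r - 1) * (c - 1)


print(degrees_of_freedom(4, 2))  # school year x missed class
print(degrees_of_freedom(2, 4))  # PSU first choice x applications submitted
print(degrees_of_freedom(3, 4))  # satisfaction x clothing
```

The results, 3, 3 and 6, agree with the DF values reported in the Minitab outputs on slides 13, 19 and 23.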

  13. Chi-Square Results
      Rows: SchoolYr   Columns: Missed_Class

                 Yes        No     All
      Fr          15        85     100
                  30        70
               7.500     3.214
      So          25        75     100
                  30        70
               0.833     0.357
      Jr          30        70     100
                  30        70
               0.000     0.000
      Se          50        50     100
                  30        70
              13.333     5.714
      All        120       280     400

      Cell Contents: Count
                     Expected count
                     Contribution to Chi-square

      Pearson Chi-Square = 30.952, DF = 3, P-Value = 0.000
      df = (r – 1) × (c – 1)
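
The Pearson chi-square value in this output can be reproduced from the observed counts alone. The following is a plain-Python sketch, not Minitab's implementation; the function name is an assumption made for illustration.

```python
def pearson_chi_square(table):
    """Sum (observed - expected)^2 / expected over every cell of the table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            exp = row_totals[i] * col_totals[j] / n
            stat += (obs - exp) ** 2 / exp
    return stat


# Observed (Yes, No) counts for Fr, So, Jr, Se.
observed = [[15, 85], [25, 75], [30, 70], [50, 50]]
chi_sq = pearson_chi_square(observed)
print(round(chi_sq, 3))
```

Rounded to three decimals, the result matches the reported 30.952.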

  14. Analysis & Conclusion H0: p1 = p2 = p3 = p4 Ha: at least one pi differs Chi-Sq = 30.952 & p-value = 0.000 Conclusions: We can claim that • a relationship exists between school year and the likelihood of having missed class because of alcohol in the population. • at least one population proportion of yeses differs when comparing the four school years
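
The reported p-value of 0.000 is a rounded value. A sketch of the upper-tail calculation for the special case df = 3, where the chi-square tail probability has a closed form in terms of the complementary error function, is shown below. The function name is hypothetical, and a statistics package would normally be used instead.

```python
import math


def chi2_upper_tail_df3(x):
    """P(X > x) for a chi-square variable with 3 degrees of freedom."""
    return (math.erfc(math.sqrt(x / 2))
            + math.sqrt(2 * x / math.pi) * math.exp(-x / 2))


p_value = chi2_upper_tail_df3(30.952)
print(f"{p_value:.2e}")        # far below 0.0005, so it prints as 0.000
print(f"{p_value:.3f}")
```

Because the p-value is well under any usual significance level, the data support rejecting H0, as the slide concludes.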

  15. New Example: • Was PSU your first choice? * no * yes • How many college applications did you submit? * (1-2) * (3-5) * (6-8) * (≥ 9)

  16. Example 2: Two Different Categorical Variables Identify Variables: • explanatory: PSU first choice • response: number of applications submitted (a variable with 4 levels, not a binomial) This is an example of a 2 x 4 table

  17. Row percents for each level of explanatory variable Descriptively: what do the row percents suggest? They suggest a difference

  18. State Hypotheses Null Hypothesis: H0: no relationship between PSU choice and number of college applications submitted in the population H0: distributions are both the same for the two PSU choices Alternative Hypothesis: Ha: relationship exists between PSU choice and number of college applications submitted in the population Ha: distributions are not both the same for the two PSU choices

  19. Minitab Output: Stat > Tables > Cross tabulation & Chi-square
      Rows: PSU_Choice   Columns: College_Applied

               (1-2)    (3-5)    (6-8)    (=>9)    All
      No          12       60       29       19    120
               33.60    56.40    21.00     9.00
      Yes        100      128       41       11    280
               78.40   131.60    49.00    21.00
      All        112      188       70       30    400

      Cell Contents: Count
                     Expected count

      Pearson Chi-Square = 40.392, DF = 3, P-Value = 0.000
      Is there a statistically significant relationship found? Yes! Different distributions are found

  20. Example 2: Two Different Categorical Variables Pearson chi-square = 40.392 Double all counts to get: Pearson chi-square = 80.784, which is exactly doubled!
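
The doubling claim can be checked directly. The sketch below recomputes the statistic for the 2 x 4 table from slide 19 and again with every count doubled; the helper name is an assumption, and the code is an illustration rather than the lecture's own method.

```python
def pearson_chi_square(table):
    """Pearson chi-square: sum of (observed - expected)^2 / expected."""
    rows = [sum(r) for r in table]
    cols = [sum(c) for c in zip(*table)]
    n = sum(rows)
    return sum(
        (obs - rows[i] * cols[j] / n) ** 2 / (rows[i] * cols[j] / n)
        for i, row in enumerate(table)
        for j, obs in enumerate(row)
    )


# Observed counts: rows No / Yes, columns (1-2), (3-5), (6-8), (=>9).
observed = [[12, 60, 29, 19], [100, 128, 41, 11]]
doubled = [[2 * x for x in row] for row in observed]

original_stat = pearson_chi_square(observed)
doubled_stat = pearson_chi_square(doubled)
print(round(original_stat, 3), round(doubled_stat, 3))
```

Doubling every count doubles each expected count and each difference (observed – expected), so each contribution (2d)² / (2e) is exactly twice the original d² / e. The same row percents therefore give stronger evidence in a larger sample.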

  21. Yet another pair of categorical variables: How satisfied are you with your overall appearance? • (very) • (somewhat) • (not much) What do you best like to wear when going to and from classes? • shorts • athletic wear • jeans • other Explanatory Variable: Satisfaction Response Variable: Which clothes Table: 3 x 4

  22. What does the pattern of row percents show us? Very similar? Very different? Look for significance using Chi-square statistic to decide!

  23. Inferentially no support for a relationship in the population H0: No relationship, or no difference among the three distributions. Ha: Relationship exists, or some difference exists among the three distributions. Pearson Chi-Square = 6.34, DF = 6, P-Value = 0.386 Can NOT claim that there is a statistically significant relationship or that different distributions are found. We cannot rule out randomness as a plausible explanation for what has happened.
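
For an even number of degrees of freedom, the chi-square upper-tail probability reduces to a finite sum, so the reported p-value can be checked without a statistics library. This is a sketch under that assumption; the function name is illustrative.

```python
import math


def chi2_upper_tail_even_df(x, df):
    """P(X > x) for a chi-square variable with an even number of df."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form only holds for positive even df")
    half = x / 2
    terms = (half ** i / math.factorial(i) for i in range(df // 2))
    return math.exp(-half) * sum(terms)


p_value = chi2_upper_tail_even_df(6.34, 6)
print(round(p_value, 3))
```

The result rounds to 0.386, matching the output on the slide. A p-value this large means the observed differences are consistent with chance variation.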

  24. If you understand today’s lecture… 15.1, 15.3, 15.11, 15.17, 15.19, 15.47, 15.49, 15.59 Objectives: • Review hypothesis testing for a relationship between two categorical variables in a population. • Extend techniques beyond the 2x2 table case. • Predict how larger samples will influence chi-square statistic.
