S519: Evaluation of Information Systems Social Statistics Inferential Statistics Chapter 15: Chi-square
Last week • Linear regression • Slope • Intercept
This week • What is chi-square • CHIDIST • Nonparametric statistics
Parametric statistics • A main branch of statistics • Assume the data follow a particular type of probability distribution (e.g. the normal distribution) • Make inferences about the parameters of that distribution (e.g. mean, standard deviation) • Assumption: the sample is large enough to represent the population (e.g. a sample size of around 30) • They are not distribution-free (they require a probability distribution)
Nonparametric statistics • Nonparametric statistics (distribution-free statistics) • Do not rely on the assumption that the data are drawn from a given probability distribution (the data model is not specified) • Widely used for studying populations that take on a ranked order (e.g. movie reviews from one to four stars, opinions about hotel rankings); a good fit for ordinal data • They make fewer assumptions, so they can be applied in situations where less is known about the application • They may require a larger sample size than parametric statistics to draw a conclusion with the same degree of confidence
Nonparametric statistics • Nonparametric statistics (distribution-free statistics) • Data expressed as frequencies or percentages • The number of kids in different grades • The percentage of people receiving social security
One-sample chi-square • One-sample chi-square includes only one dimension • Whether the number of respondents is equally distributed across all levels of education. • Whether the voting for the school voucher has a pattern of preference. • Two-sample chi-square includes two dimensions • Whether preference for the school voucher is independent of political party affiliation and gender
Compute chi-square • One-sample chi-square test: $\chi^2 = \sum \frac{(O - E)^2}{E}$ • O: the observed frequency • E: the expected frequency • The sum runs over all categories
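A minimal Python sketch of the one-sample chi-square computation, summing (O − E)² / E over the categories. The function name `chi_square_statistic` and the counts are made up for illustration; they are not taken from the slides.

```python
def chi_square_statistic(observed, expected):
    """Return the sum over categories of (O - E)^2 / E."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Made-up counts for illustration only (not from the slides).
observed = [10, 20, 30]
expected = [20, 20, 20]

print(chi_square_statistic(observed, expected))  # 10.0
```

Each category contributes its own squared gap between observed and expected counts, scaled by the expected count, so a large gap in a small category weighs more than the same gap in a large one.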
Example • Question: Is the number of respondents equally distributed across all opinions? • One-sample chi-square
Chi-square steps • Step 1: a statement of the null and research hypotheses • Null: there is no difference in the frequency or proportion in each category • Research: there is a difference in the frequency or proportion in each category
Chi-square steps • Step2: setting the level of risk (or the level of significance or Type I error) associated with the null hypothesis • 0.05
Chi-square steps • Step 3: selection of the proper test statistic • Frequency → nonparametric procedures → chi-square
Chi-square steps • Step4. Computation of the test statistic value (called the obtained value)
Chi-square steps • Step 5: determination of the value needed for rejection of the null hypothesis, using the appropriate table of critical values for the particular statistic • Table B5 • df = r − 1 (r = number of categories) • If the obtained value > the critical value, reject the null hypothesis • If the obtained value < the critical value, fail to reject the null hypothesis
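The decision rule in Step 5 can be sketched in Python as follows. The function names are hypothetical. Instead of reading Table B5, the sketch computes the critical value directly, which works here only because a chi-square distribution with df = 2 has the simple tail probability exp(−x/2); other df values would need a table or a statistics library.

```python
import math


def degrees_of_freedom(num_categories):
    """df = r - 1, where r is the number of categories."""
    return num_categories - 1


def critical_value_df2(alpha):
    """Critical value for df = 2 only, where P(X > x) = exp(-x / 2)."""
    return -2.0 * math.log(alpha)


def decide(obtained, critical):
    """Apply the rule: reject H0 only if the obtained value exceeds the critical value."""
    return "reject H0" if obtained > critical else "fail to reject H0"


df = degrees_of_freedom(3)
critical = critical_value_df2(0.05)
print(df, round(critical, 2))  # 2 5.99
print(decide(20.6, critical))  # reject H0
```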
Chi-square steps • Step 6: a comparison of the obtained value and the critical value is made • Obtained value 20.6 vs. critical value 5.99
Chi-square steps • Step 7 and 8: decision time • What is your conclusion, why and how to interpret?
Another example • We’ll settle the age-old debate of whether people can actually detect their favorite cola based solely on taste. For 30 coke-lovers, I blindfold them, and have them sample 3 colas…is there a true difference, or are these preference differences explainable by chance?
Hypothesis • Null: There are no preferences: The population is divided evenly among the brands • Alternate: There are preferences: The population is not divided evenly among the brands
Chance Model • df = C − 1 = 3 − 1 = 2, set α = .05 • For df = 2, χ²-crit = 5.99
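A sketch of the chance model for the cola test: under the null hypothesis, the 30 tasters are expected to split evenly across the 3 brands. The slide does not give the actual tasting counts, so the tally below is hypothetical, chosen only so that the statistic matches the 1.40 that appears on the Excel slide.

```python
tasters = 30
brands = 3

# Under the null hypothesis every brand is expected to get the same share.
expected_per_brand = tasters / brands  # 10.0
df = brands - 1                        # 2

# Hypothetical tally of favorite picks (NOT from the slides).
hypothetical_counts = [13, 9, 8]

obtained = sum(
    (count - expected_per_brand) ** 2 / expected_per_brand
    for count in hypothetical_counts
)

critical = 5.99  # critical value for df = 2 at alpha = .05, as on the slide

print(f"expected per brand: {expected_per_brand}, df: {df}")
print(f"obtained: {obtained:.2f}, exceeds critical value: {obtained > critical}")
```

With these assumed counts the obtained value falls well below 5.99, so differences of this size are explainable by chance.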
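Decision and Conclusion • The obtained value (1.40) is smaller than the critical value (5.99), so we fail to reject the null hypothesis • The data are consistent with preferences being evenly divided among the colas when the logos are removed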
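Excel functions • CHIDIST(x, degrees of freedom) returns the right-tail probability (p-value) of the chi-square distribution • CHIDIST(20.6, 2) = 3.36331E-05 < 0.05, so reject the null hypothesis • CHIDIST(1.40, 2) = 0.496585308 > 0.05, so fail to reject the null hypothesis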
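For readers without Excel, the CHIDIST results above can be approximated in plain Python. This is a sketch, not Excel's implementation: the function name `chidist_even_df` is hypothetical, and the closed form used here holds only when the degrees of freedom are even (which covers the df = 2 cases on this slide).

```python
import math


def chidist_even_df(x, df):
    """Right-tail chi-square probability for even df.

    Uses the closed form exp(-x/2) * sum_{k=0}^{df/2 - 1} (x/2)^k / k!,
    which is valid only for even degrees of freedom.
    """
    if df < 2 or df % 2 != 0:
        raise ValueError("this sketch only supports even df >= 2")
    half = x / 2.0
    terms = sum(half ** k / math.factorial(k) for k in range(df // 2))
    return math.exp(-half) * terms


alpha = 0.05
for obtained in (20.6, 1.40):
    p = chidist_even_df(obtained, 2)
    verdict = "reject H0" if p < alpha else "fail to reject H0"
    print(f"chi-square = {obtained}: p = {p:.6g} -> {verdict}")
```

Comparing the p-value with α gives the same decision as comparing the obtained value with the critical value: roughly 3.36e-05 for 20.6 (reject) and roughly 0.4966 for 1.40 (fail to reject).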
More nonparametric statistics • Table 15.1 (p. 297)