Action Research Review
INFO 515, Glenn Booker
Lecture #10
Why do we do this?
• Measurements are needed to understand a system and predict its future behavior
• Statistical techniques provide a commonly accepted means of analyzing measurements
• Statistics is based on recognizing that measurements tend to fall over a range of values, not just one precise number
Types of Research
• Historical (what happened?)
• Descriptive (what is happening?)
• Developmental (over time)
• Case and Field (study an organization)
• Correlational (does A affect B?)
• Causal Comparative (what caused it?)
• True Experimental (single / double blind)
• Quasi-Experimental
• Action Research
Data Analysis
• Raw data, such as one survey result
• Refined data, such as the distribution of ages of Philadelphia residents
• Derived data, such as comparing the age distribution of Philadelphia residents to that of the country
Population vs. Sample
• Often the subject of interest (the population) is so big that it isn't feasible to measure all of it
• Then a sample of measurements can be made, and we want to relate the sample measurements to the population
Sampling
• Sampling can be done using probabilistic techniques (e.g. various random samples):
• Simple or stratified random,
• Cluster (geographic), or
• Systematic (every Nth) samples
• Or using non-probabilistic methods (whoever's convenient, specific groups, or experts)
Customer Satisfaction Surveys
• A special case of sampling, customer satisfaction surveys are often done using:
• In-person interview
• Telephone interview
• Questionnaire by mail
• Sample sizes are based on the allowable error, the population size, and the result obtained
Measurement Scales
• Measurements can use four major types of scales; the types of analysis possible depend strongly on the type of measurements used
• Nominal (named buckets, without sequence)
• Ordinal (ordered buckets)
• Interval (intervals are meaningful; can add and subtract)
• Ratio (ratios are meaningful; can add, subtract, multiply, and divide)
Discrete versus Continuous
• Discrete (nonparametric) measurements use nominal or ordinal scales; only specific values are allowed
• Car make = Chevy, or cost = High
• Continuous (parametric) measurements use interval or ratio scales, and generally have integer or real number values
• Temperature = 98.6 deg F, Height = 172.1 cm
Descriptive Statistics
• Many common statistics can describe the central tendency and spread of a set of measurements (see the sketch below)
• Average (arithmetic mean)
• Minimum, Maximum, Range
• Median (middle value)
• Mode (most common value)
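As a minimal illustration (these slides use SPSS, but the idea is tool-independent; the data values below are invented), Python's standard statistics module computes each of these directly:

```python
import statistics

data = [4, 8, 6, 5, 8, 7, 9, 8, 5, 6]  # hypothetical measurements

print("Mean:  ", statistics.mean(data))    # arithmetic mean
print("Median:", statistics.median(data))  # middle value
print("Mode:  ", statistics.mode(data))    # most common value (8 here)
print("Range: ", max(data) - min(data))    # maximum minus minimum
```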
Normal Distribution
• Many measurements can be described by a "normal" distribution, which is summarized by an average value and a standard deviation (σ for a population, s for a sample)
• We can predict how likely any range of values is to occur for a normal distribution (how often is X between 5 and 8?)
Z Score
• Z scores measure how far from the mean a single measurement is:
z = (Xi − μ) / σ
• The same formula is used for finding "t" too
• This does not apply only to a normal distribution, but if the distribution is normal, we can predict the probability of that value or higher/lower occurring (see the sketch below)
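A minimal sketch of both ideas (the probability of a range for a normal distribution, and the z score) in Python with SciPy; the mean, standard deviation, and data value are assumed purely for illustration:

```python
from scipy.stats import norm

mu, sigma = 6.5, 1.2   # hypothetical population mean and standard deviation
x = 8.0                # a single measurement

# z score: how many standard deviations x lies from the mean
z = (x - mu) / sigma
print(f"z = {z:.2f}")

# For a normal distribution, probability that X falls between 5 and 8
p = norm.cdf(8, loc=mu, scale=sigma) - norm.cdf(5, loc=mu, scale=sigma)
print(f"P(5 < X < 8) = {p:.3f}")

# Probability of seeing this value or higher
print(f"P(X >= {x}) = {1 - norm.cdf(z):.3f}")
```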
Standard Error
• A sample of N measurements will have a standard error:
SEx = s / sqrt(N)
• The standard error allows us to define the confidence interval, CI:
CI = mean ± crit · SEx
where "crit" is the critical z score for a large sample, or the critical t score for a small sample
Critical z and t
• The critical z score is only a function of the desired confidence level of the results (zc = 1.96 for a 95% confidence level)
• The critical t score is a function of the sample size (degrees of freedom, df = n − 1) and the desired confidence level
• As df gets very large, critical t approaches critical z (see the sketch below)
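A minimal sketch of the confidence interval calculation with SciPy; the sample data is invented:

```python
import numpy as np
from scipy import stats

sample = np.array([5.1, 6.3, 5.8, 6.0, 5.5, 6.7, 5.9, 6.2])  # hypothetical data
n = len(sample)
mean = sample.mean()
se = stats.sem(sample)                # standard error: s / sqrt(N)

# Critical t for a 95% confidence level, two-tailed, df = n - 1
t_crit = stats.t.ppf(0.975, df=n - 1)

ci_low, ci_high = mean - t_crit * se, mean + t_crit * se
print(f"mean = {mean:.2f}, 95% CI = ({ci_low:.2f}, {ci_high:.2f})")

# For a large sample the critical t approaches the critical z:
print(stats.norm.ppf(0.975))          # 1.96
print(stats.t.ppf(0.975, df=10_000))  # ~1.96
```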
Confidence Level
• We have to accept some level of uncertainty in a statistical analysis; our conclusion might be wrong!
• Generally, a 95% level of confidence is used, unless life is on the line; then a 99% level of confidence is required
• Use 95% typically; hence the critical significance is 0.050
Confidence Level
• The level of confidence of your results plus the critical significance always equals exactly one
• For practically every statistical test, having the significance of the result less than the critical value means to reject the null hypothesis
• If Sig(actual) < Sig(critical), reject the null hypothesis
Frequency and Percentage
• Frequency graphs and crosstabs can provide a lot of information just from counts of a nominal or ordinal measurement occurring, possibly given with the percentage of each event's occurrence
• Histograms can provide similar charts for interval or ratio scaled data
Scatterplots
• Scatter plots or diagrams show the relationship between two or more measures
• The horizontal axis is generally the independent variable (X), sometimes also called a factor or grouping variable
• The vertical axis is generally the dependent variable (Y), which is the measure you're trying to understand
Hypothesis Testing
• Some statistics are used in the context of testing a hypothesis: a statement whose truth you wish to determine
• Are Philadelphians more likely to be Nobel Prize winners?
• The null hypothesis is the opposite of the hypothesis, and generally says there is no difference or no effect observed
• Philadelphians are no more likely to be Nobel Prize winners than any other group
Hypothesis Testing
• We can't truly PROVE anything; we can only determine whether the differences observed are "not likely to be due to chance"
• Select one or more "Tests of Significance" to determine if there is a statistically significant difference (Yes/No); if Yes, then you can:
• Select one or more "Measures of Association" to describe the strength of the difference, and possibly its direction
One versus Two Tailed Tests
• A null hypothesis which tests for "no difference" uses a two-tailed test
• A null hypothesis which specifically tests for "greater than" uses a one-tailed test
• A null hypothesis which specifically tests for "less than" also uses a one-tailed test
• The choice changes the critical z or t score; a one-tailed test generally makes it easier to show significance, which is why the more conservative two-tailed test is usually preferred (see the sketch below)
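A quick check of the two critical values in Python (a 95% confidence level is assumed):

```python
from scipy.stats import norm

alpha = 0.05
print(norm.ppf(1 - alpha / 2))  # two-tailed critical z: 1.96
print(norm.ppf(1 - alpha))      # one-tailed critical z: 1.645 (a lower bar)
```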
Z or T Test
• The z or t tests can be used to compare two distribution means, or to compare one distribution mean to a fixed value (interval or ratio data)
• Compare the actual z or t score to the critical z or t score
• If the actual z or t score is closer to zero than the critical value, accept (fail to reject) the null hypothesis
Z or T Test (Two Tailed)
• Notice this is for the z or t value, NOT the significance of that value
Z or T Test (One Tailed)
• (The case here tests whether the actual value is greater than the mean; for a "less than" case, use only the negative critical value.)
Is My Sample Normal?
• Boxplots and stem-and-leaf diagrams can help show graphically whether a sample has a fairly normal distribution
• The skewness and kurtosis of a data set can help identify non-normality, if their values are more than two times their own standard errors (see the sketch below)
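A sketch of that rule of thumb in Python. Note one assumption: the simple large-sample approximations sqrt(6/n) and sqrt(24/n) are used for the standard errors of skewness and kurtosis, rather than the exact formulas SPSS reports:

```python
import numpy as np
from scipy.stats import skew, kurtosis

data = np.random.default_rng(42).normal(size=200)  # hypothetical sample

n = len(data)
se_skew = (6 / n) ** 0.5    # approximate standard error of skewness
se_kurt = (24 / n) ** 0.5   # approximate standard error of excess kurtosis

# Non-normality is suspected if either statistic exceeds 2x its standard error
print("skewness ok:", abs(skew(data)) < 2 * se_skew)
print("kurtosis ok:", abs(kurtosis(data)) < 2 * se_kurt)
```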
T Tests
• T tests compare means for ratio or interval data
• The independent t test is for two different strata within one data set
• The paired t test compares measures of the same group before and after some event (e.g. a drug test), or when the samples are otherwise believed to be dependent on each other
• The one-sample t test compares one sample to a fixed value
T Tests
• The null hypothesis is that there is no difference between the means
• Results (e.g. significance) may differ if the variances are not equal, since df changes
• The Levene test checks for equal variances
• The null hypothesis for the Levene test is that the variances are equal
• If the Levene significance < 0.050, the variances are not equal (reject the null hypothesis)
Independent T Test Evaluation
• Three ways to check the results of a t test (see the sketch below):
• If the t test's significance < 0.050, reject the null hypothesis
• Check the stated t value against the critical t value for this df level; if |t(actual)| > |t(critical)|, reject the null hypothesis
• If the confidence interval for the difference between the means does not include zero, reject the null hypothesis
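A minimal sketch of the Levene-then-t-test workflow in Python with SciPy; the two groups are invented data, and the significance check is the first of the three evaluation methods above:

```python
import numpy as np
from scipy import stats

group_a = np.array([5.1, 6.2, 5.8, 6.5, 5.9, 6.1, 5.7])  # hypothetical
group_b = np.array([6.8, 7.1, 6.5, 7.4, 6.9, 7.2, 6.6])  # hypothetical

# Levene test: null hypothesis is that the variances are equal
lev_stat, lev_p = stats.levene(group_a, group_b)
equal_var = lev_p >= 0.050          # keep the null if Sig >= 0.050

# Independent t test, using the Levene result to pick the variant
t_stat, p_value = stats.ttest_ind(group_a, group_b, equal_var=equal_var)
if p_value < 0.050:
    print(f"t = {t_stat:.2f}, Sig = {p_value:.3f}: reject the null hypothesis")
else:
    print(f"t = {t_stat:.2f}, Sig = {p_value:.3f}: fail to reject")
```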
Evaluating Significance
Paired T Test Evaluation
• Checks before and after test cases
• Includes a correlation factor (like 'r')
• Can use the paired test if its significance < 0.050
• A larger correlation factor means a stronger relationship between the variables
• Test evaluation is the same as for the independent t test: significance, 't' value, and confidence interval (see the sketch below)
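A short sketch with SciPy; the before/after values are invented:

```python
from scipy import stats

before = [72, 75, 70, 78, 74, 73, 77]  # hypothetical pre-treatment scores
after  = [70, 72, 69, 74, 71, 70, 73]  # same subjects, post-treatment

# Correlation factor between the paired measures (like 'r')
r, r_sig = stats.pearsonr(before, after)

# Paired t test: null hypothesis is no difference between the means
t_stat, p_value = stats.ttest_rel(before, after)
print(f"r = {r:.2f}, t = {t_stat:.2f}, Sig = {p_value:.3f}")
```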
One-Sample T Test
• Compares a sample mean to a fixed value (example below)
• The test shows the actual values of the means, with their standard deviation and standard error
• Same interpretation of results: significance, 't' value, and confidence interval
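A minimal sketch; both the sample readings and the fixed comparison value (98.6) are invented:

```python
from scipy import stats

sample = [98.2, 98.9, 98.4, 98.7, 98.5, 99.0, 98.3]  # hypothetical readings

# Null hypothesis: the sample mean equals the fixed value 98.6
t_stat, p_value = stats.ttest_1samp(sample, popmean=98.6)
print(f"t = {t_stat:.2f}, Sig = {p_value:.3f}")
```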
F Test and ANOVA
• Compare several means against each other using Analysis of Variance (ANOVA) and the F test
• Like extending the t tests to many variables
• Want data from random samples of normal populations with equal variances
F Test and ANOVA
• Output includes the Levene test
• Want the significance for Levene > 0.050, so that equal variances can be assumed
• Otherwise, ANOVA should not be used
• Evaluate F by its significance
• If Sig. < 0.050, reject the null hypothesis (there is a significant difference among the means); see the sketch below
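A minimal sketch of the same two-step check (Levene first, then the one-way ANOVA F test) in Python with SciPy; the three groups are invented:

```python
from scipy import stats

g1 = [5.2, 5.8, 6.1, 5.5, 5.9]   # hypothetical group data
g2 = [6.4, 6.9, 6.2, 6.8, 6.5]
g3 = [5.0, 5.3, 5.6, 5.1, 5.4]

# Levene first: want Sig > 0.050 so that equal variances can be assumed
_, lev_p = stats.levene(g1, g2, g3)
print(f"Levene Sig = {lev_p:.3f}")

# One-way ANOVA F test: Sig < 0.050 means the means differ somewhere
f_stat, p_value = stats.f_oneway(g1, g2, g3)
print(f"F = {f_stat:.2f}, Sig = {p_value:.3f}")
```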
Additional ANOVA Tests
• Once the F test shows there is some difference in the means across a subset, additional ANOVA tests can help identify more specific trends and differences
• Types of tests (see the end of lecture 6) include:
• Pairwise Multiple Comparisons
• Post Hoc Range Tests
Pairwise Multiple Comparisons
• Pairwise Multiple Comparisons check two subsets of data at a time
• The Bonferroni test is better for a small number of subsets
• The Tukey test is better for many subsets (see the sketch below)
• Both assume subset variances are equal
• For each pair of subset values, Sig < 0.050 means the difference in means is significant
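SciPy (version 1.8 or later) provides a Tukey HSD implementation; a sketch continuing the invented groups from the ANOVA example:

```python
from scipy import stats

g1 = [5.2, 5.8, 6.1, 5.5, 5.9]   # hypothetical group data
g2 = [6.4, 6.9, 6.2, 6.8, 6.5]
g3 = [5.0, 5.3, 5.6, 5.1, 5.4]

# Tukey pairwise comparisons (requires SciPy 1.8+)
result = stats.tukey_hsd(g1, g2, g3)
print(result)   # pairwise mean differences with a Sig (p-value) for each pair
```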
Post Hoc Range Tests
• Post Hoc Range Tests look for groups within each subset which all have statistically similar means
• The Tukey and Tukey's-b tests include Post Hoc Range Tests
• Each column of the output is a subset with statistically similar means
• Subsets may overlap substantially
Contrasts Across Means
• Look across subset means to see if there is a trend, such as a linear increase or decrease across subsets
• Can check for Linear, Quadratic, or Cubic relationships (i.e. first, second, or third order polynomials)
• Check the Significance of F for the Unweighted version of each relationship (Linear, etc.); if Sig. < 0.050, reject the null hypothesis
Determine Linearity
• An option under Compare Means / Means allows checking just for linearity
• This confirms the ANOVA test result for Linearity
• It also gives the R and Eta parameters, which are Measures of Association
R and Eta
• Pearson's R measures how well the data fits the regression (-1 is a perfect negative correlation, 0 is no relationship, +1 is a perfect positive correlation); R squared describes the amount of shared variance between the variables (see the sketch below)
• Eta squared gives how much of the variance in one variable is explained by changes in the other variable
• (Pearson's R is named for English statistician Karl Pearson, 1857-1936, per http://human-nature.com/nibbs/03/kpearson.html)
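A minimal sketch of R and shared variance in Python; the x and y values are invented:

```python
from scipy import stats

x = [1, 2, 3, 4, 5, 6, 7, 8]                   # hypothetical independent variable
y = [2.1, 2.9, 4.2, 4.8, 6.1, 6.8, 8.2, 8.9]   # hypothetical dependent variable

r, sig = stats.pearsonr(x, y)
print(f"r = {r:.3f}, r^2 (shared variance) = {r**2:.3f}, Sig = {sig:.4f}")
```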
Regression Analysis
• Regression Analysis looks at two interval or ratio-scaled variables (generically X and Y) and tries to fit an equation between them
• A dozen different equations are available: Linear, Power, Logarithmic, Exponential, etc.
• Significance is checked by the ANOVA F and the Sig. of the regression coefficients; association is measured with R Squared
Regression Analysis
• For a regression to have any significance, we must have ANOVA's Sig. F < 0.050
• Then each variable's coefficient (b0, b1, etc.) must have significance < 0.050; otherwise the coefficient might be zero
• Then the better regression equations are ranked in order of strength by R Square, which is confirmed visually by plotting (see the sketch below)
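A minimal sketch of the linear case in Python with SciPy (the other equation forms would need a general curve fitter); the data is invented, and the intercept_stderr attribute requires SciPy 1.6 or later:

```python
from scipy import stats

x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [2.1, 2.9, 4.2, 4.8, 6.1, 6.8, 8.2, 8.9]   # hypothetical data

# Linear regression: y = b0 + b1 * x
fit = stats.linregress(x, y)
print(f"b1 = {fit.slope:.3f} +/- {fit.stderr:.3f}")              # slope and its std error
print(f"b0 = {fit.intercept:.3f} +/- {fit.intercept_stderr:.3f}")
print(f"R^2 = {fit.rvalue**2:.3f}, Sig = {fit.pvalue:.4f}")
```

The coefficient standard errors printed here are what let you form confidence intervals and report results at a sensible precision, as the next slide describes.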
Regression Analysis
• The standard error of the coefficients is given, so confidence intervals can be formed
• This also helps report them meaningfully, so you don't report a value as 4.861435 if it has a standard error of 0.92
• Depending on the accuracy of the source data, you could report that result as 5 +/- 1, or 4.9 +/- 0.9, or 4.86 +/- 0.92
Crosstabs
• Crosstabs display data sorted by two or more variables in table form
• Often just counts of each category, and/or the percentage of counts
• Recoding data allows interval or ratio scale data to be put into groups (e.g. age 18-25)
Pearson's Chi Square
• Measures how much the actual (observed) data differs from an even (expected) distribution of data
• The "expected" data can be a random distribution (same number of counts per cell), or adjusted for the actual total counts for each row and column
Pearson's Chi Square Evaluation
• When chi square is larger than the critical value, reject the null hypothesis
• Or, if the significance of chi square is < 0.050, reject the null hypothesis (see the sketch below)
• Chi square can also be generated for a single variable
• Beware that chi square is less meaningful for large matrices; it's too easy for large matrices to show significance falsely using chi square
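A minimal sketch of the chi square test on a crosstab in Python with SciPy; the table counts are invented:

```python
import numpy as np
from scipy.stats import chi2_contingency

# Hypothetical crosstab: rows = group, columns = response category
observed = np.array([[30, 10, 20],
                     [20, 25, 15]])

# Expected counts are adjusted for the actual row and column totals
chi2, p_value, dof, expected = chi2_contingency(observed)
print(f"chi2 = {chi2:.2f}, df = {dof}, Sig = {p_value:.4f}")
# Sig < 0.050 -> reject the null hypothesis of no association
```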
Residuals
• A residual is the difference between the Observed and Expected (estimated) values for a cell
• Residuals can be plotted to look for outliers
• Residuals can be standardized by dividing by their standard deviation
• Cells with a standardized residual magnitude > 2 contribute a lot to chi square (see the sketch below)
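Continuing the crosstab sketch above. One assumption: the simple Pearson form, (observed − expected) / sqrt(expected), is used as the standardization here; SPSS also offers an adjusted variant:

```python
import numpy as np
from scipy.stats import chi2_contingency

observed = np.array([[30, 10, 20],
                     [20, 25, 15]])
_, _, _, expected = chi2_contingency(observed)

# Pearson (standardized) residuals: (O - E) / sqrt(E)
residuals = (observed - expected) / np.sqrt(expected)
print(np.round(residuals, 2))

# Cells contributing heavily to chi square (|residual| > 2)
print("large contributors:", np.argwhere(np.abs(residuals) > 2))
```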
Measures of Association
• Measures of Association between two variables can be symmetric or directional
• Dozens of measures have been developed to work with the chi square test
• Interpret them like 'r': zero means no correlation, larger values mean a stronger correlation
• Some can be > 1
Measures of Association
• Symmetric measures don't care which variable is dependent (Y)
• Directional measures DO care which variable is dependent (A = f(B) is not B = f(A))
• Some directional measures have a "symmetric" value, the weighted average of the other two
Symmetric Measures
• The "Contingency Coefficient" is the main symmetric measure with a Chi Square test
• Works even with nominal data
• Evaluated like Pearson's r
• Phi and Cramer's V are other symmetric measures
Directional Measures
• Directional measures range from 0 to 1
• Lambda is the recommended directional measure; it tells what proportion of the dependent variable is predicted by the independent variable (like Eta)
• Eta can be applied here if one variable is interval or ratio scaled