PSY6010: Statistics, Psychometrics and Research Design Professor Leora Lawton Spring 2007 Wednesdays 7-10 PM Room 204
PSY6010 Intro • Purpose • Homework is due every week. Cut and paste SPSS output into the Word document. • We will cover • Correlative and predictive statistics • Group membership predictors and descriptors. • Implications: you must know your research question and therefore the kind of answer you need. • Anyone can play around with SPSS; a trained researcher will not produce garbage. • The goal: to use quantitative data to test your hypotheses in the best way available, given your constraints of time and money. • Select the correct method • Use it correctly • Interpret the results
Working with Data • Finalize your research question and study objectives: • What are your ‘success criteria’? • Then, to use quantitative data to test your hypotheses in the best possible way available, given your constraints of time and money, you: • Select the correct method • Use it correctly • Interpret the results
Working with Data • Steps to follow • Identify the dependent variable that operationalizes your outcome • Identify your potential independent variables • Run frequencies with descriptives • Test the relationship between the DV and the IVs with bivariate tests. • Try out the multivariate method, if relevant. • OLS regression • Binomial, multinomial or ordinal logistic regression • Discriminant analysis • ANOVA/MANOVA • Factor analysis • Cluster analysis • Etc.
Frequencies – what to look for • Distribution – Is it skewed? Which direction? What is the impact on your analysis choice? • Are there outliers? • Is the current coding useful? • What are the means and measures of variance? • How many missings are there? • How do these help you understand the data?
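The checklist above can be tried outside SPSS as well. The following is a minimal sketch in Python (standard library only), assuming a made-up list of responses on a 1-4 agreement scale in which None marks a missing answer. The data and the helper name frequency_table are hypothetical and are not taken from GSS93.sav.

```python
from collections import Counter
from statistics import mean, stdev

# Hypothetical responses on a 1-4 agreement scale; None marks a missing answer.
responses = [1, 2, 2, 1, None, 3, 2, 4, None, 2, 1, 2]

def frequency_table(values):
    """Return (rows, n_missing); each row is (value, count, valid percent)."""
    valid = [v for v in values if v is not None]
    counts = Counter(valid)
    rows = [(v, counts[v], 100 * counts[v] / len(valid)) for v in sorted(counts)]
    return rows, len(values) - len(valid)

rows, n_missing = frequency_table(responses)
valid = [v for v in responses if v is not None]

for value, count, pct in rows:
    print(f"{value}: n={count} ({pct:.1f}% of valid)")
print(f"missing: {n_missing} of {len(responses)}")
print(f"mean={mean(valid):.2f}, sd={stdev(valid):.2f}")
```

The valid percent uses only non-missing answers as the base, so it is worth reading the missing count alongside it before trusting any percentage.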
Example • Suppose you’ve been asked by a family social services agency to help it understand how best to respond to proposed legislation banning spanking in California. • What’s your research question? • Look in GSS93.sav … what’s a good DV for this study? • What independent variables make sense for this study? • Would other variables, not in the data set, help? Is their absence an insurmountable problem?
Frequencies 73.4% agree or strongly agree. 33.5% missing. In this data, missing values have already been defined. With no ‘neutral’ position or midpoint, everyone takes a stand. The bigger the positive skew, the more the responses are lumped at the bottom. 0 is no skew; negative values mean the responses are lumped at the top.
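To see how the sign of the skew statistic behaves, here is a small sketch using made-up values. It computes the simple moment-based skewness; SPSS reports a small-sample adjusted version, so its figures will differ somewhat, but the sign reads the same way.

```python
from statistics import mean, pstdev

def skewness(values):
    """Moment-based skewness: the average cubed z-score."""
    m, s = mean(values), pstdev(values)
    return sum(((v - m) / s) ** 3 for v in values) / len(values)

# Hypothetical data, for illustration only.
lumped_low = [1, 1, 1, 2, 2, 3, 5]    # most answers at the bottom, long right tail
lumped_high = [5, 5, 5, 4, 4, 3, 1]   # mirror image: most answers at the top
symmetric = [1, 2, 3, 4, 5]

print(f"lumped at bottom: {skewness(lumped_low):+.2f}")   # positive
print(f"lumped at top:    {skewness(lumped_high):+.2f}")  # negative
print(f"symmetric:        {skewness(symmetric):+.2f}")    # about zero
```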
Bivariate tests - crosstabs Make the ‘row’ your DV and the ‘column’ the IV. Then select column percentages in the cell statistics. Note that cell counts get very small in some subgroups. The Chi-Square option in Statistics shows that the distribution is not random: the DV and the IV are not independent.
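What the crosstab and its chi-square are doing can be sketched by hand. The counts below are invented for illustration (they are not the GSS results), and the function names are hypothetical. Rows hold the DV and columns hold the IV, as on the slide.

```python
# Hypothetical counts: rows are the DV (attitude), columns are the IV (group).
observed = {
    "agree":    {"group_a": 30, "group_b": 10},
    "disagree": {"group_a": 20, "group_b": 40},
}

def column_percentages(table):
    """Percent of each column (IV category) falling in each row (DV category)."""
    cols = list(next(iter(table.values())))
    totals = {c: sum(row[c] for row in table.values()) for c in cols}
    return {r: {c: 100 * row[c] / totals[c] for c in cols} for r, row in table.items()}

def chi_square(table):
    """Pearson chi-square statistic and degrees of freedom for a two-way table."""
    cols = list(next(iter(table.values())))
    row_tot = {r: sum(row.values()) for r, row in table.items()}
    col_tot = {c: sum(row[c] for row in table.values()) for c in cols}
    n = sum(row_tot.values())
    stat = 0.0
    for r, row in table.items():
        for c in cols:
            expected = row_tot[r] * col_tot[c] / n  # count expected under independence
            stat += (row[c] - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(cols) - 1)
    return stat, dof

stat, dof = chi_square(observed)
print(column_percentages(observed))
print(f"chi-square = {stat:.2f}, df = {dof}")
# With df = 1, the .05 critical value is 3.841.
print("reject independence at .05" if stat > 3.841 else "cannot reject independence")
```

Small expected counts make the chi-square unreliable, which is why the note about small subgroup cells matters.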
Bivariate tests – compare means Note that a lower value = stronger agreement. It would be good to reverse code so that higher values mean stronger agreement. The difference is statistically significant: Blacks support spanking more than whites and others do.
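A rough sketch of the same idea, assuming a 1-4 scale where 1 = strongly agree and using invented scores for two hypothetical groups. The scores are reverse coded first, then the group means are compared with Welch's t statistic, which is a stand-in chosen here for simplicity and not necessarily the test SPSS printed on the slide.

```python
from math import sqrt
from statistics import mean, variance

def reverse_code(value, low=1, high=4):
    """Flip a scale so that the highest value becomes the lowest and vice versa."""
    return low + high - value

def welch_t(a, b):
    """Welch's t statistic for two independent samples (unequal variances allowed)."""
    return (mean(a) - mean(b)) / sqrt(variance(a) / len(a) + variance(b) / len(b))

# Hypothetical raw scores, 1 = strongly agree ... 4 = strongly disagree.
group_a = [1, 2, 1, 2, 1, 2]
group_b = [3, 2, 3, 4, 3, 3]

# After reverse coding, a higher score means stronger agreement.
a_rev = [reverse_code(v) for v in group_a]
b_rev = [reverse_code(v) for v in group_b]

print(f"mean agreement: group_a={mean(a_rev):.2f}, group_b={mean(b_rev):.2f}")
print(f"Welch t = {welch_t(a_rev, b_rev):.2f}")
```

Reverse coding changes the sign of the difference but not its size, so the significance test is unaffected; only the interpretation gets easier.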
Bivariate tests – Correlations The more children one has, the less likely they are to disagree (or the more likely they are to think spanking is okay) • And this is why theory matters: What does ‘number of children’ operationalize? • What else, that’s related to number of children, could also affect this attitude?
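For reference, the correlation coefficient itself is simple to compute. This sketch uses invented values for number of children and a disagreement score (higher = stronger disagreement); the negative sign corresponds to the pattern described above.

```python
from math import sqrt
from statistics import mean

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    ss_x = sum((a - mx) ** 2 for a in x)
    ss_y = sum((b - my) ** 2 for b in y)
    return cov / sqrt(ss_x * ss_y)

# Hypothetical data, for illustration only.
children = [0, 1, 2, 3, 4]
disagreement = [4, 3, 3, 2, 1]  # higher = stronger disagreement with spanking

print(f"r = {pearson_r(children, disagreement):.2f}")  # negative: more children, less disagreement
```

The coefficient says nothing about why the two move together, which is the point of the theory questions on this slide.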
Preliminary multivariate model The model does not predict much as is, as evidenced by the very small R². But of those predictors, race and number of children are significant, in the directions expected from the bivariate results.
Working with data sets • When using quantitative data, you need to be able to prepare the data so that it is usable. • Load the data set into SPSS. • Label the SPSS file if no syntax file is provided. • Run frequencies to investigate missings and outliers. • Define missing values to be excluded from analysis (either always or just for some specific analyses). • Recode variables from alpha to numeric. • Recode variables from categorical to dichotomous. • Recode missings to the mean value. • Recode values to real midpoint values. • Special formats
Working with data sets • Importing data from Excel. • File, Open Data. Set Files of Type to .xls. Locate the file in its folder. Make sure you select the correct worksheet (older versions only read one worksheet). Click on Open. • If you need to, add the labels. This can make it easier for others to work with a data set. • Check that the variable type is correct (string, numeric or special). • Add value labels by clicking on the Values tab, then enter the numeric value in Value and the label in Value Label. Click Add after each value/value label pair, then click OK when finished.
Cleaning up 1 • Recode string to numeric (very helpful when there are many values): Transform – Automatic Recode. Give it a new variable name, then click OK. • Recode to reverse values. If 5 = poor and 1 = excellent, higher numbers mean worse ratings, which is hard to think about. Use Transform – Automatic Recode again: give it a new variable name, then select Recode Starting from Highest Value. Oops, now the 6 (no answer) is given the low value. So first set 6 as missing, then do the autorecode. • Set as missing: click on Missing, add the discrete value, and click OK. Now try the previous autorecode again. • Another way to recode missing: Transform – Recode into Same Variables – Old and New Values. Put ‘3’ in Old Value and select System-missing for New Value. Click Add, then Continue, then OK. • To recode a categorical variable to a dichotomous one (necessary for OLS regression), use Transform – Recode into Different Variables. Give it a new name and a new label. Click on Old and New Values. Set the old value of the focal category to 1 and all other values to 0. Add the new values, click Continue, then OK. You can also restrict the cases the recode applies to. In this example, exclude those under 18 years old by clicking on If… (optional case selection), choosing Include if case satisfies condition, then clicking on the variable and specifying the value or values you want to select for this new variable.
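The logic behind these recodes, and why the order matters, can be sketched in a few lines of Python. This is an illustration of the steps and not SPSS syntax; the data and function names are hypothetical, and None stands in for a missing value.

```python
MISSING = None  # stand-in for an SPSS missing value

def set_missing(values, missing_codes):
    """Replace the listed codes (e.g. 6 = no answer) with MISSING."""
    return [MISSING if v in missing_codes else v for v in values]

def reverse(values, low, high):
    """Reverse a scale, leaving missing values untouched."""
    return [MISSING if v is MISSING else low + high - v for v in values]

def dummy(values, focal):
    """Dichotomize: 1 for the focal category, 0 for every other valid value."""
    return [MISSING if v is MISSING else int(v == focal) for v in values]

# Hypothetical ratings: 1 = excellent ... 5 = poor, 6 = no answer.
ratings = [1, 5, 6, 3, 2, 6]

# Step 1: declare 6 as missing BEFORE reversing, so it is not treated as a rating.
cleaned = set_missing(ratings, {6})
# Step 2: reverse so that 5 = excellent and 1 = poor.
reversed_ratings = reverse(cleaned, low=1, high=5)
print(reversed_ratings)

# Step 3: recode a categorical variable into a 0/1 variable for regression.
marital = ["married", "single", "widowed", "married"]
print(dummy(marital, focal="married"))
```

If the reversal ran first on a 1-6 range, the "no answer" code would end up at one end of the scale as if it were a real rating, which is the "Oops" described above.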
Cleaning up 2 • Identify and transform outliers. (go to other .xls). • Recode values to meaningful values. Income and age ranges are common variables requiring this transformation. • Use Recode into Different Variables. Recode each category to the midpoint of its range. For the low range, select 10% under; for the high range, it depends on how much skew there is, but 10-20% is appropriate. • On the original variable, calculate the mean. You can either reset missings to the recoded value closest to the mean, or impute a value. • Transform to deal with heteroscedasticity. Logging income is a standard transformation.
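A sketch of the midpoint recode and the log transform, under stated assumptions: the income brackets are invented, "10% under" is read here as 10% below the upper bound of the open-ended bottom bracket, and the top bracket is set 15% above its lower bound (a value picked from the 10-20% range). The names brackets and bracket_value are hypothetical.

```python
from math import log

# Hypothetical income brackets: code -> (lower bound, upper bound); None = open-ended.
brackets = {
    1: (None, 10_000),    # "under 10,000"
    2: (10_000, 30_000),
    3: (30_000, 60_000),
    4: (60_000, None),    # "60,000 or more"
}

def bracket_value(low, high, top_markup=0.15):
    """Representative value for a bracket (assumed rules, see the lead-in)."""
    if low is None:                     # open-ended bottom: 10% under the bound
        return high * 0.90
    if high is None:                    # open-ended top: 10-20% over the bound
        return low * (1 + top_markup)
    return (low + high) / 2             # closed range: plain midpoint

values = {code: bracket_value(low, high) for code, (low, high) in brackets.items()}
print(values)

# Natural log of the recoded income, a common transformation for skewed income data.
log_income = {code: log(v) for code, v in values.items()}
print({code: round(v, 2) for code, v in log_income.items()})
```

The log compresses the top of the income range, so the recoded top bracket no longer dominates the spread of the variable.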
Character of Data • Look for linearity, curvilinearity, multicollinearity, singularity. • Conduct bivariate analyses between your DV and your IVs. • If the DV is continuous, or at least ordinal, you can compare means and look at the t-test or ANOVA. If it’s dichotomous, do crosstabs and look at the chi-square. • Curvilinear relationships will require transforming the IV into something more usable. • A correlation analysis will help you identify collinearity. Multicollinearity requires dropping variables, combining the variables into an index, or using factor analysis.