Advanced Models and Methods in Behavioral Research • Chris Snijders • c.c.p.snijders@gmail.com • 3 ECTS • http://www.chrissnijders.com/ammbr (= study guide) • Literature: Field book + separate course material • Laptop exam (+ assignments) • ToDo (if not done yet): Enroll in 0a611. Advanced Methods and Models in Behavioral Research –
The methods package • MMBR (6 ECTS) • Blumberg: questions, reliability, validity, research design • Field: SPSS: factor analysis, multiple regression, ANCOVA, sample size, etc. • AMMBR (3 ECTS) - Field (1 chapter): logistic regression - literature through website: conjoint analysis, multi-level regression
Models and methods: topics • t-test, Cronbach's alpha, etc. • Multiple regression, analysis of (co)variance and factor analysis • Logistic regression • Conjoint analysis / repeated measures • Stata next to SPSS • “Finding new questions” • Some data collection • In the background: “now you should be able to deal with data on your own”
Methods in brief (1) • Logistic regression: target Y, predictors $X_i$. Y is a binary variable (0/1). - Why not just multiple regression? - Interpretation is more difficult - Goodness of fit is non-standard - ... (and it is a chapter in Field)
Methods in brief (2) • Conjoint analysis • Underlying assumption: for each user, the "utility" of an offer can be written as $U(x_1, x_2, \ldots, x_n) = c_0 + c_1 x_1 + \cdots + c_n x_n$ • Example offer: 10 Euro p/m • 2 years fixed • free phone • ... • How attractive is this offer to you?
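Under the additive utility assumption, the coefficients $c_0, \ldots, c_n$ for one respondent can be estimated by regressing that respondent's attractiveness ratings of several offers on the offer attributes (ordinary least squares). The sketch below is a minimal illustration, assuming three binary attributes (1 = the offer has "10 Euro p/m", "2 years fixed", "free phone"), a full-factorial set of 8 offers each rated 3 times, and simulated ratings generated from made-up "true" coefficients. It is not the procedure of any particular conjoint software.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical full-factorial design: all 8 combinations of three binary attributes
# (columns: 10 Euro p/m, 2 years fixed, free phone; 1 = present in the offer).
design = np.array([[a, b, c] for a in (0, 1) for b in (0, 1) for c in (0, 1)], dtype=float)

# Made-up "true" utility coefficients c0..c3 for one simulated respondent.
true_c = np.array([4.0, 2.0, -1.0, 1.5])

X = np.column_stack([np.ones(len(design)), design])  # intercept column for c0
X_rep = np.tile(X, (3, 1))                            # each offer rated 3 times
ratings = X_rep @ true_c + rng.normal(0.0, 0.3, size=len(X_rep))  # noisy ratings

# Least-squares estimates of c0..c3 (the respondent's "part-worths").
c_hat, *_ = np.linalg.lstsq(X_rep, ratings, rcond=None)
print(np.round(c_hat, 2))
```

Each estimated coefficient reads as "how much this feature adds to the attractiveness of an offer" for this respondent.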
Conjoint analysis as an “in between method” • Between: “Which phone do you like and why?” / “What would your favorite phone be?” • And: “Let’s keep track of what people buy.”
Local Master Thesis example: Fiber to the home • Speed: really fast • Price: sort of high • Installation: free! • Your neighbors: are in! • How attractive is this to you? (Roel Schuring)
Coming up with new ideas (3) • “More research is necessary.” But on what? • YOU: come up with sensible new ideas, given previous research
Stata next to SPSS • It’s just better (faster, better written, more possibilities, more programmable, …) • Multi-level regression is much easier than in SPSS • It’s good to be exposed to more than just a single statistics package (your knowledge should not be based on “where to click” arguments) • More stable • By the way: it supports OS X as well… (anybody?)
Every advantage has a disadvantage • Output less “polished” • It takes some extra work to get you started • The Logistic Regression chapter in the Field book uses SPSS (but is still readable for the larger part) • (and it’s not campus software, but subfaculty software) • Installation …
Logistic Regression Analysis • That is: your Y variable is 0/1. Now what? • Credit where credit is due: slides adapted from Gerrit Rooks
The main points • Why do we have to know and sometimes use logistic regression? • What is the underlying model? What is maximum likelihood estimation? • Logistics of logistic regression analysis • Estimate coefficients • Assess model fit • Interpret coefficients • Check residuals • An example (with some output)
Suppose we have 100 observations with information about an individual's age and whether or not this individual had some kind of heart disease (CHD).
Let’s just try regression analysis: $\Pr(\text{CHD} \mid \text{age}) = -0.54 + 0.022 \cdot \text{Age}$
... linear regression is not a suitable model for probabilities (predicted values can fall below 0 or above 1): $\Pr(\text{CHD} \mid \text{age}) = -0.54 + 0.0218107 \cdot \text{Age}$
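Plugging a few ages into the fitted linear equation makes the problem concrete. The coefficients below are the ones from the slide; the ages are chosen for illustration, and age 80 is an extrapolation beyond what the example data may cover.

```python
def lpm_prob(age):
    """Fitted linear-regression 'probability' of CHD, coefficients from the slide."""
    return -0.54 + 0.0218107 * age

for age in (20, 45, 70, 80):
    print(age, round(lpm_prob(age), 3))
# 20 -0.104   <- a negative "probability"
# 45 0.441
# 70 0.987
# 80 1.205    <- a "probability" above 1
```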
In this graph, for 8 age groups, I plotted the probability of having a heart disease (the proportion of people with CHD in each group).
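The group proportions behind such a plot can be computed by binning age and averaging the 0/1 outcome within each bin. A minimal sketch, assuming 8 equal-width age groups between 20 and 70 (the slide does not say how the groups were formed) and simulated data standing in for the 100 observations; `proportions_by_group` is a hypothetical helper name.

```python
import numpy as np

def proportions_by_group(age, y, edges):
    """Share of y == 1 within each age group; `edges` are the group boundaries."""
    age, y = np.asarray(age), np.asarray(y)
    group = np.digitize(age, edges[1:-1])  # group index 0 .. len(edges) - 2
    return np.array([y[group == g].mean() if np.any(group == g) else np.nan
                     for g in range(len(edges) - 1)])

# Simulated stand-in for the 100 observations (not the course data).
rng = np.random.default_rng(1)
age = rng.uniform(20, 70, 100)
chd = rng.binomial(1, 1.0 / (1.0 + np.exp(-(-5.0 + 0.1 * age))))

edges = np.linspace(20, 70, 9)  # 9 boundaries -> 8 equal-width age groups
print(np.round(proportions_by_group(age, chd, edges), 2))
```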
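The logistic regression model: $P(Y = 1) = \frac{1}{1 + e^{-(b_0 + b_1 X)}}$ • Predicted probabilities are always between 0 and 1 • The part $b_0 + b_1 X$ is similar to classic regression analysis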
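A quick numerical check of the "always between 0 and 1" property. The coefficients $b_0 = -5.3$ and $b_1 = 0.11$ are hypothetical values chosen for illustration only, not estimates from the course data; even for ages far outside any realistic range, the output stays strictly inside (0, 1) and increases with age when $b_1 > 0$.

```python
import numpy as np

def logistic_prob(x, b0, b1):
    """P(Y = 1 | x) = 1 / (1 + exp(-(b0 + b1 * x)))."""
    return 1.0 / (1.0 + np.exp(-(b0 + b1 * x)))

ages = np.array([0, 20, 45, 70, 120])
p = logistic_prob(ages, b0=-5.3, b1=0.11)  # hypothetical coefficients
print(np.round(p, 4))
```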
Side note: this is similar to MMBR … Suppose Y is a proportion (so between 0 and 1). Then consider $Y = \frac{1}{1 + e^{-(b_0 + b_1 X)}}$, which will ensure that the estimated Y will vary between 0 and 1; after some rearranging this is the same as $\ln\left(\frac{Y}{1 - Y}\right) = b_0 + b_1 X$.
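The "some rearranging" step, written out with $z = b_0 + b_1 X$:

$$
\begin{aligned}
Y &= \frac{1}{1 + e^{-z}} \\
Y\,(1 + e^{-z}) &= 1 \\
e^{-z} &= \frac{1 - Y}{Y} \\
z = b_0 + b_1 X &= \ln\!\left(\frac{Y}{1 - Y}\right)
\end{aligned}
$$

The last line is the log-odds (logit) of Y, which is linear in X.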
… (continued) • And one “solution” might be: • Change all Y values that are 0 to 0.001 • Change all Y values that are 1 to 0.999 • Now run regression on log(Y/(1-Y)) … • … but that really is sort of higgledy-piggledy (the results depend on the arbitrary choice of 0.001 and 0.999) …
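Why this is higgledy-piggledy: with a 0/1 outcome, the recoded values become $\pm\ln\frac{1-\varepsilon}{\varepsilon}$, so every regression coefficient scales with whatever $\varepsilon$ you happened to pick. A minimal sketch on simulated data (hypothetical ages and outcomes) comparing $\varepsilon = 0.001$ with $\varepsilon = 0.01$; `slope_after_hack` is a hypothetical helper name.

```python
import numpy as np

def logit(p):
    return np.log(p / (1 - p))

def slope_after_hack(x, y, eps):
    """Recode y=0 -> eps and y=1 -> 1-eps, regress logit(y) on x by OLS, return slope."""
    y_adj = np.where(y == 1, 1 - eps, eps)
    slope, intercept = np.polyfit(x, logit(y_adj), 1)
    return slope

# Simulated data (not the course data).
rng = np.random.default_rng(3)
x = rng.uniform(20, 70, 200)
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-(-5.0 + 0.1 * x))))

s_001 = slope_after_hack(x, y, 0.001)
s_01 = slope_after_hack(x, y, 0.01)
# The ratio equals ln(999) / ln(99), about 1.5: same data, a 50% larger "effect".
print(s_001, s_01, s_001 / s_01)
```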
Logistics of logistic regression • How do we estimate the coefficients? • How do we assess model fit? • How do we interpret coefficients? • How do we check regression assumptions?
Kinds of estimation in regression • Ordinary Least Squares (we fit a line through a cloud of dots) • Maximum likelihood (we find the parameters that are the most likely, given our data) We never bothered to consider maximum likelihood in standard multiple regression because, assuming normally distributed errors, you can show that both lead to exactly the same coefficient estimates (in MR, that is; in general they differ). In general, maximum likelihood estimators have attractive (large-sample) statistical properties (efficiency, consistency, invariance, …).
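To see why OLS and maximum likelihood coincide in multiple regression, assume independent, normally distributed errors with constant variance $\sigma^2$ (the standard MR assumption). The log-likelihood of the data is then

$$
\ln L(b, \sigma^2) = -\frac{n}{2}\ln(2\pi\sigma^2) - \frac{1}{2\sigma^2}\sum_{i=1}^{n}\left(y_i - b_0 - b_1 x_{i1} - \cdots - b_k x_{ik}\right)^2
$$

For any fixed $\sigma^2$, the first term does not depend on the coefficients, so maximizing $\ln L$ over $b_0, \ldots, b_k$ is the same as minimizing the sum of squared residuals, which is exactly the OLS criterion. (The ML estimate of $\sigma^2$ divides by $n$ rather than $n - k - 1$, so only the coefficient estimates coincide.)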
Maximum likelihood estimation • The method of maximum likelihood yields values for the unknown parameters (here $b_0$ and $b_1$) that maximize the probability of obtaining the observed set of data.
Maximum likelihood estimation • First we have to construct the “likelihood function” (the probability of obtaining the observed set of data): $\text{Likelihood} = \Pr(\text{obs}_1) \cdot \Pr(\text{obs}_2) \cdot \Pr(\text{obs}_3) \cdots \Pr(\text{obs}_n)$, assuming that observations are independent.
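For a 0/1 outcome, $\Pr(\text{obs}_i)$ is $p_i$ if the person had CHD and $1 - p_i$ if not, where $p_i$ comes from the logistic model. A minimal sketch of the likelihood as that product, using four hypothetical observations; `likelihood` is a hypothetical helper name. Values of $b_0, b_1$ that make the observed pattern more probable give a larger likelihood.

```python
import numpy as np

def likelihood(b0, b1, x, y):
    """Product over independent observations of Pr(obs_i) under a logistic model."""
    p = 1.0 / (1.0 + np.exp(-(b0 + b1 * x)))
    pr_obs = np.where(y == 1, p, 1.0 - p)  # probability of what was actually observed
    return np.prod(pr_obs)

x = np.array([25.0, 40.0, 55.0, 65.0])  # hypothetical ages
y = np.array([0, 0, 1, 1])              # hypothetical CHD indicators

print(likelihood(0.0, 0.0, x, y))   # every Pr(obs_i) = 0.5, so 0.5**4
print(likelihood(-5.0, 0.1, x, y))  # older people more likely to have CHD: larger
```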
Log-likelihood • For technical reasons the likelihood is transformed into the log-likelihood (then you just maximize the sum of the logged probabilities): $LL = \ln[\Pr(\text{obs}_1)] + \ln[\Pr(\text{obs}_2)] + \ln[\Pr(\text{obs}_3)] + \cdots + \ln[\Pr(\text{obs}_n)]$
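One of those technical reasons is numerical: a product of many probabilities quickly becomes too small for a computer to represent, while the sum of their logs does not. A minimal illustration with 2000 hypothetical observations, each with probability 0.5:

```python
import numpy as np

pr_obs = np.full(2000, 0.5)        # 2000 observations, each with Pr(obs_i) = 0.5

print(np.prod(pr_obs))             # 0.0: the likelihood underflows in double precision
print(np.sum(np.log(pr_obs)))      # the log-likelihood, 2000 * ln(0.5), is no problem
```

Since the logarithm is increasing, the parameter values that maximize the log-likelihood also maximize the likelihood itself.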
Some subtleties • In OLS, we did not need stochastic assumptions to be able to calculate a best-fitting line (we only need them for the confidence intervals of the estimates). With maximum likelihood estimation we need them from the start (and let us not be bothered at this point by how the confidence intervals are calculated in maximum likelihood).
And this is what it looks like …
Note: optimizing log-likelihoods is difficult • It’s iterative (“searching the landscape”) • It might not converge • It might converge to the wrong answer
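One common way to do the "searching the landscape" is Newton-Raphson: repeatedly use the slope (gradient) and curvature (Hessian) of the log-likelihood to step uphill until the steps become negligible. A minimal NumPy sketch on simulated data (hypothetical ages and CHD outcomes, not the course data); SPSS and Stata have their own, more robust implementations, which this does not claim to reproduce.

```python
import numpy as np

def log_likelihood(b, X, y):
    """Logistic log-likelihood: sum of ln Pr(obs_i)."""
    z = X @ b
    # y*z - ln(1 + e^z) equals ln p_i when y_i = 1 and ln(1 - p_i) when y_i = 0
    return np.sum(y * z - np.logaddexp(0.0, z))

def fit_logit_newton(X, y, max_iter=25, tol=1e-8):
    """Maximize the logistic log-likelihood by Newton-Raphson.
    X: (n, k) design matrix with a column of ones; y: array of 0/1.
    Returns (b, converged, iterations_used)."""
    b = np.zeros(X.shape[1])                  # starting values
    for it in range(1, max_iter + 1):
        p = 1.0 / (1.0 + np.exp(-(X @ b)))    # current predicted probabilities
        grad = X.T @ (y - p)                  # slope of the LL landscape
        hess = -(X.T * (p * (1.0 - p))) @ X   # curvature of the LL landscape
        step = np.linalg.solve(hess, grad)
        b = b - step                          # Newton step uphill
        if np.max(np.abs(step)) < tol:
            return b, True, it
    return b, False, max_iter                 # not converged: do not trust b

# Simulated example (hypothetical data).
rng = np.random.default_rng(7)
age = rng.uniform(20, 70, 500)
X = np.column_stack([np.ones_like(age), age])
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-(-5.0 + 0.1 * age))))

b_hat, converged, n_iter = fit_logit_newton(X, y)
print(b_hat, converged, n_iter)
```

If `converged` comes back `False`, the returned coefficients should not be trusted. At a converged solution, nudging $b_0$ or $b_1$ in any direction lowers `log_likelihood`, which is what "the best-fitting values" means here.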
Nasty implication: extreme cases should be left out (some handwaving here)
Example (with some SPSS output)
This function fits best: other values of b0 and b1 give worse results (that is, other values have a smaller likelihood value)
Logistics of logistic regression • Estimate the coefficients (and their conf.int.) • Assess model fit • Between model comparisons • Pseudo R2 (similar to multiple regression) • Predictive accuracy • Interpret coefficients • Check regression assumptions
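Model fit: comparisons between models • The log-likelihood ratio test statistic can be used to test the fit of a model: $\chi^2 = -2\,(LL_{\text{reduced model}} - LL_{\text{full model}})$ • The test statistic has (approximately) a chi-square distribution, with degrees of freedom equal to the number of extra parameters in the full model • NOTE: This is sort of similar to the variance decomposition tables you see in MR!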
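A minimal sketch of the test for the common case where the full model adds exactly one predictor (1 degree of freedom). For that case the chi-square tail probability can be computed with the standard library, because $P(\chi^2_1 > x) = \operatorname{erfc}(\sqrt{x/2})$; for more degrees of freedom you would use a statistics package instead. The log-likelihood values in the usage line are hypothetical.

```python
import math

def lr_test_df1(ll_reduced, ll_full):
    """Likelihood ratio test when the full model has one extra parameter.
    Returns (chi-square statistic, p-value)."""
    chi2 = -2.0 * (ll_reduced - ll_full)
    p_value = math.erfc(math.sqrt(chi2 / 2.0))  # P(chi2 with 1 df > chi2)
    return chi2, p_value

chi2, p = lr_test_df1(ll_reduced=-70.0, ll_full=-60.0)  # hypothetical values
print(chi2, p)  # statistic 20.0, p-value far below 0.05
```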
Between model comparisons: the likelihood ratio test • Full model versus reduced model • The model including only an intercept is often called the empty model. SPSS uses this model as a default.
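The empty model needs no iterative search: with only an intercept, the maximum likelihood estimate of $P(Y = 1)$ is simply the sample proportion of 1s, so its log-likelihood has a closed form. A minimal sketch (`empty_model_ll` is a hypothetical helper name; it assumes the outcome contains both 0s and 1s):

```python
import numpy as np

def empty_model_ll(y):
    """Log-likelihood of the intercept-only (empty) logistic model.
    The ML estimate of P(Y = 1) is the sample proportion of 1s."""
    y = np.asarray(y, dtype=float)
    p = y.mean()  # requires 0 < p < 1, i.e. both outcomes present
    return np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))

print(empty_model_ll([1, 0, 0, 1]))  # 4 * ln(0.5)
```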
Between model comparison: SPSS output • This is the test statistic, and its associated significance
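Overall model fit: pseudo R² • Just like R² in multiple regression, pseudo R² ranges from 0.0 to 1.0 • Cox and Snell: $R^2_{CS} = 1 - \exp\left(-\frac{2}{n}\left[LL(\text{model}) - LL(0)\right]\right)$, which cannot theoretically reach 1 • Nagelkerke: $R^2_{N} = \frac{R^2_{CS}}{1 - \exp\left(\frac{2}{n} LL(0)\right)}$, adjusted so that it can reach 1 • Here $LL(\text{model})$ is the log-likelihood of the model that you want to test and $LL(0)$ is the log-likelihood of the model before any predictors were entered • NOTE: R² in logistic regression tends to be (even) smaller than in multiple regression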
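Both pseudo R² measures follow directly from the two log-likelihoods and the sample size. A minimal sketch using the Cox and Snell and Nagelkerke formulas; the inputs in the usage line are hypothetical, not the example's SPSS output.

```python
import math

def pseudo_r2(ll_model, ll_empty, n):
    """Cox & Snell and Nagelkerke pseudo R-squared from two log-likelihoods."""
    r2_cs = 1.0 - math.exp(-2.0 * (ll_model - ll_empty) / n)
    r2_max = 1.0 - math.exp(2.0 * ll_empty / n)  # largest value Cox & Snell can reach
    return r2_cs, r2_cs / r2_max                  # Nagelkerke rescales to reach 1

r2_cs, r2_n = pseudo_r2(ll_model=-60.0, ll_empty=-70.0, n=100)  # hypothetical values
print(round(r2_cs, 3), round(r2_n, 3))
```

Nagelkerke is always at least as large as Cox and Snell, because it divides by a number smaller than 1.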
Overall model fit: Classification table • We predict 74% correctly • 14 cases had CHD while, according to our model, this shouldn’t have happened • 12 cases didn’t have CHD while, according to our model, this should have happened
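A classification table compares each observed outcome with a predicted outcome, where "predicted CHD" means the predicted probability is at or above a cutoff. A minimal sketch assuming the usual cutoff of 0.5 and a small hypothetical set of outcomes and predicted probabilities; `classification_table` is a hypothetical helper name.

```python
import numpy as np

def classification_table(y, p_hat, cutoff=0.5):
    """Cross-tabulate observed y against predicted class (p_hat >= cutoff)."""
    y = np.asarray(y)
    pred = (np.asarray(p_hat) >= cutoff).astype(int)
    table = {
        "obs 0, pred 0": int(np.sum((y == 0) & (pred == 0))),
        "obs 0, pred 1": int(np.sum((y == 0) & (pred == 1))),  # "should have happened"
        "obs 1, pred 0": int(np.sum((y == 1) & (pred == 0))),  # "shouldn't have happened"
        "obs 1, pred 1": int(np.sum((y == 1) & (pred == 1))),
    }
    pct_correct = 100.0 * np.mean(y == pred)
    return table, pct_correct

table, pct = classification_table([0, 0, 1, 1, 1], [0.2, 0.7, 0.4, 0.9, 0.6])
print(table, pct)
```

In the slide's example, the two off-diagonal cells are the 14 and 12 misclassified cases, and 100 − 14 − 12 = 74 correct predictions out of 100.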
Logistics of logistic regression • Estimate the coefficients • Assess model fit • Interpret coefficients • Direction • Significance • Magnitude • Check regression assumptions
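The Odds Ratio • We had: $P(Y = 1) = \frac{1}{1 + e^{-(b_0 + b_1 X)}}$ • And after some rearranging we can get: $\frac{P(Y = 1)}{1 - P(Y = 1)} = e^{b_0 + b_1 X}$ • So when X increases by one unit, the odds are multiplied by $e^{b_1}$ (the odds ratio)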
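A numerical check that a one-unit increase in X multiplies the odds by $e^{b_1}$, whatever the starting value of X. The coefficients below are hypothetical, chosen for illustration only.

```python
import math

b0, b1 = -5.3, 0.11  # hypothetical coefficients, not estimates from the slides

def odds(x):
    """Odds P / (1 - P) under the logistic model; equals exp(b0 + b1 * x)."""
    p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
    return p / (1.0 - p)

for age in (30, 50):
    print(age, odds(age + 1) / odds(age), math.exp(b1))  # the two ratios agree
```

This is why the odds ratio, rather than the change in probability, is the natural "effect size" in logistic regression: the change in probability per unit of X differs along the curve, the odds ratio does not.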