POLS 606


  1. POLS 606 Hierarchical Models

  2. Intro • Who are you? • Fields • Substantive interests? • I am Dave • American politics • Campaigns and elections

  3. Logistics • Book—Gelman and Hill • Snijders and Bosker is recommended. • Chapter 2 of G&H is up to you to master • G&H don’t rely on matrix algebra to teach • Probability and simulation • Bayesian • R • Not easier or harder, just different.

  4. Lectures • There will be a mix • Math (no PowerPoint) • There will be lectures using R • Will tend to alternate

  5. R • Very powerful/flexible • Wave of the future • Need to understand stuff more • I have never used it before so we will learn it together

  6. Grades • Homework • There will be a bunch and they will be a mix of practical and theory • Final • Questions I would write for the methods exam • Paper • Original research using HLM. Won’t be due until start of Fall Semester

  7. What is a multilevel model? • Theory tells you that concepts at more than one level of aggregation are related • Usually thought of as geographic • Countries • States • Schools

  8. What is a multilevel model? • Doesn’t have to be • Time • Experimental Condition • Institutions • Regime • Bureaucracies • Individuals (panel data)

  9. Theory is key! • Two types of relationships • Random intercepts • Mean value of DV depends on aggregate unit • Random slopes • Effect of IV depends on aggregate unit • Can have both

  10. So you have multilevel data • Choice 1: Aggregate • Combine data to the highest level of aggregation • Create “average” value of variables for each higher unit • Advantage • Easy! • Can easily weight by N • Straightforward

  11. Aggregate • Disadvantages • Shifts meaning • Variables are macro level. Theory is (presumably) micro level. • Ecological fallacy • Classic example: Race and literacy (Robinson 1950)

  12. By region, ρ = .946

  13. State, ρ = .773

  14. Individual level, ρ = .203

  15. Why different? • The key is the within region correlations

  16. Both individual and ecological correlation depend on this, but in different ways • Individual depends on the internal cells of the region table • Weighted average of the correlations within the regions • Ecological depends on the marginals • Only the marginals; no use of info in the cells.
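
The contrast between the two correlations can be seen in a small simulation. This is a sketch on made-up data, not Robinson's census figures: the nine regions, the sample sizes, and the variable names are all assumptions chosen for illustration. Within every region x and y are unrelated by construction, yet the region means move together.

```r
# Simulated illustration only: not Robinson's (1950) data.
set.seed(606)
m <- 9                                  # number of regions (assumed)
n <- 500                                # persons per region (assumed)
region <- rep(seq_len(m), each = n)

# A region-level shift moves x and y together across regions;
# inside a region the two variables are independent noise.
shift <- rnorm(m)
x <- shift[region] + rnorm(m * n, sd = 3)
y <- shift[region] + rnorm(m * n, sd = 3)

# Individual correlation: uses every person, ignores the grouping.
r_individual <- cor(x, y)

# Ecological correlation: uses only the m region means.
r_ecological <- cor(tapply(x, region, mean), tapply(y, region, mean))

print(c(individual = r_individual, ecological = r_ecological))
```

The ecological figure comes out far larger than the individual one, even though nothing inside any region links x to y.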

  17. So? • The things that go into the calculation of the ecological correlation do not tell us anything about what we are interested in.

  18. Some Math • Assume • Total group of N persons • Two variables x & y • N people divided into m groups • X & Y are % of x & y in each of the m groups

  19. Three correlations • 1) Total individual correlation (r) • Correlation ignoring the grouping • 2) Ecological correlation (re) • Correlation between the m pairs (weighted by the n in each of the m groups) • 3) Within-area correlation (rw) • This is the weighted average of the correlations within the m groups

  20. Two correlation ratios • ηXA & ηYA • Measure the degree of clustering of X&Y by area • High ηXA means wide variation in X across regions

  21. Math • Can write the relationship between the correlations as:
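
The equation on this slide did not survive in the transcript. The relationship usually attributed to Robinson (1950), written in the symbols defined on the previous slides, takes roughly this form. It is a reconstruction from the covariance decomposition (total = within + between), not the slide's own typesetting.

```latex
% Total correlation as a mix of within-area and ecological correlation
r = r_w \sqrt{1-\eta_{XA}^2}\,\sqrt{1-\eta_{YA}^2} \;+\; r_e\,\eta_{XA}\,\eta_{YA}

% Solved for the ecological correlation
r_e = \frac{1}{\eta_{XA}\,\eta_{YA}}\; r
      \;-\; \frac{\sqrt{1-\eta_{XA}^2}\,\sqrt{1-\eta_{YA}^2}}{\eta_{XA}\,\eta_{YA}}\; r_w
```

Because the correlation ratios are at most 1, the weight on r is at least 1, which is why ecological correlations tend to come out larger in magnitude.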

  22. So? • re, then, is the weighted difference between the individual correlation (the thing we care about) and the average of the m within-area correlations, where the weights depend on clustering • Bias is not innocuous. Correlations are inflated. re will be larger in magnitude than r. • Cannot infer across levels. Don’t do it. Won’t get away with it.
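
A numerical check of the decomposition may help. The sketch below uses simulated data (the areas, sizes, and names are assumptions) and verifies that the total correlation equals the within-area piece plus the ecological piece, each weighted by the correlation ratios.

```r
# Simulated check of: r = r_w*sqrt(1-eta_x^2)*sqrt(1-eta_y^2) + r_e*eta_x*eta_y
set.seed(606)
m <- 9                                   # areas (assumed)
n <- 200                                 # persons per area (assumed)
area <- rep(seq_len(m), each = n)
shift <- rnorm(m)
x <- shift[area] + rnorm(m * n, sd = 2)
y <- shift[area] + rnorm(m * n, sd = 2)

# Area means repeated for every person, so each area is weighted by its n.
x_area <- ave(x, area)
y_area <- ave(y, area)

r   <- cor(x, y)                         # total individual correlation
r_e <- cor(x_area, y_area)               # ecological correlation
r_w <- cor(x - x_area, y - y_area)       # pooled within-area correlation

# Correlation ratios: share of each variable's spread that lies between areas.
eta_x <- sqrt(var(x_area) / var(x))
eta_y <- sqrt(var(y_area) / var(y))

rhs <- r_w * sqrt(1 - eta_x^2) * sqrt(1 - eta_y^2) + r_e * eta_x * eta_y
print(c(r = r, rhs = rhs, r_e = r_e, r_w = r_w))
```

Here rw is computed as a single pooled within-area correlation, which is one common way to make "weighted average of the within-area correlations" precise.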

  23. Disaggregate • You could ignore the higher level of aggregation and pretend everything is observed at the individual level • Advantage: Easy and generous • Disadvantage • You are lying. • Overstate power • Ignores correlations in the errors (not iid)

  24. Dependence of errors • The problem is a function of the intraclass correlation • Simple model: Yij = μ + Uj + Rij • Y is the DV • μ = Grand Mean • Uj = Macro effects (errors) • Rij = Unit specific errors

  25. Intraclass correlation • Errors all have mean 0 • Expected value for macro unit j is μ + Uj

  26. Intraclass correlation • It is the proportion of the variance in Y explained by the macro effects • The key concept in HLM • It is the degree of similarity of observations within the groups
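
As a rough sketch of "proportion of the variance explained by the macro effects", the simulation below generates data as a grand mean plus a macro effect plus a unit-specific error, then recovers the intraclass correlation from a one-way ANOVA decomposition. The variances, group counts, and names are assumptions picked so the true value is 0.25.

```r
# Simulated data: Y = grand mean + macro effect + unit-specific error.
set.seed(606)
J <- 200                      # macro units (assumed)
n <- 20                       # observations per macro unit (assumed, balanced)
mu <- 50
tau <- 1                      # sd of macro effects U_j
sigma <- sqrt(3)              # sd of unit-specific errors R_ij

group <- rep(seq_len(J), each = n)
U <- rnorm(J, sd = tau)
R <- rnorm(J * n, sd = sigma)
Y <- mu + U[group] + R

# True intraclass correlation: between variance over total variance.
icc_true <- tau^2 / (tau^2 + sigma^2)

# One-way ANOVA estimator for a balanced design.
msb <- n * var(tapply(Y, group, mean))   # between-group mean square
msw <- mean(tapply(Y, group, var))       # within-group mean square
tau2_hat <- (msb - msw) / n
icc_hat <- tau2_hat / (tau2_hat + msw)

print(c(true = icc_true, estimated = icc_hat))
```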

  27. Intraclass Correlation • Note that it changes the error variance • OLS assumes that errors are uncorrelated across observations. • This says they aren’t. • Inflates power • Shrinks standard errors • Macro variables will try to account for this
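
The "inflates power, shrinks standard errors" point can be sketched with a small Monte Carlo. Everything here is assumed for illustration: 20 groups of 20, an intraclass correlation of 0.25, and a macro-level predictor that truly has no effect. If OLS standard errors were right, about 5% of the runs would reject.

```r
# How often does plain OLS "find" an effect of a macro variable that has none?
set.seed(606)
J <- 20                                  # macro units (assumed)
n <- 20                                  # observations per unit (assumed)
sims <- 200
group <- rep(seq_len(J), each = n)

p_values <- replicate(sims, {
  z <- rnorm(J)[group]                   # macro-level predictor, unrelated to y
  y <- rnorm(J)[group] +                 # macro effects, variance 1
       rnorm(J * n, sd = sqrt(3))        # unit errors, variance 3 (ICC = 0.25)
  summary(lm(y ~ z))$coefficients[2, 4]  # p-value on the slope for z
})

reject_rate <- mean(p_values < 0.05)
print(reject_rate)
```

The rejection rate lands well above the nominal 5% because OLS treats the 400 rows as 400 independent pieces of information about z when there are really only 20.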

  28. Other solutions to multilevel data • Dummy variables • Doesn’t fix standard errors • Can’t specify interesting effects • Clustering • Fixes errors but not all other problems. • Ignores any systematic problems and the theories associated

  29. Real Solution? HLM • Effects may vary (random slopes) • Use all of the info available and use it accurately • Better predictions • Account for structure in data • Efficiency • Accurate standard errors
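
For a first look at what fitting such a model might look like, here is a minimal sketch using lme4, one of the packages listed later in these slides. The data are simulated and every name (d, g, fit_int, fit_slope) is made up for illustration; the fit is skipped if lme4 is not installed.

```r
# Simulated data with group-specific intercepts and slopes.
set.seed(606)
J <- 30                                   # groups (assumed)
n <- 40                                   # observations per group (assumed)
g <- rep(seq_len(J), each = n)
a <- rnorm(J, mean = 2, sd = 1)           # random intercepts
b <- rnorm(J, mean = 0.5, sd = 0.3)       # random slopes
x <- rnorm(J * n)
d <- data.frame(g = factor(g), x = x,
                y = a[g] + b[g] * x + rnorm(J * n))

if (requireNamespace("lme4", quietly = TRUE)) {
  # Random intercepts only: mean of y varies by group.
  fit_int <- lme4::lmer(y ~ x + (1 | g), data = d)
  # Random intercepts and slopes: effect of x also varies by group.
  fit_slope <- lme4::lmer(y ~ x + (1 + x | g), data = d)
  print(lme4::fixef(fit_slope))
}
```

The part of the formula in parentheses is where the two types of relationship show up: `(1 | g)` lets the intercept vary by group, and `(1 + x | g)` lets the slope on x vary as well.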

  30. How HLM? R! • R is a different kind of stats package • It is a language, not a program • Open source • http://cran.r-project.org/ • Problem is that it is not obviously user-friendly • No point and click front end embedded. • This can be addressed—R is adaptable

  31. R • The computer staff tells me it is installed and they will install it on your office machines • Update by adding packages • Rcmdr – GUI front end • arm • BRugs • R2WinBUGS • car • foreign • DAAG • Matrix and lme4 if not installed automatically

  32. Packages • Packages are commands or sets of programs to do things. • sessionInfo() tells you which are currently attached • library("name")

  33. R • Need to load packages each time • The basic starting place for R is the command prompt (>) • R will take anything you type at this line as a command and will respond • Load packages as library(arm) • Can (and probably should) write a script to do it all
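
One way such a start-up script might look is sketched below. The package list is an assumption drawn from the packages named on the earlier slide; the script attaches whichever of them are installed and reports the rest rather than stopping with an error.

```r
# Packages to attach at the start of each session (edit to taste).
pkgs <- c("arm", "foreign", "car")

# Check which are installed without attaching them yet.
installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)

# Attach the ones that are available.
for (p in pkgs[installed]) {
  suppressPackageStartupMessages(library(p, character.only = TRUE))
}

# Report anything that still needs install.packages().
if (any(!installed)) {
  message("Not installed: ", paste(pkgs[!installed], collapse = ", "))
}

sessionInfo()   # lists what is currently attached
```

Saving this as a script and running it first means every session starts from the same set of packages.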

  34. R • If you just start typing stuff, R assumes you are telling it to evaluate a statement • 2+2 • pi • Any math equation. • R wants you to define “objects” • Everything needs to be an object

  35. Commands • Basic format • object <- command(definition, option, option) • Example: open data • kidiq <- read.dta(file="c:/R/kidiq.dta") • Reads the children's IQ score data used in Chapter 2 • "kidiq" names the object kidiq • "<-" tells R that you are going to give it a definition • "read.dta" is the command to read Stata data (it comes from the foreign package) • (file="c:/R/kidiq.dta") tells it which data. Note / not \

  36. R • Random things about objects • Case sensitive • Can (and often do) have . in the name • Will remember that they are there • Can see objects by ls() command • <- defines (equivalent to =) • Look at example 1
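
A few lines at the prompt illustrate these points. The object names are arbitrary examples.

```r
2 + 2                 # evaluated and printed, but nothing is stored

total <- 2 + 2        # now the result is kept in an object
Total <- 10           # case sensitive: Total and total are different objects
my.total <- total + Total   # a . in the name is fine

ls()                  # lists the objects R is remembering
```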

  37. R working directory and workspace • Each session has a working directory • Where R looks for files • If launched from windows icon can define under properties (right click) • getwd() • ls() • q() • Save workspace image? • Saves all objects in a .RData file

  38. Help! • help.start() • help("name")

  39. Script • Can do line by line commands, but those are slow, temporary and error prone • better to use script editor: • File->new script • control+N • Can save and re-load

  40. Missing data • R handles missing data • uses “NA” • Will read in data and convert just fine
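
A short sketch of how NA behaves in practice, using a made-up vector of scores:

```r
scores <- c(85, NA, 98, 102)   # one missing value (made-up numbers)

is.na(scores)                  # flags which entries are missing
mean(scores)                   # NA: most summaries propagate missingness
mean(scores, na.rm = TRUE)     # drop the NA first, then average
```

Many functions need `na.rm = TRUE` spelled out, so a result of NA is usually a sign that missing values are present rather than that something broke.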

  41. Reading in data • kidiq <- read.dta(file="c:/R/kidiq.dta") • We have seen this before • Attaching: • in commands you need to tell R which data you are using (in fact, you can have lots of data sets loaded at once). • fit<-lm(kid.score~mom.hs, data=kidiq) • The command is attach • attach(kidiq) • fit<-lm(kid.score~mom.hs) • detach(kidiq)
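
The two styles give the same fit. The sketch below shows this on simulated stand-in data, since the real kidiq.dta file may not be at hand; the data frame `toy` and its values are made up, with only the variable names borrowed from the slide.

```r
# Stand-in data: simulated, NOT the real kidiq file.
set.seed(606)
toy <- data.frame(mom.hs = rbinom(50, 1, 0.8))
toy$kid.score <- 78 + 12 * toy$mom.hs + rnorm(50, sd = 20)

# Style 1: name the data set in the command.
fit1 <- lm(kid.score ~ mom.hs, data = toy)

# Style 2: attach, fit without data =, then detach.
attach(toy)
fit2 <- lm(kid.score ~ mom.hs)
detach(toy)

all.equal(coef(fit1), coef(fit2))
```

Naming the data set with `data =` is the more explicit of the two and avoids surprises when several data sets are loaded at once.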

  42. Attach • R looks for things in a particular order • search() • Attach moves stuff around in the order • Order matters a lot—names of objects versus names of variables
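
Why the order matters can be seen when an object and a variable share a name. In this made-up example, an object called `score` already exists in the workspace when a data frame with a `score` column is attached.

```r
d <- data.frame(score = c(1, 2, 3))   # variable inside a data set
score <- c(100, 200)                  # separate object with the same name

attach(d)        # R warns that score is masked by the workspace object
search()         # the workspace comes first, the attached data after it

mean(score)      # uses the workspace object, not the variable in d
mean(d$score)    # asking for the variable explicitly

detach(d)
```

The first `mean()` silently averages the wrong thing, which is the kind of mistake that is hard to spot later.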

  43. Rcmdr • Handy front end, point and click • library(Rcmdr) • Has a script window • Nice, but don’t lean on it too hard • Thinks it is smarter than you

  44. JGR (“Jaguar”) • Need to download and install it • Probably need computer staff for machines • Launches separate from R • Package manager is very nice • Runs separate version of R

  45. Graphics • R has wonderful graphics if you can get them to do what you want. • demo(graphics) • Starting point is plot() • plot(y~x) • plot(x, y) • graphs.R

  46. Regression • You should know the basics and this should be review • Data being used: kidiq (same as before)

  47. Regression • kid.score = a + b(mom.hs) + error • lm(formula = kid.score ~ mom.hs) • (Intercept): coef.est 77.55, coef.se 2.06 • mom.hs: coef.est 11.77, coef.se 2.32 • n = 434, k = 2 • residual sd = 19.85, R-Squared = 0.06 • Interpret • 78 = E(kid.score) if mom.hs = 0 • 12 = expected difference in kid.score between mom.hs = 1 and mom.hs = 0

  48. Regression • kid.score = a + b(mom.iq) + error • lm(formula = kid.score ~ mom.iq) • (Intercept): coef.est 25.80, coef.se 5.92 • mom.iq: coef.est 0.61, coef.se 0.06 • n = 434, k = 2 • residual sd = 18.27, R-Squared = 0.20 • Interpret • 26 = E(kid.score) when mom.iq = 0 • 0.61 = expected change in kid.score for each additional point of mom.iq

  49. Regression • Both predictors • lm(formula = kid.score ~ mom.hs + mom.iq) • (Intercept): coef.est 25.73, coef.se 5.88 • mom.hs: coef.est 5.95, coef.se 2.21 • mom.iq: coef.est 0.56, coef.se 0.06 • n = 434, k = 3 • residual sd = 18.14, R-Squared = 0.21 • Interpret?
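
One way to approach the interpretation is to plug values into the fitted equation. The sketch below uses the rounded coefficients from the slide; the helper `predict_score` and the choice of mom.iq = 100 are assumptions made for illustration.

```r
# Rounded estimates copied from the slide's output.
b <- c(intercept = 25.73, mom.hs = 5.95, mom.iq = 0.56)

# Hypothetical helper: expected kid.score for given predictor values.
predict_score <- function(mom.hs, mom.iq) {
  unname(b["intercept"] + b["mom.hs"] * mom.hs + b["mom.iq"] * mom.iq)
}

hs_yes <- predict_score(mom.hs = 1, mom.iq = 100)
hs_no  <- predict_score(mom.hs = 0, mom.iq = 100)

print(c(hs_yes = hs_yes, hs_no = hs_no, difference = hs_yes - hs_no))
```

Holding mom.iq fixed, the two predictions differ by exactly the mom.hs coefficient, which is what "controlling for" the other predictor means here.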

  50. Interactions • Remember, sometimes the effect of a variable is conditional on another variable • In Stata you need to create the interaction, in R you can do it on the fly
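
"On the fly" means the interaction can be written directly in the model formula. The sketch below uses simulated stand-in data (the data frame `toy` and its values are made up, with variable names borrowed from the earlier slides) and compares the formula shortcuts with building the product term by hand.

```r
# Stand-in data: simulated, NOT the real kidiq file.
set.seed(606)
toy <- data.frame(mom.hs = rbinom(200, 1, 0.8),
                  mom.iq = rnorm(200, mean = 100, sd = 15))
toy$kid.score <- 20 + 6 * toy$mom.hs + 0.6 * toy$mom.iq + rnorm(200, sd = 18)

# a * b expands to both main effects plus their interaction.
fit_star <- lm(kid.score ~ mom.hs * mom.iq, data = toy)

# a:b is the interaction term alone, so the main effects are listed too.
fit_colon <- lm(kid.score ~ mom.hs + mom.iq + mom.hs:mom.iq, data = toy)

# Building the product variable first, as one would in Stata.
toy$hs_x_iq <- toy$mom.hs * toy$mom.iq
fit_manual <- lm(kid.score ~ mom.hs + mom.iq + hs_x_iq, data = toy)

print(coef(fit_star))
```

All three fits give the same estimates; only the label on the interaction coefficient differs.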
