POLS 606

POLS 606 Hierarchical Models

Intro • Who are you? • Fields • Substantive interests? • I am Dave • American politics • Campaigns and elections

Logistics • Book—Gelman and Hill • Snijders and Bosker is recommended. • Chapter 2 of G&H is up to you to master • G&H don’t rely on matrix algebra to teach • Probability and simulation • Bayesian • R • Not easier or harder, just different.

Lectures • There will be a mix • Math (no PowerPoint) • There will be lectures using R • Will tend to alternate

R • Very powerful/flexible • Wave of the future • Need to understand stuff more • I have never used it before so we will learn it together

Grades • Homework • There will be a bunch, and they will be a mix of practical and theory • Final • Questions I would write for the methods exam • Paper • Original research using HLM. Won’t be due until the start of the Fall semester

What is a multilevel model? • Theory tells you that concepts at more than one level of aggregation are related • Usually thought of as geographic • Countries • States • Schools

What is a multilevel model? • Doesn’t have to be • Time • Experimental Condition • Institutions • Regime • Bureaucracies • Individuals (panel data)

Theory is key! • Two types of relationships • Random intercepts • Mean value of DV depends on aggregate unit • Random slopes • Effect of IV depends on aggregate unit • Can have both

So you have multilevel data • Choice 1: Aggregate • Combine data to the highest level of aggregation • Create an “average” value of each variable for each higher-level unit • Advantages • Easy! • Can easily weight based on N • Straightforward
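Aggregation in R can be sketched as below. The data set is simulated and hypothetical (`dat`, `region`, `x`, `y` are made-up names, not course data). `aggregate()` collapses individuals to regional means. Each region's N is kept so that the aggregate-level regression can be weighted.

```r
set.seed(606)
# Hypothetical individual-level data: 300 people nested in 3 regions
dat <- data.frame(region = rep(c("A", "B", "C"), times = c(50, 100, 150)))
dat$x <- rnorm(nrow(dat), mean = as.numeric(factor(dat$region)))
dat$y <- dat$x + rnorm(nrow(dat))

# One row per region holding the "average" value of each variable
agg <- aggregate(cbind(x, y) ~ region, data = dat, FUN = mean)

# Keep each region's N so the aggregate analysis can be weighted
agg$n <- as.vector(table(dat$region)[agg$region])

# Regression on the 3 regional means, weighted by N
fit_agg <- lm(y ~ x, data = agg, weights = n)
agg
```

Note that `fit_agg` rests on only three data points, one per region. Everything the individuals could have told you about the x–y relationship is gone.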

Aggregate • Disadvantages • Shifts meaning • Variables are macro level. Theory is (presumably) micro level. • Ecological fallacy • Classic example: Race and literacy (Robinson 1950)

By region, ρ = .946

State, ρ = .773

Individual level, ρ = .203

Why different? • The key is the within region correlations

Both individual and ecological correlations depend on this, but in different ways • The individual correlation depends on the internal cells of the regional tables • A weighted average of the correlations within the regions • The ecological correlation depends only on the marginals • No use of the information in the cells

So? • The things that go into the calculation of the ecological correlation do not tell us anything about what we are interested in.

Some Math • Assume • A total group of N persons • Two variables x & y • The N people are divided into m groups • X & Y are the % of x & y in each of the m groups

Three correlations • 1) Total individual correlation ($r$) • Correlation ignoring the grouping • 2) Ecological correlation ($r_e$) • Correlation between the m pairs of group percentages (X, Y), weighted by each group's n • 3) Within-area correlation ($r_w$) • The weighted average of the correlations within the m groups

Two correlation ratios • $\eta_{XA}$ & $\eta_{YA}$ • Measure the degree of clustering of X & Y by area • A high $\eta_{XA}$ means wide variation in X across areas

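Math • Can write the relationship between the correlations as:
$$ r = \eta_{XA}\,\eta_{YA}\,r_e + \sqrt{(1-\eta_{XA}^2)(1-\eta_{YA}^2)}\;r_w $$
• Solving for the ecological correlation:
$$ r_e = \frac{r}{\eta_{XA}\,\eta_{YA}} - \frac{\sqrt{(1-\eta_{XA}^2)(1-\eta_{YA}^2)}}{\eta_{XA}\,\eta_{YA}}\;r_w $$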

So? • $r_e$, then, is a weighted difference between the individual correlation (the thing we care about) and the average of the m within-area correlations, where the weights depend on clustering • The bias is not innocuous: correlations are inflated, and $r_e$ will usually be larger in magnitude than $r$ • Cannot infer across levels. Don’t do it. Won’t get away with it.
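The decomposition can be checked numerically on simulated data. Everything below is a hypothetical illustration: areas differ a lot in their means, but x and y are only weakly related within areas. One assumption matters here. $r_w$ is computed as the pooled within-area correlation, which is the form of the "weighted average" for which the identity holds exactly.

```r
set.seed(1950)
# Hypothetical data: m areas, strong clustering, weak within-area association
m <- 9
n <- sample(50:200, m, replace = TRUE)     # people per area
g <- rep(seq_len(m), times = n)            # area membership for each person
area_x <- rnorm(m, sd = 2)
area_y <- area_x + rnorm(m, sd = 0.5)
x <- area_x[g] + rnorm(length(g))
y <- area_y[g] + 0.1 * x + rnorm(length(g))

mx <- ave(x, g)                            # each person's area mean of x
my <- ave(y, g)                            # each person's area mean of y
r     <- cor(x, y)                         # total individual correlation
r_e   <- cor(mx, my)                       # ecological correlation, weighted by area n
r_w   <- cor(x - mx, y - my)               # pooled within-area correlation
eta_x <- sd(mx) / sd(x)                    # correlation ratios
eta_y <- sd(my) / sd(y)

K <- sqrt((1 - eta_x^2) * (1 - eta_y^2))
r_rebuilt <- eta_x * eta_y * r_e + K * r_w # reproduces r
round(c(r = r, r_e = r_e, r_w = r_w, rebuilt = r_rebuilt), 3)
```

In runs like this one, $r_e$ comes out much larger than $r$, while $r_w$ stays small. This is the inflation the slide warns about.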

Disaggregate • You could ignore the higher level of aggregation and pretend everything is observed at the individual level • Advantage: Easy and generous • Disadvantage • You are lying. • Overstate power • Ignores correlations in the errors (not iid)

Dependence of errors • The problem is a function of the intraclass correlation • Simple model: $Y_{ij} = \mu + U_j + R_{ij}$ • $Y_{ij}$ is the DV for unit i in macro unit j • μ = grand mean • $U_j$ = macro effects (errors) • $R_{ij}$ = unit-specific errors

Intraclass correlation • All errors have mean 0 • The expected value of Y in macro unit j is $\mu + U_j$

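Intraclass correlation • $\rho_I = \frac{\tau^2}{\tau^2 + \sigma^2}$, where $\tau^2 = \mathrm{var}(U_j)$ and $\sigma^2 = \mathrm{var}(R_{ij})$ • It is the proportion of the variance in Y accounted for by the macro effects • The key concept in HLM • It measures how similar observations within the same group are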
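To make the formula concrete, the sketch below simulates the simple model $Y_{ij} = \mu + U_j + R_{ij}$ with hypothetical values τ = 2 and σ = 4, so the true ICC is 4 / (4 + 16) = 0.2. It then recovers the ICC with the one-way ANOVA (method-of-moments) estimator for equal-sized groups. This hand calculation is a stand-in for illustration; lme4 would estimate the same variance components by (RE)ML instead.

```r
set.seed(53)
J <- 100; n <- 20                        # 100 groups of 20 (balanced, hypothetical)
mu <- 50; tau <- 2; sigma <- 4           # true ICC = 4 / (4 + 16) = 0.2
group <- rep(seq_len(J), each = n)
U <- rnorm(J, 0, tau)                    # macro effects U_j
R <- rnorm(J * n, 0, sigma)              # unit-specific errors R_ij
Y <- mu + U[group] + R

# One-way ANOVA estimator for balanced groups
group_means <- tapply(Y, group, mean)
MSB <- n * sum((group_means - mean(Y))^2) / (J - 1)      # between-group mean square
MSW <- sum((Y - group_means[group])^2) / (J * (n - 1))   # within-group mean square
tau2_hat <- max((MSB - MSW) / n, 0)      # estimate of var(U_j), truncated at 0
icc_hat  <- tau2_hat / (tau2_hat + MSW)  # MSW estimates var(R_ij)
icc_hat
```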

Intraclass Correlation • Note that it changes the error variance structure • OLS assumes that errors are uncorrelated across observations • A nonzero intraclass correlation says they aren’t • Ignoring it inflates power • and shrinks standard errors • Macro variables will try to account for this
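The size of the problem can be gauged with Kish's design effect, $1 + (n - 1)\rho_I$. This is a standard survey-sampling formula brought in here for illustration, not something from the slides. It assumes equal-sized groups and applies to a mean or to a predictor that is constant within groups. With hypothetical groups of 20 and an ICC of 0.2, the design effect is 4.8.

```r
# Kish's design effect for equal-sized groups (assumes every group has n_per_group members)
design_effect <- function(n_per_group, icc) {
  1 + (n_per_group - 1) * icc
}

N <- 1000                              # hypothetical: 50 groups of 20
deff <- design_effect(n_per_group = 20, icc = 0.2)
n_eff <- N / deff                      # effective number of independent observations
se_ratio <- sqrt(deff)                 # factor by which the naive OLS SE is too small
c(deff = deff, n_eff = n_eff, se_ratio = se_ratio)
```

Treating these 1,000 clustered observations as independent behaves, roughly, like pretending about 208 independent observations are 1,000. That is where the overstated power comes from.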

Other solutions to multilevel data • Dummy variables • Don’t fix standard errors • Can’t specify interesting effects (e.g., effects of macro-level variables) • Clustering • Fixes standard errors but not the other problems • Ignores any systematic structure and the theories associated with it
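The sketch below shows why group dummies rule out macro-level effects. It uses hypothetical simulated data where `z` is a group-level predictor, constant within each group. Once a dummy for every group is in the model, `z` is perfectly collinear with the dummies, and `lm()` reports its coefficient as `NA`.

```r
set.seed(57)
J <- 30; n <- 10
g <- factor(rep(seq_len(J), each = n))      # group id
z <- rep(rnorm(J), each = n)                # hypothetical macro-level predictor
u <- rnorm(J, sd = 0.5)                     # group-level errors
y <- 1 + 0.5 * z + u[g] + rnorm(J * n)

fit_dummies <- lm(y ~ g + z)
coef(fit_dummies)[["z"]]                    # NA: z is collinear with the dummies
coef(lm(y ~ z))[["z"]]                      # estimable without dummies, but the SE ignores clustering
```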

Real Solution? HLM • Effects may vary (random slopes) • Use all of the info available and use it accurately • Better predictions • Account for structure in data • Efficiency • Accurate standard errors

How HLM? R! • R is a different kind of stats package • It is a language, not a program • Open source • http://cran.r-project.org/ • Problem is that it is not obviously user-friendly • No point and click front end embedded. • This can be addressed—R is adaptable

R • The computer staff tells me it is installed, and they will install it on your office machines • Update by adding packages • Rcmdr: GUI interface • arm • BRugs • R2WinBUGS • car • foreign • DAAG • Matrix and lme4 if not installed automatically

Packages • Packages are sets of commands (functions) to do things • sessionInfo() tells you which packages are currently attached • library("name") attaches one

R • Need to load packages each time • The basic starting place for R is the command Prompt (>) • R will take anything you type at this line as a command and will respond • Load packages as library(arm) • Can (and probably should) write a script to do it all
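A start-up script along these lines is one way to load everything at once. The package list comes from the slides. Rcmdr is left out because loading it opens the GUI, and the script's structure is a hypothetical sketch rather than a required setup. Packages that are not installed are reported rather than installed automatically.

```r
# Hypothetical start-up script for this course
pkgs <- c("arm", "BRugs", "R2WinBUGS", "car", "foreign", "DAAG", "Matrix", "lme4")

# requireNamespace() checks whether a package is installed without attaching it
available <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)

# Attach the installed packages; report the rest
for (p in pkgs[available]) library(p, character.only = TRUE)
if (any(!available)) {
  message("Not installed: ", paste(pkgs[!available], collapse = ", "),
          " (add them with install.packages())")
}
sessionInfo()   # confirms what is now attached
```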

R • If you just start typing stuff, R assumes you are telling it to evaluate a statement • 2+2 • pi • Any math equation. • R wants you to define “objects” • Everything needs to be an object

Commands • Basic format • object <- command(definition, option, option) • Example: open data • kidiq <- read.dta(file="c:/R/kidiq.dta") • Reads the children’s IQ score data used in Chapter 2 • “kidiq” names the object kidiq • “<-” tells R that you are going to give it a definition • “read.dta” is the command to read Stata data (it comes from the foreign package) • (file="c:/R/kidiq.dta") tells it which data. Note / not \

R • Random things about objects • Case sensitive • Can (and often do) have . in the name • Will remember that they are there • Can see objects by ls() command • <- defines (equivalent to =) • Look at example 1
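The object rules above can be tried out at the prompt. The object names here are arbitrary examples.

```r
kid.score <- 2 + 2      # "." is allowed in object names
Kid.score <- pi         # a different object: names are case sensitive
kid.score = 5           # "=" also assigns at the prompt; "<-" is the usual convention
c(kid.score, Kid.score) # 5.000000 3.141593
ls()                    # lists the objects R is remembering
```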

R working directory and workspace • Each session has a working directory • Where R looks for files • If R is launched from the Windows icon, you can set it under Properties (right-click) • getwd() • ls() • q() • Save workspace image? • Saves all objects in a .RData file
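This sketch shows what "Save workspace image?" does, using save.image() and load() directly. It writes to a temporary file (a hypothetical name) so it does not touch the real working directory.

```r
getwd()                                          # where R currently reads and writes files
ws_file <- file.path(tempdir(), "demo.RData")    # hypothetical file name
x_demo <- 1:3
save.image(file = ws_file)                       # saves every object in the workspace
rm(x_demo)                                       # x_demo is gone...
load(ws_file)                                    # ...until the image is loaded again
x_demo
```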

Help! • help.start() • help("name")

Script • Can do line by line commands, but those are slow, temporary and error prone • better to use script editor: • File->new script • control+N • Can save and re-load

Missing data • R handles missing data • Uses NA • Will read in data and convert missing values just fine
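Here is a small illustration of how NA behaves. Missing values propagate through most calculations unless you explicitly drop them.

```r
x <- c(4, NA, 8)
is.na(x)                  # FALSE  TRUE FALSE
mean(x)                   # NA: missing values propagate by default
mean(x, na.rm = TRUE)     # 6: drop the NA first
```

lm() drops rows with missing values by default (na.action = na.omit), so n in the regression output can be smaller than the number of rows in the data.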

Reading in data • kidiq <- read.dta(file="c:/R/kidiq.dta") • We have seen this before • Attaching: • In commands you need to tell R which data you are using (in fact, you can have lots of data sets loaded at once) • fit <- lm(kid.score ~ mom.hs, data=kidiq) • Alternatively, attach the data so R finds the variables directly; the command is attach • attach(kidiq) • fit <- lm(kid.score ~ mom.hs) • detach(kidiq)
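The three ways of pointing lm() at a data set give the same fit. This sketch uses a hypothetical simulated stand-in called `kids`, with the kidiq variable names, rather than the real file.

```r
set.seed(83)
# Hypothetical stand-in for the kidiq data
kids <- data.frame(mom.hs = rbinom(100, 1, 0.8))
kids$kid.score <- 78 + 12 * kids$mom.hs + rnorm(100, sd = 20)

fit1 <- lm(kid.score ~ mom.hs, data = kids)   # name the data set in the command
fit2 <- with(kids, lm(kid.score ~ mom.hs))    # evaluate one command inside the data

attach(kids)                                  # attach, fit, detach
fit3 <- lm(kid.score ~ mom.hs)
detach(kids)

coef(fit1)
```

The data = argument is the safest habit. It says exactly which data set each model uses, even with many loaded.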

Attach • R looks for things in a particular order • search() • Attach moves stuff around in the order • Order matters a lot—names of objects versus names of variables
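The search-order problem is easy to reproduce. In this hypothetical example, a workspace object shares its name with a column of an attached data frame. Because the workspace (.GlobalEnv) is searched before attached data, R finds the workspace object.

```r
kids <- data.frame(mom.hs = c(0, 1, 1), kid.score = c(70, 90, 85))  # hypothetical data
mom.hs <- "not the variable"       # workspace object with the same name as a column

attach(kids)                       # R reports that mom.hs is masked by .GlobalEnv
path  <- search()[1:2]             # ".GlobalEnv" "kids": the workspace comes first
found <- mom.hs                    # so R finds the workspace object, not the column
detach(kids)
rm(mom.hs)
```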

Rcmdr • Handy front end, point and click • library(Rcmdr) • Has a script window • Nice, but don’t lean on it too hard • Thinks it is smarter than you

JGR (“Jaguar”) • Need to download and install it • Probably need computer staff for machines • Launches separate from R • Package manager is very nice • Runs separate version of R

Graphics • R has wonderful graphics if you can get them to do what you want • demo(graphics) • The starting point is plot() • plot(y~x) • plot(x, y) • graphs.R
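The two plot() forms on the slide draw the same scatterplot. This sketch uses hypothetical simulated data. It writes to a temporary PDF so it also runs without a screen; at the prompt you would normally skip pdf() and dev.off().

```r
set.seed(91)
x <- rnorm(50)
y <- 2 + 0.5 * x + rnorm(50)

out <- tempfile(fileext = ".pdf")   # file device, so no screen is needed
pdf(out)
plot(y ~ x)                          # formula interface: y on the vertical axis
abline(lm(y ~ x))                    # add the fitted regression line
plot(x, y)                           # same scatterplot via (x, y) arguments
dev.off()
```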

Regression • You should know the basics, and this should be review • Data being used: kidiq (same as before)

Regression • kid.score = a + b(mom.hs) + error
    lm(formula = kid.score ~ mom.hs)
                coef.est coef.se
    (Intercept) 77.55    2.06
    mom.hs      11.77    2.32
    n = 434, k = 2
    residual sd = 19.85, R-Squared = 0.06
• Interpret • 78 = E(kid.score) if mom.hs = 0 • 12 = expected difference in kid.score between children whose mothers completed high school (mom.hs = 1) and those whose mothers did not (mom.hs = 0)
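With a single binary predictor, the interpretation can be checked directly. The intercept is the mean outcome in the mom.hs = 0 group, and the slope is the difference between the two group means. The data below are hypothetical and simulated with kidiq-style names; they are not the real kidiq data. To get the coef.est / coef.se layout shown above, use arm's display(fit).

```r
set.seed(95)
# Hypothetical data using the kidiq variable names
mom.hs    <- rbinom(434, 1, 0.8)
kid.score <- 78 + 12 * mom.hs + rnorm(434, sd = 20)
fit <- lm(kid.score ~ mom.hs)

group_means <- sapply(split(kid.score, mom.hs), mean)   # named "0" and "1"
coef(fit)[["(Intercept)"]]                # same as the mean score when mom.hs = 0
group_means[["0"]]
coef(fit)[["mom.hs"]]                     # same as the difference in group means
group_means[["1"]] - group_means[["0"]]
```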

Regression • kid.score = a + b(mom.iq) + error
    lm(formula = kid.score ~ mom.iq)
                coef.est coef.se
    (Intercept) 25.80    5.92
    mom.iq       0.61    0.06
    n = 434, k = 2
    residual sd = 18.27, R-Squared = 0.20
• Interpret • 26 = E(kid.score) when mom.iq = 0 (not a meaningful value; no mother has an IQ of 0) • 0.61 = expected difference in kid.score for each one-point difference in mom.iq

Regression • Both predictors
    lm(formula = kid.score ~ mom.hs + mom.iq)
                coef.est coef.se
    (Intercept) 25.73    5.88
    mom.hs       5.95    2.21
    mom.iq       0.56    0.06
    n = 434, k = 3
    residual sd = 18.14, R-Squared = 0.21
• Interpret?

Interactions • Remember, sometimes the effect of a variable is conditional on another variable • In Stata you need to create the interaction variable first; in R you can do it on the fly in the model formula
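Here is a sketch of an on-the-fly interaction in the model formula, using hypothetical simulated data with kidiq-style names. In a formula, `*` expands to both main effects plus their product, and `:` is the product term alone. The two specifications below are therefore the same model.

```r
set.seed(101)
# Hypothetical data with kidiq-style names
n <- 434
mom.hs <- rbinom(n, 1, 0.8)
mom.iq <- rnorm(n, 100, 15)
kid.score <- -10 + 50 * mom.hs + 1 * mom.iq - 0.5 * mom.hs * mom.iq + rnorm(n, sd = 18)

fit_int <- lm(kid.score ~ mom.hs * mom.iq)
names(coef(fit_int))   # "(Intercept)" "mom.hs" "mom.iq" "mom.hs:mom.iq"

# Same model written out term by term
fit_int2 <- lm(kid.score ~ mom.hs + mom.iq + mom.hs:mom.iq)
```

The mom.iq slope for children whose mothers finished high school is the mom.iq coefficient plus the mom.hs:mom.iq coefficient. When mom.hs = 0, it is the mom.iq coefficient alone.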
