Debuggin'
Now comes the hard part • If the model is of moderate difficulty, we may run into problems in estimation. • Even if the model is easy, the BUGS code can be tricky. • So, what do you do?
BUGS is a pain • Most of the time we can execute programs line by line. • In R or Stata we can see whether specific lines are creating problems, and the errors we see are often informative. • We can also see what is available in memory or the workspace. • You can, for instance, check the dimensionality of objects.
BUGS is a pain • This doesn't work in BUGS. • BUGS either works as a program, or it doesn't. • There are a couple of different sources of errors: • Syntax • Compiling • Syntax errors will surface early, and BUGS will tell you so.
BUGS problems • Model not compiling • Too much information in the distribution argument, e.g., defining the mean of y.hat inside the distribution statement • BUGS error: "expected a comma" • An added space, e.g., pow (sigma, -2) • BUGS error: "invalid or unexpected token scanned"
BUGS problems • Parameter not defined • The mu.a definition is removed • BUGS error: "made use of undefined node mu.a" • Multiply defined parameters • y.hat <- a[county[i]] + b*x[i] should be y.hat[i] • BUGS error: "empty slot not allowed in variable name"
BUGS problems • Data problems • Missing data, e.g., the first observation of x is NA • BUGS error: undefined node • Wrong variable names, e.g., the data include x but the code says x3 • BUGS error: undefined variable • Leaving the data file open
BUGS problems • Subscripting wrong • i should be j or vice versa • n should be J • Initial values • The inits can generate values out of bounds (e.g., a uniform that is strictly negative) • BUGS error: "value of uniform sigma.a must be less than upper bound"
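One way to guard against that, sketched below in R, is to write the inits function so it only draws values inside the priors' bounds; the parameter names (a, b, sigma.y, sigma.a) and the group count J are hypothetical and should match the nodes your model actually declares.

  # Sketch of an inits function that respects the priors' bounds.
  # Parameter names and J are hypothetical; match them to your model.
  inits <- function() {
    list(a = rnorm(J),               # one intercept per group
         b = rnorm(1),
         sigma.y = runif(1, 0, 10),  # strictly positive, inside dunif(0, 100)
         sigma.a = runif(1, 0, 10))  # same for the group-level sd
  }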
BUGS problems • Nonsense results • Remember that the second argument in the normal definition is the precision, not the variance • Allowing the variance to vary as a function of the individual observations, e.g., y[i] ~ dnorm(y.hat[i], tau.y[i]) when you meant a constant tau.y • List the data to make sure it is what you think it is
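That last check is easy to do from R before the data ever reach BUGS; a minimal sketch, assuming a hypothetical data list built from y, x, and county:

  # Hypothetical data list about to be passed to BUGS
  data.list <- list(y = y, x = x, county = county,
                    n = length(y), n.county = max(county))

  str(data.list)                                # dimensions and types of every element
  sapply(data.list, function(z) sum(is.na(z)))  # count of missing values in each element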
General Solution • Go from an easy model to the hard one • Change the code line by line to make sure it still works • If all else fails, start over
Speeding convergence • Thinning • The option in bugs() is n.thin • It sets the number of iterations between saves • Don't save every iteration to memory • Cuts down on the autocorrelation in the saved draws • Helps with convergence
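A sketch of setting it, assuming the R2WinBUGS package and its bugs() function, with hypothetical data, inits, parameter names, and model file:

  library(R2WinBUGS)

  # Keep every 10th draw: 10,000 iterations with a 5,000-iteration burn-in
  # leaves 500 saved draws per chain, with much lower autocorrelation.
  fit <- bugs(data = data.list, inits = inits,
              parameters.to.save = c("a", "b", "sigma.y", "sigma.a"),
              model.file = "radon.bug",
              n.chains = 3, n.iter = 10000, n.burnin = 5000, n.thin = 10)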
Convergence • Center the data • Cuts down on the correlation between the parameters • Speeds up convergence • The individual draws come from the conditional distributions; if the parameters are highly correlated, each draw is tightly constrained by the current values of the other parameters, so the sampler moves slowly
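A minimal R sketch of centering a predictor before it goes into the data list (variable names hypothetical):

  # Center the level-one predictor: its coefficient becomes the effect at the
  # mean of x rather than at x = 0, and it is far less correlated with the intercept.
  x.c <- x - mean(x)
  data.list <- list(y = y, x = x.c, county = county,
                    n = length(y), n.county = max(county))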
Redundant parameters • The idea is to add parameters that are perfectly collinear with the problematic ones. • For instance, the group-level means are perfectly collinear with the overall mean.
Redundant parameters • Example:
Redundant parameters

  model {
    for (i in 1:n){
      y[i] ~ dnorm(y.hat[i], tau.y)
      y.hat[i] <- mu + eta[county[i]]
    }
    mu ~ dnorm(0, .0001)
    tau.y <- pow(sigma.y, -2)
    sigma.y ~ dunif(0, 100)
    for (j in 1:n.county){
      eta[j] ~ dnorm(0, tau.eta)
    }
    tau.eta <- pow(sigma.eta, -2)
    sigma.eta ~ dunif(0, 100)
  }
Redundant parameters • The problem is that the eta estimates can get stuck some distance from zero. • The redundant parameterization can converge faster. • Change the loop:

  mu.adj <- mu + mean(eta[])
  for (j in 1:n.county){
    eta[j] ~ dnorm(mu.eta, tau.eta)
    eta.adj[j] <- eta[j] - mean(eta[])
  }
  mu.eta ~ dnorm(0, .0001)
You need initial values for all of the parameters, including the redundant ones • You only need to monitor the adjusted parameters • These are easier to estimate and work better
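In practice that just means listing the adjusted nodes in parameters.to.save; a sketch, reusing the hypothetical R2WinBUGS call from above with a hypothetical model file redundant.bug:

  # Initial values are needed for every parameter, including the redundant mu.eta
  inits <- function() {
    list(mu = rnorm(1), mu.eta = rnorm(1), eta = rnorm(n.county),
         sigma.y = runif(1, 0, 10), sigma.eta = runif(1, 0, 10))
  }

  # Monitor only the adjusted quantities
  fit <- bugs(data = data.list, inits = inits,
              parameters.to.save = c("mu.adj", "eta.adj", "sigma.y", "sigma.eta"),
              model.file = "redundant.bug",
              n.chains = 3, n.iter = 10000, n.burnin = 5000, n.thin = 10)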
Summarizing results • In classical models we report: • Coefficients • Standard errors • Diagnostics (maybe) • A graph of interesting results • We make inferences by: • Specifying a null hypothesis • Looking at some test statistic • Using a knife-edge test (p < 0.05)
Multilevel models • These models create difficulties in this framework • Too many parameters • We don't always care about all of them • Do you care about the random differences in the intercepts across groups? • If we are estimating a Bayesian model, inference is different. • There is both uncertainty and variability in the estimates
Which parameters to present? • Which ones do you care about? • Generally, I don't care about the higher-level random parameters. • I care about the gammas: these test hypotheses. • I don't care about the etas; these are there to get the specification right. • We generally don't care about proper names (the estimate attached to a particular named group).
Example: Steenbergen and Jones • Steenbergen and Jones (2002), an introduction to HLM in the AJPS • Model: • Data: Eurobarometer surveys from 15 countries in 1996 • DV is support for the EU • Individual-level IVs: low income, high income, ideology, opinion leadership, male, age • Macro-level IVs: party cue, tenure in the EU, trade • Note: two levels of macro, party and nation • Three sources of error: individual, party, and country • Three subscripts: individual, party, country • Subscripting increases alphabetically as aggregation increases
Presentation • Note a couple of things • It is almost indistinguishable from ordinary regression results • Variance components are set aside • Subheadings could have been added to separate the effects into different levels • No party- or country-level effects are presented (or probably even calculated) • This is MUCH better than G&H's style
Random slope • Mean • Variance • Covariance with the random intercept
Uncertainty versus variability • Even if we take this approach, there is something different here. • There are two "variance" estimates in the model: • The standard error of the gamma term (the level-two effects) • The variance of the level-one effect across groups (tau) • These are different
What are they? • The standard error is the uncertainty we have in the estimate of the mean • The variance term is how much the “average” effect varies across the groups • One is a feature of the probabilistic enterprise of statistics • The second is the variability inherent in the world
Huh? • Keep in mind the difference from OLS • In OLS we assume the taus are zero. • OLS is a special case of HLM • We still get standard errors • They are the variance estimates for our coefficients • Adding the random terms says that there is something about the groups (aside from the z's) that explains either y (random intercept) or the link between x and y (random slope)
Superpopulation and finite-population variance • It gets worse • There are two ways to think about the group-level random terms • Superpopulation: the variance of the probability distribution the terms are assumed to be drawn from; this is the uncertainty we have about the value of a NEW group
Finite population: the variances of the estimated error terms themselves • The distinction is similar to the standard deviation/standard error distinction, or the difference between sigma squared and s squared in regression
Superpopulation variances are the ones reported by R; they are the variances of the probability distribution that we estimate • Finite-population variances are smaller; the equations are in G&H and I don't really want to deal with them. I haven't seen any examples of people who care about them. • But the distinction is important. These superpopulation estimates are what we are talking about: the broader probability distribution of our coefficients.
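For what it's worth, both quantities are easy to compute from the posterior draws; a sketch assuming a hypothetical R2WinBUGS fit object whose sims.list has components sigma.eta and eta:

  # Superpopulation sd: the posterior draws of sigma.eta itself
  superpop.sd <- fit$sims.list$sigma.eta

  # Finite-population sd: the sd of the J estimated group effects, draw by draw
  finite.sd <- apply(fit$sims.list$eta, 1, sd)

  mean(superpop.sd)  # typically larger
  mean(finite.sd)    # typically smaller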
Hypothesis testing • The key is to keep in mind what the null hypothesis is. • Do variables matter? • This is easy: these are the individual coefficients. • Ignore the random stuff for a minute. • We are specifying linear, additive, and maybe interactive models; in HLM, these are the gammas. • This is the irony: theory is all about the gammas, while the difficulty is all about the taus.
Bayesian inference • With BUGS we are living in a different world. • Likelihood approach: calculate beta, get the standard error, compare the ratio to a z table. • But think about what you are doing: the coefficients follow some distribution that you are summarizing. • Bayesian inference is different. • Parameter estimates are not assumed to follow any particular sampling distribution. • Use the Monte Carlo draws from the posterior.
Bayesian inference • The "null hypothesis" is still basically the same • The effect is zero, compared to a largely positive or negative effect • What does this mean? That the samples drawn from the posterior will contain zero, or values below (above) it, with some regularity. • Because Bayesian inference does not rely on the same probabilistic theory, you can't really have the same knife-edge inference
Bayesian inference • What you see is how much of the posterior sample is larger (smaller) than zero. If this is a sizable proportion, then the effect is "significant." If it isn't a sizable proportion, then it isn't significant. • My experience is that readers still like the classical framework.
Bayesian inference • What do you present? • For each of the effects: • Mean • Standard deviation • Confidence (credible) region • The BUGS output gives us all of that • Maybe how much of the sample is greater or less than zero • A histogram of the posterior sample might be nice
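A sketch of pulling those summaries out of a hypothetical R2WinBUGS fit object, for a hypothetical coefficient named gamma.trade:

  draws <- fit$sims.list$gamma.trade   # posterior sample for one coefficient

  mean(draws)                          # point estimate
  sd(draws)                            # posterior standard deviation
  quantile(draws, c(0.025, 0.975))     # 95% credible region
  mean(draws > 0)                      # share of the posterior above zero
  hist(draws, main = "Posterior of gamma.trade")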
Bayesian Inference • See handout
This is also why we don't really care about the group-level effects: they aren't zero.
Complex hypothesis tests • Deviance tests • Note that the BUGS output gives us the DIC • Smaller DIC is better • The output even reminds us of that • We can compare nested models this way
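With R2WinBUGS, the DIC is stored on the fit object when the call uses DIC = TRUE; a sketch comparing two hypothetical fits:

  # Both models fit with bugs(..., DIC = TRUE)
  fit1$DIC   # e.g., random-intercept model
  fit2$DIC   # e.g., random-slope model

  fit1$DIC - fit2$DIC  # a positive difference favors the second (smaller-DIC) model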
So, do I need HLM? • The basic strategy is to start easy and work toward hard. • Run lm/glm • That is, assume that the group-level errors are all zero. • Relax this assumption and estimate a likelihood-based HLM. • See how much the results change and whether the deviance tests change • Start with a random intercept and work up
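A sketch of that progression in R, with a hypothetical data frame d containing y, x, and a grouping variable county:

  library(lme4)

  fit.ols <- lm(y ~ x, data = d)                   # group errors assumed to be zero
  fit.ri  <- lmer(y ~ x + (1 | county), data = d)  # random intercept by county

  summary(fit.ols)
  summary(fit.ri)  # compare the coefficients and look at the variance components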
So, do I need HLM? • Add parameters and complexity as needed. • Note the difference between a "random slope" model and an interactive one • "Random" implies a probability distribution • It is always a good idea to run as complex a model as you can, to satisfy yourself that it does not do better.
MLE vs. Bayesian • When would I use lmer? • When the model is pretty easy • N and J are large • Not a lot of random terms • Not much covariance between them
Other issues • There aren't great fit statistics • There is an R-squared, but that strikes me as silly • Subscripting • It matters when you write out the equation • It doesn't matter much to lmer when you are specifying the coefficients (you don't need to specify it in the formula) • It does matter a lot to lmer when you are specifying the random terms. In a term like (1 | variable) or (x | variable), the lower-level variable whose effect varies goes before the |, and the upper-level grouping variable goes after it. The order matters.
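A sketch of the distinction in lme4 terms, using hypothetical variable names loosely based on the Steenbergen and Jones example (support, ideology, trade, country, in a data frame euro):

  library(lme4)

  # Random intercept: country-level shifts in the baseline level of support
  m1 <- lmer(support ~ ideology + (1 | country), data = euro)

  # Random slope: the lower-level variable (ideology) goes before the |,
  # the upper-level grouping variable (country) goes after it
  m2 <- lmer(support ~ ideology + (1 + ideology | country), data = euro)

  # Contrast with a purely interactive, fixed-effect specification
  m3 <- lm(support ~ ideology * trade, data = euro)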