Debuggin'
Now comes the hard part • If the model is of moderate difficulty, we may run into problems in estimation. • Even if the model is easy, the BUGS code can be tricky. • So, what do you do?
BUGS is a pain • Most of the time we can execute programs line by line. • In R or Stata we can see whether specific lines are creating problems, and the errors we see are often informative. • We can also see what is available in memory or the workspace. • You can, for instance, check the dimensionality of objects.
BUGS is a pain • This doesn't work in BUGS. • BUGS either works as a program, or it doesn't. • There are a couple of different sources of errors: • Syntax • Compiling • Syntax errors will surface early, and BUGS will tell you so.
BUGS problems • Model not compiling • Too much information in the distribution argument, e.g., defining the mean of y.hat inside the distribution statement • BUGS error: "expected a comma" • An added space, e.g., pow (sigma, -2) • BUGS error: "invalid or unexpected token scanned"
BUGS problems • Parameter not defined • The mu.a definition is removed • BUGS error: "made use of undefined node mu.a" • Multiply defined parameters • y.hat <- a[county[i]] + b*x[i] should be y.hat[i] • BUGS error: "empty slot not allowed in variable name"
BUGS problems • Data problems • Missing data, e.g., the first observation of x is NA • BUGS error: undefined node • Wrong variable names, e.g., the data include x but the code says x3 • BUGS error: undefined variable • Leaving the data file open
BUGS problems • Subscripting wrong • i should be j or vice versa • n should be J • Initial values • The inits can generate values out of bounds (e.g., a uniform that is strictly negative) • BUGS error: "value of uniform sigma.a must be less than upper bound"
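One way to guard against that, sketched below in R, is to write the inits function so it only draws values inside the priors' bounds; the parameter names (a, b, sigma.y, sigma.a) and the group count J are hypothetical and should match the nodes your model actually declares.

  # Sketch of an inits function that respects the priors' bounds.
  # Parameter names and J are hypothetical; match them to your model.
  inits <- function() {
    list(a = rnorm(J),               # one intercept per group
         b = rnorm(1),
         sigma.y = runif(1, 0, 10),  # strictly positive, inside dunif(0, 100)
         sigma.a = runif(1, 0, 10))  # same for the group-level sd
  }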
BUGS problems • Nonsense results • Remember that the second argument in the normal definition is the precision, not the variance • Allowing the variance to vary as a function of the individual observations, e.g., y[i] ~ dnorm(y.hat[i], tau.y[i]) when you meant a constant tau.y • List the data to make sure it is what you think it is
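That last check is easy to do from R before the data ever reach BUGS; a minimal sketch, assuming a hypothetical data list built from y, x, and county:

  # Hypothetical data list about to be passed to BUGS
  data.list <- list(y = y, x = x, county = county,
                    n = length(y), n.county = max(county))

  str(data.list)                                # dimensions and types of every element
  sapply(data.list, function(z) sum(is.na(z)))  # count of missing values in each element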
General Solution • Go from an easy model to the hard one • Change the code line by line to make sure it still works • If all else fails, start over
Speeding convergence • Thinning • The option in bugs() is n.thin • It sets the number of iterations between saves • Don't save every iteration to memory • Cuts down on the autocorrelation in the saved draws • Helps with convergence
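A sketch of setting it, assuming the R2WinBUGS package and its bugs() function, with hypothetical data, inits, parameter names, and model file:

  library(R2WinBUGS)

  # Keep every 10th draw: 10,000 iterations with a 5,000-iteration burn-in
  # leaves 500 saved draws per chain, with much lower autocorrelation.
  fit <- bugs(data = data.list, inits = inits,
              parameters.to.save = c("a", "b", "sigma.y", "sigma.a"),
              model.file = "radon.bug",
              n.chains = 3, n.iter = 10000, n.burnin = 5000, n.thin = 10)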
Convergence • Center the data • Cuts down on the correlation between the parameters • Speeds up convergence • The individual draws come from the conditional distributions; if the parameters are highly correlated, each draw is tightly constrained by the current values of the other parameters, so the sampler moves slowly
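A minimal R sketch of centering a predictor before it goes into the data list (variable names hypothetical):

  # Center the level-one predictor: its coefficient becomes the effect at the
  # mean of x rather than at x = 0, and it is far less correlated with the intercept.
  x.c <- x - mean(x)
  data.list <- list(y = y, x = x.c, county = county,
                    n = length(y), n.county = max(county))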
Redundant parameters • The idea is to add parameters that are perfectly collinear with the problematic ones. • For instance, the group-level means are perfectly collinear with the overall mean.
Redundant parameters • Example:
Redundant parameters

  model {
    for (i in 1:n){
      y[i] ~ dnorm(y.hat[i], tau.y)
      y.hat[i] <- mu + eta[county[i]]
    }
    mu ~ dnorm(0, .0001)
    tau.y <- pow(sigma.y, -2)
    sigma.y ~ dunif(0, 100)
    for (j in 1:n.county){
      eta[j] ~ dnorm(0, tau.eta)
    }
    tau.eta <- pow(sigma.eta, -2)
    sigma.eta ~ dunif(0, 100)
  }
Redundant parameters • The problem is that the eta estimates can get stuck some distance from zero. • The redundant parameterization can converge faster. • Change the loop:

  mu.adj <- mu + mean(eta[])
  for (j in 1:n.county){
    eta[j] ~ dnorm(mu.eta, tau.eta)
    eta.adj[j] <- eta[j] - mean(eta[])
  }
  mu.eta ~ dnorm(0, .0001)
You need initial values for all of the parameters, including the redundant ones • You only need to monitor the adjusted parameters • These are easier to estimate and work better
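In practice that just means listing the adjusted nodes in parameters.to.save; a sketch, reusing the hypothetical R2WinBUGS call from above with a hypothetical model file redundant.bug:

  # Initial values are needed for every parameter, including the redundant mu.eta
  inits <- function() {
    list(mu = rnorm(1), mu.eta = rnorm(1), eta = rnorm(n.county),
         sigma.y = runif(1, 0, 10), sigma.eta = runif(1, 0, 10))
  }

  # Monitor only the adjusted quantities
  fit <- bugs(data = data.list, inits = inits,
              parameters.to.save = c("mu.adj", "eta.adj", "sigma.y", "sigma.eta"),
              model.file = "redundant.bug",
              n.chains = 3, n.iter = 10000, n.burnin = 5000, n.thin = 10)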
Summarizing results • In classical models we report: • Coefficients • Standard errors • Diagnostics (maybe) • A graph of interesting results • We make inferences by: • Specifying a null hypothesis • Looking at some test statistic • Using a knife-edge test (p < 0.05)
Multilevel models • These models create difficulties in this framework • Too many parameters • We don't always care about all of them • Do you care about the random differences in the intercepts across groups? • If we are estimating a Bayesian model, inference is different. • There is both uncertainty and variability in the estimates
Which parameters to present? • Which ones do you care about? • Generally, I don't care about the higher-level random parameters. • I care about the gammas: these test hypotheses. • I don't care about the etas; these are there to get the specification right. • We generally don't care about proper names (the estimate attached to a particular named group).
Example: Steenbergen and Jones • Steenbergen and Jones (2002), an introduction to HLM in the AJPS • Model: • Data: Eurobarometer surveys from 15 countries in 1996 • DV is support for the EU • Individual-level IVs: low income, high income, ideology, opinion leadership, male, age • Macro-level IVs: party cue, tenure in the EU, trade • Note: two levels of macro, party and nation • Three sources of error: individual, party, and country • Three subscripts: individual, party, country • Subscripting increases alphabetically as aggregation increases
Presentation • Note a couple of things • It is almost indistinguishable from ordinary regression results • Variance components are set aside • Subheadings could have been added to separate the effects into different levels • No party- or country-level effects are presented (or probably even calculated) • This is MUCH better than G&H's style
Random slope • Mean • Variance • Covariance with the random intercept
Uncertainty versus variability • Even if we take this approach, there is something different here. • There are two "variance" estimates in the model: • The standard error of the gamma term (the level-two effects) • The variance of the level-one effect across groups (tau) • These are different
What are they? • The standard error is the uncertainty we have in the estimate of the mean • The variance term is how much the “average” effect varies across the groups • One is a feature of the probabilistic enterprise of statistics • The second is the variability inherent in the world
Huh? • Keep in mind the difference from OLS • In OLS we assume the taus are zero. • OLS is a special case of HLM • We still get standard errors • They are the variance estimates for our coefficients • Adding the random terms says that there is something about the groups (aside from the z's) that explains either y (random intercept) or the link between x and y (random slope)
Superpopulation and finite-population variance • It gets worse • There are two ways to think about the group-level random terms • Superpopulation: the variance of the probability distribution the terms are assumed to be drawn from; this is the uncertainty we have about the value of a NEW group
Finite population: the variances of the estimated error terms themselves • The distinction is similar to the standard deviation/standard error distinction, or the difference between sigma squared and s squared in regression
Superpopulation variances are the ones reported by R; they are the variances of the probability distribution that we estimate • Finite-population variances are smaller; the equations are in G&H and I don't really want to deal with them. I haven't seen any examples of people who care about them. • But the distinction is important. These superpopulation estimates are what we are talking about: the broader probability distribution of our coefficients.
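For what it's worth, both quantities are easy to compute from the posterior draws; a sketch assuming a hypothetical R2WinBUGS fit object whose sims.list has components sigma.eta and eta:

  # Superpopulation sd: the posterior draws of sigma.eta itself
  superpop.sd <- fit$sims.list$sigma.eta

  # Finite-population sd: the sd of the J estimated group effects, draw by draw
  finite.sd <- apply(fit$sims.list$eta, 1, sd)

  mean(superpop.sd)  # typically larger
  mean(finite.sd)    # typically smaller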
Hypothesis testing • The key is to keep in mind what the null hypothesis is. • Do variables matter? • This is easy: these are the individual coefficients. • Ignore the random stuff for a minute. • We are specifying linear, additive, and maybe interactive models; in HLM, these are the gammas. • This is the irony: theory is all about the gammas, while the difficulty is all about the taus.
Bayesian inference • With BUGS we are living in a different world. • Likelihood approach: calculate beta, get the standard error, compare the ratio to a z table. • But think about what you are doing: the coefficients follow some distribution that you are summarizing. • Bayesian inference is different. • Parameter estimates are not assumed to follow any particular sampling distribution. • Use the Monte Carlo draws from the posterior.
Bayesian inference • The "null hypothesis" is still basically the same • The effect is zero, compared to a largely positive or negative effect • What does this mean? That the samples drawn from the posterior will contain zero, or values below (above) it, with some regularity. • Because Bayesian inference does not rely on the same probabilistic theory, you can't really have the same knife-edge inference
Bayesian inference • What you see is how much of the posterior sample is larger (smaller) than zero. If this is a sizable proportion, then the effect is "significant." If it isn't a sizable proportion, then it isn't significant. • My experience is that readers still like the classical framework.
Bayesian inference • What do you present? • For each of the effects: • Mean • Standard deviation • Confidence (credible) region • The BUGS output gives us all of that • Maybe how much of the sample is greater or less than zero • A histogram of the posterior sample might be nice
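A sketch of pulling those summaries out of a hypothetical R2WinBUGS fit object, for a hypothetical coefficient named gamma.trade:

  draws <- fit$sims.list$gamma.trade   # posterior sample for one coefficient

  mean(draws)                          # point estimate
  sd(draws)                            # posterior standard deviation
  quantile(draws, c(0.025, 0.975))     # 95% credible region
  mean(draws > 0)                      # share of the posterior above zero
  hist(draws, main = "Posterior of gamma.trade")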
Bayesian Inference • See handout
This is also why we don't really care about the group-level effects: they aren't zero.
Complex hypothesis tests • Deviance tests • Note that the BUGS output gives us the DIC • Smaller DIC is better • The output even reminds us of that • We can compare nested models this way
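With R2WinBUGS, the DIC is stored on the fit object when the call uses DIC = TRUE; a sketch comparing two hypothetical fits:

  # Both models fit with bugs(..., DIC = TRUE)
  fit1$DIC   # e.g., random-intercept model
  fit2$DIC   # e.g., random-slope model

  fit1$DIC - fit2$DIC  # a positive difference favors the second (smaller-DIC) model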
So, do I need HLM? • The basic strategy is to start easy and work toward hard. • Run lm/glm • That is, assume that the group-level errors are all zero. • Relax this assumption and estimate a likelihood-based HLM. • See how much the results change and whether the deviance tests change • Start with a random intercept and work up
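A sketch of that progression in R, with a hypothetical data frame d containing y, x, and a grouping variable county:

  library(lme4)

  fit.ols <- lm(y ~ x, data = d)                   # group errors assumed to be zero
  fit.ri  <- lmer(y ~ x + (1 | county), data = d)  # random intercept by county

  summary(fit.ols)
  summary(fit.ri)  # compare the coefficients and look at the variance components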
So, do I need HLM? • Add parameters and complexity as needed. • Note the difference between a "random slope" model and an interactive one • "Random" implies a probability distribution • It is always a good idea to run as complex a model as you can, to satisfy yourself that it does not do better.
MLE vs. Bayesian • When would I use lmer? • When the model is pretty easy • N and J are large • Not a lot of random terms • Not much covariance between them
Other issues • There aren't great fit statistics • There is an R-squared, but that strikes me as silly • Subscripting • It matters when you write out the equation • It doesn't matter much to lmer when you are specifying the coefficients (you don't need to specify it in the formula) • It does matter a lot to lmer when you are specifying the random terms. In a term like (1 | variable) or (x | variable), the lower-level variable whose effect varies goes before the |, and the upper-level grouping variable goes after it. The order matters.
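A sketch of the distinction in lme4 terms, using hypothetical variable names loosely based on the Steenbergen and Jones example (support, ideology, trade, country, in a data frame euro):

  library(lme4)

  # Random intercept: country-level shifts in the baseline level of support
  m1 <- lmer(support ~ ideology + (1 | country), data = euro)

  # Random slope: the lower-level variable (ideology) goes before the |,
  # the upper-level grouping variable (country) goes after it
  m2 <- lmer(support ~ ideology + (1 + ideology | country), data = euro)

  # Contrast with a purely interactive, fixed-effect specification
  m3 <- lm(support ~ ideology * trade, data = euro)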