Lecture 6: Your data and models are never perfect… Making choices in research design and analysis that you can defend
Designing your study: Tradeoffs are everywhere
• Sampling design and scope of inference
• Tradeoffs between randomization vs. stratification
• Allocating sampling effort
• Tradeoffs between sample size and measurement error
• Sample size and model complexity
• Replication and independence
• Spatial autocorrelation
• Collinearity in your data, and parameter tradeoffs in your models
Classical Sampling Theory
• Randomization vs. Stratification
• Randomization: unbiased inference about populations
• Stratification: parameterization of robust, predictive models (a simulated comparison follows below)
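A minimal sketch of the tradeoff, not from the lecture: the population, strata, and sample sizes below are invented for illustration. Both designs give (approximately) unbiased estimates of the population mean, but when the strata really do differ, proportional stratified sampling removes the between-stratum variation from the sampling error.

```python
# A minimal sketch (hypothetical example) contrasting simple random sampling
# with proportional stratified sampling for estimating a population mean.
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical population: two habitat strata with different mean densities
strata = [rng.normal(10, 2, 7000), rng.normal(25, 2, 3000)]
population = np.concatenate(strata)
n = 100  # total sampling effort

def srs_estimate():
    # Simple random sample: unbiased, but the between-stratum variation
    # ends up in the sampling error
    return rng.choice(population, n, replace=False).mean()

def stratified_estimate():
    # Proportional allocation: sample each stratum in proportion to its size,
    # then weight the stratum means by the known stratum proportions
    weights = [len(s) / len(population) for s in strata]
    return sum(w * rng.choice(s, int(round(n * w)), replace=False).mean()
               for w, s in zip(weights, strata))

srs = [srs_estimate() for _ in range(2000)]
strat = [stratified_estimate() for _ in range(2000)]
print(f"SRS:        mean {np.mean(srs):.2f}, SD {np.std(srs):.2f}")
print(f"Stratified: mean {np.mean(strat):.2f}, SD {np.std(strat):.2f}")
```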
Allocation of Sampling Effort and Inference
• Scope of scientific inference: Is ecology a science of case studies, with no formal scope of inference?
• "Strength of evidence" for one hypothesis (relative to others): at the end of the day, is this all you can ever really hope to assess from your results?
• Remember: to a likelihoodist, all of the information relevant to inference about a hypothesis is contained within the data.
Allocating effort to precision vs. replication: The benefits of large sample sizes…
• Signal vs. noise: If you can see the signal, do you care how much noise there is? Understanding can embrace uncertainty, but prediction loves precision.
• Why do we love a high R² (and why don't statisticians share our preoccupation with goodness of fit)?
Sample size and model complexity
• How many parameters in a model can your data support? How do you know if your model is "overspecified"?
• Minimum # of observations per parameter: What's your comfort zone? (mine is shrinking over time…)
• How many parameters should your model contain? If parsimony is a core principle of science, don't we have to accept a certain level of uncertainty? You can always add more terms to a model to increase R², but at what cost to generality? (see the sketch below)
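As a concrete illustration of the parsimony point (a sketch of my own, not part of the lecture, using simulated data), the code below adds pure-noise predictors to a simple regression: R² creeps upward with every added term, while an information criterion such as AIC penalizes the extra parameters.

```python
# A minimal sketch (simulated data) of the cost of adding parameters:
# R^2 never goes down as junk predictors are added, but AIC penalizes them.
import numpy as np

rng = np.random.default_rng(2)
n = 40
x = rng.normal(size=n)
y = 2.0 * x + rng.normal(size=n)          # true model: one predictor
junk = rng.normal(size=(n, 8))            # noise predictors

for k in range(0, 9):
    X = np.column_stack([np.ones(n), x, junk[:, :k]])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    rss = resid @ resid
    r2 = 1 - rss / ((y - y.mean()) @ (y - y.mean()))
    p = X.shape[1] + 1                    # coefficients + error variance
    aic = n * np.log(rss / n) + 2 * p     # Gaussian-likelihood AIC (up to a constant)
    print(f"{k} junk terms: R^2 = {r2:.3f}, AIC = {aic:.1f}")
```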
Independence of Observations vs. Residuals
• Definition of "independence": two events A and B are independent if P(A,B) = P(A)P(B). But if you don't know P(A) and P(B), how do you check whether P(A,B) = P(A)P(B)? (a toy check follows below)
• Why is independence important?
• But what needs to be independent? The errors, not the observations!
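A toy illustration of the definition (my own, not from the lecture): with simulated events we can estimate P(A), P(B), and P(A,B) directly from the data and check the product rule; with real residuals we rely on diagnostics (e.g. correlograms) rather than this textbook identity.

```python
# A toy check of the definition of independence, using simulated events
# so that P(A) and P(B) can be estimated from the data themselves.
import numpy as np

rng = np.random.default_rng(3)
N = 100_000
A = rng.random(N) < 0.3        # event A with P(A) = 0.3
B = rng.random(N) < 0.5        # event B with P(B) = 0.5, generated independently of A

pA, pB = A.mean(), B.mean()
pAB = (A & B).mean()
print(f"P(A)P(B) = {pA * pB:.3f}, P(A,B) = {pAB:.3f}")  # should agree up to sampling error
```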
The bugaboo of spatial autocorrelation
One of the most misapplied statements in ecology: "…In such a case, because the value at any one locality can be at least partly predicted by the values at neighboring points, these values are not stochastically independent from one another." (Legendre, P. 1993. Spatial autocorrelation: trouble or new paradigm? Ecology 74:1659–1673)
But does spatial autocorrelation in observed values necessarily imply lack of independence of the residuals?
What needs to be independent? The errors, not the observations!
If your observations are spatially autocorrelated because they share similar values of their independent variables, this does not necessarily violate the assumption that the errors are independent… (see the simulation below)
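Here is a small simulation of exactly this situation (my own sketch, not from the lecture; all names and values are illustrative): the observations inherit spatial structure from a smooth environmental gradient and are strongly autocorrelated, but once that predictor is in the model the residuals show essentially no autocorrelation. Moran's I is coded by hand with inverse-distance weights.

```python
# Observations that are spatially autocorrelated only because they share a
# spatially structured predictor, while the regression residuals stay independent.
import numpy as np

rng = np.random.default_rng(4)
n = 200
coords = rng.uniform(0, 100, size=(n, 2))          # random sample locations

# Spatially structured predictor (a smooth gradient), plus independent errors
elevation = np.sin(coords[:, 0] / 20) + np.cos(coords[:, 1] / 25)
y = 3.0 * elevation + rng.normal(0, 0.5, n)        # observations inherit the gradient

def morans_i(z, coords):
    # Global Moran's I with inverse-distance weights (zero on the diagonal)
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=2)
    w = 1.0 / np.where(d > 0, d, np.inf)
    z = z - z.mean()
    return (len(z) / w.sum()) * (z @ w @ z) / (z @ z)

# Fit the model that includes the spatially structured predictor
X = np.column_stack([np.ones(n), elevation])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta

print(f"Moran's I, observations: {morans_i(y, coords):.3f}")      # expected clearly positive
print(f"Moran's I, residuals:    {morans_i(resid, coords):.3f}")  # expected near -1/(n-1), i.e. ~0
```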
[Figures: spatial autocorrelation of seedling density in a New Zealand temperate rainforest, and spatial autocorrelation in the residuals of an inverse model predicting seedling density as a function of adult tree distribution]
Consequences of spatial autocorrelation
• What are the statistical consequences of spatial autocorrelation?
• To a frequentist, the consequences are quite serious: the degrees of freedom for test statistics are inflated relative to the effective (independent) sample size
• To a likelihoodist, the issue is simply one of identifying any bias in parameter estimation: as long as there are no demons involved, the bias is generally restricted to an underestimate of variance terms (illustrated below)
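The variance underestimation is easy to demonstrate by simulation (my own sketch, with made-up AR(1) errors along a transect): the slope estimate stays roughly unbiased, but the nominal OLS standard error is clearly smaller than the actual spread of the estimates.

```python
# With positively autocorrelated errors, the nominal OLS standard error is too
# small relative to the true sampling variability of the slope estimate.
import numpy as np

rng = np.random.default_rng(5)
n, rho, n_sims = 100, 0.8, 2000
x = np.linspace(0, 1, n)
X = np.column_stack([np.ones(n), x])

slopes, nominal_se = [], []
for _ in range(n_sims):
    e = np.zeros(n)
    for t in range(1, n):                    # AR(1) errors along the transect
        e[t] = rho * e[t - 1] + rng.normal(0, 1)
    y = 2.0 + 1.5 * x + e
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    s2 = resid @ resid / (n - 2)
    cov = s2 * np.linalg.inv(X.T @ X)        # OLS covariance, assumes independent errors
    slopes.append(beta[1])
    nominal_se.append(np.sqrt(cov[1, 1]))

print(f"Mean nominal SE of slope: {np.mean(nominal_se):.2f}")
print(f"Actual SD of slope:       {np.std(slopes):.2f}")   # noticeably larger
```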
Dealing with Autocorrelation
• Frequentists: a plethora of gyrations – quasi-likelihood, variance inflation factors, Mantel tests, and a variety of adjustments of degrees of freedom
• Likelihoodists:
  • Recognize that the variance is under-estimated and move on
  • Model the spatial autocorrelation in the error term explicitly (a maximum likelihood sketch follows below)
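A sketch of the second likelihoodist option, not from the lecture: the exponential correlation model, the simulated data, and all function names are my own. The spatial range parameter is estimated by maximum likelihood, profiling out the regression coefficients and the error variance.

```python
# Model the spatial autocorrelation in the error term explicitly, here with an
# exponential correlation function fit by maximum likelihood (profile over the range).
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.spatial.distance import cdist

rng = np.random.default_rng(6)
n = 120
coords = rng.uniform(0, 50, size=(n, 2))
d = cdist(coords, coords)

# Simulate spatially correlated errors with an exponential covariance
true_range, sigma2 = 10.0, 1.0
L = np.linalg.cholesky(sigma2 * np.exp(-d / true_range) + 1e-8 * np.eye(n))
x = rng.normal(size=n)
y = 1.0 + 2.0 * x + L @ rng.normal(size=n)
X = np.column_stack([np.ones(n), x])

def neg_log_lik(range_par):
    # Given the correlation matrix R, beta and sigma^2 have closed-form GLS/ML
    # estimates, so only the range parameter needs numerical optimization.
    R = np.exp(-d / range_par) + 1e-8 * np.eye(n)
    Ri = np.linalg.inv(R)
    beta = np.linalg.solve(X.T @ Ri @ X, X.T @ Ri @ y)
    r = y - X @ beta
    s2 = (r @ Ri @ r) / n
    _, logdet = np.linalg.slogdet(R)
    return 0.5 * (n * np.log(s2) + logdet + n)   # up to an additive constant

fit = minimize_scalar(neg_log_lik, bounds=(0.1, 50.0), method="bounded")
print(f"Estimated range parameter: {fit.x:.1f} (true value {true_range})")
```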
Collinearity in your data and parameter tradeoffs in your models
• Collinearity is probably just as common as autocorrelation, and just as often misinterpreted by reviewers! How much scatter do you need to separate the effects of two different independent variables?
• Identifying collinearity is easy (see below), but determining whether it is a problem generally depends on examining the model-fitting process…
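For the "identifying collinearity is easy" part, a minimal sketch of my own with simulated predictors: the pairwise correlation and a hand-rolled variance inflation factor flag the problem, while whether the fit actually suffers shows up in the parameter covariance examined on the next slide.

```python
# Flagging collinearity with a correlation coefficient and a variance inflation factor.
# All data are simulated; r controls how collinear the two predictors are.
import numpy as np

rng = np.random.default_rng(7)
n, r = 100, 0.95
x1 = rng.normal(size=n)
x2 = r * x1 + np.sqrt(1 - r**2) * rng.normal(size=n)   # strongly correlated with x1
y = 1.0 * x1 + 1.0 * x2 + rng.normal(size=n)

def vif(target, others):
    # VIF = 1 / (1 - R^2) from regressing one predictor on the others
    X = np.column_stack([np.ones(len(target))] + others)
    beta, *_ = np.linalg.lstsq(X, target, rcond=None)
    resid = target - X @ beta
    r2 = 1 - (resid @ resid) / ((target - target.mean()) @ (target - target.mean()))
    return 1 / (1 - r2)

print(f"Correlation(x1, x2): {np.corrcoef(x1, x2)[0, 1]:.2f}")
print(f"VIF for x1: {vif(x1, [x2]):.1f}")   # > 10 is a common (arbitrary) warning level
```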
Covariance and tradeoffs among model parameters
• Identifying parameter tradeoffs:
  • Invert the Hessian to get the parameter variance/covariance matrix (see the sketch below)
  • Examine the likelihood surface
• Parameter tradeoffs can be:
  • Structural (anytime there are multiplicative terms in your model, you should pay attention…)
  • Empirical (whenever there is very strong collinearity in a set of independent variables, there are likely to be tradeoffs and covariance among the parameters associated with those variables…)
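A sketch of the Hessian-inversion recipe (my own example, not from the lecture): fit a model with a multiplicative term by maximum likelihood, approximate the Hessian of the negative log-likelihood numerically at the MLE, invert it, and read off the parameter correlations. Because the model is y = a·exp(b·x) with Gaussian error, a structural tradeoff between a and b is expected.

```python
# Invert the Hessian of the negative log-likelihood at the MLE to approximate the
# parameter variance/covariance matrix, then look at the implied correlations.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(8)
x = rng.uniform(0, 2, 80)
y = 2.0 * np.exp(0.8 * x) + rng.normal(0, 0.5, 80)

def nll(theta):
    # Gaussian negative log-likelihood (up to a constant), sigma kept positive
    a, b, log_sigma = theta
    mu = a * np.exp(b * x)
    return 0.5 * np.sum(((y - mu) / np.exp(log_sigma)) ** 2) + len(y) * log_sigma

fit = minimize(nll, x0=[1.0, 0.5, 0.0], method="BFGS")

def numerical_hessian(f, theta, h=1e-4):
    # Central-difference approximation of the Hessian
    k = len(theta)
    H = np.zeros((k, k))
    for i in range(k):
        for j in range(k):
            tpp = theta.copy(); tpp[i] += h; tpp[j] += h
            tpm = theta.copy(); tpm[i] += h; tpm[j] -= h
            tmp = theta.copy(); tmp[i] -= h; tmp[j] += h
            tmm = theta.copy(); tmm[i] -= h; tmm[j] -= h
            H[i, j] = (f(tpp) - f(tpm) - f(tmp) + f(tmm)) / (4 * h**2)
    return H

cov = np.linalg.inv(numerical_hessian(nll, fit.x))   # asymptotic var/cov of the MLEs
sd = np.sqrt(np.diag(cov))
corr = cov / np.outer(sd, sd)
print("Parameter correlation matrix (a, b, log_sigma):")
print(np.round(corr, 2))   # expect a strong (negative) a-b correlation
```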