Sugar Cane Production in Puerto Rico, 1958/59-1973/74: A Comparison of Four Model Specifications for Describing Small Heterogeneous Space-Time Datasets
by Daniel A. Griffith, Ashbel Smith Professor of Geospatial Information Sciences
ABSTRACT
Researchers increasingly are accounting for heterogeneity in their empirical analyses. When data form a short time series (too short to utilize an ARIMA model), a random effect term can be employed to account for serial correlation. When data also are georeferenced, forming a space-time dataset, a spatially structured random effect term can be included in order to account for spatial autocorrelation, too. But space-time heterogeneity can be accounted for in various ways, including specifications involving recently developed spatial filtering methodology. This paper summarizes comparisons of four model specifications: simple pooled space-time; sequential, comparative statics; temporally varying coefficients with a spatially unstructured random effect; and temporally varying coefficients with a spatially structured random effect. Implementations are illustrated with annual sugar cane production data for the 73 municipalities of Puerto Rico during 1958/59-1973/74. Covariates whose importance is assessed include elevation and distance from the primate city.
Panel data versus space-time data
Panel data are a form of longitudinal data, and can be a cross-section (i.e., the spatial dimension) of individuals (e.g., farms) that are surveyed periodically over a given time horizon. With repeated observations of the same individuals, panel data permit a researcher to study the dynamics of change with short time series. A main advantage of panel data is control for unobserved heterogeneity (the fundamental complication of non-experimental data collection). But longitudinal data need not involve the same individuals: if the sample is not the same in each period, observed changes also may result from sampling error.
Spatial filtering
A given random variable can be decomposed into a spatial component and an aspatial component. Approaches include the impulse-response function approach (based upon the autoregressive model), the Getis approach (based on the K function), and the eigenfunction spatial filtering approach. The spatial component relates to spatial autocorrelation.
High Peak district biomass index: ratio of remotely sensed data spectral bands B3 and B4 [figure panels: spatially autocorrelated; geographically random]
Defining spatial autocorrelation
Auto: self
Correlation: degree of relative correspondence
Positive: similar values cluster together on a map
Negative: dissimilar values cluster together on a map
Spatial autocorrelation: from r to MC
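The equations for this slide did not survive extraction. As a hedged sketch of the usual textbook comparison (not necessarily the slide's own notation), the Pearson correlation r between two variables and the Moran Coefficient (MC) of one variable can be written in parallel form. The symbols $c_{ij}$ (entries of a geographic connectivity matrix), $x_i$, and $y_i$ are assumptions introduced here.

```latex
r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y}) / n}
         {\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2 / n}\;
          \sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2 / n}}
\qquad
MC = \frac{\sum_{i=1}^{n} \sum_{j=1}^{n} c_{ij} (y_i - \bar{y})(y_j - \bar{y})
           \big/ \sum_{i=1}^{n} \sum_{j=1}^{n} c_{ij}}
          {\sum_{i=1}^{n} (y_i - \bar{y})^2 / n}
```

In this reading, MC replaces the pairing of two variables at the same location with the pairing of one variable at neighboring locations, averaged over the number of neighbor pairs rather than over n.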
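Constructing eigenfunctions for filtering spatial autocorrelation out of georeferenced variables
Moran Coefficient:
$$MC = \frac{n}{\mathbf{1}^{T}\mathbf{C}\mathbf{1}} \times \frac{\mathbf{Y}^{T}(\mathbf{I} - \mathbf{1}\mathbf{1}^{T}/n)\,\mathbf{C}\,(\mathbf{I} - \mathbf{1}\mathbf{1}^{T}/n)\mathbf{Y}}{\mathbf{Y}^{T}(\mathbf{I} - \mathbf{1}\mathbf{1}^{T}/n)\mathbf{Y}}$$
The eigenfunctions come from $(\mathbf{I} - \mathbf{1}\mathbf{1}^{T}/n)\,\mathbf{C}\,(\mathbf{I} - \mathbf{1}\mathbf{1}^{T}/n)$.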
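The matrix form of the Moran Coefficient can be evaluated directly. The following is a minimal sketch, assuming a binary rook-adjacency matrix on a small regular grid as a stand-in for the municipality connectivity matrix C; the helper names `rook_connectivity` and `moran_coefficient` are hypothetical.

```python
import numpy as np

def rook_connectivity(rows, cols):
    """Binary connectivity matrix C for a rows x cols grid (shared edges only)."""
    n = rows * cols
    C = np.zeros((n, n))
    for r in range(rows):
        for c in range(cols):
            i = r * cols + c
            if c + 1 < cols:            # neighbor to the right
                C[i, i + 1] = C[i + 1, i] = 1.0
            if r + 1 < rows:            # neighbor below
                C[i, i + cols] = C[i + cols, i] = 1.0
    return C

def moran_coefficient(y, C):
    """MC = (n / 1'C1) * y'(I - 11'/n) C (I - 11'/n) y / y'(I - 11'/n) y."""
    y = np.asarray(y, dtype=float)
    n = y.size
    M = np.eye(n) - np.ones((n, n)) / n   # projection that centers y
    z = M @ y
    return (n / C.sum()) * (z @ C @ z) / (z @ z)

C = rook_connectivity(4, 4)
# Smooth north-south gradient: similar values are neighbors.
gradient = np.repeat(np.arange(4.0), 4)
# Checkerboard: every neighbor has the opposite value.
checkerboard = np.array([(-1.0) ** (r + c) for r in range(4) for c in range(4)])

mc_gradient = moran_coefficient(gradient, C)
mc_checkerboard = moran_coefficient(checkerboard, C)
```

The gradient map yields a positive MC and the checkerboard a negative one, matching the definitions of positive and negative spatial autocorrelation.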
Eigenvectors for spatial filter construction
The first eigenvector, say E1, is the set of real numbers that has the largest MC achievable by any set for the spatial arrangement defined by the geographic connectivity matrix C. The second eigenvector is the set of values that has the largest achievable MC by any set that is uncorrelated with E1. The third eigenvector is the third such set of values, and so on. This sequential construction of eigenvectors continues through En, the set of values that has the largest negative MC achievable by any set that is uncorrelated with the preceding (n-1) eigenvectors.
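The sequential construction described above can be sketched numerically. This is an illustration only, assuming a small rook-adjacency grid in place of the Puerto Rico connectivity matrix; `grid_connectivity` and `filter_eigenvectors` are hypothetical names. It relies on the standard result that the MC of each non-constant eigenvector equals its eigenvalue scaled by n divided by the sum of the entries of C.

```python
import numpy as np

def grid_connectivity(rows, cols):
    """Binary rook connectivity matrix for a rows x cols grid (assumed layout)."""
    n = rows * cols
    C = np.zeros((n, n))
    for r in range(rows):
        for c in range(cols):
            i = r * cols + c
            if c + 1 < cols:
                C[i, i + 1] = C[i + 1, i] = 1.0
            if r + 1 < rows:
                C[i, i + cols] = C[i + cols, i] = 1.0
    return C

def filter_eigenvectors(C):
    """Eigenvectors of (I - 11'/n) C (I - 11'/n), ordered from largest to smallest MC."""
    n = C.shape[0]
    M = np.eye(n) - np.ones((n, n)) / n
    eigenvalues, eigenvectors = np.linalg.eigh(M @ C @ M)   # symmetric matrix
    order = np.argsort(eigenvalues)[::-1]                   # descending eigenvalues
    eigenvalues, eigenvectors = eigenvalues[order], eigenvectors[:, order]
    mc_values = n * eigenvalues / C.sum()                   # MC of each eigenvector
    return eigenvectors, mc_values

C = grid_connectivity(4, 5)
E, mc = filter_eigenvectors(C)

# Check E1 against the MC formula computed directly.
e1 = E[:, 0] - E[:, 0].mean()
mc_e1_direct = (C.shape[0] / C.sum()) * (e1 @ C @ e1) / (e1 @ e1)
```

A spatial filter is then a linear combination of a selected subset of the columns of `E`; the selection rule is not shown here.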
Random effects model
The composite error term is the sum of two components: a random observation effect (differences among individual observational units) and a time-varying residual error (links to change over time).
Random effects model: normally distributed intercept term
• normally distributed with mean 0 and constant variance, and uncorrelated with covariates
• supports inference beyond the nonrandom sample analyzed
• the simplest case is where the intercept is allowed to vary across areal units (repeated observations are individual time series)
• the random effect variable is integrated out of the likelihood function (with numerical methods)
• accounts for missing variables and within-unit correlation (commonality across time periods)
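The step of integrating the random effect out of the likelihood can be sketched for a binomial model with a normally distributed random intercept. Gauss-Hermite quadrature is used here as one common numerical method; the slides do not say which method was used, so this choice, the logit link, and the function name `marginal_loglik` are assumptions.

```python
import numpy as np
from math import comb

def marginal_loglik(y, n_trials, eta, sigma, n_nodes=20):
    """Log marginal likelihood for ONE areal unit observed over several periods.

    y, n_trials, eta: sequences over time periods (counts, totals, fixed-effect
    linear predictors on the logit scale). The unit's random intercept
    u ~ N(0, sigma^2) is shared by all periods and integrated out numerically.
    """
    y = np.asarray(y, dtype=float)
    n_trials = np.asarray(n_trials, dtype=float)
    eta = np.asarray(eta, dtype=float)
    nodes, weights = np.polynomial.hermite.hermgauss(n_nodes)
    u = np.sqrt(2.0) * sigma * nodes                 # change of variable for N(0, sigma^2)
    # Success probability at each quadrature node (rows) and period (columns).
    p = 1.0 / (1.0 + np.exp(-(eta[None, :] + u[:, None])))
    coef = np.array([comb(int(n), int(k)) for n, k in zip(n_trials, y)], dtype=float)
    # Conditional likelihood of the whole time series at each node.
    lik = np.prod(coef * p ** y * (1.0 - p) ** (n_trials - y), axis=1)
    return np.log(np.sum(weights * lik) / np.sqrt(np.pi))

# Illustrative values only (not the sugar cane data).
y_obs, totals, eta_obs = [3, 5, 4], [10, 10, 10], [-0.5, 0.0, -0.2]
ll_with_effect = marginal_loglik(y_obs, totals, eta_obs, sigma=0.8)
ll_no_effect = marginal_loglik(y_obs, totals, eta_obs, sigma=0.0)
```

Because the same intercept draw enters every period of a unit, the marginal likelihood does not factor across periods, which is how the random effect captures within-unit correlation.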
Sugar cane production in Puerto Rico
• Began in the 1530s
• Experienced a sharp decline during 1580-1650
• Introduction of slave labor resulted in considerable expansion during 1765-1823
• By 1828, sugar exports were sizeable
• The Spanish monarchy discouraged expansion throughout much of the 1800s
• The United States took possession of the island in 1899, fully developing the long-demanded railroad on the island and channeling considerable investment into sugar cane production, achieving maximum expansion in the 1920s
• Production peaked around 1950
Island-wide time series [figure; annotation: US intervention]
1924 sugar cane railroad: finally started by the Spanish Crown, but aggressively completed by US investors
Covariates of sugar cane production [figure panels: distance from San Juan; elevation; covariate spatial filters]
Model specifications
I-A: initial
I-B: with linear time trend
II: with random effect
III: with spatial filter
IV: with spatially structured random effect
Sugar cane production: 1958/59-1973/74 [maps for 1958/59, 1963/64, 1968/69, and 1973/74; scale: dark red = high, dark green = low]
Mixed binomial regression: time-varying covariate coefficients, spatially unstructured and structured random effects
Spatial filters for space-time spatially structured random effects
1958/59: MC = 0.77, GR = 0.30
1963/64: MC = 0.93, GR = 0.18
1968/69: MC = 0.86, GR = 0.18
1973/74: MC = 0.94, GR = 0.22
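GR here presumably denotes the Geary Ratio, which is usually reported alongside MC. A minimal sketch of its standard formula follows, assuming a binary connectivity matrix C; the function name `geary_ratio` and the toy 4-by-4 grid are hypothetical. Values near 0 indicate strong positive spatial autocorrelation, values near 1 indicate none, and values above 1 indicate negative spatial autocorrelation.

```python
import numpy as np

def geary_ratio(y, C):
    """GR = (n - 1) * sum_ij c_ij (y_i - y_j)^2 / (2 * sum_ij c_ij * sum_i (y_i - ybar)^2)."""
    y = np.asarray(y, dtype=float)
    n = y.size
    squared_diffs = (y[:, None] - y[None, :]) ** 2    # (y_i - y_j)^2 for every pair
    numerator = (n - 1) * np.sum(C * squared_diffs)
    denominator = 2.0 * C.sum() * np.sum((y - y.mean()) ** 2)
    return numerator / denominator

# Toy 4 x 4 rook-adjacency grid (an assumption for illustration).
rows = cols = 4
n = rows * cols
C = np.zeros((n, n))
for r in range(rows):
    for c in range(cols):
        i = r * cols + c
        if c + 1 < cols:
            C[i, i + 1] = C[i + 1, i] = 1.0
        if r + 1 < rows:
            C[i, i + cols] = C[i + cols, i] = 1.0

gradient = np.repeat(np.arange(4.0), 4)               # smooth map pattern
checkerboard = np.array([(-1.0) ** (r + c) for r in range(rows) for c in range(cols)])

gr_gradient = geary_ratio(gradient, C)
gr_checkerboard = geary_ratio(checkerboard, C)
```

On this reading, the reported pairs (high MC, low GR) consistently describe spatial filters with strong positive spatial autocorrelation.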
(normally distributed) random intercept: areal unit specific across all years
Time series plots: intercept and covariate binomial regression coefficients [figure panels: intercept; mean elevation; distance]
Legend: ● simple pooled model; ■ comparative static model; ♦ model with a spatially unstructured random effect; ▲ mixed model with spatially structured random effect
Time series plots: covariate binomial regression coefficient standard errors [figure panels: distance; mean elevation]
Legend: ● simple pooled model; ■ comparative static model; ♦ model with a spatially unstructured random effect; ▲ mixed model with spatially structured random effect
Residual serial correlation
The random effects estimator approximates the degree of serial correlation (or its importance in the model), and hence allows the computation of corrected estimates. The 73 residual Durbin-Watson statistics range from 0.140 to 2.513, with a mean of 0.836 and a standard deviation of 0.546. Determining significance here is complicated because of the small T, the inclusion of a random effects term, and the variable numbers of spatial filter (SF) eigenvectors.
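The Durbin-Watson statistic for each municipality's residual series can be computed as sketched below. The residual values are made up for illustration, and the function name `durbin_watson` is a local helper rather than a library call.

```python
import numpy as np

def durbin_watson(residuals):
    """DW = sum_t (e_t - e_{t-1})^2 / sum_t e_t^2, computed along the last axis.

    For a 2-D array with one row per areal unit and one column per time
    period, this returns one DW statistic per unit.
    """
    e = np.asarray(residuals, dtype=float)
    return np.sum(np.diff(e, axis=-1) ** 2, axis=-1) / np.sum(e ** 2, axis=-1)

# Two illustrative residual series (not the sugar cane residuals):
# the first alternates in sign, the second never changes.
example = np.array([[1.0, -1.0, 1.0, -1.0],
                    [1.0,  1.0, 1.0,  1.0]])
dw = durbin_watson(example)
```

Values well below 2 point toward positive serial correlation, values near 2 toward none, and values well above 2 toward negative serial correlation, which is the scale on which the reported mean of 0.836 should be read.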
Graphical portrayal of DWs: GLM residuals (heuristic using 4 dfs lost)
[figure; DW scale intervals: 0 – 0.74 (positive serial correlation); 0.74 – 1.93 (undecided); 1.93 – 2.08; 2.07 – 3.26 (undecided); 3.26 – 4]
STAR-binomial specification [equation not recovered; terms labeled: space, time, space-time]
Extra binomial variation remains
[figure legend: ● pineapple production; ■ milk production; ♦ sugar cane production; ▲ tobacco production]
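One common way to gauge extra-binomial variation is the Pearson chi-square statistic divided by its residual degrees of freedom. The slides do not state which diagnostic was plotted, so the sketch below is an assumption, and `pearson_dispersion` is a hypothetical helper.

```python
import numpy as np

def pearson_dispersion(y, n_trials, p_hat, n_params):
    """Pearson chi-square / residual degrees of freedom for a binomial fit.

    y: observed counts; n_trials: binomial totals; p_hat: fitted probabilities;
    n_params: number of estimated regression parameters.
    A value near 1 is consistent with binomial variation; a value well above 1
    suggests overdispersion.
    """
    y = np.asarray(y, dtype=float)
    n_trials = np.asarray(n_trials, dtype=float)
    p_hat = np.asarray(p_hat, dtype=float)
    pearson_resid = (y - n_trials * p_hat) / np.sqrt(n_trials * p_hat * (1.0 - p_hat))
    return np.sum(pearson_resid ** 2) / (y.size - n_params)

# Illustrative numbers only: two observations, intercept-only fit with p = 0.5.
dispersion = pearson_dispersion(y=[2, 8], n_trials=[10, 10], p_hat=[0.5, 0.5], n_params=1)
```

Comparing this ratio across the candidate specifications would show how much of the excess variation each added term (spatial filter, random effect) absorbs.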
Implications
• spatial autocorrelation appears to be a source of part of the overdispersion
• random effects (e.g., missing covariates) appear to be a source of part of the overdispersion
• land use competition may be a source of part of the overdispersion
• spatial filters for mean elevation and distance have six eigenvectors in common; of these, one is shared with most of the annual comparative static spatial filters, and two with most of the spatially structured random effect term spatial filters
• the components of spatial autocorrelation in sugar cane production vary over time
• a spatially unstructured random effect term that seeks to account for serial correlation in multiple short time series can better highlight latent spatial autocorrelation
• a spatial filter can effectively structure a random effect term
• failure to include a spatially structured random effect term can result in biased parameter estimates (largely because of the nonlinear nature of the model specification)
• spatial and temporal autocorrelation interact in a complex way