Sugar Cane Production in Puerto Rico, 1958/59-1973/74: A Comparison of Four Model Specifications for Describing Small Heterogeneous Space-Time Datasets

by Daniel A. Griffith, Ashbel Smith Professor of Geospatial Information Sciences

ABSTRACT
Researchers increasingly are accounting for heterogeneity in their empirical analyses. When data form a short time series (too short to support an ARIMA model), a random effect term can be employed to account for serial correlation. When the data also are georeferenced, forming a space-time dataset, the random effect term can be spatially structured so that it accounts for spatial autocorrelation as well. Space-time heterogeneity can be accounted for in various ways, however, including specifications involving recently developed spatial filtering methodology. This paper summarizes comparisons of four model specifications (simple pooled space-time; sequential, comparative statics; temporally varying coefficients with a spatially unstructured random effect; and temporally varying coefficients with a spatially structured random effect), illustrating their implementation with annual sugar cane production data for the 73 municipalities of Puerto Rico during 1958/59-1973/74. Covariates whose importance is assessed include elevation and distance from the primate city (San Juan).

Panel data versus space-time data
Panel data are a form of longitudinal data, and can comprise a cross-section (i.e., the spatial dimension) of individuals (e.g., farms) that are surveyed periodically over a given time horizon. Because the same individuals are observed repeatedly, panel data permit a researcher to study the dynamics of change with short time series.
• A main advantage of panel data is control for unobserved heterogeneity, the fundamental complication of non-experimental data collection.
• But longitudinal data need not involve the same individuals: if the sample is not the same across periods, observed changes also may result from sampling error.

Spatial filtering
A given random variable can be decomposed into a spatial component, which relates to spatial autocorrelation, and an aspatial component. Approaches to this decomposition include:
• the impulse-response function approach (based upon the autoregressive model)
• the Getis approach (based on the K function)
• the eigenfunction spatial filtering approach

[Figure: High Peak district biomass index, the ratio of remotely sensed spectral bands B3 and B4; panels: spatially autocorrelated, geographically random]

Defining spatial autocorrelation
• Auto: self
• Correlation: degree of relative correspondence
• Positive: similar values cluster together on a map
• Negative: dissimilar values cluster together on a map

Spatial autocorrelation: from the correlation coefficient r to the Moran Coefficient (MC)

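Constructing eigenfunctions for filtering spatial autocorrelation out of georeferenced variables
For a variable Y observed at n locations, with n-by-n geographic connectivity matrix C, identity matrix I, and n-by-1 vector of ones 1, the Moran Coefficient is
$$MC = \frac{n}{\mathbf{1}^{T}\mathbf{C}\mathbf{1}} \cdot \frac{\mathbf{Y}^{T}(\mathbf{I} - \mathbf{1}\mathbf{1}^{T}/n)\,\mathbf{C}\,(\mathbf{I} - \mathbf{1}\mathbf{1}^{T}/n)\,\mathbf{Y}}{\mathbf{Y}^{T}(\mathbf{I} - \mathbf{1}\mathbf{1}^{T}/n)\,\mathbf{Y}}$$
The eigenfunctions come from the matrix $(\mathbf{I} - \mathbf{1}\mathbf{1}^{T}/n)\,\mathbf{C}\,(\mathbf{I} - \mathbf{1}\mathbf{1}^{T}/n)$.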
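As a minimal sketch, assuming NumPy and a hypothetical 5-by-5 grid with binary rook adjacency standing in for a real map, the MC formula can be evaluated directly and the eigenvectors of (I - 11^T/n) C (I - 11^T/n) extracted with a symmetric eigensolver. The helper names (rook_adjacency, moran_coefficient, mcm_eigenvectors) are illustrative only.

```python
import numpy as np

def rook_adjacency(rows, cols):
    """Binary rook-adjacency matrix C for a rows x cols regular grid (hypothetical lattice)."""
    n = rows * cols
    C = np.zeros((n, n))
    for r in range(rows):
        for c in range(cols):
            i = r * cols + c
            if r + 1 < rows:                 # neighbour below
                C[i, i + cols] = C[i + cols, i] = 1.0
            if c + 1 < cols:                 # neighbour to the right
                C[i, i + 1] = C[i + 1, i] = 1.0
    return C

def moran_coefficient(y, C):
    """MC = (n / 1'C1) * y'MCMy / y'My, with M = I - 11'/n (the centering projector)."""
    n = len(y)
    M = np.eye(n) - np.ones((n, n)) / n
    z = M @ y                                # centered values; M is symmetric, so y'MCMy = z'Cz
    return (n / C.sum()) * (z @ C @ z) / (z @ z)

def mcm_eigenvectors(C):
    """Eigenvalues and eigenvectors of MCM, sorted from largest (E1) to most negative (En)."""
    n = C.shape[0]
    M = np.eye(n) - np.ones((n, n)) / n
    vals, vecs = np.linalg.eigh(M @ C @ M)
    order = np.argsort(vals)[::-1]
    return vals[order], vecs[:, order]

C = rook_adjacency(5, 5)
vals, E = mcm_eigenvectors(C)
print(moran_coefficient(E[:, 0], C))    # MC of E1: the largest achievable on this grid
print(moran_coefficient(E[:, -1], C))   # MC of En: the most negative achievable
```

Each eigenvector with a nonzero eigenvalue is orthogonal to the vector of ones, so its MC equals n/(1^T C 1) times its eigenvalue; sorting eigenvalues therefore sorts eigenvectors by MC. On a symmetric grid such as this one some eigenvalues are tied, so individual eigenvectors within a tied set are not unique.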

Eigenvectors for spatial filter construction
The first eigenvector, say E1, is the set of real numerical values that has the largest MC achievable by any set of values for the spatial arrangement defined by the geographic connectivity matrix C. The second eigenvector, E2, is the set of values that has the largest MC achievable by any set that is uncorrelated with E1. The third eigenvector, E3, is the set with the largest achievable MC among all sets uncorrelated with both E1 and E2. And so on. This sequential construction of eigenvectors continues through En, the set of values that has the largest negative MC achievable by any set that is uncorrelated with the preceding (n-1) eigenvectors.
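A spatial filter is then a linear combination of selected eigenvectors. The sketch below is one simple way to build it, under stated assumptions: candidates are the eigenvectors with MC/MC_max of at least 0.25 (a hypothetical default threshold), and their coefficients come from a single ordinary least squares fit rather than any stepwise selection. The grid, the data, and all function names are hypothetical.

```python
import numpy as np

def mcm_eigen(C):
    """Eigenpairs of (I - 11'/n) C (I - 11'/n), ordered from E1 (largest MC) to En."""
    n = C.shape[0]
    M = np.eye(n) - np.ones((n, n)) / n
    vals, vecs = np.linalg.eigh(M @ C @ M)
    order = np.argsort(vals)[::-1]
    return vals[order], vecs[:, order]

def spatial_filter(y, C, threshold=0.25):
    """Return (spatial filter, aspatial remainder, number of eigenvectors used).

    MC_k is proportional to eigenvalue k, so MC_k / MC_max equals vals / vals[0].
    Coefficients come from one OLS fit of y on an intercept plus all candidates.
    """
    vals, E = mcm_eigen(C)
    keep = vals / vals[0] >= threshold
    X = np.column_stack([np.ones(len(y)), E[:, keep]])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    sf = E[:, keep] @ beta[1:]              # spatial component
    return sf, y - sf, int(keep.sum())      # remainder keeps the mean and the aspatial noise

def grid_adjacency(rows, cols):
    """Binary rook adjacency for a rows x cols lattice (hypothetical geography)."""
    n = rows * cols
    C = np.zeros((n, n))
    for i in range(n):
        r, c = divmod(i, cols)
        if c + 1 < cols:
            C[i, i + 1] = C[i + 1, i] = 1.0
        if r + 1 < rows:
            C[i, i + cols] = C[i + cols, i] = 1.0
    return C

C = grid_adjacency(4, 4)
vals, E = mcm_eigen(C)
rng = np.random.default_rng(1)
y = 10 + 3 * E[:, 0] + rng.normal(scale=0.1, size=16)   # synthetic map pattern plus noise
sf, aspatial, k = spatial_filter(y, C)
```

Because the eigenvectors are mutually orthogonal and orthogonal to the intercept, the OLS coefficients reduce to projections of y onto each selected eigenvector, so the aspatial remainder is uncorrelated with every eigenvector in the filter.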

Useful citation

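Random effects model
In the composite error term $u_{it} = \alpha_i + \varepsilon_{it}$ (areal unit i, time period t):
• $\alpha_i$ is a random observation effect (differences among individual observational units)
• $\varepsilon_{it}$ is a time-varying residual error (linked to change over time)
The composite error term is the sum of the two.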
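Because the random observation effect is shared by every period of a unit, the composite error is serially correlated. A minimal sketch, assuming the two components are independent with constant variances (sigma_alpha2 and sigma_eps2 are illustrative names), builds the implied covariance matrix for one areal unit and shows that any two periods have the same correlation, sigma_alpha2 / (sigma_alpha2 + sigma_eps2).

```python
import numpy as np

def composite_error_cov(T, sigma_alpha2, sigma_eps2):
    """Covariance matrix of u_it = alpha_i + eps_it across T periods for one areal unit.

    Assumes alpha_i ~ N(0, sigma_alpha2) shared by all periods, independent
    eps_it ~ N(0, sigma_eps2), and no correlation between alpha and eps.
    """
    return sigma_eps2 * np.eye(T) + sigma_alpha2 * np.ones((T, T))

def implied_serial_correlation(sigma_alpha2, sigma_eps2):
    """Correlation between u_it and u_is for any two distinct periods t != s."""
    return sigma_alpha2 / (sigma_alpha2 + sigma_eps2)

S = composite_error_cov(T=4, sigma_alpha2=1.0, sigma_eps2=3.0)
sd = np.sqrt(np.diag(S))
R = S / np.outer(sd, sd)          # correlation matrix of the composite error
print(R[0, 1], R[0, 3], implied_serial_correlation(1.0, 3.0))
```

Unlike an autoregressive error, this correlation does not decay with the time lag, which is one reason a random intercept is a convenient device when each time series is too short for an ARIMA model.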

Random effects model: normally distributed intercept term
• $\alpha_i \sim N(0, \sigma_\alpha^2)$ and uncorrelated with the covariates
• supports inference beyond the nonrandom sample analyzed
• the simplest case allows the intercept to vary across areal units (the repeated observations form individual time series)
• the random effect variable is integrated out of the likelihood function (with numerical methods)
• accounts for missing variables and within-unit correlation (commonality across time periods)

Sugar cane production in Puerto Rico
• Began in the 1530s
• Experienced a sharp decline during 1580-1650
• Introduction of slave labor resulted in considerable expansion during 1765-1823
• By 1828, sugar exports were sizeable
• The Spanish monarchy discouraged expansion throughout much of the 1800s
• The United States took possession of the island in 1899, fully developed the long-demanded railroad on the island, and channeled considerable investment into sugar cane production, achieving maximum expansion in the 1920s
• Production peaked around 1950

[Figure: island-wide time series of sugar cane production; annotation: US intervention]

[Figure: the 1924 sugar cane railroad] The railroad was finally started by the Spanish Crown, but aggressively completed by US investors.

Covariates of sugar cane production
[Figure panels: distance from San Juan; elevation; covariate spatial filters]

Model specifications
• I-A: initial
• I-B: with linear time trend
• II: with random effect
• III: with spatial filter
• IV: with spatially structured random effect

Sugar cane production: 1958/59-1973/74
[Figure: maps for 1958/59, 1963/64, 1968/69, and 1973/74; scale from dark green (low) to dark red (high)]

Mixed binomial regression: time-varying covariate coefficients, spatially unstructured and structured random effects
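The exact equations are not reproduced here, so the following is only a hedged sketch of one plausible linear predictor for such a model: yearly intercepts and covariate coefficients, plus a random effect per areal unit made of a spatially structured part (selected eigenvectors times coefficients) and an unstructured normal part. The functional form, all names, and all values are assumptions; 73 units and 16 years merely match the study's dimensions, and the data are random.

```python
import numpy as np

def expit(x):
    """Inverse logit."""
    return 1.0 / (1.0 + np.exp(-x))

def random_effect(E_sel, gamma, nu):
    """alpha_i = (E_sel @ gamma)_i + nu_i: spatially structured part plus unstructured part."""
    return E_sel @ gamma + nu

def linear_predictor(beta, elev, dist, alpha):
    """eta[i, t] = beta[t, 0] + beta[t, 1] * elev[i] + beta[t, 2] * dist[i] + alpha[i].

    beta has shape (T, 3), so the intercept and covariate coefficients vary by year,
    while alpha (one value per areal unit) is shared across all years.
    """
    return (beta[None, :, 0]
            + np.outer(elev, beta[:, 1])
            + np.outer(dist, beta[:, 2])
            + alpha[:, None])

# Synthetic illustration only; none of these numbers are the Puerto Rico data.
rng = np.random.default_rng(42)
n_units, n_years, k = 73, 16, 3
elev = rng.normal(size=n_units)            # hypothetical standardized mean elevation
dist = rng.normal(size=n_units)            # hypothetical standardized distance from San Juan
E_sel = np.linalg.qr(rng.normal(size=(n_units, k)))[0]   # orthonormal stand-in for k eigenvectors
alpha = random_effect(E_sel, np.array([0.5, -0.3, 0.2]), rng.normal(scale=0.2, size=n_units))
beta = rng.normal(scale=0.3, size=(n_years, 3))
p = expit(linear_predictor(beta, elev, dist, alpha))    # (73, 16) success probabilities
```

Dropping the E_sel term gives a spatially unstructured random effect; dropping alpha altogether and fitting each year separately corresponds to a comparative static setup.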

Spatial filters for space-time spatially structured random effects
[Figure: maps of the annual spatial filters, labeled with their MC and GR values]
• 1958/59: MC = 0.77, GR = 0.30
• 1963/64: MC = 0.93, GR = 0.18
• 1968/69: MC = 0.86, GR = 0.18
• 1973/74: MC = 0.94, GR = 0.22
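GR presumably denotes the Geary Ratio, which moves in the opposite direction to MC: values near 1 indicate no spatial autocorrelation and values near 0 indicate strong positive spatial autocorrelation, so MC values of 0.77 to 0.94 paired with GR values of 0.18 to 0.30 both point to strongly clustered filters. A minimal sketch of both statistics, assuming a binary connectivity matrix and a hypothetical four-unit chain:

```python
import numpy as np

def moran_coefficient(y, C):
    """MC = (n / sum of C) * z'Cz / z'z, with z the centered values."""
    n = len(y)
    z = y - y.mean()
    return (n / C.sum()) * (z @ C @ z) / (z @ z)

def geary_ratio(y, C):
    """GR = ((n - 1) / (2 * sum of C)) * sum_ij c_ij (y_i - y_j)^2 / sum_i (y_i - mean)^2."""
    n = len(y)
    z = y - y.mean()
    diff2 = (y[:, None] - y[None, :]) ** 2      # squared differences for every ordered pair
    return ((n - 1) / (2 * C.sum())) * np.sum(C * diff2) / (z @ z)

C = np.diag(np.ones(3), 1) + np.diag(np.ones(3), -1)   # four areal units in a chain
y = np.array([1.0, 2.0, 3.0, 4.0])                     # a smooth, clustered pattern
print(moran_coefficient(y, C), geary_ratio(y, C))
```

MC weights cross-products of deviations from the mean, whereas GR weights squared differences between neighbours, which is why the two can disagree somewhat for the same map.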

[Figure: (normally distributed) random intercept, specific to each areal unit and common across all years]

Time series plots: intercept and covariate binomial regression coefficients
[Figure panels: intercept; mean elevation; distance. Legend: ● simple pooled model; ■ comparative static model; ♦ model with a spatially unstructured random effect; ▲ mixed model with a spatially structured random effect]

Time series plots: covariate binomial regression coefficient standard errors
[Figure panels: distance; mean elevation. Legend: ● simple pooled model; ■ comparative static model; ♦ model with a spatially unstructured random effect; ▲ mixed model with a spatially structured random effect]

Residual serial correlation
The random effects estimator approximates the degree of serial correlation (or its importance in the model), and hence allows the computation of corrected estimates. The 73 residual Durbin-Watson (DW) statistics range from 0.140 to 2.513, with a mean of 0.836 and a standard deviation of 0.546. Determining significance here is complicated by the small number of time periods (T), the inclusion of a random effects term, and the varying numbers of spatial filter (SF) eigenvectors.
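Each of the 73 DW statistics summarizes one municipality's residual time series. A minimal sketch of the computation, using synthetic autocorrelated residuals rather than the study's (the statsmodels package also provides a `durbin_watson` function in `statsmodels.stats.stattools`):

```python
import numpy as np

def durbin_watson(e):
    """DW = sum_t (e_t - e_{t-1})^2 / sum_t e_t^2 for one residual time series."""
    e = np.asarray(e, dtype=float)
    return np.sum(np.diff(e) ** 2) / np.sum(e ** 2)

def dw_by_unit(residuals):
    """One DW statistic per areal unit; residuals has shape (n_units, T)."""
    return np.array([durbin_watson(row) for row in np.asarray(residuals, dtype=float)])

# Synthetic AR(1)-style residuals for 73 units over 16 years (not the study's residuals)
rng = np.random.default_rng(0)
res = np.zeros((73, 16))
res[:, 0] = rng.normal(size=73)
for t in range(1, 16):
    res[:, t] = 0.6 * res[:, t - 1] + rng.normal(size=73)
dw = dw_by_unit(res)
print(dw.min(), dw.max(), dw.mean(), dw.std(ddof=1))
```

DW is roughly 2(1 - r1), where r1 is the lag-one residual autocorrelation, so a mean near 0.84 corresponds to substantial positive serial correlation.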

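Graphical portrayal of DWs for GLM residuals (heuristic using 4 degrees of freedom lost)
[Figure: DW statistics classified into the following zones]
• 0 – 0.74: positive serial correlation
• 0.74 – 1.93: undecided
• 1.93 – 2.07: no serial correlation
• 2.07 – 3.26: undecided
• 3.26 – 4: negative serial correlation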
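The zones follow the usual Durbin-Watson bounds logic with a lower bound d_L = 0.74 and an upper bound d_U = 1.93, mirrored around 2. A minimal classifier, assuming those two bounds from the slide (how the boundaries themselves are assigned is an arbitrary choice here):

```python
from collections import Counter

def dw_zone(d, d_lower=0.74, d_upper=1.93):
    """Classify a DW statistic using lower/upper bounds d_L and d_U and their mirrors 4 - d_U, 4 - d_L."""
    if d < d_lower:
        return "positive serial correlation"
    if d < d_upper:
        return "undecided"
    if d <= 4 - d_upper:
        return "no serial correlation"
    if d <= 4 - d_lower:
        return "undecided"
    return "negative serial correlation"

# The minimum, mean, and maximum DW reported for the 73 municipalities
print(Counter(dw_zone(d) for d in [0.140, 0.836, 2.513]))
```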

Summary of results

STAR-binomial specification
[Figure panels: space; time; space-time]

Pseudo- and quasi-likelihood estimation

Extra-binomial variation remains
[Figure legend: ● pineapple production; ■ milk production; ♦ sugar cane production; ▲ tobacco production]
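In quasi-likelihood estimation for binomial data, the variance is taken as phi * n * p * (1 - p), and a dispersion estimate phi well above 1 signals extra-binomial variation (overdispersion). One standard estimate is the Pearson chi-square statistic divided by the residual degrees of freedom; the sketch below assumes that estimator, which may differ from the one used for the plotted results.

```python
import numpy as np

def pearson_dispersion(y, n_trials, p_hat, n_params):
    """Pearson chi-square divided by residual degrees of freedom for binomial counts."""
    y = np.asarray(y, dtype=float)
    n_trials = np.asarray(n_trials, dtype=float)
    p_hat = np.asarray(p_hat, dtype=float)
    pearson = (y - n_trials * p_hat) / np.sqrt(n_trials * p_hat * (1.0 - p_hat))
    return np.sum(pearson ** 2) / (y.size - n_params)

# Hypothetical counts: two units with 10 trials each, fitted probability 0.5, one parameter
print(pearson_dispersion([7, 3], [10, 10], [0.5, 0.5], n_params=1))
```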

Implications
• spatial autocorrelation appears to be a source of part of the overdispersion
• random effects (e.g., missing covariates) appear to be a source of part of the overdispersion
• land use competition may be a source of part of the overdispersion
• the spatial filters for mean elevation and distance have six eigenvectors in common; of these, one is shared with most of the annual comparative static spatial filters, and two with most of the spatially structured random effect term spatial filters

• the components of spatial autocorrelation in sugar cane production vary over time
• a spatially unstructured random effect term that seeks to account for serial correlation in multiple short time series can better highlight latent spatial autocorrelation
• a spatial filter can effectively structure a random effect term
• failure to include a spatially structured random effect term can result in biased parameter estimates (largely because of the nonlinear nature of the model specification)
• spatial and temporal autocorrelation interact in complex ways
