Slide 1:Improved County Level Estimation of Crop Yield Using Model-Based Methodology With a Spatial Component
Michael E. Bellow, USDA/NASS
Slide 2:Outline
Background
Simulation Methodology
Results of Ten State Study
Convergence Evaluation
Summary
Slide 3:County Level Commodity Estimation
NASS program since 1917
Estimates used by private sector, academia, government
Data from various sources used
NASS County Estimates System developed to facilitate the estimation process
Slide 4:Available Data Sources
Voluntary response surveys of farm operators
List frame control data (lists of known farming operations)
Previous year official estimates
Census of Agriculture data (NASS conducts Census every five years)
Earth resources satellite data
Slide 5:County Crop Yield Estimation
Yield is ratio of crop production to harvested area (acres)
Accurate estimation challenging due to
  - reliable administrative data seldom available
  - high year-to-year variability of yields (weather sensitive)
  - lack of adequate sample survey data
Slide 6:Desirable Features of a County Yield Estimation Method
Repeatability
Accurate variance estimation
Produce estimates for counties having no survey data
Slide 7:Ratio (R) Estimator
Traditional crop yield estimator used by NASS
Computed as ratio between production and harvested area estimates (with minor adjustment)
Can produce inconsistent yields due to fluctuations in harvested acreage
No utilization of survey data from counties other than the one being estimated
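The ratio described above can be sketched in a few lines. This is only an illustration: the function name `ratio_yield` and the bushel and acre figures are hypothetical, and the "minor adjustment" mentioned on the slide is not specified, so it is left out.

```python
def ratio_yield(production, harvested_acres):
    """Yield as production divided by harvested area (no adjustment applied)."""
    if harvested_acres <= 0:
        raise ValueError("harvested_acres must be positive")
    return production / harvested_acres


# Hypothetical county: the same production estimate paired with two
# different harvested-acreage estimates gives noticeably different yields.
yield_a = ratio_yield(1_200_000, 10_000)
yield_b = ratio_yield(1_200_000, 8_000)
print(yield_a, yield_b)
```

The second call shows why fluctuations in the acreage estimate can make the resulting yields inconsistent.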
Slide 8:Model-Based County Estimation Methods
Based on linear or non-linear models relating true yields to survey reported values
Generally fit using an iterative algorithm
Convergence not always guaranteed
Estimates can be adjusted for consistency with published state figures
Slide 9:Stasny-Goel (SG) Method
Developed at Ohio State University under cooperative agreement with NASS
Assumes mixed effects model with farm size group as fixed effect and county as random effect
Random effect assumed multivariate normal with covariance matrix reflecting spatial correlation among neighboring counties
  - corr(t_i, t_j) = r if county i borders county j
  - corr(t_i, t_j) = 0 otherwise
EM algorithm used to fit model
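The neighbor correlation structure above can be written out as a matrix. The following is a minimal sketch, not the SG program itself: the function name, the county labels, the border list, and the value of r are all hypothetical, and the diagonal is set to 1 on the assumption that each county effect has correlation 1 with itself.

```python
def neighbor_correlation_matrix(counties, borders, r):
    """Correlation matrix with r for bordering counties and 0 otherwise.

    counties: list of county labels
    borders:  list of (county_a, county_b) pairs that share a border
    r:        common correlation between bordering counties
    """
    index = {name: k for k, name in enumerate(counties)}
    n = len(counties)
    # Start from the identity matrix (assumed unit diagonal).
    corr = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for a, b in borders:
        i, j = index[a], index[b]
        corr[i][j] = r
        corr[j][i] = r  # keep the matrix symmetric
    return corr


# Hypothetical three-county strip: A borders B, B borders C.
corr = neighbor_correlation_matrix(["A", "B", "C"], [("A", "B"), ("B", "C")], 0.3)
for row in corr:
    print(row)
```

A covariance matrix would follow by scaling this matrix with the variance of the county effect, which the slides do not give.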
Slide 10:Stasny-Goel Method (cont.)
Previous year county yields used to derive initial estimates of county and size group effects
Processing continues until at least one of the following two conditions is satisfied
  - relative group and log-likelihood distances fall below preset limits
  - maximum allowable number of iterations reached
County yield estimates computed as weighted averages of individual farm level estimates (weights derived from Census of Agriculture data)
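The stopping rule and the final weighted average can be sketched as follows. Both functions are illustrative assumptions: the names, the argument layout, and the reading that both distances must fall below their limits are not confirmed by the slides, and the tolerance values in the example are made up.

```python
def should_stop(iteration, group_distance, loglik_distance,
                max_iterations, group_limit, loglik_limit):
    """Stop when both distances are below their limits or the iteration cap is hit."""
    converged = group_distance < group_limit and loglik_distance < loglik_limit
    return converged or iteration >= max_iterations


def weighted_county_yield(estimates, weights):
    """Weighted average of lower-level yield estimates for one county."""
    total_weight = sum(weights)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(e * w for e, w in zip(estimates, weights)) / total_weight


# Hypothetical county with two estimates weighted 25% and 75%.
print(weighted_county_yield([100.0, 120.0], [0.25, 0.75]))
# Hypothetical run that has not converged but has reached the cap.
print(should_stop(5000, 0.5, 0.5, 5000, 1e-6, 1e-6))
```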
Slide 11:Griffith (G) Method
Developed by Dr. Dan Griffith at Syracuse University under cooperative agreement with NASS
Predicts yield values using published number of farms producing crop of interest
Assumes autoregressive model
Employs Box-Cox and Box-Tidwell transformations
Spatial imputation routine can compute estimates for counties with missing survey data
Slide 12:Previous Research on Model-Based Methods
Stasny, Goel and Rumsey (1991) - early version of SG method tested on Kansas wheat production data
Stasny et al. (1995) - improved version of SG tested on Ohio corn yield data
Crouse (2000) - SG evaluated for Michigan corn and barley yield
Griffith (2000) - Griffith method tested on Michigan corn yield data
Bellow (2004) - SG and Griffith methods compared for North Dakota oats and barley yield (presented at FCSM Research Conference)
Slide 13:Ten-State Research Study
Compare performance of Stasny-Goel, Griffith and ratio methods for various crops in ten geographically dispersed states: NY, OH, MI, TN, MS, FL, ND, OK, CO, WA
Criteria for comparison - bias, variance, MSE, outlier properties, convergence percentage
Slide 14:States In Study Area
Slide 15:Post-Stratification Size Groups
NASS statewide survey data post-stratified by county and farm size based on Census of Agriculture (COA) data (two or three size groups defined)
Percentages of Census farm acres by size group used as weights for SG algorithm
Equal total land in farms criterion used to form groups
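One way to read the "equal total land in farms" criterion is that size-group boundaries are placed so each group holds roughly the same total acreage. The sketch below implements that reading only; it is an assumption, not the documented NASS procedure, and the function name and farm sizes are hypothetical.

```python
def equal_land_groups(farm_acres, n_groups):
    """Assign each farm a size group (0 = smallest farms) so that groups
    hold roughly equal total acreage."""
    total = sum(farm_acres)
    if total <= 0 or n_groups < 1:
        raise ValueError("need positive total acreage and at least one group")
    order = sorted(range(len(farm_acres)), key=lambda k: farm_acres[k])
    groups = [0] * len(farm_acres)
    running = 0.0
    for k in order:
        # Group is set by the share of total land accumulated before this farm.
        share = running / total
        groups[k] = min(int(share * n_groups), n_groups - 1)
        running += farm_acres[k]
    return groups


# Hypothetical farms: five small operations and one large one, two groups.
print(equal_land_groups([10, 10, 10, 10, 40, 80], 2))
```

In this example the five smaller farms and the single large farm each account for 80 acres, so they form the two groups.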
Slide 16:Data Sources For Research Study
2002-03 Quarterly Agricultural Survey
2001-03 County Estimates Survey
2001-02 official crop yield estimates (previous year data)
2002 Census of Agriculture (number of farms, land in farms)
Slide 17:Simulation Procedure
Multiple regression performed on survey reported yield vs. official county yields, weighted average neighbor yields, size group membership variables
Artificial population of 10,000 simulated survey data sets used to compute true population parameter values
250 sample data sets selected at random from population
Slide 18:Simulation Procedure (cont.)
Moran's I computed to test whether simulated data sets reflect spatial correlation of real survey data
SG, G and R methods applied to each of the 250 sampled data sets
Average simulated parameter values compared with corresponding population values for each estimation method
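For reference, Moran's I for values x_1..x_n with spatial weights w_ij is (n / sum of all w_ij) times the weighted sum of cross-products of deviations from the mean, divided by the sum of squared deviations. The sketch below assumes binary border weights (1 for neighboring counties, 0 otherwise); the slides do not state which weights were used, and the county values are hypothetical.

```python
def morans_i(values, weights):
    """Moran's I for a list of values and a square spatial weight matrix."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    total_weight = sum(sum(row) for row in weights)
    sum_sq = sum(d * d for d in dev)
    if total_weight == 0 or sum_sq == 0:
        raise ValueError("Moran's I is undefined for zero weights or constant values")
    cross = sum(weights[i][j] * dev[i] * dev[j]
                for i in range(n) for j in range(n))
    return (n / total_weight) * (cross / sum_sq)


# Hypothetical row of four counties, each bordering the next one.
w = [[0, 1, 0, 0],
     [1, 0, 1, 0],
     [0, 1, 0, 1],
     [0, 0, 1, 0]]
print(morans_i([1.0, 2.0, 3.0, 4.0], w))
```

Values that rise steadily along the row give a positive statistic, indicating that neighbors tend to have similar values.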
Slide 19:Measures of Estimator Performance
Absolute Bias - average absolute difference between simulated yield estimates and true (population) yield
Variance - sample variance of simulated yield estimates
Mean Square Error - average squared deviation between simulated estimates and true yield (SG program also computes analytic MSE)
Lower (Upper) Tail Proximity - average absolute difference between 5th (95th) percentile of simulated yield estimates and true yield
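The measures above can be computed for a single county as in the sketch below. Several choices here are assumptions: absolute bias is taken as the mean of |estimate - true| over simulation runs (one literal reading of the definition), the percentiles use the default method of Python's `statistics.quantiles`, and the averaging of tail proximity across counties is left to the caller. The function name and the numbers are hypothetical.

```python
import statistics


def performance_measures(estimates, true_yield):
    """Per-county performance measures for one estimator over simulation runs."""
    errors = [e - true_yield for e in estimates]
    # 19 cut points at 5%, 10%, ..., 95%; first and last are the tails.
    cuts = statistics.quantiles(estimates, n=20)
    return {
        "absolute_bias": sum(abs(err) for err in errors) / len(errors),
        "variance": statistics.variance(estimates),
        "mse": sum(err * err for err in errors) / len(errors),
        "lower_tail_proximity": abs(cuts[0] - true_yield),
        "upper_tail_proximity": abs(cuts[-1] - true_yield),
    }


# Hypothetical simulated estimates for a county whose true yield is 100.
result = performance_measures([98.0, 100.0, 102.0, 104.0], 100.0)
for name, value in result.items():
    print(name, value)
```

With only a handful of runs the tail percentiles are extrapolated; with 250 runs per case, as in the study, they fall inside the range of the simulated estimates.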
Slide 20:Pairwise Estimator Comparison for Absolute Bias (* - better method)
Slide 21:Pairwise Estimator Comparison for Variance
Slide 22:Pairwise Estimator Comparison for MSE
Slide 23:Pairwise Estimator Comparison for LTP
Slide 24:Pairwise Estimator Comparison for UTP
Slide 25:Additional Bias Evaluation
Wilcoxon Rank Sum Test - compare median absolute error (over simulation runs) of SG vs. R, SG vs. G for each county
Wilcoxon Signed Rank Test - assess whether median error of SG, G, R is negative, positive or zero (two one-sided tests performed for each county)
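To make the rank sum comparison concrete, the sketch below computes the rank sum statistic for two samples of absolute errors and a z-score from the large-sample normal approximation. It is a simplified illustration, with no tie or continuity correction, and is not necessarily how the study computed its tests; the function names and error values are hypothetical.

```python
import math
from statistics import NormalDist


def midranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        average_rank = (pos + end) / 2 + 1
        for k in order[pos:end + 1]:
            ranks[k] = average_rank
        pos = end + 1
    return ranks


def rank_sum_test(x, y):
    """Return (rank sum of x, z-score, two-sided p-value)."""
    n1, n2 = len(x), len(y)
    ranks = midranks(list(x) + list(y))
    w = sum(ranks[:n1])
    mean = n1 * (n1 + n2 + 1) / 2
    sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (w - mean) / sd
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return w, z, p


# Hypothetical absolute errors for two estimators in one county.
print(rank_sum_test([1.0, 2.0, 3.0], [4.0, 5.0, 6.0]))
```

A negative z-score here means the first sample tends to have the smaller absolute errors.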
Slide 26:Results of Rank Sum Tests on Absolute Bias
Slide 27:Summary of Signed Rank Test Results (All Crops Combined)
Slide 28:Percent of Counties With Average Underestimate Less Than 10% of True Yield (* - best method)
Slide 29:Convergence Issues
SG algorithm not guaranteed to converge within fixed limit on number of iterations
Non-convergence associated with numerical instability conditions
Yield estimates produced for non-convergent runs may be suspect
Convergence generally most reliable for highly prevalent crops, least reliable for rare crops
Slide 30:Algorithm Convergence Percentage By Crop (Limit of 5000 Iterations)
Slide 31:Two Approaches to Dealing With SG Non-Convergence
SG(1) - use estimate generated at final allowable iteration (N0)
SG(2) - keep track of which iteration (i*) maximized the log-likelihood
  - if i* < N0, rerun algorithm to i* and use that estimate
  - if i* = N0, resume processing at iteration (N0+1) and continue until either
    - convergence occurs (use that estimate), OR
    - log-likelihood decreases from one iteration to next (use estimate at next-to-last iteration)
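The SG(2) rule above can be sketched as a small driver around a generic iteration step. Everything named here is hypothetical: `step` stands in for one iteration of the fitting algorithm and is assumed to return a new state, the log-likelihood, and a convergence flag. Two simplifications are also assumptions: the state at i* is stored rather than obtained by rerunning the algorithm, and `extra_cap` is a safety limit on the resumed iterations that the slides do not mention.

```python
def sg2_estimate(step, state, n0, extra_cap=10_000):
    """Return (chosen_state, iteration) following the SG(2) rule; requires n0 >= 1.

    step(state) -> (new_state, loglik, converged) is a hypothetical callback
    that must return a fresh state object on every call.
    """
    best_state, best_ll, best_iter = None, float("-inf"), 0
    for i in range(1, n0 + 1):
        state, ll, converged = step(state)
        if converged:
            return state, i  # ordinary convergence, no special handling
        if ll > best_ll:
            best_state, best_ll, best_iter = state, ll, i
    if best_iter < n0:
        # Log-likelihood peaked before the limit: use the estimate at i*.
        return best_state, best_iter
    # Peak was at the limit itself: keep iterating past N0.
    prev_state, prev_ll = state, ll
    for i in range(n0 + 1, n0 + extra_cap + 1):
        state, ll, converged = step(state)
        if converged:
            return state, i
        if ll < prev_ll:
            return prev_state, i - 1  # next-to-last iteration
        prev_state, prev_ll = state, ll
    return prev_state, n0 + extra_cap


# Toy step: the state is just an iteration counter and the log-likelihoods
# come from a fixed, made-up sequence that never signals convergence.
def make_step(logliks):
    def step(state):
        return state + 1, logliks[state], False
    return step


print(sg2_estimate(make_step([1, 3, 2, 2.5]), 0, 4))
print(sg2_estimate(make_step([1, 2, 3, 4, 5, 4.5]), 0, 4))
```

In the first toy run the log-likelihood peaks at iteration 2, before the limit of 4, so that state is used. In the second it is still rising at the limit, so iteration resumes and stops once the log-likelihood drops.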
Slide 32:Non-Convergence Study
Does SG(1) or SG(2) outperform ratio estimator in cases where SG failed to converge?
Six cases with high non-convergence percentage selected for comparison of SG(1), SG(2) and R
  - 2002 CO barley (37 simulation runs)
  - 2002 MS soybeans (105)
  - 2002 NY winter wheat (39)
  - 2002 ND dry beans (38)
  - 2002 OH oats (50)
  - 2003 OK rye (59)
Slide 33:Combined Pairwise Estimator Comparison for Non-Convergence Test Cases
Slide 34:Summary
SG yield estimation method outperforms R in all efficiency categories and G in most categories (G outperforms R)
Convergence problems can be alleviated using enhanced SG approach
SG method recommended for integration into NASS County Estimates System