
Improved County Level Estimation of Crop Yield Using Model-Based Methodology With a Spatial Component



Presentation Transcript


    Slide 1:Improved County Level Estimation of Crop Yield Using Model-Based Methodology With a Spatial Component

    Michael E. Bellow, USDA/NASS

    Slide 2:Outline

    Background
    Simulation Methodology
    Results of Ten State Study
    Convergence Evaluation
    Summary

    Slide 3:County Level Commodity Estimation

    NASS program since 1917
    Estimates used by private sector, academia, government
    Data from various sources used
    NASS County Estimates System developed to facilitate the estimation process

    Slide 4:Available Data Sources

    Voluntary response surveys of farm operators
    List frame control data (lists of known farming operations)
    Previous year official estimates
    Census of Agriculture data (NASS conducts Census every five years)
    Earth resources satellite data

    Slide 5:County Crop Yield Estimation

    Yield is ratio of crop production to harvested area (acres)
    Accurate estimation challenging due to:
      - reliable administrative data seldom available
      - high year-to-year variability of yields (weather sensitive)
      - lack of adequate sample survey data

    Slide 6:Desirable Features of a County Yield Estimation Method

    Repeatability
    Accurate variance estimation
    Produce estimates for counties having no survey data

    Slide 7:Ratio (R) Estimator

    Traditional crop yield estimator used by NASS
    Computed as ratio between production and harvested area estimates (with minor adjustment)
    Can produce inconsistent yields due to fluctuations in harvested acreage
    No utilization of survey data from counties other than the one being estimated
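
    A minimal sketch of the unadjusted ratio computation described on this slide. The function name, the units and the sample figures are illustrative assumptions; the "minor adjustment" mentioned above is not specified in the slides, so it is left out.

```python
def ratio_yield(production, harvested_acres):
    """Unadjusted ratio estimate: production divided by harvested area.

    Returns None when the harvested-area estimate is not positive,
    because the ratio is undefined in that case.
    """
    if harvested_acres <= 0:
        return None
    return production / harvested_acres


# Hypothetical county: the same production estimate paired with two
# different harvested-acreage estimates gives noticeably different yields.
print(ratio_yield(120_000, 3_000))  # 40.0
print(ratio_yield(120_000, 2_500))  # 48.0
```

    The two calls illustrate why fluctuations in the harvested-acreage estimate alone can move the yield estimate.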

    Slide 8:Model-Based County Estimation Methods

    Based on linear or non-linear models relating true yields to survey reported values
    Generally fit using an iterative algorithm
    Convergence not always guaranteed
    Estimates can be adjusted for consistency with published state figures

    Slide 9:Stasny-Goel (SG) Method

    Developed at Ohio State University under cooperative agreement with NASS
    Assumes mixed effects model with farm size group as fixed effect and county as random effect
    Random effect assumed multivariate normal with covariance matrix reflecting spatial correlation among neighboring counties:
      - corr(t_i, t_j) = r if county i borders county j
      - corr(t_i, t_j) = 0 otherwise
    EM algorithm used to fit model
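
    The neighbor correlation structure above can be sketched as follows. This is an illustration only, not the SG program: the county labels, the border list and the value r = 0.3 are hypothetical, and the sketch does not check that the resulting matrix is positive definite, which a valid covariance matrix would require.

```python
def neighbor_correlation(counties, borders, r):
    """Correlation matrix with 1 on the diagonal, r for bordering
    county pairs, and 0 for all other pairs."""
    index = {name: k for k, name in enumerate(counties)}
    n = len(counties)
    corr = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for a, b in borders:
        i, j = index[a], index[b]
        corr[i][j] = r
        corr[j][i] = r  # bordering is a symmetric relation
    return corr


# Hypothetical layout: A borders B, B borders C, A and C do not touch.
counties = ["A", "B", "C"]
borders = [("A", "B"), ("B", "C")]
for row in neighbor_correlation(counties, borders, r=0.3):
    print(row)
```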

    Slide 10:Stasny-Goel Method (cont.)

    Previous year county yields used to derive initial estimates of county and size group effects
    Processing continues until at least one of the following two conditions is satisfied:
      - relative group and log-likelihood distances fall below preset limits
      - maximum allowable number of iterations reached
    County yield estimates computed as weighted averages of individual farm level estimates (weights derived from Census of Agriculture data)
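
    A minimal sketch of the stopping rule and the weighted average described on this slide. The function names and tolerance values are placeholders, not the SG program's settings; the default cap of 5000 echoes the iteration limit quoted on a later slide. The average is taken over size-group estimates as a simplification of the farm-level weighting.

```python
def should_stop(rel_dist, loglik_dist, iteration,
                rel_tol=1e-6, loglik_tol=1e-6, max_iter=5000):
    """True when both distances are below their preset limits,
    or when the iteration cap has been reached."""
    converged = rel_dist < rel_tol and loglik_dist < loglik_tol
    return converged or iteration >= max_iter


def county_yield(group_estimates, group_weights):
    """Weighted average of yield estimates for one county.

    The weights need not sum to 1; they are normalized here.
    """
    total = sum(group_weights)
    weighted = sum(e * w for e, w in zip(group_estimates, group_weights))
    return weighted / total


# Hypothetical county with two size groups holding 25% and 75% of farm acres.
print(county_yield([40.0, 50.0], [25, 75]))  # 47.5
```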

    Slide 11:Griffith (G) Method

    Developed by Dr. Dan Griffith at Syracuse University under cooperative agreement with NASS
    Predicts yield values using published number of farms producing crop of interest
    Assumes autoregressive model
    Employs Box-Cox and Box-Tidwell transformations
    Spatial imputation routine can compute estimates for counties with missing survey data

    Slide 12:Previous Research on Model-Based Methods

    Stasny, Goel and Rumsey (1991): early version of SG method tested on Kansas wheat production data
    Stasny et al. (1995): improved version of SG tested on Ohio corn yield data
    Crouse (2000): SG evaluated for Michigan corn and barley yield
    Griffith (2000): Griffith method tested on Michigan corn yield data
    Bellow (2004): SG and Griffith methods compared for North Dakota oats and barley yield (presented at FCSM Research Conference)

    Slide 13:Ten-State Research Study

    Compare performance of Stasny-Goel, Griffith and ratio methods for various crops in ten geographically dispersed states: NY, OH, MI, TN, MS, FL, ND, OK, CO, WA
    Criteria for comparison: bias, variance, MSE, outlier properties, convergence percentage

    Slide 14:States In Study Area

    Slide 15:Post-Stratification Size Groups

    NASS statewide survey data post-stratified by county and farm size based on COA data (two or three size groups defined)
    Percentages of Census farm acres by size group used as weights for SG algorithm
    Equal total land in farms criterion used to form groups

    Slide 16:Data Sources For Research Study

    2002-03 Quarterly Agricultural Survey
    2001-03 County Estimates Survey
    2001-02 official crop yield estimates (‘previous year’ data)
    2002 Census of Agriculture (number of farms, land in farms)

    Slide 17:Simulation Procedure

    Multiple regression performed on survey reported yield vs. official county yields, weighted average neighbor yields, size group membership variables
    Artificial population of 10,000 simulated survey data sets used to compute ‘true’ population parameter values
    250 sample data sets selected at random from population

    Slide 18:Simulation Procedure (cont.)

    Moran’s I computed to test whether simulated data sets reflect spatial correlation of real survey data
    SG, G and R methods applied to each of the 250 sampled data sets
    Average simulated parameter values compared with corresponding population values for each estimation method
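
    For reference, the Moran's I statistic can be computed as in the sketch below. The slides do not say which spatial weights were used, so binary border weights are assumed here, and the four-county chain is made up. Only the statistic is computed; a significance test (for example by permutation) is not shown.

```python
def morans_i(values, weights):
    """Moran's I = (n / S0) * sum_ij w_ij * d_i * d_j / sum_i d_i**2,
    where d_i is the deviation of value i from the mean and S0 is the
    sum of all spatial weights."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    s0 = sum(sum(row) for row in weights)
    cross = sum(weights[i][j] * dev[i] * dev[j]
                for i in range(n) for j in range(n))
    denom = sum(d * d for d in dev)
    return (n / s0) * cross / denom


# Hypothetical chain of four counties, each bordering the next one.
w = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
yields = [1.0, 2.0, 3.0, 4.0]
print(morans_i(yields, w))  # positive: neighbors have similar yields
```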

    Slide 19:Measures of Estimator Performance

    Absolute Bias: average absolute difference between simulated yield estimates and true (population) yield
    Variance: sample variance of simulated yield estimates
    Mean Square Error: average squared deviation between simulated estimates and true yield (SG program also computes analytic MSE)
    Lower (Upper) Tail Proximity: average absolute difference between 5th (95th) percentile of simulated yield estimates and true yield
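
    The measures above can be sketched for a single county as follows. The function and key names are illustrative, the percentile interpolation rule is an assumption (the slides do not state one), and averaging the tail proximities across counties is left out.

```python
import statistics


def performance(estimates, true_yield):
    """Performance measures for one county over a set of simulation runs."""
    # quantiles(n=20) returns 19 cut points; the first is the 5th
    # percentile and the last is the 95th percentile.
    cuts = statistics.quantiles(estimates, n=20, method="inclusive")
    p5, p95 = cuts[0], cuts[-1]
    abs_errors = [abs(e - true_yield) for e in estimates]
    sq_errors = [(e - true_yield) ** 2 for e in estimates]
    return {
        "abs_bias": statistics.fmean(abs_errors),
        "variance": statistics.variance(estimates),  # sample variance
        "mse": statistics.fmean(sq_errors),
        "ltp": abs(p5 - true_yield),
        "utp": abs(p95 - true_yield),
    }


# Hypothetical estimates from three simulation runs, true yield 40.
print(performance([38.0, 40.0, 42.0], 40.0))
```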

    Slide 20:Pairwise Estimator Comparison for Absolute Bias (* - better method)

    Slide 21:Pairwise Estimator Comparison for Variance

    Slide 22:Pairwise Estimator Comparison for MSE

    Slide 23:Pairwise Estimator Comparison for LTP

    Slide 24:Pairwise Estimator Comparison for UTP

    Slide 25:Additional Bias Evaluation

    Wilcoxon Rank Sum Test: compare median absolute error (over simulation runs) of SG vs. R, SG vs. G for each county
    Wilcoxon Signed Rank Test: assess whether median error of SG, G, R is negative, positive or zero (two one-sided tests performed for each county)

    Slide 26:Results of Rank Sum Tests on Absolute Bias

    Slide 27:Summary of Signed Rank Test Results (All Crops Combined)

    Slide 28:Percent of Counties With Average Underestimate Less Than 10% of True Yield (* - best method)

    Slide 29:Convergence Issues

    SG algorithm not guaranteed to converge within fixed limit on number of iterations
    Non-convergence associated with numerical instability conditions
    Yield estimates produced for non-convergent runs may be suspect
    Convergence generally most reliable for highly prevalent crops, least reliable for rare crops

    Slide 30:Algorithm Convergence Percentage By Crop (Limit of 5000 Iterations)

    Slide 31:Two Approaches to Dealing With SG Non-Convergence

    SG(1) - use estimate generated at final allowable iteration (N0)
    SG(2) - keep track of which iteration (i*) maximized the log-likelihood
      - if i* < N0, rerun algorithm to i* and use that estimate
      - if i* = N0, resume processing at iteration (N0+1) and continue until either
          o convergence occurs (use that estimate), OR
          o log-likelihood decreases from one iteration to next (use estimate at next-to-last iteration)
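
    The SG(2) rule can be sketched as below, under several assumptions. `step` is a hypothetical stand-in for one pass of the fitting algorithm, returning the new state, its log-likelihood and a convergence flag. The sketch stores the best state instead of rerunning to iteration i*, and `extra_cap` is an added safeguard against endless looping that the slides do not mention.

```python
def sg2_estimate(step, state, n0, extra_cap=1000):
    """Pick an estimate following the SG(2) rule.

    step(state) -> (new_state, loglik, converged)
    """
    best_state, best_ll, best_iter = state, float("-inf"), 0
    ll = float("-inf")
    for i in range(1, n0 + 1):
        state, ll, converged = step(state)
        if converged:
            return state
        if ll > best_ll:
            best_state, best_ll, best_iter = state, ll, i
    if best_iter < n0:
        # i* < N0: use the estimate from the best iteration.
        return best_state
    # i* = N0: continue past N0 until convergence or a decrease.
    for _ in range(extra_cap):
        new_state, new_ll, converged = step(state)
        if converged:
            return new_state
        if new_ll < ll:
            return state  # estimate at the next-to-last iteration
        state, ll = new_state, new_ll
    return state


def make_step(logliks):
    """Toy step function: the state is just the iteration count and the
    log-likelihoods come from a fixed list; it never reports convergence."""
    def step(i):
        return i + 1, logliks[i], False
    return step


print(sg2_estimate(make_step([1.0, 3.0, 2.0, 2.5]), 0, n0=4))            # 2
print(sg2_estimate(make_step([1.0, 2.0, 3.0, 4.0, 5.0, 4.5]), 0, n0=4))  # 5
```

    SG(1) corresponds to simply returning the state reached after the first loop.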

    Slide 32:Non-Convergence Study

    Does SG(1) or SG(2) outperform ratio estimator in cases where SG failed to converge?
    Six cases with high non-convergence percentage selected for comparison of SG(1), SG(2) and R:
      - 2002 CO barley (37 simulation runs)
      - 2002 MS soybeans (105)
      - 2002 NY winter wheat (39)
      - 2002 ND dry beans (38)
      - 2002 OH oats (50)
      - 2003 OK rye (59)

    Slide 33:Combined Pairwise Estimator Comparison for Non-Convergence Test Cases

    Slide 34:Summary

    SG yield estimation method outperforms R in all efficiency categories and G in most categories (G outperforms R)
    Convergence problems can be alleviated using enhanced SG approach
    SG method recommended for integration into NASS County Estimates System
