Imputation of agricultural production in South Africa’s Census 2002. Phuti Malebana July 2008 Statistics South Africa. Content. Background Current situation Imputation method Results Way forward. Background.
Imputation of agricultural production in South Africa’s Census 2002 Phuti Malebana July 2008 Statistics South Africa
Content • Background • Current situation • Imputation method • Results • Way forward
Background • Agricultural statistics are vital for assessing the socio-economic conditions of society • Among other uses, government relies on them for: - Gauging the performance of economic activity - Food security • Mainly used for policy formulation • “The Food and Agricultural Organisation (FAO) (200-) defines data quality and quality of agricultural statistics as relevance, accuracy, timeliness, punctuality, accessibility, clarity, comparability, coherence, completeness, and sound metadata, and Paradata”
Background • Agricultural statistics are important for the country at large and worldwide • Data should be “suitable for use” • Some respondents do not provide a data item (item non-response) or do not participate at all (unit non-response) • Non-response may diminish the representativeness of the sample and thus lead to bias • Remedies are needed to improve the quality of the data
Current situation • Aggregated data • Disaggregated data • Failure to meet some of the “user needs”
Imputation method • One of the methods for improving data quality; it originated and has been developed in statistical agencies • Edit/imputation model - allows filling in missing values or replacing contradictory values
Imputation method • Yost et al. (2000) identify five categories of automated imputation: - Deterministic imputation - Model-based imputation - Deck imputation - Mixed imputation - Use of expert systems • Many systems make imputations based on a specified hierarchy of methods
Imputation method • Research done so far includes use of: - Historical data - Nearest neighbor method - Average method • Historical data • Nearest neighbor method
Imputation method • Average method - The mean unit values (R/ton, R/LSU) of the most frequent (selected) products are derived from the 2002 raw data at provincial/MGD level - Using the reported 2002 VAT turnover of the non-responding enterprise, production values are derived as follows: Production value = Turnover / (mean R/ton or R/LSU)
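The average method above can be illustrated with a minimal sketch. The record layout (`income`, `quantity`), the function names and all figures are hypothetical, and the sketch assumes the mean unit value is the simple mean of each respondent's rand-per-unit ratio; the slides do not say whether a ratio of totals was used instead.

```python
from statistics import mean

def mean_unit_value(records):
    """Mean rand-per-unit (e.g. R/ton) over respondents with a positive quantity."""
    unit_values = [r["income"] / r["quantity"] for r in records if r["quantity"] > 0]
    if not unit_values:
        raise ValueError("no usable respondent records")
    return mean(unit_values)

def impute_production(turnover, unit_value):
    """Production value = Turnover / (mean R/ton or R/LSU)."""
    return turnover / unit_value

# Hypothetical responding enterprises for one product in one MGD.
respondents = [
    {"income": 120_000.0, "quantity": 100.0},  # R1,200 per ton
    {"income": 90_000.0, "quantity": 100.0},   # R900 per ton
    {"income": 50_000.0, "quantity": 0.0},     # no quantity reported, skipped
]

unit_value = mean_unit_value(respondents)
# Hypothetical non-respondent with a reported VAT turnover of R210,000.
imputed_tons = impute_production(210_000.0, unit_value)
print(unit_value, imputed_tons)
```

Because turnover is in rand and the divisor is in rand per ton (or per LSU), the imputed figure is a quantity in tons (or LSU).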
Imputation method • Product determination • National, provincial and MGD product distribution (farmers, production volume and production income) • The more frequent a product is within an MGD, the more likely it is that a non-responding enterprise farms that product • This is supported by the SIC • Production values are then derived using the average method
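As a rough illustration of product determination, the sketch below picks the product reported most often by responding enterprises in the same MGD. The field names (`mgd`, `product`) and the sample records are hypothetical, and counting farmers is only one of the three distributions mentioned above (volume and income could be used in the same way).

```python
from collections import Counter

def most_common_product(respondents, mgd):
    """Return the product reported by the most respondents in the given MGD, or None."""
    counts = Counter(r["product"] for r in respondents if r["mgd"] == mgd)
    if not counts:
        return None  # no respondents in this MGD; fall back to provincial level
    return counts.most_common(1)[0][0]

# Hypothetical responding enterprises.
respondents = [
    {"mgd": "A", "product": "maize"},
    {"mgd": "A", "product": "maize"},
    {"mgd": "A", "product": "cattle"},
    {"mgd": "B", "product": "wool"},
]

print(most_common_product(respondents, "A"))
print(most_common_product(respondents, "C"))
```

The product assigned this way would then feed the average method to derive a production value.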
Results • Distribution of the agricultural frame for the census and surveys so far, and the enterprises common across the surveys • Correlation analysis between income and turnover • Ratios between turnover, income and total income • Movement of income, total income and turnover (% change across the surveys) • Overall, these results are used to assess the validity of the historical data for use in the imputation process
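The checks listed above can be sketched as follows. The figures are invented for illustration only and are not results from the census or surveys; the helper names are hypothetical.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)

def pct_change(old, new):
    """Percentage movement of a figure between two surveys."""
    return (new - old) / old * 100.0

# Invented income and turnover figures for three enterprises common to two sources.
income = [100.0, 200.0, 300.0]
turnover = [200.0, 400.0, 600.0]

r = pearson(income, turnover)
ratios = [i / t for i, t in zip(income, turnover)]
movement = pct_change(200.0, 250.0)
print(r, ratios, movement)
```

A correlation near 1 and stable ratios across surveys would suggest that turnover is a reasonable proxy for income when imputing from historical data.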
Challenges • Two rules of thumb (Hartley (1962, 1974)): - Frame errors due to omissions and duplication can yield greater errors in data than all other sources of error combined - Edit/imputation error can yield greater errors in data than all other sources of error combined • Historical data • Nearest neighbor method - the agriculture frame includes enterprises registered under accountants/bookkeepers
Way forward • The average method has been tested using responding enterprises • Historical data will be our first priority, followed by the other two methods • Currently working on imputation using the common products within an MGD • This imputation process will be implemented for the 2007 census, which is underway, to populate the production figures
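The priority order described above (historical data first, then the other two methods) could be expressed as a simple fallback chain. This is a sketch only: the three method functions are hypothetical stand-ins, the field names are invented, and the order of nearest neighbor versus average is an assumption since the slides do not rank them.

```python
def impute_with_hierarchy(enterprise, methods):
    """Try each (name, method) in order; return the first non-missing value."""
    for name, method in methods:
        value = method(enterprise)
        if value is not None:
            return name, value
    return None, None

# Hypothetical stand-ins for the three methods.
def historical(enterprise):
    return enterprise.get("previous_production")

def nearest_neighbor(enterprise):
    return enterprise.get("neighbor_production")

def average(enterprise):
    turnover = enterprise.get("turnover")
    unit_value = enterprise.get("mean_unit_value")
    if turnover is None or not unit_value:
        return None
    return turnover / unit_value

methods = [
    ("historical", historical),
    ("nearest_neighbor", nearest_neighbor),  # assumed second; not stated in the slides
    ("average", average),
]

with_history = {"previous_production": 180.0, "turnover": 210_000.0, "mean_unit_value": 1050.0}
no_history = {"turnover": 210_000.0, "mean_unit_value": 1050.0}

print(impute_with_hierarchy(with_history, methods))
print(impute_with_hierarchy(no_history, methods))
```

Returning the method name alongside the value makes it possible to flag each imputed figure with how it was produced.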
Is there a need to Impute? Many thanks to you