Joint UNECE/Eurostat Work Session on Statistical Data Confidentiality. Farm Structure Survey: Considerations on the Release of a European Microdata. L. Franconi, D. Ichim, L. Corallo. Istituto Nazionale di Statistica (ISTAT), Italia. Tarragona, Spain, 26-28 October 2011.
Summary
• Description of the European Farm Structure Survey (FSS)
• Disclosure Scenarios and Risk Analysis
• Disclosure Limitation Procedure
• Information Loss Assessment
• Conclusions
Objective
• To explore, analyse and make recommendations on the release of a European FSS MFR (microdata file for research)
• Case study: Italian FSS 2005 and Dutch FSS 2007
Description of the European FSS
• The survey provides information on:
  a) utilised agricultural area (UAA)
  b) livestock units
  c) total standard gross margin (SGM), expressed in European Size Units (ESU)
  d) geographical location (NUTS)
  e) farming type
• Regional character and sparsity of the data
• Survey unit: agricultural holdings
• Target population: agricultural holdings
• Member States:
  a) census at least every 10 years
  b) BE, LU, NL, FI, SE: census in each survey round
  c) UK, NO: sample survey and census
• Response rate > 90%
SGM and farming type coefficients
• Coefficients: agricultural quantities on a three-year average; prices available with a one-year delay
• Farming type is determined from the total SGM and the partial SGMs
• Dominating activity: partial SGM is more than 66% of total SGM
• Classification of farms (70 farming type subdivisions):
  - 50 particular types
  - 17 principal types
  - 9 general types
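The dominating-activity rule above can be sketched in a few lines of Python. This is only an illustration of the 66% threshold stated on the slide: the function name `dominant_activity`, the activity labels and the SGM figures are hypothetical, and the official typology involves further rules not reproduced here.

```python
def dominant_activity(partial_sgm, threshold=0.66):
    """Return the activity whose partial SGM exceeds `threshold` of the
    total SGM, or None when no single activity dominates.

    `partial_sgm` maps a (hypothetical) activity label to its partial SGM.
    """
    total = sum(partial_sgm.values())
    if total <= 0:
        return None  # no economic size to classify
    # Only the largest partial SGM can exceed a threshold above 50%.
    activity, value = max(partial_sgm.items(), key=lambda kv: kv[1])
    return activity if value / total > threshold else None


# Illustrative figures only (not taken from the FSS data).
specialist = dominant_activity({"cereals": 80.0, "dairy": 20.0})
mixed = dominant_activity({"cereals": 50.0, "dairy": 50.0})
```

In this sketch a holding with no activity above the threshold returns `None`, which would correspond to a mixed farming type.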
Analysis of the temporal detail
• Relative variations of the mean number of holdings with respect to the year 2000, at NUTS2 level
• Stability of the phenomenon
Analysis of the geographical detail
Percentage of large holdings at NUTS0 level in each wave of the FSS.
i) It is difficult to analyse the phenomenon in a single MS
ii) "Small" countries do not have many large holdings
Disclosure scenario and risk analysis
• Scenario: spontaneous identification
• Categorical structural variables:
  - Area status (A05) – 3 categories
  - SGM region code (A07) – NUTS2, 21 categories for Italy
  - Holder sex (L011) – 3 categories
  - Age group (L012) – 7 categories
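One common way to examine such a scenario is to count how many holdings share each combination of the categorical key variables and to flag the rare combinations. The slides do not state which risk measure was used, so the following Python sketch is only a generic frequency count; the function names, the threshold and the sample records are hypothetical.

```python
from collections import Counter

# Key variables named on the slide; the record values below are made up.
KEY_VARS = ("A05", "A07", "L011", "L012")


def key_frequencies(records, key_vars=KEY_VARS):
    """Count how many records share each combination of key variables."""
    return Counter(tuple(r[v] for v in key_vars) for r in records)


def rare_records(records, threshold=2, key_vars=KEY_VARS):
    """Return indices of records whose key combination occurs fewer than
    `threshold` times (threshold=2 flags sample uniques)."""
    freq = key_frequencies(records, key_vars)
    return [
        i for i, r in enumerate(records)
        if freq[tuple(r[v] for v in key_vars)] < threshold
    ]


sample = [
    {"A05": 1, "A07": "R01", "L011": 1, "L012": 4},
    {"A05": 1, "A07": "R01", "L011": 1, "L012": 4},
    {"A05": 3, "A07": "R02", "L011": 2, "L012": 7},
]
flagged = rare_records(sample)  # indices of holdings unique on the keys
```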
External information
• Visible
• Re-identification
SGM and Farming Type: two strategies
• Release SGM as it is in the original data
• Recalculate SGM based on the recoded and perturbed variables (agricultural quantities)
Disclosure limitation
• Suppression of some identification variables
• Aggregation of some categorical variables
• Perturbation of some numerical variables
Suppression and aggregation
• Some MS might consider the loss of A05 a significant loss of data utility
• Variables with a high percentage of zero values (missing phenomenon) are added up
• Regional character and sparsity of the data
Perturbation
• Continuous variables with skewed distributions: re-identification
• Individual ranking:
  - microaggregation with k=3
  - preserves the weighted means
  - SGM region as blocking variable
  - retains characteristics of the data
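As a rough illustration of individual-ranking microaggregation, the Python sketch below sorts one variable within each block (here, the SGM region), forms groups of at least k=3 consecutive values and replaces each value with its group's weighted mean, which leaves the weighted total unchanged. The function name, the data layout and the choice to merge leftover records into the last group are assumptions; the slides do not describe the implementation that was actually used.

```python
from collections import defaultdict


def microaggregate_ir(values, weights, blocks, k=3):
    """Individual-ranking microaggregation of a single variable.

    Records are ranked by value within each block and cut into groups of
    k consecutive records; leftover records join the last group. Each
    value is replaced by the weighted mean of its group, so the weighted
    total of every block is preserved. Weights are assumed positive.
    A block with fewer than k records forms one (undersized) group.
    """
    by_block = defaultdict(list)
    for i, b in enumerate(blocks):
        by_block[b].append(i)

    out = list(values)
    for idx in by_block.values():
        idx.sort(key=lambda i: values[i])
        n_groups = max(len(idx) // k, 1)
        for g in range(n_groups):
            start = g * k
            end = (g + 1) * k if g < n_groups - 1 else len(idx)
            group = idx[start:end]
            wsum = sum(weights[i] for i in group)
            wmean = sum(weights[i] * values[i] for i in group) / wsum
            for i in group:
                out[i] = wmean
    return out


# Made-up, skewed values for one region with equal weights.
vals = [1.0, 2.0, 3.0, 10.0, 20.0, 30.0, 100.0]
wts = [1.0] * 7
masked = microaggregate_ir(vals, wts, ["R01"] * 7, k=3)
```

Each variable would be processed separately in this way, which is what distinguishes individual ranking from multivariate microaggregation.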
Information loss assessment
• Percentage variation of the means/variances
• Skewed distributions as a consequence of the sparsity
• Member States decide: individual ranking (IR) at NUTS3 or NUTS2 level
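The percentage variation of a summary statistic can be computed as in the minimal Python sketch below. It uses unweighted statistics from the standard library for brevity, whereas the procedure described in the slides works with survey weights; the function name and the sample figures are hypothetical.

```python
from statistics import mean, pvariance


def pct_variation(original, perturbed, stat=mean):
    """Percentage variation of `stat` between original and perturbed data.

    Undefined (division by zero) when the statistic of the original
    data is zero.
    """
    base = stat(original)
    return 100.0 * (stat(perturbed) - base) / base


# Made-up values: two groups of three replaced by their group means.
orig_vals = [1.0, 2.0, 3.0, 10.0, 20.0, 30.0]
pert_vals = [2.0, 2.0, 2.0, 20.0, 20.0, 20.0]

mean_change = pct_variation(orig_vals, pert_vals)            # means
var_change = pct_variation(orig_vals, pert_vals, pvariance)  # variances
```

Replacing values by group means keeps the mean but reduces the variance, so in this sketch the first figure is zero and the second is negative.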
Conclusion
• Each farm is visible
• Regional character and sparsity of the data
• Geographical detail = NUTS2 level
• Release original SGM and farming type
• Farming type at particular level
• Variation of individual ranking