1 / 18

Dutch Census Microdata Availability & Statistical Disclosure Control

Explore historical trends, register linkage, social statistical database, harmonization efforts, microdata availability, and statistical disclosure control of Dutch census data. Access microdata sets for research and learn about protecting data privacy.

dominguezt
Download Presentation

Dutch Census Microdata Availability & Statistical Disclosure Control

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. The availability of Dutch census microdata Eric Schulte Nordholt Senior researcher and project leader of the Census Statistics Netherlands Division Social and Spatial Statistics Department Support and Development Section Research and Development ESLE@CBS.NL Workshop on Communication and Dissemination of Census Results in Geneva 16 May 2008

  2. Contents • Historical introduction • Registers used for the virtual census • Micro linkage • Social Statistical Database • Publicity about Dutch censuses • Harmonisation • Microdata availability • Statistical Disclosure Control

  3. Historical introduction • Till 1899: Ministry of Home Affairs • 1899: 8th Census • 1971: 14th Census • Till 1995: more and more surveys • Last twelve years: • moving to a register-based statistical office • Reasons: • Unwillingness (non-response) • Reduction of response burden • Reduction of expenses

  4. Registers used for the virtual census • External registers (maintenance by register holders): • Population Register (PR),16 million recordsdemographic variables: sex, day of birth, marital status, • country of birth etc. • Fiscal administration (FIBASE),jobs,7.2 million recordsand pensions and life insurance benefits, 2.7 million records • Social Security administrations, 2 million records,auxiliary information integration process • Internal registers (maintenance by Statistics Netherlands): • Jobs file (employees),6.5 million records and • Self-employed persons,790 thousand recordsdates of job, branch of economic activity • General Business Register, 600.000 recordssize class, (economic) activity • Housing Register, about 7 million recordshousing variables

  5. Micro linkage • Linkage key:RegistersSocial security and Fiscal number (SoFi), unique since 26 November 2007: Citizen Service NumberSurveys Sex, date of birth, address (postal code and house number) • Linkage key replaced by RIN-person • Linkage strategyOptimizing number of matchesMinimizing number of mismatches and missed matches

  6. Social Statistical Database • Social Statistical Database (SSD): Set of integrated microdata files with coherent and detailed demographic and socio-economic data on persons, households, jobs and benefits • No remaining internal conflicting information • SSD-set: • Population Register (backbone) • Integrated jobs file • Integrated file of (social and other) benefits • Surveys, e.g. LFSCombining element:RIN-person

  7. Publicity about Dutch censuses • The Dutch Virtual Census of 2001 was a successful alternative for a traditional census • Tables: http://www.cbs.nl/en-GB/menu/themas/dossiers/volkstellingen/publicaties/2005-virtual-dutch-census-art.htm • Book: http://www.cbs.nl/en-GB/menu/themas/dossiers/volkstellingen/publicaties/2001-b57-e-pub.htm

  8. Harmonisation (1) • More information about the Dutch traditional Censuses (including those of 1960 and 1971): • http://www.volkstellingen.nl/en/ • For 1960 and 1971 the same variables as for 2001 • if not available: constructed based on existing variables in Census data • Variables not internationally harmonised (e.g. sex, age, marital status, household position, country of birth, economic status, household size and country of citizenship) • same classification and priority rules as for 2001

  9. Harmonisation (2) • Household size and country of citizenship: • missing for 1960 • Religious denomination (philosophy of life): • only for 1960 and 1971 • Place of residence one year prior to the census: • only for 2001 • International classifications • Branch of current economic activity: ISIC / NACE • Occupation: ISCO-COM • Level of educational attainment: ISCED

  10. Microdata availability • One percent samples for three years (1960, 1971 and 2001) • IPUMS (Integrated Public Use Microdata Series): • http://www.ipums.org/international/index.html • Weighting to population totals • Protecting according to rules for public use microdata files with Mu-ARGUS • Microdata sets for all three years available for research! • DANS (Data Archiving and Networked Services): • http://www.dans.knaw.nl/en/

  11. Statistical Disclosure Control (1) • Microdata under contract (MUC): • No direct identifiers • Rule against spontaneous recognition: each combination of an extremely identifying variable, a very identifying variable and an identifying variable should occur at least 100 times in the population • Extension of this rule: maximum level of detail of some variables (occupation, level of education, branch of economic activity) is determined by the most detailed direct regional variable • Each region that can be distinguished in the microdata should contain at least 10,000 inhabitants • No direct regional variables in panel data

  12. Statistical Disclosure Control (2) • Identifying variables • Direct (formal) identifiers • Name, address, citizen service number, … • Indirect identifiers, differentiated into • Extremely identifying (E) • Very identifying (V) • Identifying (I) I V E

  13. I V E Statistical Disclosure Control (3) • Examples of identifying variables • Extremely identifying: • Regional variables (residence, work, …) • Very identifying: • Sex, nationality • Extremely identifying variables • Identifying: • Age, occupation, education • Very identifying variables

  14. Statistical Disclosure Control (4) • Public use microdata files: • Microdata must be at least one year old • No direct identifiers or direct regional variables • Only 1 kind of indirect regional variables. Values of indirect regional variables sufficiently scattered. Each area should contain at least 200,000 persons in the target population and should consist of municipalities from at least six of the twelve provinces. No dominating municipality in any area. • At most 15 indirect identifiers • No sensitive variables

  15. Statistical Disclosure Control (5) • Public use microdata files (continued): • Sampling weights should not provide additional identifying information • Rule against spontaneous recognition: at least 200,000 individuals in the population for each category of an identifying variable • Another rule against spontaneous recognition: at least 1000 individuals in the population for each category in the crossing of two identifying variables • At least 5 households per combination of categories of household variables • Records should be in random order

  16. Statistical Disclosure Control (6) • Microdata for remote analyses • Remote execution: • Scripts are sent (on line) to Statistics Netherlands and applied to the microdata; SDC is applied before returning the results • (Compare with on-site microdata) • Remote access: • On-line access to confidentialized microdata sets • (Compare with microdata under contract or on-site)

  17. Thank you for your attention! Time for questions and discussion

More Related