
Overview of Archiving of Microdata Session 4 United Nations Statistics Division Demographic Statistics Section


theo

Presentation Transcript


  1. Overview of Archiving of Microdata Session 4 United Nations Statistics Division Demographic Statistics Section

  2. Overview of Presentation • What are microdata? • Why disseminate microdata? • Data files for archiving • Preparing the data sets • Data security • Tools for archiving of microdata • Risks of disseminating microdata

  3. What are microdata? Microdata: • are electronic data files containing information about each unit of enumeration, such as a person, household or housing unit • are organized data files in which each line (or record) contains information about one unit of observation • contain information in the form of coded values • contain different types of variables (numeric, alphanumeric, discrete or continuous) obtained from direct responses or derived by imputation/calculation

  4. Why disseminate microdata? • The main reason is to support research by offering flexibility • to define variables and modify categories in a way that meets the needs of researchers • to generate more interest, which facilitates wider use of census data • A closer relationship between data providers and users can improve the reliability and relevance of data

  5. Versions of data files for archiving • Data producers often create multiple versions of microdata files. These files: • are created during different stages of the census operation • differ in quality, content and number of records • range from raw microdata files to cleaned and edited files for public use

  6. What is sensitive in microdata? • In order to ensure data confidentiality, census data usually do not contain variables that are direct identifiers • Census data sets include variables that are indirect identifiers: • Detailed geographic information • Detailed information on professional status • Some variables in microdata sets can be sensitive due to the nature of the information contained in them • Information on income, ethnicity, religion, etc.

  7. Preparing the data set Acquisition • Microdata can be generated from various data sources: censuses, surveys and administrative registers • A clear acquisition policy that describes scope, source and mandate for the acquisition of microdata sets is necessary • NSO can play an important role by expanding the scope of the data archive to official sources such as line ministries

  8. Preparing the data set • Data file • Hierarchical/relational files are easier to analyze and more efficient for data storage • The identification variables in all data files should provide a unique identifier • Unique identifiers used to merge data files should be composed of numeric variables for more efficient sorting and filtering of records • A unique household identification should not be a compilation of geographic codes, since these codes are highly identifying • All unnecessary or temporary variables should be removed from the data files
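A minimal sketch of the identifier advice on this slide, assuming records held as Python dictionaries. The field names (`geo_key`, `hh_id`, `line`, `rooms`) and the function `assign_household_ids` are hypothetical and not taken from the presentation. Each household key built from geographic codes is replaced by a shuffled numeric ID, and the same ID is carried onto the person records so the two files can still be merged.

```python
import random

def assign_household_ids(geo_keys, seed=None):
    """Map each original household key to a numeric ID with no geographic meaning.

    The IDs are shuffled so that their order does not mirror the sort order
    of the geographic codes. `seed` is for reproducible illustration only.
    """
    rng = random.Random(seed)
    keys = sorted(set(geo_keys))
    ids = list(range(1, len(keys) + 1))
    rng.shuffle(ids)
    return dict(zip(keys, ids))

# Hypothetical internal files keyed by a compilation of geographic codes.
households = [
    {"geo_key": "01-003-0012", "rooms": 3},
    {"geo_key": "01-003-0047", "rooms": 2},
    {"geo_key": "02-011-0005", "rooms": 4},
]
persons = [
    {"geo_key": "01-003-0012", "line": 1, "age": 34},
    {"geo_key": "01-003-0012", "line": 2, "age": 31},
    {"geo_key": "02-011-0005", "line": 1, "age": 58},
]

id_map = assign_household_ids(h["geo_key"] for h in households)

# Files prepared for archiving: the geographic key is dropped, the numeric ID kept.
public_households = [
    {"hh_id": id_map[h["geo_key"]], "rooms": h["rooms"]} for h in households
]
public_persons = [
    {"hh_id": id_map[p["geo_key"]], "line": p["line"], "age": p["age"]}
    for p in persons
]
```

The mapping in `id_map` links the public ID back to the geographic codes, so in this sketch it would have to be stored separately under restricted access rather than alongside the archived files.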

  9. Preparing the data set • Variables and codes • All variables should be labeled (variable labels) and the codes for all categorical variables should be labeled (value labels) • “Missing” codes should be standardized for all variables • The “Not applicable” code should be distinct from other missing codes • If “errors” or “missing data” are imputed, this should be indicated in the data set
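The coding rules on this slide could be sketched as follows. The specific codes (`-9` for missing, `-8` for not applicable), the legacy codes, the variable names and the `age_imputed` flag are all illustrative assumptions, not values prescribed by the presentation.

```python
# Assumed standard codes, shared by every variable in the data set.
MISSING = -9
NOT_APPLICABLE = -8

# Hypothetical legacy codes that differ from one variable to another.
LEGACY_CODES = {
    "age": {999: MISSING},
    "occupation": {99: MISSING, 0: NOT_APPLICABLE},
}

# Value labels for a categorical variable, including the special codes.
VALUE_LABELS = {
    "occupation": {
        1: "Employee",
        2: "Self-employed",
        MISSING: "Missing",
        NOT_APPLICABLE: "Not applicable",
    }
}

def standardize(record):
    """Return a copy of the record with legacy codes mapped to the standard ones."""
    out = dict(record)
    for var, mapping in LEGACY_CODES.items():
        if var in out and out[var] in mapping:
            out[var] = mapping[out[var]]
    return out

def impute_age(record, fallback_age):
    """Fill a missing age and record the fact in a separate flag variable."""
    out = dict(record)
    out["age_imputed"] = 0
    if out["age"] == MISSING:
        out["age"] = fallback_age
        out["age_imputed"] = 1
    return out

raw = {"age": 999, "occupation": 0}
clean = standardize(raw)
final = impute_age(clean, fallback_age=30)
```

Keeping the imputation flag as its own variable lets a later user exclude or reweight imputed values without needing the processing logs.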

  10. Preparing the data set • Verification operation • If a dataset is hierarchical, all records in the individual-level files should have a corresponding household in the household-level file • The number of records in each file should be verified • Data from all sections of the questionnaire should be included in the dataset • These checks call for setting up verification rules to check data sets
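Two of the verification rules above can be expressed as small checks. This is a sketch under assumptions: the field names `hh_id` and `hh_size` are hypothetical, and the household file is assumed to carry a recorded household size that can be compared with the number of person records.

```python
from collections import Counter

def find_orphans(households, persons):
    """Person records whose household ID has no match in the household file."""
    hh_ids = {h["hh_id"] for h in households}
    return [p for p in persons if p["hh_id"] not in hh_ids]

def find_count_mismatches(households, persons):
    """Household IDs whose recorded size differs from the number of person records."""
    counts = Counter(p["hh_id"] for p in persons)
    return [
        h["hh_id"]
        for h in households
        if h["hh_size"] != counts.get(h["hh_id"], 0)
    ]

households = [
    {"hh_id": 1, "hh_size": 2},
    {"hh_id": 2, "hh_size": 1},
]
persons = [
    {"hh_id": 1, "line": 1},
    {"hh_id": 1, "line": 2},
    {"hh_id": 3, "line": 1},  # no household 3 exists
]

orphans = find_orphans(households, persons)
mismatches = find_count_mismatches(households, persons)
```

In this example the person attached to household 3 is reported as an orphan, and household 2 is reported because it declares one member but has no person records.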

  11. Data security • Physical security • Controlling access to rooms where data are held • Logging the removal of and access to media or hard copy material in store rooms • Network security • Not storing confidential data on servers or computers connected to an external network • Firewall protection and security-related upgrades to avoid viruses and malicious code

  12. Data security • Security of computer systems and files • Locking computer systems with password and installing a firewall system • Implementing password protection of, and controlled access to, data files • Protecting servers by power surge protection systems through line-interactive uninterruptible power supply (UPS) systems • Imposing non-disclosure agreements for managers or users of confidential data

  13. Data security • Security of personal data • Anonymising or aggregating data • Separating data content according to security needs • Removing personal information from data files and storing them separately
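As an illustration of separating personal information and aggregating detail, the sketch below splits each record into an identifier part and a content part, then replaces exact age with a five-year band. The field names, the list of direct identifiers and the band width are assumptions made for the example.

```python
# Hypothetical direct identifiers to be stored apart from the analysis file.
DIRECT_IDENTIFIERS = ("name", "address")

def split_record(record):
    """Split a record into (identifiers, content), both keyed by person_id."""
    identifiers = {
        k: record[k] for k in ("person_id",) + DIRECT_IDENTIFIERS if k in record
    }
    content = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}
    return identifiers, content

def age_band(age, width=5):
    """Aggregate an exact age into a band such as '30-34'."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"

record = {
    "person_id": 17,
    "name": "A. Example",
    "address": "1 Sample Street",
    "age": 34,
    "sex": 2,
}

identifiers, content = split_record(record)
# Replace the exact age with its band in the file intended for wider use.
content["age_group"] = age_band(content.pop("age"))
```

The identifier part would be kept in a separate, more tightly controlled store, with `person_id` as the only link between the two files.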

  14. Tools for archiving microdata • International Household Survey Network (IHSN) • A network of international agencies coordinated by World Bank/PARIS21 • Develops tools, guidelines and training materials • Advocates compliance with good practices and international standards

  15. Tools for archiving microdata • Redatam-based IMIS • Originally developed at CELADE to promote access to census microdata • It is a database management tool that manages large volumes of census data • Aims to promote access to and analysis of census and other data for informed decision making for sectoral and local development policies and programmes

  16. Risks of disseminating microdata • Maintaining respondents’ trust: confidentiality protection is the key element of trust • Potential misuse and misunderstanding of data by users: there should be procedures to prevent misuse of microdata; good documentation and technical support to prevent misunderstanding of microdata • Exposure to criticism and contradiction: data quality may not be good enough for further dissemination; there may be inconsistency between research results based on microdata and published aggregated data

  17. Risks of disseminating microdata • Legal issues: it is crucial for data producers to ensure there is a sound legal and ethical base (as well as the technical and methodological tools) for protecting confidentiality • Costs: these include not only the costs of creating and documenting microdata files, but also the costs of creating access tools and safeguards, of supporting and authorizing enquiries made by the research community, and of training and support for new users of microdata files • Technical capacity: the files need to be well documented and preserved, and reviewed to identify the risk of disclosure of individual information, with that risk reduced using various techniques
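The slides do not name a specific technique for identifying disclosure risk. One common approach, shown here only as a hedged sketch, is to count how many records share each combination of indirect identifiers and flag the combinations that occur fewer than `k` times. The function name, the field names and the threshold are assumptions.

```python
from collections import Counter

def risky_records(records, quasi_identifiers, k=3):
    """Return records whose combination of indirect identifiers occurs fewer than k times."""
    def key(r):
        return tuple(r[q] for q in quasi_identifiers)

    counts = Counter(key(r) for r in records)
    return [r for r in records if counts[key(r)] < k]

records = [
    {"district": "A", "occupation": "farmer"},
    {"district": "A", "occupation": "farmer"},
    {"district": "A", "occupation": "farmer"},
    {"district": "A", "occupation": "judge"},  # unique combination
]

flagged = risky_records(records, ["district", "occupation"], k=3)
```

Flagged records are candidates for risk reduction, for example by coarsening the geographic or occupational detail before release.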

  18. Microdata is archived: • “to allow future users to retrieve, access, decipher, view, interpret, understand and experience documents, data and records in meaningful and valid ways” (Jeff Rothenberg) • “to create institutional memory for long-term research”

  19. THANK YOU …..
