This session provides an overview of archiving microdata, including the importance of disseminating microdata, data files for archiving, data security, and tools for archiving microdata. It also covers the risks associated with disseminating microdata.
Overview of Archiving of Microdata Session 4 United Nations Statistics Division Demographic Statistics Section
Overview of Presentation • What are microdata? • Why disseminate microdata? • Data files for archiving • Preparing the data sets • Data security • Tools for archiving of microdata • Risks of disseminating microdata
What are microdata? Microdata: • are electronic data files containing information about each unit of enumeration, such as a person, household or housing unit • are organized data files in which each line (or record) contains information about one unit of observation • contain information in the form of coded values • contain different types of variables (numeric or alphanumeric, discrete or continuous), obtained from direct responses or derived by imputation or calculation
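To make "one record per unit, coded values" concrete, here is a minimal sketch of reading a person-level file in which each line is one person. The fixed-width layout, the variable names (`hh_id`, `person_no`, `sex`, `age`) and the value labels are hypothetical, chosen only for illustration; real census files follow their own record layout and codebook.

```python
# Hypothetical fixed-width layout for a person-level file:
# (variable name, start column, end column), 0-based with the end excluded.
LAYOUT = [("hh_id", 0, 6), ("person_no", 6, 8), ("sex", 8, 9), ("age", 9, 12)]

# Hypothetical value labels for the coded variable "sex".
SEX_LABELS = {"1": "Male", "2": "Female", "9": "Not stated"}


def parse_record(line):
    """Split one line (one person) into named fields, keeping the coded values."""
    return {name: line[start:end].strip() for name, start, end in LAYOUT}


raw_lines = ["000101011035", "000101022032"]
records = [parse_record(line) for line in raw_lines]

for rec in records:
    # Decode the categorical code for display; age is a numeric variable.
    print(rec["hh_id"], rec["person_no"], SEX_LABELS[rec["sex"]], int(rec["age"]))
```

Both example persons share household `000101`, which is what later allows person records to be linked to a household-level file.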
Why disseminate microdata? • The main reason is to support research by offering flexibility: • to define variables and modify categories to meet the needs of researchers • to generate more interest, which facilitates wider use of census data • A closer relationship between data providers and users can improve the reliability and relevance of data
Versions of data files for archiving • Data producers often create multiple versions of microdata files. These files: • are created during different stages of the census operation • differ in quality, content and number of records • range from raw microdata files to cleaned and edited files for public use
What is sensitive in microdata? • To ensure data confidentiality, census data sets usually do not contain variables that are direct identifiers, such as names or addresses • Census data sets do include variables that are indirect identifiers, for example: • Detailed geographic information • Detailed information on professional status • Some variables in microdata sets can be sensitive because of the nature of the information they contain • Information on income, ethnicity, religion, etc.
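Indirect identifiers are risky mainly in combination: a record that is the only one with a given district, occupation and age group may be recognizable. A common first check, sketched below under stated assumptions, counts how many records are unique on a chosen set of key variables. The key variables and the three records are invented for illustration; which variables count as indirect identifiers is a judgment the archive has to make.

```python
from collections import Counter

# Hypothetical indirect identifiers ("key variables") for this check.
KEY_VARS = ("district", "occupation", "age_group")


def key_of(record, key_vars=KEY_VARS):
    """Return the combination of key-variable values for one record."""
    return tuple(record[v] for v in key_vars)


def sample_uniques(records, key_vars=KEY_VARS):
    """Return the records whose combination of key variables occurs exactly once."""
    counts = Counter(key_of(r, key_vars) for r in records)
    return [r for r in records if counts[key_of(r, key_vars)] == 1]


# Invented toy records.
people = [
    {"district": "D01", "occupation": "teacher", "age_group": "30-34", "income": 410},
    {"district": "D01", "occupation": "teacher", "age_group": "30-34", "income": 390},
    {"district": "D02", "occupation": "surgeon", "age_group": "60-64", "income": 2900},
]

risky = sample_uniques(people)
print(len(risky), "record(s) unique on", KEY_VARS)
```

Here only the surgeon is unique, so that record would be a candidate for recoding before release.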
Preparing the data set • Acquisition • Microdata can be generated from various data sources: censuses, surveys and administrative registers • A clear acquisition policy that describes the scope, source and mandate for acquiring microdata sets is necessary • The national statistical office (NSO) can play an important role by expanding the scope of the data archive to other official sources, such as line ministries
Preparing the data set • Data file • Hierarchical/relational files are easier to analyze and more efficient for data storage • The identification variables in all data files should provide a unique identifier for each record • Unique identifiers used to merge data files should be composed of numeric variables for more efficient sorting and filtering of records • A unique household identifier should not be a compilation of geographic codes, since these codes are highly identifying • All unnecessary or temporary variables should be removed from the data files
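The recommendations above can be sketched as follows, assuming household keys that were built by concatenating geographic codes. The sketch replaces them with shuffled sequential integers, carries the new key into the person file so the two files can still be merged, and drops working variables. `TEMP_VARS`, the key format and the toy records are hypothetical; the `key_map` linking old and new keys is shown only so that the archive can keep it under restricted access.

```python
import random

# Hypothetical working variables that should not be archived.
TEMP_VARS = {"tmp_edit_flag", "keyer_id"}


def drop_temp(rec):
    """Return a copy of the record without temporary/working variables."""
    return {k: v for k, v in rec.items() if k not in TEMP_VARS}


def recode_ids(households, persons, seed=None):
    """Replace geography-based household keys with numeric IDs.

    Keys are shuffled before numbering so the new IDs do not follow the
    geographic order of the old ones. key_map links old and new keys and
    should be stored securely, not released with the files.
    """
    rng = random.Random(seed)
    old_keys = sorted(households)
    rng.shuffle(old_keys)
    key_map = {old: new for new, old in enumerate(old_keys, start=1)}

    hh_out = {key_map[old]: drop_temp(rec) for old, rec in households.items()}
    # The person file keeps the new numeric key so both files can still be merged.
    p_out = [dict(drop_temp(p), hh_id=key_map[p["hh_id"]]) for p in persons]
    return hh_out, p_out, key_map


# Toy input: keys built from invented province-district-EA-dwelling codes.
households = {
    "0101-003-0042": {"rooms": 3, "keyer_id": "K7"},
    "0101-003-0043": {"rooms": 2, "keyer_id": "K7"},
}
persons = [
    {"hh_id": "0101-003-0042", "age": 35, "tmp_edit_flag": 0},
    {"hh_id": "0101-003-0043", "age": 8, "tmp_edit_flag": 1},
]

hh_out, p_out, key_map = recode_ids(households, persons, seed=1)
```

Shuffling matters because sequential numbering in sorted key order would still let a user infer geographic neighbours from adjacent IDs.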
Preparing the data set • Variables and codes • All variables should be labeled (variable labels), and the codes for all categorical variables should be labeled (value labels) • “Missing” codes should be standardized across all variables • The “not applicable” code should be distinct from other missing codes • If errors or missing data were imputed, this should be indicated in the data set
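A minimal sketch of these coding rules follows, assuming two hypothetical standard codes (-8 for "not applicable", -9 for "not stated") and a hypothetical skip rule under which hours worked is asked only of persons aged 15 and over. Blank cells are coded differently depending on whether the question applied, and any imputed value gets a companion flag variable. How the donor value is obtained is outside the scope of the sketch; it is simply passed in.

```python
# Hypothetical standard codes; a real archive would take these from its codebook.
NOT_APPLICABLE = -8
NOT_STATED = -9

VALUE_LABELS = {NOT_APPLICABLE: "Not applicable", NOT_STATED: "Not stated"}

# Assumed variants of "missing" found in the raw files.
RAW_MISSING = {"", ".", "?", None}


def standardize(raw, applicable):
    """Map a raw value to an integer code.

    'applicable' comes from the questionnaire's skip pattern: if the question
    was not asked, the result is NOT_APPLICABLE even if the cell is blank.
    """
    if not applicable:
        return NOT_APPLICABLE
    if raw in RAW_MISSING:
        return NOT_STATED
    return int(raw)


def impute_with_flag(record, var, donor_value):
    """Fill a NOT_STATED value and add a flag variable marking the imputation."""
    out = dict(record)
    imputed = out[var] == NOT_STATED
    if imputed:
        out[var] = donor_value
    out[var + "_imputed"] = int(imputed)
    return out


# Hypothetical skip rule: hours worked is asked only of persons aged 15 and over.
people = [{"age": 35, "hours": "40"}, {"age": 29, "hours": ""}, {"age": 8, "hours": ""}]
coded = [{"age": p["age"], "hours": standardize(p["hours"], p["age"] >= 15)} for p in people]
final = [impute_with_flag(r, "hours", donor_value=40) for r in coded]
```

The 8-year-old keeps the "not applicable" code and is never imputed, while the 29-year-old's blank answer is imputed and flagged.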
Preparing the data set • Verification operation • If a data set is hierarchical, every record in the individual-level file should have a corresponding household in the household-level file • The number of records in each file should be verified • Data from all sections of the questionnaire should be included in the data set • Verification rules should be set up to check the data sets
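The verification checks listed above can be written as simple rules that return a list of problems. This is a sketch under assumptions: `REQUIRED_PERSON_VARS` stands in for "variables from every questionnaire section", and the control totals (`expected_hh`, `expected_persons`) are hypothetical figures an archive might take from field or processing reports.

```python
from collections import Counter

# Hypothetical variables, at least one per questionnaire section,
# that must be present in every person record.
REQUIRED_PERSON_VARS = {"hh_id", "age", "sex", "hours"}


def verify(households, persons, expected_hh=None, expected_persons=None):
    """Run simple verification rules and return a list of problems found."""
    problems = []
    hh_counts = Counter(h["hh_id"] for h in households)

    # Rule 1: household identifiers must be unique.
    dupes = sorted(k for k, n in hh_counts.items() if n > 1)
    if dupes:
        problems.append(f"duplicate household IDs: {dupes}")

    # Rule 2: every person record needs a matching household record.
    orphans = [p for p in persons if p["hh_id"] not in hh_counts]
    if orphans:
        problems.append(f"{len(orphans)} person record(s) without a household")

    # Rule 3: record counts must match control totals, when these are known.
    if expected_hh is not None and len(households) != expected_hh:
        problems.append(f"household count {len(households)} != {expected_hh}")
    if expected_persons is not None and len(persons) != expected_persons:
        problems.append(f"person count {len(persons)} != {expected_persons}")

    # Rule 4: variables from every questionnaire section must be present.
    for i, p in enumerate(persons):
        missing = REQUIRED_PERSON_VARS.difference(p)
        if missing:
            problems.append(f"person record {i} lacks {sorted(missing)}")
    return problems


households = [{"hh_id": 1}, {"hh_id": 2}]
persons = [
    {"hh_id": 1, "age": 35, "sex": 1, "hours": 40},
    {"hh_id": 3, "age": 8, "sex": 2},
]
for problem in verify(households, persons, expected_hh=2, expected_persons=2):
    print(problem)
```

The second toy person points to a household that does not exist and lacks a variable, so two problems are reported.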
Data security • Physical security • Controlling access to rooms where data are held • Logging the removal of and access to media or hard copy material in store rooms • Network security • Not storing confidential data on servers or computers connected to an external network • Firewall protection and security-related upgrades to avoid viruses and malicious code
Data security • Security of computer systems and files • Locking computer systems with passwords and installing a firewall system • Implementing password protection of, and controlled access to, data files • Protecting servers with power surge protection and line-interactive uninterruptible power supply (UPS) systems • Imposing non-disclosure agreements on managers or users of confidential data
Data security • Security of personal data • Anonymising or aggregating data • Separating data content according to security needs • Removing personal information from data files and storing them separately
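A minimal sketch of separating personal information from data content, assuming hypothetical direct-identifier variables (`name`, `address`, `phone`). Each record is split into an identifier part and a content part that share a random link key, and single years of age are aggregated into five-year bands as one example of aggregation. Only the content part would go further toward release, and a public-use file would also drop `link_key`.

```python
import secrets

# Hypothetical direct identifiers present in the processing file.
DIRECT_IDS = ("name", "address", "phone")


def separate(records):
    """Split records into an identifier file and a content file.

    Both files share a random link key so authorized staff can rejoin them.
    The identifier file is meant to be stored separately under stricter
    access control.
    """
    id_file, content_file = [], []
    for rec in records:
        key = secrets.token_hex(8)  # random, so it reveals nothing about the person
        id_file.append({"link_key": key, **{v: rec[v] for v in DIRECT_IDS if v in rec}})
        content = {k: v for k, v in rec.items() if k not in DIRECT_IDS}
        content["link_key"] = key
        content_file.append(content)
    return id_file, content_file


def age_group(age, width=5):
    """Aggregate single years of age into bands, e.g. 37 -> '35-39'."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


records = [{"name": "A. Example", "address": "12 Main St", "age": 37, "sex": 2}]
ids, content = separate(records)
for row in content:
    row["age"] = age_group(row["age"])
```

The link key is drawn from `secrets` rather than derived from the identifiers, so it cannot be recomputed from a name or address.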
Tools for archiving microdata • International Household Survey Network (IHSN) • A network of international agencies coordinated by the World Bank and PARIS21 • Develops tools, guidelines and training materials • Advocates compliance with good practices and international standards
Tools for archiving microdata • REDATAM-based IMIS • Originally developed at CELADE to promote access to census microdata • A database management tool that handles large volumes of census data • Aims to promote access to and analysis of census and other data for informed decision-making on sectoral and local development policies and programmes
Risks of disseminating microdata • Maintaining respondents’ trust: confidentiality protection is the key element of trust • Potential misuse and misunderstanding of data by users: there should be procedures to prevent misuse of microdata, and good documentation and technical support to prevent misunderstanding • Exposure to criticism and contradiction: data quality may not be good enough for further dissemination, and research results based on microdata may be inconsistent with published aggregate data
Risks of disseminating microdata • Legal issues: data producers must ensure there is a sound legal and ethical basis (as well as the technical and methodological tools) for protecting confidentiality • Costs: these include not only the costs of creating and documenting microdata files, but also the costs of creating access tools and safeguards, supporting and authorizing enquiries from the research community, and training and supporting new users of microdata files • Technical capacity: the files need to be well documented and preserved, and reviewed to identify the risk of disclosing individual information, with that risk then reduced using various techniques
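The slide does not name the disclosure-limitation techniques it has in mind. Two widely used ones are top-coding (capping extreme values) and global recoding (merging rare categories), sketched below. The ceiling, the minimum count and the toy values are hypothetical; in practice such thresholds are set from a disclosure-risk review of the actual file.

```python
from collections import Counter


def top_code(values, ceiling):
    """Top-coding: replace every value above the ceiling with the ceiling."""
    return [min(v, ceiling) for v in values]


def collapse_rare(categories, min_count, other="Other"):
    """Global recoding: merge categories seen fewer than min_count times."""
    counts = Counter(categories)
    return [c if counts[c] >= min_count else other for c in categories]


# Invented toy values.
incomes = [1200, 56000, 950000]
occupations = ["farmer", "farmer", "teacher", "astronaut"]
print(top_code(incomes, ceiling=100000))
print(collapse_rare(occupations, min_count=2))
```

Both techniques trade some analytical detail for lower risk, which is why the thresholds should be documented alongside the released file.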
Microdata are archived: • “to allow future users to retrieve, access, decipher, view, interpret, understand and experience documents, data and records in meaningful and valid ways” (Jeff Rothenberg) • “to create institutional memory for long-term research”