Access to microdata from the census: observations from the UK
Angela Dale
Centre for Census and Survey Research, University of Manchester, UK
IASSIST 2003

Access to census data
• Traditionally access to census data has been through published tables
• Most statistical offices are now providing web-based access to the data
• Emphasis on easy access
• Democratisation of official statistics
• Facilitated by new technology
• Very welcome development

UK census: tables
• 2001 Census tables now being published
• Most downloadable from web/CD
• Much easier access than 1991
• ONS play an important role in providing data to the public and private sectors
• Much greater web-access to tables and results from government surveys

UK: Microdata samples
• 2001: much greater concern over level of detail released than in 1991
• Much less individual detail and less geography will be available in 2001
• Concern over ability to match microdata to 100% published tables and reveal additional information

A changing world
• Increasing power of web-based search engines and growth of large databases
• Difficulty in knowing what data will become available in the future
• Increased public concern over privacy and confidentiality
• All lead to growing concerns by NSIs (national statistical institutes) over data release

Public use files versus research files
• Microdata that are truly ‘public’ contain little detail in order to ensure confidentiality
• The data have limited research value
• Thus we need to make a distinction between ‘public use’ files and ‘research’ files
• The former must be truly safe in all situations
• The latter must be of value for research

Access to microdata for research
• If we accept that research microdata files are not ‘public’ then we can explore the interaction between safe settings and safe data
• We need to avoid focusing only on the two extreme positions and look at the ground in between

Different ways of protecting confidentiality
• One extreme: very safe data and a minimum of safety in the setting (public use files)
• In between: increased safety in the setting, increased detail in the data
• Other extreme: very safe setting and a minimum of safety in the data

Very safe data – minimal safety from setting
• How do we judge safety?
• ONS currently require that one cannot recognise oneself – or a neighbour – with sufficient confidence to act on it
• What does this mean in practice?
• How can one ensure this level of safety?
• Data are unlikely to be valuable for much research
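
One way to make the question "how do we judge safety?" concrete is to count how many records are unique on a set of identifying variables. The sketch below is illustrative only: it is not the ONS test described above, and the variable names and records are hypothetical. Sample uniqueness is just one common indicator and does not capture "sufficient confidence to act on it".

```python
from collections import Counter

def count_sample_uniques(records, key_vars):
    """Count records whose combination of key variables occurs exactly once."""
    combos = Counter(tuple(r[v] for v in key_vars) for r in records)
    return sum(1 for r in records if combos[tuple(r[v] for v in key_vars)] == 1)

# Hypothetical microdata records (not real census data)
records = [
    {"age_band": "30-34", "sex": "F", "occupation": "teacher"},
    {"age_band": "30-34", "sex": "F", "occupation": "teacher"},
    {"age_band": "85+", "sex": "M", "occupation": "surgeon"},
]

uniques = count_sample_uniques(records, ["age_band", "sex", "occupation"])
```

Coarsening or dropping key variables lowers the count, which is the same trade-off between safety and research value that the slide describes.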

Increasing safety of setting through licensing
• Standard licences require users not to try to identify anyone – or claim to have done so
• Registration for 1991 census microdata required all universities to take responsibility for staff and students
• Monitoring of usage and publications
• Interaction with users to promote culture of respect for data

Improving care of released data
• What do we do with microdata files when we have finished the research?
• There is no requirement (in the UK) to return files or destroy files
• Old data files lying around are potentially very damaging
• Is there an IT solution?
• Self-destruction after a given length of time?
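
As a rough illustration of the "self-destruction after a given length of time" idea, a housekeeping routine could delete a local copy once it is older than an agreed retention period. This is a minimal sketch under stated assumptions: the function name and the retention period are hypothetical, it relies on the user running it, and ordinary deletion does not securely erase data from disk.

```python
import os
import time

def purge_if_expired(path, retention_days, now=None):
    """Delete the file at `path` if it was last modified more than
    `retention_days` ago. Returns True if the file was deleted."""
    now = time.time() if now is None else now
    age_seconds = now - os.path.getmtime(path)
    if age_seconds > retention_days * 86400:  # 86400 seconds per day
        os.remove(path)
        return True
    return False
```

A scheme enforced by the data provider rather than by the user would need more than this, for example encrypted files whose keys expire.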

Using software to extract only the data that is needed
• We typically download entire data files to our PCs
• Power of PCs and speed of download make this easy
• But do we need to keep entire data files on our desktops?
• NESSTAR allows extraction of subsets
• We may need to download only what is needed
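
The idea of keeping only the variables and cases a project needs can be sketched in a few lines. This is not NESSTAR's interface; the function, column names, and data below are hypothetical and only show the principle of subsetting before anything is stored locally.

```python
import csv
import io

def extract_subset(source, columns, keep_row=lambda row: True):
    """Read CSV text from `source` and keep only the named columns
    for rows that satisfy `keep_row`."""
    reader = csv.DictReader(source)
    return [{c: row[c] for c in columns} for row in reader if keep_row(row)]

# Hypothetical full file (stands in for a large microdata file)
full_file = io.StringIO(
    "id,region,age,income\n"
    "1,North West,34,21000\n"
    "2,London,51,38000\n"
    "3,North West,29,18000\n"
)

# Keep two variables for one region; identifiers and other regions are dropped
subset = extract_subset(
    full_file, ["age", "income"], lambda r: r["region"] == "North West"
)
```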

Using technology to provide virtual safe setting
• Need to see IT as an opportunity not just a threat
• Using GRID and middleware for analysis in a virtual safe setting
• Removes need to download data to PC
• Only results are extracted – no data files lying around

Current examples of remote access to data in a safe setting
• Luxembourg Income Study (LIS)
  • Submission of analyses by email
  • Controls over what outputs can be requested
• Structure of Earnings Survey
  • Held in Eurostat
  • Access based on LIS but with additional security
  • No tables released
• Desai (2002) for more information
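
One form that "controls over what outputs can be requested" could take is a rule that suppresses small cells before results leave the safe setting. The sketch below is an assumption for illustration, not the actual LIS or Eurostat rules; the threshold of 10 and the function name are hypothetical.

```python
from collections import Counter

def safe_frequency_table(records, variable, min_count=10):
    """Tabulate `variable`, replacing any count below `min_count` with None
    so that small cells are not released."""
    counts = Counter(r[variable] for r in records)
    return {value: (n if n >= min_count else None) for value, n in counts.items()}

# Hypothetical records held on the remote server
records = [{"region": "A"}] * 12 + [{"region": "B"}] * 3

table = safe_frequency_table(records, "region")
```

Here the cell for region B is withheld because only three records fall in it. A real system would also need to prevent suppressed cells being recovered from totals or repeated requests.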

A physical safe setting - a last resort?
• Physical safe setting will always have a role - but for access to REALLY detailed data
• Watkins and Boyko (2002) show costs are very considerable, therefore less research per dollar
• Disadvantages scholars without facilities nearby
• Thus we need to maximise use of other methods before resorting to a safe setting