Review and consultation: Next steps in supporting data on ethnicity. DAMES workshop on ‘Data on ethnicity in social survey research’, 28th January 2010, University of Stirling
Some preliminary comments: • e-Social Science • Challenges/principles • Ethnicity research agendas • Further comments/discussions/questions
i) What makes this ‘e-Social Science’? Attention to data management in the context of: • Standards setting • Metadata • Portal framework: Liferay portal to various DAMES resources; iRODS system for ‘GE*DE’ specialist data; controlled data access under security limits • Use of workflows
‘Data Management’: ‘the tasks associated with linking related data resources, with coding and re-coding data in a consistent manner, and with accessing related data resources and combining them within the process of analysis’ […the DAMES Node..] • Usually performed by social scientists (post-release) • Most overt in quantitative survey data analysis • Usually a substantial component of the work process • Here we distinguish it from archiving / controlling the data itself
‘Data Management through e-Social Science’ • DAMES – www.dames.org.uk • ESRC Node funded 2008-2011 • Aim: Useful social science provisions • Specialist data topics – occupations; educational qualifications; ethnicity; social care; health • Mainstream packages and accessible resources • Engage with existing provisions (e.g. ESDS; CESSDA) • Programme of case studies and provisions – more later
‘The significance of data management for social survey research’ • Data management is a major component of the social survey research workload • Pre-release manipulations performed by distributors / archivists: • Coding measures into standard categories; dealing with missing records • Post-release manipulations performed by researchers: • Re-coding measures into simple categories • All serious researchers perform extensive post-release management (and have the scars to show for it) • We do have existing tools, facilities and expert experience to help us… but we don’t use them efficiently or consistently • So the ‘significance’ of DM is about how much better research might be if we did things more effectively…
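As a minimal illustration of the post-release recoding and missing-data handling listed above, the Python sketch below collapses a detailed ethnic group code into broad groups. The code numbering, the group labels and the negative missing-value codes are all hypothetical assumptions, not the conventions of any particular survey.

```python
# Hypothetical convention: negative codes are non-substantive responses
# (e.g. -9 = missing, -8 = don't know) and become None, not a category.
MISSING_CODES = {-9, -8, -1}

# Hypothetical collapse of a detailed 1-16 group code into broad groups.
BROAD_GROUP = {
    **{c: "White" for c in (1, 2, 3)},
    **{c: "Mixed" for c in (4, 5, 6, 7)},
    **{c: "Asian" for c in (8, 9, 10, 11)},
    **{c: "Black" for c in (12, 13, 14)},
    15: "Chinese",
    16: "Other",
}


def recode_broad(code):
    """Return the broad group for a detailed code, or None if the code is missing."""
    if code is None or code in MISSING_CODES:
        return None
    if code not in BROAD_GROUP:
        # Fail loudly so unmapped codes are noticed rather than silently dropped.
        raise ValueError(f"unexpected code {code!r}")
    return BROAD_GROUP[code]


raw = [1, 9, -9, 13, 15, -8, 4]
print([recode_broad(c) for c in raw])
# ['White', 'Asian', None, 'Black', 'Chinese', None, 'Mixed']
```

Raising an error on an unexpected code, rather than quietly returning None, is a deliberate choice: it separates genuinely missing responses from mistakes in the recode itself.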
Example of GEODE: Organising and distributing specialist data resources (on occupations)
ii) Challenges/principles: Data manipulation skills and inertia • I would speculate that around 80% of applications using key variables don’t consult the literature or evaluate alternative measures, but instead choose the first convenient and/or accessible variable in the dataset • Data supply decisions (‘what is on the archive version’) are critical • Much of the explanation lies with a lack of confidence in manipulating / linking data • Too many resources are under-used – cf. www.esds.ac.uk
Software issues • Stata seems to be the superior package for secondary survey data analysis: • Advanced data management and data analysis functionality • Supports easy evaluation of alternative measures (e.g. est store) • Culture of transparency of programming/data manipulation • Problems… • Not available to all users • Not easily incorporated in generic services
Variables and functional form: Functional form is the way in which measures are arithmetically incorporated into quantitative analysis • With occupations, education, ethnicity, and elsewhere, we tend to be too willing to make simplifying categorisations • Multiple categorisations are possible • Scaling approaches are also possible, and are better suited to complex analytical procedures
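To make the contrast between categorisation and scaling concrete, the Python sketch below enters the same education variable in two functional forms: as k-1 indicator (dummy) variables against a reference category, and as a single numeric score. The category labels and the scores are hypothetical placeholders, not a published scale.

```python
def dummy_code(values, categories, reference):
    """Categorical form: k categories become k-1 0/1 indicators (reference omitted)."""
    return {
        cat: [1 if v == cat else 0 for v in values]
        for cat in categories
        if cat != reference
    }


def score_code(values, scores):
    """Scaled form: the same variable as one numeric column, one score per category."""
    return [scores[v] for v in values]


education = ["none", "degree", "secondary", "degree"]
levels = ["none", "secondary", "degree"]
# Hypothetical scores standing in for a published scale.
scores = {"none": 30.0, "secondary": 45.0, "degree": 70.0}

print(dummy_code(education, levels, reference="none"))
# {'secondary': [0, 0, 1, 0], 'degree': [0, 1, 0, 1]}
print(score_code(education, scores))
# [30.0, 70.0, 45.0, 70.0]
```

The dummy form uses k-1 parameters and imposes no ordering; the scaled form uses one parameter but depends on the assumed scores, which is why it is easier to carry into complex models.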
Good habits: Keep clear records of DM activities • Reproducible (for self) • Replicable (for all) • Paper trail for the whole lifecycle (cf. Dale 2006; Freese 2007) • In survey research, this means using clearly annotated syntax files (e.g. SPSS/Stata) • Syntax examples: www.longitudinal.stir.ac.uk
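The annotated syntax file remains the real paper trail. As a hypothetical complement, the Python sketch below records every recode as a text entry (source, target, mapping and resulting counts) that could be saved alongside the analysis. The class name DMLog, the variable names and the mapping are illustrative assumptions.

```python
from collections import Counter


class DMLog:
    """Minimal paper trail: each data-management step is recorded as a text entry."""

    def __init__(self):
        self.entries = []

    def recode(self, rows, source, target, mapping, note):
        """Derive target from source via mapping (unmapped values become None) and log it."""
        for row in rows:
            row[target] = mapping.get(row[source])
        counts = Counter(row[target] for row in rows)
        self.entries.append(
            f"{target} <- {source} | {note} | mapping={mapping} | counts={dict(counts)}"
        )
        return rows

    def save(self, path):
        """Write the log to a text file so the steps can be checked and re-run."""
        with open(path, "w", encoding="utf-8") as fh:
            fh.write("\n".join(self.entries) + "\n")


log = DMLog()
rows = [{"ethgrp": 1}, {"ethgrp": 12}, {"ethgrp": -9}]
log.recode(rows, "ethgrp", "ethbroad", {1: "White", 12: "Black"}, "hypothetical collapse")
print(log.entries[0])
```

Logging the result counts as well as the mapping makes it easy to spot, on re-running, that a recode now produces a different distribution.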
Principle: Use existing standards and previous research • Variable operationalisations: Use recognised recodes / standard classifications • NSI harmonisation standards (e.g. ONS) • Cross-national standards [Hoffmeyer-Zlotnick & Wolf 2003; Harkness et al. 2005; Jowell et al. 2007] • Research reviews [e.g. Shaw et al. 2007] • Common vs. best practices (e.g. dichotomisations) • Use reproducible recodes / classifications (paper trail) • Other data file manipulations: • Missing data treatments • Matching data files (finding the right data)
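One way to keep a recode both recognised and reproducible is to drive it from a lookup file rather than hand-typed conditions. The Python sketch below assumes a published mapping distributed as CSV; the embedded text, the column names detailed_code and harmonised_code, and the codes themselves are hypothetical stand-ins for a real classification file.

```python
import csv
import io

# Stand-in for a published lookup file; columns and codes are hypothetical.
LOOKUP_CSV = """detailed_code,harmonised_code
11,1
12,1
21,2
22,2
31,3
"""


def load_lookup(text):
    """Read the lookup into a dict: detailed code -> harmonised code."""
    reader = csv.DictReader(io.StringIO(text))
    return {int(r["detailed_code"]): int(r["harmonised_code"]) for r in reader}


def apply_lookup(codes, lookup):
    """Recode via the lookup; return recoded values and the sorted unmatched codes."""
    recoded = [lookup.get(c) for c in codes]
    unmatched = sorted({c for c in codes if c not in lookup})
    return recoded, unmatched


lookup = load_lookup(LOOKUP_CSV)
recoded, unmatched = apply_lookup([11, 22, 31, 99, 12], lookup)
print(recoded)    # [1, 2, 3, None, 1]
print(unmatched)  # [99]
```

Reporting unmatched codes explicitly is the useful part: it shows where the dataset departs from the standard, instead of letting those cases disappear as missing.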
Principle: Do something, not nothing • We currently put much more effort into data collection and data analysis, and neglect data manipulation • Survey research – the influence of ‘what was on the archive version’ • In my experience, a common reason why people didn’t do more DM was that they were frightened to…
Principle: Support linking data. Complex data (complex research) is distributed across different files. In surveys, use key linking variables for... • One-to-one matching SPSS: match files /file='file1.sav' /file='file2.sav' /by=pid. Stata: merge 1:1 pid using file2.dta • One-to-many matching (‘table distribution’) SPSS: match files /file='file1.sav' /table='file2.sav' /by=pid. Stata: merge m:1 pid using file2.dta • Many-to-one matching (‘aggregation’) SPSS: aggregate outfile='file3.sav' /break=pid /meaninc=mean(income). Stata: collapse (mean) meaninc=income, by(pid) • Many-to-many matching • Related cases matching • Note: SPSS match files expects all files to be sorted by the key variable; the Stata lines use the merge syntax introduced in Stata 11
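The three matching patterns above can be sketched in plain Python to show what each one does to the records. This is an illustration under stated assumptions, not a replacement for the SPSS or Stata commands: the file contents and the household key hid are hypothetical, only left-file cases are kept (a left join, whereas match files and merge also keep unmatched cases from the other file), and many-to-many and related-cases matching are not covered.

```python
from collections import defaultdict
from statistics import mean


def index_unique(rows, key):
    """Index rows by key, failing loudly if the key is not unique."""
    index = {}
    for row in rows:
        if row[key] in index:
            raise ValueError(f"duplicate {key}={row[key]!r}")
        index[row[key]] = row
    return index


def match_files(left, right, key, left_unique=True):
    """Attach right-file variables to each left-file case.

    left_unique=True  -> one-to-one (cf. Stata merge 1:1)
    left_unique=False -> table distribution (cf. Stata merge m:1, SPSS /table)
    Left-file values win when both files share a variable name.
    """
    if left_unique:
        index_unique(left, key)
    lookup = index_unique(right, key)
    return [{**lookup.get(row[key], {}), **row} for row in left]


def aggregate_mean(rows, key, var, newvar):
    """Many-to-one aggregation (cf. Stata collapse (mean) newvar=var, by(key))."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row[var])
    return [{key: k, newvar: mean(vals)} for k, vals in sorted(groups.items())]


# Hypothetical files: persons, a person-level extra file, a household table.
persons = [
    {"pid": 1, "hid": 10, "income": 200},
    {"pid": 2, "hid": 10, "income": 300},
    {"pid": 3, "hid": 20, "income": 150},
]
extra = [{"pid": 1, "ethgrp": 1}, {"pid": 2, "ethgrp": 12}, {"pid": 3, "ethgrp": 9}]
households = [{"hid": 10, "region": "North"}, {"hid": 20, "region": "South"}]

linked = match_files(persons, extra, "pid")
linked = match_files(linked, households, "hid", left_unique=False)
print(linked[1]["region"], linked[1]["ethgrp"])  # North 12
print(aggregate_mean(persons, "hid", "income", "meaninc"))
```

Checking key uniqueness before matching mirrors the discipline the packages enforce: a duplicated key in a file assumed to be one-record-per-key is one of the most common silent linking errors.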
Challenges: Agreeing about variable constructions • Unresolved debates about optimal measures and variables • Especially in comparative research, such as across time or between countries • In DAMES, we have particular interests in comparability for: • Longitudinal comparability (http://www.longitudinal.stir.ac.uk/variables/) • Scaling / scoring categories to achieve ‘meaning equivalence’ or ‘specific measures’
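One common device for scoring categories comparably across contexts is to standardise a score within each country or survey year, z = (x - mean of group) / SD of group, so that a value expresses relative position within its own context. Whether this achieves ‘meaning equivalence’ is itself part of the debate noted above; the Python sketch below shows only the arithmetic, uses hypothetical variable names, and is not presented as the DAMES procedure.

```python
from collections import defaultdict
from statistics import mean, pstdev


def standardise_within(rows, group, var, newvar):
    """Add newvar = (var - group mean) / group SD, computed separately per group."""
    by_group = defaultdict(list)
    for row in rows:
        by_group[row[group]].append(row[var])
    # Population SD is used here; a sample SD would be an equally defensible choice.
    stats = {g: (mean(v), pstdev(v)) for g, v in by_group.items()}
    for row in rows:
        m, sd = stats[row[group]]
        # A group with no variation gets 0.0 rather than a division by zero.
        row[newvar] = (row[var] - m) / sd if sd > 0 else 0.0
    return rows


rows = [
    {"country": "A", "score": 10}, {"country": "A", "score": 20}, {"country": "A", "score": 30},
    {"country": "B", "score": 50}, {"country": "B", "score": 60}, {"country": "B", "score": 70},
]
standardise_within(rows, "country", "score", "zscore")
print([round(r["zscore"], 3) for r in rows])
# [-1.225, 0.0, 1.225, -1.225, 0.0, 1.225]
```

Here a raw score of 10 in country A and 50 in country B end up with the same standardised value, which is exactly the kind of equivalence claim that needs substantive justification before it is used.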
Challenges: Incentivising documentation / replicability • There is little to encourage researchers to document DM better, and much to discourage them from doing so • Make DM and its documentation easier? • Reward documentation (e.g. citations)?
iii) Ethnicity research agendas: Our impression • More data on more referents • Controlled access to data • Increasing recognition of intergenerational change • Mixed identities • Other views…?