Copenhagen, August 12, 2010. The need for breadth and depth: Compatibility between Demand and Supply for research data. Torben Tranæs, ROCKWOOL FONDENS FORSKNINGSENHED
Introduction • What is the demand for data when you do research, broadly applied as well as academic? • And how do these demands match up with the way national statistical agencies operate? • Two main issues/problems: 1. the modes of data access 2. the existing variables
Preconditions • The main operation run by a national statistical agency is not about research; it is about description and documentation • Also, the way data access has been considered, at least in Denmark, has mainly accommodated users other than academic researchers • My main focus will be independent academic research
My talk: • The research process and data access • The need (demand) for and supply of data, and the mismatch problem • Case: crime and the labor market • Conclusion
A generic research process: • The researcher formulates a hypothesis • A data set with the relevant variables is acquired • The hypothesis is tested • … and, e.g., rejected! • Inspired by the data, the theories are revised, new or revised hypotheses are formulated, or the set of controls is reconsidered • The data are rearranged with (some) new variables • The new/revised hypothesis is tested, etc.
This way of working would be very cumbersome under the way DSt used to think of data access • According to that model, a data set was made available for a given project only, and for a fixed period, all of it decided up front • Fortunately, the practice was different
Demand and Supply for Data, and possible mismatch • The demand (from the research community): • Variables defined based on social science theory • High conceptual precision • Panel data with: 1) a long time dimension, and 2) a very large number of relevant and seemingly irrelevant variables, to get exogenous variation and instruments, e.g., by constructing discontinuities (if they don’t exist naturally) • The data supply: • Administratively defined variables • High precision in terms of low measurement error • A somewhat short but ever-increasing time dimension • An extremely rich set of information (many variables)
Mismatch problems: • Long run: The set of existing variables does not coincide with the set of needed variables • Short run: The set of existing relevant data does not coincide with the set of accessible data
Case: Does unemployment increase after a prison sentence? • What is the key variable? Is it ‘unemployment’, or the fraction of people without a job? • In Denmark we have full-population information on the latter, not the former. We know who receives unemployment benefits, but that is not necessarily the same as being ‘unemployed’ • Being ‘unemployed’ means that you are employable, available, and actively searching for a job
Thus, three different measures of difficulties in the labor market: 1. People without a job 2. Registered unemployment 3. (Employable) jobless individuals who search and are available for employment • Information exists on 1. and 2., but on 3. only for small sub-samples of the population, and that is not enough when studying crime and former inmates
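To make the distinction between the three measures concrete, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the `Person` record, its field names, and the ten-person toy population are hypothetical and do not reflect the structure or content of the actual Danish registers. Each measure is computed as a share of the whole toy population, which is a simplification.

```python
from dataclasses import dataclass


@dataclass
class Person:
    # Hypothetical record layout, invented for this illustration only.
    has_job: bool
    receives_unemployment_benefits: bool
    employable: bool
    available: bool
    searching: bool


def share(flags):
    """Fraction of True values in a list of booleans."""
    return sum(flags) / len(flags)


def jobless_share(people):
    # Measure 1: people without a job.
    return share([not p.has_job for p in people])


def registered_unemployment_share(people):
    # Measure 2: people registered as receiving unemployment benefits.
    return share([p.receives_unemployment_benefits for p in people])


def searching_unemployment_share(people):
    # Measure 3: jobless, employable, available, and actively searching.
    return share([
        (not p.has_job) and p.employable and p.available and p.searching
        for p in people
    ])


# Toy population (made-up data): 4 employed, 6 jobless.
toy_population = (
    [Person(True, False, True, True, False)] * 4       # employed
    + [Person(False, True, True, True, True)]          # benefits, searching
    + [Person(False, True, True, True, False)] * 2     # benefits, not searching
    + [Person(False, False, True, True, True)]         # no benefits, searching
    + [Person(False, False, False, False, False)] * 2  # not employable
)

measure_1 = jobless_share(toy_population)
measure_2 = registered_unemployment_share(toy_population)
measure_3 = searching_unemployment_share(toy_population)
print(measure_1, measure_2, measure_3)
```

In this toy population the three definitions give three different numbers for the same ten people, which is the point of the slide: the choice of definition changes the answer.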
As we shall see below, it makes a big difference which definition is used
Fraction of ex-convicts on public support after prison, relative to the fraction in the population aged 15-59
Wage earnings before and after having served a long prison sentence. Earnings relative to unskilled, same age, same year
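The normalization named in this figure title, earnings relative to unskilled workers of the same age in the same year, could be computed along the following lines. This is only a sketch under stated assumptions: the `(age, year, earnings)` row layout, the function names, and all numbers are hypothetical, and the slides do not say how the original calculation was done.

```python
from collections import defaultdict
from statistics import mean


def reference_means(reference_rows):
    """Mean earnings of the reference group for each (age, year) cell.

    reference_rows: iterable of (age, year, earnings) tuples
    (hypothetical layout, for illustration only).
    """
    cells = defaultdict(list)
    for age, year, earnings in reference_rows:
        cells[(age, year)].append(earnings)
    return {cell: mean(values) for cell, values in cells.items()}


def relative_earnings(person_rows, ref_means):
    """Earnings divided by the reference mean for the same age and year."""
    return {
        year: earnings / ref_means[(age, year)]
        for age, year, earnings in person_rows
    }


# Made-up reference group (e.g. unskilled workers) and one made-up individual.
unskilled = [
    (30, 2000, 200_000), (30, 2000, 240_000),
    (31, 2001, 230_000), (31, 2001, 250_000),
]
individual = [(30, 2000, 110_000), (31, 2001, 60_000)]

ratios = relative_earnings(individual, reference_means(unskilled))
print(ratios)
```

A ratio of 1.0 would mean the individual earns exactly the average of the reference group at that age and in that year; values below 1.0 mean lower earnings than the reference group.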
Summary of the case: • Dramatically different conclusions depending on which measure of unemployment is used • The richness of the data reveals new stylized facts, e.g., that the decline begins before the crime • That implies extra research questions in order to answer the question: What is it that triggers the decline? • School/labor market problems or family problems, substance abuse, incipient mental illness, etc. • But these questions cannot be pursued right away given the (existing) DSt policy.
Conclusion • Two main problems/challenges: • A policy for data access that is not very compatible with the research process • A mismatch between the needed variables and the existing variables
What has been fixed lately or is being fixed as we speak? • In the near future it will be possible to operate big multi-project data sets with practically no end date • What is not being resolved: remaining problems from the researcher’s point of view • In some major areas, either the variables are somewhat wrong compared to the hypothesis in question, or they are very expensive and still only available for small samples