
Frank Linder (co-author Dominique van Roon) Statistics Netherlands

Deriving Educational Attainment by combining data from Administrative Sources and Sample Surveys: Recent developments towards the 2011 Census. Frank Linder (co-author Dominique van Roon), Statistics Netherlands. Conference Statistics Investment to the future 2, Prague, 14-15 September 2009.


Presentation Transcript


  1. Deriving Educational Attainment by combining data from Administrative Sources and Sample Surveys: Recent developments towards the 2011 Census
     Frank Linder (co-author Dominique van Roon), Statistics Netherlands
     Conference Statistics Investment to the future 2, Prague, 14-15 September 2009

  2. Contents
     • Importance of data on education
     • Data sources for education level
       - traditional
       - new alternative
     • Innovation in micro-integration: new way of combining administrative sources and sample surveys
     • Social Statistical Database (SSD), Virtual Census
     • Educational attainment, new method explained
       - micro-integration steps
       - weighting strategy
     • Accuracy
     • Conclusions

  3. Importance of data on education
     • Education is a key social indicator for government policy and socio-economic research
     • EU Lisbon Strategy 2000: “Education and training policies are central to the creation and transmission of knowledge and are determining factor in each society’s potential for innovation… Positive impact of education on employment, health, social inclusion and active citizenship has already been extensively shown”
     • Educational attainment is a standard variable in the Census Programme
     • Great demand from researchers for data on education level, as a background variable for their analyses
     • Educational attainment has a central position in the Social Statistical Database (SSD)

  4. Data Sources education level, traditionally
     • Labour Force Survey (LFS), exclusive domain
       - complete education career until date of interview
       - educational attainment
     • Reliability for small subpopulations is problematic
     • LFS solution in practice:
       - unified sample over a period of consecutive years => more observations, lower standard errors
       - assumption: stability of the variable over the period
     • LFS still used as a source for education

  5. Data Sources education level, new alternative
     • New administrative education registers in the last decade
       - new opportunities for determining educational attainment
       - full coverage of the register's target population => more observations => more reliable estimates (in particular for small populations!)
     • However, the alternative is still dependent on the LFS!
       - registers do not cover, e.g., people educated before the registers started (mostly older citizens), private education, studies abroad

  6. Innovation in micro-integration
     • Combining statistical information from administrative sources and sample surveys for the sake of one variable
     • Example (diagram):
       - Integration, conventional way (different variables): Jobs register (employee population) combined with LFS (education level) => coherent information on education level of employees
       - Integration, new way (one variable): Education register 1, ..., Education register k (education level) combined with LFS (education level) => education level of target population

  7. Social Statistical Database (SSD), Virtual Census
     • Integration framework for social statistics
       - Micro-linkage and micro-integration of data on demographic and socio-economic issues
       - Data sources:
         ∙ administrative registers (primarily)
         ∙ sample surveys (if no information in registers)
       - Coherence, consistency, comprehensiveness, completeness, detail, 1 figure - 1 phenomenon
       - Important basis for the production of social statistics
     • Kind of information
       - Demography, labour, social security, income, health care, security, housing, etc.
       - Education level (previously LFS, now new method)
     • Key source for Virtual Census 2001 and 2011

  8. Educational Attainment, new method, step 1: Construction Education Archive
     • Collecting sources of education data
     • Storage in Education Archive
     • Sources (diagram):
       - Registers, primary education: ENR2010..
       - Registers, secondary education: ENR’02.., ERR’99.., CREHE prelim
       - Registers, higher education: CREHE’83..
       - Other registers: CWI’90.., SFR’95.., RSF’01..
       - Sample surveys: Labour Force Survey ’96..
     • All sources feed the Education Archive (cumulative storage by annual addition)

  9. Education Archive
     [Table: extract from Education Archive]

  10. Educational Attainment, new method, step 2: Construction Educational Attainment File
      • Construct Educational Attainment File (EAF) containing the highest attained education level of individuals at the reference date
      • Selection from Education Archive (micro-integration)
        - Records representing education careers until the reference date (micro-integration: derive and impute missing start and end dates)
      • Adjust to target population (micro-integration)
        - E.g. eliminate foreign students from the Education Archive; supplement primary education by imputation from the Population Register: PR 0-14
      • Quality assessment of sources (micro-integration)
        - Which sources are to be used, which neglected (e.g. CWI)?
      • Assessing validity of education levels at the reference date
        - Decision rules: deterministic and stochastic (based on probabilities)
      • Determine the highest valid level during the education career
        - Example registers: SCED 43 43 53 60 60
        - Example LFS sample: SCED 20 – 33 – 43 – 53 – 60
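
The last step of this slide can be sketched in a few lines of Python. This is only an illustration: it assumes that SCED codes are ordinal (a larger code means a higher level, which matches the two example careers) and that the validity decision rules have already produced a flag per record. The names `EducationRecord` and `highest_valid_level` are hypothetical and do not come from the presentation.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class EducationRecord:
    sced: int    # education level code; assumed ordinal (higher code = higher level)
    valid: bool  # assumed outcome of the validity decision rules at the reference date


def highest_valid_level(records: List[EducationRecord]) -> Optional[int]:
    """Return the highest SCED code among valid records, or None if no record is valid."""
    valid_codes = [r.sced for r in records if r.valid]
    return max(valid_codes) if valid_codes else None


# The two example careers from the slide, with every record treated as valid.
register_career = [EducationRecord(code, True) for code in (43, 43, 53, 60, 60)]
lfs_career = [EducationRecord(code, True) for code in (20, 33, 43, 53, 60)]

print(highest_valid_level(register_career))
print(highest_valid_level(lfs_career))
```

Both example careers end at the same level here, because the register records and the LFS career share the same highest code.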

  11. Validity education level at reference date (optional sheet)
      • Deterministic decision rules. Examples:
        - Last record of a person in the Education Archive: secondary education with certificate in 2003. Is the education level still valid at reference date 2006? Yes, because the person is not found in the register of Higher Education.
        - Someone completes a doctorate thesis in 2005. This gives the highest attained education level for the period after, because we distinguish no higher level (trivial!)
      • Stochastic decision rules
        - Probability that an education level is still valid at the reference date (application of Survival Analysis, Life Tables method)
        - Example: last observed date D. Is the level valid x years after D? Upper bound U: within the period [D; D+U] the level is still valid (95%)
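
The stochastic rule can be read as a threshold on a survival curve. The sketch below assumes a curve given per whole year, S(0), S(1), ..., and takes U as the last year for which the curve has not dropped below 95%. The curve values are invented for illustration, and the function names are hypothetical; the presentation does not show how U is computed in practice.

```python
from typing import Sequence


def upper_bound_years(survival: Sequence[float], threshold: float = 0.95) -> int:
    """Largest whole number of years U such that S(t) >= threshold for all t <= U."""
    upper = 0
    for t, s in enumerate(survival):
        if s < threshold:
            break
        upper = t
    return upper


def still_valid(last_observed_year: int, reference_year: int,
                survival: Sequence[float], threshold: float = 0.95) -> bool:
    """Accept the level if the reference date falls within [D, D + U]."""
    elapsed = reference_year - last_observed_year
    return 0 <= elapsed <= upper_bound_years(survival, threshold)


# Invented survival curve: probability that the level is unchanged after t years.
survival_curve = [1.00, 0.99, 0.97, 0.95, 0.92, 0.88]

print(upper_bound_years(survival_curve))
print(still_valid(2003, 2006, survival_curve))
print(still_valid(2001, 2006, survival_curve))
```

With this invented curve the level is accepted up to three years after the last observation and rejected after that.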

  12. Weighting Strategy (1), structure EAF
      [Figure: Educational Attainment File, September 2005; register records plus inflated sample records]
      • EAF September 2005: mixture of register and sample (LFS) records
        - Coverage: 6.5 mln records (5.8 mln register and 0.675 mln sample)
        - NL population: 16.3 mln people
        - Sample inflation to bridge gap of 9.8 mln people
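
The arithmetic behind these figures can be checked quickly. The sketch assumes, as a reading of the slide rather than something it states outright, that register records count for themselves (weight 1) and that the sample records together represent everyone outside the registers. The average weight derived at the end is therefore illustrative only.

```python
population = 16.3e6        # NL population (from the slide)
register_records = 5.8e6   # register records in the EAF (from the slide)
sample_records = 0.675e6   # LFS sample records in the EAF (from the slide)

# Records in the EAF and the part of the population not directly observed.
eaf_records = register_records + sample_records
gap = population - eaf_records

# Assumption: sample records represent themselves plus the gap,
# i.e. everyone who is not covered by a register record.
represented_by_sample = population - register_records
average_sample_weight = represented_by_sample / sample_records

print(f"EAF records: {eaf_records / 1e6:.3f} mln")
print(f"gap: {gap / 1e6:.3f} mln")
print(f"average sample weight: {average_sample_weight:.1f}")
```

Under that assumption each sample record stands for roughly 15 to 16 people on average, which gives a feel for how strongly the sample part is inflated.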

  13. Weighting Strategy (2), representativeness
      • Younger people are better represented in the EAF (registers!)
        - Older people are underrepresented in the EAF
        - Make the EAF representative of the NL population: calibration of LFS weights

  14. Weighting Strategy (3), weighting model
      • Weighting model, variables
        - demographic (e.g. sex, age, country of origin, …)
        - socio-economic (e.g. socio-economic category, income, …)
        - education from CWI register (proxy)
      • Weighting model too abundant?
        - pro: consistency with as many population margins as possible
        - con: fluctuation of final weights, disrupting effect on accuracy, problematic in cells with few observations
        - revision of weighting strategy is being considered
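
To make "consistency with population margins" concrete, here is a small sketch of raking (iterative proportional fitting), one common way to calibrate weights to known margins. The presentation does not say which calibration technique is used, so raking is a stand-in chosen for illustration. The records, the two weighting variables and the margin totals are all invented.

```python
from typing import Dict, List


def rake(records: List[dict], margins: Dict[str, Dict[str, float]],
         iterations: int = 200) -> List[dict]:
    """Adjust record weights in place until weighted totals match each margin.

    records: dicts with one key per weighting variable plus a 'weight' key.
    margins: {variable: {category: known population total}}.
    """
    for _ in range(iterations):
        for variable, targets in margins.items():
            # Current weighted total per category of this variable.
            totals: Dict[str, float] = {}
            for r in records:
                totals[r[variable]] = totals.get(r[variable], 0.0) + r["weight"]
            # Scale weights so that this variable's margin is reproduced exactly.
            for r in records:
                r["weight"] *= targets[r[variable]] / totals[r[variable]]
    return records


def weighted_total(records: List[dict], variable: str, category: str) -> float:
    return sum(r["weight"] for r in records if r[variable] == category)


# Invented sample records and population margins.
sample = [
    {"sex": "m", "age": "young", "weight": 1.0},
    {"sex": "m", "age": "old", "weight": 1.0},
    {"sex": "f", "age": "young", "weight": 1.0},
    {"sex": "f", "age": "old", "weight": 1.0},
    {"sex": "f", "age": "young", "weight": 1.0},
]
margins = {"sex": {"m": 48.0, "f": 52.0}, "age": {"young": 40.0, "old": 60.0}}

rake(sample, margins)
for r in sample:
    print(r)
```

Every extra variable in `margins` adds another set of constraints, which is the trade-off the slide describes: more consistent margins, but more strongly varying final weights in thinly populated cells.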

  15. Accuracy and dissemination
      • Are education levels accurate enough for dissemination?
      • A measuring instrument is needed to determine accuracy: a variance estimator
      • Standard sampling theory literature mainly covers sample-only data, much less combined register and sample data
        - approximation formulas: only to be used for larger n; problematic for the smaller subpopulations in which we are particularly interested
        - solution also applicable to smaller subpopulations: bootstrap resampling method for accuracy measurement, developed by methodologists of Statistics Netherlands
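
The slide does not describe the bootstrap developed at Statistics Netherlands, so the following is only a generic nonparametric bootstrap, shown to illustrate the idea for mixed data. It assumes that the register count in a cell is fixed (no sampling error) and that only the sample records are resampled. The data and function names are invented.

```python
import random
import statistics
from typing import List, Tuple

SampleRecord = Tuple[float, bool]  # (weight, record belongs to the cell of interest)


def estimate_total(register_count: float, sample: List[SampleRecord]) -> float:
    """Register records count once; sample records in the cell contribute their weight."""
    return register_count + sum(weight for weight, in_cell in sample if in_cell)


def bootstrap_cv(register_count: float, sample: List[SampleRecord],
                 replications: int = 500, seed: int = 1) -> float:
    """Coefficient of variation of the cell estimate, resampling sample records only."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(replications):
        resample = rng.choices(sample, k=len(sample))  # draw with replacement
        estimates.append(estimate_total(register_count, resample))
    return statistics.stdev(estimates) / estimate_total(register_count, sample)


# Invented data: 200 sample records with weight 15, of which 30 fall in the cell.
sample = [(15.0, i < 30) for i in range(200)]

cv_sample_only = bootstrap_cv(0, sample)
cv_with_register = bootstrap_cv(2000, sample)
print(f"CV, sample only: {cv_sample_only:.3f}")
print(f"CV, with 2000 register records: {cv_with_register:.3f}")
```

Adding register records to the cell leaves the sampling variance unchanged but raises the estimated total, so the coefficient of variation drops.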

  16. Accuracy, small subpopulations
      • Young highly educated persons of Turkish origin, 2005

  17. Conclusions
      • Results of the new method for deriving educational attainment are promising
        - Similar to results of traditional LFS estimation at a high aggregation level
        - Outperforms the LFS for small populations, so more dissemination possibilities for small populations
      • Serious opportunity for innovation in the Census of 2011: produce educational attainment according to the new method

  18. Remarks, Questions, Discussion
      For more details read our paper!

  19. SSD-system, organizational units (optional sheets)
      • SSD-system: SSD-core (pivot) and SSD-satellites
      • Not a full presentation of the SSD-satellites!
      • Manageability of the SSD-system: split into smaller units
      • Core: demographic and socio-economic information relevant in almost any field
      • Educational attainment has a core position: crucial in many social processes
      • Satellite: specific topics
      • Core and satellites consistent: 1 figure - 1 phenomenon

  20. Population Census in the Netherlands (optional sheets)
      • Until 1971 conventional: field enumeration
      • 2001: Virtual Census
        - Key source: Social Statistical Database (SSD)
        - greater part from registers
        - educational attainment from LFS (2000/2001)
      • Virtual Census results convincing => build on these experiences for Census 2011
      • 2011: Virtual Census
        - Task force in charge of working it out
        - Key source: much more comprehensive SSD
        - Educational attainment: new method recommended (detailed table in Census table programme)

  21. Survival Analysis, Life Tables Method (optional sheets)
      The survival function S(t) = P[T ≥ t] gives the probability that the educational attainment has not changed within t years. The distribution was determined empirically on the basis of the LFS for a number of years.
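
As an illustration of how such a curve can be estimated, the sketch below uses the standard actuarial life-table estimator with one-year intervals. The slide does not give the exact estimator or the LFS inputs, so the counts are invented and the function name is hypothetical. People who drop out of observation during a year are counted as exposed for half of it, which is the usual actuarial convention.

```python
from typing import List, Sequence


def life_table_survival(n_start: int, events: Sequence[int],
                        censored: Sequence[int]) -> List[float]:
    """Actuarial life-table estimate of S(t) = P[T >= t] for t = 0, 1, 2, ...

    events[i]:   persons whose education level changed during year i
    censored[i]: persons lost to observation during year i
    """
    survival = [1.0]  # S(0) = 1: nobody has changed level at time 0
    at_risk = n_start
    for changed, lost in zip(events, censored):
        effective = at_risk - lost / 2.0  # censored persons exposed for half the year
        q = changed / effective if effective > 0 else 0.0
        survival.append(survival[-1] * (1.0 - q))
        at_risk -= changed + lost
    return survival


# Invented counts for three one-year intervals.
curve = life_table_survival(1000, events=[10, 20, 30], censored=[0, 100, 50])
for t, s in enumerate(curve):
    print(t, round(s, 4))
```

The resulting list starts at 1 and can only decrease, one value per year since the last observation.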

  22. CV-formulae, sample (optional sheets)
      • Notation
        - $N$: population size
        - $n$: sample size
        - $N(g)$: total population in cell $g$
        - $n(g)$: number of sample observations in cell $g$
        - $p = N(g)/N$; $q = 1 - p$; $f = n/N$
      • With $n$ large enough, the variance of $\hat{N}(g)$ can be approximated as: $\operatorname{var}(\hat{N}(g)) = N^2 p q (1-f)/n$
      • Assuming that the average sample fraction $f$ is very small (e.g. the LFS sample fraction is about 1 percent) and $p$ is very small (i.e. a relatively small subpopulation), the coefficient of variation (CV) can be approximated as: $[q(1-f)/(np)]^{1/2} \approx [1/n(g)]^{1/2}$
      • So, a CV ≤ 20% implies $n(g) \ge 25$ (threshold)
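
The threshold can be checked numerically. The sketch evaluates the CV expression from this slide (the square root of the variance divided by N(g) = Np) and the approximate minimum cell size. The population and sample sizes in the example are chosen for illustration so that about 25 sample observations are expected in the cell; the function names are hypothetical.

```python
import math


def cv_sample(N: float, n: float, N_g: float) -> float:
    """CV of the estimated cell total: sqrt(q * (1 - f) / (n * p))."""
    p = N_g / N
    q = 1.0 - p
    f = n / N
    return math.sqrt(q * (1.0 - f) / (n * p))


def min_cell_size(max_cv: float) -> int:
    """Smallest n(g) with 1 / sqrt(n(g)) <= max_cv (small-f, small-p approximation)."""
    # round() guards against floating-point noise such as 24.999999999999996.
    return math.ceil(round(1.0 / max_cv ** 2, 9))


# Illustrative numbers: 1 percent sample, cell with about 25 expected observations.
N, n, N_g = 16.3e6, 163_000, 2_500
print(f"expected n(g): {n * N_g / N:.1f}")
print(f"CV: {cv_sample(N, n, N_g):.3f}")
print(f"minimum n(g) for CV <= 20%: {min_cell_size(0.20)}")
```

The exact expression comes out slightly below 20% here because the factors q and 1 - f are a little smaller than 1.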

  23. CV-formulae, mixture sample/register (optional sheets)
      • Notation
        - $N_1(g)$: number of register observations in cell $g$
        - $N_2(g)$: weighted number of sample observations in cell $g$
        - $N(g) = N_1(g) + N_2(g)$
        - $n_2(g)$: (unweighted) number of sample observations in cell $g$
      • For a coefficient of variation of not more than $A$ it is required that $n_2(g) \ge (1/A)^2 \, [N_2(g)/N(g)]^2$
      • With a higher $N_1(g)$ the threshold decreases
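
A short sketch shows how the required number of sample observations falls as the register part of a cell grows. It simply evaluates the inequality from this slide; the cell counts are invented and the function name is hypothetical.

```python
import math


def min_sample_obs(max_cv: float, N1_g: float, N2_g: float) -> int:
    """Smallest n2(g) satisfying n2(g) >= (1/A)^2 * (N2(g) / N(g))^2 with A = max_cv."""
    N_g = N1_g + N2_g
    required = (1.0 / max_cv) ** 2 * (N2_g / N_g) ** 2
    # round() guards against floating-point noise before taking the ceiling.
    return math.ceil(round(required, 9))


# Invented cells: the same weighted sample count, increasing register counts.
for register_count in (0, 1000, 3000):
    print(register_count, min_sample_obs(0.20, register_count, 1000))
```

With no register records the threshold equals the sample-only value of 25; once three quarters of the cell comes from registers, two sample observations are enough under this approximation.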

  24. Silva & Skinner (1997) (optional sheets)
      • Simulation study
      • Adding auxiliary variables to a regression model causes the variance of the regression estimator to drop initially, but adding still more variables makes the variance tend to increase from a certain point on.
