Explores the two-phase life-cycle model of integrated statistical micro data, combining register-based statistics and survey sampling. Covers measurement, representation, validity, and sampling and measurement errors. Welcome to a new age of data integration!
A two-phase life-cycle model of integrated statistical micro data
Li-Chun Zhang
Statistics Norway
lcz@ssb.no
Register-based statistics & early years of survey sampling
(Source: UNECE 2007) 20??
• N. Kiær (1895). The representative method. ISI Session, Bern.
• A. Jensen (ISI committee, 1924): “When ISI discussed the matter twenty-two years ago, it was the question of the recognition of the method in principle that claimed most interest. Now it is otherwise. I think I may venture to say that nowadays there is hardly one statistician, who in principle will contest the legitimacy of the representative method. Nevertheless, I believe that the representative method is capable of being used to a much greater extent than now is the case.”
• J. Neyman (1934). On the two different aspects of the representative method: The method of stratified sampling and the method of purposive selection. JRSS 97, 558–606.
Survey life cycle from a quality perspective (Groves et al., 2004, Survey Methodology, Figure 2.5; error sources in brackets)
• Measurement: Construct → [Validity] → Measurement → [Measurement Error] → Response → [Processing Error] → Edited Response
• Representation: Target Population → [Coverage Error] → Sampling Frame → [Sampling Error] → Sample → [Nonresponse Error] → Respondents → [Adjustment Error] → Postsurvey Adjustments
• Both sides → Survey Statistic
A two-phase life-cycle model
• Secondary use
• Combination of sources
Single-source primary-phase statistical micro data (error sources in brackets)
• Measurement (variables): Target Concept → [Validity] → Measurement → [Measurement] → Response/Registration → [Processing] → Editing
• Representation (objects): Target Set → [Frame] → Accessible Set → [Selection] → Accessed Set → [Missing/Redundancy] → Observed/Validated Set
• Both sides → Single-source Micro Data (Primary)
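The representation side of the primary phase can be read as a chain of sets. The sketch below is only an illustration under our own assumptions: objects are hashable IDs, the frame error is taken as the two set differences between the target and accessible sets, selection as accessible objects that were not accessed, missing as accessed objects without an observed record, and redundancy as objects recorded more than once. The function name `representation_errors` and its dictionary keys are hypothetical.

```python
from collections import Counter
from typing import Dict, Hashable, Iterable, Set


def representation_errors(
    target: Iterable[Hashable],
    accessible: Iterable[Hashable],
    accessed: Iterable[Hashable],
    observed_records: Iterable[Hashable],
) -> Dict[str, Set[Hashable]]:
    """Trace object IDs through the representation stages (hypothetical helper).

    observed_records holds one object ID per record, so duplicates are kept.
    """
    target_set, accessible_set, accessed_set = set(target), set(accessible), set(accessed)
    record_counts = Counter(observed_records)
    observed_set = set(record_counts)
    return {
        # Frame: target objects missing from the accessible set, and vice versa.
        "frame_missing": target_set - accessible_set,
        "frame_extra": accessible_set - target_set,
        # Selection: accessible objects that were not accessed.
        "not_accessed": accessible_set - accessed_set,
        # Missing: accessed objects with no observed record.
        "missing": accessed_set - observed_set,
        # Redundancy: objects that appear in more than one record.
        "redundant": {obj for obj, k in record_counts.items() if k > 1},
    }


errors = representation_errors(
    target={1, 2, 3, 4, 5},
    accessible={1, 2, 3, 4, 6},
    accessed={1, 2, 3, 6},
    observed_records=[1, 2, 2, 6],
)
print({name: sorted(ids) for name, ids in errors.items()})
# {'frame_missing': [5], 'frame_extra': [6], 'not_accessed': [4], 'missing': [3], 'redundant': [2]}
```

Keeping each stage as a separate set difference means every object that drops out, or enters erroneously, is attributed to exactly the step where it happened.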
Integrated secondary-phase statistical micro data (error sources in brackets)
Unit vs. Object; Measurement vs. Representation; Missing Values vs. Coverage
• Measurement (variables): Target Concept → [Relevance] → Harmonization → [Mapping] → Classification → [Compatibility] → Adjustment
• Representation (units): Target Population → [Coverage] → Data Linkage → [Identification] → Alignment → [Unit] → Statistical Units
• Transformation (Object to Unit): Base Units No. 1, 2, …, N; Composite Units No. 1, 2, …, K; Composite Units No. 1, 2, …, M; Composite Units No. 1, 2, …, H; linked by successive m:1 mappings
• Both sides → Integrated Micro Data (Secondary)
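The m:1 arrows in the transformation from objects to units can be mimicked by chaining dictionaries, each mapping a lower-level unit to the single unit it belongs to. In this sketch the unit types (persons, families, households) and the helper names `compose` and `members` are hypothetical; base units whose intermediate unit has no onward mapping are simply dropped, which is one way a coverage problem can surface at this step.

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Mapping


def compose(
    first: Mapping[Hashable, Hashable], second: Mapping[Hashable, Hashable]
) -> Dict[Hashable, Hashable]:
    """Chain two m:1 maps (base -> composite -> higher composite).

    The result is again m:1. Base units whose intermediate unit has no entry in
    `second` are left out, so they show up as a coverage gap.
    """
    return {base: second[mid] for base, mid in first.items() if mid in second}


def members(mapping: Mapping[Hashable, Hashable]) -> Dict[Hashable, List[Hashable]]:
    """Invert an m:1 map into composite unit -> list of its base units."""
    groups: Dict[Hashable, List[Hashable]] = defaultdict(list)
    for base, comp in mapping.items():
        groups[comp].append(base)
    return dict(groups)


# Hypothetical units: persons -> families -> households.
person_to_family = {"p1": "f1", "p2": "f1", "p3": "f2", "p4": "f3"}
family_to_household = {"f1": "h1", "f2": "h1"}  # f3 could not be linked

person_to_household = compose(person_to_family, family_to_household)
print(members(person_to_household))  # {'h1': ['p1', 'p2', 'p3']}
unlinked = set(person_to_family) - set(person_to_household)
print(unlinked)  # {'p4'}
```

Because the composition of m:1 maps is again m:1, any number of levels can be chained this way before the statistical units are formed.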
An illustration of register-based household data: Kongsvinger at the time of the 2001 census
Representing unit error by an allocation matrix (equivalent up to row permutation; sequentially upper-triangular by definition)
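One way to obtain a canonical allocation matrix, assumed here rather than taken from the slide: index the persons 0, …, n−1 and label each household by its lowest-indexed member. Relabelling households then only permutes rows, and the canonical labelling removes that freedom; since a household's label never exceeds any member's index, every nonzero entry lies on or above the diagonal. `allocation_matrix` and `unit_error_cells` are hypothetical names, and counting disagreeing cells is just one simple summary of unit error.

```python
from typing import Hashable, List, Sequence

Matrix = List[List[int]]


def allocation_matrix(household_of: Sequence[Hashable]) -> Matrix:
    """Canonical n x n 0/1 allocation matrix (hypothetical helper).

    household_of[j] is any household label for person j. Row i stands for the
    household whose lowest-indexed member is person i; other rows are all zero.
    A[i][j] == 1 iff person j lives in the household labelled by row i, so
    A[i][j] == 1 implies i <= j and the matrix is upper-triangular.
    """
    n = len(household_of)
    first_member = {}
    for j, label in enumerate(household_of):
        # The first time a label is seen, j is its lowest-indexed member.
        first_member.setdefault(label, j)
    a = [[0] * n for _ in range(n)]
    for j, label in enumerate(household_of):
        a[first_member[label]][j] = 1
    return a


def unit_error_cells(a_true: Matrix, a_reg: Matrix) -> int:
    """Number of cells where two canonical matrices (same persons) disagree."""
    return sum(
        x != y
        for row_t, row_r in zip(a_true, a_reg)
        for x, y in zip(row_t, row_r)
    )


# Arbitrary labels: only the grouping matters, not the household IDs.
assert allocation_matrix(["x", "x", "y"]) == allocation_matrix([5, 5, 2])

truth = allocation_matrix(["h1", "h1", "h2", "h2"])
register = allocation_matrix(["r1", "r1", "r1", "r2"])  # person 2 misallocated
print(unit_error_cells(truth, register))  # 4
```

Here the register places person 2 with persons 0 and 1 instead of with person 3, which moves two persons' entries and shows up as four disagreeing cells.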
The 20th Century = Survey Sampling
The 21st Century = Data Integration
Welcome to a new age!