
Selective data editing: Development & implementation

This article discusses the development and implementation of selective data editing at SCB (Statistics Sweden). It explores the potential gains of this method and the use of a common tool. The results from case studies are presented, including the SUSPICION and POTENTIAL IMPACT measures, local and global scoring, and the process of implementing Selekt.


Presentation Transcript


  1. Selective data editing: Development & implementation • Q2010, Helsinki • Jörgen Svensson, Process Owner, Statistics Sweden (SCB)

  2. Standardization at SCB • Decentralized production • Development of CBMs • Editing is costly: 33% of budgets • Data collection departments, 2006 • Standardization: the Lotta project, 2006

  3. Nine case studies • Purpose of the project: • Try using selective data editing • What is the potential gain using the method? • Would it be possible to develop and use a common tool?

  4. Some results from case studies

  5. SUSPICION • SUSP(j, k) = suspicion of variable j for unit k • SUSP(j, k) = 0 if the variable value falls within the acceptance interval • SUSP(j, k) → 1 as the value moves further outside the acceptance limits • 0 ≤ SUSP(j, k) ≤ 1
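The slide gives only the properties of the suspicion measure, not its functional form. The following is a minimal sketch of one function with those properties; the `d / (tau + d)` shape, the `tau` parameter, and scaling by the interval width are assumptions made for illustration, not SELEKT's actual definition.

```python
def suspicion(value, lower, upper, tau=1.0):
    """Return a suspicion in [0, 1) for one variable value.

    0 inside the acceptance interval [lower, upper]; approaches 1 as the
    value moves further outside it. The functional form is an assumption.
    """
    if lower > upper:
        raise ValueError("lower must not exceed upper")
    if tau <= 0:
        raise ValueError("tau must be positive")

    if value < lower:
        distance = lower - value
    elif value > upper:
        distance = value - upper
    else:
        return 0.0  # within the acceptance interval

    # Express the distance in units of the interval width
    # (fall back to 1.0 for a degenerate, zero-width interval).
    width = (upper - lower) or 1.0
    d = distance / width
    return d / (tau + d)


inside = suspicion(100.0, 90.0, 110.0)      # 0.0
just_outside = suspicion(130.0, 90.0, 110.0)  # one interval width above the limit
far_outside = suspicion(1e9, 90.0, 110.0)   # close to, but below, 1
```

A larger `tau` makes the measure more tolerant: the value has to lie further outside the interval before the suspicion approaches 1.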

  6. POTENTIAL IMPACT • POTIMP = potential impact • POTIMP is the weighted absolute difference between the observed and the predicted value: • POTIMP(j, k, d) = w_k · |observed(j, k) − predicted(j, k)| · 1_k(d) for variable j and unit k in domain d, where w_k is the sampling weight and 1_k(d) is the domain indicator • SELEKT supports several ways to establish the predicted value: from time series data and from cross-sectional analysis within homogeneous groups of units
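As a rough illustration of the potential impact described on this slide, the sketch below multiplies the sampling weight by the absolute difference between the observed and predicted values, and returns zero for units outside the domain. Taking the group median as the predicted value is only one assumed way to do the cross-sectional prediction; the function names and the numbers are hypothetical.

```python
from statistics import median


def predicted_from_group(group_values):
    """Cross-sectional prediction: the median of a homogeneous group (assumption)."""
    return median(group_values)


def potential_impact(observed, predicted, weight, in_domain=True):
    """Weighted absolute difference between observed and predicted value.

    `in_domain` plays the role of the domain indicator: a unit outside
    the domain contributes no impact to that domain's estimate.
    """
    if not in_domain:
        return 0.0
    return weight * abs(observed - predicted)


# Hypothetical reported values for one variable in one homogeneous group.
group = [100.0, 110.0, 95.0, 105.0, 1000.0]
predicted = predicted_from_group(group)  # 105.0
impact = potential_impact(observed=1000.0, predicted=predicted, weight=12.5)
```

For a time series prediction, the same `potential_impact` call could be used with the unit's previous-period value passed as `predicted`.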

  7. Flagging suspected errors • [Figure: plot with axes log(Potential impact) and log(Suspicion); the flagged region is marked]

  8. LOCAL SCORE • Local (item) score LScore(j, k, d): • LScore(j, k, d) = SUSP(j, k) · |POTIMP(j, k, d)| · Cello(j, d) • Cello(j, d) is inversely proportional to the standard error based on previous data
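The local score formula on this slide can be written out directly. In this sketch the proportionality constant in `cello` is an assumption (the slide only says "inversely proportional"), and the input numbers are made up.

```python
def cello(standard_error, constant=1.0):
    """Scaling factor inversely proportional to the standard error
    estimated from previous data. `constant` is an assumed parameter."""
    if standard_error <= 0:
        raise ValueError("standard_error must be positive")
    return constant / standard_error


def local_score(susp, potimp, cello_value):
    """LScore(j, k, d) = SUSP(j, k) * |POTIMP(j, k, d)| * Cello(j, d)."""
    return susp * abs(potimp) * cello_value


# Hypothetical inputs: suspicion 0.5, potential impact 200, standard error 50.
score = local_score(susp=0.5, potimp=200.0, cello_value=cello(50.0))
```

Dividing by the standard error puts local scores for different variables and domains on a comparable scale, so that they can later be aggregated per unit.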

  9. GLOBAL SCORE • Global (unit) score GScore(k) is obtained by aggregation of local scores • LScore(j, k, d) → LScore(j, k) → GScore(k) • Each → is a summation, a Euclidean summation, or a maximum • Only units with a GScore larger than a pre-decided threshold are followed up
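The two aggregation steps and the threshold rule might be sketched as follows. The nested-dictionary layout, the function names, and the variable and domain labels are assumptions for illustration; only the three aggregation choices and the threshold rule come from the slide.

```python
import math

# The three aggregation choices named on the slide.
AGGREGATORS = {
    "sum": sum,
    "euclidean": lambda xs: math.sqrt(sum(x * x for x in xs)),
    "max": max,
}


def global_score(local_scores, over_domains="sum", over_variables="sum"):
    """Aggregate one unit's local scores, first over domains, then over variables.

    `local_scores` maps variable -> {domain -> LScore} and must be non-empty.
    """
    agg_d = AGGREGATORS[over_domains]
    agg_v = AGGREGATORS[over_variables]
    per_variable = [agg_d(list(by_domain.values()))
                    for by_domain in local_scores.values()]
    return agg_v(per_variable)


def units_to_follow_up(scores_by_unit, threshold, **aggregation):
    """Return the units whose global score exceeds the threshold."""
    return [unit for unit, scores in scores_by_unit.items()
            if global_score(scores, **aggregation) > threshold]


# Hypothetical local scores for two units.
scores_by_unit = {
    "A": {"turnover": {"d1": 3.0, "d2": 4.0}, "employees": {"d1": 0.0}},
    "B": {"turnover": {"d1": 1.0, "d2": 1.0}},
}
flagged = units_to_follow_up(scores_by_unit, threshold=6.0)
```

With plain summation, unit A scores 7.0 and unit B scores 2.0, so only A exceeds the threshold of 6.0. Choosing the maximum instead of a sum makes a unit's score depend on its single worst item rather than on the accumulation of many small ones.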

  10. SELEKT, EDIT and process data

  11. Implementation of SELEKT • So far three surveys: • Business activity indicators • Wage & salary structures in the private sector • Commodity flow survey

  12. Documentation • A General Methodology for Selective Data Editing • jorgen.svensson@scb.se • anders.norberg@scb.se
