Selective data editing Development & implementation

Selective data editing: Development & implementation. Q2010, Helsinki. Jörgen Svensson, Process Owner, Statistics Sweden (SCB)

Standardization at SCB • Decentralized production • Development of CBMs • Editing is costly: 33% of budgets • Data collection departments, 2006 • Standardization: the Lotta project, in 2006

Nine case studies • Purpose of the project: • Try using selective data editing • What is the potential gain using the method? • Would it be possible to develop and use a common tool?

Some results from case studies

SUSPICION • SUSP(j, k) = Suspicion of variable j for unit k • SUSP(j, k) = 0 if variable value falls within acceptance interval • SUSP(j, k) → 1 as value deviates from acceptance limit • 0 ≤ SUSP(j,k) ≤ 1
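The properties listed for SUSP can be illustrated with a minimal Python sketch. The slide does not give the functional form, so the ratio `distance / (distance + tau)` and the scale parameter `tau` are assumptions chosen only because they satisfy the stated properties; they are not taken from SELEKT.

```python
def suspicion(value, lower, upper, tau=1.0):
    """Hypothetical suspicion score: 0 inside [lower, upper], approaching 1 outside.

    `tau` (> 0) is an assumed scale parameter controlling how quickly
    the score rises once the value leaves the acceptance interval.
    """
    if lower > upper:
        raise ValueError("lower must not exceed upper")
    if tau <= 0:
        raise ValueError("tau must be positive")
    if lower <= value <= upper:
        return 0.0
    # Distance from the nearest acceptance limit.
    distance = lower - value if value < lower else value - upper
    return distance / (distance + tau)


print(suspicion(5, 0, 10))    # inside the acceptance interval
print(suspicion(11, 0, 10))   # just outside the upper limit
print(suspicion(500, 0, 10))  # far outside the interval
```

Any other function that is 0 inside the interval, increases with the distance from the limit, and stays at or below 1 would serve the same purpose.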

POTENTIAL IMPACT • POTIMP = Potential impact • POTIMP is the weighted absolute difference between the observed and the predicted value: • POTIMP(j,k,d) = w_k · δ_k(d) · |y(j,k) − ŷ(j,k)| for variable j, unit k in domain d, where w_k is the sampling weight, δ_k(d) is the domain indicator (1 if unit k belongs to domain d, otherwise 0), y(j,k) is the observed value and ŷ(j,k) is the predicted value • SELEKT supports several ways to establish the predicted value: from time series data and from cross-sectional analysis within homogeneous groups of units
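As a rough illustration, the potential impact can be sketched in Python by reading the definition as sampling weight times domain indicator times the absolute difference between observed and predicted value. The function and parameter names are hypothetical, and the predicted value is simply passed in (for example, the unit's value from the previous period).

```python
def potential_impact(observed, predicted, weight, in_domain):
    """Weighted absolute difference between observed and predicted value.

    `weight` is the sampling weight of the unit; `in_domain` plays the
    role of the domain indicator (True if the unit belongs to the domain).
    All names here are illustrative assumptions.
    """
    indicator = 1.0 if in_domain else 0.0
    return weight * indicator * abs(observed - predicted)


# A unit reporting 120 where 100 was predicted, with sampling weight 2.5.
print(potential_impact(observed=120.0, predicted=100.0, weight=2.5, in_domain=True))
# The same unit contributes nothing to a domain it does not belong to.
print(potential_impact(observed=120.0, predicted=100.0, weight=2.5, in_domain=False))
```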

Flagging suspected errors • [Figure: plot of log(Potential impact) against log(Suspicion), with the flagged region marked]

LOCAL SCORE • Local (item) score LScore(j,k,d): • LScore(j,k,d) = SUSP(j,k) · |POTIMP(j,k,d)| · Cello(j,d) • Cello(j,d) is inversely proportional to the standard error based on previous data
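The local score formula can be sketched in Python as follows. The slide only states that Cello(j,d) is inversely proportional to the standard error, so the proportionality constant `scale` is an assumption, as are all function and parameter names.

```python
def local_score(susp, potimp, std_error, scale=1.0):
    """LScore = SUSP * |POTIMP| * Cello, with Cello taken as scale / std_error.

    `scale` is a hypothetical proportionality constant; `std_error` is the
    standard error for the variable and domain, based on previous data.
    """
    if not 0.0 <= susp <= 1.0:
        raise ValueError("suspicion must lie in [0, 1]")
    if std_error <= 0:
        raise ValueError("standard error must be positive")
    return susp * abs(potimp) * scale / std_error


# Suspicion 0.5, potential impact 50, standard error 25.
print(local_score(susp=0.5, potimp=50.0, std_error=25.0))
```

Dividing by the standard error puts the potential impact on a scale relative to the uncertainty the estimate already has, so the same absolute deviation counts for more in a domain that is estimated precisely.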

GLOBAL SCORE • Global (unit) score GScore(k) is obtained by aggregation of local scores • LScore(j,k,d) → LScore(j,k) → GScore(k) • Each arrow (→) denotes aggregation by summation, Euclidean summation or maximum • Only those units with a GScore larger than a pre-decided threshold are followed up
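The two aggregation steps and the threshold rule can be sketched in Python as below. "Euclidean summation" is read here as the square root of the sum of squares, which is an assumption, and the same aggregation method is used for both steps for simplicity. The data layout, unit labels and variable names are hypothetical.

```python
import math


def aggregate(scores, method="sum"):
    """Combine scores by summation, Euclidean summation or maximum."""
    scores = list(scores)
    if method == "sum":
        return sum(scores)
    if method == "euclidean":
        return math.sqrt(sum(s * s for s in scores))
    if method == "max":
        return max(scores, default=0.0)
    raise ValueError(f"unknown aggregation method: {method}")


def global_score(local_scores, method="sum"):
    """Aggregate one unit's local scores: over domains first, then over variables.

    `local_scores` maps variable -> {domain: local score}.
    """
    per_variable = [aggregate(by_domain.values(), method)
                    for by_domain in local_scores.values()]
    return aggregate(per_variable, method)


def units_to_follow_up(scores_by_unit, threshold, method="sum"):
    """Return the units whose global score exceeds the threshold."""
    return [unit for unit, scores in scores_by_unit.items()
            if global_score(scores, method) > threshold]


scores_by_unit = {
    "A": {"turnover": {"d1": 3.0, "d2": 4.0}, "employees": {"d1": 0.0}},
    "B": {"turnover": {"d1": 0.5}},
}
print(units_to_follow_up(scores_by_unit, threshold=1.0))
```

The choice of aggregation matters: summation lets many small deviations add up to a follow-up, whereas the maximum flags a unit only when at least one item stands out on its own.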

SELEKT, EDIT and process data

Implementation of SELEKT • So far three surveys: • Business activity indicators • Wage & salary structures in the private sector • Commodity flow survey

Documentation • A General Methodology for Selective Data Editing • jorgen.svensson@scb.se • anders.norberg@scb.se
