Module B-4: Processing ICT survey data

TRAINING COURSE ON THE PRODUCTION OF STATISTICS ON THE INFORMATION ECONOMY
UNCTAD Manual, Chapter 7

Objectives
After completing this module you will know how to carry out:
• Data processing
• Data weighting (grossing-up)
• Data editing
• Data analysis

Contents of this module
4. Data processing and analysis
4.1 Data editing
4.2 Data weighting
4.3 Estimating ICT indicators

B4.1. Data editing (Manual, page 82)
What is editing?
• Statistical information provided by businesses can contain errors such as:
  - wrong or missing data
  - incorrect classifications
  - inconsistent or illogical responses
• Solutions to minimize such errors:
  - Ex ante: optimize the effectiveness of data capture instruments and collection procedures
  - Ex post: apply robust data editing techniques

B4.1. Data editing
Phases of data processing
• Raw data
• Quality controls during data collection and entry
• Data editing
  - Micro-editing (input): treatment of internal errors and inconsistencies; estimation of missing data
  - Macro-editing (output): outlier analysis; re-weighting procedures; editing of aggregates
• Clean data file

B4.1. Data editing (Manual, page 82)
Internal inconsistencies and errors
• Validity control of an individual data item requires:
  - defining the set of valid responses (in general, gender should be coded 0 or 1, an age of 110 years is not plausible, etc.; in ICT surveys, use of the Internet by a business should be coded 0 or 1)
  - checking each answer against its valid responses
• Definition of rules based on relationships between questions (see Box 15 of the Manual for some logical tests)
• Arithmetic checks during data entry or in batch mode (totals, subtotals, frequencies)
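The three kinds of checks listed above (valid responses, logical rules between questions, arithmetic checks) can be sketched in Python. This is only an illustration: the field names (`uses_internet`, `has_website`, `employees`, `employees_using_pc`) and the rules themselves are hypothetical and are not taken from the Manual or from Box 15.

```python
# Hypothetical edit rules for one questionnaire record (a dict).
# Field names and rules are illustrative assumptions, not the Manual's tests.

VALID_CODES = {
    "uses_internet": {0, 1},  # 0 = no, 1 = yes
    "has_website": {0, 1},
}


def check_record(record):
    """Return a list of error messages for one record (empty if clean)."""
    errors = []

    # 1. Validity: each coded item must lie in its set of valid responses.
    for field, valid in VALID_CODES.items():
        if record.get(field) not in valid:
            errors.append(f"{field}: invalid value {record.get(field)!r}")

    # 2. Logical rule between questions (assumed): a website implies Internet use.
    if record.get("has_website") == 1 and record.get("uses_internet") == 0:
        errors.append("has_website=1 but uses_internet=0")

    # 3. Arithmetic check: a subtotal cannot exceed its total.
    total = record.get("employees")
    part = record.get("employees_using_pc")
    if total is not None and part is not None and part > total:
        errors.append("employees_using_pc exceeds employees")

    return errors


good = {"uses_internet": 1, "has_website": 1, "employees": 20, "employees_using_pc": 5}
bad = {"uses_internet": 0, "has_website": 1, "employees": 10, "employees_using_pc": 12}

print(check_record(good))  # no errors expected
print(check_record(bad))   # the logical rule and the arithmetic check both fail
```

Returning messages instead of raising an exception lets the same function run during data entry (one record at a time) or in batch mode over the whole file.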

B4.1. Data editing (Manual, page 84)
Treatment of missing data
• Final non-response (missing data) should be treated to avoid biased estimates.
• Unit non-response treatment: corrective weighting.
  - Sample-based methods: the original weights are modified using sample information.
  - Population-based methods: the weights are modified using population information (the classical post-stratification procedure).
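A minimal sketch of the two corrective weighting approaches, assuming that the adjustment classes are simply the sampling strata and that the design weight in each stratum is N_h / n_h. The function names, class labels and counts below are invented for illustration.

```python
def sample_based_weights(design_weight, n_sampled, n_responded):
    """Sample-based adjustment: design weight * (sampled / responded) per class."""
    adjusted = {}
    for cls, w in design_weight.items():
        if n_responded[cls] == 0:
            raise ValueError(f"no respondents in class {cls!r}; merge classes first")
        adjusted[cls] = w * n_sampled[cls] / n_responded[cls]
    return adjusted


def population_based_weights(population_count, n_responded):
    """Population-based adjustment (post-stratification): N_g / r_g per post-stratum."""
    return {g: population_count[g] / n_responded[g] for g in population_count}


# Invented example: two size classes, 80% response in each.
population_count = {"small": 500, "large": 80}
n_sampled = {"small": 50, "large": 40}
n_responded = {"small": 40, "large": 32}
design_weight = {h: population_count[h] / n_sampled[h] for h in population_count}

print(sample_based_weights(design_weight, n_sampled, n_responded))
print(population_based_weights(population_count, n_responded))
```

In this special case both methods give the same weights, because the classes coincide with the strata. They differ when the post-strata are defined by population information that was not used in the sample design.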

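B4.1. Data editing (Manual, page 151, Annex 5)
Treatment of missing data (cont.)
• Final non-response (missing data) should be treated in order to avoid biased estimates.
• Item non-response treatment: imputation.
  - Deterministic imputation: the missing value is deduced by a rule from the unit's other answers.
  - Hot deck imputation: the missing value is taken from a similar respondent (a donor) in the current survey.
  - Cold deck imputation: the missing value is obtained from other information (models, econometrics…).
  - Mean or modal value imputation: the missing value is replaced by the mean or the mode of the observed values.
  - Historical imputation: the missing value is based on the unit's own values in earlier periods, which requires a long series.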
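Three of these imputation methods are simple enough to sketch in a few lines of Python. The sketch assumes missing items are stored as `None`; the record layout (`sector`, `uses_internet`) is hypothetical, and the hot deck shown here picks a random donor within the same class, which is only one of several possible donor rules.

```python
import random
from statistics import mean, mode


def impute_mean(values):
    """Mean imputation: replace None by the mean of the observed values."""
    observed = [v for v in values if v is not None]
    m = mean(observed)
    return [m if v is None else v for v in values]


def impute_mode(values):
    """Modal imputation: replace None by the most frequent observed value."""
    observed = [v for v in values if v is not None]
    m = mode(observed)
    return [m if v is None else v for v in values]


def impute_hot_deck(records, item, class_key, rng):
    """Hot deck: copy `item` from a random donor of the same class.

    Raises KeyError if a class has no donor at all.
    """
    donors = {}
    for r in records:
        if r[item] is not None:
            donors.setdefault(r[class_key], []).append(r[item])
    filled = []
    for r in records:
        r = dict(r)  # copy, so the input records are left untouched
        if r[item] is None:
            r[item] = rng.choice(donors[r[class_key]])
        filled.append(r)
    return filled


records = [
    {"sector": "A", "uses_internet": 1},
    {"sector": "A", "uses_internet": None},
    {"sector": "B", "uses_internet": 0},
    {"sector": "B", "uses_internet": None},
]
print(impute_mean([10, None, 30, 20]))
print(impute_mode([1, None, 1, 0]))
print(impute_hot_deck(records, "uses_internet", "sector", random.Random(0)))
```

For a 0/1 item such as Internet use, modal or hot deck imputation keeps the imputed value a valid code, whereas mean imputation would produce a fraction.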

B4.1. Data editing (Manual, page 86)
Misclassified units
• Two cases of misclassification:
  - A non-eligible unit is erroneously included. This reduces the effective sample size unless a reserve list is prepared.
  - An eligible unit is included in the wrong stratum or omitted from the frame altogether. The technical solution consists of recalculating the sample weights (see Box 17).

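B4.2. Data weighting
Some simple weighting methods
• The sample average in stratum h is defined as
$$\bar{x}_h = \frac{1}{n_h} \sum_{i=1}^{n_h} x_{hi}$$
where $x_{hi}$ is the value reported by sampled business i in stratum h and $n_h$ is the sample size in stratum h.
• The estimate of the total for stratum h can be obtained by multiplying the stratum average by the total number of businesses in the stratum ($N_h$):
$$\hat{X}_h = N_h \, \bar{x}_h$$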

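B4.2. Data weighting
Some simple weighting methods (cont.)
• The estimate of the total in the population is just the sum of the estimated stratum totals over the L strata:
$$\hat{X} = \sum_{h=1}^{L} N_h \, \bar{x}_h$$
or, written with the weight $N_h / n_h$ applied to each sampled business,
$$\hat{X} = \sum_{h=1}^{L} \frac{N_h}{n_h} \sum_{i=1}^{n_h} x_{hi}$$
where $\bar{x}_h$ is the sample average in stratum h, $x_{hi}$ is the value for sampled business i in stratum h, and $N_h$ and $n_h$ are the population and sample sizes of the stratum.
• See Boxes 18 and 19 of the Manual (page 89).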
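The grossing-up described above can be checked numerically with a short Python sketch. The stratum names, population sizes and sample values are invented; the two functions compute the same estimate, once from stratum averages and once from per-unit weights N_h / n_h.

```python
def estimate_total_from_means(sample_by_stratum, population_size):
    """Sum over strata of N_h times the sample average in stratum h."""
    total = 0.0
    for h, values in sample_by_stratum.items():
        stratum_mean = sum(values) / len(values)
        total += population_size[h] * stratum_mean
    return total


def estimate_total_from_weights(sample_by_stratum, population_size):
    """Sum over sampled units of weight * value, with weight N_h / n_h."""
    total = 0.0
    for h, values in sample_by_stratum.items():
        weight = population_size[h] / len(values)
        total += sum(weight * x for x in values)
    return total


# Invented data: number of computers reported by sampled businesses.
sample_by_stratum = {"small": [2, 4, 6], "large": [50, 70]}
population_size = {"small": 100, "large": 10}

print(estimate_total_from_means(sample_by_stratum, population_size))
print(estimate_total_from_weights(sample_by_stratum, population_size))
```

Here the small stratum contributes 100 × 4 = 400 and the large stratum 10 × 60 = 600, so both functions estimate a total of about 1000 (the weighted version may differ in the last decimals because of floating-point rounding).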

B4.3. Estimating ICT indicators
Estimating proportions and ratios
• ICT indicators are mainly proportions and ratios.
  - A proportion: the share of units in the population that have a given characteristic.
  - A ratio: the quotient of two population totals.
• Four different types of estimates are very usual:
  - Simple random sampling of a non-stratified population
  - Stratified random sampling (with one or several strata exhaustively investigated)
  - Ratio estimates with simple random sampling
  - Ratio estimates with stratified random sampling

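B4.3. Estimating ICT indicators
CASE 1: Simple random sampling of a non-stratified population
• The indicator can be expressed as the sample proportion:
$$\hat{p} = \frac{1}{n} \sum_{i=1}^{n} y_i$$
where $y_i = 1$ if sampled business i has the characteristic and $y_i = 0$ otherwise, and n is the sample size.
• The standard error (SE) of the sample proportion is estimated by:
$$\widehat{SE}(\hat{p}) = \sqrt{\frac{\hat{p}\,(1 - \hat{p})}{n - 1}}$$
• This SE expression omits the finite population correction, so it is valid with a sampling fraction of 10% or less.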
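As a rough illustration of Case 1, the sketch below computes a sample proportion and a standard error from 0/1 answers. It assumes the textbook estimate sqrt(p(1 - p) / (n - 1)) without finite population correction; some texts divide by n instead, and the slide's exact expression should be checked against the Manual. The data are invented.

```python
import math


def srs_proportion(y):
    """Sample proportion and its estimated SE for 0/1 answers `y`.

    Assumes simple random sampling with a small sampling fraction,
    so no finite population correction is applied.
    """
    n = len(y)
    if n < 2:
        raise ValueError("at least two observations are needed")
    p = sum(y) / n
    se = math.sqrt(p * (1 - p) / (n - 1))
    return p, se


# Invented sample: 30 of 100 businesses report using the Internet.
answers = [1] * 30 + [0] * 70
p_hat, se_hat = srs_proportion(answers)
print(f"p = {p_hat:.3f}, SE = {se_hat:.4f}")
```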

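B4.3. Estimating ICT indicators
CASE 2: Stratified random sampling
• An unbiased estimate of p is:
$$\hat{p} = \sum_{h=1}^{L} \frac{N_h}{N} \, \hat{p}_h$$
where
  L: the number of strata
  $N_h$: the population in stratum h (h = 1, 2, ..., L), with $N = \sum_{h=1}^{L} N_h$
  $n_h$: the sample size in stratum h (h = 1, 2, ..., L)
  $\hat{p}_h$: the sample proportion in stratum h
• The estimate of the SE of $\hat{p}$ is:
$$\widehat{SE}(\hat{p}) = \sqrt{\sum_{h=1}^{L} \left(\frac{N_h}{N}\right)^2 \left(1 - \frac{n_h}{N_h}\right) \frac{\hat{p}_h\,(1 - \hat{p}_h)}{n_h - 1}}$$
An exhaustively investigated stratum ($n_h = N_h$) contributes nothing to the SE.
• See Annex 4 of the Manual for more details.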
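Case 2 can be sketched in Python as follows. The sketch assumes the usual stratified estimator: stratum proportions weighted by N_h / N, and a variance that sums, over strata, (N_h / N)^2 (1 - n_h / N_h) p_h (1 - p_h) / (n_h - 1). Annex 4 of the Manual remains the reference for the exact expression; the data are invented.

```python
import math


def stratified_proportion(strata):
    """Estimate a proportion and its SE from stratified 0/1 answers.

    `strata` is a list of (N_h, answers_h) pairs, where N_h is the stratum
    population and answers_h the list of 0/1 answers of the sampled units.
    """
    N = sum(N_h for N_h, _ in strata)
    p = 0.0
    variance = 0.0
    for N_h, y in strata:
        n_h = len(y)
        p_h = sum(y) / n_h
        W_h = N_h / N
        p += W_h * p_h
        if n_h < N_h:
            # Sampled stratum: needs n_h >= 2 for the variance term.
            variance += W_h**2 * (1 - n_h / N_h) * p_h * (1 - p_h) / (n_h - 1)
        # An exhaustively investigated stratum adds no sampling variance.
    return p, math.sqrt(variance)


# Invented data: 10 of 96 small businesses sampled, all 4 large ones surveyed.
strata = [
    (96, [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]),
    (4, [1, 1, 1, 0]),
]
p_hat, se_hat = stratified_proportion(strata)
print(f"p = {p_hat:.3f}, SE = {se_hat:.4f}")
```

With these figures the estimate is 0.96 × 0.2 + 0.04 × 0.75 = 0.222, and only the small-business stratum contributes to the standard error.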

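B4.3. Estimating ICT indicators
CASE 3: Ratio estimates with simple random sampling
• The indicator to estimate is the ratio of two population totals:
$$p = \frac{Y}{X}$$
• The natural estimate of the ratio p is:
$$\hat{p} = \frac{\bar{y}}{\bar{x}} = \frac{\sum_{i=1}^{n} y_i}{\sum_{i=1}^{n} x_i}$$
• Finally, one approximation of the SE is:
$$\widehat{SE}(\hat{p}) \approx \frac{1}{\bar{x}} \sqrt{\frac{1}{n\,(n - 1)} \sum_{i=1}^{n} \left(y_i - \hat{p}\,x_i\right)^2}$$
where $\bar{x}$ is the sample average of the n X observations and $\bar{y}$ is the sample average of the n Y observations.
• This formula is given for reference; it is outside the scope of our course.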
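For readers who want to try Case 3, here is a sketch under stated assumptions: the ratio is estimated by the sum of the y values divided by the sum of the x values, and the SE uses the common linearization approximation (1 / x̄) · sqrt(Σ(y_i − p̂ x_i)² / (n(n − 1))), without finite population correction. This may not be the exact expression shown on the slide. The variables (employees using computers, total employees) and the data are invented.

```python
import math


def ratio_estimate(y, x):
    """Ratio estimate sum(y) / sum(x) and an approximate SE.

    Uses the linearization approximation and ignores the finite
    population correction (small sampling fraction assumed).
    """
    n = len(y)
    if n != len(x) or n < 2:
        raise ValueError("y and x must have the same length, at least 2")
    ratio = sum(y) / sum(x)
    x_bar = sum(x) / n
    residual_ss = sum((yi - ratio * xi) ** 2 for yi, xi in zip(y, x))
    se = math.sqrt(residual_ss / (n * (n - 1))) / x_bar
    return ratio, se


# Invented data: employees using computers (y) and total employees (x).
using_pc = [5, 10, 2, 8]
employees = [10, 20, 10, 10]
ratio, se = ratio_estimate(using_pc, employees)
print(f"ratio = {ratio:.3f}, SE = {se:.4f}")
```

When every business has exactly the same share (for example y = [5, 10, 15] and x = [10, 20, 30]), all residuals are zero and the approximate SE is zero, which is a quick sanity check on the formula.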
