Backcasting United Nations Statistics Division
Overview • Any change in classifications creates a break in time series, since they are suddenly based on differently formed categories • Backcasting is a process to describe data collected before the “break” in terms of the new classification
Overview • There is no single “best method” • Factors influencing a decision include: • type of statistical series that requires backcasting (raw data, aggregates, indices, growth rates, ...) • statistical domain of the time series • availability of micro-data • availability of "dual coded" micro-data (i.e. businesses are classified according to both the old and the new classification) • length of the "dual coded" period • frequency of the existing time series • required level of detail of the backcast series • cost / resource considerations
Main methods • “Micro-data approach” (re-working of individual data) • “Macro-data approach” (proportional approach) • Hybrids thereof
Micro-data approach • Consists of assigning a new activity code (i.e. a code in the new classification) to all units in every past period, as far back as backcasting is desired • No other change is required • Statistics are then compiled by standard aggregation • Census vs. survey (weight adjustment issue)
Micro-data approach • Census • All in-scope units are selected and therefore have a weight of one. • Each unit is recoded, and re-aggregation can then take place. • Survey • The non-observed units in the population influence the outcome via the sampling weights. • Therefore all units in the population (both observed and non-observed) need to be coded. • Re-aggregation of the sample units under the new classification can then take place.
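The recode-then-re-aggregate step above can be sketched in a few lines of Python. This is a minimal illustration only: the unit records, codes and weights are hypothetical, and the sampling weights are kept unchanged, which sidesteps the weight adjustment issue a real survey would have to address.

```python
from collections import defaultdict

# Hypothetical unit records: (unit id, old code, new code, weight, turnover).
# In a census every weight is 1; in a survey the weight says how many
# population units the sampled unit stands for.
units = [
    ("u1", "1A", "1B", 1.0, 100.0),
    ("u2", "1A", "2B", 1.0, 40.0),
    ("u3", "2A", "2B", 5.0, 10.0),
    ("u4", "2A", "3B", 5.0, 20.0),
]

def aggregate(records, code_index):
    """Weighted turnover total by the code stored at position code_index."""
    totals = defaultdict(float)
    for rec in records:
        weight, turnover = rec[3], rec[4]
        totals[rec[code_index]] += weight * turnover
    return dict(totals)

old_totals = aggregate(units, 1)  # compiled under the old classification
new_totals = aggregate(units, 2)  # same units, compiled under the new one
```

Because only the code attached to each unit changes, the grand total is the same under both classifications; only its breakdown differs.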
Micro-data approach • Requires detailed information from past periods (for all units to be recoded) • This information must be more detailed than just the old code • If the information is available, results are more reliable than those from macro-approaches
Micro-data approach • Issues: • Resource intensive • Solutions are needed when unit information is not available for a period (not collected, or unit did not respond) • Nearest neighbor: the back-calculation for the elementary unit is made in the same way as for the “closest unit” • Transition matrix approach: uses conversion coefficients at the elementary level
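One possible reading of the nearest-neighbor fallback is sketched below. The distance measure (absolute difference in employment among units sharing the same old code), the field names and the donor records are all assumptions made for illustration; the slides do not specify how the “closest unit” is determined.

```python
def nearest_neighbour_code(target, donors):
    """Return the new code of the donor closest in size to the target.

    Only donors with the same old code as the target are considered.
    Returns None when no such donor exists.
    """
    candidates = [d for d in donors if d["old"] == target["old"]]
    if not candidates:
        return None
    best = min(candidates,
               key=lambda d: abs(d["employment"] - target["employment"]))
    return best["new"]

# Hypothetical donors whose new code is already known.
donors = [
    {"old": "1A", "new": "1B", "employment": 10},
    {"old": "1A", "new": "2B", "employment": 200},
    {"old": "2A", "new": "3B", "employment": 12},
]

# Unit with an old code but no information for assigning a new one.
missing = {"old": "1A", "employment": 15}
assigned = nearest_neighbour_code(missing, donors)
```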
Macro-data approach • Also called the “proportional method” • Calculates ratios (“proportions”, “conversion coefficients”) in a fixed dual-coding period, which are then applied to all previous periods • The ratios are calculated at the macro level • They can be based on the number of units (counts) or on size variables such as turnover or employment • Has a more approximate character
Macro-data approach • In its simple form, applies the growth rates of the former time series to the revised level for the whole historical period • More sophisticated methods may use adjustments based on experts’ knowledge • Example: mobile phones
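The simple form described above amounts to rescaling the old series by a single link factor, so that it hits the revised level in the link period while keeping every period-to-period growth rate. A minimal sketch, with a hypothetical function name and made-up figures:

```python
def backcast_with_growth_rates(old_series, new_level_at_link):
    """Rescale the old series so its last value equals the revised level.

    old_series is in chronological order and its last element is the link
    period, for which the value under the new classification is known.
    Multiplying by one constant leaves all growth rates unchanged.
    """
    factor = new_level_at_link / old_series[-1]
    return [value * factor for value in old_series]

old_series = [50.0, 100.0, 200.0]   # hypothetical values, old classification
backcast = backcast_with_growth_rates(old_series, 300.0)
# backcast is [75.0, 150.0, 300.0]; growth rates match the old series
```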
Macro-data approach • Assumes that the same set of coefficients applies to all periods • That is, the way the variable of interest is distributed between the old and the new categories is assumed not to have changed over time • Applied to aggregates; does not consider micro-data • Relatively simple and cheap to implement
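A minimal sketch of the proportional method follows: coefficients are estimated once from a dual-coded period and then applied unchanged to an earlier period's aggregates. The function names, codes and figures are hypothetical, and every old code is assumed to have a positive total in the dual-coded period.

```python
from collections import defaultdict

def conversion_coefficients(dual_coded):
    """Estimate coefficients from (old_code, new_code, value) records.

    Returns {old_code: {new_code: share}}, where share is the fraction of
    the old code's total value that falls into the new code.
    """
    cells = defaultdict(lambda: defaultdict(float))
    for old, new, value in dual_coded:
        cells[old][new] += value
    coeffs = {}
    for old, row in cells.items():
        row_total = sum(row.values())  # assumed > 0
        coeffs[old] = {new: v / row_total for new, v in row.items()}
    return coeffs

def convert(old_aggregates, coeffs):
    """Apply the fixed coefficients to aggregates of an earlier period."""
    result = defaultdict(float)
    for old, total in old_aggregates.items():
        for new, share in coeffs[old].items():
            result[new] += share * total
    return dict(result)

# Hypothetical dual-coded period (value could be counts or turnover).
dual = [("1A", "1B", 75.0), ("1A", "2B", 25.0), ("2A", "2B", 50.0)]
coeffs = conversion_coefficients(dual)

# Hypothetical earlier period, available by old code only.
past_new = convert({"1A": 80.0, "2A": 40.0}, coeffs)
```

The shares for each old code sum to one, so the converted aggregates add up to the same total as the original ones.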
Macro-data approach: Steps • 1 – Estimation of conversion coefficients: done for the dual-coding period; longer or multiple periods help in overcoming “infant problems” of the new classification and allow for correction of data; based on the selection of a specific variable • 2 – Calculation of aggregates using the conversion coefficients: weighted linear combination • 3 – Linking the different segments: old – overlap – new series; breaks are caused mainly by a change in the field of observation; simple factor or “wedging” • 4 – Final adjustment: seasonal etc.
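For step 3, a simple factor shifts the whole history by one ratio, whereas “wedging” is sketched here under one common reading: the level adjustment is phased in gradually over a limited number of periods before the link, and earlier periods keep their old level. The linear phase-in, the function name and the figures are assumptions, not something the slides prescribe.

```python
def wedge(old_series, new_level_at_link, wedge_length):
    """Phase the link factor in linearly over the last wedge_length periods.

    old_series is chronological and ends at the link period. Periods before
    the wedge are unchanged; the link period receives the full adjustment.
    """
    n = len(old_series)
    if not 1 <= wedge_length <= n - 1:
        raise ValueError("wedge_length must be between 1 and len(series) - 1")
    factor = new_level_at_link / old_series[-1]
    wedge_start = n - 1 - wedge_length
    adjusted = []
    for t, value in enumerate(old_series):
        steps_in = max(0, t - wedge_start)   # 0 before the wedge begins
        share = steps_in / wedge_length      # rises from 0 to 1
        adjusted.append(value * (1 + (factor - 1) * share))
    return adjusted

# Hypothetical flat series with a revised level of 150 in the link period.
wedged = wedge([100.0, 100.0, 100.0, 100.0], 150.0, wedge_length=2)
# wedged is [100.0, 100.0, 125.0, 150.0]
```

Unlike the simple factor, this changes growth rates inside the wedge, which is the price paid for leaving the older history untouched.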
Macro-data approach: Hypothetical example • Basics of conversion matrices • Makes use of a simple, artificial example • Convert from classification A to classification B • a = 3 categories in A (codes 1A, 2A, 3A) • b = 5 categories in B (codes 1B, 2B, 3B, 4B, 5B) • N (count of units) = 115
Conversion matrix A to B: counts • Conversion is via linear combination • Conversion coefficient from 1A to 1B
Conversion is via linear combinations … and the aggregate totals are the same:
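In general terms, the conversion described above can be written as follows. The notation is an assumption introduced here, since the original slide tables are not reproduced: $n_{ij}$ is the count of dual-coded units with old code $i$ and new code $j$, $c_{ij}$ the conversion coefficient from $i$ to $j$, and $y^{A}_{i}$, $y^{B}_{j}$ the aggregates under classifications A and B.

```latex
c_{ij} = \frac{n_{ij}}{\sum_{k=1}^{b} n_{ik}},
\qquad
y^{B}_{j} = \sum_{i=1}^{a} c_{ij}\, y^{A}_{i},
\qquad i = 1,\dots,a,\quad j = 1,\dots,b .

% Each row of coefficients sums to one, so totals are preserved:
\sum_{j=1}^{b} y^{B}_{j}
  = \sum_{i=1}^{a} y^{A}_{i} \sum_{j=1}^{b} c_{ij}
  = \sum_{i=1}^{a} y^{A}_{i} .
```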
Example – ISIC Rev.3 to Rev.4 Conversion at the Section level • Denote turnover (y) of ISIC Rev.3 Sections C, D and E and of out-of-scope units (Z) by • Denote turnover (y) of ISIC Rev.4 Sections B, C, D and E and of out-of-scope units (Z) by
Conversion matrix • Conversion coefficient from Rev.3 Section C to Rev.4 Section B
Turnover: Summary table • The turnover value of activities that are classified in • Old classification: Rev.3 Section C • New classification: Rev.4 Section B
Conversion matrix • Of the Rev.3 Section C activities, • 98.41% are reclassified to Rev.4 Section B • 1.05% are reclassified to Rev.4 Section C, and so on • Rev.4 Section C activities are a combination of 1.05% of Rev.3 Section C, 96.09% of Rev.3 Section E, and 0.16% of Rev.3 activities that do not belong to the Rev.3 industrial sector
Conversion via linear combination • Equations for converting total series from Rev.3 to Rev.4 are:
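The general shape of these equations can be sketched as below. The symbols are assumptions introduced for illustration, because the notation of the original slides is not reproduced here: $y^{R3}_{i}$ is turnover under Rev.3 for $i \in \{C, D, E, Z\}$, $y^{R4}_{j}$ is turnover under Rev.4 for $j \in \{B, C, D, E, Z\}$, and $c_{i \to j}$ is the share of Rev.3 category $i$ reclassified to Rev.4 category $j$.

```latex
y^{R4}_{j} \;=\; \sum_{i \in \{C,\,D,\,E,\,Z\}} c_{i \to j}\; y^{R3}_{i},
\qquad j \in \{B,\,C,\,D,\,E,\,Z\},
\qquad \sum_{j} c_{i \to j} = 1 \ \text{for every } i .

% Coefficients quoted in the conversion matrix slide:
c_{C \to B} = 0.9841, \qquad c_{C \to C} = 0.0105 .
```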
Comparison • Micro-data approach better retains the structural evolution of the economy • Micro-data approach does not require the choice of a special variable • Macro-data approach reflects evolution based on a fixed ratio for a fixed variable; seasonal patterns may be distorted • Macro-data approach is more cost-efficient; no consideration of micro-data is necessary • Assumptions underlying the macro-data approach become invalid over longer periods; “benchmark years” might help to measure the effect, if data are available
Other options • Combinations of both approaches are possible • Ratios for the macro-data approach could be calculated for shorter periods only • Micro-data approach could be used for specific years and the macro-data approach for interpolation between these years, e.g. based on the availability of census data • Many factors can influence the choice (see beginning), but data availability is a key practical factor
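One way such a hybrid could look is sketched below: conversion coefficients are taken from micro-data in two benchmark years and interpolated linearly for the years in between. The linear interpolation, the function name and the figures are assumptions for illustration only; the slides do not specify how the interpolation should be carried out.

```python
def interpolate_coefficients(year, year_a, coeffs_a, year_b, coeffs_b):
    """Linearly interpolate conversion shares between two benchmark years.

    coeffs_a and coeffs_b map new codes to the share of one old code, as
    measured from micro-data in year_a and year_b. A new code missing in
    one benchmark year is treated as having a share of zero there.
    """
    w = (year - year_a) / (year_b - year_a)
    codes = set(coeffs_a) | set(coeffs_b)
    return {
        code: (1 - w) * coeffs_a.get(code, 0.0) + w * coeffs_b.get(code, 0.0)
        for code in codes
    }

# Hypothetical shares of old code 1A in two benchmark (e.g. census) years.
shares_2000 = {"1B": 0.8, "2B": 0.2}
shares_2010 = {"1B": 0.6, "2B": 0.4}
shares_2005 = interpolate_coefficients(2005, 2000, shares_2000,
                                       2010, shares_2010)
```

Since each benchmark set sums to one, the interpolated shares also sum to one, so aggregate totals are still preserved in the intermediate years.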