This article provides a short overview of the current situation regarding the use of scanner data in the Swiss Consumer Price Index (CPI). It discusses the introduction of scanner data in CPI computation, the use of web-based software for price collection, the integration of new retailers, and the gradual approach in practical implementation. The advantages of using scanner data with traditional methods, such as increased data quality and reduced costs, are also highlighted.
Scanner data in the Swiss CPI: An alternative to price collection in the field
Reto Müller
Joint UNECE/ILO meeting on Consumer Price Indices, Geneva, 10-12 May 2010
Short overview of the current situation (I)
• First introduction of scanner data for regular CPI computation in July 2008
• Pilot project: web-based software, specifically developed for price collection from the largest retail chain
  • Not suitable for scanner data price collection from other retailers
• Traditional methodology
• Food and near-food only
• Proof of feasibility/practicability in general
Short overview of the current situation (II)
• Introduction of a new generic software tool in Dec. 2009
  • Generic tool suitable for scanner data price collection from any retailer in Switzerland thanks to a common interface definition
• April 2010: start of price collection from a second retailer (the first with the new software)
  • Traditional methodology / food and near-food only
• Both retailers together hold a 60-70% market share for (near-)food
• Until 2012: integration of one more retailer about every 6 months
Gradual approach in practical implementation (I)
• Due to the many methodological difficulties related to scanner data, the FSO has chosen a step-by-step process:
  • traditional collection and calculation methods in the initial phase
  • first step: scanner data as an improvement of the existing price collection system (a new and better data source instead of a new calculation method)
  • second step: evaluation of alternative index formulas and different kinds of quality adjustments (from 2010)
Calculation methodology
Traditional AND scanner data price collection:
• Elementary indices (Jevons): unweighted geometric mean of price relatives
• Higher aggregates (Lowe): annual chaining/reweighting (HBS)
• 11 regional and 3-5 distribution channel weightings
Scanner data price collection only:
• Average transaction price (sales/quantity) paid per item throughout Switzerland during the first 14 days of the month
• The average transaction price is still calculated separately for each region if the item price is set regionally
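The two price concepts on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the FSO's actual implementation; the function names are invented for the example.

```python
from math import prod

def jevons_index(base_prices, current_prices):
    """Elementary index (Jevons): unweighted geometric mean of price relatives."""
    relatives = [p1 / p0 for p0, p1 in zip(base_prices, current_prices)]
    return prod(relatives) ** (1 / len(relatives))

def average_transaction_price(transactions):
    """Scanner data unit value per item: total sales divided by total quantity."""
    total_sales = sum(sales for sales, quantity in transactions)
    total_quantity = sum(quantity for sales, quantity in transactions)
    return total_sales / total_quantity

# Two items in one elementary aggregate: base-period and current-period prices
index = jevons_index([2.0, 5.0], [2.2, 5.0])

# (sales, quantity) pairs for one item during the first 14 days of the month
price = average_transaction_price([(300.0, 100), (210.0, 60)])
```

With regionally set prices, `average_transaction_price` would simply be applied once per region instead of once nationwide.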
Gradual approach in practical implementation (II)
• The gradual integration of retail chains
• The gradual integration of product groups: start with food/near-food groups
  • Difficulties in quality adjustments for non-food groups:
    • price-determining characteristics are more complex
    • very little information about item quality in POS scanner data
    • additional information needed
    • item descriptions are sometimes not clear (e.g. "CD1", "CD2", "CD3", …)
Gradual approach in practical implementation (III)
• Software development in several stages
  • Prototype → generic tool
Conclusion on the gradual approach: it allows the FSO to take immediate advantage of the most important benefits of scanner data without being exposed to any major risks
Advantages of scanner data using traditional methods (I)
Increased data quality:
• precise sampling method according to turnover
• price collection over a period of two weeks instead of a single day
• scanner data usually include transactions from every single outlet nationwide (a full survey per item and survey period at no extra cost)
• recording of all transactions, including sales, promotions and other offers
Advantages of scanner data using traditional methods (II)
Reduced costs:
• Traditional price collection in the field is mainly done by a private market research institute on behalf of the FSO
• Considerable reduction of time and effort (but transfer of the remaining effort from stores to the FSO)
Reduced workload for the retail chains:
• Traditional price collectors take up the time of retail stores' staff
Quality assurance / risk management
• Quality assurance of the supplied data is difficult:
  • The FSO has no influence on data collection
• Regular checks are necessary:
  • Scanner data are usually subject to intensive tests by the retailers first
  • Test price collections
  • Quantitative and qualitative checks after importation
  • Checks before, during and after price collection
• Dependence on retailers (scanner data supply):
  • Risk greatly reduced by independent data supply from each retailer
  • Emergency plan
Collaboration with retail chains
• 2006 survey among the ten largest retailers in Switzerland
  • All retailers were basically ready to make their data available (for free)
• The FSO has adapted the Ordinance on the Conduct of Federal Statistical Surveys
• Additional agreements with individual retail chains
Scanner data supply
• Pre-aggregated data per item, period and price region; data supplied by the retailers directly (fully automated process, but with some initial outlays)
• Three different periods each month: first 7 days (used for sampling), 14 days (used for index computation) and 31 days (sampling)
• Common interface definition for all retail chains: uniform format, but supplying retailer-specific variables (e.g. more precise item descriptions) is possible and desirable
• Separate data files for master data (item description etc.) and transaction data (quantities, sales volumes etc.)
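The split into master data and transaction data described above can be sketched as follows. The column names and file layout are hypothetical; the slide only specifies that the two file types exist and that the 14-day period feeds index computation.

```python
import csv
import io

# Hypothetical master data file: one row per in-store item number
master_csv = """item_no,description
1001,Whole milk 1l
1002,Butter 250g
"""

# Hypothetical transaction data file: sales and quantities per item and period
transactions_csv = """item_no,period_days,region,sales,quantity
1001,14,CH,300.0,100
1002,14,CH,210.0,60
1001,7,CH,150.0,50
"""

master = {row["item_no"]: row for row in csv.DictReader(io.StringIO(master_csv))}

prices = {}
for row in csv.DictReader(io.StringIO(transactions_csv)):
    if row["period_days"] != "14":   # only the 14-day period is used for index computation
        continue
    avg_price = float(row["sales"]) / float(row["quantity"])
    prices[row["item_no"]] = (master[row["item_no"]]["description"], round(avg_price, 2))
```

Keeping master and transaction data in separate files means a description change does not force retailers to resend the (much larger) transaction history.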
New scanner data software module
• New software was needed for scanner data price collection, NOT for CPI computation (modular system):
  • regular importation of data files
  • selection and replacement of items
  • manual correction of the item allocation provided by the market research institute
  • implementation of automatic checking operations (e.g. changes in sales, item description, missing price reports, non-coherent user input, etc.)
  • etc.
Allocation of items to the COICOP (I)
• Scanner data from EACH of the three largest retailers contain >100'000 different food/near-food items
  • These must be allocated to the COICOP
• In-store item numbers are used instead of EAN codes:
  • In some (rare) cases, different items share the same EAN
  • In-store item numbers are always unambiguous
• The FSO tested and implemented two different methods:
  • Allocation on an aggregated level (pilot software)
  • Allocation on item level: purchase of allocation data from a market research institute (new software)
1st method: Allocation of items on an aggregated level (I)
• Product range structure positions containing multiple / similar articles are linked with a COICOP position
• Linking occurs on an aggregation level as high as possible and as low as necessary
Advantages:
• (New) items are automatically linked at once
• Tremendous reduction of the (initial) effort
1st method: Allocation of items on an aggregated level (II)
Disadvantages:
• Retailers' product range structures are often organised according to different criteria
  → multiple allocations and overlaps
  → must be laboriously corrected at item level
• Retailers' product range structures are subject to permanent (sometimes unannounced) changes
  → allocations to the COICOP must be adapted
  → very time-consuming
  → may lead to time constraints and an increased risk of errors
2nd method: Allocation on item level (I)
• Each in-store article number is allocated individually
• Allocation data is purchased from a market research institute (MRI) every month:
  • MRIs already allocate all individual items to their own nomenclatures (for their own purposes)
  • Linking their own very detailed nomenclature to the COICOP is a relatively small effort
  • A new opportunity for the MRIs to gain additional benefit
  • Allocation data can be purchased relatively cheaply
  • MRIs have plenty of specialist staff, while the FSO doesn't have the necessary personnel resources for this
2nd method: Allocation on item level (II)
Advantages:
• High quality (MRI specialists) and rather inexpensive
• Frequent changes in retailers' product range structures no longer have any effect on existing links: time constraints and errors can be avoided / reduced
• Considerable time and cost savings: FSO price collectors can focus on their core competence (price collection)
• Eases the administrative burden on retailers even more (questions are usually not necessary)
• Guarantees flexibility: the system can easily be extended to other retailers and groups of commodities
2nd method: Allocation on item level (III)
Disadvantages:
• Some regular controls of the allocation data are necessary
• (Limited) dependence on the market research institute, but price collection is NOT put at risk if a delivery of allocation data is missing, because:
  • price collection can continue for a while with the items already allocated (>99.5%)
  • the item sample can still be updated based on sales volume
  • new items (<0.5%) are never selected for price collection during their first three months anyway
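The fallback behaviour described on this slide, continuing with already-allocated items when a monthly allocation delivery fails to arrive, amounts to a simple filter. A minimal sketch with invented item numbers and COICOP codes:

```python
# Allocation table accumulated from earlier monthly deliveries
# (in-store item number -> COICOP position); codes here are illustrative
allocation = {"1001": "01.1.4", "1002": "01.1.5", "1003": "01.1.1"}

# Items appearing in this month's scanner data; "9999" is new and not yet allocated
current_items = ["1001", "1002", "1003", "9999"]

# Price collection continues with already-allocated items (>99.5% in practice)
collectable = [item for item in current_items if item in allocation]

# New items wait for the next allocation delivery; they would not be
# selected during their first three months anyway
pending = [item for item in current_items if item not in allocation]
```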
Item sampling and replacement (I)
• Basically the same methods as with traditional price collection
• Same sample size
• BUT: best-selling articles can now be pinpointed exactly
  • For each survey position, the items with the highest sales volumes are selected
  • Relevant for sampling: the position of an item in the ranking list of the items with the highest sales volumes within the same survey position
Item sampling and replacement (II)
• Sales volumes of items change regularly, so the sample must be updated → ensuring representativeness
• Items should not be replaced all the time while they are still sold, and each replacement requires a quality adjustment → ensuring continuity
• Trade-off: representativeness vs. continuity
Item sampling and replacement (III)
• The FSO has empirically sought an optimal compromise between representativeness and continuity
• Precise sales validation rules were defined for the replacement of items
• Warning messages are generated when an item has to be replaced
• The monthly effort is largely limited to dealing with warning messages (around 2-3% of the selected items)
• Current prices of all other articles are collected automatically
Item sampling and replacement (IV)
• Sales validation rules take the turnover of preceding months into account as well!
Advantages:
• More sample continuity
• Significant short-term sales changes, such as strong sales increases due to promotions, cannot lead to temporary article replacement (sales and quantities regularly bounce back)
• Items at the very beginning or end of their lifecycle usually can't be selected (thus erratic price movements don't enter the index)
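A validation rule that looks at several preceding months, in the spirit of this slide, can be sketched as follows. The threshold, window length and function name are assumptions for illustration; the slide does not disclose the FSO's actual rule.

```python
def needs_replacement(turnover_history, threshold=0.5, months=3):
    """Flag an item for replacement only if its turnover stays below
    `threshold` times its earlier average for `months` consecutive months,
    so a single promotion-driven dip or spike cannot trigger a swap."""
    if len(turnover_history) < 2 * months:
        return False  # items early in their lifecycle are not replaced
    baseline = sum(turnover_history[:-months]) / (len(turnover_history) - months)
    recent = turnover_history[-months:]
    return all(t < threshold * baseline for t in recent)

# Stable seller: recent turnover close to its baseline, so it is kept
keep = needs_replacement([100, 110, 95, 105, 100, 98])     # False

# Sustained decline over the last three months: flagged for replacement
replace = needs_replacement([100, 110, 105, 30, 25, 20])   # True
```

Requiring the drop to persist over the whole window is what delivers the continuity advantage listed above: a one-month promotion on a competing item cannot push a regular top seller out of the sample.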
Item sampling and replacement (V)
• Basically the same criteria (representativeness and continuity) apply to item sampling and replacement
• Traditional price collection:
  • Items are replaced less often and their price is tracked as long as possible
  • The most similar replacement article is selected (independently of its turnover)
• In practice, precise knowledge of sales volumes results in some differences though!
Item sampling and replacement (VI)
• Scanner data price collection:
  • The software indicates precisely
    • when an item has to be replaced
    • the new item to be selected within the same survey position, based on turnover only (independently of its quality characteristics)
  • Articles vary more in quality, so choosing the correct quality adjustment method becomes more difficult (the same options are available)
  • Fewer direct substitutions; more use of the overlapping link method and the start of new price series
  • In general, items are replaced more frequently
Other peculiarities of scanner data
• Multipacks (e.g. 3 for 2) must be treated separately
• Temporarily missing prices:
  • very few
  • mainly seasonal items
  • "carry forward" method (at most twice for non-seasonal items)
• For more information on peculiarities such as the treatment of seasonal products, the identification of price reference quantities, chronological sequences etc., see the room document
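The "carry forward" rule for temporarily missing prices can be sketched as below: the last observed price is reused, but at most twice in a row for non-seasonal items. The function name and list-of-`None` representation are illustrative assumptions.

```python
def carry_forward(prices, max_carry=2):
    """Fill temporarily missing prices (None) with the last observed price,
    at most `max_carry` consecutive times; after that the gap remains."""
    filled, last, carried = [], None, 0
    for p in prices:
        if p is not None:
            last, carried = p, 0
            filled.append(p)
        elif last is not None and carried < max_carry:
            carried += 1
            filled.append(last)
        else:
            filled.append(None)  # limit reached: the price series is interrupted
    return filled

# Monthly price series with a three-month gap: only two months are imputed
series = carry_forward([2.5, None, None, None, 2.6])
```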
Test price collections (TPCs) (I)
• A TPC is done for each retail chain with data from the past
  • 24 months (first retailer), 12 months (second retailer)
• Comparison of the results with indices from traditional price collection (same period)
• Similar results for the first two retail chains:
  • Some clear differences for individual positions, which can be explained by
    • a different item sample (improved according to definition)
    • the extended spatial and temporal coverage of the price data
Test price collections (TPCs) (II)
• The volatility of the scanner data index is higher, especially at lower aggregation levels (inclusion of all special offers/promotions etc.)
• For one retail chain: differences level out at higher aggregation levels and over time
• For the other retail chain: considerable differences remain, explained especially by the improved item sample
  • in conjunction with the unusual price trend of a special, very large product line
  • in conjunction with the modified total number of special price offers over time (these are usually run on top sellers)
Conclusions (I)
• The process is very efficient:
  • Pre-aggregation of the data by the retailers is fully automated
  • The monthly manual effort is mainly limited to replacing items with a significant decrease in turnover
  • Medium-term savings can be achieved after the initial outlays
• The improved data quality and item sample have a significant impact on the results
• A new IT system is only necessary for price collection, NOT for index calculation
Conclusions (II)
• Buying item allocation data from a market research institute is a very efficient and not very expensive way of allocating data
• Traditional methods are useful for food and near-food
• For difficult non-food groups like clothing, multimedia etc., additional information on price-determining item characteristics would be needed
Mandate
• More research is needed for the implementation of some non-food groups in conjunction with different sampling and calculation methods
• The FSO might award a contract (mandate) in 2010-11 for the development of a (SAS) tool capable of simulating different sampling and calculation methods (e.g. methods used by other statistical agencies, matched models, superlative indices, GEKS etc.)
• The FSO is looking for suppliers/contractors
• The FSO is very interested in any existing tools/routines
• Collaboration and/or information exchange with other statistical agencies is very desirable