WORKSHOP ON SCANNER DATA, Geneva, 10 May 2010. Joint presentation by Ragnhild Nygaard (Statistics Norway) and Heymerik van der Grient (Statistics Netherlands)
Historical overview – NL Supermarkets • Mid 90s: first contacts with chain(s) • 2002: first implementation: 1/2 chain(s) • Yearly Laspeyres (labour intensive) • Construction of yearly basket of items • Manual linking of items to COICOP-groups • Manual replacement of disappearing items • Reduction of ca 10 000 monthly price quotes in field survey
Historical overview – NL, cont. Supermarkets • 2010: extension: 6 chains • Monthly chained Jevons (efficient process) • No manual linking of items • No explicit replacements • Extra reduction of ca 5 000 monthly price quotes in field survey
Historical overview – N • 1997: first contact with one chain • Gradually contact with more chains • Implementation in the CPI • only price information of specific representative items • 2002: scanner data from all the chains (no questionnaires - big incentive) • Aug 2005: expanded use for COICOP 01 • price and quantity information for all items in representative outlets
Questions to be answered when dealing with scanner data • How/where to acquire scanner data? • Which statistical method? • How to link items to COICOP? • How to deal with all kinds of particularities in the data? • Development of a new computer system?
Source of scanner data • Market research companies • Cleaned data • (very) expensive • Two-stage delivery chain (timeliness) • Companies/Chains • Raw data • Cheap (NL/N do not pay) • Direct contact with original supplier
Negotiations with companies • Time-consuming process • Negotiations can take up to a year or more, including meetings, sending test data, analysing data etc. • Be aware that companies incur some set-up costs, e.g. for preparing the data extractions • Can the company provide what you want/need? • E.g. information to link items to COICOP automatically
Negotiations with companies, cont. • Focus on advantages for companies • Minor costs once established (just a copy of their sales administration) • No questionnaires or monthly visits of price collectors • Other incentives for companies? • Money – not likely • Information • E.g. company price development compared to overall price development
Negotiations with companies, cont. • Establishing good routines with the companies is essential • Strict time schedules • No changes in formats once implemented
Pre-production work • Take your time analysing the data • Enormous amount of data • N: Over 300 000 price observations each month divided into about 14 000 items • Build shadow system (prototype) • Compare the new price indexes based on scanner data with the old method for a certain period of time before implementation • Discover possible problems in advance • Unexpected situations will arise for sure
Pre-production work • Ideas for analysing the data: • Is the same EAN always the same item? • Extreme price changes • Structurally specific price development at the beginning or end of an EAN's life cycle • Risk of bias! • All kinds of dynamics in the data • Missing prices • Do properties of the data change over time? • Etc.
Methodology / IT-system • Find methodology that: • Delivers good indexes (e.g. no bias) • Can deal with all particularities in data • Build IT-system that supports the chosen methodology • Learn from the experiences of other countries using scanner data
Properties of data: Consequences for methodology NL and N • High attrition rate of items
Properties of data, cont. Consequences for methodology NL and N • How to deal with high attrition rate of items • NL: monthly chained index • N: monthly chained index
Properties of data, cont. Consequences for methodology NL and N • Sales: low prices combined with enormous increase in quantities sold
Properties of data, cont. Consequences for methodology NL and N • Consequences of sales: • Single observations can have extremely high influence on elementary index • Risk of bias when applying monthly chaining and explicit weights
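A toy illustration of the risk of bias from monthly chaining with explicit weights. All numbers are invented: item A is on sale in month 2 and sells in huge quantities, then returns to its normal price in month 3 while selling less than usual (consumers have stocked up). The Törnqvist formula is used here only as an example of an explicitly weighted index; it is a sketch, not either office's production code.

```python
import math

def tornqvist(p0, q0, p1, q1):
    """Törnqvist price index between two periods for the same list of items."""
    e0 = [p * q for p, q in zip(p0, q0)]          # expenditures, base period
    e1 = [p * q for p, q in zip(p1, q1)]          # expenditures, current period
    s0 = [e / sum(e0) for e in e0]                # expenditure shares, base
    s1 = [e / sum(e1) for e in e1]                # expenditure shares, current
    return math.exp(sum(0.5 * (a + b) * math.log(y / x)
                        for a, b, x, y in zip(s0, s1, p0, p1)))

# Invented data: [item A, item B] for months 1, 2, 3.
prices = [[1.0, 1.0], [0.5, 1.0], [1.0, 1.0]]
quantities = [[100, 100], [1000, 100], [20, 100]]

# Monthly chained index: multiply the month-on-month links.
chained = 1.0
for t in range(1, len(prices)):
    chained *= tornqvist(prices[t - 1], quantities[t - 1],
                         prices[t], quantities[t])

# Direct index: compare month 3 with month 1 in one step.
direct = tornqvist(prices[0], quantities[0], prices[-1], quantities[-1])

print(round(chained, 3), round(direct, 3))
```

All prices in month 3 equal those in month 1, so the direct index is 1, but the chained index ends up well below 1: the sale month gets a large weight on the way down and a smaller one on the way back up.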
Properties of data, cont. Consequences for methodology NL and N • Bias is not just theoretical! • Example for detergents
Properties of data, cont. Consequences for methodology NL and N • How to deal with sales? • NL: crude weighting on item level: w=0 or 1 • N: manual checks of price ratios that contribute most to elementary results: “critical observations”
Properties of data, cont. Consequences for methodology NL and N • Implausible price changes • NL: price changes (p_t/p_{t-1}) of more than a factor 4 are deleted • Changes of +5000% and -99% do actually occur • N: price changes (p_t/p_{t-1}) of more than a factor 3 are deleted
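The deletion rule for implausible price changes could be sketched as follows. It is assumed here that "more than a factor 4" is applied symmetrically (increases above 4 and decreases below 1/4); the slides do not say so explicitly. The function name and the dictionary layout (item code to price) are hypothetical.

```python
def drop_extreme_changes(prev_prices, curr_prices, max_factor=4.0):
    """Return price ratios p_t / p_{t-1} for matched items, dropping
    ratios outside [1/max_factor, max_factor].

    max_factor=4.0 mirrors the NL rule, max_factor=3.0 the N rule.
    Symmetric treatment of increases and decreases is an assumption.
    """
    kept = {}
    for item, p_prev in prev_prices.items():
        p_curr = curr_prices.get(item)
        if p_curr is None or p_prev <= 0 or p_curr <= 0:
            continue                      # unmatched or unusable price
        ratio = p_curr / p_prev
        if 1.0 / max_factor <= ratio <= max_factor:
            kept[item] = ratio
    return kept

# Invented example: +5000% and -99% changes are removed, +20% is kept.
prev = {"a": 1.0, "b": 1.0, "c": 1.0, "d": 2.0}
curr = {"a": 1.2, "b": 51.0, "c": 0.01}
print(drop_extreme_changes(prev, curr))
```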
Properties of data, cont. Consequences for methodology NL and N • Temporarily missing prices
Properties of data, cont. Consequences for methodology NL and N • How to deal with temporarily missing prices: • NL: imputation of missing prices • N: no adjustments, but imputing prices is considered for the near future
Properties of data, cont. Consequences for methodology NL and N • Quality differences • Items with same EAN are considered to be identical • Items with different EAN are treated as different items (no matching) • How to deal with quality differences: • NL: only adjustment in exceptional cases: manual interference • N: no adjustment
Actual method - NL • Data received: • For each item each week: • EAN • Short description • (Chain specific) product group • Used to link items to COICOP automatically • Expenditures • Quantities sold
Actual method – NL, cont. • Price of item: • Unit value based on first three weeks of month • Unweighted price index elementary level: • Monthly chained Jevons on selection of items • Weighted price index higher aggregates: • Yearly chained Laspeyres • Weights based on scanner data of all 52 weeks of previous year
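A minimal sketch of the two elementary steps named above: the unit value over the first three weeks of the month, and a monthly chained Jevons index (geometric mean of price ratios of items present in both months). Function names and data layout are hypothetical, and item selection, imputation and outlier removal are left out.

```python
import math

def unit_value(weekly_expenditure, weekly_quantity, weeks=3):
    """Unit value: total expenditure / total quantity over the first `weeks` weeks."""
    return sum(weekly_expenditure[:weeks]) / sum(weekly_quantity[:weeks])

def jevons(prev_prices, curr_prices):
    """Unweighted geometric mean of price ratios over items in both months."""
    matched = [item for item in prev_prices if item in curr_prices]
    if not matched:
        raise ValueError("no matched items between the two months")
    logs = [math.log(curr_prices[i] / prev_prices[i]) for i in matched]
    return math.exp(sum(logs) / len(logs))

def chained_jevons(monthly_prices, start=100.0):
    """Chain month-on-month Jevons links into an index series."""
    index = [start]
    for prev, curr in zip(monthly_prices, monthly_prices[1:]):
        index.append(index[-1] * jevons(prev, curr))
    return index

# Invented data: item "b" disappears and item "c" appears without
# any explicit replacement; each link uses the items it can match.
months = [
    {"a": 1.0, "b": 2.0},
    {"a": 2.0, "b": 2.0, "c": 5.0},
    {"a": 2.0, "c": 5.0},
]
print(unit_value([10.0, 20.0, 30.0, 99.0], [5, 10, 15, 1]))
print(chained_jevons(months))
```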
Actual method – NL, cont. • Item selection at elementary level • Items with low expenditures : w=0 • Other items : w=1 • Threshold of low (average) expenditure share: • Example: threshold =1% for χ=2 and N=50
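The threshold formula itself is not preserved in this transcript. The example given (1% for χ=2 and N=50) is consistent with a threshold of 1/(χ·N), where N is the number of items in the product group, so the sketch below assumes that form. Whether an item exactly at the threshold is kept is also an assumption, as are the function names.

```python
def share_threshold(n_items, chi):
    """Assumed threshold on the (average) expenditure share: 1 / (chi * N)."""
    return 1.0 / (chi * n_items)

def crude_weights(avg_shares, chi):
    """Assign w=1 to items whose average share exceeds the threshold, else w=0.

    avg_shares: dict item -> average expenditure share within the product group.
    """
    t = share_threshold(len(avg_shares), chi)
    return {item: 1 if share > t else 0 for item, share in avg_shares.items()}

# Reproduces the example on the slide: chi=2 and N=50 give 1%.
print(share_threshold(50, 2))

# Invented shares for a group of four items; with chi=1.25 the threshold is 20%.
print(crude_weights({"a": 0.60, "b": 0.30, "c": 0.08, "d": 0.02}, chi=1.25))
```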
Actual method – NL, cont. • Determination of threshold value • Simulations lead to: • Optimal value: χ=1.25 • Ca. 50% of items are excluded (on average) • Elementary index based on 80 to 85% of total expenditures • Elementary level (chain dependent) comparable with COICOP6
Actual method – NL, cont. • Refinements: • Extreme price changes are excluded (factor 4) • Missing prices are imputed • Dump prices at the end of an item's life cycle are excluded (see paper)
Actual method – NL.What advantages were achieved? • Indexes are of higher quality • Compared with old method scanner data • Compared with field survey • Response burden for companies is lower • No price collection in the shops • Efficiency gains? • Yes: more or less automatic production process • Investment costs (IT-system) were (very) high
Illustrations • Price indexes based on five supermarkets
Actual method - N • Data received: • For each item in the midweek of the month: • EAN/PLU • Short description • (Chain specific) product group • Calculated average price • Quantity sold • Expenditure
Actual method – N, cont. • Sample of representative outlets • Stratified by chain and concept • Matching EAN/PLU with COICOP6 • Weighted Jevons price index on elementary level with expenditure shares of current and base period: • Monthly chained Törnqvist index • Scanner data weights between the COICOP6 groups
Actual method – N, cont. • Higher aggregates: • Yearly chained Laspeyres • Weights from HES (NR as of 2011) • Exclude strongly seasonal items only available for a certain period of the year • Manual control and possibly exclusion of extreme contributions to elementary results
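Both offices aggregate elementary indexes with a yearly chained Laspeyres-type index. One step of that aggregation can be sketched as a weighted arithmetic mean of elementary indexes, with weights fixed for the year. The function name, group labels and numbers are invented, and the yearly chain-linking itself is not shown.

```python
def laspeyres_aggregate(elementary_indexes, weights):
    """Weighted arithmetic mean of elementary indexes.

    elementary_indexes: dict group -> index (same reference period for all)
    weights: dict group -> expenditure weight (need not sum to 1)
    """
    total = sum(weights.values())
    return sum(weights[g] / total * elementary_indexes[g] for g in weights)

# Invented example with two groups.
print(laspeyres_aggregate({"bread": 110.0, "milk": 100.0},
                          {"bread": 1.0, "milk": 3.0}))
```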
Actual method – N: What advantages were achieved? • Indexes of higher quality? • New methodology led to reduction of e.g. sampling and measurement errors, but also to new biases • Much more data – more detailed price indexes • Considering both prices and quantities • Many indexes have improved, others have not • Low response burden for companies • No questionnaires • Efficiency gains? • Automatic production process which requires some manual interference • Resources demanded not much higher than before • High investment costs (IT-system)
New methodology • Newly developed index (Ivancic, Diewert, Fox) • Rolling year GEKS price index • Source: • GEKS-algorithm of purchasing power parities (International Comparison Programme) • GEKS index transitive by construction • chained index equals direct index • no chain drift • A geometric mean of direct superlative price indexes
New methodology, cont. • GEKS index between entities j and k, built from bilateral indexes (Törnqvist or Fisher) between entities j and l (l=1..M) and between entities k and l, respectively • Purchasing power parities: entity is country • Scanner data: entity is month
New methodology, cont. • Expanding time period leads to revising all previous GEKS indexes • Solution: rolling version (chaining) etc
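A sketch of GEKS and a rolling version, under stated assumptions. The GEKS index between months j and k is taken as the geometric mean, over all months l in the window, of P(j,l)/P(k,l), where P(a,b) is a bilateral Törnqvist index of month b relative to month a on matched items. For the rolling version, the latest month-on-month GEKS movement in each shifted window is chained onto the existing series. The 13-month default window is an assumption taken from the literature on rolling year GEKS, not from the slides. Data layout and function names are hypothetical.

```python
import math

def tornqvist(base, comp):
    """Törnqvist index of `comp` relative to `base` on matched items.
    Each period is a dict item -> (price, quantity)."""
    items = [i for i in base if i in comp]
    e0 = {i: base[i][0] * base[i][1] for i in items}
    e1 = {i: comp[i][0] * comp[i][1] for i in items}
    t0, t1 = sum(e0.values()), sum(e1.values())
    return math.exp(sum(0.5 * (e0[i] / t0 + e1[i] / t1)
                        * math.log(comp[i][0] / base[i][0]) for i in items))

def geks(periods, j, k):
    """GEKS index of period k relative to period j within `periods`."""
    m = len(periods)
    log_sum = sum(math.log(tornqvist(periods[j], periods[l])
                           / tornqvist(periods[k], periods[l]))
                  for l in range(m))
    return math.exp(log_sum / m)

def rolling_geks(periods, window=13):
    """GEKS on the first window, then chain the last link of each shifted window."""
    first = periods[:window]
    index = [geks(first, 0, t) for t in range(len(first))]
    for t in range(window, len(periods)):
        win = periods[t - window + 1:t + 1]
        index.append(index[-1] * geks(win, window - 2, window - 1))
    return index

# Invented data: item A on sale in the second month.
periods = [
    {"A": (1.0, 100), "B": (1.0, 100)},
    {"A": (0.5, 1000), "B": (1.0, 100)},
    {"A": (1.0, 20), "B": (1.0, 100)},
]
print(rolling_geks(periods))
```

Within one window the index is transitive: `geks(periods, 0, 1) * geks(periods, 1, 2)` equals `geks(periods, 0, 2)` up to rounding, which is the "chained index equals direct index" property.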
RYGEKS and NL • RYGEKS was specifically developed for Statistics Netherlands as a remedy for the lack of weighting at elementary level • Not (yet) applied in practice • Used as benchmark • Finding the optimal threshold value • Current method (NL) resembles RYGEKS quite well (on average) • No bias found
RYGEKS and NL, cont. • Plans for near future: • Shadow system based on RYGEKS indexes • Continuous benchmark for current method • Implementation when RYGEKS is widely accepted? • More (international) analysis needed
RYGEKS and N • RYGEKS indexes tested on Norwegian scanner data on different levels: • EAN, elementary and aggregated COICOP levels • For COICOP 01, a monthly chained Törnqvist index was compared with a monthly chained RYGEKS index • The results indicate some bias in the Törnqvist index
RYGEKS and N, cont. • Small deviations for many COICOP aggregates • Milk, Cheese and eggs, Oils and fats, Vegetables, Fish
RYGEKS and N, cont. • While others show more deviations • Meat, Sugar, jam and chocolate
RYGEKS and N, cont. • Causes of bias: • Missing prices • Seasonal items (not excluded) • Price and quantity oscillating over time • Shadow system for calculating RYGEKS indexes on monthly basis established • Too early to be implemented
Scanner data in other branches? • NL: • Expanding to other branches desirable • Data available (e.g. durables) • Problem of quality changes • Analysis needed • N: • Continuously working to expand scanner data • Increasing pressure from chains and outlets • Data available for pharmaceutical products, wine and spirits (state monopoly) and petrol • Mostly price information implemented • Have tried to cover clothing, but the matched-item model was unsuccessful