On the use of internet robots for official statistics. Olav ten Bosch, MSIS, Dublin, 14-16 April 2014. Overview: Why internet as a data source (IAD)? Internet robots, how do they work? Applications: airline tickets, housing market, clothing, “robot-assisted data collection”.
On the use of internet robots for official statistics
Olav ten Bosch
MSIS, Dublin, 14-16 April 2014
Overview
• Why internet as a data source (IAD)?
• Internet robots, how do they work?
• Applications:
  • Airline tickets
  • Housing market
  • Clothing
  • “Robot-assisted data collection”
• Conclusion
Why IAD? (1)
• Internet sources
  • Faster, better, more efficient
  • New indicators
• Less!!!
• Administrative sources
  • Tax, social security services
  • Municipalities / provinces
  • Supermarkets
• Surveys
Why IAD? (2)
Internet sources: which content is original, reliable, stable, representative and accessible?
• Internet prices for CPI?
• Real estate sites for housing statistics?
• Internet vacancies for job statistics?
• Social media sentiment for consumer confidence?
• Trade in second-hand goods as economic indicators?
• Travel activity for tourism statistics?
Robots / crawlers / bots / spiders / scrapers: how do they work? (1)
[Diagram: You ↔ Browser (commands, graphical markup) ↔ Internet (requests) ↔ Website (code, images, style, data, etc.)]
Robots / crawlers / bots / spiders / scrapers: how do they work? (2)
[Diagram: You ↔ Robot / spider / crawler (navigation, data) ↔ Internet (requests) ↔ Website (code, images, style, data, etc.)]
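To make the robot's role in this diagram concrete, here is a minimal sketch in Python of the "website content in, data out" step. It is not the software used in the talk. The page content, the `price` class name, and the names `PriceParser` and `extract_prices` are all assumptions made for illustration, and the HTTP request itself is left out so the example runs without a network connection.

```python
from html.parser import HTMLParser


class PriceParser(HTMLParser):
    """Collects the text of every element whose class attribute is 'price'."""

    def __init__(self):
        super().__init__()
        self._in_price = False
        self.prices = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs
        if dict(attrs).get("class") == "price":
            self._in_price = True

    def handle_endtag(self, tag):
        # Any closing tag ends the current price element (no nesting assumed)
        self._in_price = False

    def handle_data(self, data):
        if self._in_price and data.strip():
            self.prices.append(data.strip())


def extract_prices(html):
    """Turn raw page markup into a list of price strings."""
    parser = PriceParser()
    parser.feed(html)
    parser.close()
    return parser.prices


# Hypothetical page content; a real robot would obtain this through an
# HTTP request instead of a literal string.
SAMPLE_PAGE = """
<html><body>
  <h1>Flights</h1>
  <div class="offer"><span class="name">AMS-DUB</span> <span class="price">89.00</span></div>
  <div class="offer"><span class="name">AMS-LIS</span> <span class="price">129.50</span></div>
</body></html>
"""

print(extract_prices(SAMPLE_PAGE))
```

Where a browser renders the markup for a person, the robot discards everything except the fields it was told to collect.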
Robots / crawlers / bots / spiders / scrapers: how do they work? (3)
Generic software for:
- site navigation
- product details
- monitoring
[Diagram: Robot / spider / crawler (navigation, agile, monitor actively) ↔ Internet (requests) ↔ Website (code, images, style, data, etc.); output: data]
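The three generic tasks on this slide (site navigation, product details, monitoring) could be sketched roughly as follows. This is an illustrative Python outline under stated assumptions, not the actual generic software: `FAKE_SITE` is a hypothetical in-memory stand-in for a real website, the `price` class name is invented, and `crawl` takes any `fetch` function so that a real HTTP fetcher could be swapped in.

```python
from html.parser import HTMLParser


class PageParser(HTMLParser):
    """Gathers link targets and 'price' texts from one page."""

    def __init__(self):
        super().__init__()
        self.links = []
        self.prices = []
        self._in_price = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and attrs.get("href"):
            self.links.append(attrs["href"])
        self._in_price = attrs.get("class") == "price"

    def handle_endtag(self, tag):
        self._in_price = False

    def handle_data(self, data):
        if self._in_price and data.strip():
            self.prices.append(data.strip())


def crawl(fetch, start, max_pages=50):
    """Breadth-first navigation from `start`; returns (data, warnings)."""
    queue, seen = [start], {start}
    data, warnings = {}, []
    visited = 0
    while queue and visited < max_pages:
        url = queue.pop(0)
        visited += 1
        html = fetch(url)
        if html is None:
            # Monitoring: a linked page has disappeared
            warnings.append(f"{url}: page not available")
            continue
        parser = PageParser()
        parser.feed(html)
        parser.close()
        if parser.prices:
            # Product details found on this page
            data[url] = parser.prices
        elif not parser.links:
            # Monitoring: nothing usable, the page layout may have changed
            warnings.append(f"{url}: no links and no prices found")
        for link in parser.links:
            # Navigation: queue every link not seen before
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return data, warnings


# Hypothetical site: page 2 uses a different layout, page 3 does not exist.
FAKE_SITE = {
    "/": '<a href="/houses/1">House 1</a> <a href="/houses/2">House 2</a> '
         '<a href="/houses/3">House 3</a>',
    "/houses/1": '<h1>House 1</h1><span class="price">250000</span>',
    "/houses/2": '<h1>House 2</h1><div class="cost">310000</div>',
}

data, warnings = crawl(FAKE_SITE.get, "/")
print(data)
print(warnings)
```

The warnings list is the part that corresponds to "monitor actively": because websites change without notice, a robot that silently returns less data is harder to spot than one that reports which pages it could no longer read.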
Airline tickets (1): Robot collection versus manual collection
Housing market (2): Dynamics of the ‘database behind’ become visible
Clothing (2): 2 sites, very volatile data
Challenges:
• from volatile data to stable statistics
• how to classify multiple less structured data sources
[Figure label: Seasonal pattern]
Robot-assisted data collection (1)
• Use case: few price observations on many sites
• Example: price of a cinema ticket
• “Robot tool” to automatically check if prices have changed
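The idea of a tool that checks many sites for a changed price can be sketched as below. This is a hedged Python illustration, not the "robot tool" from the talk: the site names, page texts, price pattern, and the functions `find_price` and `check_sites` are all hypothetical, and the page texts are given as literals instead of being downloaded.

```python
import re

# Assumed price format: digits, a dot or comma, then two decimals (e.g. 9,50)
PRICE_PATTERN = re.compile(r"(\d+[.,]\d{2})")


def find_price(page_text):
    """Return the first price-like number in the text, or None."""
    match = PRICE_PATTERN.search(page_text)
    if match is None:
        return None
    return float(match.group(1).replace(",", "."))


def check_sites(previous, pages):
    """Compare the price on each page with the previous observation."""
    report = {}
    for site, text in pages.items():
        current = find_price(text)
        if current is None:
            report[site] = "check manually: no price found"
        elif site not in previous:
            report[site] = f"new observation: {current:.2f}"
        elif current != previous[site]:
            report[site] = f"changed: {previous[site]:.2f} -> {current:.2f}"
        else:
            report[site] = "unchanged"
    return report


# Hypothetical previous observations and current page texts
previous = {"cinema-a": 9.50, "cinema-b": 10.00}
pages = {
    "cinema-a": "Ticket: EUR 9,50",
    "cinema-b": "Adults 11.00 euro",
    "cinema-c": "Closed for renovation",
}

report = check_sites(previous, pages)
for site, status in report.items():
    print(site, status)
```

In this division of labour the robot only flags what has changed or could not be read, and the price collector looks at the flagged sites by hand, which suits the case of few observations spread over many sites.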
Conclusion
• Using the internet as a data source, we can measure statistical phenomena in a completely different way
• It is powerful to combine fast internet data with reliable (but slower) administrative data
• We should redesign statistics with the possibilities of internet data in mind
Challenges:
• Legal framework
• The internet changes continuously: how to turn volatile data sources into reliable statistics?
• We need advanced statistical methods, processes and IT