21st Century Statistics

The Case against Data Editing. Jean-Pierre Kent. Ljubljana, 9-11 May 2011.


Presentation Transcript


  1. 21st Century Statistics: The Case against Data Editing. Jean-Pierre Kent, Ljubljana, 9-11 May 2011

  2. Data editing? An art of the past …

  3. Why?
     • Competition
       • Loss of monopoly
     • Quantity
       • Exponential growth
     • Quality
       • Changing criteria

  4. Competition: Google
     • GPI: Google Price Index
       • Automatic, real time, internet based, free
     • Protocol buffers
       • An alternative to XML
     • DSPL
       • An alternative to SDMX
     • Google Chart tools
       • Graphic and interactive presentation of data
     • Google Public Data
       • An alternative to NSI web sites

  5. Competition: Google “Basically, our goal is to organize the world’s information and to make it universally accessible and useful.” (Google)

  6. Another alternative price index: Numbeo.com
     Numbeo
     • provides website readers with prices for free
     • allows a person to estimate their own expenses
     • uses the wisdom of the crowd to get data that is as reliable as possible
     • provides a system for systematic research into cost of living and property markets
     • provides a system for other systematic economic research on a huge dataset with worldwide data
     In real time! And for free!

  7. Competition: conclusion Because our competitors offer free, up-to-date statistics, we need to avoid time-consuming and labour-intensive activities as much as possible. For example: data editing.

  8. Quantity: Size of the Web
     • In 2009: 500 petabytes
     • In 2011: 1800 exabytes
     • Expected in 2020: 50 zettabytes
     Source: http://www.lesk.com/mlesk/ksg97/ksg.html
     In 1997 … and today

  9. Quantity
     • 1800 exabytes
     • If 99.9% is video, photo, audio, text and nonsense, “only” 0.1% is interesting.
     • This is 1800 petabytes of relevant data.
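
The arithmetic on this slide can be checked in a few lines. This is a minimal sketch, assuming decimal (SI) units where 1 exabyte = 1000 petabytes; the variable names are illustrative, and the 0.1% share is the slide's own hypothetical rather than a measured figure.

```python
# Figures taken from the slide; SI (decimal) units are an assumption.
PB_PER_EB = 1000  # 1 exabyte = 1000 petabytes

web_size_eb = 1800          # size of the web in 2011 according to the slide, in exabytes
interesting_share = 0.001   # the slide's hypothetical 0.1% of relevant data

# Convert to petabytes, then keep only the "interesting" fraction.
relevant_pb = web_size_eb * PB_PER_EB * interesting_share
print(f"Relevant data: {relevant_pb:.0f} petabytes")
```

Under these assumptions the result matches the slide: 0.1% of 1800 exabytes is 1.8 exabytes, that is, 1800 petabytes.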

  10. Quantity: 1800 petabytes
      • Can we afford to ignore this mass?
      • Primary and register data: a few terabytes
      • Is this representative of available data?
      • Can we afford to edit such an amount of data?
      • Tip: it is growing exponentially!
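
To put a rough number on "growing exponentially", the sketch below derives the annual growth rate implied by the figures on slide 8 (1800 exabytes in 2011, 50 zettabytes expected in 2020). It assumes steady compound growth and SI units (1 zettabyte = 1000 exabytes); the function name `implied_annual_growth` is hypothetical and not part of the presentation.

```python
def implied_annual_growth(start_size, end_size, years):
    """Constant yearly growth rate that turns start_size into end_size."""
    return (end_size / start_size) ** (1 / years) - 1

size_2011_eb = 1800        # exabytes, from slide 8
size_2020_eb = 50 * 1000   # 50 zettabytes expressed in exabytes (assumes 1 ZB = 1000 EB)

rate = implied_annual_growth(size_2011_eb, size_2020_eb, 2020 - 2011)
print(f"Implied growth: {rate:.0%} per year")
```

Under these assumptions the implied rate is roughly 45% per year, which means the volume doubles in a little under two years.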

  11. Quality
      • Why do we edit data?
        • Quality
      • But:
        • What are the quality criteria?
        • Who specifies them?
        • How do we approach the quality of very large data sets?

  12. The Case against Data Editing

  13. Quality
      • This has already happened to:
        • Furniture (1950’s)
        • Watches (1970’s)
        • Publishing (1990’s)
        • Many others (1800-2000)
      • This will also happen to:
        • Statistics (2010’s)
      • From Best …
        • Quality specified by producer
        • High cost
        • Long time to market
      • … to Good Enough
        • Compromise between quality and cost
        • User in control of quality and cost

  14. Quality: Impact of data editing
      How does data editing affect quality criteria?
      • Cost: negative
      • Time to market: negative
      • Authenticity: negative
      • Variance reliability: negative

  15. Quality of Internet data
      • These data are produced by relevant processes
      • These processes depend on the quality of these data
      • Therefore these data are relevant, representative and good enough…
      • … and don’t require editing.

  16. Quality of very large data sets …
      … does not depend on quality of individual records:
      • Quality of photo ≠ quality of individual pixels
      • Quality of music ≠ quality of individual sound bits
      • Knowledge of crowd behaviour ≠ knowledge of individual behaviour
      • Why should this be different for statistical data?
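
The claim that aggregate quality does not hinge on individual records can be illustrated with a small simulation. This is a minimal sketch, assuming each record carries an independent, zero-mean error; that assumption is not stated on the slide, and systematic bias would not average out this way. All names and numbers are made up for illustration.

```python
import random
import statistics

random.seed(42)  # fixed seed so the run is reproducible

TRUE_VALUE = 100.0   # hypothetical true quantity being measured
NOISE_SD = 20.0      # standard deviation of the error in a single record

def noisy_records(n):
    """Simulate n unedited records with independent, zero-mean errors."""
    return [random.gauss(TRUE_VALUE, NOISE_SD) for _ in range(n)]

# Compare the error of the aggregate (the mean) for growing data set sizes.
for n in (10, 1_000, 100_000):
    records = noisy_records(n)
    estimate = statistics.fmean(records)
    print(f"n={n:>7}: mean={estimate:8.3f}, error={abs(estimate - TRUE_VALUE):.3f}")
```

Under these assumptions the error of the mean typically shrinks in proportion to 1/sqrt(n), so a very large set of individually noisy records can still yield a precise aggregate.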

  17. Think about it! “The idea that you can take incremental steps in the media business is over. You have to take some big steps and you have to take some risks.” (Dave Hunke, USA Today)
