Big Data vs. Official Statistics
Directors General of the National Statistical Institutes Meeting, 25~27 September 2013, The Hague, Netherlands
Yugyung Kang, Director, Statistical Information Portal Division, Statistics Korea
Contents
• Technology Assessment (TA) in Korea
• Big Data Use in Private Sector
  • Market Analysis
  • Suicide Warning System
• On-going Projects by KOSTAT
  • Pilot Project for Mining and Manufacture Survey
  • E-household Account System
  • Pilot Project for Price Statistics
• Future Challenges
Technology Assessment (1)
Conducted by MSIP of Korea in 2012, under Article 14 of the Framework Act on Science and Technology
• What is big data?
  • Data with 3Vs characteristics + Data Management Technology
  • * Gartner’s 3Vs: Volume, Variety and Velocity
[Figure: structured and unstructured data by volume (GB/TB, PB, EB, ZB) and speed (low: hours to weeks; high: minutes to seconds). Examples shown: customer data, sales data, stock data, finance data, messages, video, music, BBS, SNS, GPS, ...]
Technology Assessment (2)
• Expected Impact
Technology Assessment (3)
Policy Recommendations
• Localize core technologies related to big data through gov’t-led R&D
• Establish a legal and institutional basis for standardization of managing, sharing and trading big data
• Foster a pool of big data analysts and experts through interdisciplinary undergraduate and graduate programs
• Take a step-by-step approach by setting priorities in the sectors where benefits to the public will be visible
• Make strategies to protect privacy
Big Data Use in Private Sector
Case 1: Market Analysis
Which business would you like to open?
Big Data Use in Private Sector
Case 1: Market Analysis by Real Estate 411
[Diagram labels: Business Cycle, Real Estate, …, Sales Information, Consumer Type, Korean Statistical Information Service, Floating Population]
Big Data Use in Private Sector
Case 2: Suicide Warning System
Weather forecast. Why not a suicide forecast?
• Social factors
• Weather factors
• Werther effect
• Personal emotion
Source: OECD (2012), OECD Health Statistics
Big Data Use in Private Sector
Case 2: Suicide Warning System
• Training Set (2008-2009) & Test Set (2010)
  • Total number of suicide incidents
  • Economic and weather data
    • CPI, unemployment rate, KOSPI (Korea Composite Stock Price Index), daylight hours and temperature
  • 150 million posts from about 5 million blogs on NAVER (incl. SNS posts)
    • Var1 (number of posts including “suicide”)
    • Var2 (number of posts including “dysphoria”, “be tired”, “be painful”, or “be exhausted”)
• Model
  • Dependent variable: number of suicides in a given period (3 days)
  • Independent variables
    • CPI, unemployment rate, KOSPI, daylight hours, temperature
    • Two variables obtained from the posts (Var1, Var2)
    • Celebrity suicide (control variable)
    • Number of suicides in the previous period
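The model structure above can be sketched in a few lines of Python. The slide does not name the estimator, so ordinary least squares is an assumption here, as are the column names in `FEATURES`, the helper functions, and the coefficient values. The data are synthetic and noise-free, generated only so that the fit can be checked; they are not the NAVER, economic, or suicide data used in the study. The sketch shows how each 3-day period becomes one row (current-period variables plus the previous period's count) and how a chronological train/test split would be applied.

```python
import random

# Hypothetical column names for the independent variables listed on the slide.
FEATURES = ["cpi", "unemployment", "kospi", "daylight", "temperature",
            "var1", "var2", "celebrity"]

def design_matrix(periods):
    """One row per 3-day period: intercept, current variables, previous count."""
    X, y = [], []
    for prev, cur in zip(periods, periods[1:]):
        X.append([1.0] + [cur[name] for name in FEATURES] + [prev["suicides"]])
        y.append(cur["suicides"])
    return X, y

def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col:
                factor = M[r][col] / M[col][col]
                M[r] = [a - factor * c for a, c in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]

def fit_ols(X, y):
    """Least-squares coefficients from the normal equations (X'X) beta = X'y."""
    k = len(X[0])
    XtX = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    Xty = [sum(row[i] * t for row, t in zip(X, y)) for i in range(k)]
    return solve(XtX, Xty)

def predict(beta, x):
    """Linear prediction for one design-matrix row."""
    return sum(b * v for b, v in zip(beta, x))

def make_synthetic_periods(n, true_beta, seed=0):
    """Noise-free synthetic periods (NOT real data) that follow the model exactly."""
    rng = random.Random(seed)
    periods = [{**{name: 0.0 for name in FEATURES}, "suicides": 60.0}]
    for t in range(1, n):
        row = {name: rng.uniform(-1.0, 1.0) for name in FEATURES}  # standardized stand-ins
        row["celebrity"] = 1.0 if t % 40 == 0 else 0.0  # occasional event dummy
        x = [1.0] + [row[name] for name in FEATURES] + [periods[-1]["suicides"]]
        row["suicides"] = predict(true_beta, x)
        periods.append(row)
    return periods

if __name__ == "__main__":
    # Intercept, eight variable coefficients, lag coefficient (all made up).
    true_beta = [30.0, 1.0, 2.0, -1.5, -0.5, 0.3, 4.0, 2.5, 5.0, 0.5]
    periods = make_synthetic_periods(366, true_beta)
    # Chronological split, mirroring "train on 2008-2009, test on 2010".
    X_train, y_train = design_matrix(periods[:244])
    X_test, y_test = design_matrix(periods[243:])
    beta = fit_ols(X_train, y_train)
    worst = max(abs(predict(beta, x) - y) for x, y in zip(X_test, y_test))
    print("largest test-set error:", worst)
```

With real data the fit would not be exact, and out-of-sample error on the held-out year would be the quantity of interest. Because the previous period's count is a regressor, the split has to be chronological rather than random.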
What should NSOs do?
Scientifically collected data vs. huge amount of data
Sample Surveys
• Established theoretical basis
• Representativeness of target population
• Relatively slow
• Expensive data collection
Big Data (Challenge!)
• Quantity beats quality
• Lack of representativeness of target population
• More timely
• Data already there
KOSTAT tried…
Pilot Project
• December 2012~April 2013
• A pilot project on the use of big data in the process of editing existing national statistics
• Used media data to examine outliers when producing the Index of Industrial Production (IIP)
Seminars
• October 2012~March 2013
• Organized seminars once or twice a month, inviting outside big data experts
• Aimed to raise awareness of big data and its impact on producing official statistics
KOSTAT is doing…
1. E-Diary System (Household Account System)
• Currently about 48.5% of sample households have adopted the e-Diary system
• Respondents can import their expenditure information from online transactions with banks, credit card companies and major retail stores
Using big data for the convenience of respondents
KOSTAT is doing…
2. Pilot Project of Price Index
“Please select specific domains (or items) that can clearly show the difference between big data and existing statistics, e.g. TV or electronic products.” (Prof. Roberto Rigobon)
KOSTAT is currently preparing a pilot project on compiling a price index using big data for a specific manufactured product.
Future Challenges
• Can we ignore big data just because of its representativeness issue, in spite of strengths such as timeliness?
• Can KOSTAT prevent over 380 statistical agencies from producing official statistics with big data?
Maybe not!
• We will have to make use of big data in producing statistics at some point in the future, as was the case with the transition from survey data to administrative data.
• We need to identify the limitations of big data through pilot projects, and learn the techniques and know-how to refine big-data-based statistics into official statistics.