Big Data in the National Accounts: Experience in the United States
Brent Moulton
Advisory Expert Group on National Accounts
Washington, DC, 9 September 2014
What are big data?
• Wikipedia: “Any collection of data sets so large and complex that it becomes difficult to process using… traditional data processing applications.”
• IBM: “Every day we create 2.5 quintillion bytes of data… This data comes from everywhere… This is big data.”
• Forbes: “12 big data definitions: what’s yours?”
  • #11: “The belief that the more data you have, the more insights and answers will arise automatically from the pool”
  • #12: “A new attitude… that combining data from multiple sources could lead to better decisions.”
Big data and official statistics
• Statistical agencies as producers of big data
  • Consistency in format and presentation
  • Catalogued in a common, machine-readable format
  • Accessible in bulk
  • Desirable to make government data available on a single platform
• Big data as source data for national accounts
  • Administrative data, especially micro-data
  • Data from private sources
  • Web scraping
Concerns about using big data
• Do the concepts match those needed for national accounts?
• How representative are the data?
  • Selection biases
• Is it possible to fill the gaps in coverage?
• Do the data provide consistent time series and classifications?
• How timely are the data?
• How cost-effective are they?
Defined-benefit pension funds
• For the SNA’s new treatment of defined-benefit pensions, BEA found it useful to work with administrative micro-data filed by pension funds
  • “Form 5500” data from the Pension Benefit Guaranty Corporation
  • About 45,000 records per year, covering 98% of private pension funds
  • BEA had to edit the data to remove errors and anomalies
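The slide does not describe BEA’s actual edit rules. As a minimal sketch of this kind of micro-data screening, assuming hypothetical record fields (`plan_id`, `assets_per_participant`) and a simple two-rule edit (reject missing or negative values, then flag values far from the median using the median absolute deviation), the step might look like this:

```python
from statistics import median


def edit_records(records, field, k=5.0):
    """Split micro-data records into (kept, flagged) lists.

    Hypothetical edit rules, for illustration only:
      1. a missing or negative value in `field` is treated as a data error;
      2. a value more than k median absolute deviations (MAD) from the
         median of the remaining values is treated as an anomaly.
    """
    kept, flagged, valid = [], [], []
    for rec in records:
        value = rec.get(field)
        if value is None or value < 0:
            flagged.append(rec)  # rule 1: missing or impossible value
        else:
            valid.append(rec)

    if not valid:
        return kept, flagged

    values = [rec[field] for rec in valid]
    center = median(values)
    mad = median(abs(v - center) for v in values)
    for rec in valid:
        # Rule 2 is skipped when MAD is zero (e.g. most values identical).
        if mad > 0 and abs(rec[field] - center) > k * mad:
            flagged.append(rec)  # anomaly relative to other filers
        else:
            kept.append(rec)
    return kept, flagged


# Hypothetical records: plan 5 is an extreme value, plans 6 and 7 are errors.
records = [
    {"plan_id": 1, "assets_per_participant": 100.0},
    {"plan_id": 2, "assets_per_participant": 110.0},
    {"plan_id": 3, "assets_per_participant": 95.0},
    {"plan_id": 4, "assets_per_participant": 105.0},
    {"plan_id": 5, "assets_per_participant": 10000.0},
    {"plan_id": 6, "assets_per_participant": -5.0},
    {"plan_id": 7, "assets_per_participant": None},
]
kept, flagged = edit_records(records, "assets_per_participant")
print([r["plan_id"] for r in kept])  # [1, 2, 3, 4]
```

The MAD is used instead of a standard deviation so that a few extreme filings cannot inflate the threshold that is meant to catch them. In this sketch, flagged records are only set aside for review, not corrected.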
Private source data for early estimates
• For the “advance” GDP estimate (released about 30 days after the end of the quarter), official monthly or quarterly indicators are not always available
• Examples of private source data used by BEA:
  • Ward’s/J.D. Power/Polk (auto sales, prices, and registrations)
  • American Petroleum Institute (oil drilling)
  • Air Transport Association of America (airlines)
  • Variety magazine (motion picture admissions)
  • Smith Travel Research (hotels and motels)
  • Investment Company Institute (mutual fund sales)
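The slide lists sources but not how they enter the estimate. One common national accounting technique, assumed here and not taken from the presentation, is to extrapolate a month that the official series has not yet reported by applying the private indicator’s month-to-month growth. A minimal sketch, with a hypothetical function name and made-up values:

```python
def fill_missing_months(official, indicator):
    """Fill missing values (None) in an official monthly series.

    Each missing month is the previous month's value moved by the
    indicator's growth: x[t] = x[t-1] * i[t] / i[t-1].
    """
    if len(official) != len(indicator):
        raise ValueError("official and indicator series must have the same length")
    if official[0] is None:
        raise ValueError("the first month must be observed")
    filled = list(official)
    for t in range(1, len(filled)):
        if filled[t] is None:
            if indicator[t - 1] <= 0:
                raise ValueError("indicator values must be positive")
            filled[t] = filled[t - 1] * indicator[t] / indicator[t - 1]
    return filled


# Hypothetical quarter: official data cover two months; the indicator covers all three.
quarter = fill_missing_months([100.0, 102.0, None], [50.0, 51.0, 52.0])
print(quarter)  # [100.0, 102.0, 104.0]
```

Only the indicator’s growth rate is used, not its level, so the private series need not match the official concept exactly; it only needs to move with it.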
Health care satellite account
• The Schultze Commission (At What Price?, 2002) recommended that health care price indexes be based on the cost of treating a specific diagnosis
• BEA is preparing a health care satellite account (http://www.bea.gov/national/health_care_satellite_account.htm)
  • One approach uses insurance claims data for several million insured individuals
  • Claims are grouped into disease episodes
  • This allows comparison of changes in the cost of treating particular diseases
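The grouping and comparison steps can be sketched as follows. This is a simplified stand-in, not the episode grouper BEA used: it assumes hypothetical claim tuples `(patient_id, diagnosis, service_date, cost)`, closes an episode after a hypothetical 60-day gap between claims, and compares average episode cost by diagnosis across two years.

```python
from collections import defaultdict
from datetime import date


def build_episodes(claims, max_gap_days=60):
    """Group claims into episodes of care.

    Claims for the same patient and diagnosis belong to one episode unless
    more than max_gap_days separate consecutive claims (hypothetical rule).
    Returns a list of (diagnosis, start_date, total_cost).
    """
    by_key = defaultdict(list)
    for patient, dx, day, cost in claims:
        by_key[(patient, dx)].append((day, cost))

    episodes = []
    for (_patient, dx), items in by_key.items():
        items.sort()  # chronological order within patient and diagnosis
        start = last = items[0][0]
        total = 0.0
        for day, cost in items:
            if (day - last).days > max_gap_days:
                episodes.append((dx, start, total))  # close the previous episode
                start, total = day, 0.0
            total += cost
            last = day
        episodes.append((dx, start, total))
    return episodes


def cost_change_by_diagnosis(episodes, base_year, current_year):
    """Ratio of average episode cost in current_year to base_year, by diagnosis."""
    sums = defaultdict(lambda: [0.0, 0])  # (diagnosis, year) -> [total cost, count]
    for dx, start, total in episodes:
        acc = sums[(dx, start.year)]
        acc[0] += total
        acc[1] += 1
    ratios = {}
    for dx in {dx for dx, _, _ in episodes}:
        base = sums.get((dx, base_year))
        curr = sums.get((dx, current_year))
        if base and curr:
            ratios[dx] = (curr[0] / curr[1]) / (base[0] / base[1])
    return ratios


claims = [
    ("p1", "diabetes", date(2010, 1, 5), 100.0),
    ("p1", "diabetes", date(2010, 2, 1), 50.0),   # 27 days later: same episode
    ("p1", "diabetes", date(2010, 9, 1), 80.0),   # more than 60 days later: new episode
    ("p2", "diabetes", date(2011, 3, 1), 120.0),
    ("p2", "diabetes", date(2011, 3, 20), 60.0),
]
episodes = build_episodes(claims)
# Average 2011 episode cost (180) over average 2010 cost ((150 + 80) / 2 = 115).
print(cost_change_by_diagnosis(episodes, 2010, 2011))
```

Pricing the episode rather than individual services means a shift from one treatment to a cheaper one for the same diagnosis shows up as a lower cost, which is the point of the diagnosis-based approach.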
Local area tracking system
• Used by BEA’s regional accounts staff as an independent source of information on regional economies
• Used to vet official statistics before publication
• Types of data:
  • Employment: largest employers, principal industries, recent layoffs
  • Natural events affecting the economy
  • Local real estate and financial trends
• Automated using web scraping methods:
  • Identifying keywords for searches
  • Archiving relevant articles
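The slide does not describe how the system is built. As a minimal sketch of only the keyword-search and archiving step, assuming article text has already been downloaded and using hypothetical keyword lists and example URLs, each article can be tagged with the slide’s data categories and kept only if at least one category matches:

```python
import re

# Hypothetical keyword lists for the categories named on the slide.
KEYWORDS = {
    "employment": ["layoff", "layoffs", "plant closing", "hiring", "new jobs"],
    "natural_events": ["flood", "hurricane", "drought", "wildfire", "tornado"],
    "real_estate_finance": ["foreclosure", "home prices", "bank failure"],
}


def tag_article(text, keywords=KEYWORDS):
    """Return the sorted list of categories whose keywords appear in text."""
    lowered = text.lower()
    found = set()
    for category, terms in keywords.items():
        for term in terms:
            # Word boundaries stop "layoff" from matching inside other words.
            if re.search(r"\b" + re.escape(term) + r"\b", lowered):
                found.add(category)
                break
    return sorted(found)


def archive_relevant(articles, keywords=KEYWORDS):
    """Keep (url, categories) for articles that match at least one category."""
    archive = []
    for url, text in articles:
        categories = tag_article(text, keywords)
        if categories:
            archive.append((url, categories))
    return archive


articles = [
    ("https://example.com/a", "Factory announces layoffs after flood damage"),
    ("https://example.com/b", "Local team wins championship"),
]
print(archive_relevant(articles))
# [('https://example.com/a', ['employment', 'natural_events'])]
```

Storing the matched categories with each URL lets staff pull, for example, all natural-event articles for a region when vetting an unusual quarterly estimate.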
BEA research on depreciation
• Identifying depreciation in the presence of obsolescence is a long-standing issue
• BEA research on motor vehicle depreciation proposes to address this problem using data on “build dates,” which can differ from model years
• Data scraping: VIN-level data from decodethis.com combined with auction data from NADA and data from other auto websites
• The goal is improved estimates of depreciation
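As an illustration of how used-vehicle prices translate into a depreciation rate, here is a minimal sketch assuming a constant geometric rate, price = new_price × (1 − δ)^age, with age measured from a hypothetical build date rather than the model year. It does not attempt to separate depreciation from obsolescence, which is the harder problem the BEA research addresses; the function names and data are hypothetical.

```python
import math
from datetime import date


def age_in_years(build_date, sale_date):
    """Vehicle age measured from its build date rather than its model year."""
    return (sale_date - build_date).days / 365.25


def geometric_depreciation_rate(observations):
    """Estimate a constant geometric depreciation rate delta.

    Each observation is (build_date, sale_date, price, new_price).
    Under price = new_price * (1 - delta) ** age,
    log(price / new_price) = age * log(1 - delta), so the slope is fit by
    least squares through the origin and delta = 1 - exp(slope).
    """
    num = den = 0.0
    for build, sold, price, new_price in observations:
        age = age_in_years(build, sold)
        y = math.log(price / new_price)
        num += age * y
        den += age * age
    if den == 0:
        raise ValueError("need at least one observation with positive age")
    return 1.0 - math.exp(num / den)


# Hypothetical transactions generated with a 15% annual rate, so the
# estimate should recover delta = 0.15.
pairs = [
    (date(2008, 3, 1), date(2010, 3, 1)),
    (date(2007, 10, 15), date(2011, 6, 30)),
    (date(2009, 1, 1), date(2012, 1, 1)),
]
obs = [(b, s, 20000.0 * 0.85 ** age_in_years(b, s), 20000.0) for b, s in pairs]
print(round(geometric_depreciation_rate(obs), 6))  # 0.15
```

Two vehicles of the same model year can have build dates months apart, so measuring age from the build date gives variation in age that the model year alone does not provide.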
Conclusions
• Big data will become increasingly important
• Priorities: improving data quality, filling gaps, and keeping up with a changing economy
• Big data are especially useful for research projects
• Big data may allow more timely or higher-frequency estimates
• Attention must continue to be paid to traditional data quality issues