Two Paradigms for Official Statistics Production

Boris Lorenc, Jakob Engdahl and Klas Blomqvist, Statistics Sweden

Presentation Transcript


  1. Two Paradigms for Official Statistics Production Boris Lorenc, Jakob Engdahl and Klas Blomqvist Statistics Sweden

  2. Preliminaries
     • The talk concerns data and knowledge about the external world – not data and knowledge about producing statistics (though it may have consequences for the latter)
     • Inspired by the various discussions of ongoing developments and initiatives within (official) statistics
     • May have some relevance for editing
     • Naturally, the views presented herein are those of the authors and do not necessarily reflect the policies of Statistics Sweden

  3. Preliminaries (cont’d)
     • Transition from (many) Stovepipes to (few) Integrated System(s)
     • Among the intended goals:
        • better integration of administrative data and survey data
        • better/faster response to new or changing user needs
     • What should an integrated system look like in order to satisfy these requirements?
        • answer sought in the field of knowledge systems/cognitive systems

  4. Agenda
     • Preliminaries
     • On some distinctions and results regarding knowledge/cognitive systems
     • Consequences for representing data in Integrated systems for statistics production
     • Further considerations for statistics methodology, including some thoughts regarding editing

  5. Knowledge/Cognitive Systems
     • Computational
        • symbolic
           • first-order predicate logic
           • other formal logic
           • etc.
        • subsymbolic
           • artificial neural networks (ANNs)
           • etc.
     • Other (noncomputational)
        • embodied cognition
        • situated cognition
        • socially distributed cognition
        • etc.
     Good for restricted domains with clear rules (e.g. chess), less good for open-world problems

  6. Database developments
     • Relational Model
        • RDBMS (Relational Database Management System)
        • implements first-order predicate logic
        • database schema: theory in predicate calculus
     • NoSQL
        • schema-less (theory-less)
        • examples
           • Google's Bigtable
           • solutions underlying some functions on Amazon, Twitter, and Facebook
     • Perhaps related: Semantic Web
        • how to structure documents into a “web of data”
        • “a web of data that can be processed directly and indirectly by machines”
        • uses Resource Description Framework (rather than RDBMS)
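The contrast between a schema-bound store and a schema-less one can be made concrete with a minimal sketch. It is only an illustration, not something taken from the talk: the `enterprise` table, the `employees` and `turnover` attributes, and the helper functions are all hypothetical, and the triple store is a plain Python set in the spirit of RDF subject-predicate-object statements, not an actual RDF or NoSQL product.

```python
import sqlite3

# Style A: a relational table with a fixed schema (hypothetical columns).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE enterprise (id INTEGER PRIMARY KEY, employees INTEGER NOT NULL)"
)
conn.execute("INSERT INTO enterprise VALUES (1, 25)")


def relational_accepts_turnover(connection):
    """Return True if a fact outside the declared schema can be stored."""
    try:
        connection.execute("INSERT INTO enterprise (id, turnover) VALUES (2, 1000)")
        return True
    except sqlite3.OperationalError:
        # The schema (the "theory") has no 'turnover' column, so the insert fails.
        return False


# Style B: schema-less subject-predicate-object triples (assumed toy store).
triples = set()


def add_fact(subject, predicate, obj):
    """Store any statement; no schema has to be declared or altered first."""
    triples.add((subject, predicate, obj))


def facts_about(subject):
    """Return all (predicate, object) pairs recorded for a subject."""
    return {(p, o) for (s, p, o) in triples if s == subject}


add_fact("enterprise:1", "employees", 25)
add_fact("enterprise:1", "turnover", 1000)  # new attribute, no schema change
```

In the relational case a new kind of fact requires changing the schema first; in the triple case it is simply recorded, at the cost of having no schema to validate it against.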

  7. Consequences
     • Paradigm I: Stovepipe + RDBMS
        • ‘manual’ management of a fairly restricted domain
        • single-purpose use
        • likely requires expert assistance to users in search and requirements specification
     • Paradigm II: Integrated system + NoSQL
        • automatic building of world knowledge pertaining to the domain
        • multi-purpose use
        • likely empowers users to explore the available data themselves and consider the merits of requiring new data

  8. Sampling theory considerations
     • In the context of Paradigm II:
        • use of weights
           • what should they then reflect:
              • inclusion probabilities (if known)?
              • nonresponse information (including an assumed model)?
              • auxiliary information pertaining to specific variables to be estimated?
        • use of models
           • memorylessness vs. Bayesian statistics
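To make the first two candidate meanings of a weight concrete, here is a minimal sketch, assuming known inclusion probabilities and a single response class in which response is treated as homogeneous (itself a model assumption). The function names and the toy numbers are illustrative only and do not come from the talk.

```python
def design_weights(inclusion_probs):
    """Design weight of each sampled unit: the inverse inclusion probability."""
    return [1.0 / p for p in inclusion_probs]


def ht_total(values, inclusion_probs):
    """Horvitz-Thompson estimate of a population total."""
    return sum(y / p for y, p in zip(values, inclusion_probs))


def nonresponse_adjusted_weights(weights, responded):
    """Rescale respondents' weights so they carry the full-sample weight total.

    Assumes one response class with a common response propensity.
    Nonrespondents receive weight 0.
    """
    full_total = sum(weights)
    resp_total = sum(w for w, r in zip(weights, responded) if r)
    if resp_total == 0:
        raise ValueError("no respondents in the class")
    factor = full_total / resp_total
    return [w * factor if r else 0.0 for w, r in zip(weights, responded)]


# Toy sample: four units with known inclusion probabilities.
probs = [0.5, 0.5, 0.25, 0.25]
values = [10, 20, 30, 40]
weights = design_weights(probs)

# Suppose the last unit did not respond.
responded = [True, True, True, False]
adjusted = nonresponse_adjusted_weights(weights, responded)
```

The same record thus carries a different weight depending on what the weight is meant to reflect, which is the question the slide raises for a multi-purpose store.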

  9. Editing
     • Editing for a purpose vs. editing “without a purpose”
        • adherence to general specifications (‘concept validity’)
        • self-learning (unsupervised) tools from computer science/ANN
        • model congruence (especially building automatic models using methods from the KDD (Knowledge Discovery and Data Mining) field)
        • more?
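The difference between the two kinds of editing can be sketched as follows. The edit rules and field names are hypothetical, and the general-purpose check is a simple robust outlier flag (modified z-score based on the median absolute deviation), used here as a stand-in for the unsupervised tools the slide mentions; it is not an ANN and not a method proposed in the talk.

```python
from statistics import median


def rule_edit(record):
    """Purpose-specific edit: return the names of the (hypothetical) rules a record fails."""
    failures = []
    if record["employees"] < 0:
        failures.append("employees_nonnegative")
    if record["employees"] == 0 and record["turnover"] > 0:
        failures.append("turnover_requires_employees")
    return failures


def mad_flags(values, threshold=3.5):
    """General-purpose edit: flag values far from the bulk of the data.

    Uses the modified z-score 0.6745 * |x - median| / MAD; no subject-matter
    rule is involved, only the distribution of the values themselves.
    """
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        # No spread to measure against; flag nothing in this sketch.
        return [False] * len(values)
    return [0.6745 * abs(v - med) / mad > threshold for v in values]


# Rule-based check on one record.
failed = rule_edit({"employees": 0, "turnover": 500})

# Data-driven check on a column of reported values.
flags = mad_flags([10, 12, 11, 13, 12, 95])
```

The rule-based check needs someone to state in advance what a valid record is for a given purpose; the data-driven check needs no such statement but can only say that a value is unusual, not that it is wrong.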

  10. Conclusions
     • The distinction is likely not as clear-cut as presented here; however, a trend is discernible:
        • transition from “manual” to automatic processing
        • potentially increased need to use models
     • In building representations of “world knowledge”, in addition to RDBMS, pay attention to developments in NoSQL, Big Data, and similar
     • Perhaps strengthen work on
        • general-purpose data editing
        • automated data editing
        • model use
        • ... (as already advanced in several contributions to the workshop)

  11. Thank you boris.lorenc@scb.se