
InFuse: Data Feeds for the UK 2001 and 2011 Censuses and Beyond



Presentation Transcript


  1. InFuse: Data Feeds for the UK 2001 and 2011 Censuses and Beyond Justin Hayes Census Dissemination Unit (CDU) Mimas The University of Manchester

  2. Where are we going? • CDU background • Recent work on CAIRD Project • Current work on InFuse Project • Forthcoming work in collaboration with ONS • Future ideas

  3. Data Feed? • A data feed consolidates information relating to a dataset and integrates it by enforcing a structure, which it describes using open standards. This allows comprehensive and comparable information to be exposed and transferred online in ways that make it understandable, interoperable, flexible and, most importantly, usable. • Diagram labels: Flexible, Interoperable, Structure, Describe, Comparable, Consolidate, Understandable, Usable, Integrate, Open Standards, Comprehensive, Online, Expose, Transferable

  4. Dimensions, Codelists and Codes General Health
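The relationship between a dimension, its codelist and its codes can be sketched as a small data structure. This is a minimal illustration only: the class names, the identifiers (`CL_GENERAL_HEALTH`, `GH1` to `GH3`) and the category labels are assumptions made for the example, not the InFuse or SDMX schema.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Code:
    """One permitted value of a dimension (hypothetical structure)."""
    id: str
    label: str


@dataclass
class Codelist:
    """The set of codes that a dimension can take."""
    id: str
    codes: list[Code] = field(default_factory=list)

    def label_for(self, code_id: str) -> str:
        # Linear scan is enough for the short lists used here.
        for code in self.codes:
            if code.id == code_id:
                return code.label
        raise KeyError(code_id)


@dataclass
class Dimension:
    """A concept measured in the dataset, bound to one codelist."""
    name: str
    codelist: Codelist


# Illustrative identifiers and labels only.
general_health = Dimension(
    name="General Health",
    codelist=Codelist(
        id="CL_GENERAL_HEALTH",
        codes=[
            Code("GH1", "Good"),
            Code("GH2", "Fairly good"),
            Code("GH3", "Not good"),
        ],
    ),
)

print(general_health.codelist.label_for("GH2"))
```

Keeping the codelist separate from the dimension means that several dimensions, or several tables, could share one codelist rather than each defining its own categories.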

  5. CDU Background • Dissemination of aggregate outputs from recent UK censuses to UK academics • Small team funded by ESRC • Service, research and engagement roles • Two decades of pioneering work • Casweb • Retrieval and reprocessing of UK 1971 Census • GeoConvert

  6. Barriers to Effective Dissemination • Large and complex dataset • Lack of global structures • ‘Hand crafted’ tables as primary instrument • Inconsistent structures • ‘Age’ particularly problematic example • No comprehensive description • Scattered information • Poor connection of data and metadata • Approximately 300 tables with many inconsistencies • Metadata in multiple locations with varying access

  7. Age Bands • 99 age bandings • 76 unique to a single table
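Counts of this kind (distinct bandings, and bandings used by only one table) can be derived by indexing each banding by the tables that use it. The sketch below uses made-up table identifiers and age bands purely for illustration; it does not reproduce the census figures.

```python
from collections import defaultdict

# Hypothetical input: table id -> the age banding it uses,
# written as a tuple of band labels.
table_bandings = {
    "T1": ("0-15", "16-64", "65+"),
    "T2": ("0-15", "16-64", "65+"),
    "T3": ("0-4", "5-15", "16-24", "25+"),
    "T4": ("0-17", "18+"),
}


def banding_usage(tables):
    """Map each distinct banding to the set of tables that use it."""
    usage = defaultdict(set)
    for table_id, banding in tables.items():
        usage[banding].add(table_id)
    return usage


usage = banding_usage(table_bandings)
distinct_bandings = len(usage)
unique_to_one_table = sum(1 for tables in usage.values() if len(tables) == 1)

print(distinct_bandings, unique_to_one_table)
```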

  8. 223 Age Codes

  9. Standard Table 13 Framework

  10. Standard Table 13 Data

  11. Text String Cell Descriptions • S013:37 (AGE OF HRP 24 OR UNDER - Rented from council : ALL HOUSEHOLDS) • S013:38 (AGE OF HRP 24 OR UNDER - Rented from council : Economically Active - Total) • S013:39 (AGE OF HRP 24 OR UNDER - Rented from council : Economically Active - Employee) • S013:40 (AGE OF HRP 24 OR UNDER - Rented from council : Economically Active - Self-employed) • S013:41 (AGE OF HRP 24 OR UNDER - Rented from council : Economically Active - Unemployed) • S013:42 (AGE OF HRP 24 OR UNDER - Rented from council : Economically Active - Full-time students) • S013:43 (AGE OF HRP 24 OR UNDER - Rented from council : Economically Inactive - Total) • S013:44 (AGE OF HRP 24 OR UNDER - Rented from council : Economically Inactive - Retired) • S013:45 (AGE OF HRP 24 OR UNDER - Rented from council : Economically Inactive - Student) • S013:46 (AGE OF HRP 24 OR UNDER - Rented from council : Economically Inactive - Looking after home/family) • S013:47 (AGE OF HRP 24 OR UNDER - Rented from council : Economically Inactive - Permanently sick or disabled) • S013:48 (AGE OF HRP 24 OR UNDER - Rented from council : Economically Inactive - Other)
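Descriptions in this form carry the table, the cell number and the row and column categories in a single string, so they have to be parsed before any structure can be applied. The following is a minimal sketch of such a parser, assuming the `TABLE:CELL (row parts : column parts)` pattern seen in the examples holds throughout; the function name and output keys are illustrative and not taken from the CDU's own tooling.

```python
import re

# TABLE:CELL (row description : column description)
CELL_RE = re.compile(
    r"^(?P<table>[A-Z]+\d+):(?P<cell>\d+)\s*\((?P<body>.*)\)\s*$"
)


def parse_cell(description):
    """Split one cell description into table, cell number and category parts."""
    match = CELL_RE.match(description.strip())
    if match is None:
        raise ValueError(f"unrecognised cell description: {description!r}")
    # " : " separates the row side from the column side.
    row_text, _, col_text = match.group("body").partition(" : ")
    # " - " (with spaces) separates levels, so "Self-employed" stays whole.
    return {
        "table": match.group("table"),
        "cell": int(match.group("cell")),
        "row": [part.strip() for part in row_text.split(" - ")],
        "column": [part.strip() for part in col_text.split(" - ")],
    }


example = (
    "S013:40 (AGE OF HRP 24 OR UNDER - Rented from council : "
    "Economically Active - Self-employed )"
)
parsed = parse_cell(example)
print(parsed)
```

Splitting on the spaced separators rather than on bare hyphens or colons is the main design choice here, since labels such as "Self-employed" and "Full-time students" contain hyphens of their own.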

  12. Effects of Barriers • Incomplete and unconnected information • Poor exploration • Potential for misinterpretation and misuse • Not interoperable • Applications must provide specific metadata • Frustrating for users and service providers

  13. Challenges to Improve Services • Consolidate all related information • Extract and apply consistent structures • Describe to make understandable and transferable • Publish data via web service and API • Build our own user applications • Use open standards wherever possible • Take advantage of external development • Encourage ONS to do the same for 2011 • Find money to do all this!

  14. CAIRD Project • Additional funding from ESRC • One researcher for one year from June 2008 • Feasibility project • Dimensionalised sample of 40 tables • Conceptual structure based on SDMX • SOAP-based web service and API • CAIRD application • Codelist-based data selector • CSV and SDMX outputs

  15. CAIRD Geography Selector

  16. CAIRD Data Selector

  17. CAIRD SDMX output

  18. InFuse • Mimas strategic funding to take results of CAIRD Project into service • One researcher from August 2009 to present • Initial application launch September 2010 • 2001 Census for England and Wales • Tangible outputs just commencing

  19. InFuse User Requirements • Initial phase of work • Workshop for expert academic census users • Questionnaire • Functional and requirements specifications • IASSIST 2010

  20. Structuring the 2001 Census • Restructuring and parsing of output tables • Information from Census Definitions Volume • Development of master set of codelists • Creation of geography codelists • De-universification • Encoding of hierarchies • Incorporation of core set of metadata • Multiple value counts problem
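One way to picture the encoding of hierarchies is a child-to-parent mapping over codes, which then allows totals to be derived from leaf values. The sketch below is an assumption-laden illustration: the code identifiers loosely follow the economic activity categories in Standard Table 13, and the counts are invented.

```python
# Hypothetical hierarchy: child code -> parent code (None marks the root).
PARENT = {
    "ALL": None,
    "ACTIVE": "ALL",
    "INACTIVE": "ALL",
    "EMPLOYEE": "ACTIVE",
    "SELF_EMPLOYED": "ACTIVE",
    "UNEMPLOYED": "ACTIVE",
    "RETIRED": "INACTIVE",
    "STUDENT": "INACTIVE",
}


def ancestors(code):
    """Return the chain of parents from a code up to the root."""
    chain = []
    parent = PARENT[code]
    while parent is not None:
        chain.append(parent)
        parent = PARENT[parent]
    return chain


def roll_up(leaf_counts):
    """Add each leaf count to all of its ancestors to obtain totals."""
    totals = dict(leaf_counts)
    for code, count in leaf_counts.items():
        for ancestor in ancestors(code):
            totals[ancestor] = totals.get(ancestor, 0) + count
    return totals


# Invented counts for illustration.
leaf_counts = {
    "EMPLOYEE": 120,
    "SELF_EMPLOYED": 30,
    "UNEMPLOYED": 10,
    "RETIRED": 25,
    "STUDENT": 15,
}
totals = roll_up(leaf_counts)
print(totals["ACTIVE"], totals["INACTIVE"], totals["ALL"])
```

With the hierarchy stored explicitly, "Total" cells need not be kept as separate hand-labelled entries; they can be checked against, or regenerated from, their component codes.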

  21. InFuse Features • Theme based exploration • Handling sparsity through guided exploration • Text search • Thesaurus and gazetteer • Move to RESTful web service with private API • URI schema for RDF development • Encoding of, and operation on hierarchies • Modular, open source design for re-use • Integration of digital boundary data • Initial text output

  22. Initial InFuse Outputs • InFuse URI schema • http://130.88.120.139/InFuseWS/InFuseWS.svc/data/contenttype/datasets?format=html • InFuse text search with thesaurus • Search targets: codelists, codes, glossary, areas, areatypes • http://130.88.120.139/InFuseWS/InFuseWS.svc/data/contenttype/datasets/dsid/1/glossary/search?keywords=race
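The two example URIs suggest a path-based scheme with query parameters for format and search keywords. The sketch below only builds such URI strings and makes no network request. Treating the dataset id and the keywords as free parameters is an assumption generalised from the two examples, the function names are hypothetical, and the service at this address may no longer be available.

```python
from urllib.parse import urlencode

# Base path taken from the example URIs on the slide.
BASE = "http://130.88.120.139/InFuseWS/InFuseWS.svc/data/contenttype/datasets"


def datasets_url(fmt="html"):
    """URI listing the available datasets in the requested format."""
    return f"{BASE}?{urlencode({'format': fmt})}"


def glossary_search_url(dataset_id, keywords):
    """URI for a glossary text search within one dataset (assumed pattern)."""
    query = urlencode({"keywords": keywords})
    return f"{BASE}/dsid/{dataset_id}/glossary/search?{query}"


print(datasets_url())
print(glossary_search_url(1, "race"))
```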

  23. CDU/ESRC/ONS Collaboration for 2011 • Data feed influence on ONS 2001 plans • Data Feed Network • Census Web Services Working Group (CWSWG) • ONS commitment to disseminate via API • Collaborative funding • Two researchers for one year! • Test datasets for ONS API • Work on 2001 to 2011 comparability • Application development for testing of ONS API

  24. In the InFuse Pipeline • More datasets • More metadata • Work on definitional and geographical comparability • Further application development • SDMX and RDF interaction • Release of a public API • GeoConvert module • Linkage of unit and aggregate data

  25. Summary • It’s possible to retrospectively structure and disseminate complex datasets via data feeds, but much easier to do at source. • Potential for improved and expanded secondary usability of datasets will act as a stimulus for the development and use of open standards methods and structures in dataset creation.

  26. Contact • justin.hayes@manchester.ac.uk • census@mimas.ac.uk • 0044 161 275 6109
