1. Towards an Enhanced UK Spatial Interaction Data Service Adam Dennett, Oliver Duke-Williams and John Stillwell
School of Geography, University of Leeds
Presentation for British Society for Population Studies, University of St Andrews, 11-13 September 2007
2. Outline of presentation Introduction: relevant background on interaction data and CIDER and WICID
Audit of Interaction Data Sources: a brief overview of the variety of interaction data sources available in the UK
What were the recommendations of the audit?
How do we propose to take things forward to create an enhanced UK spatial interaction data service?
The new INTERACTION system: overview of the issues and challenges involved
The new data: an overview of the individual characteristics of each of the new proposed datasets
3. Introduction – CIDER CIDER: the Centre for Interaction Data Estimation and Research
Based now, principally, at the University of Leeds though software runs at Manchester
Data Support Unit: part of the ESRC-funded UK Census Programme
4. Explain that interaction data are to do with FLOWS of migrants and commuters
5. Introduction - CIDER
6. Introduction – CIDER Data Sets and Geographies 2001 Census: Special Workplace Statistics (SWS) (Levels 1, 2 & 3)
2001 Census: Special Travel Statistics (STS) (Scotland Levels 1,2 & 3 and Level 2 Scottish postal sectors)
2001 Census: Special Migration Statistics (SMS) (Levels 1,2 & 3)
Also comparable datasets from 1991 and 1981
In addition to the standard District, Ward and OA geographies, different aggregations of these basic units and various bespoke geographies are available for different data years
Other geographies include 100 Zones (FHSAs), 1991 counties and countries, LLSOAs, MLSOAs and foreign origins
7. Introduction - WICID Can select information either by geography or data first.
Output is available in a variety of formats, including CSV and HTML
8. Introduction – CIDER’s Ongoing Objectives CIDER’s objectives of relevance to this presentation:
To gather/estimate further UK census-based data sets and include them in the system
To expand the WICID system to incorporate a range of UK interaction data sets from outside of the census
To undertake research based on the current and future interaction data sets held within the software system
9. Interaction Datasets in the UK: An Audit Purpose of the Audit:
Before adding new datasets to WICID, we needed to know what was out there!
To identify and evaluate sources of interaction data in the UK that might complement the current census datasets held in WICID
To make recommendations relating to the inclusion of the most useful datasets in a new, expanded version of WICID called INTERACTION
Whilst detailed and comprehensive, UK census datasets have the obvious limitation of being decennial; other datasets, whilst perhaps lacking the data coverage of the census, are collected more regularly.
Datasets collated on a more frequent basis provide an opportunity for more complete temporal coverage.
Whilst other datasets exist for studying internal migration, researchers cannot access them easily. WICID allows a flexible, query-building approach that gives easy access to the information people want through selection of origins, destinations and a range of disaggregated variables (age, sex etc.). Inclusion of additional datasets in WICID will add real value to the service.
10. Interaction Datasets in the UK: An Audit
11. Census data sources of interaction data
12. Major administrative sources of interaction data
13. Important surveys containing interaction data
14. Recommendations coming out of the Audit… Additional data should be included in the new system from the following four sources:
2001 Census: the large and more complex matrices of migration and commuting flows commissioned from ONS that have national coverage at district and sub-district spatial scales
NHSCR: annual flows, from 1975 to 1998, of NHSCR patient re-registration movements between 100 FHSA-based zones, disaggregated by age and sex; and
annual flows, from 1998/99 onwards, of NHS patient movements between HAs, disaggregated by age and sex
Generally, online query and extraction systems are already in place for census data, so CIDER does not wish to replicate these existing census services. Some of the large commissioned tables, however, have such extensive data and spatial coverage that it will be useful to add them to WICID.
CIDER already holds NHSCR data for 1975-1998 for a set of 100 zones based on FHSA geography. As a relatively reliable source of year-on-year migration data, it would be very useful to include these data in WICID. ONS have expressed their willingness to release post-1998 data for CIDER to use; however, because of the new HA geography, work will need to be done to create a continuous time series from 1975.
Student migrations to HE institutions are important both in terms of their magnitude and impact
15. Recommendations coming out of the Audit… HESA: annual flows, from 2001 onwards, of student movements between MLSOA of parental domicile and HEI, disaggregated by various characteristics
NHS IC: annual flows, from 2001 onwards, of hospital patients from LLSOA or MLSOA of residence to hospital, disaggregated by various attributes
16. Implications for CIDER CIDER is currently in negotiation with the custodians of these targeted data sets to see if incorporation of the data into an extended version of WICID is possible.
All current indications are positive, but due to the differing availability and cost of particular data sets, it is likely that the acquisition and incorporation of some data will happen before others.
Securing additional funding via the Census Development Programme should allow for the purchase of data and trial of a new improved INTERACTION data system which incorporates these new data sources.
17. Towards an Enhanced Spatial Interaction Data Service…
Overview of the issues and challenges involved with adding new non-census datasets to the new INTERACTION system.
The new data: A more detailed look at the individual characteristics of each of the new proposed datasets.
18. WICID – The current system
19. WICID - Inbuilt flexibility System originally designed to handle a variety of primary (migration) data
Metadata is key as it describes the primary data held in the database. The system relies on this metadata to recognise the range of primary data stored
The system has very few ‘hardcoded’ assumptions about the data – it is all looked up whenever a data page on the user’s browser is produced
Data need only have a single origin and destination identifier, with a set of fields (generally a set of counts disaggregating the flow)
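The pair-wise format described on this slide can be illustrated with a minimal Python sketch. It is not the WICID implementation: the FlowRecord class, the METADATA dictionary and all field names are hypothetical, chosen only to show how metadata rather than hardcoded assumptions can give meaning to a set of counts.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FlowRecord:
    """One origin-destination pair with the counts that disaggregate the flow."""
    origin: str        # origin zone identifier
    destination: str   # destination zone identifier
    counts: tuple      # counts, in the order given by the metadata


# Hypothetical metadata: it names each count field, so the functions
# below need no built-in knowledge of any particular dataset.
METADATA = {
    "dataset": "example_migration",
    "fields": ["male_0_15", "male_16_plus", "female_0_15", "female_16_plus"],
}


def describe(record, metadata):
    """Label a record's counts by looking the field names up in the metadata."""
    return dict(zip(metadata["fields"], record.counts))


def total_flow(record):
    """Sum all disaggregated counts to give the total flow for the pair."""
    return sum(record.counts)


record = FlowRecord("ZoneA", "ZoneB", (3, 12, 4, 10))
labelled = describe(record, METADATA)
```

Under these assumptions, a new dataset would only need its own metadata entry and records in the same origin, destination, counts shape.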
20. WICID – The metadata
21. WICID – sample of table in SQL database
22. WICID – finalising the metadata
23. WICID – The finished product.
24. From WICID to INTERACTION Flexible nature of the current WICID system should allow for the addition of non-census datasets as long as the data is prepared in the required pair-wise origin, destination, variable format
Main challenges:
Re-designing the interface to handle time-series data. Current data are discrete, cross-sectional data
Some of the datasets (HES for example) present issues related to geographies: Currently, HES destination is a specific point, rather than an area
Metadata redesign to clearly identify different datasets and characteristics for users
Incorporation of ‘on-the-fly’ disclosure control routines for datasets like HESA
25. INTERACTION – Example issues
26. INTERACTION – Example issues Output complexities will need to be solved, with extra dimensions to the data output
e.g. Current: origin/destination by age by sex
Could be: origin/destination by age by sex by year
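The extra output dimension can be sketched in Python as a tabulation over a chosen list of dimensions. The records, column names and the tabulate function are invented for illustration and do not describe how WICID or INTERACTION builds its output.

```python
from collections import defaultdict

# Invented records: (origin, destination, age, sex, year, count)
ROWS = [
    ("A", "B", "16-24", "F", 2001, 5),
    ("A", "B", "16-24", "F", 2002, 7),
    ("A", "B", "25-44", "M", 2001, 2),
    ("B", "A", "16-24", "M", 2002, 4),
]
COLUMNS = ("origin", "destination", "age", "sex", "year")


def tabulate(rows, dimensions):
    """Sum counts over every combination of the requested dimensions."""
    positions = [COLUMNS.index(name) for name in dimensions]
    totals = defaultdict(int)
    for row in rows:
        key = tuple(row[p] for p in positions)
        totals[key] += row[-1]  # the count is the last element of each row
    return dict(totals)


# Current style of output: years are collapsed into a single table.
current = tabulate(ROWS, ("origin", "destination", "age", "sex"))
# Possible time-series output: one cell per year as well.
with_year = tabulate(ROWS, ("origin", "destination", "age", "sex", "year"))
```

Adding "year" to the dimension list splits the single A to B cell for females aged 16-24 into one cell per year, which is the kind of extra output complexity the interface would have to present.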
27. INTERACTION – Example issues Currently, census data supplied to us has already been subjected to statistical disclosure control methods, such that small counts are suppressed before the data is put onto the system - this can affect the accuracy of query results
Where some new datasets will be supplied in primary unit form, this offers us the opportunity to only apply statistical disclosure control where it is necessary, thus increasing data accuracy for the end user
Different techniques will need to be trialled and evaluated before data is made widely available
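A small Python sketch shows why the point at which disclosure control is applied matters. Small-count suppression is used here only as a simple stand-in for whichever technique is eventually chosen, and the threshold of 3 is a hypothetical value, not one taken from any data supplier.

```python
THRESHOLD = 3  # hypothetical minimum publishable count


def suppress(count, threshold=THRESHOLD):
    """Return None for non-zero counts below the threshold, else the count."""
    return None if 0 < count < threshold else count


def aggregate_then_suppress(unit_counts):
    """Sum the primary-unit counts first and test only the query result."""
    return suppress(sum(unit_counts))


def suppress_then_aggregate(unit_counts):
    """Suppress each small count before summing, as with pre-treated data."""
    kept = (suppress(count) for count in unit_counts)
    return sum(count for count in kept if count is not None)


cells = [1, 2, 2, 6]  # invented small-area counts making up one query result
on_the_fly = aggregate_then_suppress(cells)
pre_treated = suppress_then_aggregate(cells)
```

With these invented counts, applying control to the final query result keeps the full total of 11, whereas suppressing each small cell beforehand leaves only 6.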
28. The New Data Three new non-census data sets would be included in INTERACTION:
National Health Service Central Register (NHSCR) data from 1975 to present
Hospital Episode Statistics (HES) data from 2001 to present
Higher Education Statistics Agency (HESA) student data from 2001 to present
29. NHSCR Data NHSCR data will be available as a time series for a consistent set of 100 Zones based on the FHSA geography from 1975 to 1998
Post-1998 data will be available for Health Authority areas in England and Wales and equivalent areas in Scotland and Northern Ireland
Variables will be restricted to broad age and sex categories
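One way to bring flows recorded on two geographies onto a common set of zones is to aggregate through a lookup table, sketched below in Python. The zone codes and the lookup are hypothetical, and the sketch assumes every source zone nests entirely within one target zone, which may not hold for the real FHSA and HA boundaries.

```python
from collections import defaultdict

# Hypothetical lookup from source zones to a coarser common geography.
LOOKUP = {"HA1": "Z1", "HA2": "Z1", "HA3": "Z2"}


def rezone(flows, lookup):
    """Aggregate origin-destination flows onto the target zones.

    Flows whose origin and destination fall in the same target zone are
    dropped, because they are no longer moves between zones.
    """
    rezoned = defaultdict(int)
    for (origin, destination), count in flows.items():
        new_origin, new_destination = lookup[origin], lookup[destination]
        if new_origin != new_destination:
            rezoned[(new_origin, new_destination)] += count
    return dict(rezoned)


# Invented flows between source zones.
flows = {
    ("HA1", "HA3"): 10,
    ("HA2", "HA3"): 5,
    ("HA1", "HA2"): 8,   # becomes an intra-zone move in Z1 and is dropped
    ("HA3", "HA1"): 4,
}
rezoned = rezone(flows, LOOKUP)
```

Where zones do not nest cleanly, some form of estimation would be needed in place of a simple lookup.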
31. HES data We would be aiming to include HES data from 2001 until the present
Data contains information on all in-patient episodes relating to Hospitals in England
Origins are as detailed as Ward or SOA. Destinations are available down to Postcode Unit level
The ‘journey to hospital’ data can be disaggregated by a huge variety of variables, including:
32. HES data Age (at end and start of hospital episode)
Sex
Ethnicity
Duration of episode
Type of episode (related to treatment given)
Diagnosis category (International Classification of Diseases and related health problems [ICD-10] classification) – contains information on every known illness/disease/injury
Separate classifications for maternity and mental health episodes
Type of operation (if applicable)
34. HES data – research opportunities Hospital Episode Statistics provide a unique opportunity to study hospital catchment areas in relation to specific treatments and enable measurements of ‘market penetration’ – something becoming more relevant under the new NHS Patient Choice directive which allows patients more choice over where they are treated
Spatial interaction modelling will enable analyses of the frictional effect of distance on the ‘commute’ to hospital, and the testing of ‘what if’ scenarios in relation to the opening and closing of hospitals
Optimum locations for new hospitals or treatment centres in relation to demand could be explored through location-allocation modelling
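The frictional effect of distance and a 'what if' closure can be illustrated with a minimal origin-constrained spatial interaction model in Python. This is a generic textbook form with exponential distance decay, not a model proposed in the presentation; the zones, distances, attractiveness weights and the beta value are all invented.

```python
import math


def predict_flows(origins, attractiveness, distance, beta):
    """Share each origin's patients among open hospitals.

    Each hospital's share is proportional to its attractiveness multiplied
    by exp(-beta * distance); hospitals with zero attractiveness are
    treated as closed and receive no patients.
    """
    flows = {}
    for zone, patients in origins.items():
        weights = {
            hospital: weight * math.exp(-beta * distance[zone][hospital])
            for hospital, weight in attractiveness.items()
            if weight > 0
        }
        total = sum(weights.values())
        for hospital, weight in weights.items():
            flows[(zone, hospital)] = patients * weight / total
    return flows


origins = {"R1": 100, "R2": 50}              # invented patients per zone
distance = {                                 # invented distances
    "R1": {"H1": 2.0, "H2": 10.0},
    "R2": {"H1": 8.0, "H2": 3.0},
}
baseline = predict_flows(origins, {"H1": 1.0, "H2": 1.0}, distance, beta=0.2)
# 'What if' scenario: hospital H2 closes, so every patient goes to H1.
h2_closed = predict_flows(origins, {"H1": 1.0, "H2": 0.0}, distance, beta=0.2)
```

In the baseline the nearer hospital receives the larger share from each zone; in practice beta would be calibrated against observed flows such as those in HES rather than assumed.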
35. HESA data We would be aiming to include HESA data from 2001 until the present
Data contains information on the student's home address and the destination higher or further education institution
Origins could be as detailed as MLSOA, with destinations only as accurate as the location of the HE institution attended; there is no way to ascertain exactly where a student is living
Student migrations can be disaggregated by:
36. HESA data Age group (5 years)
Disability (disabled/not known to be disabled/not known)
Ethnicity (white/non-white/unknown)
Domicile (middle layer Super Output Area)
Postcode of HEI headquarters
Level of study (postgraduate, first degree, other undergraduate)
Subject area
Term-time accommodation
Major source of tuition fees
Mode of study (full-time/part-time)
Gender
37. HESA data – research opportunities Students are the section of the population most actively involved in internal migration in Britain
Increasing numbers of students are entering into higher education, with large numbers of students becoming features of many of Britain’s major urban centres
Students have significant social, cultural, economic and environmental impacts on the areas in which they live, with issues such as ‘studentification’ becoming active topics of political debate
Time series and cross-sectional analysis of student migration data in Britain should allow for greater understanding and prediction of student in-migration impacts
38. Conclusions: An extensive audit of interaction data in the UK led to CIDER identifying a number of key sources that could be incorporated into an updated version of the WICID system
New data sources would complement existing census-based interaction datasets and would move CIDER towards providing a more complete interaction data service
A number of technical challenges will need to be overcome as we move from WICID to INTERACTION
Easy access to new interaction data sources will provide unique opportunities for substantive research to be carried out in relation to internal migration in the UK
39. Thank you Adam Dennett,
Centre for Interaction Data Estimation and Research,
School of Geography,
University of Leeds
a.r.dennett@leeds.ac.uk
http://www.geog.leeds.ac.uk/people/a.dennett/
For the full audit: http://www.geog.leeds.ac.uk/wpapers/index.html