2001 Census: the emergence of a new geographical framework David Martin Department of Geography University of Southampton
Overview • Background issues • Postcode building blocks • Output areas by automated zone design • Zone design experiments • Illustrative results • Demonstrator project • Application to SAM specification • A new project… • Conclusions
Background issues • 1991: EDs designed for data collection, but used for both data collection and output • 2001: separation of collection and output geographies - purpose-specific geographies • New output areas built from synthetic unit postcode polygons • Application of automated zone design (after Openshaw, 1977)
Postcode building blocks • Approx 1.7m unit postcodes • Aggregation of these small building blocks into output areas (OAs) ensures best census-postal geography match • No pre-existing polygons (exc. Scotland) • NISRA to digitize, ONS to generate • OS to create separate new product!!
Generation of postcode polygons (1) • Thiessen polygons around individual ADDRESS-POINTS, clipped to statutory boundaries and topographic features
Generation of postcode polygons (2) • Boundaries dissolved between adjacent address polygons with common postcode, to form postcode polygons
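The two polygon-generation steps above can be sketched on a coarse grid. This is a minimal illustration, not the ONS production method: the address coordinates and the postcode labels (PC1 to PC3) are invented, each grid cell stands in for a piece of a Thiessen polygon, and clipping to statutory boundaries and topographic features is left out. Labelling every cell with the postcode of its nearest address has the same effect as building address polygons and then dissolving boundaries between neighbours that share a postcode.

```python
from math import hypot

# Hypothetical address points: (x, y, unit postcode). All values are invented.
ADDRESSES = [
    (1.0, 1.0, "PC1"),
    (2.0, 1.5, "PC1"),
    (6.0, 1.0, "PC2"),
    (6.5, 5.0, "PC2"),
    (1.5, 6.0, "PC3"),
]


def nearest_address(x, y, addresses):
    """Index of the address closest to (x, y); its Thiessen polygon owns the point."""
    return min(
        range(len(addresses)),
        key=lambda i: hypot(addresses[i][0] - x, addresses[i][1] - y),
    )


def postcode_raster(addresses, width, height):
    """Grid of postcode labels: each cell takes the postcode of its nearest address.

    Cells of different addresses with the same postcode receive the same label,
    which is the raster equivalent of dissolving their shared boundary.
    """
    grid = []
    for row in range(height):
        labels = []
        for col in range(width):
            idx = nearest_address(col + 0.5, row + 0.5, addresses)
            labels.append(addresses[idx][2])
        grid.append(labels)
    return grid


grid = postcode_raster(ADDRESSES, width=8, height=8)
for labels in reversed(grid):  # print with y increasing upwards
    print(" ".join(labels))
```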
OA design methodology • Automated zoning procedures derived from Openshaw (1977)… • Variety of alternative approaches • Computationally intensive, iterative search for ‘best’ solution to the zoning problem, given a set of constraints • Not feasible in previous data and computing environments
Output areas by automated zone design • Initial random aggregation of building blocks → Iterative recombination → 2001 output areas • Design constraints (contiguity, thresholds, shape, size, homogeneity) applied during recombination
OA design (1) Initial random aggregation of postcodes into potential output areas
OA design (2) Choose one postcode at random as candidate for swapping into a different output area
OA design (3) Make the swap and evaluate the impact on the overall solution
OA design (4) If swap does not result in an improvement, go back to the previous configuration
OA design (5) Choose another postcode at random as candidate for swapping into another output area
OA design (6) If the swap results in an overall improvement, keep it as part of the solution and examine a new potential swap…
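The six steps above can be sketched as a small random-swap search. This is a simplified illustration under stated assumptions, not the procedure used for the 2001 output areas: the building blocks are cells of an invented 6 x 6 grid with invented populations, the only objective is the squared deviation of zone populations from a target, and the only constraint enforced is contiguity (thresholds, shape and homogeneity are left out). All function names are hypothetical.

```python
import random


def neighbours(block, width, height):
    """Rook-adjacent grid cells of a block."""
    x, y = block
    cand = [(x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)]
    return [(a, b) for a, b in cand if 0 <= a < width and 0 <= b < height]


def is_contiguous(blocks, width, height):
    """True if the blocks form one non-empty connected set."""
    blocks = set(blocks)
    if not blocks:
        return False
    start = next(iter(blocks))
    seen, stack = {start}, [start]
    while stack:
        for nb in neighbours(stack.pop(), width, height):
            if nb in blocks and nb not in seen:
                seen.add(nb)
                stack.append(nb)
    return seen == blocks


def initial_aggregation(width, height, n_zones, rng):
    """Step 1: grow n_zones contiguous zones outwards from random seed blocks."""
    all_blocks = [(x, y) for x in range(width) for y in range(height)]
    zone_of = {seed: z for z, seed in enumerate(rng.sample(all_blocks, n_zones))}
    while len(zone_of) < len(all_blocks):
        frontier = [(b, nb) for b in zone_of
                    for nb in neighbours(b, width, height) if nb not in zone_of]
        b, nb = rng.choice(frontier)
        zone_of[nb] = zone_of[b]
    return zone_of


def size_score(zone_of, pop, n_zones, target):
    """Sum of squared deviations of zone populations from the target."""
    totals = [0] * n_zones
    for block, zone in zone_of.items():
        totals[zone] += pop[block]
    return sum((t - target) ** 2 for t in totals)


def improve(zone_of, pop, width, height, n_zones, target, rng, iterations=2000):
    """Steps 2 to 6: try random swaps, keep improvements, undo the rest."""
    best = size_score(zone_of, pop, n_zones, target)
    blocks = list(zone_of)
    for _ in range(iterations):
        block = rng.choice(blocks)
        old = zone_of[block]
        options = sorted({zone_of[nb] for nb in neighbours(block, width, height)} - {old})
        if not options:
            continue  # block is interior to its zone, nowhere to move it
        donor_rest = [b for b, z in zone_of.items() if z == old and b != block]
        if not is_contiguous(donor_rest, width, height):
            continue  # swap would empty or split the donor zone
        zone_of[block] = rng.choice(options)
        score = size_score(zone_of, pop, n_zones, target)
        if score < best:
            best = score           # keep the swap
        else:
            zone_of[block] = old   # go back to the previous configuration
    return best


rng = random.Random(0)
W, H, K = 6, 6, 4
pop = {(x, y): rng.randint(20, 60) for x in range(W) for y in range(H)}
target = sum(pop.values()) / K
zones = initial_aggregation(W, H, K, rng)
start = size_score(zones, pop, K, target)
final = improve(zones, pop, W, H, K, target, rng)
print(f"score before: {start:.1f}, after: {final:.1f}")
```

Because only strict improvements are kept, this sketch is a plain hill climb and can stop at a local optimum; other acceptance rules are possible.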
Constraints (1) • Contiguity: output areas from adjacent postcodes (NB problem of stacks) • Thresholds: output areas above population thresholds (NB problem of sub-threshold parishes) • Shape: output areas should be as compact as possible • minimize perimeter²/area
Constraints (2) • Size: output areas should be as uniformly sized as possible - avoiding very large and very small populations • minimize Σ(OApop − target)² • Homogeneity: output areas should be as socially uniform as possible • existing ONS tenure-based measure • maximize intra-area correlations
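The shape and size measures named in the constraints can be computed directly. The sketch below assumes each zone is a simple polygon given as a list of vertex coordinates; the function names and the example polygons are invented for illustration. Lower values are better for both measures.

```python
from math import hypot


def polygon_area_perimeter(vertices):
    """Area (shoelace formula) and perimeter of a simple polygon."""
    twice_area = 0.0
    perimeter = 0.0
    n = len(vertices)
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]
        twice_area += x1 * y2 - x2 * y1
        perimeter += hypot(x2 - x1, y2 - y1)
    return abs(twice_area) / 2.0, perimeter


def shape_score(vertices):
    """Perimeter squared over area: smaller means more compact."""
    area, perimeter = polygon_area_perimeter(vertices)
    return perimeter ** 2 / area


def size_score(oa_populations, target):
    """Sum of squared deviations of output area populations from the target."""
    return sum((p - target) ** 2 for p in oa_populations)


square = [(0, 0), (2, 0), (2, 2), (0, 2)]  # compact zone
strip = [(0, 0), (4, 0), (4, 1), (0, 1)]   # same area, elongated zone
print(shape_score(square), shape_score(strip))
print(size_score([100, 150], target=125))
```

The square and the strip have the same area, so the difference in their scores comes from perimeter alone.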
Intra-area correlation • Measures similarity of values within any area of interest (Holt et al., 1996; Tranmer and Steel, 1998) • Higher correlation: greater homogeneity (theoretical maximum of 1.0) • Can be computed for a single category (e.g. ‘owner occupied’) or for multi-category variables • Tenure and dwelling type tested in project
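As a rough illustration of the idea, the sketch below uses the standard one-way ANOVA estimator of intraclass correlation on a single 0/1 category. This is an assumption made for the example: the estimator defined in the cited papers and used in the project may differ in detail. The function name and the toy data are invented.

```python
def intra_area_correlation(areas):
    """One-way ANOVA intraclass correlation.

    `areas` is a list of areas, each a list of values for its households,
    e.g. 1 for 'owner occupied' and 0 otherwise.
    """
    k = len(areas)
    n_total = sum(len(a) for a in areas)
    grand_mean = sum(sum(a) for a in areas) / n_total
    means = [sum(a) / len(a) for a in areas]

    ss_between = sum(len(a) * (m - grand_mean) ** 2 for a, m in zip(areas, means))
    ss_within = sum((v - m) ** 2 for a, m in zip(areas, means) for v in a)
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n_total - k)

    # Effective area size, allowing for unequal numbers of households per area.
    n0 = (n_total - sum(len(a) ** 2 for a in areas) / n_total) / (k - 1)
    return (ms_between - ms_within) / (ms_between + (n0 - 1) * ms_within)


segregated = [[1, 1, 1, 1], [0, 0, 0, 0]]  # each area internally uniform
mixed = [[1, 0, 1, 0], [0, 1, 0, 1]]       # each area internally varied
print(intra_area_correlation(segregated))
print(intra_area_correlation(mixed))
```

Internally uniform areas give the maximum value of 1.0, while areas that each contain the full mix of values give a value at or below zero.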
Zone design experiments • ONS postcode polygons for test areas • Populated with plausible synthetic populations by iterative sampling of SAR individuals (PCs structured by dwelling type) • Test OAs constructed using alternative combinations of design constraints: (OApop only; OApop+shape; OApop+homog; OApop+shape+homog)
Project website http://www.geog.soton.ac.uk/research/oa2001/
Application to SAM specification • Proposal for small area microdata (SAM) – more spatial, less attribute detail than SARs • Use wards as building blocks, target SAM areas 7-10k population • Same procedures as for postcode to OA • Subsequent splitting of ‘superwards’
Hampshire wards • n = 235 • mean = 5872 • min = 996 • max = 15684 • (map labels: Basingstoke, Southampton, Portsmouth)
Hampshire SAM areas @ 5k • n = 176 • mean = 8230 • min = 5035 • max = 15684
Hampshire SAM areas @ 15k • n = 66 • mean = 22835 • min = 15170 • max = 51368
A new project… • Problem of matching two sets of areal units: • 1991 ED data for 1981 EDs? • 2001 OA data for 1991 EDs? • Various approaches possible: • Individual-level data within Census Offices • Lookup table approximations • Areal interpolation (various) • Which is best matching configuration?
A new project: automated zone matching • More general computational problem: Given two boundary sets and some target zone characteristics, find the optimal match • Can be conceptualized as a modified AZP process (iterative, computationally intensive, general purpose problem) • Automatic tool when no lookup tables etc.
First boundary set • Take a familiar area: • Boundary set A • eg. 1991 EDs • (zones in figure: A1, A2, A3, A4)
Secondary boundary set • For the same area: • Boundary set B • eg. 2001 OAs • (zones in figure: B1, B2, B3, B4, B5)
Full intersection • Intersect A and B • Clean topology • (intersection zones in figure: A1B1, A1B4, A2B1, A2B2, A2B3, A3B1, A3B4, A4B5)
Set up automated zone matching • Set up design criteria: equality of population size, area, density, etc. • Adjust weight for ancillary variable • Set one zone as source which must be maintained (eg. that for which data are available) • Set up initial random aggregation incorporating true matches • Over to (modified) AZP…
Alternative solutions… • Solution 1: perfect match maintaining all zones complete • eg. creation of census tracts • O1 = A1+A2+A3 = B1+B2+B3+B4 • O2 = A4 = B5
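One way to see where Solution 1 comes from: if every zone must stay whole, the output zones are the connected groups of A and B zones linked by any overlap. The sketch below uses the intersection pairs labelled in the full-intersection figure and a small union-find structure. It is an illustration of this particular case, not the modified AZP itself, and the function name is invented.

```python
# Intersection pieces from the full-intersection slide, as (A zone, B zone) pairs.
INTERSECTIONS = [
    ("A1", "B1"), ("A1", "B4"),
    ("A2", "B1"), ("A2", "B2"), ("A2", "B3"),
    ("A3", "B1"), ("A3", "B4"),
    ("A4", "B5"),
]


def perfect_match_groups(pairs):
    """Group zones so that no zone in either boundary set is split."""
    parent = {}

    def find(zone):
        parent.setdefault(zone, zone)
        while parent[zone] != zone:
            parent[zone] = parent[parent[zone]]  # path halving
            zone = parent[zone]
        return zone

    for a, b in pairs:
        parent[find(a)] = find(b)  # overlapping zones must share an output zone

    groups = {}
    for zone in list(parent):
        groups.setdefault(find(zone), set()).add(zone)
    return sorted((sorted(g) for g in groups.values()), key=lambda g: g[0])


for i, group in enumerate(perfect_match_groups(INTERSECTIONS), start=1):
    a_part = "+".join(z for z in group if z.startswith("A"))
    b_part = "+".join(z for z in group if z.startswith("B"))
    print(f"O{i} = {a_part} = {b_part}")
```

For these pairs the groups reproduce the two output zones listed on the slide.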
Alternative solutions… • Solution 2: boundary set B unbroken, closest match to A • eg. creation of lookup tables, local approximations • O1 = B1 ≈ A1 • O2 = B2+B3 ≈ A2 • O3 = B4 ≈ A3 • O4 = A4 = B5
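A simple stand-in for Solution 2 is a best-fit lookup table: keep each B zone whole and assign it to the A zone with which it shares the largest overlap. This largest-overlap rule is a swapped-in simplification, not the modified AZP described in the slides, and the overlap areas below are invented so that the example has something to work with.

```python
# Hypothetical overlap areas for the intersection pieces (A zone, B zone).
OVERLAP_AREA = {
    ("A1", "B1"): 8.0, ("A2", "B1"): 1.0, ("A3", "B1"): 1.0,
    ("A2", "B2"): 5.0,
    ("A2", "B3"): 4.0,
    ("A1", "B4"): 1.5, ("A3", "B4"): 7.0,
    ("A4", "B5"): 6.0,
}


def best_fit_lookup(overlap_area):
    """Map each B zone to the A zone it overlaps most, leaving B zones unbroken."""
    best = {}
    for (a, b), area in overlap_area.items():
        if b not in best or area > best[b][1]:
            best[b] = (a, area)
    return {b: a for b, (a, _) in best.items()}


lookup = best_fit_lookup(OVERLAP_AREA)

# Collect the B zones that approximate each A zone.
approximation = {}
for b, a in sorted(lookup.items()):
    approximation.setdefault(a, []).append(b)

for a, b_zones in sorted(approximation.items()):
    print(f"{a} ~ {'+'.join(b_zones)}")
```

With these invented areas the lookup groups B2 and B3 under A2 and leaves the other B zones matched one to one. Different overlap areas would give a different table.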
Conclusions • Major application of geographical technique developed 20+ years ago • Multiple purpose-specific geographies – generated from existing spatial data • Multiple applications of the same approach • Census output areas • SAM areas • Generic geography matching
Demonstrator RSS meeting: Nov 2000