This research explores the impact of the Small Cell Adjustment Method (SCAM) on origin-destination data from the 2001 Census, in particular the Special Migration Statistics (SMS) and Special Workplace Statistics (SWS). The study examines whether SCAM affects these outputs differently from other census outputs and considers alternative approaches. The findings show that flow totals are clustered and that the clustering persists when data are aggregated to larger areas. Possible alternatives include providing independent totals, providing asymmetric flows, and reconsidering whether data should be provided at Output Area level.
The effects of small cell adjustment on origin-destination data in the 2001 census Oliver Duke-Williams School of Geography, University of Leeds o.w.duke-williams@leeds.ac.uk
Effects of SCAM • On the SMS • On the SWS / STS
Questions • Does SCAM affect the o-d data? • Does it affect it in a different way to other outputs from the 2001 Census? • What can users do? • Are there any better approaches?
Assumptions about SCAM • Cells with initial values of 1 or 2 are adjusted • A 1 may become 0 or 3, with 0 being more likely • A 2 may become 0 or 3, with 3 being more likely • We can’t distinguish between ‘genuine’ 0s or 3s, and cells that have been adjusted • Cells with initial values of 4 or more remain the same
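The cell-level rule assumed above can be sketched in Python. The slides give only the direction of the probabilities (a 1 is more likely to become 0, a 2 is more likely to become 3); the values 1/3 and 2/3 used here are an assumption, chosen so that the expected value of each cell is unchanged. The function name `scam_adjust` is hypothetical and is not taken from any census software.

```python
import random

def scam_adjust(value, rng):
    """Adjust one cell under the assumptions listed above.

    ASSUMPTION: a 1 becomes 3 with probability 1/3 and a 2 becomes 3
    with probability 2/3. The source only says which outcome is more
    likely; these exact values are illustrative.
    """
    if value == 1:
        return 3 if rng.random() < 1 / 3 else 0
    if value == 2:
        return 3 if rng.random() < 2 / 3 else 0
    # All other values are left unchanged in this sketch.
    return value

rng = random.Random(2001)          # fixed seed so the run is repeatable
true_cells = [0, 1, 2, 1, 4, 2]
adjusted = [scam_adjust(c, rng) for c in true_cells]
print(true_cells, "->", adjusted)
```

After adjustment a user sees only 0s, 3s and values of 4 or more, so an adjusted 3 cannot be told apart from a genuine 3.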
SMS Level 3 • Output area to output area flows • 1,799,061 flows • Each flow is presented as a single age-by-sex table
SMS Level 3 • Average flow total is 3.58 persons • Distribution of flow totals reveals obvious effects of SCAM • But… SCAM only affects interior cells • Average of interior cells is 0.60 persons
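One way to see why flow totals show the effects of SCAM even though only interior cells are adjusted is to enumerate the totals a flow could be published with. The sketch below assumes the published total is the sum of the adjusted interior cells; the function names are hypothetical.

```python
from itertools import product

def possible_outcomes(value):
    """Values a single interior cell may take after adjustment."""
    # A 1 or 2 may become 0 or 3; other values are left unchanged.
    return (0, 3) if value in (1, 2) else (value,)

def possible_totals(cells):
    """All totals obtainable by summing the adjusted interior cells.

    ASSUMPTION: the flow total is derived from the adjusted cells
    rather than being published independently.
    """
    outcomes = [possible_outcomes(c) for c in cells]
    return sorted({sum(combo) for combo in product(*outcomes)})

print(possible_totals([1, 2, 0, 1]))   # true total 4 -> [0, 3, 6, 9]
print(possible_totals([1, 4]))         # true total 5 -> [4, 7]
```

When every non-zero interior cell is a 1 or a 2, which is the typical case given an average interior cell of 0.60, the total can only be a multiple of 3.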
Is this important? • The averages are affected by large numbers of small flows: are most migrants in large flows? • Number of flows for which all interior cells are equal to 4 or more is… 2
SMS Level 2 • Ward to ward flows • 1,275,067 flows • Each flow is presented in 5 disaggregate tables • Age by sex • Moving groups • Ethnic group by sex • Moving groups by NS-SEC of group reference person • Moving groups by tenure
SMS Level 2 • Using the two tables of ‘migrants’, the average flow is: • 4.91, according to table MG201 • 4.86, according to table MG203 • Distribution also varies
[Charts: distribution of flow totals for tables MG201 and MG203]
SMS Level 2 • The number of migrants can also be determined by summing across moving group tables • MG202 allows the total to be constructed with the fewest components
Table MG202 • Average flow is 4.62 • Distribution of values is less clustered
[Charts: distribution of flow totals for tables MG201, MG203 and MG202]
SMS Level 1 • ‘District’ to ‘district’ flows • 133,490 flows • Flow total averages are all similar • MG101 – average 46.36 • MG102 – average 46.38 • MG103 – average 46.38 • MG104 – average 46.37 • However, distribution of flows is still clustered
SMS Level 1 • Total migrants can be defined with the fewest components using table MG106 • Cells 10 + 11 + 12 + 14 + 15 + 16
Effects on the SMS • Distributions of flow totals are clustered • Does this matter if the data are spatially aggregated?
[Chart panels: Level 3 data aggregated to Level 1 • Level 2 data aggregated to Level 1 • Level 1 data]
Effects of SCAM • Problems persist through aggregation of small units to large areas
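A minimal sketch of why aggregation does not remove the clustering: if the small-area flows being summed are all 0 or 3 after adjustment, every aggregated total is still a multiple of 3. The area codes, the lookup table and the function name below are hypothetical, and the example covers only the small-flow case; flows containing cells of 4 or more would break the exact pattern.

```python
from collections import defaultdict

def aggregate_flows(flows, parent):
    """Sum origin-destination flows up to a coarser geography.

    flows  : {(origin, destination): count} at the fine level
    parent : {fine_area: coarse_area} lookup (hypothetical codes)
    """
    totals = defaultdict(int)
    for (origin, destination), count in flows.items():
        totals[(parent[origin], parent[destination])] += count
    return dict(totals)

# Hypothetical output areas nested in two hypothetical districts.
parent = {"OA1": "D1", "OA2": "D1", "OA3": "D2", "OA4": "D2"}

# Adjusted fine-level flows: every value is 0 or 3.
adjusted = {
    ("OA1", "OA3"): 3,
    ("OA1", "OA4"): 0,
    ("OA2", "OA3"): 3,
    ("OA2", "OA4"): 3,
    ("OA3", "OA1"): 3,
}

district = aggregate_flows(adjusted, parent)
print(district)   # {('D1', 'D2'): 9, ('D2', 'D1'): 3}
```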
Effects on SWS • Similar problems arise with the SWS • Pattern of clustering is different
SWS Level 3 • Output area to output area • 5,951,376 flows • Each flow is presented in one table (method of transport to work)
SWS Level 3 • 85% of flows are shown as 3 • This is likely to exacerbate aggregation problems
SWS Level 2 • Ward to ward flows • 2,108,999 flows • Each flow is presented in 6 tables • Average total flow is around 11.5 • 35% of flows have a total of 3
SWS Level 2 • All 6 tables have same definitions • This suggests a possible solution • Use average of all 6 totals, rounding up to nearest integer
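The suggested workaround can be sketched as follows. This reads "rounding up" as taking the ceiling of the mean, which is an interpretation rather than something the slides state; the function name and the example totals are hypothetical.

```python
def rounded_average(totals):
    """Mean of the table totals for one flow, rounded up.

    ASSUMPTION: "rounding up" means the ceiling of the mean.
    Integer arithmetic is used to avoid floating-point error.
    """
    n = len(totals)
    return -(-sum(totals) // n)   # ceiling division

# Hypothetical totals for one ward-to-ward flow from the six tables.
print(rounded_average([3, 0, 3, 3, 6, 3]))   # mean 3.0 -> 3
print(rounded_average([3, 0, 3, 3, 3, 3]))   # mean 2.5 -> 3
print(rounded_average([0, 0, 3, 0, 0, 0]))   # mean 0.5 -> 1
```

Because the six totals are adjusted separately, their average can take values such as 1 or 2 that never appear in any single table.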
Rounded average • The distribution of values looks ‘better’ to users of data • However, the average is inappropriate for many tables: • Too high or too low to generate appropriate rates • Is such a value useful?
Alternatives to small cell adjustment • Provision of independent total • Provision of asymmetric flows • Don’t try to provide OA level data?