Methods of interpolating data to create long-run time series
Ian Gregory (University of Portsmouth) & Paul Ell (Queen’s University, Belfast)
Administrative Units in England and Wales from 1801
“Minor” changes:
• Registration Districts (1840-1910): 400
• Local Govt. Districts (1890s-1972): 4,000
• Parishes (1876-1972): 20,000
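Creating a standard geography
• Areal Weighting:
• Assumption: variable y is homogeneously distributed across the source zones
• Using this: each source zone's y is allocated to the target zones in proportion to the area of overlap
• BUT: this is a very unrealistic assumption.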
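The areal weighting step can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function name, the dictionary layout and the zone figures are all hypothetical, and the overlap areas are assumed to have been computed already (for example by a GIS overlay).

```python
def areal_weighting(source_values, source_areas, overlaps):
    """Estimate y for each target zone by areal weighting.

    source_values: {source_id: y for that source zone}
    source_areas:  {source_id: total area of that source zone}
    overlaps:      {(source_id, target_id): area of intersection}
    """
    estimates = {}
    for (source, target), overlap_area in overlaps.items():
        # Homogeneity assumption: the share of y equals the share of area.
        share = overlap_area / source_areas[source]
        estimates[target] = estimates.get(target, 0.0) + share * source_values[source]
    return estimates


# Hypothetical example: two source zones re-cut into two target zones.
source_values = {"A": 1000.0, "B": 400.0}
source_areas = {"A": 10.0, "B": 8.0}
overlaps = {("A", "T1"): 6.0, ("A", "T2"): 4.0, ("B", "T2"): 8.0}

estimates = areal_weighting(source_values, source_areas, overlaps)
print(estimates)
```

Here T1 receives 60% of zone A, and T2 receives the remaining 40% of A plus all of B, so the total of y is preserved. The weakness noted on the slide is visible in the code: only area enters the calculation, so a dense town and empty farmland within the same source zone are treated identically.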
Other sources of information (1)
1. Dasymetric technique:
• There were 15,000 parishes as opposed to 600/1,500 districts
• Total population is available at this scale
Assumptions:
• The distribution of y follows the distribution of the total population
• Parish-level population is homogeneously distributed
Problem:
• Most districts in towns and cities consist of only one parish
• In 1911, 30% of the population lived in districts that consisted of only one parish
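A minimal Python sketch of the dasymetric idea, under the two assumptions listed on the slide: y is shared out within a source district in proportion to total population, and population is spread evenly within each parish. The function name, data layout and figures are hypothetical, chosen only to show the arithmetic.

```python
def dasymetric(y_source, parishes, overlaps):
    """Allocate one source district's y to target zones via parish population.

    y_source: value of y for the source district
    parishes: {parish_id: (population, area)} for parishes in the district
    overlaps: {(parish_id, target_id): area of intersection}
    """
    total_pop = sum(pop for pop, _ in parishes.values())

    # Estimate the population falling in each target zone, assuming
    # population is evenly spread within each parish.
    target_pop = {}
    for (parish, target), overlap_area in overlaps.items():
        pop, area = parishes[parish]
        target_pop[target] = target_pop.get(target, 0.0) + pop * overlap_area / area

    # Assume y follows the distribution of total population.
    return {t: y_source * p / total_pop for t, p in target_pop.items()}


# Hypothetical district: a small dense parish (P1) and a large sparse one (P2).
parishes = {"P1": (900, 2.0), "P2": (100, 8.0)}
overlaps = {("P1", "T1"): 2.0, ("P2", "T1"): 2.0, ("P2", "T2"): 6.0}

dasy_estimates = dasymetric(50.0, parishes, overlaps)
print(dasy_estimates)
```

In this made-up case T1 covers 40% of the district's area but an estimated 92.5% of its population, so it receives 46.25 of the 50 units of y rather than the 20 that plain areal weighting would give. When a district consists of a single parish, the calculation reduces to areal weighting, which is the problem the slide identifies for towns and cities.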
Other sources of information (2)
2. Data from target districts as ancillary information:
• Can provide information on the distribution of source zone data
• EM algorithm is used
• E.g.
1. Sub-divide target zones into rural and urban
2. Assume that rural and urban targets have the same population densities
3. Allocate y to targets using this assumption
4. Find the average population density of rural and urban target districts
5. Go back to stage three using the new population densities and repeat until the algorithm converges
• Can use y for the target districts or total population at parish level as ancillary information
• Relies on having relevant information for target districts
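The five stages above can be sketched as a short iterative loop in Python. This is an illustrative reading of the slide rather than the authors' code: the function name, the convergence tolerance, the zone figures and the assumption that each target zone's area equals the sum of its overlaps with source zones are all hypothetical.

```python
def em_interpolate(source_values, overlaps, target_class, max_iter=200, tol=1e-9):
    """Iteratively allocate y to target zones using class-level densities.

    source_values: {source_id: y}
    overlaps:      {(source_id, target_id): area of intersection}
    target_class:  {target_id: "urban" or "rural"} (stage 1)
    """
    # Assumption: source zones tile the targets, so target area = sum of overlaps.
    target_area = {}
    for (_, t), a in overlaps.items():
        target_area[t] = target_area.get(t, 0.0) + a

    classes = set(target_class.values())
    density = {c: 1.0 for c in classes}  # stage 2: equal densities to start
    estimates = {}

    for _ in range(max_iter):
        # Stage 3: share each source's y by overlap area x class density.
        weight_sum = {}
        for (s, t), a in overlaps.items():
            weight_sum[s] = weight_sum.get(s, 0.0) + a * density[target_class[t]]
        estimates = {t: 0.0 for t in target_area}
        for (s, t), a in overlaps.items():
            if weight_sum[s] > 0:
                w = a * density[target_class[t]]
                estimates[t] += source_values[s] * w / weight_sum[s]

        # Stage 4: average density of each class of target zone.
        new_density = {}
        for c in classes:
            members = [t for t in target_area if target_class[t] == c]
            new_density[c] = sum(estimates[t] for t in members) / sum(
                target_area[t] for t in members
            )

        # Stage 5: repeat until the densities stop changing.
        converged = max(abs(new_density[c] - density[c]) for c in classes) < tol
        density = new_density
        if converged:
            break

    return estimates, density


# Hypothetical zones: S1 straddles an urban and a rural target.
source_values = {"S1": 1000.0, "S2": 900.0, "S3": 100.0}
overlaps = {("S1", "U1"): 2.0, ("S1", "R1"): 8.0, ("S2", "U2"): 1.0, ("S3", "R2"): 10.0}
target_class = {"U1": "urban", "U2": "urban", "R1": "rural", "R2": "rural"}

em_estimates, em_density = em_interpolate(source_values, overlaps, target_class)
print(em_estimates, em_density)
```

On the first pass the split of S1 is plain areal weighting (20% to U1). Because S2 and S3 show that urban targets are far denser than rural ones, later passes move most of S1's value into U1, while the total of y is preserved at every step.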
Other sources of information (3)
3. Combined technique:
• Brings together the dasymetric technique and the EM algorithm
• Makes use of all available information
• Tests all the assumptions
Choice of technique
Based on aggregating 1991 Enumeration Districts (EDs) to form pseudo-parishes and districts
• Conclusions:
• No one technique suits all variables
• Careful choice of technique reduces error significantly
• Using regression techniques can help determine which is most appropriate
• Error will still appear in the interpolated data
Predicting error
• Possible techniques:
1. Space – where target zones consist of many large fragments of source zones they are error prone
2. Attribute – error is most prevalent when data have been allocated from urban zones to rural ones
3. Time – error will cause “unrealistic” changes in population
Using population change to locate error
Water Orton – parish on the edge of Birmingham
• 1901-1951, Water Orton (1951: Pop. 1,841, area 2.3 km², pop. den. 796 p/km²)
• 1861-1891, part of Aston (1891: Pop. 250,000, area 57 km², pop. den. 4,300 p/km²)
• 1851, Water Orton (1851: Pop. 190, area 2.6 km², pop. den. 73 p/km²)
• 1851: Est. Pop: 182, Actual Pop: 190
Pop. Change = (y2 - y1) / (y2 + y1)
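The population change measure is simple enough to check by hand, but a short Python sketch makes its behaviour clear. The function name and the handling of the empty-zone case are assumptions made for illustration; the Water Orton figures are those given on the slide.

```python
def normalised_change(y1, y2):
    """Population change between two dates: (y2 - y1) / (y2 + y1)."""
    total = y1 + y2
    if total == 0:
        # Assumption: a zone that is empty at both dates shows no change.
        return 0.0
    return (y2 - y1) / total


# Water Orton, using the actual populations quoted on the slide.
change_1851_1951 = normalised_change(190, 1841)

# Relative difference between the interpolated and the actual 1851 population.
estimate_error_1851 = (182 - 190) / 190

print(round(change_1851_1951, 3), round(estimate_error_1851, 3))
```

For non-negative populations the measure always lies between -1 and 1, so growth from a very small base cannot produce the enormous percentages that a simple (y2 - y1) / y1 ratio would. Between 1851 and 1951 the value for Water Orton is about 0.81, and the interpolated 1851 estimate is about 4% below the actual figure.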
Using population change to locate error
Birmingham
• 1951: Pop. 1,100,000, area 210 km², pop. den. 5,235 p/km²
• 1931: Pop. 1,000,000, area 187 km², pop. den. 5,367 p/km²
• 1891: Pop. 246,000, area 12.2 km², pop. den. 20,123 p/km²
• 1851: Pop. 919, area 0.94 km², pop. den. 977 p/km²
Using population change to locate error
Castle Bromwich – parish on the edge of Birmingham
• 1951, Castle Bromwich (1951: Pop. 4,356, area 4.7 km², pop. den. 927 p/km²)
• 1921-1931, part of Birmingham (1931: Pop. 1,000,000, area 187 km², pop. den. 5,367 p/km²)
• 1861-1911, part of Aston (1891: Pop. 250,000, area 57 km², pop. den. 4,300 p/km²)
• 1851, Castle Bromwich (1851: Pop. 6,426, area 18.7 km², pop. den. 344 p/km²)
Conclusions
• Can interpolate data to create long-run time series
• Choice of best technique will depend on the nature of the variable
• No “one size fits all” technique
• All techniques will create some error
• What to do about error:
• Attempt to smooth it out
• Explicitly incorporate it into an analysis