Statistical Peril in the Transportation Planning Polygon
Aggregated Data – A Planning Reality & A Planning Problem
• Aggregation units are required since traffic analysis zones are the convenient grouping scheme for regional and statewide transportation planning.
• Zone-level variables are both consumed on their own and used as inputs to travel demand and land use allocation models, with the assumption that the groupings are real and fixed.
• The fundamentals of spatial analysis and statistical sampling error are commonly ignored, which can have undesirable consequences.
Modifiable Areal Unit Problem: The Zone Effect
• The sizes and shapes of planning zones are modifiable and arbitrary (they rarely represent real geographical properties or segment the population in a meaningful way).
• Changing the polygon boundaries can drastically change the zonal statistics (e.g., gerrymandering).
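The zone effect can be seen in a toy example. The sketch below is illustrative only: the twelve households, their attribute values, and both boundary schemes are invented for the demonstration and do not come from any planning dataset.

```python
# Twelve hypothetical households along a corridor; 1 = has the attribute, 0 = does not.
households = [1, 1, 1, 0, 0, 0, 1, 0, 0, 1, 0, 0]

def zonal_shares(values, boundaries):
    """Share of 1s in each zone; boundaries are (start, end) index pairs."""
    return [sum(values[a:b]) / (b - a) for a, b in boundaries]

# Two arbitrary ways to cut the same households into three zones.
scheme_a = [(0, 4), (4, 8), (8, 12)]
scheme_b = [(0, 3), (3, 9), (9, 12)]

shares_a = zonal_shares(households, scheme_a)
shares_b = zonal_shares(households, scheme_b)
overall = sum(households) / len(households)

print("overall share:", round(overall, 3))
print("scheme A:", [round(s, 3) for s in shares_a])
print("scheme B:", [round(s, 3) for s in shares_b])
```

The underlying households never change, yet the zonal shares differ between the two schemes because only the boundaries moved.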
Modifiable Areal Unit Problem: The Scale Effect
• The scale of the zones will also change the results.
• As the polygons get bigger and the underlying population grows, variability is washed away.
• As the polygons get smaller and the underlying population shrinks, we are more likely to observe extreme (and perhaps unreliable) values.
• When we mix scales in a planning region, both statistical properties will be present.
[Map: New York State housing units at the block level, units per person]
Two Ways to View the Distribution
[Figure labels: Housing Units per Person; Population in Block]
Start with One Polygon
• Simulated polygon with a population of orange and grey squares.
• Color locations are randomly assigned.
• 20.2% of the zone is orange.
• Cut the polygon up and measure the orange within each smaller polygon.
[Figure labels, percent orange in each smaller polygon: 21.3%, 14.3%, 26.6%, 17.3%, 18.2%, 19.0%, 14.3%, 14.3%, 14.3%, 21.1%, 22.6%, 18.3%, 9.5%, 21.8%, 11.3%, 15%, 21.4%, 16.7%, 16.1%, 19%, 23.8%, 16.7%, 16.7%, 22.6%, 18.3%, 22.8%, 31%, 23.8%, 21.7%, 22.1%, 19.9%, 19.5%]
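A rough way to repeat this experiment is sketched below. It is not the simulation behind the slide. The square grid, its size, the 20% color probability, the equal-sized cuts, and the random seed are all assumptions chosen to keep the example short.

```python
import random

random.seed(42)    # fixed seed so the sketch is repeatable

GRID = 40          # hypothetical 40 x 40 polygon of squares
P_ORANGE = 0.20    # assumed chance that any one square is orange

# 1 = orange, 0 = grey; colors are assigned independently at random.
grid = [[1 if random.random() < P_ORANGE else 0 for _ in range(GRID)]
        for _ in range(GRID)]

def block_shares(cells, block):
    """Orange share inside each block x block sub-polygon."""
    n = len(cells)
    shares = []
    for r in range(0, n, block):
        for c in range(0, n, block):
            sub = [cells[i][j]
                   for i in range(r, r + block)
                   for j in range(c, c + block)]
            shares.append(sum(sub) / len(sub))
    return shares

overall = sum(map(sum, grid)) / GRID ** 2
print(f"whole polygon: {overall:.1%} orange")

# Cut the same polygon into progressively smaller pieces.
for block in (20, 10, 5):
    shares = block_shares(grid, block)
    print(f"{block}x{block} pieces: n={len(shares)}, "
          f"min={min(shares):.1%}, max={max(shares):.1%}")
```

Because the smaller pieces nest inside the larger ones, the spread between the minimum and maximum share can only stay the same or widen as the pieces shrink, even though nothing but random assignment produced the colors.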
Look at size before location
• Always plot your statistic against its own denominator.
• Funnel or cone shapes indicate you may have a scale effect playing a role.
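One way to read such a plot is to draw approximate binomial limits around the regional rate at each denominator. The sketch below assumes a normal approximation (which is weak for very small n) and uses invented zone counts. `funnel_limits` is a hypothetical helper name, not a library function.

```python
import math

def funnel_limits(p_region, n, z=1.96):
    """Approximate limits around the regional proportion for a zone with denominator n."""
    se = math.sqrt(p_region * (1 - p_region) / n)
    return max(0.0, p_region - z * se), min(1.0, p_region + z * se)

# Hypothetical zones: name -> (count with attribute, denominator)
zones = {"A": (3, 10), "B": (30, 100), "C": (260, 1000), "D": (9, 12)}

total_x = sum(x for x, _ in zones.values())
total_n = sum(n for _, n in zones.values())
p_region = total_x / total_n

flags = {}
for name, (x, n) in zones.items():
    lo, hi = funnel_limits(p_region, n)
    flags[name] = not (lo <= x / n <= hi)   # True = outside the funnel
    print(f"{name}: n={n:5d} rate={x / n:.2f} limits=({lo:.2f}, {hi:.2f}) "
          f"outside={flags[name]}")
```

Zones with small denominators get wide limits, so an extreme rate there is weaker evidence of a real difference than the same rate in a large zone.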
More on Scale – Conventional Guidance on TAZ Size
According to AASHTO: “…, it is strongly suggested that TAZs should be delineated with a resident or worker population of 1,200 or greater.”
Land Use Model Inputs
[Figure labels: Employment Density (jobs/acre); Non-residential Developed Acres]
Rates of Seatbelt Use Across a State
[Figure label: Road Segment Daily Volume]
What should you do?
• Resist the temptation to explain all the spatial and temporal variability.
• For TAZ delineation, optimization routines and explicit testing of varying zone structures have been proposed (Ding, 1998 & Viegas, et al., 2007).
• Run simulations on your own planning units to explore the severity of the zone and scale effects. The impacts depend on the measures and the specific region under study.
More Tactical Adjustments during Data Exploration
• Binomial Data with small n: methods that follow the Law of Succession (Laplace, Wilson, or Jeffreys) are helpful to improve small sample statistics.
• For zone-level means, you can center the distribution by using the regional mean as the expected value.
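The small-n binomial adjustments can be sketched as point estimates. The forms below are the commonly cited ones (Laplace adds one success and one failure, Jeffreys adds half of each, and Wilson uses the midpoint of the Wilson score interval). The function names and the z = 1.96 default are assumptions for illustration.

```python
def raw_rate(x, n):
    """Unadjusted proportion: x successes out of n trials."""
    return x / n

def laplace(x, n):
    """Laplace's rule of succession: add one success and one failure."""
    return (x + 1) / (n + 2)

def jeffreys(x, n):
    """Jeffreys-style adjustment: add half a success and half a failure."""
    return (x + 0.5) / (n + 1)

def wilson(x, n, z=1.96):
    """Midpoint of the Wilson score interval at the given z value."""
    return (x + z * z / 2) / (n + z * z)

# A zone where 5 of 5 observations have the attribute, and a much larger zone.
for x, n in [(5, 5), (500, 500)]:
    print(f"x={x}, n={n}: raw={raw_rate(x, n):.3f} "
          f"laplace={laplace(x, n):.3f} "
          f"jeffreys={jeffreys(x, n):.3f} "
          f"wilson={wilson(x, n):.3f}")
```

With n = 5 every adjusted estimate is pulled away from 100%, while with n = 500 the adjustments barely move the raw rate, which is the behavior wanted when small zones produce extreme values.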
Mapping Polygon Values 2008 Presidential Election Results Mark Newman University of Michigan
Recap and Final Thoughts
• Rather than ignoring sampling variation, we should recognize its presence.
• Rather than only asking if the observed differences are a function of location or polygon-specific attributes, consider that some or most of the differences could merely be a function of the base size and your zonal delineation.
• Real variation due to the underlying spatial phenomenon is often blurred by our unit of analysis.
• Both aggregation and disaggregation create problems; our job is to understand the trade-offs.
• The least densely populated zones are sometimes the largest. Thematic mapping has the unfortunate consequence of overemphasizing large units and minimizing small ones. Consider alternatives that are more honest in their visual representation.
Sources & Further Reading
• Statistics for Spatial Data. Noel A. Cressie. 1993.
• Spatial Modeling of Regional Variables. Noel Cressie and Ngai H. Chan. 1986.
• The Most Dangerous Equation. Howard Wainer. 2007.
• Diffusion-based method for producing density-equalizing maps. Michael T. Gastner and M. E. J. Newman. 2004.
• Effects of the modifiable areal unit problem on the delineation of traffic analysis zones. Viegas, et al. 2007.
• The GIS-Based Human Interactive TAZ Design Algorithm: Examining the Impacts of Data Aggregation on Transportation Planning Analysis. Ding, C. 1998.
• When 100% Really Isn’t 100%: Improving the Accuracy of Small-Sample Estimates of Completion Rates. James Lewis and Jeff Sauro. 2006.
Kevin.Hathaway@rsginc.com