240 likes | 254 Views
This study validates the use of Empirical Bayesian (EB) statistical approach for identifying high crash locations on highways. It explores the effectiveness of EB in comparison to other models, investigates the impact of data aggregation, and examines the relationship between segmentation and accuracy of estimates.
E N D
Validation and Implication of Segmentation on Empirical Bayes for Highway Safety Studies Reginald R. Souleyrette, Robert P. Haas and T. H. Maze Iowa State University, SAIC and Iowa State University ENVIRONMENTAL HEALTH RISK 2007 Fourth International Conference on The Impact of Environmental Factors on Health MALTA; 27 - 29 June, 2007
The highway safety problem Source: World Health Organization
Mitigation approaches – 4Es • Education • Enforcement • Emergency Response • Engineering
Engineering studies • Limited resources • Highest benefit desired • High Crash Locations • Before and After Studies • Small sample size high variance • Selection bias regression to the mean (RTM)
Objectives • Validate the state of the art statistical approach, known as empirical Bayesian • Demonstrate tradeoffs between model quality and data quantity • Investigate effect of data aggregation • … to improve identification and therefore mitigation of high crash locations
Statistical approaches we could take… • Use long periods • Use large number of locations • Use Empirical Bayes (EB) • Substitutes “similar” locations for longer observation time • “Weights” site and similar-site data
Mr. Smith • Mr. Smith had no crashes last year • The average of similar drivers is 0.8 crashes per year • What do we expect is the number of crashes Mr. Smith will have next year … 0?, 0.8? … • Answer … use both pieces of information and weight the expectation Hauer, E., D.W. Harwood, F.M. Council, M.S. Griffith, “The Empirical Bayes method for estimating safety: A tutorial.” Transportation Research Record 1784, pp. 126-131. National Academies Press, Washington, D.C.. 2002 http://members.rogers.com/hauer/Pubs/TRBpaper.pdf
Empirical Bayes (EB) • We have two types of information • We compute an estimate which is an average of both • How much to weight the two depends on… • Quantity • Quality • Accepted practice… small scale What should the weight be???
1 ________ 1+(μ∙Y)/φ w = overdispersion factor weight applied to model estimate number of years mean # crashes/year from model Need: - model for similar sites (neg. binomial) Need: site data EB estimate = w∙(model estimate) + (1-w)∙(site average)
2000 2001 2002 2003 2004 2004 Objective #1 Test effectiveness of EB by comparing: • a single year of data from many locations, with different models and the Empirical Bayes formula, vs. • several years of crash data at specific locations
Objective #2 explore the relationship between segmentation and accuracy of estimates
Description of Data Roads (Iowa) • All (19,400km) • Freeways (1400km) • Multilane (8000km) • 2-lane (10,000km) • Low ADT (1200 VPD) • Med ADT (2400 VPD) • High ADT (4400 VPD) • Segments • 400m (short) • 4km (med) • 6.8km (long)
Description of Data Intersections (California) • Multiphase (873) • Single Phase (374) • Thru-stop (3047) • 5 years of data • large-scale validation
Analysis – Intersections Three model forms: • Crashes = α(mainline traffic)β, • Crashes = α(mainline traffic)β(cross street traffic)γ • Crashes = α (mainline traffic)β(cross street lanes)δ Three types of intersections • multiphase signals • Single phase signals • Stop sign control Intersection model parameters and descriptive statistics
Example intersection crash models (only 2 dimensions shown)
Intersection ResultsTop 10 high crash locations in 2003* Highest in 2003 Trying to predict this 4 year average “better” slightly more often than EB Not intuitive EB model “a” lowest error * California HSIS Multiphase 4 leg
Using 4 years of data + EB Now, model “d” never best estimate, but still best model four times? Now, EB better more often
Intersection ResultsEffect on Ranking all models “comparable” EB does slightly better than 4 year average, or 2003 alone
Analysis – Roads • crashes=α(length)(ADT)β • 3 types of roads • Freeway • Multilane divided • 2-lane • 3 segmentations • 0.4, 3.8, and 11.6 km, on average • 3 traffic ranges (L,M,H) • 15 models Road segment model parameters and descriptive statistics
Effect of Segmentation on Correction Freeway-type segments Shortest segments Average length 0.4 km Longest segments Average length 11.6 km Medium segments Average length 3.8 km Note higher EB correction for short segments
Conclusions • EB+1yr ≈ 4yrs of data • Better model did not necessarily improve prediction (at least for the 10 intersections selected) • Longer segment models are more accurate • Intersection 4-year averages and models are relatively poor predictors • But when combined using EB, better
Thank you reg@iastate.edu