USING DATA MINING TO PREDICT ROAD CRASH COUNT WITH A FOCUS ON SKID RESISTANCE VALUES

Authors: Daniel Emerson and Richi Nayak (QUT); Justin Weligamage (QDTMR)
Presenter: Daniel Emerson, Computer Science Discipline, Queensland University of Technology (QUT)

Project Details
• The work for this presentation was conducted at QUT from 2009 to 2011 as part of a larger skid resistance and crash analysis under the CIEAM I and CIEAM II projects.
• Project initiators and organizers: Justin Weligamage, Richi Nayak.
• Data mining supervisor: Richi Nayak.
• Data preparation, data mining and data mining strategy: Daniel Emerson.
• Road engineering advisor: Nappadol Piyatrapoomi.

Motivation (why the work was done)
• Data mining was applied as a new approach to analysing Queensland road and crash data.
• Earlier work had found a relationship between the crash risk of roads (whether a road has crashes) and their attributes, with skid resistance being significant.
• A higher-resolution measure of road crash risk was sought through the crash count method.
• Crash count data mining models can be applied in decision support systems to identify potential roads for investigation and treatment.

Introduction
• This paper presents a case study in which predictive data mining is applied to model the relationship between skid resistance, other road attributes and crashes, with the purpose of:
• developing models (algorithms) on sample data,
• applying the models to other data to predict high-risk roads.

Data and Data Preprocessing
• Several data sources were obtained from QDTMR for the four-year period 2004 to 2007, including:
• annual road segment snapshots of 1 km (or less) with a list of road variables,
• road surface texture depth test readings, seal type and seal age, roadway features such as intersections, traffic flow and many others,
• dated skid resistance values at intervals of 100 metres (or less), representing F60 skid resistance tests,
• crash instances, crash details and their road locations.
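The skid resistance readings are finer grained (100 m) than the road segments (1 km), so they have to be brought to a common resolution before modelling. The following is a minimal sketch of one way to do that, by averaging readings within each segment. The tuple layout `(road_id, chainage_km, f60)` and the function name are hypothetical, not taken from the QDTMR data.

```python
from collections import defaultdict

def average_friction_per_segment(readings, segment_length_km=1.0):
    """Average skid readings into fixed-length road segments.

    readings: iterable of (road_id, chainage_km, f60) tuples (assumed layout).
    Returns {(road_id, segment_index): mean F60 value}.
    """
    totals = defaultdict(lambda: [0.0, 0])  # key -> [sum, count]
    for road_id, chainage_km, f60 in readings:
        key = (road_id, int(chainage_km // segment_length_km))
        totals[key][0] += f60
        totals[key][1] += 1
    return {key: total / count for key, (total, count) in totals.items()}

# Illustrative values only, not real measurements.
readings = [
    ("10A", 0.0, 0.40),
    ("10A", 0.5, 0.50),
    ("10A", 1.2, 0.35),
]
segment_friction = average_friction_per_segment(readings)
```

A plain mean is only one possible aggregate; a minimum or a lower percentile per segment could be more appropriate if the weakest stretch of surface drives crash risk.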

Examination of road segment crash count
• Meeting our need for a more precise crash measure: crashes per 1 km per year.
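As a rough illustration of the crash count measure, crashes can be tallied per road, per 1 km segment, per year. This sketch assumes each crash record is a `(road_id, chainage_km, year)` tuple, which is a hypothetical layout chosen for the example.

```python
from collections import Counter

def crash_counts(crashes, segment_length_km=1.0):
    """Count crashes per (road_id, segment_index, year).

    crashes: iterable of (road_id, chainage_km, year) tuples (assumed layout).
    """
    counts = Counter()
    for road_id, chainage_km, year in crashes:
        segment_index = int(chainage_km // segment_length_km)
        counts[(road_id, segment_index, year)] += 1
    return counts

# Illustrative records only.
crashes = [
    ("10A", 0.2, 2004),
    ("10A", 0.9, 2004),
    ("10A", 0.4, 2005),
    ("10A", 1.5, 2004),
]
counts = crash_counts(crashes)
```

Segments with no recorded crash do not appear in the counter, so a full modelling table would also need a zero entry for every segment and year without crashes.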

Crash count characteristics
• Road segment crash count showed stability from year to year, indicating its value in crash risk analysis.
(Chart: 1-year time scale)

Clusters: crash count ranges (4 yr)
• Clusters of road segments formed by data mining on road properties showed characteristic crash counts, relating the crash proneness of a road to its properties.

Method: Applying predictive data mining
Reasons:
• To demonstrate that road segment crash count can be modelled, thus establishing a relationship between crash count and roadway features.
• To use the rules obtained from the model output in the analytical process, furthering the understanding of how roadway features contribute to crash count.
• To later apply successful models in decision support.

Method: Applying predictive data mining … using a subset of quality data
• Select the target variable to be predicted (crash count).
• Select the input variables (road segment attributes).
• Select a modelling method (regression tree algorithm).
• Run a range of models with varying configurations (regression tree).
• Evaluate and understand the results.
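The steps above can be sketched with a small, self-contained regression tree. This is a generic variance-reduction tree written for illustration, not the configuration the project ran in SAS or WEKA; the feature order `(friction, AADT)`, the parameters `max_depth` and `min_leaf`, and the toy data are all assumptions.

```python
from statistics import mean

def sse(values):
    """Sum of squared errors around the mean (0 for an empty list)."""
    if not values:
        return 0.0
    m = mean(values)
    return sum((v - m) ** 2 for v in values)

def best_split(rows, targets, min_leaf):
    """Return (feature_index, threshold) giving the lowest total SSE, or None."""
    best, best_score = None, sse(targets)
    for j in range(len(rows[0])):
        for threshold in sorted({r[j] for r in rows}):
            left = [t for r, t in zip(rows, targets) if r[j] < threshold]
            right = [t for r, t in zip(rows, targets) if r[j] >= threshold]
            if len(left) < min_leaf or len(right) < min_leaf:
                continue
            score = sse(left) + sse(right)
            if score < best_score - 1e-12:
                best, best_score = (j, threshold), score
    return best

def build_tree(rows, targets, max_depth=3, min_leaf=2):
    """Recursively grow the tree; each leaf stores group size and mean target."""
    split = best_split(rows, targets, min_leaf) if max_depth > 0 else None
    if split is None:
        return {"leaf": True, "n": len(targets), "ave": mean(targets)}
    j, threshold = split
    left = [i for i, r in enumerate(rows) if r[j] < threshold]
    right = [i for i, r in enumerate(rows) if r[j] >= threshold]
    return {
        "leaf": False, "feature": j, "threshold": threshold,
        "left": build_tree([rows[i] for i in left],
                           [targets[i] for i in left], max_depth - 1, min_leaf),
        "right": build_tree([rows[i] for i in right],
                            [targets[i] for i in right], max_depth - 1, min_leaf),
    }

def predict(tree, row):
    """Follow the splits down to a leaf and return its mean crash count."""
    while not tree["leaf"]:
        go_left = row[tree["feature"]] < tree["threshold"]
        tree = tree["left"] if go_left else tree["right"]
    return tree["ave"]

# Toy data: (friction, AADT) per segment and its crash count. Illustrative only.
rows = [(0.35, 5000), (0.38, 5200), (0.36, 4800),
        (0.55, 5100), (0.60, 4900), (0.58, 5300)]
targets = [5, 4, 6, 1, 2, 1]
tree = build_tree(rows, targets)
```

Varying `max_depth` and `min_leaf` corresponds loosely to the step of running a range of models with different configurations; each path from the root to a leaf reads as an IF ... THEN rule.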

Model variables
Road attribute input variables (in order of significance):
• AVG_FRICTION_AT_60_Ikm (F60 skid resistance)
• AADT (traffic rates)
• traffic_percent_heavy
• lane_count
• Texture Depth
• roughness_average
• rutting_average
• seal_age
• seal_type
• CRASH_SPEED_LIMIT
• CWAY_TYPE (single, double)
• CRAS_DIVIDED_ROAD
• ROAD_TYPE (highway, urban arterial etc.)
• Roadway Feature (roundabouts, bridges, intersections etc.)
These road segment attributes were relevant to predicting road segment crash count and became model input variables.
Target variable:
• Road segment crash count

Model results
• All models show a high correlation between actual crash count and predicted crash count.
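The slide does not say which correlation measure was used. As one plausible reading, the Pearson correlation coefficient between actual and predicted crash counts can be computed as in the sketch below; the function name is hypothetical.

```python
from math import sqrt

def pearson_r(actual, predicted):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(actual)
    mean_a = sum(actual) / n
    mean_p = sum(predicted) / n
    cov = sum((a - mean_a) * (p - mean_p) for a, p in zip(actual, predicted))
    var_a = sum((a - mean_a) ** 2 for a in actual)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov / sqrt(var_a * var_p)

# Illustrative values only.
r_same_trend = pearson_r([1, 2, 3], [2, 4, 6])
r_opposite_trend = pearson_r([1, 2, 3], [3, 2, 1])
```

The function divides by zero if either sequence is constant, so a model that predicts the same value for every segment needs separate handling.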

Charts of actual value vs. predicted value
• Comparing models with 143 leaves and 83 leaves.
(Charts: predicted value plotted against actual value)

A sample output rule
Sample Rule 1
IF AVG_FRICTION_AT_60 < 0.4095
AND CRASH_SPEED_LIMIT IS ONE OF: 90 100 110
AND 3987 <= AADT < 6105
AND CWAY_TYPE EQUALS SINGLE
THEN
NODE : 48
N : 315 (number of road segments in the group)
AVE : 4.04444 (average crashes for the group)
SD : 2.5357 (standard deviation of the predicted crash values)
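A rule of this form can be read as a filter over road segments followed by summary statistics for the matching group. The sketch below encodes the conditions of Sample Rule 1 as a predicate. The dictionary layout, the `crash_count` key and the use of the sample standard deviation are assumptions; the slide does not say how SD was computed.

```python
from statistics import mean, stdev

def matches_rule_1(segment):
    """True if the segment satisfies every condition of Sample Rule 1."""
    return (
        segment["AVG_FRICTION_AT_60"] < 0.4095
        and segment["CRASH_SPEED_LIMIT"] in {90, 100, 110}
        and 3987 <= segment["AADT"] < 6105
        and segment["CWAY_TYPE"] == "SINGLE"
    )

def node_summary(segments, rule):
    """Summarise crash counts of segments matching the rule (needs >= 2 matches)."""
    counts = [s["crash_count"] for s in segments if rule(s)]
    return {"N": len(counts), "AVE": mean(counts), "SD": stdev(counts)}

# Illustrative segments only, not project data.
segments = [
    {"AVG_FRICTION_AT_60": 0.38, "CRASH_SPEED_LIMIT": 100, "AADT": 4500,
     "CWAY_TYPE": "SINGLE", "crash_count": 3},
    {"AVG_FRICTION_AT_60": 0.40, "CRASH_SPEED_LIMIT": 110, "AADT": 6000,
     "CWAY_TYPE": "SINGLE", "crash_count": 5},
    {"AVG_FRICTION_AT_60": 0.55, "CRASH_SPEED_LIMIT": 100, "AADT": 4500,
     "CWAY_TYPE": "SINGLE", "crash_count": 1},
]
summary = node_summary(segments, matches_rule_1)
```

In the toy data the third segment fails the friction condition, so only the first two contribute to the summary.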

Conclusion
• Road segment crash count can be successfully modelled with road attributes using data mining.
• A strong relationship exists between road crash count and road attributes.
• Skid resistance plays an important role in determining the crash characteristics of the road segment.
• The models may be of sufficient quality to use in decision support.
• While the models are specific to Queensland roads, the method can be trialled and evaluated elsewhere.

Future Work
• Work with road asset domain experts to analyse the rules, draw conclusions and improve the models.
• Apply models for analysis of data subsets, such as crashes with severe human outcomes.
• Apply the models to the whole-of-network dataset with the goal of identifying road segments that are skid resistance sensitive, i.e. segments where surface intervention to improve skid resistance will result in reduced crash risk.

Acknowledgement
• This study is an ongoing investigation into road crashes supported by CIEAM (CRC Asset Management), QDTMR and the Faculty of Science and Technology, QUT.
• Data mining tools used include:
• SAS (Statistical Analysis System)
• WEKA (data mining software)

Thanks and Questions
Project Publications
[1] Nayak, R., Piyatrapoomi, N. and Weligamage, J. (2009). Application of text mining in analysing road crashes for road asset management. Proceedings of the Third World Congress on Engineering Asset Management, WCEAM 2009 (Athens, Greece, 28-30 September 2009).
[2] Nayak, R., Emerson, D., Weligamage, J. and Piyatrapoomi, N. (2010). Using Data Mining on Road Asset Management Data in Analysing Road Crashes. Proceedings of the 16th Annual TMR Engineering & Technology Forum (Brisbane, 20 July 2010).
[3] Emerson, D., Nayak, R., Weligamage, J. and Piyatrapoomi, N. (2011). Identifying differences in wet and dry road crashes using data mining. Proceedings of the Fifth World Congress on Engineering Asset Management, WCEAM 2010 (Brisbane, 26 October 2010).
[4] Nayak, R., Emerson, D., Weligamage, J. and Piyatrapoomi, N. (2011). Road Crash Proneness Prediction using Data Mining. Proceedings of the EDBT 2011 (Uppsala, Sweden, 2011).
