Statistical modelling challenges – approaches used in Network Rail
Julian Williams, Network Rail
29th November, 2011
17th July 2008
A few key facts
• 20,000 miles of track
• 40,000 bridges and tunnels
• 800 signal boxes
• 2,500 stations
• Largest private landowner in the UK
• Largest purchaser of electricity in the UK
• Delivered by a team of 33,000 people
[Diagram: asset management planning framework – route utilisation, output & funding specification; asset policies / strategies; policies and standards; monitoring & review; route asset management plans; route delivery plans; work execution; enablers: asset information, analysis tools, competencies, processes]
Asset Strategy 10-Stage Process
1. Asset description – how many, where, construction types
2. Asset history – construction types, investments, condition, performance
3. Asset criticality – by cost, performance, safety
4. Route criticality – segmentation of routes
5. Asset degradation – condition, performance relationships
6. Interventions – effectiveness, unit costs
7. Investment scenarios – different strategies, targets
8. Models – strategic, WLCC (whole-life cycle cost), tactical
9. Assessments – volumes, costs, condition, performance, sustainability
10. Policy selection – chosen option, final policy
Statistical challenges
• Incomplete / inaccurate databases
• Inter-database incongruence
• Understanding trends
• Understanding degradation / intervention effectiveness
• Relating condition to performance
• Validation
• Uncertainty
Challenge 1: Database accuracy
• A large number of databases is required, covering the whole network
• Assessment of the impact of incomplete and inaccurate databases used to support investment planning
• Two-part confidence grading:
– Source data management (A – D)
– Accuracy (1 – 6)
• Overall contributions weighted according to impact on investment plans for CP5 (Control Period 5) – a formal system was developed to derive the weights
• Checks on sample data from databases – the number of samples is based on the required statistical confidence in the level of accuracy (see the sketch below)
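A minimal sketch (not Network Rail's actual procedure) of deriving the number of sample checks from the required statistical confidence, using the normal approximation for an accuracy proportion; the 90% expected accuracy and ±3% margin are illustrative assumptions:

```python
from math import ceil
from scipy.stats import norm

def sample_size(p_expected: float, margin: float, confidence: float) -> int:
    """Samples needed to estimate an accuracy proportion to within
    +/- margin at the given confidence (normal approximation)."""
    z = norm.ppf(1 - (1 - confidence) / 2)   # two-sided critical value
    n = (z ** 2) * p_expected * (1 - p_expected) / margin ** 2
    return ceil(n)

# e.g. expect ~90% of records accurate, want +/-3% at 95% confidence
print(sample_size(0.90, 0.03, 0.95))  # -> 385
```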
Challenge 2: Database incongruence
• Need to match asset registers, renewals & maintenance works, and condition & performance measures
• Inconsistent:
– Formats (e.g. a single text field vs individual numeric fields)
– Asset hierarchies (e.g. point operating equipment vs signalling interlocking vs mileage vs GPS)
– Database engines (ORACLE, Access, Excel, text)
• A lot of effort goes into matching data – not always successfully
• Have to account for bias in matched vs unmatched data
• Track data segmentation: asset registers (various), traffic, track structures, track geometry, rail defects, faults, train delays, planned renewals, maintenance
Challenge 2 Example: S&C rail defects
• Rail defects on switches and crossings (S&C)
• Recorded in the rail defect management system (RDMS) as an S&C defect
• Clearly must have occurred on an S&C!
• Need to match to the S&C in the asset register – a sketch of mileage-based matching follows
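A minimal sketch of one way such matching might be done, locating each RDMS defect against the nearest S&C unit on the same engineer's line reference (ELR); the record structures, field names and mileage tolerance are all hypothetical:

```python
import bisect

# Hypothetical records: asset-register S&C units and RDMS defects,
# both located by engineer's line reference (ELR) and mileage.
sandc_register = {
    "LEC1": [(0.25, "S&C unit 2101A"), (1.80, "S&C unit 2144B"), (3.10, "S&C unit 2190")],
}
defects = [("LEC1", 1.83), ("LEC1", 0.24), ("LEC1", 2.40)]

def nearest_sandc(elr, mileage, tolerance=0.05):
    """Match a defect to the nearest S&C unit on the same ELR,
    within a mileage tolerance; return None if no plausible match."""
    units = sorted(sandc_register.get(elr, []))
    miles = [m for m, _ in units]
    i = bisect.bisect_left(miles, mileage)
    candidates = units[max(0, i - 1):i + 1]
    best = min(candidates, key=lambda u: abs(u[0] - mileage), default=None)
    if best and abs(best[0] - mileage) <= tolerance:
        return best[1]
    return None  # unmatched - a potential source of bias to account for

for elr, m in defects:
    print(elr, m, "->", nearest_sandc(elr, m))
```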
Challenge 3: Understanding trends
• Inspection
– Have the guidelines changed on condition rating? (e.g. what is a “serious” defect?)
– Has training improved?
– What is the variance between inspectors? (see the sketch below)
– Is there a bias in the inspection frequency (e.g. more critical locations, assets with a poor previous rating)?
• Maintenance
– Maintenance frequency (proactive vs reactive, RCM, campaigns)
– Maintenance tools (stoneblowers vs tampers)
– Workforce competence (experience, training)
• Asset condition
– Asset specification (e.g. better sleepers)
– Asset condition
– Utilisation
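A minimal sketch of checking between-inspector variance with a one-way ANOVA; the scores and inspector names are hypothetical:

```python
from scipy.stats import f_oneway

# Hypothetical condition scores for the same asset group, by inspector
scores_by_inspector = {
    "inspector_A": [2.1, 2.4, 2.2, 2.6, 2.3],
    "inspector_B": [2.9, 3.1, 2.8, 3.0, 3.2],
    "inspector_C": [2.2, 2.5, 2.4, 2.1, 2.6],
}

# One-way ANOVA: is between-inspector variance large relative to
# within-inspector variance?
f_stat, p_value = f_oneway(*scores_by_inspector.values())
print(f"F = {f_stat:.2f}, p = {p_value:.4f}")
```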
Challenge 3 example 1: rail defects
• Bolt hole defects
– Reduction in the number of joints
– Better rail end management
• Tache ovales
– Removal of older rail
– Better guidance and training
Challenge 4: Average degradation curves
• What happens “on average” does not represent what happens on the ground
• Spurious correlations
– Data averaged too much
– Not all information known – possibly just an estimate of the asset installation date
• Unseen biases in the data
– Missing “minor” interventions
– Different maintenance policies
• Need to retain variance in degradation in the models
– Use Markov probability models (see the sketch below)
– Link to location-specific condition history (if available)
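A minimal sketch of a Markov probability model over three condition bands; the annual transition probabilities are illustrative assumptions, not Network Rail values. Propagating the full state distribution retains the variance that an average degradation curve loses:

```python
import numpy as np

# Condition bands: 0 = good, 1 = fair, 2 = poor.
# Assumed annual transition probabilities (rows sum to 1).
P = np.array([
    [0.90, 0.09, 0.01],   # good -> good / fair / poor
    [0.00, 0.85, 0.15],   # fair -> fair / poor
    [0.00, 0.00, 1.00],   # poor is absorbing until renewed
])

state = np.array([1.0, 0.0, 0.0])  # all assets start in good condition
for year in range(1, 11):
    state = state @ P
    print(f"year {year:2d}: good={state[0]:.2f} fair={state[1]:.2f} poor={state[2]:.2f}")
```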
Challenge 4 example: track geometry
• The “average” record looks smooth
• The shape is different for a specific track section
• Some tracks behave worse than others
Challenge 5: performance
• Predicting failure rates
– Helps understand the impact of changing asset type / condition
– Required for whole-lifecycle cost analysis
• Failure database designed to manage failure repair, not for failure analysis
– Filled in by operators, rather than engineers
– Root cause is a text field, rather than a drop-down
– Asset hierarchy is variable
– Analysis only viable for total failures, rather than by root cause
• Less confidence that performance improvement will follow expected changes in asset condition
Challenge 5 example: track failure rates
• Regression analysis to identify several indicators:
– Tonnage
– Track geometry
– Defect rate
– Jointed track
– S&C density
• Can get good regression statistics with “wrong” relationships, due to correlations with other variables
– Failure rate decreases with more jointed track: jointed track is on low-tonnage lines, so it becomes a correction factor for the tonnage relationship
– Need to adjust so that jointed tonnage-km and CWR tonnage-km are used separately (see the sketch below)
• Need to understand the relationships
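A minimal sketch of the adjustment on synthetic data: regressing failures on jointed and CWR tonnage-km separately recovers the higher per-tonnage-km failure rate of jointed track, where a naive “proportion jointed” covariate would merely correct the tonnage term. All rates and distributions are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200

# Synthetic track sections: exposure split by construction type
jointed_tkm = rng.gamma(2.0, 1.0, n)   # jointed tonnage-km (mostly low-tonnage lines)
cwr_tkm = rng.gamma(2.0, 5.0, n)       # CWR tonnage-km

# True behaviour: jointed track fails ~4x more often per tonnage-km
failures = rng.poisson(0.8 * jointed_tkm + 0.2 * cwr_tkm)

# Regress failures on the split exposures (no intercept: no traffic,
# no traffic-driven failures)
X = np.column_stack([jointed_tkm, cwr_tkm])
coef, *_ = np.linalg.lstsq(X, failures, rcond=None)
print(f"failures per jointed tonnage-km: {coef[0]:.2f}")   # ~0.8
print(f"failures per CWR tonnage-km:     {coef[1]:.2f}")   # ~0.2
```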
Challenge 6: Validation
• Many relationships based on:
– Expert judgement (formal elicitation)
– Limited data (initially)
• Validation
– Data gathering and analysis
– Benchmarking with other industries
• Data sometimes sufficient to show whether a relationship is in the “right ballpark”
– Difficult to validate with good statistical confidence
– Often shows that not all the important parameters have been identified
– Some relationships look completely random, partly due to a lack of good data
• Need new data programmes
– Specific survey campaigns to collect data (e.g. rail pad condition)
– New mandated fields for some databases (e.g. fault management system)
– Guidance to maintenance staff on records
Challenge 6 example: annual variability
• The number of rail defects varies from year to year on the same track:
– As defects are repaired and the assets renewed (accounted for)
– Number of inspections (not accounted for)
– Random variation (not accounted for)
• Validation has to account for this (see the sketch below)
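A minimal sketch of the purely random component: if defects arise roughly as a Poisson process, counts on an unchanged track still scatter from year to year, and a validation band should allow for this. The annual rate is an illustrative assumption:

```python
import numpy as np
from scipy.stats import poisson

expected_rate = 40          # assumed mean defects/year on an unchanged track
rng = np.random.default_rng(1)

# Ten simulated years with no real change in the asset
print(rng.poisson(expected_rate, 10))

# 95% band for a single year's count under pure randomness
lo, hi = poisson.ppf([0.025, 0.975], expected_rate)
print(f"95% band: {int(lo)} to {int(hi)} defects")
```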
Challenge 6 example: no validation with poor data
• Sometimes the data provides no support at all
Challenge 7: Uncertainty
• For individual relationships:
– Asset information
– Degradation rates
– Intervention impact
– Costs
• “Optimal” policy
– Large uncertainty in the overall result, but could have small uncertainty in the differential between policies
– Confidence limits for the expenditure required to achieve targets
– Probability of meeting targets given expenditure
• Break down uncertainty into:
– Poor data
– Inaccurate models
– Natural variability and random events
• Monte Carlo analysis used for WLCC models and the Tier 0 model (see the sketch below)
– Requires estimates of individual parameter uncertainty
– Estimates of the most important contributors to uncertainty guide further data analysis / model development
– Rarely based on good statistical analysis
– Parameter correlations are hard to estimate and often ignored
– Can only address part of the problem: does not account for inaccurate models
• Bayesian analysis
– Model estimate is the “prior”
– Other evidence used to create a posterior
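A minimal Monte Carlo sketch of propagating parameter uncertainty through a toy whole-life cost calculation; the cost logic, parameter names and distributions are illustrative assumptions, not the WLCC or Tier 0 models:

```python
import numpy as np

rng = np.random.default_rng(42)
n = 100_000

# Assumed uncertain parameters (illustrative distributions)
degradation_rate = rng.lognormal(mean=np.log(0.05), sigma=0.3, size=n)  # condition loss/year
unit_cost = rng.normal(1.2, 0.15, n)   # cost (in GBP millions) per renewal
horizon = 30                           # years

# Toy whole-life cost: renew each time cumulative degradation reaches 1.0
renewals = np.floor(degradation_rate * horizon)
wlcc = renewals * unit_cost

print(f"mean WLCC: {wlcc.mean():.2f}")
print(f"95% range: {np.percentile(wlcc, 2.5):.2f} to {np.percentile(wlcc, 97.5):.2f}")
```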
Challenge 7 example: identifying contributors to uncertainty
• Identify the biggest contributors to uncertainty
• Dependent on correct parameter ranges
• Correlations important
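Continuing the Monte Carlo sketch above, contributors can be ranked by the rank correlation between each sampled input and the output; this simple measure depends, as the slide notes, on the parameter ranges being right, and correlated inputs need more careful treatment:

```python
import numpy as np
from scipy.stats import spearmanr

rng = np.random.default_rng(42)
n = 10_000
degradation_rate = rng.lognormal(np.log(0.05), 0.3, n)
unit_cost = rng.normal(1.2, 0.15, n)
wlcc = np.floor(degradation_rate * 30) * unit_cost

# Rank-correlate each sampled input with the output to rank contributors
for name, values in [("degradation_rate", degradation_rate),
                     ("unit_cost", unit_cost)]:
    rho, _ = spearmanr(values, wlcc)
    print(f"{name}: Spearman rho = {rho:.2f}")
```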