Data Mining to Improve Forecast Accuracy in the Airline Business
Hans Feyen, Christoph Hüglin
Atraxis AG, CKCB/Data Mining and Analysis
CH-8058 Zürich Airport
Tel. +41 1 812 45 45
www.atraxis.com
Outline
• What is Data Mining
  • Process Oriented
• Overview of problems tackled using Data Mining
• Forecasting of No Show passengers based on PNR information
  • Methodology
  • Preliminary Results
• Forecasting of Group Utilisation Ratios at time of request
  • Methodology
  • Results
Data Mining
Data Mining is the process of discovering hidden information in data.
Data Mining maximizes the value of a Data Warehouse.
[Diagram: the Data Mining process as a cycle] Business Understanding and Problem Definition → Data Extraction and Understanding → Data Preparation and Derivation of Attributes → Data Modeling → Model Evaluation → Result Deployment
Key Success Factors of a Data Mining Project
• Business Knowledge: business specialists and Data Mining specialists work together.
• Communication and Co-operation with Customers: Data Mining is used to discover patterns and relationships in data in order to help you make better business decisions.
• Data Handling (Extracting, Deployment): efficient and system-independent handling (extracting, merging, filtering, aggregating) of distributed data is absolutely necessary.
• Methodological Knowledge: select the right approach and method for each problem.
• Data Quality: “Garbage in, garbage out”.
Customer Benefits of a Data Mining Project
[Diagram] Data Mining (Forecast/Prediction, Recognition, Deviation) → better exploitation of information → better decisions and more targeted marketing actions → improved Airline / Marketing profitability
Case 1: PNR Based No Show Forecast
• Traditionally, No Show rates are forecast at segment level using historical flight information.
• The probability that a passenger is a No Show depends on individual passenger behavior.
• In our approach, we use passenger information (available via the PNR) and schedule information to better exploit the available data and to obtain more accurate forecasts.
[Diagram] Standard Method: Historical Flights. Our Approach: Historical Flights, PAX (Booking DW), Schedule Info.
Case 1: PNR Based No Show Forecast
• Data Source
  • We extracted two random samples from a Bookings Data Warehouse.
  • The first sample served as the training sample to set up the forecasting models.
  • The second sample served as the test sample to validate the forecasting models.
  • Each sample contained about 70,000 (historical) segments, and both had the same (historical) No Show rate.
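One way to obtain two samples with the same No Show rate is a stratified random split. The sketch below is only an illustration under that assumption; the slides do not say how the samples were drawn, and the field names (`segment_id`, `no_show`) and the synthetic data are hypothetical.

```python
import random

def stratified_split(records, label_key="no_show", seed=0):
    """Split records into two halves that share the same label rate.

    Shows and no-shows are shuffled and halved separately, so both
    halves keep (almost exactly) the overall no-show rate.
    """
    rng = random.Random(seed)
    train, test = [], []
    for label in (0, 1):
        group = [r for r in records if r[label_key] == label]
        rng.shuffle(group)
        half = len(group) // 2
        train.extend(group[:half])
        test.extend(group[half:])
    return train, test

def no_show_rate(sample, label_key="no_show"):
    """Fraction of records in the sample flagged as no-show."""
    return sum(r[label_key] for r in sample) / len(sample)

# Synthetic stand-in for booking segments: 1000 records, 10% no-shows.
segments = [{"segment_id": i, "no_show": 1 if i % 10 == 0 else 0}
            for i in range(1000)]
train, test = stratified_split(segments)
print(len(train), len(test), no_show_rate(train), no_show_rate(test))
```

The training half would be used to fit the models and the test half only to validate them.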
Case 1: PNR Based No Show Forecast
• Data preparation: derive additional attributes from PNR information (use of Trip Analyser and OD Builder algorithm)
Attributes available from PNR (variable type):
• Booking time prior to departure (count)
• Booking class (categorical)
• Service class (categorical)
• Number of passengers in PNR (count)
• Origin city (categorical)
• Destination airport (categorical)
• Board point airport (categorical)
• Off point airport (categorical)
• Weekday of departure (categorical)
Generated attributes (variable type):
• Was PNR split during booking history? (binary)
• Origin region (categorical)
• Destination region (categorical)
• Board point region (categorical)
• Off point region (categorical)
• Grouped board point airports (categorical)
• Position of segment within OD (count)
• Total travel time for OD (continuous)
• Total flight time for OD (continuous)
• Flight time (segment) / Flight time (OD) (categorical)
• Connection time between segments in OD (categorical)
• More than one airline used in OD? (binary)
• Is segment part of round trip? (binary)
• Does segment belong to return portion of trip? (binary)
• Purpose of trip (business, leisure, or mixed) (categorical)
• Total time for trip (continuous)
• Number of segments in trip (count)
• Number of scheduled flights per week (count)
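A few of the generated attributes can be illustrated with a small sketch. It assumes the segments of one OD are already known and ordered; it does not reproduce the Trip Analyser or OD Builder, and the record layout (`flight`, `airline`, `flight_minutes`) and flight numbers are hypothetical.

```python
def derive_od_attributes(od_segments):
    """Derive per-segment attributes for one OD.

    od_segments: list of dicts in travel order, each with the
    hypothetical keys "flight", "airline" and "flight_minutes".
    """
    total_flight_time = sum(s["flight_minutes"] for s in od_segments)
    airlines = {s["airline"] for s in od_segments}
    derived = []
    for position, seg in enumerate(od_segments, start=1):
        derived.append({
            "flight": seg["flight"],
            # Position of segment within OD (count)
            "position_in_od": position,
            # Total flight time for OD
            "total_flight_time_od": total_flight_time,
            # Flight time (segment) / Flight time (OD), before any binning
            "flight_time_share": seg["flight_minutes"] / total_flight_time,
            # More than one airline used in OD? (binary)
            "multi_airline_od": len(airlines) > 1,
        })
    return derived

# Made-up two-segment OD.
od = [
    {"flight": "XX100", "airline": "XX", "flight_minutes": 60},
    {"flight": "YY200", "airline": "YY", "flight_minutes": 180},
]
rows = derive_od_attributes(od)
for row in rows:
    print(row)
```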
Case 1: PNR Based No Show Forecast
• Exploratory Data Analysis (based on training sample)
[Figures: exploratory charts on two slides, not preserved]
Case 1: PNR Based No Show Forecast
• Data Mining Methodologies
  • Combined use of Decision Tree algorithms and Logistic Regression
  • Decision Tree algorithms are very useful to:
    • Identify which variables influence the No Show probability.
    • Reduce the number of levels of categorical variables to construct more meaningful new variables.
    • Identify interactions between variables that are likely to be important terms in a regression model.
  • In short, they prepare the data set in such a manner that the logistic regression can reach the best possible accuracy.
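The level-reduction idea can be sketched with a single tree split. For a binary target, the best two-way grouping of category levels under Gini impurity can be found by sorting the levels by their no-show rate and scanning the cut points. This is a minimal illustration, not the authors' tooling; the weekday levels and counts are made up, and a real tree would split recursively.

```python
from collections import defaultdict

def gini(positives, n):
    """Gini impurity of a node with `positives` no-shows out of n."""
    if n == 0:
        return 0.0
    p = positives / n
    return 2 * p * (1 - p)

def best_binary_grouping(rows, feature, label="no_show"):
    """Group the levels of a categorical feature into two sets."""
    counts = defaultdict(lambda: [0, 0])  # level -> [no_shows, total]
    for r in rows:
        counts[r[feature]][0] += r[label]
        counts[r[feature]][1] += 1
    if len(counts) < 2:
        raise ValueError("need at least two levels to split")
    # Sort levels by observed no-show rate, then scan the cut points.
    levels = sorted(counts, key=lambda lv: counts[lv][0] / counts[lv][1])
    total_pos = sum(c[0] for c in counts.values())
    total_n = sum(c[1] for c in counts.values())
    best = None
    left_pos = left_n = 0
    for i in range(len(levels) - 1):
        left_pos += counts[levels[i]][0]
        left_n += counts[levels[i]][1]
        right_pos, right_n = total_pos - left_pos, total_n - left_n
        impurity = (left_n * gini(left_pos, left_n)
                    + right_n * gini(right_pos, right_n)) / total_n
        if best is None or impurity < best[0]:
            best = (impurity, set(levels[:i + 1]), set(levels[i + 1:]))
    return best[1], best[2]

def make_rows(level, no_shows, total):
    """Build synthetic records for one weekday level."""
    return [{"weekday": level, "no_show": 1 if i < no_shows else 0}
            for i in range(total)]

# Made-up counts: weekend departures show far more no-shows.
rows = (make_rows("MON", 2, 20) + make_rows("TUE", 3, 20)
        + make_rows("SAT", 10, 20) + make_rows("SUN", 9, 20))
low_group, high_group = best_binary_grouping(rows, "weekday")
print(sorted(low_group), sorted(high_group))
```

The two groups could then enter the logistic regression as one binary variable instead of four separate levels.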
Case 1: PNR Based No Show Forecast
• Results of the Modeling
• Attributes used in the logistic regression model (in sequence of importance):
1. Connection time between segments in OD
2. Booking time prior to departure
3. Segment part of round trip?
4. Purpose of trip
5. Grouped board point airports
6. Return portion of trip?
7. PNR split indicator
8. Number of segments in trip
9. Origin region
10. Flight time (segment) / Flight time (OD)
11. Number of scheduled flights per week
12. Destination region
13. More than one airline used in OD
14. Booking class
15. Number of passengers in PNR
Case 1: PNR Based No Show Forecast
• Logistic Regression Results (test sample)
[Figure: no-show frequency (%, axis from 0 to 70) by probability class (%, classes from 0-2 up to 40-50)]
Comparison of observed no-show frequencies with the no-show probabilities estimated by the logistic regression model, by probability class.
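A comparison of this kind can be computed by grouping the test records into probability classes and comparing the mean estimated probability with the observed no-show frequency in each class. The sketch below assumes that approach; the class edges and the tiny data set are invented for illustration and are not the results shown on the slide.

```python
def calibration_table(predicted, observed, class_edges):
    """Compare estimated and observed no-show rates per probability class.

    Returns one tuple per non-empty class:
    (lower edge, upper edge, count, mean predicted, observed frequency).
    A class covers lower <= p < upper.
    """
    table = []
    for lower, upper in zip(class_edges[:-1], class_edges[1:]):
        idx = [i for i, p in enumerate(predicted) if lower <= p < upper]
        if not idx:
            continue
        mean_pred = sum(predicted[i] for i in idx) / len(idx)
        obs_freq = sum(observed[i] for i in idx) / len(idx)
        table.append((lower, upper, len(idx), mean_pred, obs_freq))
    return table

# Invented model outputs and outcomes (1 = no-show) for eight segments.
predicted = [0.01, 0.01, 0.05, 0.05, 0.05, 0.05, 0.30, 0.30]
observed = [0, 0, 0, 0, 0, 1, 1, 0]
edges = [0.0, 0.02, 0.10, 0.50]

table = calibration_table(predicted, observed, edges)
for lower, upper, n, mean_pred, obs_freq in table:
    print(f"{lower:.2f}-{upper:.2f}  n={n}  "
          f"predicted={mean_pred:.3f}  observed={obs_freq:.3f}")
```

A well-calibrated model shows observed frequencies close to the mean predicted probability in every class.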
Deployment of Forecasted No Show rates
[Diagram] Interface from Revenue Management System (RMS) → Flight selection / Filter → No-Show Forecast (inputs: PNR Data from Booking DW and RES, Schedule Info) → No-Show forecast handed back to RMS
• Data received via the interface from the RMS:
  • Flight level: ALC, FLN, DEP Date/Time
  • Comp. level: Phys. Cap.
  • RBD level: Seats Sold, Constr. Demand Forecast
• Timing points: T1 and T2 (inputs: PNR Data, Schedule Info), T3: Hand back to RMS, T4: Interface to RMS
• Timing constraint: T4 > T3 ≥ max(T1, T2) + processing time
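The timing constraint can be checked mechanically. The sketch below assumes T1 and T2 are the times at which the two inputs become available; the slide does not spell this out, and all timestamps and durations are hypothetical.

```python
from datetime import datetime, timedelta

def earliest_handback(t1, t2, processing_time):
    """Earliest T3 allowed by T3 >= max(T1, T2) + processing time."""
    return max(t1, t2) + processing_time

def schedule_is_feasible(t1, t2, t3, t4, processing_time):
    """Check T4 > T3 >= max(T1, T2) + processing time."""
    return t4 > t3 >= earliest_handback(t1, t2, processing_time)

# Hypothetical nightly run: inputs arrive at 02:00 and 03:30.
t1 = datetime(2000, 1, 10, 2, 0)
t2 = datetime(2000, 1, 10, 3, 30)
processing = timedelta(minutes=45)

t3 = earliest_handback(t1, t2, processing)
t4 = t3 + timedelta(minutes=15)
print(t3, schedule_is_feasible(t1, t2, t3, t4, processing))
```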
Case 2: Group Show Up Forecast
• Background Information
  • What is a Utilisation Rate? (DCP: Data Collection Point) [definition figure not preserved]
  • How are Utilisation Rate forecasts currently made? Applicable records in history are selected based on customer type, group type, agency, DOW, market O&D, POS, period, days prior...
Case 2: Group Show Up Forecast
The Life of a Group Request: a minority of requested groups survives
[Figure: bar chart of all requested and accepted groups; x-axis: Completely Cancelled Group? (No / Yes); y-axis: Count (0 to 50,000). Bar segments: Group ‘Survived’, all seats filled; Group ‘Survived’, not all seats filled; Completely Cancelled Groups]
Case 2: Group Show Up Forecast
• Utilisation Rates depend on the view!
[Figure: observed Utilisation Rate by Group Type (A = Ad-hoc, S = Series), in two panels: All Requested and Accepted Groups; Only Surviving Groups]
Case 2: Group Show Up Forecast
• Ideally, forecasting Utilisation Rates is composed of two steps:
  • STEP 1: Forecast whether a Group survives or not (at time of request).
  • STEP 2: Forecast whether a seat is filled if the Group survives.
[Figure: forecast and observed Utilisation Rate by Origin Region (AFRICA, AMERICAS, BENELUX, EU EAST, EU NORTH, EU SOUTH, FEST, MEST, SWIDO), both steps combined]
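The slide does not state how the two steps are combined. A natural reading, assumed here, is that a seat of a cancelled group is never filled, so the combined utilisation rate is the product of the two step probabilities. The function names and numbers are hypothetical.

```python
def combined_utilisation(p_survive, p_seat_filled_given_survive):
    """Utilisation rate at time of request from the two steps.

    Assumes seats of cancelled groups contribute zero, so
    P(seat filled) = P(group survives) * P(seat filled | survives).
    """
    return p_survive * p_seat_filled_given_survive

def expected_seats(seats_requested, p_survive, p_seat_filled_given_survive):
    """Expected number of seats actually used by a requested group."""
    return seats_requested * combined_utilisation(
        p_survive, p_seat_filled_given_survive)

# Made-up request: 20 seats, 50% survival chance, 80% fill rate if it survives.
rate = combined_utilisation(0.5, 0.8)
seats = expected_seats(20, 0.5, 0.8)
print(rate, seats)
```

This also shows why the rate depends on the view: looking only at surviving groups corresponds to STEP 2 alone and gives a higher figure than the combined rate.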
Case 2: Group Show Up Forecast
• Data Mining Methodologies
  • Combined use of Decision Tree algorithms and Logistic Regression (as before)
  • Since the purpose of the project was not to identify a predictive model but to better understand the factors influencing the Group Show Up rate, we only drew a training sample.
Case 2: Group Show Up Forecast
Attributes used in the logistic regression model (in sequence of importance). This model forecasts the utilisation rate at the time of request!
• Period (0-23, half-months, expressing seasonality)
• Origin Region (at time of request)
• Seats Requested on Master PNR (in categories)
• Booking Region (from axsWizard, at time of request)
• Customer Type (leisure, shareholders, etc.)
• Reservation System (1A, 1G, Others)
• Days prior (in classes: 0-18, 19-33, 34-47, 48-59, 60-73, 74-155, 156-244, >244)
• Destination Region (at time of request)
• Group Type (Ad-hoc or series)
• Day Of Week (0-6)
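Two of these attributes are simple encodings that can be sketched directly. The days-prior classes use the boundaries listed above, with the class labels (LE_18 ... X_245) that appear on the days-prior chart. The half-month period is assumed here to switch after the 15th of each month; the slides only give the range 0-23, so that cut-off is an assumption.

```python
from bisect import bisect_left
from datetime import date

# Upper bounds of the days-prior classes; anything above 244 is X_245.
UPPER_BOUNDS = [18, 33, 47, 59, 73, 155, 244]
LABELS = ["LE_18", "LE_33", "LE_47", "LE_59",
          "LE_73", "LE_155", "LE_244", "X_245"]

def days_prior_class(days_prior):
    """Map days between request and departure to its class label."""
    if days_prior < 0:
        raise ValueError("days_prior must be non-negative")
    return LABELS[bisect_left(UPPER_BOUNDS, days_prior)]

def half_month_period(departure):
    """Half-month index 0-23 (assumed cut after the 15th of the month)."""
    return (departure.month - 1) * 2 + (0 if departure.day <= 15 else 1)

print(days_prior_class(18), days_prior_class(19), days_prior_class(300))
print(half_month_period(date(2000, 1, 1)), half_month_period(date(2000, 12, 31)))
```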
Case 2: Group Show Up Forecast
Attribute: Seats requested on Master PNR (in categories, percentiles: 0-10, 11-20, 21-30, >30)
• Higher Show-Up rates for smaller requested Groups
[Figure: forecast and observed Utilisation Rate for all groups by Number of Seats Requested on Master PNR (PERC_25, PERC_50, PERC_75, PERC_Hi); panels: STEP1, STEP2, Both Steps Combined]
Case 2: Group Show Up Forecast
Attribute: Origin Region
• Show-Up rates differ by Booking Region
[Figure: forecast and observed Utilisation Rate for all groups by Origin Region (AFRICA, AMERICAS, BENELUX, EUEAST, EUNORTH, EUSOUTH, FEST, MEST, SWIDO); panels: STEP1, STEP2, Both Steps Combined]
Case 2: Group Show Up Forecast
Attribute: Number of days prior to departure at which a Group Request was made
• Late requests result in higher Show Up figures
[Figure: forecast and observed Utilisation Rate for all groups by Days Prior (LE_18, LE_33, LE_47, LE_59, LE_73, LE_155, LE_244, X_245); panels: STEP1, STEP2, Both Steps Combined]
Days Prior is given in categories: LE_18 means that the request took place 18 days or less before departure.
Case 2: Group Show Up Forecast
• Hint for further improvement: consider changes in the shells to produce better forecasts
[Figure: Utilisation Rate (axis from 0 to 80) by number of times a Master PNR is split (0 to 8)]
Conclusions
• A Data Mining process provides a structured way of analyzing data with the final purpose of making more accurate forecasts (as in the two presented cases).
• Data Mining is well suited to extracting more business understanding from large data sets. Often the most important result of a data mining study is not a model but knowledge of the influencing factors.
• By combining Data Mining methodologies, data can be optimally prepared for modeling. Apart from Decision Trees and Logistic Regression, we often use visualisation techniques, Bayesian networks, Neural networks, Self Organizing Maps, etc.
• We have used Data Mining successfully for a wide range of projects: CRM (Segmentation of FFP Members, FFP Tier Level Scenario Calculations, Customer Value Prediction), Performance Measurement of Revenue Management Systems (based on Wizard), Clustering of Booking Curves, Identification of Airport Catchments, Yield Monitoring, and E-business applications.