Challenges in Business Analytics: An industrial research perspective. 30 October 2008. Eleni Pratsini, Manager, Mathematical and Computational Sciences, IBM Zurich Research Laboratory. Alexis Tsoukiàs. Fred Roberts. Overview. Business challenges Analytics opportunities
Challenges in Business Analytics: An industrial research perspective • 30 October 2008 • Eleni Pratsini, Manager, Mathematical and Computational Sciences, IBM Zurich Research Laboratory • Alexis Tsoukiàs • Fred Roberts
Overview • Business challenges Analytics opportunities • Need for advanced analytics in a resource constrained world • Technical challenges in a world of increasing complexity (see slide 8 for information) • Data availability – a wonder and a curse • Optimization under data uncertainty • Parallel computing and optimization
A couple of slides here describing the business environment that leads to the need for advanced analytics • charts from some of our studies showing the trends
Analytics Landscape (competitive advantage plotted against degree of complexity; highest complexity first) • Analytics • Stochastic Optimization: How can we achieve the best outcome including the effects of variability? • Optimization: How can we achieve the best outcome? • Predictive modeling: What will happen next? • Simulation: What will happen if … ? • Forecasting: What if these trends continue? • Reporting • Alerts: What actions are needed? • Query/drill down: What exactly is the problem? • Ad hoc reporting: How many, how often, where? • Standard Reporting: What happened? • Based on: Competing on Analytics, Davenport and Harris, 2007
Analytics Landscape (competitive advantage plotted against degree of complexity; highest complexity first) • Prescriptive • Stochastic Optimization: How can we achieve the best outcome including the effects of variability? • Optimization: How can we achieve the best outcome? • Predictive • Predictive modeling: What will happen next? • Simulation: What will happen if … ? • Forecasting: What if these trends continue? • Descriptive • Alerts: What actions are needed? • Query/drill down: What exactly is the problem? • Ad hoc reporting: How many, how often, where? • Standard Reporting: What happened? • Based on: Competing on Analytics, Davenport and Harris, 2007
Outline of slides that follow • Intro on the increasing complexity of the problems we want to address (slides 9-10) • Use a project in pharmaceutical manufacturing to introduce some of the technical challenges we face: • Introduce / motivate the problem (slides 11-12) • Structure / architecture of the analysis (slides 13-14; here I can highlight the need to work with other functional groups) • Data integration (I will talk about it on slide 14; I am looking for some of the chaotic spreadsheets we had to use to extract the information, but they are confidential so I need to check on this) • Effect of uncontrolled data collection (slide 15), leading into: • Use of techniques from Decision Theory / AI (slide 16 is taken from an article by Fred; this can come up in the discussion by Fred & Alexis) • Dealing with missing, inconsistent, incomplete data etc. (slides 17-19; more slides if needed) • Use expert elicitation techniques • Method of aggregation • Optimization model & optimization under uncertainty (slides 20-37; condense if needed) • Was the client (or the situation?) ready for the analysis (slides 38-39)? In the end they simply simulated predefined scenarios! • Slide 40: some of the not-so-technical challenges we have in “selling” our work. This will basically be a discussion; I want to find a nice cartoon on this • Slides 41-42: a description of one of our most successful projects and some discussion of why it was successful • Finally, if appropriate, I can present some exploratory research on using parallel processing in optimization (currently we have better results for MINLP than MIP)
Challenges in a world of increasing complexity • Advances in computer power bring capabilities (and expectations) for solving larger problem sizes • Problem decomposition not always obvious • Sequential optimization • Exact vs approximate solution or mixed • Hybrid methods • Information Explosion • There is a lot of data (too much!) but how about data quality? • Not enough of the “right” data • Missing values • Outdated information • Data in multiple formats • Can our optimization models handle the information?
Example: Risk Based Approach to Manufacturing ‘FDA’s GMPs for the 21st Century – A Risk-Based Approach’ • Manage risk to patient safety – “Critical to Quality” • Apply science in development and manufacturing • Adopt systems thinking – build quality in *Horowitz D.J. Risk-based approaches for GMP programs, PQRI Risk management Workshop, February 2005
Risk Management and Optimization • Phase 1: Identify Risk Sources. Using the company’s knowledge, industry expertise, and industry guidelines & regulations, we identify the sources of risk that need to be considered in the analysis. • Phase 2: Quantify Risk Values. Based on historical data, industry expertise and other sources of causal relationships, we set the causal link strengths (conditional probabilities) and develop a model for risk quantification. • Phase 3: Optimal Transformation. Given the risk values and the financial and physical considerations, we determine the best sequence of corrective actions for minimizing exposure to risk and optimizing a given performance metric. • Inputs shown in the figure: data, industry, client, experts
Refactoring Model (MIP formulation, solved by an optimisation engine in an iterative process) • Inputs: Business Case, Hazards, Constraints, Mitigation Actions, Current State • Sets: Risk Parameters, Cost Parameters, Revenue Parameters • Decision Variables: Risks Status • Objective Function: Total Rev – Mitigation_Costs – Rev@Risk • Outputs: Optimal RouteMap, New State
Overview of Approach (envisioning supported by the Re-factoring model) • Stages: Data Collection • Establish Risk Profile • Define Scenarios • Implementation Strategy • Inputs to the Refactoring Model: Products, Systems, Technology • Output: Route map • Steps shown in the figure: Collect key data • Review cost base of sites • Product/Asset configuration • Calculate revenue profile • Assess readiness for change • Test scenarios: prioritise on impact on Revenue Risk, and Profitability • Adapt scenarios • Select scenario, define implementation strategy • Optimise route-map for implementation based on impact on: Revenue @ Risk profile, Investment Strategy, Impact of Change • Review route maps, select final scenario options
Effect of uncontrolled data collection • A pharmaceutical firm is assessing the risk of two of its sites, one in Country A and one in Country X. Engineers at both sites are asked to collect data on non-compliant and defective batches from the last two years. Analysis of the data indicates that site A has 50% more defective / non-compliant batches than site X. The firm deduces that it should move its manufacturing processes to X or drastically change its operations at A. • Just before launching an audit of site A, the firm receives a letter from the FDA mentioning many recalls of batches, all coming from site X. • Further investigation indicates that site X does not report the many defects it notices every week, and that it is in fact the riskier site. • Lack of observations does not mean no events; it means no reporting of events. • Extract knowledge from non-statistical sources, e.g. experts
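The reporting effect in this story can be made concrete with a small numeric sketch. All numbers below are hypothetical: they are chosen only so that site A shows 50% more reported defects than site X, and the reporting rates stand in for what expert elicitation might provide.

```python
# Hypothetical illustration: the defects that reach the data set depend on the
# true defect count AND on how reliably each site reports. None of these
# numbers come from the project; they are assumptions that mirror the story.
true_defects = {"A": 30, "X": 50}        # assumed true defective batches
reporting_rate = {"A": 1.0, "X": 0.4}    # assumed fraction of defects reported

def observed(site):
    """Defects that actually appear in the collected data."""
    return round(true_defects[site] * reporting_rate[site])

def corrected(site):
    """Estimate of the true defects once a reporting rate is known or elicited."""
    return observed(site) / reporting_rate[site]

obs_a, obs_x = observed("A"), observed("X")
print("reported:", obs_a, obs_x)                      # naive comparison
print("corrected:", corrected("A"), corrected("X"))   # after correction
```

On the raw counts site A looks worse; once the assumed reporting rates are applied, the ranking reverses, which is the point of the slide.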
Introduce Decision Theoretic / AI approaches to tackle Business Optimization problems* • Decision Theoretic methods of consensus • Combining partial information to reach decisions • Algorithmic decision theory • Sequential decision making models and algorithms • Graphical models for decision making • AI and computer-based reasoning systems • Information management: representation and elicitation • Aggregation • Decision making under uncertainty or in the presence of partial information *Computer Science and Decision Theory, Fred Roberts, DIMACS Center, Rutgers University
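One simple way to aggregate expert judgements is a weighted linear opinion pool. The sketch below is an illustration only: the slides do not say which aggregation method was used, and the estimates and weights are hypothetical.

```python
def linear_opinion_pool(estimates, weights):
    """Weighted average of expert probability estimates.

    `estimates` are the experts' probabilities for the same event and
    `weights` express how much each expert is trusted (both hypothetical).
    """
    if len(estimates) != len(weights):
        raise ValueError("need exactly one weight per estimate")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(w * p for p, w in zip(estimates, weights)) / total

# Three hypothetical experts estimate the probability of a quality failure.
pooled = linear_opinion_pool([0.02, 0.03, 0.05], [0.5, 0.3, 0.2])
print(pooled)
```

A linear pool is easy to explain to a client, but it averages away disagreement; other consensus methods may suit cases where experts hold conflicting partial information.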
Cause Effect Analysis • Technology, product and system failure are all linked via common root causes. The risk model starts from building a dependency graph of all these root causes. • Inputs: • Industry Expertise • Client Constraints • Past Experiences • Different sub-models created: • Dosage form variability • Employee execution • Process control • Technology level
Based on location, technologies or any previously identified components involved in the drug production process, the network model computes an estimate of the quality risk. Changing the way a drug is produced will (a priori) induce a change in the quality risk estimate. Site A Quality failure: 2.4% Optimization model Site B Quality failure: 1.9%
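A network model of this kind can be sketched as a tiny dependency graph with conditional probabilities. The structure (two root causes feeding one failure node) and every probability below are assumptions made for illustration; they are not the project's model or data.

```python
from itertools import product

def failure_probability(p_cause, cpt):
    """Marginal P(quality failure), obtained by enumerating root-cause states.

    p_cause: probability that each root cause is active (assumed independent).
    cpt: P(failure | states), keyed by a tuple of booleans in p_cause order.
    """
    causes = list(p_cause)
    total = 0.0
    for states in product([False, True], repeat=len(causes)):
        weight = 1.0
        for cause, active in zip(causes, states):
            weight *= p_cause[cause] if active else 1.0 - p_cause[cause]
        total += weight * cpt[states]
    return total

# Hypothetical root causes and causal link strengths.
p_cause = {"process_control": 0.10, "employee_execution": 0.20}
cpt = {
    (False, False): 0.01,
    (True, False): 0.05,
    (False, True): 0.04,
    (True, True): 0.20,
}

baseline = failure_probability(p_cause, cpt)
# Changing how the drug is produced is modeled as a lower chance of a
# process-control problem, which changes the quality risk estimate.
improved = failure_probability({**p_cause, "process_control": 0.05}, cpt)
print(baseline, improved)
```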
Risk Exposure • Performance measure for exposure to risk: (Insurance premium) • Risk value:
Formulation MAX ST Capacity, Risk Transfer State, Level Remediation
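As a toy illustration of a model with an objective of the form Total Rev – Mitigation_Costs – Rev@Risk under a capacity constraint, the sketch below enumerates every product-to-site assignment. The products, revenues, capacities and transfer cost are hypothetical; only the 2.4% and 1.9% failure estimates echo the example sites. A real instance would go to a MIP solver rather than brute force.

```python
from itertools import product

revenue = {"P1": 100.0, "P2": 60.0, "P3": 40.0}   # hypothetical revenue per product
current = {"P1": "A", "P2": "A", "P3": "B"}       # hypothetical current sites
fail_prob = {"A": 0.024, "B": 0.019}              # quality-failure estimates per site
capacity = {"A": 2, "B": 2}                       # hypothetical max products per site
move_cost = 0.2                                   # hypothetical cost of one transfer

def objective(assign):
    """Total revenue minus mitigation (transfer) costs minus revenue at risk."""
    total_rev = sum(revenue.values())
    mitigation = sum(move_cost for p in assign if assign[p] != current[p])
    rev_at_risk = sum(revenue[p] * fail_prob[assign[p]] for p in assign)
    return total_rev - mitigation - rev_at_risk

def solve():
    """Enumerate all assignments and keep the best one that respects capacity."""
    products = list(revenue)
    best = None
    for sites in product(capacity, repeat=len(products)):
        if any(sites.count(site) > capacity[site] for site in capacity):
            continue  # capacity constraint violated
        assign = dict(zip(products, sites))
        value = objective(assign)
        if best is None or value > best[0]:
            best = (value, assign)
    return best

best_value, best_assign = solve()
print(best_value, best_assign)
```

With these numbers only the highest-revenue product is worth moving to the lower-risk site, because capacity there is limited and each transfer has a cost.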
Problem characteristics • 17 product families • 3 systems plus one for subcontracting • 14 technology platforms – most available in all systems • Planning horizon: 5 years split into 10 periods of 6 months • Restriction on rate of change • Statistical analysis gave risk values based on historical data • Revenue projections available • Aggregated costs per scenario – later disaggregated • Simulation of given scenarios (ending result); optimization
Optimal sequence of actions for minimizing risk exposure • Panels: Calculate Business Exposure • Example Analysis • Scenario Comparison • [Figure: frequency distribution of business exposure; x-axis values in thousands, y-axis values in 10^-4; mean = 7208; 90% of outcomes between 5.7013 and 8.625 (thousands), with 5% in each tail]
Data Variability • The consideration of the frequency distribution of each solution is essential when there is uncertainty in the input parameters • Robust Optimization techniques to determine the solution that is optimal under most realizations of the uncertain parameters
Possible Approaches • Stochastic Programming • Average Value certainty equivalent • Chance constraints (known CDF – continuous) • Robust Optimization • Soyster (1973) • Bertsimas & Sim (2004) • CVaR • Rockafellar & Uryasev (2000)
Modeling with the approach of Bertsimas and Sim • Formulation adjusted to incorporate a constraint that could be violated with a certain probability: • Uncertain parameters have unknown but symmetric distributions • Uncertain parameters take values in bounded intervals: • Uncertain parameters are independent
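For reference, the interval model and robust counterpart are usually written as follows in Bertsimas & Sim (2004). The notation here (nominal value \bar a_{ij}, maximum deviation \hat a_{ij}, index set J_i of uncertain coefficients in row i, protection level \Gamma_i) is assumed for this sketch and may differ from the notation on the original slide.

```latex
% Constraint i with uncertain coefficients taking values in bounded intervals
\[
\sum_{j} \tilde a_{ij} x_j \le b_i,
\qquad
\tilde a_{ij} \in [\bar a_{ij} - \hat a_{ij},\ \bar a_{ij} + \hat a_{ij}]
\quad (j \in J_i)
\]

% Robust counterpart: protect against up to \Gamma_i coefficients deviating
\[
\sum_{j} \bar a_{ij} x_j
+ \max_{\{S \cup \{t\} \,:\, S \subseteq J_i,\ |S| = \lfloor \Gamma_i \rfloor,\ t \in J_i \setminus S\}}
\Big\{ \sum_{j \in S} \hat a_{ij} |x_j|
      + (\Gamma_i - \lfloor \Gamma_i \rfloor)\, \hat a_{it} |x_t| \Big\}
\le b_i
\]

% Equivalent linear form with auxiliary variables z_i, p_{ij}, y_j
\[
\sum_{j} \bar a_{ij} x_j + z_i \Gamma_i + \sum_{j \in J_i} p_{ij} \le b_i,
\qquad
z_i + p_{ij} \ge \hat a_{ij} y_j,
\qquad
-y_j \le x_j \le y_j,
\qquad
z_i,\ p_{ij},\ y_j \ge 0
\]
```

Setting \Gamma_i = 0 recovers the nominal constraint, and \Gamma_i = |J_i| recovers the fully conservative formulation of Soyster (1973).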
Bertsimas & Sim cont’d • Provided guarantee: • Example Values of Γ:
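The guarantee is commonly quoted in the following form, valid under the assumptions of independent, symmetric and bounded uncertain coefficients. The symbols (optimal robust solution x^*, index set J_i, target violation probability \varepsilon) are assumed notation for this sketch; the paper also gives tighter bounds that lead to smaller values of \Gamma_i.

```latex
% Probability that constraint i is violated by the robust solution x^*
\[
\Pr\Big( \sum_{j} \tilde a_{ij} x_j^{*} > b_i \Big)
\le \exp\!\Big( -\frac{\Gamma_i^{2}}{2\,|J_i|} \Big)
\]

% Protection level that keeps the violation probability below \varepsilon
\[
\Gamma_i \ge \sqrt{\,2\,|J_i|\,\ln(1/\varepsilon)\,}
\]
```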
Application to pharma example • Robust Formulation:
Observations • Provides solutions that are much more conservative than expected • Plans satisfying risk requirement are discarded by the optimization - suboptimal solutions • Adjustments to calculation of parameters to improve results • Extreme distribution • Consider the number of expected basic variables and not the total number of variables affected by uncertain coefficients • e.g. production of 100 products, 3 possible sites, uncertainty in capacity absorption coefficient: • 300 possible variables (value of n) • each product can only be produced on one site i.e. only 100 variables > 0 (use this value for the calculation of Γ)
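The effect of the choice of n on Γ can be checked numerically. The sketch below assumes the exponential bound exp(-Γ²/(2n)) from Bertsimas & Sim (2004) and a hypothetical 5% violation target; the slides do not state which bound or target was actually used.

```python
import math

def violation_bound(gamma, n):
    """Upper bound on the probability of constraint violation (assumed bound)."""
    return math.exp(-gamma ** 2 / (2 * n))

def gamma_for(epsilon, n):
    """Smallest protection level whose bound is at most epsilon, capped at n."""
    return min(float(n), math.sqrt(2 * n * math.log(1 / epsilon)))

epsilon = 0.05                            # hypothetical target violation probability
gamma_all = gamma_for(epsilon, 300)       # n = all variables with uncertain coefficients
gamma_basic = gamma_for(epsilon, 100)     # n = variables expected to be nonzero
print(gamma_all, gamma_basic, gamma_all / gamma_basic)
```

Because Γ grows with the square root of n under this bound, using 100 instead of 300 reduces the protection level by a factor of √3, which makes the resulting plan less conservative.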
Bertsimas & Sim approach applied to: • Pharma Regulatory Risk: binary variables • Production – Distribution: continuous variables
Similar work • Robust Optimization Models for Network-based Resource Allocation Problems, Prof. Cynthia Barnhart, MIT • Robust Aircraft Maintenance Routing (RAMR): minimizing expected propagated delay to get the schedule back on track (taken from Lan, Clarke and Barnhart, 2005) • Among multiple optimal solutions, no incentive to pick the solution with the “highest” slack • Relationship of Γ (per constraint) to overall solution robustness? • Comparison with Chance Constrained Programming • Explicitly differentiates solutions with different levels of slack • Extended Chance Constrained Programming • Introduce a “budget” constraint setting an upper bound on acceptable delay • Improved results
Value at Risk & Conditional Value at Risk • VaR • Maximum loss with a specified confidence level • Non sub-additive • Non-convex • CVaR • β-quantile α_β(x) (β = 95%) • E[f(x) | f(x) > α_β(x)] • Advantages • Sub-additive • Convex • [Figure: density of f(x), marking the β-quantile α_β(x), the tail of probability 1 − β, and CVaR(x)]
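The two measures can be estimated from a sample of losses as sketched below. This is a minimal empirical estimator, assuming equally likely scenarios; the function name and the test data are illustrative and not taken from the slides.

```python
import math

def var_cvar(losses, beta=0.95):
    """Empirical VaR (the beta-quantile) and CVaR (mean loss beyond it)."""
    xs = sorted(losses)
    n = len(xs)
    # Index of the beta-quantile; the rounding guards against floating-point noise.
    k = max(0, math.ceil(round(beta * n, 9)) - 1)
    var = xs[k]
    tail = [v for v in xs if v > var]             # losses strictly beyond VaR
    cvar = sum(tail) / len(tail) if tail else var
    return var, cvar

# Hypothetical equally likely losses 1, 2, ..., 100.
var_95, cvar_95 = var_cvar(range(1, 101), beta=0.95)
print(var_95, cvar_95)
```

CVaR is never smaller than VaR at the same confidence level because it averages only the losses beyond the quantile.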
CVaR in Math Programming: Linear Program with Random Coefficients
Reformulation of Rockafellar & Uryasev (2000) • Good news: Fully linear reformulation, so presentable to any LP solver • Bad news: New variables, and New constraints = O (Number of Samples) • Reformulated LP has grown in both dimensions! • Large number of CVaR constraints & large number of samples => Very Large LP Reformulation Linear Reformulation
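For reference, the sample-based reformulation of Rockafellar & Uryasev (2000) is usually written as below. The notation (loss function f, scenarios y_1, ..., y_N, auxiliary variables u_k, CVaR limit c) is assumed for this sketch and may differ from the notation on the original slide.

```latex
% Sample approximation of CVaR at confidence level \beta
\[
\mathrm{CVaR}_\beta(x) \approx \min_{\alpha}\ F_\beta(x, \alpha),
\qquad
F_\beta(x, \alpha) = \alpha + \frac{1}{(1 - \beta)\, N}
  \sum_{k=1}^{N} \big[ f(x, y_k) - \alpha \big]^{+}
\]

% A constraint CVaR_\beta(x) \le c becomes, with one new variable u_k per scenario:
\[
\alpha + \frac{1}{(1 - \beta)\, N} \sum_{k=1}^{N} u_k \le c,
\qquad
u_k \ge f(x, y_k) - \alpha,
\qquad
u_k \ge 0,
\qquad k = 1, \dots, N
\]
```

When f(x, y_k) is linear in x for each scenario, every constraint above is linear. Each CVaR constraint adds N + 1 variables and N scenario constraints, which is the O(number of samples) growth in both dimensions noted on the slide.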
Observations • How many scenarios are necessary for a good estimation of CVaR(α, x), and thus for meaningful results? • >20,000 scenarios necessary for stable results (Künzi-Bay A. and J. Mayer, “Computational aspects of minimizing conditional value-at-risk”, NCCR FINRISK Working Paper No. 211, 2005) • We can handle only up to 300 scenarios because of computational complexity
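The sensitivity of the CVaR estimate to the number of scenarios can be seen in a small experiment. The sketch assumes standard normal losses and a fixed random seed purely for illustration; it does not reproduce the cited study, and the spreads it prints are specific to these assumptions.

```python
import math
import random
import statistics

def empirical_cvar(losses, beta=0.95):
    """Mean of the losses beyond the empirical beta-quantile."""
    xs = sorted(losses)
    k = math.ceil(round(beta * len(xs), 9))   # samples at or below the quantile
    tail = xs[k:] or xs[-1:]
    return sum(tail) / len(tail)

def estimator_spread(n_scenarios, repetitions=30, seed=0):
    """Standard deviation of the CVaR estimate across repeated samples."""
    rng = random.Random(seed)
    estimates = [
        empirical_cvar([rng.gauss(0.0, 1.0) for _ in range(n_scenarios)])
        for _ in range(repetitions)
    ]
    return statistics.pstdev(estimates)

spread_300 = estimator_spread(300)        # scenario count we can handle
spread_20000 = estimator_spread(20000)    # scenario count suggested for stability
print(spread_300, spread_20000)
```

With 300 scenarios only 15 observations fall in the 5% tail, so the estimate moves noticeably from sample to sample; with 20,000 scenarios the tail holds 1,000 observations and the estimate is far more stable.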
Recent advances in CVaR • Künzi-Bay & Mayer (2006) presented a 2-stage interpretation for the case where CVaR appears only in the objective function; their algorithm CVaRMin led to an order-of-magnitude speed-up • Interesting work on Robust Optimization techniques & CVaR • David B. Brown, Duke University: • Risk and Robust Optimization (presentation) • Constructing uncertainty sets for robust linear optimization (with Bertsimas) • Theory and Applications of Robust Optimization (with Bertsimas and Caramanis) • P. Huang and D. Subramanian, “Iterative Estimation Maximization for Stochastic Programs with Conditional-Value-at-Risk Constraints”, IBM Technical Report RC24535, 2008 • They exploit a different linear reformulation, motivated by order statistics, which leads to a new and efficient algorithm • The resulting algorithm addresses CVaR in both the objective function and the constraints
The fastest traveling salesman solution* *http://xkcd.com/
Scenario Comparison Practical Analysis …. At the end the client wanted to simulate his various scenarios!
Challenges in selling our work • Need user friendly interfaces • Difficulty in communicating with the client (we speak different languages) • We often build the world’s most expensive calculators (i.e. our sophisticated tools are not used to their fullest capabilities) • IBM specific: we have better acceptability internally
Inventory Optimization Tool • Successful example of a tool developed over the course of 10 years • Strong software architecture • GUI • Interactive optimization / simulation algorithms • Adaptive modeling • Continuously being evolved based on new client projects
Adaptive model • Inventory Optimization (DIOS) Change in parameters, franco limit, logistics requirements, … Continuous forecasting & adaptive time bucketing Model adaptation
Example of MINLP and the use of parallel processing • Present a few slides showing some exploratory results in parallel processing of MINLP (with a bit of explanation of why the linear case does not work as well with parallel processing, while the nonlinear case works better)