COM3: Data-driven optimization via search heuristics, Day 1. Dr. Richard Allmendinger (richard.allmendinger@manchester.ac.uk). The Jyväskylä Summer School, 7th – 18th August, 2017.
Who am I? • Name: Richard Allmendinger • Current position: Lecturer in Data Science at the Alliance Manchester Business School, University of Manchester (UoM), Manchester, UK • Prior appointments and education: • Diplom in Business Engineering at KIT • Research visit at RMIT • PhD in Computer Science at UoM • Postdoc at the Biochemical Engineering Department at UCL
Who am I? Research: Development and application of search heuristics and machine learning techniques to real-world problems, including • Manufacturing process design • Resource allocation problems • Sequential experimentation • Bidding for product ads • Healthcare analytics • Risk analysis • Pattern recognition in music • Football analytics
My ancestry and early days [Figure labels: Tashkent, Berlin Wall, Swabian Alb, Karlsruhe, WW2]
And now tell us something about yourself • Name • Where and what you study/work on • Your expectations from this module/topics you hope to be covered
Agenda for today Morning session • Scope of COM3: Data-driven optimization via search heuristics • Whetting your appetite • Course goals • Course assessment • Timetable • Optimization basics Afternoon session • Introduction to assignment • Starting assignment in groups
Any idea how to solve these problems? Example problems • Imagine a very good friend from Australia visits you and wants to visit all 485 breweries in London during her one-week stay. • Is this feasible? If yes, which route should she take? • The shortest route certainly helps • Imagine you are going on a 2-week trip to California and need to decide which items to pack in your limited-size backpack. Each item has a weight and a utility. • Is it feasible to pack all items? If not, how do you decide which items to pack? • Ranking your items in terms of weight and/or utility may help • What if you have data about your past trips and London weather?
How to solve these problems? • Many approaches are available • Full enumeration (brute force) probably not possible • You may be able to eliminate certain partial tours or items through careful reasoning and data analysis/integration • Another intuitive approach: start with some good guess and then try to improve it iteratively The latter is an example of a heuristic approach to optimization
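The "start with a guess and improve it iteratively" idea can be sketched as a simple hill climber. This is only an illustrative sketch, not the course's reference algorithm: the backpack items, the capacity, and the helper names (`hill_climb`, `flip_neighbours`, `pack_score`) are all hypothetical, and the result carries no guarantee of optimality.

```python
# Minimal iterative-improvement (hill-climbing) sketch for a backpack problem.
# All data and names below are hypothetical, chosen only for illustration.

ITEMS = [(3, 4), (4, 5), (2, 3), (5, 6)]  # assumed (weight, utility) pairs
CAPACITY = 7                               # assumed backpack capacity


def pack_score(selection):
    """Total utility of the selected items, or -1 if the pack is too heavy."""
    weight = sum(w for (w, _), chosen in zip(ITEMS, selection) if chosen)
    utility = sum(u for (_, u), chosen in zip(ITEMS, selection) if chosen)
    return utility if weight <= CAPACITY else -1


def flip_neighbours(selection):
    """All selections that differ from `selection` in exactly one item."""
    for i in range(len(selection)):
        yield selection[:i] + (1 - selection[i],) + selection[i + 1:]


def hill_climb(initial, neighbours, score):
    """Move to the first improving neighbour until no neighbour improves."""
    current = initial
    improved = True
    while improved:
        improved = False
        for candidate in neighbours(current):
            if score(candidate) > score(current):
                current = candidate
                improved = True
                break
    return current


start = (0, 0, 0, 0)  # initial guess: an empty backpack
result = hill_climb(start, flip_neighbours, pack_score)
print(result, pack_score(result))
```

The search stops at the first selection that no single flip can improve, which may or may not be the best selection overall.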
How to solve these problems? Optimization refers to choosing the best element from some set of available alternatives Optimization problems… • arise in a wide variety of applications • arise in many different forms, e.g., continuous, combinatorial, multi-objective, stochastic, etc. • we will touch upon many of these in this course • range from easy to hard ones • we focus on the hard ones
…an easy optimization problem Select the 3 most useful items to put in your backpack (a simplified knapsack problem). x indicates the utility of an item; the greater x, the more useful the item. [Figure: items labelled with their utilities; values as extracted: 5, 1, 2, 4, 5, 7, 3, 2, 2, 1] • How many combinations of items exist? • How would you solve this problem?
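Both questions on this slide can be checked with a few lines of code. The sketch below assumes ten items with the utility values that appear in the slide's figure residue; that reading of the figure is an assumption, and the variable names are illustrative only.

```python
from itertools import combinations
from math import comb

# Assumed utilities, read from the garbled figure; treat them as hypothetical.
utilities = [5, 1, 2, 4, 5, 7, 3, 2, 2, 1]
k = 3  # number of items to select

# How many combinations exist? "n choose k".
n_combinations = comb(len(utilities), k)

# Brute force: evaluate every combination and keep the one with the largest sum.
best_by_enumeration = max(combinations(utilities, k), key=sum)

# Shortcut that works here because there is no weight limit:
# simply take the k items with the highest utility.
best_by_sorting = sorted(utilities, reverse=True)[:k]

print(n_combinations, sum(best_by_enumeration), best_by_sorting)
```

Sorting suffices only because every item "costs" the same (one slot); once items have different weights, as on the next slide, this shortcut no longer guarantees the best choice.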
…a more difficult optimization problem Given a set of objects, each with a given weight and value (profit), find a subset of objects whose total weight does not exceed a certain limit (15 kg here) and whose total value is maximal • How many subsets of objects exist? • How would you solve this problem?
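For a handful of objects, the problem can still be solved by enumerating all subsets, which also answers the counting question: n objects have 2^n subsets. The sketch below uses five hypothetical objects (the slide's actual object data is not in the transcript) and the 15 kg limit from the slide.

```python
from itertools import combinations

# Hypothetical (weight in kg, value) pairs; not taken from the slide.
objects = [(12, 4), (2, 2), (1, 2), (1, 1), (4, 10)]
capacity = 15  # weight limit in kg, as stated on the slide


def brute_force_knapsack(objects, capacity):
    """Enumerate every subset; return (best value, best subset, subsets seen)."""
    best_value, best_subset, n_subsets = 0, (), 0
    for size in range(len(objects) + 1):
        for subset in combinations(range(len(objects)), size):
            n_subsets += 1
            weight = sum(objects[i][0] for i in subset)
            value = sum(objects[i][1] for i in subset)
            if weight <= capacity and value > best_value:
                best_value, best_subset = value, subset
    return best_value, best_subset, n_subsets


best_value, best_subset, n_subsets = brute_force_knapsack(objects, capacity)
print(best_value, best_subset, n_subsets)
```

Because the number of subsets doubles with every additional object, this approach stops being practical quickly, which is the motivation for heuristics.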
…an even more difficult optimization problem Find the shortest round trip through some cities Traveling Salesman Problem (TSP) • Assuming you want to visit k cities, how many different round trips exist? • How would you solve this problem?
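As a rough illustration of both questions: with symmetric distances, k cities admit (k-1)!/2 distinct round trips, and a simple constructive heuristic is to always travel to the nearest unvisited city. The sketch below assumes cities given as 2D coordinates with Euclidean distances; the function names are hypothetical, and the nearest-neighbour rule gives a tour quickly but with no guarantee that it is the shortest.

```python
from math import dist, factorial


def count_round_trips(k):
    """Distinct round trips through k >= 3 cities, assuming symmetric distances."""
    return factorial(k - 1) // 2


def nearest_neighbour_tour(cities, start=0):
    """Greedy tour: repeatedly move to the closest city not yet visited."""
    unvisited = [j for j in range(len(cities)) if j != start]
    tour = [start]
    while unvisited:
        here = cities[tour[-1]]
        nxt = min(unvisited, key=lambda j: dist(here, cities[j]))
        unvisited.remove(nxt)
        tour.append(nxt)
    return tour


def tour_length(cities, tour):
    """Length of the closed tour, including the leg back to the start."""
    return sum(
        dist(cities[tour[i]], cities[tour[(i + 1) % len(tour)]])
        for i in range(len(tour))
    )


cities = [(0, 0), (0, 1), (1, 1), (1, 0)]  # hypothetical coordinates
tour = nearest_neighbour_tour(cities)
print(count_round_trips(len(cities)), tour, tour_length(cities, tour))
```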
A more real-world-like problem… TSP is a sub-problem of, e.g., vehicle routing problems (VRPs) • Available data: past trips, customer requests, traffic data, etc. • Data-driven decisions, e.g.: • Consider traffic data for estimating travel time • Derive customer priorities using customer data
Real-world problems • Often involve challenging properties, e.g. in the TSP these could be • Access restrictions, time windows • Fuel and proximity-to-petrol-station considerations • Stochastic travel times, new cities arising • In this module, we will sometimes consider simplified models of real-world problems • Useful for understanding algorithmic concepts • Sufficiently capture the real challenge
A step closer to data-driven optimization… Traditionally • It is assumed that the quality of candidate solutions can be computed using a closed-form function • Models of decision-making under uncertainty assume perfect information, i.e. accurate values for the system parameters and specific probability distributions for the random variables.
A step closer to data-driven optimization… Data-driven optimization • Closed-form functions may not be available • Precise knowledge of models is rarely available in practice • A decision-making strategy based on erroneous inputs might be infeasible or exhibit poor performance when implemented • Data-driven optimization uses historical data and/or observations of the random variables as direct inputs to the optimization problems
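One way to picture "observations as direct inputs" is to score each candidate decision directly on the recorded data rather than on an assumed probability distribution. The sketch below is a hypothetical toy example, not taken from the course: it picks a travel-time budget (in minutes) that minimizes the average cost over a set of made-up observed travel times, where running late is penalized. The data, the cost model, and the names `empirical_cost` and `late_penalty` are all assumptions.

```python
# Hypothetical historical travel times in minutes (made-up data).
observed_times = [30, 32, 35, 31, 50, 33, 34, 45, 30, 36]


def empirical_cost(budget, times, late_penalty=4.0):
    """Average cost over the observations: time budgeted plus a penalty per minute late."""
    total = sum(budget + late_penalty * max(0.0, t - budget) for t in times)
    return total / len(times)


# Candidate decisions: budget between 25 and 60 minutes.
candidates = range(25, 61)

# The observations enter the objective directly; no distribution is fitted.
best_budget = min(candidates, key=lambda b: empirical_cost(b, observed_times))
print(best_budget, empirical_cost(best_budget, observed_times))
```

The quality of the decision then depends on how representative the recorded data is, which connects to the challenges of limited or uncertain data listed later in the deck.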
A step closer to data-driven optimization… An analytics perspective on (data-driven) optimization: • Descriptive analytics “What is happening?” • Querying, Reporting, Data Capturing, Filtering & Analysis • Predictive analytics “What will likely happen, when and why?” • Statistical Methods (Regression), Forecasting & Data Mining • Prescriptive analytics “What should happen?” • Optimization, Simulation, Quantitative Models Prescriptive analytics suggests decision options on how to take advantage of a future opportunity or mitigate a future risk and shows the implication of each decision option
Examples of data-driven problems Imagine you are a civil engineer and need to determine the optimal number and location of electric vehicle charging stations in a city • Data usage: E.g. use mobility data to divide the city into hubs, and then use this information to facilitate decision making Imagine you are a data scientist and need to devise a strategy that assigns computational jobs (i.e. pieces of code to run) to capacity-limited clusters • Data usage: E.g. use past job data (durations and CPU requirements of jobs) to understand the level of uncertainty in jobs
Examples of data-driven problems Imagine you are a data scientist and need to decide how much to bid for your product advert to be shown on e.g. Google Shopping or Amazon • Data usage: E.g. use past bidding data to divide customers into clusters and then use this information to make targeted bids; update your bidding strategy as new data becomes available
Examples of data-driven problems Imagine you are a biochemical engineer and need to decide which drugs to combine from a given drug library to create a new or more potent drug for a certain condition • No closed-form function to evaluate potency of a drug combination: Use real physical experiments to determine potency of drug cocktails
Examples of data-driven problems Imagine you are an aerodynamic engineer and need to optimize the shape of a Formula 1 car to increase corner speed without significantly losing top speed on the straight • No closed-form function to evaluate a new shape of the Formula 1 car: Use real physical experiments and/or complex simulations to evaluate the quality of a new car shape
Challenges in data-driven optimization Typical challenges with (data-driven) optimization problems include • Constraints • Uncertainty about data • Dynamic/online decision making • Multiple conflicting objectives • Limited data/data is expensive to obtain • Huge amount of data • User preferences may need to be accounted for We will cover many of these challenges in this course
Typical questions to ask in optimization • How can we solve the problem? • How long does it take to solve the problem? • Is there a better algorithm or did I find the optimal one?
How to solve (data-driven) optimization problems? • Systematic enumeration (brute force) • Problem specific, dedicated algorithms • Generic methods for exact optimization • Heuristic methods (combined with data pre-processing and/or analytics)
Search heuristics Search heuristics aim to efficiently compute good solutions to a problem, with no guarantee of optimality • range from rather simple to quite sophisticated approaches • inspiration often from • human problem solving • rules of thumb, common sense rules • design of techniques based on problem-solving experience • natural processes • evolution, swarm behaviours, annealing, ... • usually used when there is no other method to solve the problem under given time or space constraints • often simpler to implement / develop than other methods
Goals of this course Provide answers to these questions: • Given a problem statement, how can I derive a formal problem definition and what are the challenging problem features? • Which heuristic methods are available and what are their features? • How can heuristic methods be used to solve difficult problems? • How should heuristic methods be studied and analysed empirically? • How can heuristic algorithms be designed, developed, and implemented?
Data-driven heuristic optimization (DDHO) field [Diagram labels: DDHO, Operations research, Machine Learning, Statistics, Applications]
Lectures and lab sessions The course consists of 4 full days of lectures and lab sessions Lectures: • 3 hours in the morning (9am-12) • A 10min break in between Lunch break (12-1pm) Lab sessions: • 3 hours in the afternoon (1-4pm) • Lab sessions focus on completing the group-based assignment • 5th day (Friday) reserved for completing the assignment (morning session) and group presentations (afternoon session)
Assessment Formative assessment: • Quizzes during lectures • Feedback in lab sessions Summative assessment: • Group-based assignment focussed on developing, implementing and testing algorithms for a real-world optimization problem. • You can use any programming language(s) of your choice • Assessment based on group report on the work carried out + group presentation • Purpose: Experience how to tackle a real-world optimization problem
Course material / literature • Slides are available on the module website • Slides contain references to various literature
Optimization basics covered • What does optimization mean formally? • Types of optimization models • Combinatorial optimization • Problem vs problem instances • Decision problems vs optimization problems • Computational complexity • P vs NP • Global vs local optimization
Mathematical optimization Optimization refers to choosing the best element from some set of available alternatives [Formula not recovered; annotation labels: minimum, smallest] How can we convert a minimization problem into a maximization problem?
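The formula on this slide did not survive extraction. For the conversion question, a standard identity (stated here in assumed notation, with objective f and feasible set X) is that minimizing f is equivalent to maximizing -f: the optimal solutions coincide and the optimal values differ only in sign.

```latex
\min_{x \in X} f(x) \;=\; -\max_{x \in X} \bigl(-f(x)\bigr),
\qquad
\arg\min_{x \in X} f(x) \;=\; \arg\max_{x \in X} \bigl(-f(x)\bigr)
```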
Examples Portfolio optimization • variables: amounts invested in different assets • constraints: budget, max./min. investment per asset, minimum return • objective: overall risk or return variance Manufacturing process optimization • variables: ? • constraints: ? • objective: ? Data fitting • variables: ? • constraints: ? • objective: ? 5min in pairs
Examples Portfolio optimization • variables: amounts invested in different assets • constraints: budget, max./min. investment per asset, minimum return • objective: overall risk or return variance Manufacturing process optimization • variables: equipment sizing, order of unit operations • constraints: manufacturing limits, timing requirements • objective: costs Data fitting • variables: model parameters • constraints: prior information, parameter limits • objective: measure of misfit or prediction error
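The data-fitting example can be made concrete with the simplest case, a straight-line fit by least squares. In this sketch the variables are the two model parameters (slope and intercept), the objective is the sum of squared residuals, and there are no constraints. The data points and function names are hypothetical.

```python
def fit_line(xs, ys):
    """Least-squares slope and intercept for the model y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def misfit(slope, intercept, xs, ys):
    """Objective: sum of squared differences between data and model prediction."""
    return sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))


xs = [0.0, 1.0, 2.0, 3.0]  # hypothetical inputs
ys = [1.1, 2.9, 5.2, 6.8]  # hypothetical noisy measurements

slope, intercept = fit_line(xs, ys)
print(slope, intercept, misfit(slope, intercept, xs, ys))
```

Any other choice of slope and intercept gives a misfit at least as large, which is what makes the fitted parameters the optimum of this particular objective.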
Solving optimization problems General optimization problem • very difficult to solve • methods involve some compromise, e.g., very long computation time, or not always finding the solution Exceptions: certain problem classes can be solved efficiently and reliably. Any idea which ones? • least-squares problems • linear programming problems • convex optimization problems
Linear programming [Formula not recovered; annotation label: minimum x] • Objective and constraint functions are linear Solving linear programs • no analytical formula for solution • reliable and efficient algorithms and software • a mature technology Using linear programming • not always easy to recognize whether a problem is a linear program • a few standard tricks are used to convert problems into linear programs
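The formula on this slide did not survive extraction. A common textbook way to write a linear program, given here as an assumed form rather than the slide's exact notation, uses a cost vector c, constraint vectors a_i, and bounds b_i:

```latex
\begin{aligned}
\text{minimize}   \quad & c^{\top} x \\
\text{subject to} \quad & a_i^{\top} x \le b_i, \qquad i = 1, \dots, m
\end{aligned}
```

Both the objective and every constraint are linear in the decision vector x, which is the defining property stated on the slide.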
Convex optimization [Formula not recovered; annotation label: minimum]
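The formula on this slide did not survive extraction. A standard textbook statement of a convex optimization problem, given here in assumed notation, requires the objective f_0 and all constraint functions f_i to be convex:

```latex
\begin{aligned}
\text{minimize}   \quad & f_0(x) \\
\text{subject to} \quad & f_i(x) \le b_i, \qquad i = 1, \dots, m
\end{aligned}
\qquad
\text{where, for } i = 0, \dots, m: \quad
f_i(\alpha x + \beta y) \le \alpha f_i(x) + \beta f_i(y)
\quad \text{for all } x, y \text{ and } \alpha, \beta \ge 0,\ \alpha + \beta = 1
```

Linear programs are a special case, since linear functions satisfy the convexity condition with equality.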
Non-convex/Non-linear optimization Traditional techniques for general non-convex problems involve compromises Local optimization methods • find a point that minimizes f among feasible points near it (some neighbourhood) • fast, can handle large problems • require initial guess • provide no information about distance to (global) optimum Global optimization methods • find the (global) solution • worst-case complexity grows exponentially with problem size
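The dependence of local methods on the initial guess can be seen on a small one-dimensional example. The sketch below runs plain gradient descent on a hypothetical non-convex function with two local minima; the function, the step size, and the iteration count are assumptions chosen for illustration.

```python
def f(x):
    """Hypothetical non-convex objective with two local minima."""
    return x**4 - 3 * x**2 + x


def grad_f(x):
    """Derivative of f."""
    return 4 * x**3 - 6 * x + 1


def gradient_descent(x0, step=0.01, iterations=5000):
    """Local method: repeatedly step against the gradient from the initial guess x0."""
    x = x0
    for _ in range(iterations):
        x -= step * grad_f(x)
    return x


x_right = gradient_descent(2.0)   # initial guess to the right
x_left = gradient_descent(-2.0)   # initial guess to the left

print(x_right, f(x_right))
print(x_left, f(x_left))
```

Both runs end at a point where the gradient vanishes, but the two points have different objective values, and neither run reveals on its own whether a better minimum exists elsewhere.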
Mathematical programming vs Heuristics Mathematical programming community • Transform problem into a certain structure • Use of standardized algorithms (e.g. with GAMS) • Algorithms are often based on solving convex sub-problems • Guarantee to find optimal solution • Time-consuming for difficult (high-dimensional) problems Heuristic-based optimization community • Deal with the problem as is • Algorithms are often specific and problem-dependent • Goal is to find a “good” solution in “reasonable” time Where possible, use mathematical programming
Classical types of optimization models [Diagram: classification tree rooted at "Optimization problem", with branch labels Convex / Non-convex, Linear / Non-linear, …, Constrained / Unconstrained] Many more problem classifications, e.g.: • Deterministic vs stochastic • One or multiple objective functions • Computational vs physical evaluations • Online vs offline optimization
Combinatorial problems • Combinatorial problems arise in many areas of computer science and application domains, such as • finding shortest/cheapest round trips (TSP) • knapsack problems • planning, scheduling, time-tabling • vehicle routing • location and clustering • resource allocation • protein structure prediction Combinatorial problems involve finding a grouping, ordering, or assignment of a discrete, finite set of objects that satisfies given conditions.