Traveling Salesman Problems Motivated by Robot Navigation
Maria Minkoff, MIT
With Avrim Blum, Shuchi Chawla, David Karger, Terran Lane, Adam Meyerson
A Robot Navigation Problem
• Robot delivering packages in a building
• Goal: deliver as quickly as possible
• Classic model: Traveling Salesman Problem (find a tour of minimum length)
• Additional constraints:
  • some packages have higher priority
  • uncertainty in the robot's behavior: battery failure, sensor error, motor control error
Markov Decision Process Model
• State space S
• Choice of actions a ∈ A at each state s
• Transition function T(s′ | s, a)
  • the action determines a probability distribution on the next state
  • a sequence of actions produces a random path through the graph
• Rewards R(s) on states
  • if we arrive in state s at time t, we receive discounted reward γ^t R(s), for γ ∈ (0,1)
• MDP goal: a policy for picking an action from any state that maximizes total discounted reward
Exponential Discounting
• Motivates getting to desired states quickly
• Inflation: reward collected in the distant future is worth less due to uncertainty
  • at each time step the robot loses power with some fixed probability
  • so the probability of still being alive at time t decays exponentially
  • discounting reflects the value of reward in expectation
Solving the MDP
• Fixing an action at each state produces a Markov chain with transition probabilities p_vw
• Can compute the expected discounted reward r_v if we start at state v:
  r_v = R(v) + Σ_w p_vw γ^{t(v,w)} r_w
• Choosing actions to optimize this recurrence is solvable in polynomial time
  • linear programming
  • dynamic programming (like shortest paths)
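To make the recurrence concrete, here is a minimal sketch (not from the talk) of evaluating one fixed policy: once an action is fixed at each state, the expected discounted rewards solve a linear system. The two-state chain, travel times, and rewards are made-up example values.

```python
# Evaluate a fixed policy: solve r = R + sum_w P[v,w] * gamma**t[v,w] * r[w]
# as the linear system (I - D) r = R.  All values below are hypothetical.
import numpy as np

gamma = 0.9
P = np.array([[0.0, 1.0],      # transition probabilities of the fixed policy
              [0.5, 0.5]])
t = np.array([[1, 2],          # travel time of each transition
              [1, 1]])
R = np.array([0.0, 10.0])      # reward received on arriving at each state

D = P * gamma ** t             # discounted transition matrix (spectral radius < 1)
r = np.linalg.solve(np.eye(2) - D, R)
print(r)                       # expected discounted reward from each start state
```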
Solving the wrong problem
• A package can only be delivered once
• So we should not get a reward each time we reach its target
• One solution: expand the state space
  • new state = current location × set of past locations (packages already delivered)
  • reward is nonzero only on states where the current location is not among those previously visited
• Now apply the MDP algorithm
• Problem: the new state space has exponential size
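A quick sketch of this expansion (toy names and rewards): pairing the current location with the set of already-delivered packages makes reward a function of the state again, at the cost of |V| · 2^|V| states.

```python
# Expanded state = (current location, frozenset of locations already visited);
# reward is granted only on first arrival.  Everything here is illustrative.
from itertools import combinations

locations = ["a", "b", "c"]
base_reward = {"a": 1.0, "b": 2.0, "c": 3.0}

def expanded_states():
    for k in range(len(locations) + 1):
        for visited in combinations(locations, k):
            for loc in locations:
                yield (loc, frozenset(visited))

def reward(state):
    loc, visited = state
    return 0.0 if loc in visited else base_reward[loc]

print(sum(1 for _ in expanded_states()))   # 3 * 2**3 = 24 states: exponential
```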
Tackle an easier problem
• The problem has two novel elements for “theory”:
  • discounting of reward based on arrival time
  • a probability distribution on the outcome of actions
• We set the second issue aside for now
  • in practice, the robot can control its errors
• Even the first issue by itself is hard and interesting
  • a first step towards solving the whole problem
Discounted-Reward TSP
Given:
• undirected graph G = (V, E)
• edge weights (travel times) d_e ≥ 0
• weights on nodes (rewards) r_v ≥ 0
• discount factor γ ∈ (0,1)
• root node s
Goal: find a path P starting at s that maximizes the total discounted reward
  ρ(P) = Σ_{v ∈ P} r_v γ^{d_P(v)}
where d_P(v) is the time at which P reaches v.
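As a sanity check on the objective, a tiny sketch (made-up graph, rewards, and path) that walks a candidate path and accumulates γ^{d_P(v)} · r_v:

```python
# Discounted reward of a path: accumulate travel time, and at each node
# collect gamma**(arrival time) * reward.  The instance is illustrative.
gamma = 0.5
reward = {"s": 0.0, "a": 4.0, "b": 8.0}
length = {("s", "a"): 1.0, ("a", "b"): 2.0}

def discounted_reward(path):
    total, time = reward[path[0]], 0.0    # root's reward collected at time 0
    for u, v in zip(path, path[1:]):
        time += length[(u, v)]
        total += gamma ** time * reward[v]
    return total

print(discounted_reward(["s", "a", "b"]))  # 0.5**1 * 4 + 0.5**3 * 8 = 3.0
```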
Approximation Algorithms
• Discounted-Reward TSP is NP-hard (and so is the more general MDP-type problem)
  • reduction from minimum-latency TSP
• So it is intractable to solve exactly
• Goal: an approximation algorithm that is guaranteed to collect at least some constant fraction of the best possible discounted reward
Related Problems
The goal of Discounted-Reward TSP is, roughly, to find a “short” path that collects “lots” of reward.
• Prize-Collecting TSP
  • given a root vertex v, find a tour containing v that minimizes total length + foregone (undiscounted) reward
  • primal-dual 2-approximation algorithm [GW 95]
k-TSP
• Find a tour of minimum length that visits at least k vertices
• 2-approximation algorithm known for undirected graphs, based on the algorithm for PC-TSP [Garg 99]
• Can be extended to handle the node-weighted version
Mismatch
A constant-factor approximation on length does not exponentiate well:
• Suppose the optimum solution reaches some vertex v at time t, for reward γ^t r
• A constant-factor approximation would reach v within time 2t, for reward γ^{2t} r
• Result: we get only a γ^t fraction of the optimum discounted reward, not a constant fraction
Orienteering Problem
Find a path of length at most D that maximizes the net reward collected.
• Complement of k-TSP
  • approximates the reward collected instead of the length
  • avoids changing the length, so exponentiation doesn't hurt
  • the unrooted case can be solved via k-TSP
• Drawback: no constant-factor approximation for the rooted non-geometric version was previously known
• Our techniques also give a constant-factor approximation for the Orienteering problem
Our Results
Using an α-approximation for k-TSP as a subroutine:
• (3/2 + 2α)-approximation for Orienteering
• e(3/2 + 2α)-approximation for Discounted-Reward Collection
• constant-factor approximations for tree and multiple-path versions of the problems
Our Results
Substituting the α = 2 approximation for k-TSP announced by Garg in 1999:
• 5½-approximation for Orienteering
• 13-approximation for Discounted-Reward Collection
• constant-factor approximations for tree and multiple-path versions of the problems
Eliminating Exponentiation
• Let d_v = shortest-path distance (time) from s to v
• Define the prize at v as p_v = γ^{d_v} r_v
  • the maximum discounted reward possibly collectable at v
• If a given path reaches v at time t_v, define the excess e_v = t_v − d_v
  • the difference between the shortest path and the chosen one
• Then the discounted reward at v is γ^{e_v} p_v
• Idea: if the excess is small, prize ≈ discounted reward
• Fact: the excess only increases as we traverse the path
  • excess reflects lost time; we can't make it up
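A minimal sketch of this bookkeeping on a made-up instance, showing that excess never decreases along a path:

```python
# Prize p_v = gamma**d_v * r_v; excess e_v = (arrival time at v) - d_v.
# Distances, rewards, and the example path are hypothetical.
gamma = 0.5
d = {"s": 0.0, "u": 1.0, "v": 2.0}   # shortest-path distances from s
r = {"s": 0.0, "u": 6.0, "v": 4.0}   # raw rewards

def prize(x):
    return gamma ** d[x] * r[x]

def excesses(path, hop):
    time, out = 0.0, {}
    for a, b in zip(path, path[1:]):
        time += hop[(a, b)]
        out[b] = time - d[b]          # nondecreasing along the path
    return out

hop = {("s", "v"): 2.0, ("v", "u"): 3.0}
print(prize("u"), excesses(["s", "v", "u"], hop))  # 3.0 {'v': 0.0, 'u': 4.0}
```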
Optimum path (assume γ = ½; we can scale edge lengths)
Claim: at least ½ of the optimum path's discounted reward R is collected before the path's excess reaches 1.
Proof by contradiction:
• Let u be the first vertex with e_u ≥ 1
• Suppose more than R/2 of the reward follows u
• We can shortcut directly to u and then traverse the rest of the optimum path
  • this reduces all excesses after u by at least 1
  • so it “undiscounts” those rewards by a factor γ^{−1} = 2
  • so it doubles the discounted reward collected
  • but that reward was more than R/2: contradiction
New problem: Approximate Min-Excess Path
• Suppose there exists an s-t path P* with prize value Π and length l(P*) = d_t + e
• Optimization version: find an s-t path P with prize value ≥ Π that minimizes the excess l(P) − d_t over the shortest path to t
  • equivalent to minimizing total length, as in k-TSP
• Approximation version: find an s-t path P with prize value ≥ Π that approximates the optimum excess over the shortest path to t, i.e. has length l(P) = d_t + ce
  • better than approximating the entire path length
Using Min-Excess Path
• Recall the discounted reward at v is γ^{e_v} p_v
• Prefix of the optimum discounted-reward path:
  • collects discounted reward Σ γ^{e_v} p_v ≥ R/2, hence spans prize Σ p_v ≥ R/2
  • and has no vertex with excess over 1
• Guess t = the last node on the optimum path with excess e_t ≤ 1
• Find a path to t of approximately (4 times) minimum excess that spans R/2 prize (we can guess R/2)
• Excesses are at most 4, so γ^{e_v} p_v ≥ p_v/16, and the discounted reward on the found path is ≥ R/32
Solving the Min-Excess Path problem
Exactly solvable case: monotonic paths
• Suppose the optimum path goes through vertices in strictly increasing distance from the root
• Then we can find the optimum by a dynamic program, just as we can solve longest path in an acyclic graph
• Build a table: for each vertex v, is there a monotonic path ending at v with length l and prize p?
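A minimal DP sketch for this monotone case (toy instance, small integer prizes assumed): processing vertices in order of distance from the root makes the allowed edges a DAG, so a table of “minimum length of a monotone path from s to v collecting a given prize” can be filled exactly.

```python
# Monotone min-excess DP: best[v][p] = min length of a monotone s->v path
# collecting total prize p (capped).  The instance is made up.
import math

d      = {"s": 0, "a": 1, "b": 2}                       # distances from root
prize  = {"s": 0, "a": 1, "b": 2}                       # integer prizes
edges  = {("s", "a"): 1, ("s", "b"): 3, ("a", "b"): 2}  # monotone edges only
total  = sum(prize.values())

order = sorted(d, key=d.get)
best = {v: [math.inf] * (total + 1) for v in order}
best["s"][prize["s"]] = 0                               # the path consisting of s alone

for v in order:                                         # process in DAG order
    for (u, w), length in edges.items():
        if w != v:
            continue
        for p in range(total + 1):
            q = min(total, p + prize[v])
            best[v][q] = min(best[v][q], best[u][p] + length)

print(best["b"])   # [inf, inf, 3, 3]: min path length per prize collected
```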
Solving the Min-Excess Path problem
Approximable case: wiggly paths
• The length of the path to v is l_v = d_v + e_v
• If e_v > d_v then l_v > e_v > l_v/2
  • i.e., the path takes more than twice as long as necessary to reach v
• So if we approximate l_v to a constant factor, we also approximate e_v to twice that constant factor
Approximating path length
• Can use the k-TSP algorithm to find an approximately shortest s-t path with a specified prize:
  • merge s and t into a vertex r
  • the optimum path becomes a tour
  • solve k-TSP with root r
  • “unmerge”: we can get one or more cycles
  • connect s and t by a shortest path
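A sketch of the merge step on a made-up adjacency dict (the merged vertex name and tie-breaking rule are my choices, not the talk's): identify s and t as a single root r so that an s-t path becomes a tour through r, then hand the merged graph to a k-TSP subroutine.

```python
# Identify s and t as one vertex r; keep the shorter edge when both s and t
# had an edge to the same neighbor.  Plain dict surgery on a toy graph.
def merge_vertices(adj, s, t, r="r"):
    merged = {}
    for u, nbrs in adj.items():
        if u in (s, t):
            continue
        merged[u] = {}
        for v, w in nbrs.items():
            key = r if v in (s, t) else v
            merged[u][key] = min(w, merged[u].get(key, float("inf")))
    merged[r] = {}
    for u in (s, t):
        for v, w in adj[u].items():
            if v in (s, t):
                continue                  # drop edges inside {s, t}
            merged[r][v] = min(w, merged[r].get(v, float("inf")))
    return merged

adj = {"s": {"a": 1}, "a": {"s": 1, "t": 2}, "t": {"a": 2}}
print(merge_vertices(adj, "s", "t"))      # {'a': {'r': 1}, 'r': {'a': 1}}
```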
Decompose the optimum path
• Split it into alternating monotone and wiggly segments
• This divides the problem into independent subproblems
• More than 2/3 of each wiggly segment is excess
Decomposition Analysis
• ≥ 2/3 of each wiggly segment is excess
• That excess accumulates into the whole path:
  • total excess of the wiggly segments ≤ excess of the whole path
  • total length of the wiggly segments ≤ 3/2 of the path excess
• Use the dynamic program to find the shortest (min-excess) monotonic segments collecting a target prize
• Use k-TSP to find approximately shortest wiggles collecting a target prize
  • this approximates their length, so it approximates their excess
• Over all monotonic and wiggly segments, this approximates the total excess
Dynamic program for Min-Excess Path
• For each pair of vertices and each (discretized) prize value, find:
  • the shortest monotonic path collecting the desired prize
  • an approximately shortest wiggly path collecting the desired prize
• Note: only polynomially many subproblems
• Use dynamic programming to find the optimum pasting-together of segments
Solving the Orienteering problem: special case
• Given a path from s that
  • collects prize P
  • has length ≤ D
  • ends at t, the farthest point from s
• For any constant integer r ≥ 1, there exists a path from s to some v with
  • prize ≥ P/r
  • excess ≤ (D − d_v)/r
Solving the Orienteering problem
General case: the path ends at an arbitrary t
• Let u be the farthest point from s
• Connect t to s via a shortest path
• One of the resulting path segments ending at u
  • has prize ≥ P/2
  • has length ≤ D
• This reduces the general case to the special case
• Using a 4-approximation for Min-Excess Path, we get an 8-approximation for Orienteering
Budget Prize-Collecting Steiner Tree problem
Find a rooted tree of edge cost at most D that spans the maximum amount of prize.
• Complement of k-MST
• Create an Euler tour of the optimum tree T*, of cost ≤ 2D
• Divide this tour into two paths starting at the root, each of length ≤ D
• One of them contains at least ½ of the total prize
• A path is a type of tree
• Use a c-approximation algorithm for Orienteering to obtain a 2c-approximation for Budget PCST
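A sketch of the tour-splitting step (made-up star tour; for simplicity it cuts at a vertex boundary, whereas the real argument may cut mid-edge): walk the Euler tour forward from the root until length D is spent, treat the remainder walked backward from the root as the second path, and keep whichever half spans more prize.

```python
# Split an Euler tour (cost <= 2D, starting and ending at the root) into a
# forward prefix and a reversed suffix, each rooted; return the richer half.
def best_half(tour, hop, prize, D):
    length, cut = 0.0, len(tour) - 1
    for i in range(1, len(tour)):
        length += hop[(tour[i-1], tour[i])]
        if length > D:
            cut = i - 1
            break
    first = tour[:cut + 1]
    second = tour[cut:][::-1]            # reversed, so it also starts at the root
    score = lambda path: sum(prize[v] for v in set(path))
    return first if score(first) >= score(second) else second

tour = ["s", "a", "s", "b", "s"]         # Euler tour of a toy star tree
hop = {("s", "a"): 1.0, ("a", "s"): 1.0, ("s", "b"): 1.0, ("b", "s"): 1.0}
prize = {"s": 0.0, "a": 3.0, "b": 1.0}
print(best_half(tour, hop, prize, D=2.0))  # ['s', 'a', 's']
```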
Summary
• Showed that maximum discounted reward can be approximated using min-excess paths
• Showed how to approximate a min-excess path using k-TSP
• Min-excess paths can also be used to solve the rooted Orienteering problem (previously an open question)
• Also solves “tree” and “cycle” versions of Orienteering
Open Questions
• Non-uniform discount factors
  • each vertex v has its own γ_v
• Non-uniform deadlines
  • each vertex specifies its own deadline by which it has to be visited in order to collect its reward
• Directed graphs
  • we used k-TSP, which is only solved for undirected graphs
  • for directed graphs, even standard TSP has no known constant-factor approximation
  • we only use k-TSP / undirectedness in the wiggly parts
Future directions
• Stochastic actions
  • stochastic seems to imply directed
  • special case: forget rewards; given a choice of actions, choose to minimize the cover time of the graph
• Applying the discounting framework to other problems:
  • scheduling
  • exponential penalty in place of hard deadlines