
Approximation algorithms for Path-Planning and Clustering problems on graphs (Thesis Proposal)






  1. Approximation algorithms for Path-Planning and Clustering problems on graphs (Thesis Proposal) Shuchi Chawla, Carnegie Mellon University

  2. Two classes of Graph Optimization problems • Optimization problems on graphs arise in many fields • Typically NP-hard • We consider two classes of problems motivated by machine learning and AI: • Path-planning: construct a “good” path, given a map • Clustering: divide objects into groups based on similarity

  3. Path-planning Problems

  4. A Robot Navigation Problem • Task: deliver packages to certain locations • Faster delivery ⇒ greater happiness: “reward” • Want a path with short length and large reward • Classic formulation, Traveling Salesman: find the shortest tour covering all locations • Some complicating constraints: • Limited battery power: the robot may die before finishing the task • Packages have different deadlines for delivery • Preference for the larger-reward packages • An alternate formulation, Orienteering: • Construct a path of length ≤ D • Visit as many locations (as much reward) as possible

  5. Path-planning in the real world: Motivation • Given a graph (metric) G, construct a path satisfying some constraints and optimizing some function • Some applications: robotics, assembly analysis, manufacturing, production planning • A trade-off between time and reward: • maximize reward with bounded length • minimize length with a reward quota • some combination of both

  6. A time-reward trade-off • Impose a reward quota and minimize length • Metric TSP: collect all points • k-Path: collect at least k reward • Budget the path length and maximize reward • Orienteering: hard bound on path length • Time-Window: visit node v within [R_v, D_v] • Optimize a combination of reward and length • Prize Collecting TSP: minimize (length + reward left uncollected) • Discounted-Reward TSP: maximize reward, where reward decreases with time

  7. A time-reward trade-off • Impose a reward quota and minimize length • Metric TSP: 1.5 [Christofides 76] • k-Path: 2+ε [Chaudhuri Godfrey Rao+ 03] • Budget the path length and maximize reward • Orienteering: 3 [Blum C Karger Meyerson Minkoff Lane 03] • Time-Window: 3 log²n [Bansal Blum C Meyerson 04] • Optimize a combination of reward and length • Prize Collecting TSP: 2 [Goemans Williamson 95] • Discounted-Reward TSP: 6.75+ε [Blum C Karger+ 03]

  8. Orienteering and k-Path • Orienteering: length ≤ D; maximize reward • k-Path: reward ≥ k; minimize length • Complementary problems • A series of results on k-TSP (related to k-Path): [BRV99] [Garg99] [AK00] [CGRT03] … best approximation: (2+ε) • None for Orienteering until recently!

  9. Why is Orienteering difficult? • First attempt: use distance-based approximations to approximate the reward • Let OPT(d) = max achievable reward with length d • A 2-approximation for distance implies that ALG(d) ≥ OPT(d/2) • However, we may have OPT(d/2) << OPT(d) • Bad trade-off between distance and reward!
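The bad trade-off above is easy to see on a concrete instance. Below is a hypothetical sketch (not from the talk) that brute-forces OPT(d) on a tiny line metric where all the reward sits just inside the budget, so halving the budget wipes out all of it.

```python
from itertools import permutations

# Line metric: start s at position 0; all reward sits far away, at 4 and 5.
pos = {"s": 0, "a": 4, "b": 5}
reward = {"s": 0, "a": 1, "b": 1}
dist = lambda u, v: abs(pos[u] - pos[v])

def opt_reward(budget):
    """Best reward of any path starting at s with length <= budget (brute force)."""
    best = 0
    others = [v for v in pos if v != "s"]
    for r in range(len(others) + 1):
        for order in permutations(others, r):
            length, cur = 0, "s"
            for v in order:
                length += dist(cur, v)
                cur = v
            if length <= budget:
                best = max(best, sum(reward[v] for v in order))
    return best

print(opt_reward(5), opt_reward(2.5))  # prints: 2 0
```

Here OPT(5) = 2 but OPT(5/2) = 0, so a guarantee of the form ALG(d) ≥ OPT(d/2) can be vacuous.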

  10. Why is Orienteering difficult? The Min-Excess Path Problem • Second attempt: approximate subparts of the optimal path and shortcut the other parts • If we stray far from the optimal path, we may not be able to cover reward that is far away • Instead, approximate the “excess”: the extra length taken by a path over the shortest path length

  11. The Min-Excess Problem • Given a graph G, start and end nodes s, t, rewards on nodes, and a target reward k, find a path that collects reward at least k and minimizes the excess ε(P) = ℓ(P) − d(s,t) • At optimality, this is exactly the same as the k-path objective of minimizing ℓ(P) • However, approximation is different: min-excess is strictly harder than k-path • We give a (2+ε)-approximation for Min-Excess [Blum, C, Karger, Meyerson, Minkoff, Lane, FOCS’03] • Our algorithm returns a path with length d(s,t) + (2+ε)·ε(P)
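The excess of a concrete path is straightforward to compute. This minimal sketch (the 4-node graph is a made-up illustration) pairs Dijkstra shortest paths with the definition ε(P) = ℓ(P) − d(s,t):

```python
import heapq

def dijkstra(graph, s):
    """Shortest-path distances from s; graph[u] maps neighbour -> edge length."""
    dist = {s: 0}
    pq = [(0, s)]
    while pq:
        d, u = heapq.heappop(pq)
        if d > dist.get(u, float("inf")):
            continue
        for v, w in graph[u].items():
            if d + w < dist.get(v, float("inf")):
                dist[v] = d + w
                heapq.heappush(pq, (d + w, v))
    return dist

def excess(graph, path):
    """eps(P) = length(P) - d(s, t), for a path given as a list of nodes."""
    length = sum(graph[u][v] for u, v in zip(path, path[1:]))
    return length - dijkstra(graph, path[0])[path[-1]]

# Unit-length edges: s->a->t is a shortest path (excess 0), while the
# detour s->a->b->t pays one extra unit of length (excess 1).
G = {"s": {"a": 1}, "a": {"s": 1, "t": 1, "b": 1},
     "b": {"a": 1, "t": 1}, "t": {"a": 1, "b": 1}}
```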

  12. A 3-approximation to Orienteering [Blum C Karger+ 03] • Excess of a path P from u to v: ε(P) = d_P(u,v) − d(u,v) • There exists a path from s to t that collects the optimum reward and has length ≤ D • Given a 3-approximation to min-excess: 1. Divide OPT into 3 “equal-reward” parts (hypothetically) 2. Approximate the part with the smallest excess • The excess of that part is ≤ (ε₁+ε₂+ε₃)/3, and we can afford an excess of up to ε₁+ε₂+ε₃ ⇒ a 3-approximation to Orienteering • In general, an r-approximation for min-excess (r ∈ ℤ⁺) gives an r-approximation for s-t Orienteering Open: Given an r-approximation for min-excess (r ∈ ℝ⁺), can we get an r-approximation to Orienteering?
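The averaging step above can be sketched as follows, assuming (hypothetically) that the optimal path is known; the node positions and unit rewards are made up for illustration. Splitting the path into three contiguous roughly-equal-reward segments that share boundary nodes, the triangle inequality gives Σᵢ ε(Pᵢ) ≤ ε(P), so the best segment has excess at most ε(P)/3.

```python
# Hypothetical line metric standing in for the optimal path; every node has reward 1.
pos = {"s": 0, "a": 3, "b": 1, "c": 4, "t": 2}
reward = {v: 1 for v in pos}
d = lambda u, v: abs(pos[u] - pos[v])

def path_excess(p):
    """excess(P) = length(P) - d(first node, last node)."""
    return sum(d(u, v) for u, v in zip(p, p[1:])) - d(p[0], p[-1])

def split_equal_reward(path, parts=3):
    """Cut a path into `parts` contiguous segments of roughly equal reward;
    consecutive segments share their boundary node."""
    total = sum(reward[v] for v in path)
    cuts, acc, k = [], 0, 1
    for i, v in enumerate(path):
        acc += reward[v]
        if k < parts and acc >= k * total / parts:
            cuts.append(i)
            k += 1
    bounds = [0] + cuts + [len(path) - 1]
    return [path[bounds[j]:bounds[j + 1] + 1] for j in range(parts)]

segments = split_equal_reward(["s", "a", "b", "c", "t"])
# The cheapest of the three segments has excess <= excess(whole path)/3.
```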

  13. The next step: Deadline-TSP [Bansal Blum C Meyerson 04] • Every vertex has a deadline D(v); find a path that maximizes the reward of nodes v visited before D(v) • Arises in scheduling, production planning • If the last node on the path has the minimum deadline, use Orienteering to approximate the reward; we need not bother about the deadlines of the other nodes • Does OPT always have a large subpath with the above property? NO! • However, there are many subpaths of OPT with the above property that together contain all the reward

  14. A segmentation of OPT [figure: the optimal path plotted as deadline vs. time, cut into segments]

  15. Deadline-TSP • Segment the graph into many parts, approximate each using Orienteering, and patch the pieces together • How do we find such a segmentation without knowing the optimal path? • To avoid double-counting reward, the segments should be node-disjoint • Our result: there exists a segmentation based only on deadlines such that the resulting solution is a (3 log n)-approximation Open: Is there a segmentation based on other properties (e.g. distance from the root) that gives a constant approximation?
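As an illustration of a deadline-only segmentation, nodes can be grouped by which power-of-two range their deadline falls in, giving O(log D_max) groups; each group could then be handed to an Orienteering subroutine and the resulting paths patched together. This geometric bucketing is a hypothetical sketch, not the exact segmentation from the paper.

```python
import math

def deadline_buckets(deadlines):
    """Group vertices whose deadlines fall in the same range [2^i, 2^(i+1))."""
    buckets = {}
    for v, dv in deadlines.items():
        i = int(math.log2(dv)) if dv >= 1 else 0
        buckets.setdefault(i, []).append(v)
    return buckets

print(deadline_buckets({"a": 1, "b": 3, "c": 5, "d": 17}))
```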

  16. An overview of our results • Min-Excess: 2+ε [FOCS 03] • Orienteering: 3 [STOC 04] • Discounted-Reward TSP: 6.75+ε [FOCS 03] • Deadline TSP: 3 log n [STOC 04] • Time-Window Problem: 3 log²n [STOC 04] • Time-Window Problem (bicriteria): reward log(1/ε), deadlines 1+ε [STOC 04]

  17. Future Directions • Better approximations • can we get a constant factor for Time-Windows? • special metrics such as trees or planar graphs • hardness of approximation? • Asymmetric path-planning • the graph is directed, but still obeys the triangle inequality • polylog approximations and lower bounds for distance • entirely different ideas are needed for asymmetric Orienteering • is it log-hard? • Group path-planning • reward is associated with “groups” of nodes • visit at least one node in a group to obtain its reward

  18. Future Directions • Stochastic path-planning • Closer to home for robot navigation: the graph is a Markov Decision Process • Each edge is an “action” associated with a probability distribution over outcomes • The goal: give a “strategy” that accomplishes a given task as fast as possible • The best action could be history dependent • Can we write down the best strategy in polynomial time? Approximate it in poly-time, or even in NP?

  19. Coming up next: Correlation Clustering

  20. Natural Language Processing • In order to understand the article automatically, need to figure out which entities are one and the same • Is “his” in the second line the same person as “The secretary” in the first line?

  21. Real-World Clustering Problems • A wide variety of clustering problems • Co-reference Analysis • Web document clustering • Co-authorship (Citeseer/DBLP) • Computer Vision • Typical characteristics: • No well-defined “similarity metric” • Number of clusters is unknown • No predefined topics; desirable to figure them out as part of the algorithm

  22. Cohen, McCallum & Richman’s idea • “Learn” a similarity measure based on context [figure: graph over “Mr. Rumsfeld”, “his”, “The secretary”, “he”, “Saddam Hussein” with strong-similarity and strong-dissimilarity edges]

  23. A good clustering • Consistent clustering: positive edges inside clusters, negative edges between clusters [figure: the entity graph from the previous slide, clustered consistently]

  24. A good clustering • Consistent clustering: positive edges inside clusters, negative edges between clusters • Edges that violate this are inconsistencies or “mistakes” [figure: the same clustering with the mistake edges highlighted]

  25. A good clustering • Sometimes no consistent clustering exists! • Goal: find the most consistent clustering, i.e. the one with the fewest mistakes [figure: an entity graph that admits no consistent clustering]

  26. Correlation Clustering • Given a graph with positive (similar) and negative (dissimilar) edges, find the most consistent clustering • NP-hard [Bansal, Blum, C, FOCS’02] • Two natural objectives: Maximize agreements = (# of +ve edges inside clusters) + (# of −ve edges between clusters); Minimize disagreements = (# of +ve edges between clusters) + (# of −ve edges inside clusters) • Equivalent at optimality, but different in terms of approximation
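The two objectives can be evaluated directly. This minimal sketch (node names are made up, in the spirit of the co-reference example) also makes the equivalence-at-optimality point concrete: agreements + disagreements always equals the total number of edges, so minimizing one maximizes the other.

```python
def count_objectives(pos_edges, neg_edges, clustering):
    """clustering maps node -> cluster id; returns (agreements, disagreements)."""
    agree = disagree = 0
    for u, v in pos_edges:
        if clustering[u] == clustering[v]:
            agree += 1
        else:
            disagree += 1
    for u, v in neg_edges:
        if clustering[u] != clustering[v]:
            agree += 1
        else:
            disagree += 1
    return agree, disagree

# Made-up mini-instance: two similar pairs, two dissimilar pairs.
pos = [("rumsfeld", "his"), ("secretary", "he")]
neg = [("rumsfeld", "saddam"), ("saddam", "he")]
clusters = {"rumsfeld": 0, "his": 0, "secretary": 0, "he": 0, "saddam": 1}
# agreements + disagreements == total number of edges, for every clustering.
```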

  27. Overview of results Unweighted (complete) graphs: • Min Disagree: 17433 [Bansal Blum C 02]; 4 [Charikar Guruswami Wirth 03]; APX-hard [CGW 03] • Max Agree: PTAS [Bansal Blum C 02] Weighted graphs: • Min Disagree: O(log n) [Charikar Guruswami Wirth 03] [Emanuel Fiat 03] [Immorlica Demaine 03]; hard to approximate within 29/28 [CGW 03] • Max Agree: 1.3048 [CGW 03]; 1.3044 [Swamy 04]; hard to approximate within 116/115 [CGW 03]

  28. Minimizing Disagreements [Bansal, Blum, C, FOCS’02] • Goal: approximately minimize the number of “mistakes” • Assumption: the graph is unweighted and complete • A lower bound on OPT: erroneous triangles • Consider a triangle with edges +, +, −: any clustering disagrees with at least one of these edges • If there are several edge-disjoint erroneous triangles, any clustering makes a mistake on each one • Hence D_OPT ≥ the maximum fractional packing of erroneous triangles
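A crude version of this lower bound, an integral greedy packing rather than the fractional one on the slide, can be sketched as follows: each edge-disjoint triangle with exactly two positive edges and one negative edge forces at least one mistake in any clustering.

```python
from itertools import combinations

def erroneous_triangle_bound(n, neg_edges):
    """Greedy packing of edge-disjoint erroneous triangles (two + edges, one - edge)
    in the complete graph on range(n); the count lower-bounds OPT's disagreements."""
    neg = {frozenset(e) for e in neg_edges}
    used, count = set(), 0
    for tri in combinations(range(n), 3):
        edges = [frozenset(p) for p in combinations(tri, 2)]
        if any(e in used for e in edges):
            continue  # keep the packing edge-disjoint
        if sum(e in neg for e in edges) == 1:  # exactly one negative edge
            used.update(edges)
            count += 1
    return count
```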

  29. Using the lower bound: δ-clean clusters • Relating erroneous triangles to mistakes • In special cases, we can “charge off” disagreements to erroneous triangles • “Clean” clusters: each vertex has few disagreements incident on it, where “few” is relative to the size of the cluster • A cluster is clean exactly when all its vertices are “good” • For clean clusters, # of disagreements ≤ constant × # of erroneous triangles

  30. Using the lower bound: δ-clean clusters • Relating erroneous triangles to mistakes • In special cases, we can “charge off” disagreements to erroneous triangles • δ-clean clusters: each vertex in cluster C has fewer than δ|C| positive mistakes and δ|C| negative mistakes • # of disagreements ≤ constant × # of erroneous triangles • δ-clean clusters have a high density of positive edges, so we can easily spot them in the graph • Possible solution: find a δ-clean clustering and charge its disagreements to erroneous triangles • Caveat: it may not exist
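The δ-clean condition is a simple per-vertex check. This hypothetical sketch represents the complete graph by its positive edges, so a missing positive edge counts as a negative edge.

```python
def is_delta_clean(cluster, all_nodes, pos_edges, delta):
    """C is delta-clean if every v in C has fewer than delta*|C| positive edges
    leaving C and fewer than delta*|C| negative (i.e. missing) edges inside C."""
    pos = {frozenset(e) for e in pos_edges}
    c = set(cluster)
    for v in c:
        pos_outside = sum(1 for u in set(all_nodes) - c if frozenset((u, v)) in pos)
        neg_inside = sum(1 for u in c - {v} if frozenset((u, v)) not in pos)
        if pos_outside >= delta * len(c) or neg_inside >= delta * len(c):
            return False  # v is a "bad" vertex
    return True
```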

  31. Using the lower bound: δ-clean clusters • Caveat: a δ-clean clustering may not exist • An almost-δ-clean clustering: all clusters are either δ-clean or singletons • An almost-δ-clean clustering always exists, trivially (make every node a singleton) • We show: there is an almost-δ-clean clustering OPT(δ) that is almost as good as OPT, and its nice structure helps us find it easily

  32. OPT() – clean or singleton “bad” vertices Imaginary Procedure Optimal Clustering OPT() : All clusters are -clean or singleton Few new mistakes Shuchi Chawla, Carnegie Mellon University

  33. Finding clean clusters • Charging off mistakes: 1. Mistakes among clean clusters: charge to erroneous triangles 2. Mistakes among singletons: no more than the corresponding mistakes in OPT(δ)

  34. A summary of results Unweighted (complete) graphs: • Min Disagree: 17433 [Bansal Blum C 02]; 4 [Charikar Guruswami Wirth 03]; APX-hard [CGW 03] • Max Agree: PTAS [Bansal Blum C 02] Weighted graphs: • Min Disagree: O(log n) [Charikar Guruswami Wirth 03] [Emanuel Fiat 03] [Immorlica Demaine 03]; hard to approximate within 29/28 [CGW 03] • Max Agree: 1.3048 [CGW 03]; 1.3044 [Swamy 04]; hard to approximate within 116/115 [CGW 03]

  35. Future Directions • Better combinatorial approximation • the current best algorithms have a large running time: they employ an LP with O(n²) variables • Improving the lower bound: erroneous cycles (one negative edge, the remaining edges positive) • the gap of this lower bound is between 2 and 4 [Charikar Guruswami Wirth 03] • Can we obtain a 2-approximation? • A good “iterative” approximation • on few changes to the graph, quickly recompute a good clustering

  36. Future Directions • Clustering with small clusters • Given that all clusters in OPT have size at most k, find a good approximation • Is this NP-hard? • Different from finding the best clustering with small clusters, without a guarantee on OPT • Clustering with few clusters • Given that OPT has at most k clusters, find an approximation • Maximizing correlation • (# of agreements) − (# of disagreements) • Can we get a constant factor approximation?

  37. Timeline • Plan to finish in a year

  38. Questions?
