Approximation algorithms for Path-Planning and Clustering problems on graphs (Thesis Proposal). Shuchi Chawla, Carnegie Mellon University
Two classes of Graph Optimization problems • Optimization problems on graphs arise in many fields • Typically NP-hard • We consider two classes of problems motivated by machine learning and AI: • Path-planning – Construct a “good” path, given a map • Clustering – Divide objects into groups based on similarity
A Robot Navigation Problem • Task: Deliver packages to certain locations • Faster delivery => greater happiness; “reward” • Want a path with short length and large reward • Classic formulation – Traveling Salesman: find the shortest tour covering all locations • Some complicating constraints: limited battery power – the robot may die before finishing the task; packages have different deadlines for delivery; preference for the larger-reward packages • An alternate formulation – Orienteering: construct a path of length ≤ D and visit as many locations (collect as much reward) as possible (a toy illustration follows below)
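A minimal sketch of the Orienteering objective on a toy metric, assuming a small complete graph given as a distance matrix. The brute-force enumeration is exponential and is only meant to make the length-versus-reward trade-off concrete (the problem itself is NP-hard); the function and data below are illustrative, not from the slides.

```python
from itertools import permutations

def orienteering_brute_force(dist, reward, start, D):
    """Toy illustration: among all simple paths from `start` with total
    length <= D, return one collecting maximum reward.
    dist: symmetric matrix of pairwise distances (a metric),
    reward: reward[v] for visiting node v, D: length budget."""
    n = len(dist)
    others = [v for v in range(n) if v != start]
    best_reward, best_path = reward[start], [start]
    for k in range(1, n):
        for order in permutations(others, k):
            path = [start] + list(order)
            length = sum(dist[path[i]][path[i + 1]] for i in range(len(path) - 1))
            r = sum(reward[v] for v in path)
            if length <= D and r > best_reward:
                best_reward, best_path = r, path
    return best_reward, best_path

# Example: 4 locations, length budget 10
dist = [[0, 4, 6, 9],
        [4, 0, 3, 7],
        [6, 3, 0, 5],
        [9, 7, 5, 0]]
reward = [0, 1, 2, 5]
print(orienteering_brute_force(dist, reward, start=0, D=10))  # (5, [0, 3])
```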
Path-planning in the real world: Motivation • Given a graph (metric) G, construct a path satisfying some constraints and optimizing some function • Some applications: robotics, assembly analysis, manufacturing, production planning • A trade-off between time and reward: maximize reward with bounded length; minimize length with a reward quota; or optimize some combination of both
A time-reward trade-off • Impose a reward quota and minimize length: Metric TSP – collect all points; k-Path – collect at least k reward • Budget the path length and maximize reward: Orienteering – hard bound on path length; Time Windows – visit node v within a window [R_v, D_v] • Optimize a combination of reward and length: Prize-Collecting TSP – minimize (length + reward left uncollected); Discounted-Reward TSP – maximize reward, where reward decreases with time
A time-reward trade-off (known approximations) • Impose a reward quota and minimize length: Metric TSP – 1.5 [Christofides 76]; k-Path – 2 + ε [Chaudhuri Godfrey Rao+ 03] • Budget the path length and maximize reward: Orienteering – 3 [Blum C Karger Meyerson Minkoff Lane 03]; Time Windows – 3 log² n [Bansal Blum C Meyerson 04] • Optimize a combination of reward and length: Prize-Collecting TSP – 2 [Goemans Williamson 95]; Discounted-Reward TSP – 6.75 + ε [Blum C Karger+ 03]
Orienteering and k-Path • Orienteering: length ≤ D; maximize reward • k-Path: reward ≥ k; minimize length • Complementary problems • Series of results on k-TSP (related to k-Path) [BRV99] [Garg99] [AK00] [CGRT03] … best approx: (2 + ε) • None for Orienteering until recently!
Why is Orienteering difficult? • First attempt – use distance-based approximations to approximate reward • Let OPT(d) = max achievable reward with length d • A 2-approximation for distance implies that ALG(d) ≥ OPT(d/2): run the distance approximation with reward target OPT(d/2); since that reward can be collected within length d/2, the algorithm returns a path of length at most d • However, we may have OPT(d/2) << OPT(d), e.g., when all of the reward sits at distance nearly d from the start • Bad trade-off between distance and reward!
Why is Orienteering difficult? The Min-Excess Path Problem • Second attempt – approximate subparts of the optimal path and shortcut the other parts • If we stray from the optimal path by a lot, we may not be able to cover reward that is far away • Instead, approximate the “excess”: the extra length taken by a path over the shortest path length
The Min-Excess Problem • Given a graph G, start and end nodes s, t, a reward on each node, and a target reward k, find a path P that collects reward at least k and minimizes the excess ε(P) = ℓ(P) − d(s,t) • At optimality, this is exactly the same as the k-path objective of minimizing ℓ(P) • However, approximation is different: min-excess is strictly harder than k-path • We give a (2 + ε)-approximation for Min-Excess [Blum, C, Karger, Meyerson, Minkoff, Lane, FOCS’03] • Our algorithm returns a path with length at most d(s,t) + (2 + ε)·ε(P), where P is the optimal min-excess path (see the worked comparison below)
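A short worked comparison, with illustrative numbers that are not from the slides, of why approximating the excess gives a much better guarantee than approximating the total length whenever the excess is small:

```latex
% Suppose d(s,t) = 10 and the optimal path P^* collecting reward k has
% length 12, i.e. excess \varepsilon(P^*) = 12 - 10 = 2.
\ell(P^*) = d(s,t) + \varepsilon(P^*) = 10 + 2 = 12
% A 2-approximation on total length only guarantees a path of length
\ell(P) \le 2\,\ell(P^*) = 24,
% whereas a 2-approximation on the excess guarantees
\ell(P) \le d(s,t) + 2\,\varepsilon(P^*) = 10 + 4 = 14.
```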
A 3-approximation to Orienteering [Blum C Karger+ 03] • The excess of a path P from u to v is ε(P) = d_P(u,v) − d(u,v) • Using an r-approximation for Min-Excess (r ∈ ℤ+), we get an r-approximation for s-t Orienteering • Given a 3-approximation to Min-Excess: 1. Hypothetically divide the optimal path into 3 “equal-reward” parts 2. Approximate the part with the smallest excess • The excess of one of the three parts is at most (ε₁ + ε₂ + ε₃)/3, and we can afford an excess of up to ε₁ + ε₂ + ε₃ • So there exists a path from s to t that collects at least a third of the optimal reward and has length at most D: a 3-approximation to Orienteering (see the sketch below) • Open: given an r-approximation for Min-Excess with r ∈ ℝ+, can we get a ⌈r⌉-approximation to Orienteering?
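A minimal sketch of this reduction, assuming a hypothetical `min_excess_approx(G, u, v, k)` oracle that returns a u-v path of reward at least k whose excess is at most 3 times optimal, and a hypothetical graph interface (`nodes`, `shortest_path`, `path_length`, `reward`). The optimal reward `opt_reward` is assumed known for simplicity; the actual algorithm would also enumerate or search over reward targets.

```python
def orienteering_3_approx(G, s, t, D, min_excess_approx, opt_reward):
    """Sketch: reduce s-t Orienteering (length budget D) to Min-Excess.
    Guess the endpoints (u, v) of the lowest-excess third of the optimal
    path, ask the oracle for reward >= opt_reward / 3 between them, and
    keep the best s-...-t path that fits within the budget."""
    best_reward, best_path = 0, None
    for u in G.nodes():
        for v in G.nodes():
            segment = min_excess_approx(G, u, v, opt_reward / 3.0)
            if segment is None:
                continue
            # glue: shortest path s->u, approximate segment u->v, shortest v->t
            path = G.shortest_path(s, u) + segment[1:] + G.shortest_path(v, t)[1:]
            if G.path_length(path) <= D:
                reward = sum(G.reward(x) for x in set(path))
                if reward > best_reward:
                    best_reward, best_path = reward, path
    return best_reward, best_path
```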
The next step: Deadline-TSP [Bansal Blum C Meyerson 04] • Every vertex has a deadline D(v); find a path that maximizes the number of nodes v visited before D(v) • Arises in scheduling and production planning • If the last node on the path has the minimum deadline, we can use Orienteering to approximate the reward; we do not need to worry about the deadlines of the other nodes • Does OPT always have a large subpath with this property? NO! • However, there are many subpaths of OPT with this property that together contain all the reward
A segmentation of OPT [figure: the nodes of OPT plotted by visit time against deadline, cut into segments]
Deadline-TSP • Segment the graph into many parts, approximate each using Orienteering, and patch them together (see the sketch below) • How do we find such a segmentation without knowing the optimal path? • To avoid double-counting reward, the segments should be node-disjoint • Our result: there exists a segmentation based only on deadlines, such that the resulting solution is a (3 log n)-approximation • Open: is there a segmentation based on other properties (e.g., distance from the root) that gives a constant approximation?
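A minimal sketch of the segment-by-deadlines idea, assuming a hypothetical `orienteering_approx` subroutine that returns an object with `path` and `length` fields, and assuming all deadlines are positive. The geometric grouping below is only an illustration of “segment by deadlines, solve each segment with Orienteering, and concatenate”; it is not the exact segmentation from the paper, which achieves the 3 log n guarantee.

```python
import math

def deadline_tsp_sketch(vertices, deadlines, root, orienteering_approx):
    """Illustrative sketch: group vertices whose deadlines fall in
    geometrically increasing ranges, run an Orienteering approximation
    inside each group with the group's smallest deadline as the budget,
    and visit the groups in increasing order of deadline."""
    d_min = min(deadlines.values())
    d_max = max(deadlines.values())
    num_groups = int(math.ceil(math.log2(d_max / d_min))) + 1

    tour, elapsed = [root], 0
    for i in range(num_groups):
        lo, hi = d_min * 2 ** i, d_min * 2 ** (i + 1)
        group = [v for v in vertices if lo <= deadlines[v] < hi]
        if not group:
            continue
        # budget: earliest deadline in the group, minus time already spent
        budget = min(deadlines[v] for v in group) - elapsed
        if budget <= 0:
            continue
        segment = orienteering_approx(group, start=tour[-1], budget=budget)
        tour += segment.path
        elapsed += segment.length
    return tour
```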
An overview of our results • Min-Excess: 2 + ε [FOCS 03] • Orienteering: 3 [STOC 04] • Discounted-Reward TSP: 6.75 + ε [FOCS 03] • Deadline TSP: 3 log n [STOC 04] • Time-Window Problem: 3 log² n [STOC 04] • Time-Window Problem, bicriteria: reward within a log(1/ε) factor while exceeding deadlines by at most a (1 + ε) factor [STOC 04]
Future Directions • Better approximations: can we get a constant factor for Time-Windows? special metrics such as trees or planar graphs; hardness of approximation? • Asymmetric path-planning: the graph is directed but still obeys the triangle inequality; polylog approximations and lower bounds are known for distance; entirely different ideas are needed for asymmetric Orienteering; is it log-hard? • Group path-planning: reward is associated with “groups” of nodes; visit at least one node in a group to obtain its reward
Future Directions • Stochastic path-planning: closer to real robot navigation; the graph is a Markov Decision Process • Each edge is an “action” associated with a probability distribution over outcomes • The goal: give a “strategy” that accomplishes a given task as fast as possible; the best action could be history-dependent • Can we even write down the best strategy in polynomial time? Can we approximate it in polynomial time, or even in NP?
Coming up next: Correlation Clustering
Natural Language Processing • To understand an article automatically, we need to figure out which entities are one and the same • Is “his” in the second line the same person as “The secretary” in the first line?
Real-World Clustering Problems • A wide variety of clustering problems • Co-reference Analysis • Web document clustering • Co-authorship (Citeseer/DBLP) • Computer Vision • Typical characteristics: • No well-defined “similarity metric” • Number of clusters is unknown • No predefined topics – desirable to figure them out as part of the algorithm
Cohen, McCallum & Richman’s idea: “learn” a similarity measure based on context [figure: entity mentions (“Mr. Rumsfield”, “his”, “The secretary”, “he”, “Saddam Hussein”) connected by strong-similarity and strong-dissimilarity edges]
A good clustering • Consistent clustering: similar (positive) edges inside clusters, dissimilar (negative) edges between clusters [figure: the entity graph partitioned into consistent clusters]
A good clustering • When a clustering is not fully consistent, the violated edges are inconsistencies or “mistakes” [figure: the same entity graph with the violated edges highlighted]
A good clustering • Sometimes no consistent clustering exists! • Goal: find the most consistent clustering, i.e., the one with the fewest mistakes [figure: an entity graph in which every clustering violates some edge]
Correlation Clustering • Given a graph with positive (similar) and negative (dissimilar) edges, find the most consistent clustering • NP-hard [Bansal, Blum, C, FOCS’02] • Two natural objectives: Maximize agreements = (# of +ve edges inside clusters) + (# of −ve edges between clusters); Minimize disagreements = (# of +ve edges between clusters) + (# of −ve edges inside clusters) • The two are equivalent at optimality, but different in terms of approximation (see the sketch below)
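A minimal sketch of the two objectives, assuming edges are given as a dict mapping vertex pairs to +1 (similar) or −1 (dissimilar) and a clustering is given as a vertex-to-cluster-id map; the function and example are illustrative.

```python
def agreements_and_disagreements(edges, cluster_of):
    """edges: {(u, v): +1 or -1}; cluster_of: {vertex: cluster id}.
    Returns (# agreements, # disagreements) of the clustering."""
    agree = disagree = 0
    for (u, v), sign in edges.items():
        same = cluster_of[u] == cluster_of[v]
        if (sign > 0 and same) or (sign < 0 and not same):
            agree += 1
        else:
            disagree += 1
    return agree, disagree

# Example: a triangle with two + edges and one - edge has no consistent clustering
edges = {("a", "b"): +1, ("b", "c"): +1, ("a", "c"): -1}
print(agreements_and_disagreements(edges, {"a": 0, "b": 0, "c": 0}))  # (2, 1)
```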
Overview of results • Unweighted (complete) graphs – Min Disagree: 17433 [Bansal Blum C 02], 4 [Charikar Guruswami Wirth 03]; APX-hard [CGW 03] – Max Agree: PTAS [Bansal Blum C 02] • Weighted graphs – Min Disagree: O(log n) [CGW 03] [Emanuel Fiat 03] [Immorlica Demaine 03]; 29/28 hardness [CGW 03] – Max Agree: 1.3048 [CGW 03], 1.3044 [Swamy 04]; 116/115 hardness [CGW 03]
Minimizing Disagreements [Bansal, Blum, C, FOCS’02] • Goal: approximately minimize the number of “mistakes” (disagreements) • Assumption: the graph is unweighted and complete • A lower bound on OPT: erroneous triangles • An “erroneous triangle” has two + edges and one − edge; any clustering disagrees with at least one of its edges • If there are several edge-disjoint erroneous triangles, any clustering makes a mistake on each one, so D_OPT ≥ maximum (fractional) packing of erroneous triangles (see the sketch below)
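A minimal sketch of this lower bound, assuming the complete signed graph is given as a dict of edge signs keyed by sorted vertex pairs. A greedy (integral, hence possibly weaker than the fractional) packing of edge-disjoint erroneous triangles already gives a valid lower bound on the optimal number of disagreements.

```python
from itertools import combinations

def erroneous_triangle_lower_bound(vertices, sign):
    """Greedily pack edge-disjoint erroneous triangles (two + edges, one -)
    in a complete signed graph. Any clustering must make at least one
    mistake per packed triangle, so the count lower-bounds OPT."""
    def edge(u, v):
        return (min(u, v), max(u, v))
    used = set()   # edges already consumed by a packed triangle
    packed = 0
    for a, b, c in combinations(vertices, 3):
        tri = [edge(a, b), edge(b, c), edge(a, c)]
        if any(e in used for e in tri):
            continue
        if sorted(sign[e] for e in tri) == [-1, 1, 1]:  # exactly one - edge
            used.update(tri)
            packed += 1
    return packed

# Example: one erroneous triangle gives lower bound 1
sign = {("a", "b"): 1, ("b", "c"): 1, ("a", "c"): -1}
print(erroneous_triangle_lower_bound(["a", "b", "c"], sign))  # 1
```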
Using the lower bound: -clean clusters “good” vertex “bad” vertex • Relating erroneous triangles to mistakes • In special cases, we can “charge-off” disagreements to erroneous triangles • “clean” clusters • each vertex has few disagreements incident on it • few is relative to the size of the cluster • # of disagreements · ¼ # of erroneous triangles Clean cluster All vertices are good Shuchi Chawla, Carnegie Mellon University
Using the lower bound: δ-clean clusters • δ-clean clusters: each vertex in cluster C has fewer than δ|C| positive mistakes and fewer than δ|C| negative mistakes • Again, # of disagreements ≤ constant × # of erroneous triangles • δ-clean clusters have a high density of positive edges, so we can easily spot them in the graph (see the checker sketched below) • Possible solution: find a δ-clean clustering and charge its disagreements to erroneous triangles • Caveat: it may not exist
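A minimal sketch of what δ-clean means, assuming the signed graph is complete and given as a dict of edge signs keyed by sorted vertex pairs; the check counts, for each vertex, its negative edges inside the cluster and its positive edges leaving it. Names and interface are illustrative.

```python
def is_delta_clean(cluster, all_vertices, sign, delta):
    """cluster: set of vertices; sign[(u, v)] in {+1, -1} for sorted (u, v).
    A cluster is delta-clean if every vertex v in it has fewer than
    delta*|cluster| negative neighbours inside the cluster and fewer than
    delta*|cluster| positive neighbours outside it."""
    def s(u, v):
        return sign[(min(u, v), max(u, v))]
    size = len(cluster)
    for v in cluster:
        neg_inside = sum(1 for u in cluster if u != v and s(u, v) < 0)
        pos_outside = sum(1 for u in all_vertices
                          if u not in cluster and s(u, v) > 0)
        if neg_inside >= delta * size or pos_outside >= delta * size:
            return False
    return True
```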
Using the lower bound: δ-clean clusters • Caveat: a δ-clean clustering may not exist • An almost-δ-clean clustering: all clusters are either δ-clean or contain a single node • An almost-δ-clean clustering always exists, trivially (all singletons) • We show: there is an almost-δ-clean clustering, OPT(δ), that is almost as good as OPT, and its nice structure helps us find it easily
OPT(δ) – clean or singleton • Imaginary procedure: starting from the optimal clustering, deal with the “bad” vertices to obtain OPT(δ), in which all clusters are δ-clean or singletons • Only few new mistakes are introduced
Finding clean clusters: charging off mistakes • Our algorithm (ALG) produces clean clusters plus singletons, and we compare it against OPT(δ): 1. Mistakes among clean clusters – charge them to erroneous triangles 2. Mistakes among singletons – no more than the corresponding mistakes in OPT(δ)
A summary of results • Unweighted (complete) graphs – Min Disagree: 17433 [Bansal Blum C 02], 4 [Charikar Guruswami Wirth 03]; APX-hard [CGW 03] – Max Agree: PTAS [Bansal Blum C 02] • Weighted graphs – Min Disagree: O(log n) [CGW 03] [Emanuel Fiat 03] [Immorlica Demaine 03]; 29/28 hardness [CGW 03] – Max Agree: 1.3048 [CGW 03], 1.3044 [Swamy 04]; 116/115 hardness [CGW 03]
Future Directions • Better combinatorial approximations: the current best algorithms have a large running time; they employ an LP with O(n²) variables • Improving the lower bound: erroneous cycles – one negative edge and the remaining edges positive; the gap of this lower bound is between 2 and 4 [Charikar Guruswami Wirth 03]; can we obtain a 2-approximation? • A good “iterative” approximation: on few changes to the graph, quickly recompute a good clustering
Future Directions • Clustering with small clusters: given that all clusters in OPT have size at most k, find a good approximation; is this NP-hard? (This differs from finding the best clustering with small clusters, without any guarantee on OPT) • Clustering with few clusters: given that OPT has at most k clusters, find a good approximation • Maximizing correlation: the number of agreements minus the number of disagreements; can we get a constant-factor approximation?
Timeline • Plan to finish in a year