1 / 80

Submodularity in Machine Learning

Submodularity in Machine Learning. Andreas Krause Stefanie Jegelka. Network Inference. How learn who influences whom?. Summarizing Documents. How select representative sentences?. MAP inference. sky. tree. house. grass. How find the MAP labeling in discrete graphical models

michel
Download Presentation

Submodularity in Machine Learning

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Submodularityin Machine Learning Andreas Krause Stefanie Jegelka

  2. Network Inference How learn who influences whom?

  3. Summarizing Documents How select representative sentences?

  4. MAP inference sky tree house grass How find the MAP labeling in discrete graphical models efficiently?

  5. What’s common? • Formalization: Optimize a set function F(S) under constraints • generally very hard • but:structure helps! … if F issubmodular, we can … • solve optimization problems with strong guarantees • solve some learning problems

  6. Outline many new results!  • What is submodularity? • Optimization • Minimization • Maximization • Learning • Learning for Optimization: new settings Part I Break Part II

  7. Outline many new results!  • What is submodularity? • Optimization • Minimization: new algorithms, constraints • Maximization: new algorithms (unconstrained) • Learning • Learning for Optimization: new settings Part I Part II … and many new applications!

  8. submodularity.org slides, links, references, workshops, …

  9. Example: placing sensors Place sensors to monitor temperature

  10. Set functions • finite ground set • set function • will assume (w.l.o.g.) • assume black box that can evaluatefor any

  11. X1 X2 X3 Example: placing sensors Utility of having sensors at subset of all locations X4 X1 X5 A={1,4,5}: Redundant infoLow valueF(A) A={1,2,3}: Very informativeHigh valueF(A)

  12. X1 X2 Xs Marginal gain • Givensetfunction • Marginal gain: new sensor s

  13. placement B = {1,…,5} X2 X1 X2 X3 X4 X5 X1 Xs Decreasing gains: submodularity placement A = {1,2} + s + s Big gain Adding s helps a lot! Adding s doesn’t help much small gain new sensor s A B

  14. Equivalent characterizations • Diminishing gains: for all • Union-Intersection: for all A + s + s B

  15. Questions How do I provemyproblemis submodular? Whyissubmodularityuseful?

  16. Example: Set cover place sensorsin building goal: cover floorplan with discs Possiblelocations : “area covered by sensors placed at A” Node predicts values of positions with some radius Formally: Finite set , collection of n subsets For define

  17. S’ S’ Set cover is submodular A={s1,s2} S1 S2 F(A U {s’}) – F(A) ≥ F(B U {s’}) – F(B) S1 S2 S3 S4 B = {s1,s2,s3,s4}

  18. More complex model for sensing X6 X3 X4 X1 X5 X2 Likelihood Prior Y1 Y2 Y3 Ys: temperatureat location s Xs: sensor valueat location s Y6 Y4 Y5 Xs= Ys+ noise Joint probability distribution P(X1,…,Xn,Y1,…,Yn) = P(Y1,…,Yn) P(X1,…,Xn| Y1,…,Yn)

  19. X1 X2 X3 Example: Sensor placement Utility of having sensors at subset A of all locations Uncertaintyabout temperature Y after sensing Uncertaintyabout temperature Ybefore sensing X4 X1 X5 A={1,2,3}: High valueF(A) A={1,4,5}: Low valueF(A)

  20. Y1 Y2 Y3 X1 X2 X3 X4 Submodularityof Information Gain Y1,…,Ym, X1, …, Xn discrete RVs F(A) = I(Y; XA) = H(Y)-H(Y | XA) • F(A) is NOT always submodular If Xi are all conditionally independent given Y,then F(A) is submodular! [Krause & Guestrin `05] Proof:“information never hurts” F(A) is always monotone

  21. Example: costs cost: time to reach shop + price of items Market 3 breakfast?? t3 ground set t1 t2 Market 1 Market 2 each item 1 $

  22. Example: costs cost: time to shop + price of items Market 3 breakfast?? F( ) = cost() + cost( , ) = t1+ 1 + t2+ 2 = #shops + #items Market 1 Market 2 submodular?

  23. Sharedfixedcosts A B marginal cost: #new shops + #new items decreasing  cost is submodular! • shops: shared fixed cost • economies of scale

  24. Another example: Cut functions 1 2 3 a c e g V={a,b,c,d,e,f,g,h} 2 3 2 2 2 3 3 3 b d f h 2 1 3 Cut function is submodular!

  25. Whyarecutfunctions submodular? w a b 1 2 3 a c e g 2 3 2 2 2 3 3 3 Submodular if w ≥ 0! b d f h 2 1 3 Cut function in subgraph {i,j}  Submodular! Restriction: still submodular

  26. Closedness properties F1,…,Fmsubmodular functions on V and 1,…,m > 0 Then: F(A) = ii Fi(A) is submodular Submodularity closed under nonnegative linear combinations! Extremely useful fact: • F(A) submodular P() F(A) submodular! • Multicriterionoptimization • A basic proof technique! 

  27. Other closednessproperties • Restriction: F(S) submodular on V, W subsetof V Thenis submodular S V W

  28. Other closednessproperties • Restriction: F(S) submodular on V, W subsetof V Thenis submodular • Conditioning: F(S) submodular on V, W subsetofV Thenis submodular S V W

  29. Other closednessproperties • Restriction: F(S) submodular on V, W subsetof V Thenis submodular • Conditioning: F(S) submodular on V, W subsetofV Thenis submodular • Reflection: F(S) submodular on V Thenis submodular S V

  30. Submodularity … discrete convexity …. … or concavity?

  31. Convex aspects • convex extension • duality • efficient minimization But this is only half of the story…

  32. Concave aspects • submodularity: • concavity: A + s + s B F(A) “intuitively” |A|

  33. Submodularity and concavity • suppose and submodularif and only if … is concave

  34. Maximum of submodular functions • submodular. What about ? F(A) = max(F1(A),F2(A)) F1(A) F2(A) |A| max(F1,F2) not submodular in general!

  35. Minimum of submodular functions Well, maybe F(A) = min(F1(A),F2(A)) instead? F({b}) – F({})=0 < F({a,b}) – F({a})=1 min(F1,F2) not submodular in general!

  36. Twofacesof submodular functions • Convexaspects • Usefulforminimization • Convexextension • Concaveaspects • Usefulformaximization • Multilinear extension • Not closedunderminimumandmaximum Convexaspects minimization! Concaveaspects maximization!

  37. Whatto do with submodular functions Optimization Learning Minimization Online/adaptive optim. Maximization

  38. Whatto do with submodular functions Optimization Learning Minimization Online/adaptive optim. Maximization Minimizationandmaximization not thesame??

  39. Submodular minimization structured sparsityregularization clustering t s minimum cut MAP inference

  40. Submodular minimization  submodularity and convexity

  41. Set functions and energy functions any set function with . … is a function on binary vectors! 1 a 1 b 0 c 0 d A a b c d pseudo-boolean function

  42. Submodularity and convexity extension • minimum of f is a minimum of F • submodular minimization as convex minimization:polynomial time! Grötschel, Lovász, Schrijver 1981 Lovász extension convex Lovász, 1982

  43. Submodularity and convexity extension • minimum of f is a minimum of F • submodular minimizationas convex minimization:polynomial time! Lovász extension convex Lovász, 1982

  44. x{b} 2 1 x{a} -1 0 1 -2 The submodular polyhedron PF Example: V = {a,b} x({b}) ≤F({b}) PF x({a,b}) ≤F({a,b}) x({a}) ≤F({a})

  45. x{b} 2 1 -1 0 1 -2 x{a} Evaluating the Lovász extension Linear maximization over PF Exponentially many constraints!!!  Computable in O(n log n) time  [Edmonds ‘70] y* x • Subgradient • Separation oracle • greedy algorithm: • sort x • order defines sets

  46. Lovászextension: example F(a) F(b) F(a,b) F({})

  47. Submodular minimization combinatorial algorithms • Fulkerson prizeIwata, Fujishige, Fleischer ‘01 & Schrijver’00 • state of the art:O(n4T + n5logM) [Iwata ’03]O(n6 + n5T) [Orlin’09] minimize convex extension • ellipsoid algorithm[Grötschel et al. `81] • subgradient method,smoothing[Stobbe & Krause `10] • duality: minimum norm point algorithm[Fujishige & Isotani’11] T = time forevaluatingF

  48. x{b} 2 1 -2 -1 0 1 x{a} The minimum-norm-point algorithm Base polytopeBF Example: V = {a,b} dual: minimum norm problem Lovász extension regularized problem u({a,b})=F({a,b}) minimizes F: u* [-1,1] Fujishige ‘91, Fujishige & Isotani ‘11

  49. x{b} 2 1 -1 0 1 -2 x{a} The minimum-norm-point algorithm u({a,b})=F({a,b}) find u* [-1,1] can we solve this?? yes! recall: can solve linear optimization over PF similar: optimization over BF  can find (Frank-Wolfe algorithm) Fujishige ‘91, Fujishige & Isotani ‘11

  50. Empirical comparison Minimum norm point algorithm: usually orders of magnitude faster Cut functions from DIMACS Challenge combinatorial algorithms Running time (seconds) Lower is better (log-scale!) Minimum norm point algorithm 64 128 256 512 1024 Problem size (log-scale!) [Fujishige & Isotani’11]

More Related