
Learning Submodular Functions


Presentation Transcript


1. Learning Submodular Functions
Maria Florina Balcan
LGO, 11/16/2010

2. Submodular functions
V = {1, 2, …, n}; set function f : 2^V → R
f(S) + f(T) ≥ f(S ∩ T) + f(S ∪ T), ∀ S, T ⊆ V
• Decreasing marginal return: f(T ∪ {x}) − f(T) ≥ f(S ∪ {x}) − f(S), ∀ T ⊆ S ⊆ V, x ∉ S
Examples:
• Concave functions: let h : R → R be concave. For each S ⊆ V, let f(S) = h(|S|).
• Vector spaces: let V = {v1, …, vn}, each vi ∈ F^n. For each S ⊆ V, let f(S) = rank of the vectors indexed by S.

3. Submodular set functions
• A set function f on V is submodular if, for all S, T ⊆ V: f(S) + f(T) ≥ f(S ∪ T) + f(S ∩ T)
• Equivalent diminishing-returns characterization: for T ⊆ S and x ∉ S, f(T ∪ {x}) − f(T) ≥ f(S ∪ {x}) − f(S)
  (adding x to the smaller set T gives a large improvement; adding it to the larger set S gives a small improvement)
[Figure: Venn diagram of S ∪ T and S ∩ T, and the nested sets T ⊆ S each augmented with x]
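For small ground sets, both characterizations can be verified by brute force. Below is a minimal Python sketch (not from the slides; `is_submodular` is an illustrative name) that checks the lattice inequality over all pairs of subsets:

```python
from itertools import combinations
import math

def is_submodular(f, V):
    """Check f(S)+f(T) >= f(S|T)+f(S&T) for all S, T (exponential in |V|; small V only)."""
    subsets = [frozenset(c) for r in range(len(V) + 1) for c in combinations(V, r)]
    return all(f(S) + f(T) >= f(S | T) + f(S & T) for S in subsets for T in subsets)

# Example: a concave function of the cardinality, f(S) = sqrt(|S|), is submodular.
V = frozenset(range(5))
print(is_submodular(lambda S: math.sqrt(len(S)), V))  # True
```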

4. Example: Set cover
• Place sensors in a building; want to cover the floorplan with discs. V = set of possible sensor locations.
• For S ⊆ V: f(S) = "area (# locations) covered by sensors placed at S". Each sensor covers the positions within some radius.
• Formally: W is a finite set, with a collection of n subsets Wi ⊆ W. For S ⊆ V = {1, …, n} define f(S) = |∪_{i∈S} Wi|.

5. Set cover is submodular
• With T = {W1, W2} ⊆ S = {W1, W2, W3, W4}, a new set x covers at least as much new area relative to T as it does relative to S:
  f(T ∪ {x}) − f(T) ≥ f(S ∪ {x}) − f(S)
[Figure: the region covered by T on the left and by S on the right, with the extra area contributed by x shaded]
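A small Python sketch of this coverage example, with hypothetical sets W1–W4 and a candidate sensor x, checking the diminishing-returns inequality for T ⊆ S as on the slide:

```python
def coverage(W, S):
    """f(S) = number of ground elements covered by the subsets indexed by S."""
    return len(set().union(*(W[i] for i in S))) if S else 0

# Hypothetical instance: four placed sensors and a new candidate sensor 'x'.
W = {1: {1, 2}, 2: {2, 3}, 3: {3, 4}, 4: {4, 5}, 'x': {5, 6, 7}}
T = {1, 2}                  # T = {W1, W2}
S = {1, 2, 3, 4}            # S = {W1, W2, W3, W4}, with T a subset of S
gain_T = coverage(W, T | {'x'}) - coverage(W, T)   # x adds 3 new elements to T
gain_S = coverage(W, S | {'x'}) - coverage(W, S)   # x adds only 2 new elements to S
assert gain_T >= gain_S     # diminishing returns
```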

6. Submodular functions
V = {1, 2, …, n}; set function f : 2^V → R
f(S) + f(T) ≥ f(S ∩ T) + f(S ∪ T), ∀ S, T ⊆ V
• Decreasing marginal return: f(T ∪ {x}) − f(T) ≤ f(S ∪ {x}) − f(S), ∀ S ⊆ T ⊆ V, x ∉ T
Examples:
• Concave functions: let h : R → R be concave. For each S ⊆ V, let f(S) = h(|S|).
• Vector spaces: let V = {v1, …, vn}, each vi ∈ F^n. For each S ⊆ V, let f(S) = rank of the vectors indexed by S.

7. Submodular functions
V = {1, 2, …, n}; set function f : 2^V → R
f(S) + f(T) ≥ f(S ∩ T) + f(S ∪ T), ∀ S, T ⊆ V
Monotone: f(S) ≤ f(T), ∀ S ⊆ T
Non-negative: f(S) ≥ 0, ∀ S ⊆ V

8. Submodular functions
• A lot of work on optimization and submodularity.
  • Can be minimized in polynomial time.
• Algorithmic game theory: decreasing marginal utilities.
• Substantial recent interest in the ML community.
  • Tutorials and workshops at ICML, NIPS, etc.
  • www.submodularity.org/ (run by the ML community).

9. Learnability of Submodular Fns
• Important to also understand their learnability.
• Previous work: exact learning with value queries [Goemans, Harvey, Iwata, Mirrokni, SODA 2009].
• The [GHIM'09] model:
  • There is an unknown submodular target function.
  • The algorithm is allowed to (adaptively) pick sets and query the value of the target on those sets.
  • Can we learn the target with a polynomial number of queries, in polynomial time?
  • Output a function that approximates the target within a factor of α on every single subset.

10. Exact learning with value queries [Goemans, Harvey, Iwata, Mirrokni, SODA 2009]
• Theorem (general upper bound): There exists an algorithm for learning a submodular function with an approximation factor O(n^{1/2}).
• Theorem (general lower bound): Any algorithm for learning a submodular function must have an approximation factor of Ω(n^{1/2}).

11. Problems with the GHIM model
• The lower bound fails if our goal is to do well on most of the points.
• Many simple functions that are easy to learn in the PAC model (e.g., conjunctions) are impossible to recover exactly from a polynomial number of queries.
• It is well known that value queries are undesirable in some learning applications.
Is there a better model that gets around these problems?

12. Problems with the GHIM model
• The lower bound fails if our goal is to do well on most of the points.
• Many simple functions that are easy to learn in the PAC model (e.g., conjunctions) are impossible to recover exactly from a polynomial number of queries.
• It is well known that value queries are undesirable in some learning applications.
Learning submodular fns in a distributional learning setting [BH10].

13. Our model: Passive Supervised Learning
[Diagram] Data source: distribution D on {0,1}^n → Expert / Oracle labels examples with f : {0,1}^n → R+ → Labeled examples (x1, f(x1)), …, (xk, f(xk)) → Learning algorithm → Alg. outputs g : {0,1}^n → R+.

14. Our model: Passive Supervised Learning
[Same diagram as the previous slide.]
• Algorithm sees (x1, f(x1)), …, (xk, f(xk)), with the xi drawn i.i.d. from D and f : {0,1}^n → R+.
• Algorithm produces a "hypothesis" g : {0,1}^n → R+ (hopefully g ≈ f).
• Goal: Pr_{x1,…,xk}[ Pr_x[ g(x) ≤ f(x) ≤ α·g(x) ] ≥ 1−ε ] ≥ 1−δ.
• "Probably Mostly Approximately Correct" (PMAC).

15. Main results
• Theorem (our general upper bound): There exists an algorithm for PMAC-learning the class of non-negative, monotone, submodular fns (w.r.t. an arbitrary distribution) with an approximation factor O(n^{1/2}).
  • Note: a much simpler algorithm than GHIM'09.
• Theorem (our general lower bound): No algorithm can PMAC-learn the class of non-negative, monotone, submodular fns with an approximation factor õ(n^{1/3}).
  • Note: the GHIM'09 lower bound fails in our model.
• Theorem (product distributions): matroid rank functions can be PMAC-learned with a constant approximation factor.

16. A General Upper Bound
• Theorem: There exists an algorithm for PMAC-learning the class of non-negative, monotone, submodular fns (w.r.t. an arbitrary distribution) with an approximation factor O(n^{1/2}).

17. Subadditive Fns are Approximately Linear
• Let f be non-negative, monotone and subadditive:
  • Subadditive: f(S) + f(T) ≥ f(S ∪ T), ∀ S, T ⊆ V
  • Monotone: f(S) ≤ f(T), ∀ S ⊆ T
  • Non-negative: f(S) ≥ 0, ∀ S ⊆ V
• Claim: f can be approximated to within a factor n by a linear function g.
• Proof sketch: Let g(S) = Σ_{s∈S} f({s}). Then f(S) ≤ g(S) ≤ n·f(S).
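A quick numeric check of this claim (a sketch using a hypothetical coverage function as the subadditive f; all names are illustrative):

```python
from itertools import combinations

W = {1: {1, 2}, 2: {2, 3}, 3: {1, 3}}                 # hypothetical subsets of a ground set
f = lambda S: len(set().union(*(W[i] for i in S))) if S else 0
g = lambda S: sum(f({s}) for s in S)                  # g(S) = sum of singleton values
n = len(W)
for r in range(n + 1):
    for S in map(set, combinations(W, r)):
        assert f(S) <= g(S) <= n * f(S)               # the claimed sandwich holds
```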

18. Subadditive Fns are Approximately Linear
• f(S) ≤ g(S) ≤ n·f(S)
[Figure: f, g, and n·f plotted over subsets of V, with g sandwiched between f and n·f.]

19. PMAC Learning Subadditive Fns
• f non-negative, monotone, subadditive is approximated to within a factor n by a linear function g, g(S) = w·χ(S).
• Idea: learn a linear separator. Sample S from D and flip a coin:
  • If heads, add ((χ(S), f(S)), +).
  • Else, add ((χ(S), n·f(S)), −).
• The labeled examples ((χ(S), f(S)), +) and ((χ(S), n·f(S)), −) are linearly separable in R^{n+1}.
• Use standard sample complexity bounds for linear separators.
• Problem: the data are not i.i.d. Solution: create a related distribution.

20. PMAC Learning Subadditive Fns
• Algorithm:
  • Input: (S1, f(S1)), …, (Sm, f(Sm)).
  • For each Si, flip a coin:
    • If heads, add ((χ(Si), f(Si)), +).
    • Else, add ((χ(Si), n·f(Si)), −).
  • Learn a linear separator u = (w, −z) in R^{n+1}.
  • Output: g(S) = (1/(n+1)) · w·χ(S).
• Note: deal with the set {S : f(S) = 0} separately.
• Theorem: For m = Θ(n/ε), g approximates f to within a factor n on a 1−ε fraction of the distribution.

21. PMAC Learning Submodular Fns
• Algorithm:
  • Input: (S1, f(S1)), …, (Sm, f(Sm)).
  • For each Si, flip a coin:
    • If heads, add ((χ(Si), f(Si)²), +).
    • Else, add ((χ(Si), n·f(Si)²), −).
  • Learn a linear separator u = (w, −z) in R^{n+1}.
  • Output: g(S) = ((1/(n+1)) · w·χ(S))^{1/2}.
• Note: deal with the set {S : f(S) = 0} separately.
• Theorem: For m = Θ(n/ε), g approximates f to within a factor √n on a 1−ε fraction of the distribution.
• Proof idea: a non-negative, monotone, submodular f is approximated to within a factor √n by the square root of a linear function [GHIM'09].
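A minimal Python sketch of this reduction. The linear-separator step uses scikit-learn's LinearSVC, which is my choice of learner, not something specified on the slides; the output scaling sqrt(w·χ(S)/(z·n)) is likewise one reasonable reading of the output rule, chosen so that g ≤ f ≤ √n·g holds whenever the learned separator is consistent with the construction. The function name is hypothetical, and the special handling of {S : f(S) = 0} is omitted.

```python
import numpy as np
from sklearn.svm import LinearSVC  # any linear-separator learner would do here

def pmac_learn_submodular(samples, n, rng=np.random.default_rng(0)):
    """samples: list of (S, f(S)) pairs with S a subset of range(n). Returns g ~ f (sketch)."""
    X, y = [], []
    for S, val in samples:
        chi = np.zeros(n)
        chi[list(S)] = 1.0
        if rng.random() < 0.5:                                     # coin flip, as on the slide
            X.append(np.append(chi, val ** 2)); y.append(+1)       # ((chi(S), f(S)^2), +)
        else:
            X.append(np.append(chi, n * val ** 2)); y.append(-1)   # ((chi(S), n*f(S)^2), -)
    clf = LinearSVC(fit_intercept=False).fit(X, y)                 # separator u = (w, -z) in R^{n+1}
    w, z = clf.coef_[0][:n], -clf.coef_[0][n]
    def g(S):
        chi = np.zeros(n)
        chi[list(S)] = 1.0
        # Scaling chosen so g <= f <= sqrt(n)*g when the separator is consistent (assumption).
        return float(np.sqrt(max(w @ chi, 0.0) / (z * n))) if z > 0 else 0.0
    return g
```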

22. PMAC Learning Submodular Fns
• (Same algorithm as on the previous slide.)
• Much simpler than GHIM'09, and more robust to variations:
  • The target only needs to be within a factor β of a submodular fn, or
  • There exists a submodular fn that agrees with the target on all but an η fraction of the points (on the points where it disagrees, it can be arbitrarily far off).
  • [The algorithm is inefficient in this case.]

23. A General Lower Bound
• Theorem (our general lower bound): No algorithm can PMAC-learn the class of non-negative, monotone, submodular fns with an approximation factor õ(n^{1/3}).
• Plan: use the fact that any matroid rank fn is submodular, and construct a hard family of matroid rank functions.
[Figure: sets A1, A2, A3, …, A_L with L = n^{log log n}; on each Ai the target takes value either High = n^{1/3} or Low = log² n.]

24. Partition Matroids
• A1, A2, …, Ak ⊆ V = {1, 2, …, n}, all disjoint; ui ≤ |Ai| − 1.
• Ind = { I : |I ∩ Aj| ≤ uj, for all j }. Then (V, Ind) is a matroid.
• If the sets Ai are not disjoint, then (V, Ind) might not be a matroid.
  • E.g., n = 5, A1 = {1,2,3}, A2 = {3,4,5}, u1 = u2 = 2: {1,2,4,5} and {2,3,4} are both maximal sets in Ind but do not have the same cardinality.
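A small Python sketch of the disjoint case: the independence test, a brute-force rank function, and (in the comments) the slide's non-disjoint counterexample. All names are illustrative.

```python
from itertools import combinations

def independent(I, A, u):
    """Partition matroid: I is independent iff |I ∩ A_j| <= u_j for every block j."""
    return all(len(I & A[j]) <= u[j] for j in A)

def rank(S, A, u):
    """Rank of S = size of a largest independent subset of S (brute force, small n only)."""
    return max(r for r in range(len(S) + 1)
               for I in map(set, combinations(S, r)) if independent(I, A, u))

# Disjoint blocks: a genuine partition matroid.
A = {1: {1, 2, 3}, 2: {4, 5}}
u = {1: 2, 2: 1}
print(rank({1, 2, 3, 4, 5}, A, u))   # 3 = u_1 + u_2

# Non-disjoint blocks (the slide's example A1={1,2,3}, A2={3,4,5}, u1=u2=2):
# {1,2,4,5} and {2,3,4} are both maximal sets satisfying the constraints but have
# different sizes, so the resulting set system is not a matroid.
```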

25. Almost partition matroids
• k = 2; A1, A2 ⊆ V (not necessarily disjoint); ui ≤ |Ai| − 1.
• Ind = { I : |I ∩ Aj| ≤ uj for j = 1, 2, and |I ∩ (A1 ∪ A2)| ≤ u1 + u2 − |A1 ∩ A2| }.
• Then (V, Ind) is a matroid.

26. Almost partition matroids
• More generally: A1, A2, …, Ak ⊆ V = {1, 2, …, n}, ui ≤ |Ai| − 1; write A(J) = ∪_{j∈J} Aj.
• Define f : 2^[k] → Z by f(J) = Σ_{j∈J} uj + |A(J)| − Σ_{j∈J} |Aj|, ∀ J ⊆ [k].
• Ind = { I : |I ∩ A(J)| ≤ f(J), ∀ J ⊆ [k] }. Then (V, Ind) is a matroid (if nonempty).
• Rewriting f: f(J) = |A(J)| − Σ_{j∈J} (|Aj| − uj), ∀ J ⊆ [k].

27. A generalization of partition matroids
• f : 2^[k] → Z, f(J) = |A(J)| − Σ_{j∈J} (|Aj| − uj), ∀ J ⊆ [k].
• Ind = { I : |I ∩ A(J)| ≤ f(J), ∀ J ⊆ [k] }. Then (V, Ind) is a matroid (if nonempty).
• Proof technique: an uncrossing argument. For a set I, define T(I) to be the set of tight constraints, T(I) = { J ⊆ [k] : |I ∩ A(J)| = f(J) }.
• ∀ I ∈ Ind and J1, J2 ∈ T(I): either J1 ∪ J2 ∈ T(I) or J1 ∩ J2 = ∅.
• Hence Ind is the family of independent sets of a matroid.

28. A generalization of almost partition matroids
• f : 2^[k] → Z, f(J) = |A(J)| − Σ_{j∈J} (|Aj| − uj), ∀ J ⊆ [k]; ui ≤ |Ai| − 1.
• Note: this requires k ≤ n (for k > n, f becomes negative). But we want k = n^{log log n}, so we use a form of truncation to allow k ≫ n.
• Say f is (μ, τ)-good if f(J) ≥ 0 for J ⊆ [k] with |J| ≤ τ, and f(J) ≥ μ for J ⊆ [k] with τ ≤ |J| ≤ 2τ − 2.
• Define h(J) = f(J) if |J| ≤ τ, and h(J) = μ otherwise.
• Ind = { I : |I ∩ A(J)| ≤ h(J), ∀ J ⊆ [k] }. Then (V, Ind) is a matroid (if nonempty).

29. A generalization of partition matroids
• Let L = n^{log log n}. Let A1, A2, …, A_L be random subsets of V (each Ai includes each element of V independently with probability n^{−2/3}).
• Let μ = n^{1/3} log² n, u = log² n, τ = n^{1/3}.
• Each subset J ⊆ {1, 2, …, L} induces a matroid s.t. for any i ∉ J, Ai is independent in this matroid:
  • rank(Ai), for i ∉ J, is roughly |Ai| (i.e., Θ(n^{1/3}));
  • the rank of the sets Aj, j ∈ J, is u = log² n.
[Figure: sets A1, A2, A3, …, A_L with L = n^{log log n}; value High = n^{1/3} on the Ai with i ∉ J and Low = log² n on the Aj with j ∈ J.]

30. Product distributions, Matroid Rank Fns
• Talagrand's inequality implies: let D be a product distribution on V and R = rank(X), with X drawn from D. Then R is concentrated around its mean:
  • if E[R] ≥ 4000, [concentration bound displayed on the slide];
  • if E[R] ≤ 500 log(1/ε), [concentration bound displayed on the slide].
• Related work: [Chekuri, Vondrak '09] and [Vondrak '10] prove a slightly more general result by two different techniques.

31. Product distributions, Matroid Rank Fns
• Talagrand's inequality implies: let D be a product distribution on V and R = rank(X), with X drawn from D. If E[R] ≥ 4000, R is concentrated around its mean [bound displayed on the slide]; likewise if E[R] ≤ 500 log(1/ε).
• Algorithm:
  • Let μ = Σ_{i=1}^{m} f(xi) / m.
  • Let g be the constant function with value μ.
• This achieves approximation factor O(log(1/ε)) on a 1−ε fraction of points, with high probability.
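The algorithm on this slide is simply "predict the empirical mean"; the concentration statement above is what makes this constant hypothesis accurate on most of a product distribution. A two-line sketch (illustrative names; assumes samples is a list of (x_i, f(x_i)) pairs):

```python
def learn_constant(samples):
    """Return the constant hypothesis g(x) = mean of the observed values f(x_i)."""
    mu = sum(val for _, val in samples) / len(samples)
    return lambda x: mu
```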

32. Conclusions and Open Questions
• Analyze the intrinsic learnability of submodular fns.
• Our analysis reveals interesting novel extremal and structural properties of submodular fns.
• Open questions:
  • Improve the Ω(n^{1/3}) lower bound to Ω(n^{1/2}).
  • Non-monotone submodular functions.

33. Other interesting structural properties
• Let h : R → R+ be concave and non-decreasing. For each S ⊆ V, let f(S) = h(|S|).
• Claim: these functions f are submodular, monotone, and non-negative.
[Figure: a concave curve of f(S) as a function of |S|, from ∅ to V.]
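Why the claim holds, in one line from concavity (a standard argument filling in the proof the slide omits):

```latex
\[
  \text{For } T \subseteq S \text{ and } x \notin S:\quad
  f(T \cup \{x\}) - f(T) \;=\; h(|T|+1) - h(|T|)
  \;\ge\; h(|S|+1) - h(|S|) \;=\; f(S \cup \{x\}) - f(S),
\]
```

since |T| ≤ |S| and a concave h has non-increasing increments; monotonicity and non-negativity follow because h is non-decreasing and maps into R+.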

34. Theorem: Every submodular function looks like this (with lots of "approximately" and "usually" qualifiers).
[Figure: the same concave curve over subsets from ∅ to V.]

35. Theorem: Every submodular function looks like this (with lots of "approximately" and "usually" qualifiers).
• Theorem: Let f be a non-negative, monotone, submodular, 1-Lipschitz function. For any ε > 0, there exists a concave function h : [0, n] → R s.t. for every k ∈ [0, n], and for a 1−ε fraction of S ⊆ V with |S| = k, we have: h(k) ≤ f(S) ≤ O(log²(1/ε))·h(k).
• In fact, h(k) is just E[f(S)], where S is uniform on sets of size k.
• Proof: based on Talagrand's inequality.
[Figure: the concave curve h, which sandwiches f up to the stated factor.]

36. Conclusions and Open Questions
• Analyze the intrinsic learnability of submodular fns.
• Our analysis reveals interesting novel extremal and structural properties of submodular fns.
• Open questions:
  • Improve the Ω(n^{1/3}) lower bound to Ω(n^{1/2}).
  • Non-monotone submodular functions:
    • Any algorithm?
    • A lower bound better than Ω(n^{1/3})?
