
Submodular Functions: Learnability, Structure & Optimization


Presentation Transcript


  1. Submodular Functions: Learnability, Structure & Optimization. Nick Harvey, UBC CS; Maria-Florina Balcan, Georgia Tech.

  2. Who studies submodular functions? CS, Approximation Algorithms; Machine Learning; OR, Optimization; AGT, Economics.

  3. Valuation Functions. A first step in economic modeling: • individuals have valuation functions giving utility for different outcomes or events, f(·) ∈ R.

  4. Valuation Functions. A first step in economic modeling: • individuals have valuation functions giving utility for different outcomes or events. Focus on combinatorial settings: • n items, {1,2,…,n} = [n] • f : 2^[n] → R.

  5. Learning Valuation Functions. This talk: learning valuation functions from past data. • Bundle pricing • Package travel deals

  6. Submodular valuations • [n] = {1,…,n}; a function f : 2^[n] → R is submodular if, for all S, T ⊆ [n]: f(S) + f(T) ≥ f(S ∪ T) + f(S ∩ T). • Equivalent to decreasing marginal returns: for T ⊆ S and x ∉ S, f(T ∪ {x}) − f(T) ≥ f(S ∪ {x}) − f(S) (adding x to the smaller set T gives a large improvement, adding x to the larger set S a small one). [Figure: Venn diagrams of S ∪ T and S ∩ T.]
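A minimal brute-force check of this definition, assuming a tiny ground set; the coverage function and the helper names `powerset` and `is_submodular` are illustrative, not anything from the talk.

```python
from itertools import combinations

def powerset(ground):
    """All subsets of the ground set, as frozensets."""
    return [frozenset(c) for r in range(len(ground) + 1)
            for c in combinations(ground, r)]

def is_submodular(f, ground):
    """Brute-force check of f(S) + f(T) >= f(S | T) + f(S & T) for all S, T."""
    subsets = powerset(ground)
    return all(f(S) + f(T) >= f(S | T) + f(S & T)
               for S in subsets for T in subsets)

# Example: coverage of a small set system is submodular.
sets = {1: {'a', 'b'}, 2: {'b', 'c'}, 3: {'c', 'd'}}
coverage = lambda S: len(set().union(*(sets[i] for i in S)))
print(is_submodular(coverage, [1, 2, 3]))   # True
```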

  7. Submodular valuations • E.g., • Vector spaces: let V = {v_1,…,v_n}, each v_i ∈ F^n. For each S ⊆ [n], let f(S) = rank({ v_i : i ∈ S }). • Concave functions: let h : R → R be concave. For each S ⊆ [n], let f(S) = h(|S|). • Decreasing marginal returns: for T ⊆ S and x ∉ S, f(T ∪ {x}) − f(T) ≥ f(S ∪ {x}) − f(S).
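Both examples in a short Python sketch; the concrete vectors and the choice h(x) = √x are hypothetical, and numpy is assumed for the rank computation.

```python
import math
import numpy as np

# Vector spaces: V = {v_1,...,v_4}, f(S) = rank of the vectors indexed by S.
V = np.array([[1, 0, 0],
              [0, 1, 0],
              [1, 1, 0],
              [0, 0, 1]])               # rows are the vectors v_1..v_4

def rank_fn(S):
    """f(S) = rank({ v_i : i in S }); the empty set has rank 0."""
    return 0 if not S else int(np.linalg.matrix_rank(V[sorted(S)]))

# Concave functions: f(S) = h(|S|) for a concave h, here h(x) = sqrt(x).
def concave_fn(S):
    return math.sqrt(len(S))

print(rank_fn({0, 1, 2}))                # 2, since v_3 = v_1 + v_2
print(concave_fn({0, 1, 2}))             # sqrt(3)
```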

  8. Passive Supervised Learning. Data source: a distribution D on 2^[n] generates samples S_1,…,S_k. Expert / oracle: the target function f : 2^[n] → R_+ labels them, producing labeled examples (S_1, f(S_1)),…,(S_k, f(S_k)). The learning algorithm sees the labeled examples and outputs g : 2^[n] → R_+.

  9. PMAC model for learning real-valued functions (the analogue of the Boolean PAC model, where labels lie in {0,1}). A distribution D on 2^[n] generates samples S_1,…,S_k; the expert / oracle labels them with the target f : 2^[n] → R_+. • The algorithm sees (S_1, f(S_1)),…,(S_k, f(S_k)), with the S_i i.i.d. from D, and produces g : 2^[n] → R_+. • With probability ≥ 1−δ, we have Pr_S[ g(S) ≤ f(S) ≤ α·g(S) ] ≥ 1−ε. • Probably Mostly Approximately Correct.
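A small helper sketching how the PMAC guarantee Pr_S[ g(S) ≤ f(S) ≤ α·g(S) ] ≥ 1−ε would be estimated on held-out samples; the function name and the `draw_from_D` call in the usage comment are hypothetical.

```python
def pmac_error(f, g, sample_sets, alpha):
    """Fraction of sampled sets on which g violates the multiplicative guarantee
    g(S) <= f(S) <= alpha * g(S); PMAC success means this fraction is <= eps."""
    bad = sum(1 for S in sample_sets if not (g(S) <= f(S) <= alpha * g(S)))
    return bad / len(sample_sets)

# Hypothetical usage: draw fresh sets from the same distribution D that produced
# the training data, then check the failure rate is below the target eps.
# err = pmac_error(f, learned_g, [draw_from_D() for _ in range(1000)], alpha=2.0)
```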

  10. Learning submodular functions • Theorem (our general upper bound): monotone, submodular functions can be PMAC-learned (w.r.t. an arbitrary distribution) with approximation factor α = O(n^{1/2}). • Theorem (our general lower bound): monotone, submodular functions cannot be PMAC-learned with approximation factor õ(n^{1/3}). • Corollary: gross substitutes functions do not have a concise, approximate representation. • Theorem (product distributions): Lipschitz, monotone submodular functions can be PMAC-learned under a product distribution with approximation factor O(1).

  11. Learning submodular functions • Theorem (our general upper bound): monotone, submodular functions can be PMAC-learned (w.r.t. an arbitrary distribution) with approximation factor α = O(n^{1/2}). • Theorem (our general lower bound): monotone, submodular functions cannot be PMAC-learned with approximation factor õ(n^{1/3}). • Corollary: gross substitutes functions do not have a concise, approximate representation. • Theorem (product distributions): Lipschitz, monotone submodular functions can be PMAC-learned under a product distribution with approximation factor O(1).

  12. Computing Linear Separators. • Given {+,–}-labeled points in R^n, find a hyperplane c^T x = b that separates the +s and the –s. • Easily solved by linear programming. [Figure: + and – points separated by a hyperplane.]

  13. Learning Linear Separators. • Given a random sample of {+,–}-labeled points in R^n, find a hyperplane c^T x = b that separates most of the +s and –s. • Classic machine learning problem. [Figure: labeled sample points with one mislabeled point marked "Error!".]

  14. Learning Linear Separators. • Classic theorem [Vapnik–Chervonenkis 1971]: Õ(n/ε²) samples suffice to get error ε. [Figure: labeled sample points with one mislabeled point marked "Error!".]

  15. Submodular Functions are Approximately Linear • Let f be non-negative, monotone and submodular. • Claim: f can be approximated to within a factor n by a linear function g. • Proof sketch: let g(S) = Σ_{s∈S} f({s}). Then f(S) ≤ g(S) ≤ n·f(S). Submodularity: f(S) + f(T) ≥ f(S ∩ T) + f(S ∪ T) for all S, T ⊆ V. Monotonicity: f(S) ≤ f(T) for all S ⊆ T. Non-negativity: f(S) ≥ 0 for all S ⊆ V.
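A quick numerical sanity check of the sandwich f(S) ≤ g(S) ≤ n·f(S), using a hypothetical coverage function as the non-negative, monotone, submodular f.

```python
from itertools import combinations

# Toy coverage function: monotone, non-negative, submodular.
sets = {1: {'a', 'b'}, 2: {'b', 'c'}, 3: {'a', 'c', 'd'}}
n = len(sets)
f = lambda S: len(set().union(*(sets[i] for i in S)))
g = lambda S: sum(f({i}) for i in S)        # linear surrogate g(S) = sum_{s in S} f({s})

for r in range(1, n + 1):
    for S in combinations(sets, r):
        S = set(S)
        assert f(S) <= g(S) <= n * f(S)     # the claimed sandwich
print("f(S) <= g(S) <= n*f(S) holds on every non-empty subset")
```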

  16. Submodular Functions are Approximately Linear. [Figure: the functions f, g, and n·f over the lattice from ∅ to V, with g sandwiched between f and n·f.]

  17. • Randomly sample {S_1,…,S_k} from the distribution. • Create a + for f(S_i) and a – for n·f(S_i). • Now just learn a linear separator! [Figure: + points on f and – points on n·f, separated by the linear function g.]
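A rough sketch of this step on a toy instance: each sampled S becomes a point (indicator vector of S, value) in R^{n+1}, the point at height f(S) is labeled + and the point at height n·f(S) is labeled −, and an off-the-shelf soft-margin separator is fit. The coverage function is hypothetical and scikit-learn's LinearSVC is an assumption of this sketch, not the talk's algorithm; points with f(S) = g(S) can sit exactly on the separating hyperplane, which the soft margin tolerates.

```python
import numpy as np
from sklearn.svm import LinearSVC            # off-the-shelf separator, for illustration only

sets = {0: {'a', 'b'}, 1: {'b', 'c'}, 2: {'a', 'c', 'd'}}   # toy coverage function
n = len(sets)
f = lambda S: len(set().union(*(sets[i] for i in S))) if S else 0

rng = np.random.default_rng(0)
X, y = [], []
for _ in range(200):                         # randomly sample sets S_i
    S = {i for i in range(n) if rng.random() < 0.5}
    if not S:
        continue
    chi = [1.0 if i in S else 0.0 for i in range(n)]
    X.append(chi + [float(f(S))]);      y.append(+1)   # "+" at height f(S)
    X.append(chi + [float(n * f(S))]);  y.append(-1)   # "-" at height n*f(S)

clf = LinearSVC(C=10.0).fit(np.array(X), np.array(y))
w, b = clf.coef_[0], clf.intercept_[0]
# The hyperplane w.(chi_S, t) + b = 0 induces a linear function of chi_S that
# over/under-estimates f within roughly a factor n on most sampled sets.
```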

  18. • Theorem: g approximates f to within a factor n on a 1−ε fraction of the distribution. [Figure: g sandwiched between f and n·f.]

  19. • Can improve to O(n^{1/2}): in fact f² and n·f² are separated by a linear function [Goemans et al. '09]. • John's ellipsoid theorem: any centrally symmetric convex body is approximated by an ellipsoid to within a factor n^{1/2}. [Figure: f² and n·f² separated by a linear function.]

  20. Learning submodular functions • Theorem (our general upper bound): monotone, submodular functions can be PMAC-learned (w.r.t. an arbitrary distribution) with approximation factor α = O(n^{1/2}). • Theorem (our general lower bound): monotone, submodular functions cannot be PMAC-learned with approximation factor õ(n^{1/3}). • Corollary: gross substitutes functions do not have a concise, approximate representation. • Theorem (product distributions): Lipschitz, monotone submodular functions can be PMAC-learned under a product distribution with approximation factor O(1).

  21. f(S) = min{ |S|, k }: f(S) = |S| if |S| ≤ k, and f(S) = k otherwise. [Figure: the Boolean lattice from ∅ to V.]

  22. Lower the value on a single set A (with |A| = k): f(S) = |S| if |S| ≤ k and S ≠ A, f(S) = k−1 if S = A, and f(S) = k otherwise. [Figure: the lattice with one "bump" at A.]

  23. Take a family A = {A_1,…,A_m} with each |A_i| = k: f(S) = |S| if |S| ≤ k and S ∉ A, f(S) = k−1 if S ∈ A, and f(S) = k otherwise. Claim: f is submodular if |A_i ∩ A_j| ≤ k−2 for all i ≠ j. [Figure: the lattice with bumps at A_1, A_2,…,A_m.]
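A sketch of this "bumpy" family in code; the value of k and the two bump sets below are hypothetical, chosen so that |A_i ∩ A_j| ≤ k−2 as the claim requires.

```python
def bumpy(S, k, bumps):
    """Slides 21-23: f(S) = |S| for small sets, k-1 on the bump sets, k elsewhere.
    Submodular provided |A_i & A_j| <= k-2 for all i != j (the claim above)."""
    S = frozenset(S)
    if len(S) <= k and S not in bumps:
        return len(S)
    return k - 1 if S in bumps else k

k = 3
bumps = {frozenset({0, 1, 2}), frozenset({2, 3, 4})}   # intersection size 1 <= k-2
print(bumpy({0, 1}, k, bumps))       # 2: a small set just gets |S|
print(bumpy({0, 1, 2}, k, bumps))    # 2: a bump gets k-1
print(bumpy({0, 1, 3}, k, bumps))    # 3: a size-k non-bump gets k
```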

  24. Delete half of the bumps at random: f(S) = |S| if |S| ≤ k (and S is not a surviving bump), f(S) = k−1 if S ∈ A and wasn't deleted, and f(S) = k otherwise. Then f is very unconcentrated on A ⇒ any algorithm to learn f has additive error 1. [Figure: if the algorithm sees f only on some of the bumps, then f can't be predicted on the others.]

  25. Can we force a bigger error with bigger bumps? Yes, if the A_i's are very “far apart”. This can be achieved by picking them randomly. [Figure: the lattice with deeper bumps at A_1, A_2,…,A_m.]

  26. Theorem (main lower bound construction): there is a distribution D and a randomly chosen function f s.t. • f is monotone, submodular; • knowing the value of f on poly(n) random samples from D does not suffice to predict the value of f on future samples from D, even to within a factor õ(n^{1/3}). Plan: • Choose two values High = n^{1/3} and Low = O(log² n). • Choose random sets A_1,…,A_m ⊆ [n], with |A_i| = High and m = n^{log n}. • D is the uniform distribution on {A_1,…,A_m}. • Create a function f : 2^[n] → R: for each i, randomly set f(A_i) = High or f(A_i) = Low. • Extend f to a monotone, submodular function on 2^[n].
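A sketch of the random part of this plan (the High/Low values, the random A_i, and the random High/Low assignment). The matroid-rank extension from the next slide is omitted, m is shrunk to a toy size, and the High/Low gap of roughly n^{1/3}/log² n only emerges for very large n.

```python
import math
import random

def lower_bound_instance(n, m=20, seed=0):
    """Plan from slide 26, scaled down: High = n^{1/3}, Low ~ log^2 n, random
    High-sized sets A_1,...,A_m (the real construction uses m = n^{log n}), and
    a random High/Low target value for each A_i.  For small n, Low can exceed
    High; the gap is only asymptotic."""
    rng = random.Random(seed)
    high = round(n ** (1 / 3))
    low = max(1, round(math.log(n) ** 2))
    A = [frozenset(rng.sample(range(n), high)) for _ in range(m)]
    values = {Ai: rng.choice((high, low)) for Ai in A}
    return A, values

A, values = lower_bound_instance(n=1000)
print(len(A[0]), sorted(set(values.values())))   # each |A_i| = High; values in {Low, High}
```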

  27. Creating the function f • We choose f to be a matroid rank function. • Such functions have a rich combinatorial structure, and are always submodular. • The randomly chosen A_i's form an expander, where H = { j : f(A_j) = High }. • The expansion property can be leveraged to ensure f(A_i) = High or f(A_i) = Low, as desired.

  28. Learning submodular functions • Theorem (our general upper bound): monotone, submodular functions can be PMAC-learned (w.r.t. an arbitrary distribution) with approximation factor α = O(n^{1/2}). • Theorem (our general lower bound): monotone, submodular functions cannot be PMAC-learned with approximation factor õ(n^{1/3}). • Corollary: gross substitutes functions do not have a concise, approximate representation. • Theorem (product distributions): Lipschitz, monotone submodular functions can be PMAC-learned under a product distribution with approximation factor O(1).

  29. Gross Substitutes Functions • Class of utility functions commonly used in mechanism design [Kelso-Crawford '82, Gul-Stacchetti '99, Milgrom '00, …]. • Intuitively, increasing the prices for some items does not decrease demand for the other items. • Question [Blumrosen-Nisan, Bing-Lehman-Milgrom]: do GS functions have a concise representation?

  30. Gross Substitutes Functions • Class of utility functions commonly used in mechanism design [Kelso, Crawford, Gul, Stacchetti, …]. • Question [Blumrosen-Nisan, Bing-Lehman-Milgrom]: do GS functions have a concise representation? • Fact: every matroid rank function is GS. • Corollary: the answer to the question is no. • Theorem (main lower bound construction): there is a distribution D and a randomly chosen function f s.t. • f is a matroid rank function; • poly(n) bits of information do not suffice to predict the value of f on samples from D, even to within a factor õ(n^{1/3}).

  31. Learning submodular functions • Theorem (our general upper bound): monotone, submodular functions can be PMAC-learned (w.r.t. an arbitrary distribution) with approximation factor α = O(n^{1/2}). • Theorem (our general lower bound): monotone, submodular functions cannot be PMAC-learned with approximation factor õ(n^{1/3}). • Corollary: gross substitutes functions do not have a concise, approximate representation. • Theorem (product distributions): Lipschitz, monotone submodular functions can be PMAC-learned under a product distribution with approximation factor O(1).

  32. Learning submodular functions • Hypotheses: • Pr_{X∼D}[ X = x ] = Π_i Pr[ X_i = x_i ] (“product distribution”) • f({i}) ∈ [0,1] for all i ∈ [n] (“Lipschitz function”) • f({i}) ∈ {0,1} for all i ∈ [n] (a stronger condition!) • Theorem (product distributions): Lipschitz, monotone submodular functions can be PMAC-learned under a product distribution with approximation factor O(1).

  33. Technical theorem: for any ε > 0, there exists a concave function h : [0,n] → R s.t. for every k ∈ [n], and for a 1−ε fraction of S ⊆ V with |S| = k, we have h(k) ≤ f(S) ≤ O(log²(1/ε))·h(k). In fact, h(k) is just E[ f(S) ], where S is uniform on sets of size k.
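A Monte-Carlo sketch of h(k) = E[ f(S) ] over uniform size-k sets, for a hypothetical coverage-style f (this f is not 1-Lipschitz; it is only meant to illustrate the shape of h).

```python
import random

def estimate_h(f, n, k, trials=2000, seed=0):
    """Monte-Carlo estimate of h(k) = E[f(S)], S uniform among size-k subsets of [n]."""
    rng = random.Random(seed)
    return sum(f(set(rng.sample(range(n), k))) for _ in range(trials)) / trials

# Hypothetical coverage function: item i covers elements {i, i+1, i+2} of a cyclic universe.
n = 12
cover = lambda S: len({(i + d) % n for i in S for d in range(3)})
print([round(estimate_h(cover, n, k), 2) for k in range(1, n + 1)])
# The printed sequence grows and flattens out, i.e. it is (approximately) concave in k.
```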

  34. Technical theorem: for any ε > 0, there exists a concave function h : [0,n] → R s.t. for every k ∈ [n], and for a 1−ε fraction of S ⊆ V with |S| = k, we have h(k) ≤ f(S) ≤ O(log²(1/ε))·h(k). In fact, h(k) is just E[ f(S) ], where S is uniform on sets of size k. Algorithm: • Let μ = Σ_{i=1}^m f(S_i) / m. • Let g be the constant function with value μ. This achieves approximation factor O(log²(1/ε)) on a 1−ε fraction of points, with high probability. • Theorem (product distributions): Lipschitz, monotone submodular functions can be PMAC-learned under a product distribution with approximation factor O(1).
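A direct sketch of the algorithm boxed on this slide, assuming the samples arrive as (set, value) pairs drawn from the product distribution D.

```python
def pmac_product_learner(samples):
    """Slide 34's algorithm: ignore the sets and output the constant function
    whose value is the empirical mean mu of the observed f-values.  Per the talk,
    under a product distribution and a Lipschitz, monotone, submodular f this is
    an O(log^2(1/eps))-approximation on a 1-eps fraction of points, w.h.p."""
    mu = sum(value for _, value in samples) / len(samples)
    return lambda S: mu

# Hypothetical usage: samples = [(S_1, f(S_1)), ..., (S_m, f(S_m))] drawn from D.
# g = pmac_product_learner(samples); g(any_set) == mu
```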

  35. Technical theorem: for any ε > 0, there exists a concave function h : [0,n] → R s.t. for every k ∈ [n], and for a 1−ε fraction of S ⊆ V with |S| = k, we have h(k) ≤ f(S) ≤ O(log²(1/ε))·h(k). In fact, h(k) is just E[ f(S) ], where S is uniform on sets of size k. Concentration lemma: let X have a product distribution. For any α ∈ [0,1], [concentration inequality not captured in the transcript]. Proof: based on Talagrand's concentration inequality.
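Not the Talagrand argument, only a numeric illustration that a coverage-style function with f({i}) ≤ 1 concentrates under a product distribution; every parameter below is hypothetical.

```python
import random
import statistics

random.seed(1)
universe, n_items = 200, 60
covers = [set(random.sample(range(universe), 3)) for _ in range(n_items)]
# Coverage function normalized so that f({i}) = 1 (Lipschitz, monotone, submodular).
f = lambda S: len(set().union(*(covers[i] for i in S))) / 3 if S else 0.0

# Product distribution: each item is included independently with probability 1/2.
vals = []
for _ in range(3000):
    S = {i for i in range(n_items) if random.random() < 0.5}
    vals.append(f(S))

print(round(statistics.mean(vals), 2), round(statistics.pstdev(vals), 2))
# The standard deviation is small relative to the mean: f(X) stays close to its
# expectation under the product distribution, in the spirit of the lemma.
```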

  36. Follow-up work • Subadditive & XOS functions [Badanidiyuru et al., Balcan et al.]: • O(n^{1/2}) approximation • Ω̃(n^{1/2}) inapproximability • Symmetric submodular functions [Balcan et al.]: • O(n^{1/2}) approximation • Ω̃(n^{1/3}) inapproximability

  37. Conclusions • Learning-theoretic view of submodular functions • Structural properties: • very “bumpy” under arbitrary distributions • very “smooth” under product distributions • Learnability in the PMAC model: • O(n^{1/2}) approximation algorithm • Ω̃(n^{1/3}) inapproximability • O(1) approximation for Lipschitz functions & product distributions • No concise representation for gross substitutes
