
Domain Adaptation with Multiple Sources


Presentation Transcript


  1. Domain Adaptation with Multiple Sources Yishay Mansour, Tel Aviv Univ. & Google Mehryar Mohri, NYU & Google Afshin Rostamizadeh, NYU

  2. Adaptation

  3. Adaptation – motivation • High level: • The ability to generalize from one domain to another • Significance: • Basic human property • Essential in most learning environments • Implicit in many applications.

  4. Adaptation - examples • Sentiment analysis: • Users leave reviews • products, sellers, movies, … • Goal: score reviews as positive or negative. • Adaptation example: • Learn for restaurants and airlines • Generalize to hotels

  5. Adaptation - examples • Speech recognition • Adaptation: • Learn a few accents • Generalize to new accents • think “foreign accents”.

  6. Adaptation and generalization • Machine Learning prediction: • Learn from examples drawn from distribution D • predict the label of unseen examples • drawn from the same distribution D • generalization within a distribution • Adaptation: • predict the label of unseen examples • drawn from a different distribution D’ • Generalization across distributions

  7. Adaptation – Related Work • Learn from D and test on D’ • relating the increase in error to dist(D, D’) • Ben-David et al. (2006), Blitzer et al. (2007) • Single distribution with varying label quality • Crammer et al. (2005, 2006)

  8. Our Model

  9. Our Model – input • k basic distributions D1, …, Dk and k hypotheses h1, …, hk • a target function f • for every i, the expected loss satisfies L(Di, hi, f) ≤ ε • Typical loss function: L(a,b) = |a−b| and L(D,h,f) = Ex~D[ |f(x)−h(x)| ]
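
To make the loss concrete, here is a minimal sketch (not from the slides) of the expected absolute loss on a finite domain; the distribution, hypothesis and target below are made up:

```python
# Expected absolute loss L(D, h, f) = E_{x~D}[ |f(x) - h(x)| ] on a finite domain.
# Illustrative only: the domain, distribution and functions below are made up.

def expected_loss(D, h, f):
    """D: dict mapping point -> probability; h, f: real-valued functions on the domain."""
    return sum(p * abs(f(x) - h(x)) for x, p in D.items())

D1 = {"a": 0.7, "b": 0.3}                  # a basic distribution
f = lambda x: 1.0 if x == "a" else 0.0     # target function
h1 = lambda x: 0.9 if x == "a" else 0.1    # hypothesis with small loss on D1

print(expected_loss(D1, h1, f))            # 0.7*0.1 + 0.3*0.1 = 0.10
```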

  10. Our Model – target distribution • basic distributions D1, …, Dk • target distribution Dλ: a mixture of the basic distributions with mixture weights λ1, …, λk

  11. Our Model – Combination Rule • Combine h1, …, hk into a single hypothesis h* with low expected loss, hopefully at most ε • combining rules: take z with Σ zi = 1 and zi ≥ 0 • linear: h*(x) = Σ zi hi(x) • distribution weighted: hz(x) = Σi [zi Di(x) / Σj zj Dj(x)] hi(x)
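
A minimal sketch of the two combining rules on a finite domain, reusing expected_loss from the earlier snippet; the distributions are assumed to be given as dicts and the hypotheses as functions:

```python
# Combining rules h_z for z on the simplex.  Illustrative sketch only.

def linear_rule(z, hs):
    """Linear rule: h*(x) = sum_i z_i * h_i(x)."""
    return lambda x: sum(zi * h(x) for zi, h in zip(z, hs))

def dist_weighted_rule(z, Ds, hs):
    """Distribution weighted rule: h_z(x) = sum_i [z_i D_i(x) / sum_j z_j D_j(x)] * h_i(x)."""
    def h(x):
        denom = sum(zj * Dj.get(x, 0.0) for zj, Dj in zip(z, Ds))
        if denom == 0.0:
            return 0.0  # x outside every weighted support; the smoothed rule of slide 30 avoids this
        return sum(zi * Di.get(x, 0.0) / denom * hi(x) for zi, Di, hi in zip(z, Ds, hs))
    return h
```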

  12. Combining Rules – Pros • Alternative: build a single dataset for the mixture, but • learning the mixture parameters is non-trivial • the combined dataset might be huge • domain-dependent data may be unavailable • sometimes only the classifiers are given or exist (e.g., for privacy reasons) • MOST IMPORTANT: it is a FUNDAMENTAL THEORY QUESTION

  13. Our Results: • Linear combining rule: • seems like the first thing to try • can be very bad: there are simple settings where any linear combining rule performs badly

  14. Our Results: • Distribution weighted combining rules: • Given the mixture parameter λ: • there is a good distribution weighted combining rule • expected loss at most ε • For any target function f: • there is a good distribution weighted combining rule hz • expected loss at most ε • Extension to multiple "consistent" target functions: • expected loss at most 3ε • OUTCOME: this is the "right" hypothesis class

  15. Known Distribution

  16. Linear combining rules – the bad example • Two points a and b; Da is concentrated on a, Db on b • f(a) = 1, f(b) = 0; h1 ≡ 1 has zero loss on Da, h0 ≡ 0 has zero loss on Db • Original loss: ε = 0 !!! • Yet any linear combining rule h has expected absolute loss ½ on the uniform mixture of Da and Db

  17. Distribution weighted combining rule • Target distribution – a mixture: Dλ(x) = Σ λi Di(x) • Set z = λ: • Claim: L(Dλ, hλ, f) ≤ ε

  18. Distribution weighted combining rule – PROOF:
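
The slide's own derivation is not preserved in the transcript; the standard argument, assuming the absolute loss L(a,b) = |a−b|, runs as follows:

```latex
\begin{align*}
L(D_\lambda, h_\lambda, f)
  &= \sum_x D_\lambda(x)\,\bigl|h_\lambda(x) - f(x)\bigr| \\
  &= \sum_x D_\lambda(x)\,\Bigl|\sum_i \tfrac{\lambda_i D_i(x)}{D_\lambda(x)}\bigl(h_i(x) - f(x)\bigr)\Bigr|
     \qquad\text{(the weights sum to 1)} \\
  &\le \sum_x \sum_i \lambda_i D_i(x)\,\bigl|h_i(x) - f(x)\bigr|
     \qquad\text{(triangle inequality)} \\
  &= \sum_i \lambda_i\, L(D_i, h_i, f)
   \;\le\; \sum_i \lambda_i\,\varepsilon \;=\; \varepsilon.
\end{align*}
```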

  19. Back to the bad example • Original loss: ε = 0 !!! • The distribution weighted rule h+ recovers f: • at x = a: h+(x) = h1(x) = 1 • at x = b: h+(x) = h0(x) = 0
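
A quick numeric check of the bad example as reconstructed from slides 16, 19 and 28, reusing expected_loss and dist_weighted_rule from the earlier sketches:

```python
# Any linear rule is a constant on {a, b}, so it loses 1/2 on the uniform mixture,
# while the distribution weighted rule recovers f exactly.
# The concrete values (point-mass Da, Db; f(a)=1, f(b)=0) are a reconstruction of the slide figure.
Da, Db = {"a": 1.0}, {"b": 1.0}
f = lambda x: 1.0 if x == "a" else 0.0
h1 = lambda x: 1.0     # zero loss on Da
h0 = lambda x: 0.0     # zero loss on Db
D_mix = {"a": 0.5, "b": 0.5}

for z in (0.0, 0.25, 0.5, 1.0):
    h_lin = lambda x, z=z: z * h1(x) + (1 - z) * h0(x)
    print("linear, z =", z, "loss =", expected_loss(D_mix, h_lin, f))     # always 0.5

h_plus = dist_weighted_rule((0.5, 0.5), (Da, Db), (h1, h0))
print("distribution weighted loss =", expected_loss(D_mix, h_plus, f))   # 0.0
```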

  20. Unknown Distribution

  21. Unknown mixture distribution • Zero-sum game: • NATURE: selects a distribution Di • LEARNER: selects a z • hypothesis hz • Payoff: L(Di, hz, f) • Restating the previous result: • for any mixed action λ of NATURE • LEARNER has a pure action z = λ • such that the expected loss is at most ε

  22. Unknown mixture distribution • Consequence: • LEARNER has a mixed action (over z’s) • for any mixed action λ of NATURE • a mixture distribution Dλ • The loss is at most ε • Challenge: • show a specific hypothesis hz • pure, not mixed, action

  23. Searching for a good hypothesis • Uniformly good hypothesis hz: • for any Di we have L(Di, hz, f) ≤ ε • If all the hi were identical (an extremely lucky and unlikely case), any of them would be uniformly good • If we have a uniformly good hypothesis we are done! • L(Dλ,hz,f) = Σ λi L(Di,hz,f) ≤ Σ λi ε = ε • We need to show that a good hz exists in general!

  24. Proof Outline: • Balancing the losses: • Show that some hz has identical loss on any Di • uses Brouwer Fixed Point Theorem • holds very generally • Bounding the losses: • Show this hz has low loss for some mixture • specifically Dz

  25. Brouwer Fixed Point Theorem: For any compact and convex set A and any continuous mapping φ : A → A, there exists a point x in A such that φ(x) = x.

  26. Balancing Losses • A = {z : Σi zi = 1 and zi ≥ 0} (the simplex) • Problem 1: need the mapping φ to be continuous

  27. Balancing Losses • Fixed point: z = φ(z) • Problem 2: need zi ≠ 0 for all i

  28. Bounding the losses • Balanced losses alone are not enough: we can get balanced losses even with a linear combining rule! • In the bad example, for z = (½, ½) we have L(Da,hz,f) = ½ and L(Db,hz,f) = ½

  29. Bounding Losses • Consider the previous z • from the Brouwer fixed point theorem • Consider the mixture Dz • Expected loss is at most ε • Also: L(Dz,hz,f) = Σ zj L(Dj,hz,f) = γ • Conclusion: • For any mixture, the expected loss is at most γ ≤ ε

  30. Solving the problems: • Redefine the distribution weighted rule as a smoothed version hz,η with η > 0 • Claim: For any distribution D, the loss L(D, hz,η, f) is continuous in z.
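
The exact smoothed rule on the slide is not preserved in the transcript; one plausible form (an assumption), smoothing each weight toward the uniform distribution U with parameter η, is:

```latex
h_{z,\eta}(x) \;=\; \sum_{i=1}^{k}
  \frac{z_i\,D_i(x) \;+\; \eta\,U(x)/k}{\sum_{j=1}^{k} z_j\,D_j(x) \;+\; \eta\,U(x)}\; h_i(x).
```

With this kind of smoothing the denominator is strictly positive for every x, so hz,η, and hence the loss, is continuous in z on the whole simplex.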

  31. Main Theorem: For any target function f and any δ > 0, there exists η > 0 and z such that for any λ we have L(Dλ, hz,η, f) ≤ ε + δ.

  32. Balancing Losses • The set A = {z : Σ zi = 1 and zi ≥ 0} • the simplex • The mapping φ with parameters η and η': • [φ(z)]i = (zi Li,z + η'/k) / (Σj zj Lj,z + η') • where Li,z = L(Di, hz,η, f) • For some z in A we have φ(z) = z • hence zi = (zi Li,z + η'/k) / (Σj zj Lj,z + η') > 0 • and Li,z = (Σj zj Lj,z) + η' − η'/(zi k) < (Σj zj Lj,z) + η'
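
The mapping φ on this slide suggests a simple heuristic search for a balanced z. Brouwer's theorem only guarantees that a fixed point exists, not that iteration converges, so the sketch below (reusing the helpers from the earlier snippets and omitting the η-smoothing for brevity) is illustrative only:

```python
# Heuristic fixed-point iteration for
#   phi(z)_i = (z_i * L_{i,z} + eta'/k) / (sum_j z_j * L_{j,z} + eta'),
# where L_{i,z} is the loss of the distribution weighted rule h_z on D_i.
# Brouwer guarantees a fixed point exists; iterating phi is only a heuristic way to look for one.

def phi(z, Ds, hs, f, eta_prime=1e-3):
    h_z = dist_weighted_rule(z, Ds, hs)          # eta-smoothing omitted for brevity
    L = [expected_loss(Di, h_z, f) for Di in Ds]
    k = len(z)
    denom = sum(zj * Lj for zj, Lj in zip(z, L)) + eta_prime
    return [(zi * Li + eta_prime / k) / denom for zi, Li in zip(z, L)]

def balance(Ds, hs, f, steps=200):
    """Iterate phi from the uniform z; the iterate stays on the simplex at every step."""
    z = [1.0 / len(Ds)] * len(Ds)
    for _ in range(steps):
        z = phi(z, Ds, hs, f)
    return z
```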

  33. Bounding Losses • Consider the previous z • from the Brouwer fixed point theorem • Consider the mixture Dz • Expected loss is at most ε + η • By definition Σ zj Lj,z = L(Dz, hz,η, f) • Conclusion: γ = Σ zj Lj,z ≤ ε + η

  34. Putting it together • There exists (z, η) such that: • the expected losses of hz,η are approximately balanced: • L(Di, hz,η, f) ≤ γ + η' • Bounding γ using Dz: • γ = L(Dz, hz,η, f) ≤ ε + η • For any mixture Dλ: • L(Dλ, hz,η, f) ≤ ε + η + η'

  35. A more general model • So far: NATURE first fixes the target function f • Consistent target functions f: • the expected loss of hi w.r.t. Di is at most ε • for every one of the k distributions • Function class F = {f : f is consistent} • New Model: • LEARNER picks a hypothesis h • NATURE picks f in F and a mixture Dλ • Loss: L(Dλ, h, f) • RESULT: L(Dλ, h, f) ≤ 3ε.

  36. Simple Algorithms

  37. Uniform Algorithm • Hypothesis: set z = (1/k, …, 1/k) • Performance: • for any mixture, expected error ≤ kε • there exists a mixture with expected error Ω(kε) • for k = 2, there exists a mixture with error 2ε − ε²
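
The slide only states the kε upper bound; a reconstruction of the standard argument, assuming the absolute loss, is as follows. With the uniform z, the weight placed on hj at a point x is Dj(x)/Σl Dl(x), so for any basic distribution Di,

```latex
L(D_i, h_u, f)
  \;\le\; \sum_x D_i(x) \sum_j \frac{D_j(x)}{\sum_l D_l(x)}\,\bigl|h_j(x)-f(x)\bigr|
  \;\le\; \sum_x \sum_j D_j(x)\,\bigl|h_j(x)-f(x)\bigr|
  \;=\; \sum_j L(D_j, h_j, f)
  \;\le\; k\varepsilon,
```

since Di(x) ≤ Σl Dl(x); the same bound then carries over to any mixture Dλ.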

  38. Open Problem • Find a uniformly good hypothesis • efficiently !!! • algorithmic issues: • Search over the z’s • Multiple local minima.

  39. Empirical Results

  40. Empirical Results • Sentiment-analysis data set (sample reviews, quoted verbatim): • good product takes a little time to start operating very good for the price a little trouble using it inside ca • it rocks man this is the rockinest think i've ever seen or buyed dudes check it ou • does not retract agree with the prior reviewers i can not get it to retract any longer and that was only after 3 uses • dont buy not worth a cent got it at walmart can't even remove a scuff i give it 100 good thing i could return it • flash drive excelent hard drive good price and good time for seller thanks

  41. Empirical analysis • Multiple domains: • dvd, books, electronics, kitchen appliances • Language model: • build a model for each domain • unlike the theory, this is an additional source of error • Tested on mixture distributions • with known mixture parameters • Target: score (1-5) • Error: Mean Squared Error (MSE)
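
A minimal sketch of the evaluation step described here, with made-up per-domain scores, mixture weights and ratings (the actual models and data are not in the transcript). It shows only the MSE of a λ-weighted linear combination; the distribution weighted rule would additionally weight each domain by its language-model probability Di(x):

```python
# Mean squared error of a mixture-weighted (linear) combination of per-domain predictors.
# Hypothetical data: scores[d][i] is domain-model d's predicted rating for review i,
# lam[d] is the (known) mixture weight of domain d, y[i] is the true rating in 1-5.

def mse_of_combination(scores, lam, y):
    n = len(y)
    combined = [sum(lam[d] * scores[d][i] for d in scores) for i in range(n)]
    return sum((c - t) ** 2 for c, t in zip(combined, y)) / n

scores = {"books": [4.1, 2.0, 3.5], "kitchen": [3.9, 2.6, 3.0]}
lam = {"books": 0.7, "kitchen": 0.3}
y = [4, 2, 3]
print(mse_of_combination(scores, lam, y))
```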

  42. [Figure: comparison of the distribution weighted and linear combining rules across the kitchen, dvd, books and electronics domains.]

  43. Summary

  44. Summary • Adaptation model • combining rules • linear • distribution weighted • Theoretical analysis • mixture distribution • Future research • algorithms for combining rules • beyond mixtures

  45. Thank You!

  46. Adaptation – Our Model • Input: • target function: f • k distributions D1, …, Dk • k hypotheses: h1, …, hk • For every i: L(Di, hi, f) ≤ ε • where L(D,h,f) is the expected loss • think L(D,h,f) = Ex~D[ |f(x)−h(x)| ]
