
Empirical Development of an Exponential Probabilistic Model


Presentation Transcript


  1. Empirical Development of an Exponential Probabilistic Model: Using Textual Analysis to Build a Better Model. Jaime Teevan & David R. Karger, CSAIL (LCS+AI), MIT

  2. Goal: Better Generative Model • Generative vs. discriminative models • Applies to many applications • Information retrieval (IR) • Relevance feedback • Using unlabeled data • Classification • Assumptions made explicit

  3. Using a Model for IR • Define model (hyper-learn) • Learn parameters from query • Rank documents • A better model improves applications • Trickles down to improve retrieval • Classification, relevance feedback, … • Corpus-specific models

  4. Overview • Related work • Probabilistic models • Example: Poisson Model • Compare model to text • Hyper-learning the model • Exponential framework • Investigate retrieval performance • Conclusion and future work

  5. Related Work • Using text for retrieval algorithm • [Jones, 1972], [Greiff, 1998] • Using text to model text • [Church & Gale, 1995], [Katz, 1996] • Learning model parameters • [Zhai & Lafferty, 2002] Hyper-learn the model from text!

  6. Probabilistic Models • Rank documents by RV = Pr(rel|d) • Naïve Bayesian models

  7. Probabilistic Models • Rank documents by RV = Pr(rel|d) • Naïve Bayesian models: RV = Pr(rel|d) ∝ Pr(d|rel) = ∏_{words t} Pr(d_t|rel), where d_t = # occurrences of term t in the doc • Open assumptions • Feature definition • Feature distribution family (defines the model!)
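
As a concrete reading of that product (a minimal sketch; the function names are illustrative, not from the talk), the ranking value is computed in log space so small per-term probabilities don't underflow:

    import math

    def log_relevance_value(doc_counts, term_prob):
        """Naive Bayes ranking value: log Pr(d|rel) = sum over words t of
        log Pr(d_t|rel), where d_t is term t's count in the document.
        term_prob(t, d_t) is the chosen per-term distribution family.
        (A full model would also score terms with d_t = 0.)"""
        return sum(math.log(term_prob(t, dt)) for t, dt in doc_counts.items())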

  8. Using a Naïve Bayesian Model • Define model • Learn parameters from query • Rank documents

  9. Using a Naïve Bayesian Model • Define model • Learn parameters from query • Rank documents • Pr(d_t|rel) = …

  10. Using a Naïve Bayesian Model • Define model • Learn parameters from query • Rank documents • Poisson Model • θ specifies the term's distribution • Pr(d_t|rel) = θ^{d_t} e^{-θ} / d_t!
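
A hedged sketch of this density in code, using the θ = 0.0006 example from the next slide as a sanity check:

    import math

    def poisson_prob(dt, theta):
        """Poisson model: Pr(d_t|rel) = theta^{d_t} * e^{-theta} / d_t!"""
        return theta ** dt * math.exp(-theta) / math.factorial(dt)

    # With theta = 0.0006, even a handful of occurrences is judged
    # astronomically unlikely:
    print(poisson_prob(0, 0.0006))   # ~0.9994
    print(poisson_prob(3, 0.0006))   # ~3.6e-11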

  11. Example Poisson Distribution [Plot: Pr(d_t|rel) vs. the number of times the term occurs, with θ = 0.0006; the probability falls to ≈1e-15 in the tail]

  12. Using a Naïve Bayesian Model • Define model • Learn parameters from query • Rank documents • Learn a θ for each term • Maximum likelihood θ • Term's average number of occurrences • Incorporate prior expectations
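
A minimal sketch of this learning step, assuming simple pseudo-count smoothing stands in for the prior (the talk does not spell out its prior here):

    def learn_theta(term_counts, prior_mean=0.0, prior_strength=0.0):
        """Maximum-likelihood theta for a term under the Poisson model is
        its average count over the labeled documents; prior_mean and
        prior_strength mix in prior_strength pseudo-observations
        (an illustrative prior, not necessarily the talk's)."""
        n = len(term_counts)
        return (sum(term_counts) + prior_strength * prior_mean) / (n + prior_strength)

    # learn_theta([2, 0, 1]) -> 1.0; with prior_mean=0.5, prior_strength=2 -> 0.8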

  13. Using a Naïve Bayesian Model • Define model • Learn parameters from query • Rank documents

  14. Using a Naïve Bayesian Model • Define model • Learn parameters from query • Rank documents • For each document, find RV = ∏_{words t} Pr(d_t|rel) • Sort documents by RV
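
The ranking step then reduces to scoring and sorting, as in this illustrative sketch:

    import math

    def rank_documents(docs, term_prob):
        """docs: list of {term: count} dicts; term_prob(t, d_t) -> Pr(d_t|rel).
        Sorts by descending log RV = sum over words t of log Pr(d_t|rel)."""
        def log_rv(doc):
            return sum(math.log(term_prob(t, dt)) for t, dt in doc.items())
        return sorted(docs, key=log_rv, reverse=True)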

  15. Using a Naïve Bayesian Model • Define model • Learn parameters from query • Rank documents • For each document, find RV = ∏_{words t} Pr(d_t|rel) • Sort documents by RV • Which step goes wrong?

  16. Using a Naïve Bayesian Model • Define model • Learn parameters from query • Rank documents

  17. Using a Naïve Bayesian Model • Define model • Learn parameters from query • Rank documents • Pr(d_t|rel) = θ^{d_t} e^{-θ} / d_t!

  18. How Good is the Model? [Plot: Pr(d_t|rel) under the Poisson fit with θ = 0.0006 vs. the number of times the term occurs, out to 15 times]

  19. How Good is the Model? [Same plot, with the Poisson fit annotated "Misfit!"]

  20. Hyper-learning a Better Fit Through Textual Analysis, Using an Exponential Framework

  21. Hyper-Learning Framework • Need a framework for hyper-learning • [Diagram: candidate distribution families: Mixtures, Poisson, Bernoulli, Normal]

  22. Hyper-Learning Framework • Need a framework for hyper-learning • Goal: same benefits as the Poisson Model • One parameter • Easy to work with (e.g., prior) • [Diagram: Mixtures, Poisson, Bernoulli, Normal → one-parameter exponential families]

  23. Exponential Framework • Well understood, learning easy • [Bernardo & Smith, 1994], [Gous, 1998] • Pr(d_t|rel) = f(d_t) g(θ) e^{θ h(d_t)} • Functions f(d_t) and h(d_t) specify the family • E.g., Poisson: f(d_t) = (d_t!)^{-1}, h(d_t) = d_t • Parameter θ specifies the term's distribution
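
A small sketch of the family in code; the truncated-support normalizer used to recover g(θ) is an illustration assumption, since the slide gives only the functional form:

    import math

    def exp_family_prob(dt, theta, f, h, support=range(100)):
        """One-parameter exponential family:
        Pr(d_t|rel) = f(d_t) * g(theta) * e^{theta * h(d_t)},
        with g(theta) recovered by normalizing over a truncated support."""
        norm = sum(f(k) * math.exp(theta * h(k)) for k in support)
        return f(dt) * math.exp(theta * h(dt)) / norm

    # Poisson as an instance: f(d) = 1/d!, h(d) = d, theta = log(lambda)
    p = exp_family_prob(2, math.log(0.5),
                        f=lambda d: 1.0 / math.factorial(d),
                        h=lambda d: d)   # ~0.0758, i.e. Poisson(0.5) at d_t = 2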

  24. Using a Hyper-learned Model • Define model • Learn parameters from query • Rank documents

  25. Using a Hyper-learned Model • Hyper-learn model • Learn parameters from query • Rank documents

  26. Using a Hyper-learned Model • Hyper-learn model • Learn parameters from query • Rank documents • Want the "best" f(d_t) and h(d_t) • Iterative hill climbing (sketched below) • Local maximum • Poisson starting point
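
A toy version of that hill climb: h is tabulated on counts 0..K and perturbed one coordinate at a time, with loglik standing in for whatever result-set likelihood is being maximized (a hedged sketch, not the authors' exact procedure):

    import random

    def hill_climb_h(h, loglik, steps=500, eps=0.05, seed=0):
        """Iteratively perturb one entry of the tabulated family function h
        (h[k] plays the role of h(d_t = k)), keeping any change that raises
        loglik(h). Converges only to a local maximum, hence the Poisson
        starting point h[k] = k from the slide."""
        rng = random.Random(seed)
        h, best = list(h), loglik(h)
        for _ in range(steps):
            k = rng.randrange(len(h))
            for delta in (eps, -eps):
                trial = h[:k] + [h[k] + delta] + h[k + 1:]
                score = loglik(trial)
                if score > best:
                    h, best = trial, score
                    break
        return h, best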

  27. Using a Hyper-learned Model • Hyper-learn model • Learn parameters from query • Rank documents • Data: TREC query result sets • Past queries to learn about future queries • Hyper-learn and test with different sets

  28. Recall the Poisson Distribution [Plot: Pr(d_t|rel) vs. the number of times the term occurs, out to 15 times]

  29. Poisson Starting Point [Plot: h(d_t) vs. d_t for the Poisson starting point] • Pr(d_t|rel) = f(d_t) g(θ) e^{θ h(d_t)}

  30. Hyper-learned Model [Plot: the hyper-learned h(d_t) vs. d_t] • Pr(d_t|rel) = f(d_t) g(θ) e^{θ h(d_t)}

  31. Poisson Distribution [Plot: Pr(d_t|rel) vs. occurrences, out to 15 times]

  32. Hyper-learned Distribution [Plot: Pr(d_t|rel) vs. occurrences, out to 15 times]

  33. Hyper-learned Distribution [Plot: Pr(d_t|rel) vs. occurrences, out to 5 times]

  34. Hyper-learned Distribution [Plot: Pr(d_t|rel) vs. occurrences, out to 30 times]

  35. Hyper-learned Distribution [Plot: Pr(d_t|rel) vs. occurrences, out to 300 times]

  36. Performing Retrieval • Hyper-learn model • Learn parameters from query • Rank documents

  37. Performing Retrieval • Hyper-learn model • Learn parameters from query (labeled docs) • Rank documents • Pr(d_t|rel) = f(d_t) g(θ) e^{θ h(d_t)} • Learn θ for each term

  38. Learning θ • Sufficient statistics • Summarize all observed data • τ1: # of observations • τ2: Σ_{observations d} h(d_t) • Incorporating a prior is easy • Map (τ1, τ2) → θ • (Experiments use 20 labeled documents)
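
A sketch of the update via sufficient statistics; the final (τ1, τ2) → θ map shown is the Poisson-style mean map as an illustration, since each family has its own:

    def theta_from_stats(counts, h, prior_tau1=1.0, prior_tau2=0.0):
        """Sufficient statistics summarize the labeled data:
        tau1 = number of observations, tau2 = sum of h(d_t) over them.
        A conjugate prior just adds pseudo-statistics (prior_tau1,
        prior_tau2) before mapping to theta."""
        tau1 = prior_tau1 + len(counts)
        tau2 = prior_tau2 + sum(h(dt) for dt in counts)
        return tau2 / tau1   # for Poisson (h(d) = d) this is the mean count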

  39. Performing Retrieval • Hyper-learn model • Learn parameters from query • Rank documents

  40. Results: Labeled Documents [Precision-recall plot]

  41. Results: Labeled Documents [Precision-recall plot, continued]

  42. Performing Retrieval • Hyper-learn model • Learn parameters from query (a short query) • Rank documents

  43. Retrieval: Query • Query = single labeled document • Vector-space-like equation: RV = Σ_{t in doc} a(t,d) + Σ_{q in query} b(q,d) • Problem: the document portion dominates • Solution: use only the query portion • Another solution: normalize
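
A sketch of the two variants below; a and b stand for the learned per-term score contributions, and all names are illustrative:

    def retrieval_score(doc_terms, query_terms, a, b, query_only=True):
        """Vector-space-like RV = sum_{t in doc} a(t, d) + sum_{q in query} b(q, d).
        With query_only=True, only the query-portion sum is kept, so a long
        document's many terms cannot swamp a short query."""
        rv = sum(b(q, doc_terms) for q in query_terms)
        if not query_only:
            rv += sum(a(t, doc_terms) for t in doc_terms)
        return rv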

  44. Retrieval: Query [Precision-recall plot]

  45. Retrieval: Query [Precision-recall plot, continued]

  46. Retrieval: Query [Precision-recall plot, continued]

  47. Conclusion • Probabilistic models • Example: Poisson Model (a bad text model, but easy to work with) • Hyper-learning the model • Exponential framework • Learned a better model (heavy tailed!) • Investigate retrieval performance (better …)

  48. Future Work • Use model better • Use for other applications • Other IR applications • Classification • Correct for document length • Hyper-learn on different corpora • Test if learned model generalizes • Different for genre? Language? People? • Hyper-learn model better

  49. Questions? Contact us with questions: Jaime Teevan teevan@ai.mit.edu David Karger karger@theory.lcs.mit.edu
