
Active Learning for Hidden Markov Models






Presentation Transcript


  1. Active Learning for Hidden Markov Models Brigham Anderson, Andrew Moore brigham@cmu.edu, awm@cs.cmu.edu Computer Science Carnegie Mellon University

  2. Outline • Active Learning • Hidden Markov Models • Active Learning + Hidden Markov Models

  3. Notation • We Have: • Dataset, D • Model parameter space, W • Query algorithm, q

  4. Dataset (D) Example

  5. Notation • We Have: • Dataset, D • Model parameter space, W • Query algorithm, q

  6. Model Example [Diagram: St → Ot] Probabilistic Classifier Notation: • T : number of examples • Ot : vector of features of example t • St : class of example t

  7. Model Example • Patient state (St): St : DiseaseState • Patient observations (Ot): Ot1 : Gender • Ot2 : Age • Ot3 : TestA • Ot4 : TestB • Ot5 : TestC

  8. Possible Model Structures [Diagram: two candidate graphical-model structures relating the class S to the features Gender, Age, TestA, TestB, and TestC.]

  9. Model Space [Diagram: St → Ot] • Model parameters: P(St) and P(Ot|St) • Generative model: must be able to compute P(St=i, Ot=ot | w)
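
(As a rough illustration of the "generative model" requirement above, here is a minimal Python sketch of a one-feature discrete classifier that can compute P(St=i, Ot=ot | w); the parameter values and function names are invented for this example, not taken from the talk.)

```python
import numpy as np

# Hypothetical parameters w for a binary class St and a single
# discrete feature Ot with three possible values.
p_s = np.array([0.7, 0.3])                 # P(St = i)
p_o_given_s = np.array([[0.5, 0.3, 0.2],   # P(Ot = k | St = 0)
                        [0.1, 0.2, 0.7]])  # P(Ot = k | St = 1)

def joint(i, o):
    """P(St = i, Ot = o | w): the quantity a generative model must supply."""
    return p_s[i] * p_o_given_s[i, o]

def class_posterior(o):
    """P(St | Ot = o, w) by Bayes' rule, normalizing the joint."""
    js = np.array([joint(i, o) for i in range(len(p_s))])
    return js / js.sum()

print(class_posterior(2))  # feature value 2 strongly suggests St = 1
```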

  10. Model Parameter Space (W) • W = space of possible parameter values • Prior on parameters: P(W) • Posterior over models: P(W | D) ∝ P(D | W) P(W)

  11. Notation • We Have: • Dataset, D • Model parameter space, W • Query algorithm, q q(W,D) returns t*, the next sample to label

  12. Game while NotDone • Learn P(W | D) • q chooses next example to label • Expert adds label to D
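
(A minimal sketch of this game as pool-based Python pseudocode; learn_posterior, q, and expert_label are hypothetical callables standing in for the three steps on the slide.)

```python
def active_learning_game(D, pool, learn_posterior, q, expert_label,
                         budget=20):
    """The while-NotDone loop from the slide, pool-based.
    D is the labeled dataset; pool is a list of unlabeled examples."""
    for _ in range(budget):                # "while NotDone"
        posterior = learn_posterior(D)     # learn P(W | D)
        t_star = q(posterior, pool)        # q chooses next example to label
        x = pool.pop(t_star)
        D.append((x, expert_label(x)))     # expert adds label to D
    return learn_posterior(D)
```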

  13. Simulation [Diagram: a chain of hidden states S1…S7 emitting observations O1…O7; S2=false, S5=true, and S7=false are already labeled, the rest are unknown ("?"), and the query algorithm q is pondering ("hmm…") which one to ask about next.]

  14. Active Learning Flavors • Pool (“random access” to patients) • Sequential (must decide as patients walk in the door)

  15. q? • Recall: q(W,D) returns the “most interesting” unlabelled example. • Well, what makes a doctor curious about a patient?

  16. [Paper: Lewis & Gale, "A Sequential Algorithm for Training Text Classifiers", 1994, which introduced uncertainty sampling.]

  17. Score Function (uncertainty sampling): score(t) = H(St | Ot = ot, D), the entropy of the predicted class distribution for example t

  18. Uncertainty Sampling Example [Figure: a pool of unlabeled examples; the most uncertain one is queried and labeled FALSE.]

  19. Uncertainty Sampling Example [Figure: after another query, the labeled points so far are FALSE and TRUE.]

  20. [Figure: four learning-curve plots, each comparing Uncertainty Sampling against Random Sampling.]

  21. Uncertainty Sampling • GOOD: couldn't be easier • GOOD: often performs pretty well • BAD: H(St) measures information gain about the samples, not the model, so it is sensitive to noisy samples (a sketch of the method follows below)
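
(A minimal numpy sketch of uncertainty sampling, assuming the class posteriors P(St | Ot, D) for the unlabeled pool have already been computed; all names here are illustrative.)

```python
import numpy as np

def entropy(p, eps=1e-12):
    """Shannon entropy H(p) in bits of a probability vector p."""
    p = np.clip(p, eps, 1.0)
    return float(-np.sum(p * np.log2(p)))

def uncertainty_sampling(class_posteriors):
    """class_posteriors[t] holds P(St | Ot, D) for unlabeled example t.
    Return the index t* with maximal class entropy H(St)."""
    return int(np.argmax([entropy(p) for p in class_posteriors]))

# Usage: the middle example is the most uncertain (closest to 50/50).
print(uncertainty_sampling(np.array([[0.9, 0.1], [0.5, 0.5], [0.8, 0.2]])))
```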

  22. Can we do better than uncertainty sampling?

  23. [Paper: Seung, Opper & Sompolinsky, "Query by Committee", 1992.]

  24. Strategy #2: Query by Committee Temporary assumptions: Pool → Sequential, P(W | D) → Version Space, Probabilistic → Noiseless • QBC attacks the size of the "version space"

  25. [Diagram: two candidate models over the chain S1…S7 with observations O1…O7 classify the same unlabeled example; Model #1 and Model #2 both say FALSE, so there is no disagreement.]

  26. [Diagram: same setup; this time Model #1 and Model #2 both say TRUE, so still no disagreement.]

  27. [Diagram: same setup; Model #1 says FALSE and Model #2 says TRUE.] Ooh, now we're going to learn something for sure! One of them is definitely wrong.

  28. The Original QBC Algorithm As each example arrives… • Choose a committee, C, (usually of size 2) randomly from the version space • Have each member of C classify it • If the committee disagrees, select it (a sketch follows below)
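
(A sketch of this sequential filter, assuming a hypothetical sample_version_space() that draws a random classifier consistent with the labeled data so far; that sampler is not specified in the talk.)

```python
def qbc_query(example, sample_version_space, committee_size=2):
    """Original sequential QBC: draw a small committee of hypotheses
    from the version space and query iff they disagree."""
    committee = [sample_version_space() for _ in range(committee_size)]
    votes = {h(example) for h in committee}
    return len(votes) > 1  # True => ask the expert to label this example
```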

  29. [Paper screenshot, 1992; presumably the Query by Committee paper again.]

  30. QBC: Choose "Controversial" Examples STOP! Doesn't model disagreement mean "uncertainty"? Why not use Uncertainty Sampling? [Diagram: the version space.]

  31. Remember our whiny objection to Uncertainty Sampling? "H(St) measures information gain about the samples, not the model." BUT… if the source of the sample uncertainty is model uncertainty, then they are equivalent! Why? Symmetry of mutual information.

  32. [Paper: Dagan & Engelson, "Committee-Based Sampling for Training Probabilistic Classifiers", 1995.]

  33. Dagan-Engelson QBC (Note: pool-based again; note: no more version space) For each example… • Choose a committee, C, (usually of size 2) randomly from P(W | D) • Have each member of C classify it • Compute the "vote entropy" to measure disagreement (sketched below)
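
(A minimal sketch of vote entropy and the pool-based selection step; committee members are assumed to be callables drawn from P(W | D) that return a class label, which is an illustrative interface, not one given in the talk.)

```python
import math
from collections import Counter

def vote_entropy(example, committee):
    """Dagan-Engelson disagreement measure: entropy of the committee's votes."""
    votes = Counter(h(example) for h in committee)
    n = len(committee)
    return -sum((c / n) * math.log2(c / n) for c in votes.values())

def pick_most_controversial(pool, committee):
    """Pool-based selection: return the index with the highest vote entropy."""
    return max(range(len(pool)),
               key=lambda t: vote_entropy(pool[t], committee))
```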

  34. How to Generate the Committee? • This important point is not covered in the talk. • Vague Suggestions: • Good conjugate priors for parameters • Importance sampling

  35. OK, we could keep extending QBC, but let’s cut to the chase…

  36. Informative with respect to what? [Paper: MacKay, "Information-Based Objective Functions for Active Data Selection", 1992.]

  37. Model Entropy [Figure: three sketches of the posterior P(W|D) over the model space W, ranging from a flat posterior with H(W) = high, through an intermediate one ("…better…"), to a point mass with H(W) = 0.]

  38. Information-Gain • Choose the example that is expected to most reduce H(W) • I.e., maximize H(W) – H(W | St), where H(W) is the current model-space entropy and H(W | St) is the expected model-space entropy if we learn St

  39. Score Function: score(t) = H(W) – H(W | St)

  40. We usually can’t just sum over all models to get H(St|W) …but we can sample from P(W | D)
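
(A sketch of that sampling trick: approximate H(St | W) by averaging H(St | Ot, w) over models w drawn from P(W | D), and combine it into the score H(St) - H(St | W) that slides 43-44 derive. The predict_proba interface for a sampled model is an assumption for illustration.)

```python
import numpy as np

def entropy(p, eps=1e-12):
    """Shannon entropy in bits of a probability vector p."""
    p = np.clip(p, eps, 1.0)
    return float(-np.sum(p * np.log2(p)))

def info_gain_score(example, sampled_models):
    """Monte Carlo score for one example, with sampled_models drawn
    from P(W | D); each exposes predict_proba(example) = P(St | Ot, w)."""
    probs = np.array([m.predict_proba(example) for m in sampled_models])
    h_st = entropy(probs.mean(axis=0))                   # H(St), model-averaged
    h_st_given_w = np.mean([entropy(p) for p in probs])  # ~ H(St | W)
    return h_st - h_st_given_w
```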

  41. Conditional Model Entropy: H(W | St) = Σ_st P(St = st | D) H(W | D, St = st)

  42. Score Function: score(t) = H(W) – H(W | St), with H(W | St) estimated by sampling

  43. Amazing Entropy Fact Symmetry of Mutual Information MI(A;B) = H(A) – H(A|B) = H(B) – H(B|A)

  44. Score Function By symmetry, score(t) = H(W) – H(W | St) = H(St) – H(St | W). Familiar? The first term, H(St), is exactly the uncertainty sampling score.

  45. Uncertainty Sampling & Information Gain: uncertainty sampling maximizes H(St) alone; information gain maximizes H(St) – H(St | W), which subtracts off the uncertainty that remains even when the model is known (i.e., noise).

  46. The information gain framework is cleaner than the QBC framework, and easy to build on • For instance, we don’t need to restrict St to be the “class” variable

  47. Any Missing Feature is Fair Game

  48. Outline • Active Learning • Hidden Markov Models • Active Learning + Hidden Markov Models

  49. HMMs Model parameters W = {π0, A, B}: • π0 = [ P(S0=1), P(S0=2), …, P(S0=n) ], the initial state distribution • A, the n×n transition matrix with A[i][j] = P(St+1=j | St=i) • B, the n×m emission matrix with B[i][k] = P(O=k | S=i) [Diagram: hidden chain S0 → S1 → S2 → S3 emitting observations O0, O1, O2, O3.]
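
(A minimal numpy sketch of this parameterization, with a forward-algorithm likelihood as a usage example; the toy numbers are invented for illustration.)

```python
import numpy as np

# Toy HMM with n = 2 hidden states and m = 3 observation symbols.
pi0 = np.array([0.6, 0.4])          # pi0[i]  = P(S0 = i)
A = np.array([[0.7, 0.3],
              [0.2, 0.8]])          # A[i, j] = P(St+1 = j | St = i)
B = np.array([[0.5, 0.4, 0.1],
              [0.1, 0.3, 0.6]])     # B[i, k] = P(O = k | S = i)

def forward_likelihood(obs, pi0, A, B):
    """P(O0..OT | W) via the forward algorithm over W = {pi0, A, B}."""
    alpha = pi0 * B[:, obs[0]]            # alpha_0(i) = pi0(i) B(i, o0)
    for o in obs[1:]:
        alpha = (alpha @ A) * B[:, o]     # alpha_t = (alpha_{t-1} A) * B(:, ot)
    return float(alpha.sum())

print(forward_likelihood([0, 2, 1], pi0, A, B))
```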
