HMMs Tamara Berg CS 590-133 Artificial Intelligence Many slides throughout the course adapted from Svetlana Lazebnik, Dan Klein, Stuart Russell, Andrew Moore, Percy Liang, Luke Zettlemoyer, Rob Pless, Killian Weinberger, Deva Ramanan
2048 Game • Game AI (minimax with alpha-beta pruning)
Announcements • HW2 graded • mean = 14.40, median = 16 • Grades to date after 2 HWs, 1 midterm (60% / 40%) • mean = 77.42, median = 81.36
Announcements • HW4 will be online tonight (due April 3) • Midterm2 next Thursday • Review will be Monday, 5pm in FB009
Thursday March 20, SN011, 5:30-6:30 • Information Session: BS/MS Program
Bayes Nets • Directed graph G = (X, E): nodes X, edges E • Each node is associated with a random variable
Example • Joint probability factorizes according to the graph: P(X1, …, Xn) = ∏i P(Xi | Parents(Xi))
Independence in a BN • Important question about a BN: • Are two nodes independent given certain evidence? • If yes, can prove using algebra (tedious in general) • If no, can prove with a counter example • Example: • Question: are X and Z necessarily independent? • Answer: no. Example: low pressure causes rain, which causes traffic. • X can influence Z, Z can influence X (via Y) • Addendum: they could be independent: how? X Y Z
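The pressure → rain → traffic chain above can be checked numerically. The probability tables below are made-up illustration values (they are not from the slides); only the chain structure X → Y → Z matters:

```python
# A minimal numeric check of the chain X -> Y -> Z (pressure -> rain -> traffic).
# All probabilities below are invented illustration values.

p_y_given_x = {0: {0: 0.8, 1: 0.2},   # P(Y | X): rain given pressure
               1: {0: 0.3, 1: 0.7}}
p_z_given_y = {0: {0: 0.9, 1: 0.1},   # P(Z | Y): traffic given rain
               1: {0: 0.4, 1: 0.6}}

def p_z_given_x(z, x):
    """P(Z=z | X=x) by summing out Y along the chain."""
    return sum(p_y_given_x[x][y] * p_z_given_y[y][z] for y in (0, 1))

# X and Z are dependent: conditioning on X changes the distribution of Z.
print(round(p_z_given_x(1, 0), 2))   # 0.8*0.1 + 0.2*0.6 = 0.20
print(round(p_z_given_x(1, 1), 2))   # 0.3*0.1 + 0.7*0.6 = 0.45

# Given Y, however, Z no longer depends on X: P(Z | X, Y) = P(Z | Y)
# by construction -- exactly the conditional independence the BN encodes.
```

The two printed values differ, so X does influence Z through Y; fixing Y cuts that influence off.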
Bayes Ball • Shade all observed nodes. Place balls at the starting node, let them bounce around according to some rules, and ask if any of the balls reach any of the goal nodes. • We need to know what happens when a ball arrives at a node on its way to a goal node.
Inference • Inference: calculating some useful quantity from a joint probability distribution • Examples: • Posterior probability: P(Q | E1 = e1, …, Ek = ek) • Most likely explanation: argmax_q P(Q = q | E1 = e1, …, Ek = ek) (Network shown: the burglary example, with nodes B, E, A, J, M)
Inference Algorithms • Exact algorithms • Inference by Enumeration • Variable Elimination algorithm • Approximate (sampling) algorithms • Forward sampling • Rejection sampling • Likelihood weighting sampling
Example BN – Naïve Bayes • C – class, F – features • We only specify (parameters): • the prior over class labels, P(C) • how each feature depends on the class, P(Fi | C)
Percentage of documents in training set labeled as spam/ham Slide from Dan Klein
In the documents labeled as spam, occurrence percentage of each word (e.g. # times “the” occurred/# total words). Slide from Dan Klein
In the documents labeled as ham, occurrence percentage of each word (e.g. # times “the” occurred/# total words). Slide from Dan Klein
Parameter estimation • Parameter estimate: P(word | spam) = (# of occurrences of word in spam messages) / (total # of words in spam messages) • Parameter smoothing: dealing with words that were never seen or were seen too few times • Laplacian smoothing: pretend you have seen every vocabulary word one more time (or m more times) than you actually did: P(word | spam) = (# of occurrences of word in spam messages + 1) / (total # of words in spam messages + V), where V is the total number of unique words
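A minimal sketch of the smoothed estimate, assuming a bag-of-words representation where each document is a list of tokens (the tiny corpus and vocabulary below are invented for illustration):

```python
from collections import Counter

def estimate_word_probs(spam_docs, vocab, m=1):
    """Laplace (add-m) smoothed estimates of P(word | spam).
    spam_docs: list of token lists; vocab: set of all known words."""
    counts = Counter(w for doc in spam_docs for w in doc)
    total = sum(counts.values())
    V = len(vocab)
    return {w: (counts[w] + m) / (total + m * V) for w in vocab}

# Invented toy data: 2 spam messages, 6 tokens total, vocabulary of 5 words.
spam = [["win", "money", "now"], ["money", "money", "free"]]
vocab = {"win", "money", "now", "free", "hello"}
probs = estimate_word_probs(spam, vocab)
print(round(probs["money"], 3))   # (3+1)/(6+5) = 0.364
print(round(probs["hello"], 3))   # (0+1)/(6+5) = 0.091 -- never seen, but nonzero
```

Without smoothing, "hello" would get probability 0 and zero out any message containing it; add-1 smoothing keeps every estimate strictly positive while the estimates still sum to 1 over the vocabulary.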
Classification • Assign the class that maximizes the posterior: c* = argmax_c P(c) ∏i P(wi | c)
Summary of model and parameters • Naïve Bayes model: P(spam | w1, …, wn) ∝ P(spam) ∏i P(wi | spam) • Model parameters: • prior: P(spam), P(¬spam) • likelihood of spam: P(w1 | spam), P(w2 | spam), …, P(wn | spam) • likelihood of ¬spam: P(w1 | ¬spam), P(w2 | ¬spam), …, P(wn | ¬spam)
Classification • In practice • Multiplying lots of small probabilities can result in floating point underflow • Since log(xy) = log(x) + log(y), we can sum log probabilities instead of multiplying probabilities. • Since log is a monotonic function, the class with the highest score does not change. • So, what we usually compute in practice is: c* = argmax_c [log P(c) + Σi log P(wi | c)]
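The log-space classifier can be sketched in a few lines; the priors and word likelihoods below are invented illustration values, not estimates from any real corpus:

```python
import math

def classify(doc, priors, likelihoods):
    """Return argmax_c [ log P(c) + sum_i log P(w_i | c) ]."""
    def score(c):
        return math.log(priors[c]) + sum(math.log(likelihoods[c][w]) for w in doc)
    return max(priors, key=score)

# Invented toy parameters for a two-class spam/ham model.
priors = {"spam": 0.4, "ham": 0.6}
likelihoods = {
    "spam": {"win": 0.30, "money": 0.50, "meeting": 0.05},
    "ham":  {"win": 0.05, "money": 0.10, "meeting": 0.40},
}
print(classify(["win", "money"], priors, likelihoods))   # spam
print(classify(["meeting"], priors, likelihoods))        # ham
```

Summing logs gives the same argmax as multiplying the raw probabilities, but a 1000-word message produces a sum of ~1000 moderate negative numbers rather than a product that underflows to 0.0.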
Reasoning over Time • Often, we want to reason about a sequence of observations • Speech recognition • Robot localization • Medical monitoring • DNA sequence recognition • Need to introduce time into our models • Basic approach: hidden Markov models (HMMs) • More general: dynamic Bayes’ nets
Reasoning over time • Basic idea: • Agent maintains a belief state representing probabilities over possible states of the world • From the belief state and a transition model, the agent can predict how the world might evolve in the next time step • From observations and an emission model, the agent can update the belief state
Markov Models • A Markov model is a chain-structured BN: X1 → X2 → X3 → X4 → … • The value of X at a given time is called the state • As a BN: each state depends only on the previous one, through P(Xt | Xt-1) • Parameters: initial probabilities and transition probabilities (or dynamics), which specify how the state evolves over time
Conditional Independence • Basic conditional independence: • Past and future are independent given the present • Each time step only depends on the previous • This is called the (first order) Markov property • Note that the chain is just a (growing) BN • We can always use generic BN reasoning on it if we truncate the chain at a fixed length
Example: Markov Chain • Weather: • States: X = {rain, sun} • Transitions: P(sun | sun) = 0.9, P(rain | sun) = 0.1, P(rain | rain) = 0.9, P(sun | rain) = 0.1 • Initial distribution: 1.0 sun • What's the probability distribution after one step?
Mini-Forward Algorithm • Question: What's P(X) on some day t? • Forward simulation: P(xt) = Σxt-1 P(xt | xt-1) P(xt-1)
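The mini-forward update can be sketched directly, using the sun/rain transition numbers from the weather example above:

```python
def forward_step(belief, transitions):
    """One step of the mini-forward algorithm:
    P(X_t = x) = sum_{x'} P(x | x') * P(X_{t-1} = x')."""
    return {x: sum(transitions[x_prev][x] * p for x_prev, p in belief.items())
            for x in transitions}

# Weather chain from the example: stay in the same state with prob 0.9.
T = {"sun":  {"sun": 0.9, "rain": 0.1},
     "rain": {"sun": 0.1, "rain": 0.9}}

belief = {"sun": 1.0, "rain": 0.0}   # initial distribution: 1.0 sun
for t in range(1, 4):
    belief = forward_step(belief, T)
    print(t, round(belief["sun"], 3))
# After one step P(sun) = 0.9; iterating further (0.82, 0.756, ...)
# drifts toward the stationary distribution (0.5, 0.5) of this symmetric chain.
```

Running the loop longer shows the "uncertainty accumulates" point from the next slide: whatever the start state, the belief approaches the same stationary distribution.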
Example • Forward simulation from an initial observation of sun, and from an initial observation of rain: in both cases the sequence P(X1), P(X2), P(X3), … converges to the same distribution P(X∞).
Stationary Distributions • If we simulate the chain long enough: • What happens? • Uncertainty accumulates • Eventually, we have no idea what the state is! • Stationary distributions: • For most chains, the distribution we end up in is independent of the initial distribution • Called the stationary distribution of the chain • Usually, can only predict a short time out
Hidden Markov Models • Markov chains not so useful for most agents • Eventually you don't know anything anymore • Need observations to update your beliefs • Hidden Markov models (HMMs) • Underlying Markov chain over states S • You observe outputs (effects) at each time step • As a Bayes' net: a chain of hidden states X1, …, X5, each emitting an observation E1, …, E5
Example • An HMM is defined by: • Initial distribution: P(X1) • Transitions: P(Xt | Xt-1) • Emissions: P(Et | Xt)
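As a sketch, the three pieces of an HMM can be written down as plain probability tables. The rain/sun/umbrella numbers below are textbook-style illustration values, not values from the slides:

```python
# A tiny HMM spec: initial distribution, transition model, emission model.
# States and probabilities are assumed illustration values.
hmm = {
    "initial":    {"rain": 0.5, "sun": 0.5},               # P(X1)
    "transition": {"rain": {"rain": 0.7, "sun": 0.3},      # P(X_t | X_{t-1})
                   "sun":  {"rain": 0.3, "sun": 0.7}},
    "emission":   {"rain": {"umbrella": 0.9, "none": 0.1},  # P(E_t | X_t)
                   "sun":  {"umbrella": 0.2, "none": 0.8}},
}

# Sanity check: every distribution must sum to 1 over its outcomes.
for table in [hmm["initial"], *hmm["transition"].values(), *hmm["emission"].values()]:
    assert abs(sum(table.values()) - 1.0) < 1e-9
print("valid HMM spec")
```

Note the shape: one row per current/previous state, one probability per next state or observation; every inference algorithm over the HMM only ever reads these three tables.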
Conditional Independence • HMMs have two important independence properties: • Markov hidden process: the future depends on the past via the present • Current observation is independent of all else given the current state • Quiz: does this mean that observations are independent given no evidence? • [No, correlated by the hidden state]
Real HMM Examples • Speech recognition HMMs: • Observations are acoustic signals (continuous valued) • States are specific positions in specific words (so, tens of thousands) • Machine translation HMMs: • Observations are words (tens of thousands) • States are translation options • Robot tracking: • Observations are range readings (continuous) • States are positions on a map (continuous)
Filtering / Monitoring • Filtering, or monitoring, is the task of tracking the distribution B(X) (the belief state) over time • We start with B(X) in an initial setting, usually uniform • As time passes, or we get observations, we update B(X) • The Kalman filter was invented in the 1960s and first implemented as a method of trajectory estimation for the Apollo program
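A minimal sketch of one filtering step: predict with the transition model, then reweight by the emission probability of the new observation and normalize. The rain/umbrella probabilities are assumed illustration values:

```python
def filter_step(belief, evidence, transition, emission):
    """One belief-state update B(X): predict, then observe, then normalize."""
    # Predict: push the belief through the transition model.
    predicted = {x: sum(transition[xp][x] * p for xp, p in belief.items())
                 for x in transition}
    # Observe: weight each state by how well it explains the evidence.
    unnorm = {x: emission[x][evidence] * p for x, p in predicted.items()}
    z = sum(unnorm.values())
    return {x: p / z for x, p in unnorm.items()}

# Assumed illustration parameters (rain/umbrella style example).
transition = {"rain": {"rain": 0.7, "sun": 0.3},
              "sun":  {"rain": 0.3, "sun": 0.7}}
emission = {"rain": {"umbrella": 0.9, "none": 0.1},
            "sun":  {"umbrella": 0.2, "none": 0.8}}

B = {"rain": 0.5, "sun": 0.5}              # start uniform
for e in ["umbrella", "umbrella"]:          # two days of umbrella sightings
    B = filter_step(B, e, transition, emission)
print(round(B["rain"], 3))                  # 0.883: belief shifts toward rain
```

Each step is exactly the two updates from the "Reasoning over time" slide: the transition model moves the belief forward, the emission model folds in the new observation.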
Example: Robot Localization Example from Michael Pfeiffer • Sensor model: never more than 1 mistake • Motion model: may not execute action with small prob. • The slides show the belief over grid cells, shaded from probability 0 to 1, at t = 0 through t = 5: as observations accumulate, the probability mass concentrates on the cells consistent with the evidence.
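The localization example can be sketched as histogram filtering on a tiny 1-D cyclic corridor. The map, sensor accuracy, and motion-failure probability below are all assumed illustration values:

```python
# Histogram-filter localization sketch: a robot on a 5-cell cyclic corridor
# moves right one cell per step and senses whether its cell has a "door".
# Map and noise values are invented for illustration.

doors = [1, 0, 0, 1, 0]                 # map: 1 = door at that cell
n = len(doors)
belief = [1.0 / n] * n                   # t=0: uniform belief

def move_right(b, p_move=0.8):
    """Motion update: the move succeeds with prob p_move, else the robot stays."""
    return [p_move * b[(i - 1) % n] + (1 - p_move) * b[i] for i in range(n)]

def sense(b, saw_door, p_correct=0.9):
    """Sensor update: weight each cell by how well it explains the reading."""
    w = [p_correct if doors[i] == saw_door else 1 - p_correct for i in range(n)]
    unnorm = [wi * bi for wi, bi in zip(w, b)]
    z = sum(unnorm)
    return [u / z for u in unnorm]

belief = sense(belief, saw_door=1)            # t=1: saw a door
belief = sense(move_right(belief), saw_door=0)  # t=2: moved right, saw no door
print([round(p, 2) for p in belief])          # mass concentrates on consistent cells
```

After two observations, only the cells that are one step right of a door and themselves door-free keep significant probability, which is the same sharpening shown across the t = 0 … t = 5 slides.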