"Get ready for the final exam with a review of key topics, including Markov Models, HMM algorithms, and Classification. Understand utility-based agents and reinforce understanding with advanced topics in AI."
CS 188: Artificial Intelligence, Spring 2007 • Lecture 29: Post-midterm course review • 5/8/2007 • Srini Narayanan – ICSI and UC Berkeley
Final Exam • 8:10 to 11 AM on 5/15/2007 at 50 Birge • Final prep page is up • Includes all topics (see page) • Weighted toward post-midterm topics • Two double-sided cheat sheets and a calculator are allowed • Final exam review Thursday 4 PM, 306 Soda
Today • Review of post-midterm topics relevant for the final • Reasoning about time • Markov Models • HMM forward algorithm, Viterbi algorithm • Classification • Naïve Bayes, Perceptron • Reinforcement Learning • MDPs, Value Iteration, Policy Iteration • TD value learning, Q-learning • Advanced topics • Applications to NLP
Questions • What is the basic conditional independence assertion for Markov models? • What is a problem with Markov models for prediction into the future? • What are the basic CI assertions for HMMs? • How do inference algorithms exploit the CI assertions? • Forward algorithm • Viterbi algorithm
Markov Models • A Markov model is a chain-structured BN • Each node is identically distributed (stationarity) • Value of X at a given time is called the state • As a BN: [chain diagram: X1 → X2 → X3 → X4] • Parameters: called transition probabilities or dynamics, specify how the state evolves over time (also, initial probs)
Conditional Independence • Basic conditional independence: • Past and future are independent given the present • Each time step only depends on the previous • This is called the (first-order) Markov property • Note that the chain is just a (growing) BN • We can always use generic BN reasoning on it (if we truncate the chain) • [chain diagram: X1 → X2 → X3 → X4]
Example • From an initial state (observation of sun): P(X1), P(X2), P(X3), …, P(X∞) • From an initial state (observation of rain): P(X1), P(X2), P(X3), …, P(X∞) • [Probability bar charts omitted]
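A minimal sketch of this mini-forward update in Python, assuming a two-state sun/rain chain; the transition probabilities below are illustrative placeholders, not numbers from the slide:

```python
# Mini-forward algorithm: push a belief over states through the chain.
# Transition probabilities are made up for illustration.
transition = {
    'sun':  {'sun': 0.9, 'rain': 0.1},
    'rain': {'sun': 0.3, 'rain': 0.7},
}

def step(belief):
    """One step of P(X_{t+1} = x') = sum_x P(x' | x) * P(X_t = x)."""
    return {x2: sum(belief[x1] * transition[x1][x2] for x1 in belief)
            for x2 in transition}

belief = {'sun': 1.0, 'rain': 0.0}   # observed sun at t = 1
for t in range(2, 6):
    belief = step(belief)
    print(t, belief)                  # drifts toward the stationary distribution
```

Running it shows why long-range prediction is uninformative: whichever state you start from, the belief converges to the same stationary distribution.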
Hidden Markov Models • Markov chains not so useful for most agents • Eventually you don't know anything anymore • Need observations to update your beliefs • Hidden Markov models (HMMs) • Underlying Markov chain over states S • You observe outputs (effects) at each time step • As a Bayes' net: [diagram: hidden chain X1 → … → X5 with emissions E1 … E5]
Example • An HMM is defined by: • Initial distribution: P(X1) • Transitions: P(Xt | Xt-1) • Emissions: P(Et | Xt)
Conditional Independence • HMMs have two important independence properties: • Markov hidden process: future depends on past via the present • Current observation independent of all else given current state • Quiz: does this mean that observations are independent given no evidence? • [No, correlated by the hidden state]
Forward Algorithm • Can ask the same questions for HMMs as Markov chains • Given current belief state, how to update with evidence? • This is called monitoring or filtering • Formally, we want: P(Xt | e1, …, et)
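A minimal sketch of one filtering step, assuming discrete states with a transition table T[x][x'] and an emission table E[x][e]; the umbrella-world numbers are illustrative, not taken from the lecture:

```python
def forward_step(belief, evidence, T, E):
    """One filtering update: predict through the dynamics, then weight by the
    likelihood of the new evidence and renormalize.
       B'(x') is proportional to P(e | x') * sum_x P(x' | x) * B(x)."""
    predicted = {x2: sum(belief[x1] * T[x1][x2] for x1 in belief) for x2 in belief}
    unnorm = {x: E[x][evidence] * predicted[x] for x in predicted}
    z = sum(unnorm.values())
    return {x: p / z for x, p in unnorm.items()}

# Toy umbrella-world tables (assumed for illustration).
T = {'rain': {'rain': 0.7, 'sun': 0.3}, 'sun': {'rain': 0.1, 'sun': 0.9}}
E = {'rain': {'umbrella': 0.9, 'none': 0.1}, 'sun': {'umbrella': 0.2, 'none': 0.8}}

b = {'rain': 0.5, 'sun': 0.5}
for e in ['umbrella', 'umbrella', 'none']:
    b = forward_step(b, e, T, E)
print(b)   # P(X_t | e_1, ..., e_t)
```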
Viterbi Algorithm • Question: what is the most likely state sequence given the observations? • Slow answer: enumerate all possibilities • Better answer: cached incremental version
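A sketch of the cached incremental (dynamic-programming) version, reusing toy tables shaped like the T and E above; the helper and variable names are illustrative:

```python
def viterbi(observations, states, start, T, E):
    """Most likely state sequence given the observations.
    m[t][x] = best probability of any path that ends in state x at time t."""
    m = [{x: start[x] * E[x][observations[0]] for x in states}]
    back = []                                   # back-pointers for path recovery
    for e in observations[1:]:
        prev = m[-1]
        col, ptr = {}, {}
        for x2 in states:
            best = max(states, key=lambda x1: prev[x1] * T[x1][x2])
            ptr[x2] = best
            col[x2] = prev[best] * T[best][x2] * E[x2][e]
        m.append(col)
        back.append(ptr)
    # Trace the best final state back through the stored pointers.
    last = max(states, key=lambda x: m[-1][x])
    path = [last]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    return list(reversed(path))
```

The same recurrence as filtering, with max in place of sum, which is why it avoids enumerating all state sequences.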
Classification • Supervised Models • Generative Models • Naïve Bayes • Discriminative Models • Perceptron • Unsupervised Models • K-means • Agglomerative clustering
Parameter estimation • What are the parameters for Naïve Bayes? • What is Maximum Likelihood estimation for NB? • What are the problems with ML estimates?
General Naïve Bayes • A general naive Bayes model: P(C, E1, …, En) = P(C) ∏i P(Ei | C) • We only specify how each feature depends on the class • Total number of parameters is linear in n: |C| parameters for the prior P(C) plus n × |E| × |C| for the conditionals P(Ei | C), versus |C| × |E|^n for the full joint • [diagram: class node C with children E1 … En]
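As a rough illustration, prediction only needs the prior P(C) and the per-feature conditionals P(Ei | C); the sketch below assumes the parameters are given as plain dictionaries and works in log space to avoid underflow:

```python
import math

def nb_predict(features, prior, cond):
    """Return argmax_c P(c) * prod_i P(f_i | c), computed in log space.
    prior[c] = P(C = c); cond[i][c][f] = P(F_i = f | C = c)."""
    best_class, best_score = None, float('-inf')
    for c in prior:
        score = math.log(prior[c])
        for i, f in enumerate(features):
            score += math.log(cond[i][c][f])
        if score > best_score:
            best_class, best_score = c, score
    return best_class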
Estimation: Smoothing • Problems with maximum likelihood (relative frequency) estimates: • If I flip a coin once, and it’s heads, what’s the estimate for P(heads)? • What if I flip 10 times with 8 heads? • What if I flip 10M times with 8M heads? • Basic idea: • We have some prior expectation about parameters (here, the probability of heads) • Given little evidence, we should skew towards our prior • Given a lot of evidence, we should listen to the data
Estimation: Laplace Smoothing • Laplace's estimate (extended): pretend you saw every outcome k extra times • PLAP,k(x) = (count(x) + k) / (N + k|X|) • What's Laplace with k = 0? (Maximum likelihood) • k is the strength of the prior • Laplace for conditionals: smooth each condition independently • PLAP,k(x | y) = (count(x, y) + k) / (count(y) + k|X|) • [Coin example: observed H, H, T]
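A small sketch of the extended Laplace estimate on the coin example above; `laplace_estimate` is a hypothetical helper, not course code:

```python
from collections import Counter

def laplace_estimate(samples, outcomes, k=1):
    """P_LAP,k(x) = (count(x) + k) / (N + k * |X|).
    Pretend every outcome was seen k extra times; k = 0 recovers maximum likelihood."""
    counts = Counter(samples)
    n = len(samples)
    return {x: (counts[x] + k) / (n + k * len(outcomes)) for x in outcomes}

print(laplace_estimate(['H', 'H', 'T'], ['H', 'T'], k=0))  # ML: {'H': 0.667, 'T': 0.333}
print(laplace_estimate(['H', 'H', 'T'], ['H', 'T'], k=1))  # smoothed: {'H': 0.6, 'T': 0.4}
```

With more data the counts dominate the pseudo-counts, so the estimate listens to the data; with little data it stays near uniform, matching the prior.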
Types of Supervised classifiers • Generative Models • Naïve Bayes • Discriminative Models • Perceptron
Questions • What is a binary threshold perceptron? • How can we make a multi-class perceptron? • What sorts of patterns can perceptrons classify correctly?
The Binary Perceptron • Inputs are features • Each feature has a weight • Sum is the activation: activation(x) = Σi wi · fi(x) • If the activation is: • Positive, output 1 • Negative, output 0 • [diagram: features f1, f2, f3 weighted by w1, w2, w3, summed and compared to 0]
The Multiclass Perceptron • If we have more than two classes: • Have a weight vector for each class • Calculate an activation for each class • Highest activation wins
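A minimal sketch of the multiclass perceptron update, assuming feature vectors are plain lists and using the standard mistake-driven rule (raise the true class's weights, lower the guessed class's); all names here are illustrative:

```python
def multiclass_perceptron_train(data, classes, num_features, epochs=10):
    """data: list of (feature_vector, true_class) pairs.
    Keep one weight vector per class; predict the class with the highest
    activation and, on a mistake, adjust the true and guessed classes."""
    w = {c: [0.0] * num_features for c in classes}
    for _ in range(epochs):
        for f, y in data:
            guess = max(classes, key=lambda c: sum(wi * fi for wi, fi in zip(w[c], f)))
            if guess != y:
                w[y] = [wi + fi for wi, fi in zip(w[y], f)]          # raise true class
                w[guess] = [wi - fi for wi, fi in zip(w[guess], f)]  # lower wrong guess
    return w
```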
Linear Separators • Binary classification can be viewed as the task of separating classes in feature space: the decision boundary is the hyperplane w · x = 0, with w · x > 0 on one side and w · x < 0 on the other
Feature design • Can we design features f1 and f2 to use a perceptron to separate the two classes?
MDPs and Reinforcement Learning • What is an MDP (basics)? • What is Bellman's equation and how is it used in value iteration? • What is reinforcement learning? • TD value learning • Q-learning • Exploration vs. exploitation
Markov Decision Processes • Markov decision processes (MDPs): • A set of states s ∈ S • A transition model T(s, a, s′) = P(s′ | s, a): the probability that action a in state s leads to s′ • A reward function R(s, a, s′) (sometimes just R(s) for leaving a state or R(s′) for entering one) • A start state (or distribution) • Maybe a terminal state • MDPs are the simplest case of reinforcement learning • In general reinforcement learning, we don't know the model or the reward function
Bellman's Equation for Selecting Actions • Definition of utility leads to a simple relationship among optimal utility values: optimal rewards = maximize over the first action and then follow the optimal policy • Formally (Bellman's equation): V*(s) = maxa Σs′ T(s, a, s′) [ R(s, a, s′) + γ V*(s′) ] • ["That's my equation!" – Bellman cartoon]
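A sketch of value iteration built directly from this Bellman update, assuming T[s][a] is a list of (next state, probability) pairs and actions(s) returns the actions available in s; the representation is an assumption for illustration, not course code:

```python
def value_iteration(states, actions, T, R, gamma=0.9, iterations=100):
    """Repeatedly apply the Bellman update:
       V(s) <- max_a sum_{s'} T(s, a, s') [ R(s, a, s') + gamma * V(s') ].
    T[s][a]: list of (s_next, prob); R(s, a, s_next): reward for that transition."""
    V = {s: 0.0 for s in states}
    for _ in range(iterations):
        V = {s: max((sum(p * (R(s, a, s2) + gamma * V[s2]) for s2, p in T[s][a])
                     for a in actions(s)),
                    default=0.0)          # terminal states with no actions keep value 0
             for s in states}
    return V
```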
Elements of RL • Transition model: how actions influence states • Reward R: immediate value of a state-action transition • Policy π: maps states to actions • [diagram: agent-environment loop — the agent's policy picks actions, the environment returns states and rewards]
MDPs • Which of the following are true? • [Answer options A–E were shown as figures on the slide]
Reinforcement Learning • What's wrong with the following agents? • [Agent descriptions were shown as figures on the slide]
Model-Free Learning • Big idea: why bother learning T? • Update each time we experience a transition • Frequent outcomes will contribute more updates (over time) • Temporal difference learning (TD) • Policy still fixed! • Move values toward the value of whatever successor occurs: V(s) ← (1 − α) V(s) + α [ R(s, π(s), s′) + γ V(s′) ] • [diagram: s → a → (s, a) → s′ transition tree]
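A one-function sketch of the TD value update for a fixed policy; the learning rate alpha and discount gamma are placeholders:

```python
def td_update(V, s, s_next, reward, gamma=0.9, alpha=0.1):
    """Temporal-difference update under a fixed policy:
       V(s) <- (1 - alpha) * V(s) + alpha * [ r + gamma * V(s') ]
    i.e. nudge V(s) toward the sample value of whatever successor occurred."""
    sample = reward + gamma * V[s_next]
    V[s] = (1 - alpha) * V[s] + alpha * sample
    return V
```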
Problems with TD Value Learning • TD value learning is a model-free way to do policy evaluation • However, if we want to turn our value estimates into a policy, we're sunk: choosing an action requires argmaxa Σs′ T(s, a, s′) [ R(s, a, s′) + γ V(s′) ], which needs the model • Idea: learn state-action values (Q-values) directly • Makes action selection model-free too!
Q-Learning • Learn Q*(s, a) values • Receive a sample (s, a, s′, r) (selecting a, e.g., ε-greedily) • Consider your old estimate: Q(s, a) • Consider your new sample estimate: sample = r + γ maxa′ Q(s′, a′) • Nudge the old estimate towards the new sample: Q(s, a) ← (1 − α) Q(s, a) + α · sample • Set s = s′; repeat until s is terminal
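A sketch of one Q-learning step with ε-greedy action selection, assuming Q is a dictionary keyed by (state, action) with entries already initialized, and env_step(s, a) is a hypothetical environment call returning (next state, reward):

```python
import random

def q_learning_step(Q, s, actions, env_step, gamma=0.9, alpha=0.1, epsilon=0.1):
    """One Q-learning step:
       sample = r + gamma * max_{a'} Q(s', a')
       Q(s, a) <- (1 - alpha) * Q(s, a) + alpha * sample"""
    if random.random() < epsilon:
        a = random.choice(actions)                    # explore
    else:
        a = max(actions, key=lambda act: Q[(s, act)]) # exploit current estimates
    s_next, r = env_step(s, a)
    sample = r + gamma * max(Q[(s_next, a2)] for a2 in actions)
    Q[(s, a)] = (1 - alpha) * Q[(s, a)] + alpha * sample
    return s_next
```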
Applications to NLP • How can generative models play a role in MT, speech, and NLP? • What are three kinds of ambiguities often found in language?
NLP Applications of Bayes' Rule • Handwriting recognition: P(text | strokes) ∝ P(text) · P(strokes | text) • Spelling correction: P(text | typos) ∝ P(text) · P(typos | text) • OCR: P(text | image) ∝ P(text) · P(image | text) • MT: P(english | french) ∝ P(english) · P(french | english) • Speech recognition: P(words | sound) ∝ P(words) · P(sound | words), with P(words) given by a language model
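All of these fit the same noisy-channel pattern; a minimal sketch, assuming hypothetical prior(text) and likelihood(observation, text) callables standing in for a language model and a channel model:

```python
def noisy_channel_decode(observation, candidates, prior, likelihood):
    """Pick the candidate text maximizing P(text | observation), which by
    Bayes' rule is proportional to P(text) * P(observation | text).
    prior and likelihood are placeholder callables, not a real library API."""
    return max(candidates,
               key=lambda text: prior(text) * likelihood(observation, text))
```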
Ambiguities • Headlines: • Iraqi Head Seeks Arms • Ban on Nude Dancing on Governor’s Desk • Juvenile Court to Try Shooting Defendant • Teacher Strikes Idle Kids • Stolen Painting Found by Tree • Kids Make Nutritious Snacks • Local HS Dropouts Cut in Half • Hospitals Are Sued by 7 Foot Doctors • Why are these funny?
Learning • I hear and I forget • I see and I remember • I do and I understand • attributed to Confucius 551-479 B.C.
Thanks! And good luck on the final and for the future! Srini Narayanan snarayan@icsi.berkeley.edu
Phase II: Update Means • Move each mean to the average of its assigned points: • Also can only decrease total distance… (Why?) • Fun fact: the point y with minimum squared Euclidean distance to a set of points {x} is their mean
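A short sketch of this update step, assuming points are coordinate tuples and assignments[i] gives the cluster index of points[i] (both names are illustrative):

```python
def update_means(points, assignments, k):
    """Phase II of k-means: move each mean to the average of its assigned points,
    which is the point minimizing total squared Euclidean distance to the cluster."""
    means = []
    for c in range(k):
        cluster = [p for p, a in zip(points, assignments) if a == c]
        if cluster:
            means.append(tuple(sum(coord) / len(cluster) for coord in zip(*cluster)))
        else:
            means.append(None)   # empty cluster: leave its mean undefined here
    return means
```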