"Get ready for the final exam with a review of key topics, including Markov Models, HMM algorithms, and Classification. Understand utility-based agents and reinforce understanding with advanced topics in AI."
CS 188: Artificial Intelligence, Spring 2007 • Lecture 29: Post-midterm course review • 5/8/2007 • Srini Narayanan – ICSI and UC Berkeley
Final Exam • 8:10 to 11 AM on 5/15/2007 at 50 Birge • Final prep page is up • Includes all topics (see page) • Weighted toward post-midterm topics • Two double-sided cheat sheets and a calculator are allowed • Final exam review Thursday 4 PM, 306 Soda
Today • Review of post-midterm topics relevant for the final • Reasoning about time • Markov Models • HMM forward algorithm, Viterbi algorithm • Classification • Naïve Bayes, Perceptron • Reinforcement Learning • MDPs, Value Iteration, Policy Iteration • TD value learning, Q-learning • Advanced topics • Applications to NLP
Questions • What is the basic conditional independence assertion for Markov models? • What is a problem with Markov models for prediction into the future? • What are the basic CI assertions for HMMs? • How do inference algorithms exploit the CI assertions? • Forward algorithm • Viterbi algorithm
Markov Models • A Markov model is a chain-structured BN • Each node is identically distributed (stationarity) • Value of X at a given time is called the state • As a BN: [chain diagram: X1 → X2 → X3 → X4] • Parameters: called transition probabilities or dynamics, specify how the state evolves over time (also, initial probs)
Conditional Independence • Basic conditional independence: • Past and future are independent given the present • Each time step only depends on the previous • This is called the (first-order) Markov property • Note that the chain is just a (growing) BN • We can always use generic BN reasoning on it (if we truncate the chain) • [chain diagram: X1 → X2 → X3 → X4]
Example • From an initial state (observation of sun): P(X1), P(X2), P(X3), …, P(X∞) • From an initial state (observation of rain): P(X1), P(X2), P(X3), …, P(X∞) • [Probability bar charts omitted]
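A minimal sketch of this mini-forward update in Python, assuming a two-state sun/rain chain; the transition probabilities below are illustrative placeholders, not numbers from the slide:

```python
# Mini-forward algorithm: push a belief over states through the chain.
# Transition probabilities are made up for illustration.
transition = {
    'sun':  {'sun': 0.9, 'rain': 0.1},
    'rain': {'sun': 0.3, 'rain': 0.7},
}

def step(belief):
    """One step of P(X_{t+1} = x') = sum_x P(x' | x) * P(X_t = x)."""
    return {x2: sum(belief[x1] * transition[x1][x2] for x1 in belief)
            for x2 in transition}

belief = {'sun': 1.0, 'rain': 0.0}   # observed sun at t = 1
for t in range(2, 6):
    belief = step(belief)
    print(t, belief)                  # drifts toward the stationary distribution
```

Running it shows why long-range prediction is uninformative: whichever state you start from, the belief converges to the same stationary distribution.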
Hidden Markov Models • Markov chains not so useful for most agents • Eventually you don't know anything anymore • Need observations to update your beliefs • Hidden Markov models (HMMs) • Underlying Markov chain over states S • You observe outputs (effects) at each time step • As a Bayes' net: [diagram: hidden chain X1 → … → X5 with emissions E1 … E5]
Example • An HMM is defined by: • Initial distribution: P(X1) • Transitions: P(Xt | Xt-1) • Emissions: P(Et | Xt)
Conditional Independence • HMMs have two important independence properties: • Markov hidden process: future depends on past via the present • Current observation independent of all else given current state • Quiz: does this mean that observations are independent given no evidence? • [No, correlated by the hidden state]
Forward Algorithm • Can ask the same questions for HMMs as Markov chains • Given current belief state, how to update with evidence? • This is called monitoring or filtering • Formally, we want: P(Xt | e1, …, et)
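A minimal sketch of one filtering step, assuming discrete states with a transition table T[x][x'] and an emission table E[x][e]; the umbrella-world numbers are illustrative, not taken from the lecture:

```python
def forward_step(belief, evidence, T, E):
    """One filtering update: predict through the dynamics, then weight by the
    likelihood of the new evidence and renormalize.
       B'(x') is proportional to P(e | x') * sum_x P(x' | x) * B(x)."""
    predicted = {x2: sum(belief[x1] * T[x1][x2] for x1 in belief) for x2 in belief}
    unnorm = {x: E[x][evidence] * predicted[x] for x in predicted}
    z = sum(unnorm.values())
    return {x: p / z for x, p in unnorm.items()}

# Toy umbrella-world tables (assumed for illustration).
T = {'rain': {'rain': 0.7, 'sun': 0.3}, 'sun': {'rain': 0.1, 'sun': 0.9}}
E = {'rain': {'umbrella': 0.9, 'none': 0.1}, 'sun': {'umbrella': 0.2, 'none': 0.8}}

b = {'rain': 0.5, 'sun': 0.5}
for e in ['umbrella', 'umbrella', 'none']:
    b = forward_step(b, e, T, E)
print(b)   # P(X_t | e_1, ..., e_t)
```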
Viterbi Algorithm • Question: what is the most likely state sequence given the observations? • Slow answer: enumerate all possibilities • Better answer: cached incremental version
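A sketch of the cached incremental (dynamic-programming) version, reusing toy tables shaped like the T and E above; the helper and variable names are illustrative:

```python
def viterbi(observations, states, start, T, E):
    """Most likely state sequence given the observations.
    m[t][x] = best probability of any path that ends in state x at time t."""
    m = [{x: start[x] * E[x][observations[0]] for x in states}]
    back = []                                   # back-pointers for path recovery
    for e in observations[1:]:
        prev = m[-1]
        col, ptr = {}, {}
        for x2 in states:
            best = max(states, key=lambda x1: prev[x1] * T[x1][x2])
            ptr[x2] = best
            col[x2] = prev[best] * T[best][x2] * E[x2][e]
        m.append(col)
        back.append(ptr)
    # Trace the best final state back through the stored pointers.
    last = max(states, key=lambda x: m[-1][x])
    path = [last]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    return list(reversed(path))
```

The same recurrence as filtering, with max in place of sum, which is why it avoids enumerating all state sequences.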
Classification • Supervised Models • Generative Models • Naïve Bayes • Discriminative Models • Perceptron • Unsupervised Models • K-means • Agglomerative clustering
Parameter estimation • What are the parameters for Naïve Bayes? • What is Maximum Likelihood estimation for NB? • What are the problems with ML estimates?
General Naïve Bayes • A general naive Bayes model: P(C, E1, …, En) = P(C) ∏i P(Ei | C) • We only specify how each feature depends on the class • Total number of parameters is linear in n: |C| parameters for the prior P(C) plus n × |E| × |C| for the conditionals P(Ei | C), versus |C| × |E|^n for the full joint • [diagram: class node C with children E1 … En]
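As a rough illustration, prediction only needs the prior P(C) and the per-feature conditionals P(Ei | C); the sketch below assumes the parameters are given as plain dictionaries and works in log space to avoid underflow:

```python
import math

def nb_predict(features, prior, cond):
    """Return argmax_c P(c) * prod_i P(f_i | c), computed in log space.
    prior[c] = P(C = c); cond[i][c][f] = P(F_i = f | C = c)."""
    best_class, best_score = None, float('-inf')
    for c in prior:
        score = math.log(prior[c])
        for i, f in enumerate(features):
            score += math.log(cond[i][c][f])
        if score > best_score:
            best_class, best_score = c, score
    return best_class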
Estimation: Smoothing • Problems with maximum likelihood (relative frequency) estimates: • If I flip a coin once, and it’s heads, what’s the estimate for P(heads)? • What if I flip 10 times with 8 heads? • What if I flip 10M times with 8M heads? • Basic idea: • We have some prior expectation about parameters (here, the probability of heads) • Given little evidence, we should skew towards our prior • Given a lot of evidence, we should listen to the data
Estimation: Laplace Smoothing • Laplace's estimate (extended): pretend you saw every outcome k extra times • PLAP,k(x) = (count(x) + k) / (N + k|X|) • What's Laplace with k = 0? (Maximum likelihood) • k is the strength of the prior • Laplace for conditionals: smooth each condition independently • PLAP,k(x | y) = (count(x, y) + k) / (count(y) + k|X|) • [Coin example: observed H, H, T]
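A small sketch of the extended Laplace estimate on the coin example above; `laplace_estimate` is a hypothetical helper, not course code:

```python
from collections import Counter

def laplace_estimate(samples, outcomes, k=1):
    """P_LAP,k(x) = (count(x) + k) / (N + k * |X|).
    Pretend every outcome was seen k extra times; k = 0 recovers maximum likelihood."""
    counts = Counter(samples)
    n = len(samples)
    return {x: (counts[x] + k) / (n + k * len(outcomes)) for x in outcomes}

print(laplace_estimate(['H', 'H', 'T'], ['H', 'T'], k=0))  # ML: {'H': 0.667, 'T': 0.333}
print(laplace_estimate(['H', 'H', 'T'], ['H', 'T'], k=1))  # smoothed: {'H': 0.6, 'T': 0.4}
```

With more data the counts dominate the pseudo-counts, so the estimate listens to the data; with little data it stays near uniform, matching the prior.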
Types of Supervised classifiers • Generative Models • Naïve Bayes • Discriminative Models • Perceptron
Questions • What is a binary threshold perceptron? • How can we make a multi-class perceptron? • What sorts of patterns can perceptrons classify correctly?
The Binary Perceptron • Inputs are features • Each feature has a weight • Sum is the activation: activation(x) = Σi wi · fi(x) • If the activation is: • Positive, output 1 • Negative, output 0 • [diagram: features f1, f2, f3 weighted by w1, w2, w3, summed and compared to 0]
The Multiclass Perceptron • If we have more than two classes: • Have a weight vector for each class • Calculate an activation for each class • Highest activation wins
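A minimal sketch of the multiclass perceptron update, assuming feature vectors are plain lists and using the standard mistake-driven rule (raise the true class's weights, lower the guessed class's); all names here are illustrative:

```python
def multiclass_perceptron_train(data, classes, num_features, epochs=10):
    """data: list of (feature_vector, true_class) pairs.
    Keep one weight vector per class; predict the class with the highest
    activation and, on a mistake, adjust the true and guessed classes."""
    w = {c: [0.0] * num_features for c in classes}
    for _ in range(epochs):
        for f, y in data:
            guess = max(classes, key=lambda c: sum(wi * fi for wi, fi in zip(w[c], f)))
            if guess != y:
                w[y] = [wi + fi for wi, fi in zip(w[y], f)]          # raise true class
                w[guess] = [wi - fi for wi, fi in zip(w[guess], f)]  # lower wrong guess
    return w
```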
Linear Separators • Binary classification can be viewed as the task of separating classes in feature space: the decision boundary is the hyperplane w · x = 0, with w · x > 0 on one side and w · x < 0 on the other
Feature design • Can we design features f1 and f2 to use a perceptron to separate the two classes?
MDPs and Reinforcement Learning • What is an MDP (basics)? • What is Bellman's equation and how is it used in value iteration? • What is reinforcement learning? • TD value learning • Q-learning • Exploration vs. exploitation
Markov Decision Processes • Markov decision processes (MDPs): • A set of states s ∈ S • A transition model T(s, a, s′) = P(s′ | s, a): the probability that action a in state s leads to s′ • A reward function R(s, a, s′) (sometimes just R(s) for leaving a state or R(s′) for entering one) • A start state (or distribution) • Maybe a terminal state • MDPs are the simplest case of reinforcement learning • In general reinforcement learning, we don't know the model or the reward function
Bellman's Equation for Selecting Actions • Definition of utility leads to a simple relationship among optimal utility values: optimal rewards = maximize over the first action and then follow the optimal policy • Formally (Bellman's equation): V*(s) = maxa Σs′ T(s, a, s′) [ R(s, a, s′) + γ V*(s′) ] • ["That's my equation!" – Bellman cartoon]
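A sketch of value iteration built directly from this Bellman update, assuming T[s][a] is a list of (next state, probability) pairs and actions(s) returns the actions available in s; the representation is an assumption for illustration, not course code:

```python
def value_iteration(states, actions, T, R, gamma=0.9, iterations=100):
    """Repeatedly apply the Bellman update:
       V(s) <- max_a sum_{s'} T(s, a, s') [ R(s, a, s') + gamma * V(s') ].
    T[s][a]: list of (s_next, prob); R(s, a, s_next): reward for that transition."""
    V = {s: 0.0 for s in states}
    for _ in range(iterations):
        V = {s: max((sum(p * (R(s, a, s2) + gamma * V[s2]) for s2, p in T[s][a])
                     for a in actions(s)),
                    default=0.0)          # terminal states with no actions keep value 0
             for s in states}
    return V
```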
Elements of RL • Transition model: how actions influence states • Reward R: immediate value of a state-action transition • Policy π: maps states to actions • [diagram: agent-environment loop — the agent's policy picks actions, the environment returns states and rewards]
MDPs • Which of the following are true? • [Answer options A–E were shown as figures on the slide]
Reinforcement Learning • What's wrong with the following agents? • [Agent descriptions were shown as figures on the slide]
Model-Free Learning • Big idea: why bother learning T? • Update each time we experience a transition • Frequent outcomes will contribute more updates (over time) • Temporal difference learning (TD) • Policy still fixed! • Move values toward the value of whatever successor occurs: V(s) ← (1 − α) V(s) + α [ R(s, π(s), s′) + γ V(s′) ] • [diagram: s → a → (s, a) → s′ transition tree]
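A one-function sketch of the TD value update for a fixed policy; the learning rate alpha and discount gamma are placeholders:

```python
def td_update(V, s, s_next, reward, gamma=0.9, alpha=0.1):
    """Temporal-difference update under a fixed policy:
       V(s) <- (1 - alpha) * V(s) + alpha * [ r + gamma * V(s') ]
    i.e. nudge V(s) toward the sample value of whatever successor occurred."""
    sample = reward + gamma * V[s_next]
    V[s] = (1 - alpha) * V[s] + alpha * sample
    return V
```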
Problems with TD Value Learning • TD value learning is a model-free way to do policy evaluation • However, if we want to turn our value estimates into a policy, we're sunk: choosing an action requires argmaxa Σs′ T(s, a, s′) [ R(s, a, s′) + γ V(s′) ], which needs the model • Idea: learn state-action values (Q-values) directly • Makes action selection model-free too!
Q-Learning • Learn Q*(s, a) values • Receive a sample (s, a, s′, r) (selecting a, e.g., ε-greedily) • Consider your old estimate: Q(s, a) • Consider your new sample estimate: sample = r + γ maxa′ Q(s′, a′) • Nudge the old estimate towards the new sample: Q(s, a) ← (1 − α) Q(s, a) + α · sample • Set s = s′; repeat until s is terminal
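A sketch of one Q-learning step with ε-greedy action selection, assuming Q is a dictionary keyed by (state, action) with entries already initialized, and env_step(s, a) is a hypothetical environment call returning (next state, reward):

```python
import random

def q_learning_step(Q, s, actions, env_step, gamma=0.9, alpha=0.1, epsilon=0.1):
    """One Q-learning step:
       sample = r + gamma * max_{a'} Q(s', a')
       Q(s, a) <- (1 - alpha) * Q(s, a) + alpha * sample"""
    if random.random() < epsilon:
        a = random.choice(actions)                    # explore
    else:
        a = max(actions, key=lambda act: Q[(s, act)]) # exploit current estimates
    s_next, r = env_step(s, a)
    sample = r + gamma * max(Q[(s_next, a2)] for a2 in actions)
    Q[(s, a)] = (1 - alpha) * Q[(s, a)] + alpha * sample
    return s_next
```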
Applications to NLP • How can generative models play a role in MT, speech, and NLP? • What are three kinds of ambiguities often found in language?
NLP Applications of Bayes' Rule • Handwriting recognition: P(text | strokes) ∝ P(text) · P(strokes | text) • Spelling correction: P(text | typos) ∝ P(text) · P(typos | text) • OCR: P(text | image) ∝ P(text) · P(image | text) • MT: P(english | french) ∝ P(english) · P(french | english) • Speech recognition: P(words | sound) ∝ P(words) · P(sound | words), with P(words) given by a language model
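All of these fit the same noisy-channel pattern; a minimal sketch, assuming hypothetical prior(text) and likelihood(observation, text) callables standing in for a language model and a channel model:

```python
def noisy_channel_decode(observation, candidates, prior, likelihood):
    """Pick the candidate text maximizing P(text | observation), which by
    Bayes' rule is proportional to P(text) * P(observation | text).
    prior and likelihood are placeholder callables, not a real library API."""
    return max(candidates,
               key=lambda text: prior(text) * likelihood(observation, text))
```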
Ambiguities • Headlines: • Iraqi Head Seeks Arms • Ban on Nude Dancing on Governor’s Desk • Juvenile Court to Try Shooting Defendant • Teacher Strikes Idle Kids • Stolen Painting Found by Tree • Kids Make Nutritious Snacks • Local HS Dropouts Cut in Half • Hospitals Are Sued by 7 Foot Doctors • Why are these funny?
Learning • I hear and I forget • I see and I remember • I do and I understand • attributed to Confucius 551-479 B.C.
Thanks! And good luck on the final and for the future! Srini Narayanan snarayan@icsi.berkeley.edu
Phase II: Update Means • Move each mean to the average of its assigned points: • Also can only decrease total distance… (Why?) • Fun fact: the point y with minimum squared Euclidean distance to a set of points {x} is their mean
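A short sketch of this update step, assuming points are coordinate tuples and assignments[i] gives the cluster index of points[i] (both names are illustrative):

```python
def update_means(points, assignments, k):
    """Phase II of k-means: move each mean to the average of its assigned points,
    which is the point minimizing total squared Euclidean distance to the cluster."""
    means = []
    for c in range(k):
        cluster = [p for p, a in zip(points, assignments) if a == c]
        if cluster:
            means.append(tuple(sum(coord) / len(cluster) for coord in zip(*cluster)))
        else:
            means.append(None)   # empty cluster: leave its mean undefined here
    return means
```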