Review of Introduction to AI Lirong Xia Tuesday, May 6, 2014
About the final exam • When: Tues, 5/13, 3-6pm • Where: Low 4050 • Same rules as the midterm • open book and lecture notes • simple calculators are allowed • no smartphones/laptops/wifi • No Joe’s OH tomorrow • 5/9 in-class office hours • please bring your HW2
Outline • Search • uninformed search • informed search • CSP • planning • minimax, alpha-beta pruning • expectimax • Probabilistic inference • Machine learning
Search Problems • A search problem consists of: • A state space …… • A successor function (with actions, costs) • A start state and a goal test • A solution is a sequence of actions (a plan) which transforms the start state to a goal state
Search algorithms • Uninformed search • BFS • DFS • Informed search • UCS • Best first (greedy) • A*
State Graphs vs. Search Trees • State graph: a representation of the search problem • each node is an abstraction of a state of the world • Search tree: a tool that helps us find the solution • each node represents an entire path in the graph • tree nodes are constructed on demand, and we construct as little as possible
Fixed BFS • Never expand a node whose state has been visited • Fringe can be maintained as a First-In-First-Out (FIFO) queue (class Queue in util.py) • Maintain a set of visited states • fringe := {node corresponding to initial state} • loop: • if fringe empty, declare failure • choose and remove the top node v from fringe • check if v’s state s is a goal state; if so, declare success • if v’s state has been visited before, skip • if not, expand v, insert resulting nodes whose states have not been visited into fringe
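A minimal Python sketch of this graph-search BFS, assuming a hypothetical problem object with get_start_state, is_goal_state, and get_successors (returning (state, action, cost) triples) in the style of the course projects:

```python
from collections import deque

def breadth_first_search(problem):
    """Graph-search BFS: FIFO fringe, never expand an already-visited state."""
    fringe = deque([(problem.get_start_state(), [])])   # node = (state, action plan)
    visited = set()
    while fringe:
        state, path = fringe.popleft()                  # remove the oldest node (FIFO)
        if problem.is_goal_state(state):
            return path                                 # success: return the plan
        if state in visited:
            continue                                    # skip already-visited states
        visited.add(state)
        for successor, action, cost in problem.get_successors(state):
            if successor not in visited:
                fringe.append((successor, path + [action]))
    return None                                         # failure: fringe exhausted
```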
A*: Combining UCS and Greedy • Uniform-cost orders by path cost, or backward cost g(n) • Greedy orders by goal proximity, or forward cost h(n) • A* search orders by the sum: f(n) = g(n) + h(n)
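A corresponding A* sketch under the same assumed problem interface, with the fringe as a priority queue keyed by f(n) = g(n) + h(n); the heuristic signature is hypothetical:

```python
import heapq
import itertools

def a_star_search(problem, heuristic):
    """A*: expand nodes in increasing order of f(n) = g(n) + h(n)."""
    counter = itertools.count()          # tie-breaker so states are never compared
    start = problem.get_start_state()
    fringe = [(heuristic(start, problem), next(counter), 0, start, [])]
    best_g = {}                          # cheapest known path cost per state
    while fringe:
        f, _, g, state, path = heapq.heappop(fringe)
        if problem.is_goal_state(state):
            return path
        if state in best_g and best_g[state] <= g:
            continue                     # already expanded via a cheaper path
        best_g[state] = g
        for successor, action, step_cost in problem.get_successors(state):
            new_g = g + step_cost
            new_f = new_g + heuristic(successor, problem)
            heapq.heappush(fringe, (new_f, next(counter), new_g, successor, path + [action]))
    return None
```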
Admissible Heuristics • A heuristic h is admissible (optimistic) if: 0 ≤ h(n) ≤ h*(n) for every node n • where h*(n) is the true cost to a nearest goal • Examples: • Coming up with admissible heuristics is most of what’s involved in using A* in practice
Consistency of Heuristics • Stronger than admissibility • Definition: for every arc A → C, cost(A → C) ≥ h(A) − h(C) • i.e., the real cost of each arc is at least the cost implied by the heuristic • Consequences: • The f value along a path never decreases
Constraint Satisfaction Problems • Standard search problems: • State is a “black box”: arbitrary data structure • Goal test: any function over states • Successor function can be anything • Constraint satisfaction problems (CSPs): • A special subset of search problems • State is defined by variables Xi with values from a domain Di (sometimes the domain depends on i) • Goal test is a set of constraints specifying allowable combinations of values for subsets of variables
Constraint Graphs • Binary CSP: each constraint relates (at most) two variables • Binary constraint graph: nodes are variables, arcs show constraints • General-purpose CSP algorithms use the graph structure to speed up search. E.g., Tasmania is an independent subproblem!
CSP algorithms • A special search problem • constraints represented by a constraint graph • Backtracking search • DFS with a fixed variable order, choosing one value at each step • Improvements to backtracking search (filtering and ordering, next slides)
Arc Consistency of a CSP • A simple form of propagation makes sure all arcs are consistent: delete from the tail any value with no consistent value at the head • If V loses a value, neighbors of V need to be rechecked! • Arc consistency detects failure earlier than forward checking • Can be run as a preprocessor or after each assignment • Might be time-consuming
Improving Backtracking • General-purpose ideas give huge gains in speed • Ordering: • minimum remaining values (MRV) for choosing the next variable • least constraining value for ordering values • Filtering: can we detect inevitable failure early? • forward checking • Structure of the problem • e.g., when the constraint graph is a tree
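A sketch of backtracking search with MRV ordering and forward checking, assuming domains is a dict from variable to a set of remaining values, neighbors maps each variable to its constraint-graph neighbors, and constraint_ok is a hypothetical binary-constraint checker:

```python
def backtrack(assignment, domains, neighbors, constraint_ok):
    """DFS over partial assignments with MRV ordering and forward checking."""
    if len(assignment) == len(domains):
        return assignment
    # MRV: choose the unassigned variable with the fewest remaining values
    var = min((v for v in domains if v not in assignment),
              key=lambda v: len(domains[v]))
    for value in list(domains[var]):
        # Consistency with already-assigned neighbors
        if not all(constraint_ok(var, value, nb, assignment[nb])
                   for nb in neighbors[var] if nb in assignment):
            continue
        # Forward checking: prune neighbor values incompatible with this choice
        pruned = {nb: {v for v in domains[nb] if not constraint_ok(var, value, nb, v)}
                  for nb in neighbors[var] if nb not in assignment}
        for nb, vals in pruned.items():
            domains[nb] -= vals
        if all(domains[nb] for nb in pruned):        # no neighbor domain wiped out
            assignment[var] = value
            result = backtrack(assignment, domains, neighbors, constraint_ok)
            if result is not None:
                return result
            del assignment[var]
        for nb, vals in pruned.items():              # undo pruning on backtrack
            domains[nb] |= vals
    return None
```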
Planning problems • STRIPS language • state of the world: conjunction of positive, ground, function-free literals • Action • Preconditions: a set of literals that must hold for the action to apply • Effects: literals that become true and literals that become false (negated effects)
Blocks world • Start: On(B, A), On(A, Table), On(D, C), On(C, Table), Clear(B), Clear(D) • Move(x,y,z) • Preconditions: • On(x,y), Clear(x), Clear(z) • Effects: • On(x,z), Clear(y), NOT(On(x,y)), NOT(Clear(z)) • MoveToTable(x,y) • Preconditions: • On(x,y), Clear(x) • Effects: • On(x,Table), Clear(y), NOT(On(x,y))
Blocks world example • Goal: On(A,B) AND Clear(A) AND On(C,D) AND Clear(C) • A plan: MoveToTable(B, A), MoveToTable(D, C), Move(C, Table, D), Move(A, Table, B)
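A small sketch of how STRIPS actions apply in this example, representing a state as a Python set of ground literals (the helper names are illustrative, not from the course code):

```python
def applicable(state, preconditions):
    """An action is applicable if all of its preconditions hold in the state."""
    return preconditions <= state

def apply_action(state, add_effects, delete_effects):
    """Apply a STRIPS action: remove negated literals, add positive ones."""
    return (state - delete_effects) | add_effects

# Start state from the slide
state = {"On(B,A)", "On(A,Table)", "On(D,C)", "On(C,Table)", "Clear(B)", "Clear(D)"}

# First step of the plan, MoveToTable(B, A): preconditions On(B,A), Clear(B)
pre = {"On(B,A)", "Clear(B)"}
add = {"On(B,Table)", "Clear(A)"}
delete = {"On(B,A)"}

if applicable(state, pre):
    state = apply_action(state, add, delete)
# state now contains On(B,Table) and Clear(A), as the rest of the plan expects
```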
Adversarial Games • Deterministic, zero-sum games: • Tic-tac-toe, chess, checkers • The MAX player maximizes result • The MIN player minimizes result • Minimax search: • A search tree • Players alternate turns • Each node has a minimax value: best achievable utility against a rational adversary
Alpha-Beta Pruning • General configuration • We’re computing the MIN-VALUE at n • We’re looping over n’s children • n’s value estimate is dropping • α is the best value that MAX can get at any choice point along the current path • If n becomes worse than α, MAX will avoid it, so can stop considering n’s other children • Define β similarly for MIN • α is usually smaller than β • Once α >= β, return to the upper layer
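A compact alpha-beta sketch for a deterministic zero-sum game, assuming a hypothetical game object with is_terminal, utility, actions, and result methods:

```python
import math

def alpha_beta_value(game, state, maximizing, alpha=-math.inf, beta=math.inf):
    """Return the minimax value of state, pruning branches that cannot matter."""
    if game.is_terminal(state):
        return game.utility(state)
    if maximizing:                                   # MAX node
        value = -math.inf
        for action in game.actions(state):
            value = max(value, alpha_beta_value(game, game.result(state, action),
                                                False, alpha, beta))
            alpha = max(alpha, value)
            if alpha >= beta:                        # MIN above will never allow this
                break
        return value
    else:                                            # MIN node
        value = math.inf
        for action in game.actions(state):
            value = min(value, alpha_beta_value(game, game.result(state, action),
                                                True, alpha, beta))
            beta = min(beta, value)
            if alpha >= beta:                        # MAX above will never allow this
                break
        return value
```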
Expectimax Search Trees • Expectimax search • Max nodes (we) as in minimax search • Chance nodes • Need to compute chance node values as expected utilities
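The chance-node case replaces MIN's minimum with an expectation; a sketch under the same assumed game interface, plus a hypothetical probability(state, action) for chance outcomes:

```python
def expectimax_value(game, state, maximizing):
    """Max nodes take the best child; chance nodes take the expected child value."""
    if game.is_terminal(state):
        return game.utility(state)
    if maximizing:                                  # MAX node, as in minimax
        return max(expectimax_value(game, game.result(state, a), False)
                   for a in game.actions(state))
    # Chance node: expected utility over outcomes, weighted by their probabilities
    return sum(game.probability(state, a) *
               expectimax_value(game, game.result(state, a), True)
               for a in game.actions(state))
```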
Outline • Search • Probabilistic inference • Bayesian networks • probability representation • conditional independence (d-separation) • inference (variable elimination) • Markov decision process • value iteration • policy iteration • Hidden Markov models • filtering • Machine learning
Bayesian networks • Definition of Bayesian network (Bayes’ net or BN) • A set of nodes, one per variable X • A directed, acyclic graph • A conditional distribution for each node • A collection of distributions over X, one for each combination of parents’ values p(X|a1,…,an) • CPT: conditional probability table A Bayesian network = Topology (graph) + Local Conditional Probabilities
Probabilities in BNs • Bayesian networks implicitly encode joint distributions • As a product of local conditional distributions: p(x1, …, xn) = Πi p(xi | parents(Xi)) • This lets us reconstruct any entry of the full joint • Not every BN can represent every joint distribution • The topology enforces certain conditional independencies
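A toy illustration of reconstructing one joint entry as a product of local conditionals, for a simple chain A → B → C with made-up CPT numbers:

```python
# Joint entry for a chain A -> B -> C: p(a, b, c) = p(a) * p(b|a) * p(c|b)
# CPT numbers below are invented purely for illustration.
p_a = {"+a": 0.3, "-a": 0.7}
p_b_given_a = {("+b", "+a"): 0.9, ("+b", "-a"): 0.2,
               ("-b", "+a"): 0.1, ("-b", "-a"): 0.8}
p_c_given_b = {("+c", "+b"): 0.6, ("+c", "-b"): 0.1,
               ("-c", "+b"): 0.4, ("-c", "-b"): 0.9}

def joint(a, b, c):
    """Product of each node's local conditional given its parents."""
    return p_a[a] * p_b_given_a[(b, a)] * p_c_given_b[(c, b)]

print(joint("+a", "+b", "-c"))   # 0.3 * 0.9 * 0.4 = 0.108
```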
Reachability (D-Separation) • Question: are X and Y conditionally independent given evidence vars {Z}? • Yes, if X and Y are “separated” by Z • Look for active paths from X to Y • No active paths = independence! • A path is active if each triple is active: • Causal chain where B is unobserved (either direction) • Common cause where B is unobserved • Common effect where B or one of its descendants is observed • All it takes to block a path is a single inactive segment
Variable elimination • Network (from the slide): Rained (R), Sprinklers were on (S), Grass wet (G, depends on R and S), Neighbor walked dog (N, depends on R), Dog wet (D, depends on N and G) • CPTs: p(+R) = .2; p(+S) = .6; p(+G|+R,+S) = .9, p(+G|+R,-S) = .7, p(+G|-R,+S) = .8, p(+G|-R,-S) = .2; p(+N|+R) = .3, p(+N|-R) = .4; p(+D|+N,+G) = .9, p(+D|+N,-G) = .4, p(+D|-N,+G) = .5, p(+D|-N,-G) = .3 • From the factor Σn p(n|+R) p(+D|n,g) we sum out n to obtain a factor depending only on g • [Σn p(n|+R) p(+D|n,+G)] = p(+N|+R) p(+D|+N,+G) + p(-N|+R) p(+D|-N,+G) = .3*.9 + .7*.5 = .62 • [Σn p(n|+R) p(+D|n,-G)] = p(+N|+R) p(+D|+N,-G) + p(-N|+R) p(+D|-N,-G) = .3*.4 + .7*.3 = .33 • Continuing to the left, g will be summed out next, etc. (continued on board)
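A quick check of the sum-out step, using the CPT numbers above (only the single factor computation, not a full variable-elimination implementation):

```python
p_n_given_r = {"+N": 0.3, "-N": 0.7}                   # p(n | +R) from the slide
p_d_given_ng = {("+N", "+G"): 0.9, ("+N", "-G"): 0.4,
                ("-N", "+G"): 0.5, ("-N", "-G"): 0.3}  # p(+D | n, g) from the slide

# Sum out n to get a factor over g: f(g) = sum_n p(n|+R) * p(+D|n,g)
factor_over_g = {g: sum(p_n_given_r[n] * p_d_given_ng[(n, g)] for n in ("+N", "-N"))
                 for g in ("+G", "-G")}
print(factor_over_g)   # {'+G': 0.62, '-G': 0.33}, matching the slide
```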
Markov Decision Processes • An MDP is defined by: • A set of states s ∈ S • A set of actions a ∈ A • A transition function T(s,a,s’) • Prob that a from s leads to s’ • i.e., p(s’|s,a) • sometimes called the model • A reward function R(s, a, s’) • Sometimes just R(s) or R(s’) • A start state (or distribution) • Maybe a terminal state • MDPs are a family of nondeterministic search problems • Reinforcement learning (next class): MDPs where we don’t know the transition or reward functions
Defining MDPs • Markov decision processes: • States S • Start state s0 • Actions A • Transition p(s’|s,a) (or T(s,a,s’)) • Reward R(s,a,s’) (and discount γ) • MDP quantities so far: • Policy = choice of action for each (MAX) state • Utility (or return) = sum of discounted rewards
The Bellman Equations • Definition of “optimal utility” leads to a simple one-step lookahead relationship amongst optimal utility values: optimal rewards = maximize over the first action and then follow the optimal policy • Formally: V*(s) = max_a Σ_s’ T(s,a,s’) [R(s,a,s’) + γ V*(s’)]
Value Iteration • Idea: • Start with V1(s) = 0 • Given Vi, calculate the values for all states for depth i+1: Vi+1(s) = max_a Σ_s’ T(s,a,s’) [R(s,a,s’) + γ Vi(s’)] • Repeat until convergence • Use Vi as the evaluation function when computing Vi+1
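A sketch of value iteration, assuming a hypothetical representation where T maps (s, a) to a list of (s', prob) pairs, actions(s) lists the legal actions, and R(s, a, s') is the reward function:

```python
def value_iteration(states, actions, T, R, gamma=0.9, iterations=100):
    """V_{i+1}(s) = max_a sum_{s'} T(s,a,s') * [R(s,a,s') + gamma * V_i(s')]."""
    V = {s: 0.0 for s in states}                        # start with V_1(s) = 0
    for _ in range(iterations):
        V_next = {}
        for s in states:
            q_values = [sum(prob * (R(s, a, s2) + gamma * V[s2])
                            for s2, prob in T.get((s, a), []))
                        for a in actions(s)]
            V_next[s] = max(q_values) if q_values else 0.0
        V = V_next                                      # use V_i when computing V_{i+1}
    return V
```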
Policy Iteration • Alternative approach: • Step 1: policy evaluation: calculate utilities for some fixed policy (not optimal utilities!) • Step 2: policy improvement: update policy using one-step look-ahead with resulting converged (but not optimal!) utilities as future values • Repeat steps until policy converges
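A matching policy-iteration sketch under the same assumed (states, actions, T, R) representation: iterative policy evaluation followed by greedy one-step-look-ahead improvement, repeated until the policy stops changing:

```python
def policy_iteration(states, actions, T, R, gamma=0.9, eval_sweeps=50):
    """Alternate policy evaluation and greedy improvement until the policy is stable."""
    policy = {s: (actions(s)[0] if actions(s) else None) for s in states}
    while True:
        # Step 1: policy evaluation, utilities of the fixed (not necessarily optimal) policy
        V = {s: 0.0 for s in states}
        for _ in range(eval_sweeps):
            V = {s: sum(prob * (R(s, policy[s], s2) + gamma * V[s2])
                        for s2, prob in T.get((s, policy[s]), []))
                 for s in states}
        # Step 2: policy improvement via one-step look-ahead on the evaluated utilities
        new_policy = {}
        for s in states:
            acts = actions(s)
            if not acts:
                new_policy[s] = None
                continue
            new_policy[s] = max(acts, key=lambda a: sum(
                prob * (R(s, a, s2) + gamma * V[s2]) for s2, prob in T.get((s, a), [])))
        if new_policy == policy:                        # repeat until the policy converges
            return policy, V
        policy = new_policy
```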
Markov Models • A Markov model is a chain-structured BN • Conditional probabilities are the same at every time step (stationarity) • Value of X at a given time is called the state • As a BN: • Parameters: called transition probabilities or dynamics, specify how the state evolves over time (also, the initial probabilities) • p(X1) and p(Xt|Xt-1)
Hidden Markov Models • Markov chains not so useful for most agents • Eventually you don’t know anything anymore • Need observations to update your beliefs • Hidden Markov models (HMMs) • Underlying Markov chain over state X • You observe outputs (effects) at each time step • As a Bayes’ net:
HMM weather example: Filtering • States: sunny (s), cloudy (c), rainy (r), with the transition probabilities shown in the lecture diagram • Emission probabilities: p(w|s) = .1, p(w|c) = .3, p(w|r) = .8 • You have been stuck in the lab for three days (!) • On those days, your labmate was dry, wet, wet, respectively • What is the probability that it is now raining outside? • p(X3 = r | E1 = d, E2 = w, E3 = w)
Formal algorithm for filtering • The forward algorithm • Elapse of time • compute p(Xt+1|e1:t) from p(Xt|e1:t) • Observe • compute p(Xt+1|e1:t+1) from p(Xt+1|e1:t) • Renormalization
Forward algorithm vs. particle filtering • Forward algorithm: • Elapse of time: B’(Xt) = Σxt-1 p(Xt|xt-1) B(xt-1) • Observe: B(Xt) ∝ p(et|Xt) B’(Xt) • Renormalize: so that B(Xt) sums to 1 • Particle filtering: • Elapse of time: move each particle x to a sample x’ drawn from p(X’|x) • Observe: weight each particle by w(x’) = p(et|x’) • Resample: draw N new particles in proportion to the weights
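A sketch of one exact forward-algorithm update (elapse time, observe, renormalize), with the belief B as a dict from state to probability and hypothetical transition/emission tables; the weather example's emission probabilities could be plugged in directly:

```python
def forward_step(B, transition, emission, evidence):
    """One filtering update on belief B: elapse time, observe, renormalize."""
    states = list(B)
    # Elapse of time: B'(x') = sum_x p(x' | x) B(x)
    B_prime = {x2: sum(transition[x1][x2] * B[x1] for x1 in states) for x2 in states}
    # Observe: weight each state by the probability of the new evidence
    unnorm = {x: emission[x][evidence] * B_prime[x] for x in states}
    # Renormalize so the updated belief sums to 1
    total = sum(unnorm.values())
    return {x: p / total for x, p in unnorm.items()}

# Emission table consistent with the slide: p(w|s)=.1, p(w|c)=.3, p(w|r)=.8
# emission = {"s": {"w": .1, "d": .9}, "c": {"w": .3, "d": .7}, "r": {"w": .8, "d": .2}}
```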
Outline • Search • Probabilistic inference • Machine learning • supervised learning • Parametric • generative: Naïve Bayes • discriminative method: perceptrons and MIRA • Non-parametric: K-NN • unsupervised learning • k-means • reinforcement learning • Q-learning
Important Concepts • Data: labeled instances, e.g. emails marked spam/ham • Training set • Held-out set • Test set • Features: attribute-value pairs that characterize each x • Experimentation cycle • Learn parameters (e.g. model probabilities) on the training set • (Tune hyperparameters on the held-out set) • Compute accuracy on the test set • Very important: never “peek” at the test set! • Evaluation • Accuracy: fraction of instances predicted correctly • Overfitting and generalization • Want a classifier which does well on test data • Overfitting: fitting the training data very closely, but not generalizing well
General Naive Bayes • A general naive Bayes model: p(Y, F1, …, Fn) = p(Y) Πi p(Fi|Y) • We only specify how each feature depends on the class • Total number of parameters is linear in n
Estimation: Laplace Smoothing • Laplace’s estimate (extended): • Pretend you saw every outcome k extra times: PLAP,k(x) = (c(x) + k) / (N + k|X|) • What’s Laplace with k = 0? • k is the strength of the prior • Laplace for conditionals: • Smooth each condition independently: PLAP,k(x|y) = (c(x, y) + k) / (c(y) + k|X|)
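A toy sketch combining the two slides above: estimate a smoothed prior and smoothed conditionals from labeled examples, then classify by the largest log score; the data layout (feature dicts, binary feature values by default) is an assumption, not the course's implementation:

```python
import math
from collections import Counter, defaultdict

def train_naive_bayes(examples):
    """examples: list of (feature_dict, label). Collect raw counts."""
    label_counts = Counter(label for _, label in examples)
    feature_counts = defaultdict(Counter)          # (feature, label) -> Counter of values
    for features, label in examples:
        for f, v in features.items():
            feature_counts[(f, label)][v] += 1
    return label_counts, feature_counts

def predict(features, label_counts, feature_counts, k=1, num_values=2):
    """Pick the label with the largest smoothed log joint score."""
    n = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, count in label_counts.items():
        # Laplace-smoothed prior: pretend each label was seen k extra times
        score = math.log((count + k) / (n + k * len(label_counts)))
        for f, v in features.items():
            c = feature_counts[(f, label)]
            # Laplace for conditionals: smooth each p(f = v | label) independently
            score += math.log((c[v] + k) / (sum(c.values()) + k * num_values))
        if score > best_score:
            best_label, best_score = label, score
    return best_label
```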
Generative vs. Discriminative • Generative classifiers: • E.g. naive Bayes • A causal model with evidence variables • Query model for causes given evidence • Discriminative classifiers: • No causal model, no Bayes rule, often no probabilities at all! • Try to predict the label Y directly from X • Robust, accurate with varied features • Loosely: mistake driven rather than model driven
Linear Classifiers (perceptrons) • Inputs are feature values • Each feature has a weight • Sum is the activation: activation_w(x) = Σi wi · fi(x) = w · f(x) • If the activation is: • positive: output +1 • negative: output -1
Learning: Multiclass Perceptron • Start with all weights = 0 • Pick up training examples one by one • Predict with current weights • If correct, no change! • If wrong: lower score of wrong answer, raise score of right answer
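A sketch of this multiclass perceptron update, assuming features are dicts of feature values and weights is a per-class dict of weight vectors (hypothetical representation):

```python
from collections import defaultdict

def score(class_weights, features):
    """Dot product of one class's weight vector with the feature vector."""
    return sum(class_weights[f] * v for f, v in features.items())

def train_multiclass_perceptron(examples, labels, passes=5):
    """examples: list of (feature_dict, true_label). Start with all weights = 0."""
    weights = {y: defaultdict(float) for y in labels}
    for _ in range(passes):
        for features, y_true in examples:            # pick up examples one by one
            y_pred = max(labels, key=lambda y: score(weights[y], features))
            if y_pred != y_true:                     # if wrong, adjust both classes
                for f, v in features.items():
                    weights[y_true][f] += v          # raise the right answer's score
                    weights[y_pred][f] -= v          # lower the wrong answer's score
    return weights
```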
MIRA • Idea: adjust the size of the weight update to mitigate the effects of overly aggressive perceptron updates • MIRA*: choose an update size that fixes the current mistake *Margin Infused Relaxed Algorithm
Parametric / Non-parametric • Parametric models: • Fixed set of parameters • More data means better settings • Non-parametric models: • Complexity of the classifier increases with data • Better in the limit, often worse in the non-limit • (K)NN is non-parametric
K-Means • An iterative clustering algorithm • Pick K random points as cluster centers (means) • Alternate: • Assign data instances to closest mean • Assign each mean to the average of its assigned points • Stop when no points’ assignments change
K-Means as Optimization • Consider the total distance of the points to their assigned means: φ = Σi dist(xi, c_ai), the distance from each point xi to its assigned mean c_ai • Each iteration reduces φ • Two stages each iteration: • Update assignments: fix means c, change assignments a • Update means: fix assignments a, change means c
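A sketch of the K-means loop for points represented as equal-length tuples; requires Python 3.8+ for math.dist:

```python
import math
import random

def kmeans(points, k, iterations=100):
    """Alternate the two stages above; each full iteration reduces the total distance."""
    means = random.sample(points, k)                 # pick K random points as centers
    assignments = [None] * len(points)
    for _ in range(iterations):
        # Update assignments: fix means, send each point to its closest mean
        new_assignments = [min(range(k), key=lambda j: math.dist(p, means[j]))
                           for p in points]
        if new_assignments == assignments:           # stop when no assignment changes
            break
        assignments = new_assignments
        # Update means: fix assignments, move each mean to the average of its points
        for j in range(k):
            cluster = [p for p, a in zip(points, assignments) if a == j]
            if cluster:
                means[j] = tuple(sum(coord) / len(cluster) for coord in zip(*cluster))
    return means, assignments
```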
Reinforcement learning • Similar to MDP • Don’t know T and/or R, but can observe R • Learn by doing • can have multiple episodes (trials)
MDPs vs. RL • If we know the MDP (offline computation): • compute V*, Q*, π* exactly (value and policy iteration) • evaluate a fixed policy π (policy evaluation) • If we don’t know the MDP: • we can estimate the MDP from samples, then solve it (model-based RL) • we can estimate V for a fixed policy π • we can estimate Q*(s,a) for the optimal policy while executing an exploration policy (model-free RL: Q-learning)
Q-Learning • Q-Learning: sample-based Q-value iteration • Learn Q*(s,a) values • Receive a sample (s, a, s’, R) • Consider your old estimate: Q(s,a) • Consider your new sample estimate: sample = R + γ max_a’ Q(s’,a’) • Incorporate the new estimate into a running average: Q(s,a) ← (1 - α) Q(s,a) + α · sample
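A sketch of the Q-learning update for one observed sample (s, a, s', R), with Q kept as a dict keyed by (state, action) pairs; the representation and the legal_actions helper in the usage comment are assumptions:

```python
from collections import defaultdict

def q_update(Q, s, a, s_next, reward, next_actions, alpha=0.1, gamma=0.9):
    """Q(s,a) <- (1 - alpha) * Q(s,a) + alpha * [R + gamma * max_a' Q(s',a')]."""
    # New sample estimate, using the best currently-known action at s'
    best_next = max((Q[(s_next, a2)] for a2 in next_actions), default=0.0)
    sample = reward + gamma * best_next
    # Fold the sample into a running average of the old estimate
    Q[(s, a)] = (1 - alpha) * Q[(s, a)] + alpha * sample

# Usage sketch: Q = defaultdict(float); q_update(Q, s, a, s2, r, legal_actions(s2))
```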