Artificial Intelligence Rehearsal Lesson Ram Meshulam 2004
Solving Problems with Search Algorithms
• Input: a problem P.
• Preprocessing:
• Define states and a state space
• Define operators
• Define a start state and a set of goal states.
• Processing:
• Run a search algorithm to find a path from the start state to one of the goal states.
Uninformed Search
• Uninformed search methods use only the information available in the problem definition.
• Breadth-First Search (BFS)
• Depth-First Search (DFS)
• Iterative Deepening DFS (IDDFS/DFID)
• Bi-directional search
• Uniform Cost Search (a.k.a. Dijkstra's algorithm)
Breadth-First-Search Attributes
• Completeness – yes
• Optimality – yes, if the graph is un-weighted.
• Time complexity: O(b^d)
• Memory complexity: O(b^d)
• Where b is the branching factor and d is the solution depth
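A minimal BFS sketch in Python, assuming the state space is given explicitly as an adjacency dictionary (the names `graph`, `start`, and `goals` are illustrative, not from the slides):

```python
from collections import deque

def bfs(graph, start, goals):
    """Breadth-first search; returns a shortest (fewest-edges) path or None."""
    frontier = deque([start])          # FIFO open list
    parent = {start: None}             # also serves as the visited set
    while frontier:
        node = frontier.popleft()
        if node in goals:
            path = []
            while node is not None:    # reconstruct the path back to the start
                path.append(node)
                node = parent[node]
            return list(reversed(path))
        for succ in graph.get(node, []):
            if succ not in parent:     # generate each state at most once
                parent[succ] = node
                frontier.append(succ)
    return None
```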
Depth-First-Search Attributes
• (Figure: example search tree contrasting the DFS expansion order with the optimal solution.)
• Completeness – no. Infinite loops or infinite depth can occur.
• Optimality – no.
• Time complexity: O(b^m)
• Memory complexity: O(bm)
• Where b is the branching factor and m is the maximum depth of the search tree
Limited DFS Attributes
• Completeness – yes, if d ≤ l
• Optimality – no.
• Time complexity: O(b^l)
• If d < l, it is larger than in BFS
• Memory complexity: O(bl)
• Where b is the branching factor and l is the depth limit.
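A sketch of the depth-limited variant, under the same assumed adjacency-dictionary representation (`graph`, `goals`, and `limit` are illustrative names):

```python
def depth_limited_dfs(graph, node, goals, limit, path=None):
    """Recursive DFS that cuts off below the depth limit l; returns a path or None."""
    if path is None:
        path = [node]
    if node in goals:
        return path
    if limit == 0:
        return None                      # cutoff reached
    for succ in graph.get(node, []):
        if succ not in path:             # avoid loops along the current branch
            result = depth_limited_dfs(graph, succ, goals, limit - 1, path + [succ])
            if result is not None:
                return result
    return None
```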
Depth-First Iterative-Deepening
• (Figure: search tree in which each node is labeled with the order(s) in which DFID generates it.)
• The numbers represent the order generated by DFID
Iterative-Deepening Attributes
• Completeness – yes
• Optimality – yes, if the graph is un-weighted.
• Time complexity: O(b^d)
• Memory complexity: O(bd)
• Where b is the branching factor and d is the solution depth
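Iterative deepening then just re-runs the depth-limited search with growing limits (a sketch reusing the `depth_limited_dfs` helper assumed above):

```python
def iterative_deepening(graph, start, goals, max_depth=50):
    """Run depth-limited DFS with limits 0, 1, 2, ... until a goal is found."""
    # Relies on depth_limited_dfs from the previous sketch.
    for limit in range(max_depth + 1):
        result = depth_limited_dfs(graph, start, goals, limit)
        if result is not None:
            return result
    return None
```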
State Redundancies
• Closed list – a hash table which holds the visited nodes.
• For example, in BFS: (Figure: BFS expansion showing the closed list and the open list / frontier.)
Uniform Cost Search Attributes
• Completeness: yes, for positive weights
• Optimality: yes
• Time & memory complexity: O(b^(1 + ⌊c/e⌋))
• Where b is the branching factor, c is the optimal solution cost and e is the minimum edge cost
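A uniform-cost (Dijkstra-style) sketch, assuming `graph[u]` is a list of `(successor, edge_cost)` pairs; these names are assumptions for the example:

```python
import heapq

def uniform_cost_search(graph, start, goals):
    """Dijkstra-style search; returns (cost, path) for a cheapest path, or None."""
    frontier = [(0, start, [start])]           # priority queue ordered by g(n)
    best_g = {start: 0}
    while frontier:
        g, node, path = heapq.heappop(frontier)
        if node in goals:
            return g, path                     # first goal popped is optimal
        if g > best_g.get(node, float("inf")):
            continue                           # stale queue entry
        for succ, cost in graph.get(node, []):
            new_g = g + cost
            if new_g < best_g.get(succ, float("inf")):
                best_g[succ] = new_g
                heapq.heappush(frontier, (new_g, succ, path + [succ]))
    return None
```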
Best First Search Algorithms
• Principle: expand the node n with the best evaluation function value f(n).
• Implement via a priority queue
• Algorithms differ in the definition of f:
• Greedy search: f(n) = h(n)
• A*: f(n) = g(n) + h(n)
• IDA*: iterative deepening version of A*
• etc.
Best-FS Algorithm Pseudo code
• Start with open = [initial-state].
• While open is not empty do
• Pick the best node on open.
• If it is the goal node then return with success. Otherwise find its successors.
• Assign the successor nodes a score using the evaluation function and add the scored nodes to open.
General Framework using Closed-list (Graph-Search)
• GraphSearch(Graph graph, Node start, Vector goals)
• O ← make_data_structure(start) // open list
• C ← make_hash_table() // closed list
• While O not empty loop
• n ← O.remove_front()
• If goal(n) return n
• If n is found on C continue
• // otherwise
• O ← O ∪ successors(n)
• C ← C ∪ {n}
• Return null // no goal found
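A runnable rendering of this framework as generic best-first graph search; the priority-queue open list and the Python-set closed list are implementation choices, and `successors`, `is_goal`, and `f` are assumed callables:

```python
import heapq
from itertools import count

def graph_search(successors, start, is_goal, f):
    """Generic best-first graph search with a closed list.

    successors(n) -> iterable of child states, is_goal(n) -> bool,
    f(n) -> evaluation value used to order the open list.
    """
    tie = count()                               # tie-breaker for equal f-values
    open_list = [(f(start), next(tie), start)]
    closed = set()                              # hash table of expanded states
    while open_list:
        _, _, node = heapq.heappop(open_list)
        if is_goal(node):
            return node
        if node in closed:
            continue                            # duplicate: already expanded
        closed.add(node)
        for succ in successors(node):
            heapq.heappush(open_list, (f(succ), next(tie), succ))
    return None                                 # no goal found
```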
Greedy Search Attributes
• (Figure: small example graph with states s, a, b, g, edge costs, and heuristic values h=1 and h=2 on which greedy search goes wrong.)
• Completeness: no. An inaccurate heuristic can cause loops (unless a closed list is used), or lead the search down an infinite path.
• Optimality: no. An inaccurate heuristic can lead to a non-optimal solution.
• Time & memory complexity: O(b^m) in the worst case
A* Algorithm (1)
• Combines the greedy h(n) and uniform-cost g(n) approaches.
• Evaluation function: f(n) = g(n) + h(n)
• Completeness:
• In a finite graph: yes
• In an infinite graph: if all edge costs are finite and have a minimum positive value, and all heuristic values are finite and non-negative.
• Optimality:
• In tree-search: if h(n) is admissible
• In graph-search: if it is also consistent
Heuristic Function h(n)
• Admissible/underestimate: h(n) never overestimates the actual cost from n to a goal.
• Consistent/monotonic (desirable): h(m) − h(n) ≤ w(m, n), where m is the parent of n and w(m, n) is the cost of the edge from m to n. This ensures f(n) ≥ f(m).
A* Algorithm (2)
• Optimally efficient: A* expands the minimal number of nodes possible for any given (consistent) heuristic.
• Time and space complexity: exponential in the worst case.
• Worst case: cost function f(n) = g(n) (h ≡ 0, so A* degenerates to uniform cost search)
• Best case: cost function f(n) = g(n) + h*(n) (a perfect heuristic; only nodes on an optimal path are expanded)
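A graph-search A* sketch under the same assumed `(successor, cost)` adjacency representation; with a consistent `h` the first goal popped is optimal:

```python
import heapq

def a_star(graph, start, goals, h):
    """A* on graph[u] = [(v, cost), ...] with heuristic h(n); returns (cost, path)."""
    open_list = [(h(start), 0, start, [start])]     # ordered by f = g + h
    best_g = {start: 0}
    while open_list:
        f, g, node, path = heapq.heappop(open_list)
        if node in goals:
            return g, path                          # optimal if h is consistent
        if g > best_g.get(node, float("inf")):
            continue                                # stale entry for this node
        for succ, cost in graph.get(node, []):
            new_g = g + cost
            if new_g < best_g.get(succ, float("inf")):
                best_g[succ] = new_g
                heapq.heappush(open_list,
                               (new_g + h(succ), new_g, succ, path + [succ]))
    return None
```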
Duplicate Pruning
• Do not re-enter the parent of the current state
• Works with or without a closed list
• Using a closed list: check the closed list before inserting new nodes into the open list
• Note: in A*, h then has to be consistent!
• Do not remove the original check
• Using a stack (DFS): check the current branch and the stack status before inserting new nodes
IDA* Algorithm
• Each iteration is a depth-first search that keeps track of the cost evaluation f = g + h of each node generated.
• The cost threshold is initialized to the heuristic value of the initial state.
• If a node is generated whose cost exceeds the threshold for that iteration, its path is cut off.
IDA* Attributes
• The cost threshold increases in each iteration to the total cost of the lowest-cost node that was pruned during the previous iteration.
• The algorithm terminates when a goal state is reached whose total cost does not exceed the current threshold.
• Completeness and optimality: like A*
• Space complexity: O(bd)
• Time complexity*: O(b^d) in the worst case (assuming the number of distinct f-values is small, so that little work is repeated across iterations)
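An IDA* sketch matching the description above; `successors`, `is_goal`, and `h` are assumed callables, with `successors(n)` returning `(child, edge_cost)` pairs:

```python
def ida_star(successors, start, is_goal, h):
    """IDA*: repeated depth-first searches bounded by an f = g + h threshold."""
    def dfs(node, g, threshold, path):
        f = g + h(node)
        if f > threshold:
            return f, None                    # prune; report the exceeded f-value
        if is_goal(node):
            return f, path
        next_threshold = float("inf")
        for succ, cost in successors(node):
            if succ in path:                  # simple duplicate pruning on the branch
                continue
            t, found = dfs(succ, g + cost, threshold, path + [succ])
            if found is not None:
                return t, found
            next_threshold = min(next_threshold, t)
        return next_threshold, None

    threshold = h(start)                      # initial threshold = h(initial state)
    while threshold != float("inf"):
        threshold, found = dfs(start, 0, threshold, [start])
        if found is not None:
            return found
    return None
```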
Local Search – Cont.
• In order to avoid local maxima and plateaus, we permit moves to states with lower values with probability p.
• The different algorithms differ in p.
Hill Climbing
• Always choose the best successor next
• Stop when no improvement is possible
• In order to escape plateaus and local maxima:
• Sideways moves
• Stochastic hill climbing
• Random-restart algorithm
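A steepest-ascent hill-climbing sketch with an optional sideways-move budget; `neighbors` and `value` are assumed callables, not names from the slides:

```python
def hill_climbing(initial, neighbors, value, max_sideways=0):
    """Steepest-ascent hill climbing with an optional budget of sideways moves."""
    current = initial
    sideways = 0
    while True:
        best = max(neighbors(current), key=value, default=None)
        if best is None or value(best) < value(current):
            return current                       # local maximum: no better neighbor
        if value(best) == value(current):
            sideways += 1                        # plateau: allow a few sideways moves
            if sideways > max_sideways:
                return current
        else:
            sideways = 0
        current = best
```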
Simulated Annealing – Pseudo code Cont.
• Acceptance function example: accept a worsening move with probability p = e^(ΔE/T), where ΔE = value(next) − value(current) < 0 and T is the current temperature.
• Schedule function example: geometric cooling, T(t) = T0 · α^t with 0 < α < 1.
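A simulated-annealing sketch using exactly these two example choices (Boltzmann acceptance and geometric cooling); the default parameter values and the `random_neighbor`/`value` callables are assumptions:

```python
import math
import random

def simulated_annealing(initial, random_neighbor, value,
                        t0=1.0, alpha=0.95, steps=10_000):
    """Simulated annealing with a geometric cooling schedule T(t) = t0 * alpha**t."""
    current = initial
    for t in range(steps):
        temperature = t0 * alpha ** t
        if temperature < 1e-9:
            break                                 # effectively frozen
        candidate = random_neighbor(current)
        delta = value(candidate) - value(current)
        # Always accept improvements; accept worse moves with probability e^(delta/T).
        if delta > 0 or random.random() < math.exp(delta / temperature):
            current = candidate
    return current
```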
Search Algorithms Hierarchy
Exercise
• What are the different data structures used to implement the open list in BFS, DFS and Best-FS?
Minimax
• Perfect play for deterministic games
• Idea: choose the move to the position with the highest minimax value = best achievable payoff against best play
• E.g., a 2-ply game:
Properties of minimax
• Complete? (= will not run forever) Yes, if the tree is finite
• Optimal? (= will find the optimal response) Yes, against an optimal opponent
• Time complexity? O(b^m)
• Space complexity? O(bm) with depth-first exploration; O(b^m) if the whole optimal strategy is saved
• For chess, b ≈ 35, m ≈ 100 for "reasonable" games, so an exact solution is completely infeasible
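A minimax sketch for a finite game tree; `successors(state)` is assumed to return `(move, child_state)` pairs and an empty list at terminal states, and `utility` scores terminals from MAX's point of view:

```python
def minimax(state, successors, utility, maximizing=True):
    """Return (value, move) for perfect play on a finite game tree."""
    children = successors(state)
    if not children:                       # terminal state: evaluate it
        return utility(state), None
    best_value, best_move = None, None
    for move, child in children:
        value, _ = minimax(child, successors, utility, not maximizing)
        if best_value is None or (value > best_value if maximizing else value < best_value):
            best_value, best_move = value, move
    return best_value, best_move
```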
α-β pruning example
• (Sequence of figures: the same game tree with α-β pruning applied step by step.)
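The same search with α-β pruning added as a sketch; it returns the same value as plain minimax but cuts off branches that cannot affect the decision (the callables are the same assumed interface as above):

```python
import math

def alphabeta(state, successors, utility, maximizing=True,
              alpha=-math.inf, beta=math.inf):
    """Minimax with alpha-beta pruning; same result, fewer nodes expanded."""
    children = successors(state)
    if not children:
        return utility(state), None
    best_move = None
    if maximizing:
        value = -math.inf
        for move, child in children:
            child_value, _ = alphabeta(child, successors, utility, False, alpha, beta)
            if child_value > value:
                value, best_move = child_value, move
            alpha = max(alpha, value)
            if alpha >= beta:
                break                      # beta cutoff: MIN will avoid this branch
    else:
        value = math.inf
        for move, child in children:
            child_value, _ = alphabeta(child, successors, utility, True, alpha, beta)
            if child_value < value:
                value, best_move = child_value, move
            beta = min(beta, value)
            if beta <= alpha:
                break                      # alpha cutoff: MAX will avoid this branch
    return value, best_move
```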
Planning
• Traditional search methods do not scale to large, real-world problems
• We want to use general knowledge
• We need general heuristics
• Problem decomposition
STRIPS – Representation
• States and goals – sentences in FOL.
• Operators – consist of 3 parts:
• Operator name
• Preconditions – a sentence describing the conditions that must hold so that the operator can be executed.
• Effect – a sentence describing how the world changes as a result of executing the operator. Has 2 parts:
• Add-list
• Delete-list
• Optionally, a set of (simple) variable constraints
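One possible (illustrative, not canonical) way to encode a STRIPS operator in code; the blocks-world `Move` operator and its literals are assumptions for the example:

```python
from dataclasses import dataclass

@dataclass
class StripsOperator:
    name: str
    preconditions: frozenset      # literals that must hold before execution
    add_list: frozenset           # literals that become true
    delete_list: frozenset        # literals that become false

    def applicable(self, state):
        return self.preconditions <= state

    def apply(self, state):
        return (state - self.delete_list) | self.add_list

# Illustrative operator: move block A from the table onto block B.
move_a_onto_b = StripsOperator(
    name="Move(A, Table, B)",
    preconditions=frozenset({"Clear(A)", "Clear(B)", "On(A, Table)"}),
    add_list=frozenset({"On(A, B)"}),
    delete_list=frozenset({"On(A, Table)", "Clear(B)"}),
)

state = frozenset({"Clear(A)", "Clear(B)", "On(A, Table)", "On(B, Table)"})
if move_a_onto_b.applicable(state):
    state = move_a_onto_b.apply(state)    # state now contains On(A, B)
```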
Choosing an attribute
• Idea: a good attribute splits the examples into subsets that are (ideally) "all positive" or "all negative"
• Patrons? is a better choice
Using information theory
• To implement Choose-Attribute in the DTL algorithm
• Information content of an answer (entropy): I(P(v1), …, P(vn)) = Σ_{i=1..n} −P(vi) log2 P(vi)
• For a training set containing p positive examples and n negative examples: I(p/(p+n), n/(p+n)) = −(p/(p+n)) log2(p/(p+n)) − (n/(p+n)) log2(n/(p+n))
Information gain
• A chosen attribute A divides the training set E into subsets E1, …, Ev according to their values for A, where A has v distinct values.
• Information gain (IG) or reduction in entropy from the attribute test: IG(A) = I(p/(p+n), n/(p+n)) − Σ_{i=1..v} (pi+ni)/(p+n) · I(pi/(pi+ni), ni/(pi+ni))
• Choose the attribute with the largest IG
Information gain
• For the training set, p = n = 6, so I(6/12, 6/12) = 1 bit
• Consider the attributes Patrons and Type (and the others too):
• Patrons has the highest IG of all attributes and so is chosen by the DTL algorithm as the root
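A small sketch that reproduces these numbers; the per-value counts for Patrons (None: 0+/2−, Some: 4+/0−, Full: 2+/4−) are the usual restaurant-example splits and should be treated as assumed here:

```python
import math

def entropy(p, n):
    """I(p/(p+n), n/(p+n)) in bits; 0 by convention when one class is empty."""
    total = p + n
    result = 0.0
    for count in (p, n):
        if count:
            q = count / total
            result -= q * math.log2(q)
    return result

def information_gain(p, n, splits):
    """splits = [(p_i, n_i), ...], one pair per attribute value."""
    remainder = sum((pi + ni) / (p + n) * entropy(pi, ni) for pi, ni in splits)
    return entropy(p, n) - remainder

print(entropy(6, 6))                                     # 1.0 bit
print(information_gain(6, 6, [(0, 2), (4, 0), (2, 4)]))  # about 0.541 bits for Patrons
```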
Bayes’ Rule
• P(B|A) = P(A|B) · P(B) / P(A)
Computing the denominator
• Approach #1 – compute relative likelihoods:
• If M (meningitis) and W (whiplash) are two possible explanations, compare them via the ratio P(M|S)/P(W|S) = P(S|M)·P(M) / (P(S|W)·P(W)); the denominator P(S) cancels out.
• Approach #2 – using M and ~M:
• Check the probability of M and ~M given S
• P(M|S) = P(S|M) · P(M) / P(S)
• P(~M|S) = P(S|~M) · P(~M) / P(S)
• P(M|S) + P(~M|S) = 1 (must sum to 1)
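A tiny numeric sketch of approach #2 (normalizing over M and ~M); the prior and likelihood values below are made up for illustration:

```python
# Hypothetical numbers: P(M) prior, P(S|M) and P(S|~M) likelihoods of a stiff neck.
p_m = 0.0001
p_s_given_m = 0.8
p_s_given_not_m = 0.01

# Unnormalized posteriors, then divide by P(S) = sum of the two terms.
joint_m = p_s_given_m * p_m
joint_not_m = p_s_given_not_m * (1 - p_m)
p_s = joint_m + joint_not_m

p_m_given_s = joint_m / p_s
p_not_m_given_s = joint_not_m / p_s
assert abs(p_m_given_s + p_not_m_given_s - 1.0) < 1e-12   # must sum to 1
```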
Perceptrons
• Linear separability
• A set of (2D) patterns (x1, x2) of two classes is linearly separable if there exists a line on the (x1, x2) plane, w0 + w1·x1 + w2·x2 = 0, that separates all patterns of one class from the other class
• A perceptron can be built with 3 inputs x0 = 1, x1, x2 with weights w0, w1, w2
• For n-dimensional patterns (x1, …, xn), the hyperplane w0 + w1·x1 + w2·x2 + … + wn·xn = 0 divides the space into two regions
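A sketch of the resulting decision rule; the AND weights are an illustrative linearly separable example, not part of the slide:

```python
def perceptron_output(weights, inputs):
    """weights = (w0, w1, ..., wn), inputs = (x1, ..., xn); x0 = 1 is implicit."""
    activation = weights[0] + sum(w * x for w, x in zip(weights[1:], inputs))
    return 1 if activation >= 0 else 0     # which side of the hyperplane?

# Illustrative weights implementing logical AND, a linearly separable function:
w = (-1.5, 1.0, 1.0)                       # w0 + w1*x1 + w2*x2 = 0 is the decision line
print([perceptron_output(w, x) for x in [(0, 0), (0, 1), (1, 0), (1, 1)]])  # [0, 0, 0, 1]
```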
Backpropagation example
• (Figure: network with inputs x1, x2, hidden units x3, x4, output x5, and weights w13, w14, w23, w24, w35, w45.)
• Sigmoid as activation function with steepness 3:
• g(in) = 1/(1 + e^(−3·in))
• g’(in) = 3·g(in)·(1 − g(in))
Adding the threshold
• (Figure: the same network with bias inputs x0 = 1 and x6 = 1 connected to the hidden and output units through weights w03, w04 and w65.)
Training Set
• Logical XOR (exclusive OR) function:
• x1 x2 | output
• 0 0 | 0
• 0 1 | 1
• 1 0 | 1
• 1 1 | 0
• Choose random weights:
• <w03, w04, w13, w14, w23, w24, w65, w35, w45> = <0.03, 0.04, 0.13, 0.14, -0.23, -0.24, 0.65, 0.35, 0.45>
• Learning rate: 0.1 for the hidden layer, 0.3 for the output layer
First Example
• Compute the outputs
• a0 = 1, a1 = 0, a2 = 0
• a3 = g(1*0.03 + 0*0.13 + 0*-0.23) = 0.522
• a4 = g(1*0.04 + 0*0.14 + 0*-0.24) = 0.530
• a6 = 1, a5 = g(0.65*1 + 0.35*0.522 + 0.45*0.530) = 0.961
• Calculate ∆5 = 3*g(1.0712)*(1-g(1.0712))*(0-0.961) = -0.108
• Calculate ∆6, ∆3, ∆4
• ∆6 = 3*g(1)*(1-g(1))*(0.65*-0.108) = -0.010
• ∆3 = 3*g(0.03)*(1-g(0.03))*(0.35*-0.108) = -0.028
• ∆4 = 3*g(0.04)*(1-g(0.04))*(0.45*-0.108) = -0.036
• Update weights for the output layer
• w65 = 0.65 + 0.3*1*-0.108 = 0.618
• w35 = 0.35 + 0.3*0.522*-0.108 = 0.333
• w45 = 0.45 + 0.3*0.530*-0.108 = 0.433
First Example (cont)
• Calculate ∆0, ∆1, ∆2
• ∆0 = 3*g(1)*(1-g(1))*(0.03*-0.028 + 0.04*-0.036) = -0.001
• ∆1 = 3*g(0)*(1-g(0))*(0.13*-0.028 + 0.14*-0.036) = -0.006
• ∆2 = 3*g(0)*(1-g(0))*(-0.23*-0.028 + -0.24*-0.036) = 0.011
• Update weights for the hidden layer
• w03 = 0.03 + 0.1*1*-0.028 = 0.027
• w04 = 0.04 + 0.1*1*-0.036 = 0.036
• w13 = 0.13 + 0.1*0*-0.028 = 0.13
• w14 = 0.14 + 0.1*0*-0.036 = 0.14
• w23 = -0.23 + 0.1*0*-0.028 = -0.23
• w24 = -0.24 + 0.1*0*-0.036 = -0.24
Second Example
• Compute the outputs
• a0 = 1, a1 = 0, a2 = 1
• a3 = g(1*0.027 + 0*0.13 + 1*-0.23) = 0.352
• a4 = g(1*0.036 + 0*0.14 + 1*-0.24) = 0.352
• a6 = 1, a5 = g(0.618*1 + 0.333*0.352 + 0.433*0.352) = 0.935
• Calculate ∆5 = 3*g(0.888)*(1-g(0.888))*(1-0.935) = 0.012
• Calculate ∆6, ∆3, ∆4
• ∆6 = 3*g(1)*(1-g(1))*(0.618*0.012) = 0.001
• ∆3 = 3*g(-0.203)*(1-g(-0.203))*(0.333*0.012) = 0.003
• ∆4 = 3*g(-0.204)*(1-g(-0.204))*(0.433*0.012) = 0.004
• Update weights for the output layer
• w65 = 0.618 + 0.3*1*0.012 = 0.622
• w35 = 0.333 + 0.3*0.352*0.012 = 0.334
• w45 = 0.433 + 0.3*0.352*0.012 = 0.434
Second Example (cont)
• Calculate ∆0, ∆1, ∆2
• Skipped, we do not use them
• Update weights for the hidden layer
• w03 = 0.027 + 0.1*1*0.003 = 0.027
• w04 = 0.036 + 0.1*1*0.004 = 0.036
• w13 = 0.13 + 0.1*0*0.003 = 0.13
• w14 = 0.14 + 0.1*0*0.004 = 0.14
• w23 = -0.23 + 0.1*1*0.003 = -0.23
• w24 = -0.24 + 0.1*1*0.004 = -0.24
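A short script that replays the first worked example above (steepness-3 sigmoid, the same initial weights and learning rates); the variable names are ours, and the commented values should match the slide up to rounding:

```python
import math

def g(x):                         # sigmoid with steepness 3
    return 1.0 / (1.0 + math.exp(-3.0 * x))

def g_prime(x):
    return 3.0 * g(x) * (1.0 - g(x))

# Initial weights and the first training example (x1, x2) = (0, 0), target 0.
w03, w04, w13, w14, w23, w24 = 0.03, 0.04, 0.13, 0.14, -0.23, -0.24
w65, w35, w45 = 0.65, 0.35, 0.45
x1, x2, target = 0, 0, 0

in3 = w03 * 1 + w13 * x1 + w23 * x2
in4 = w04 * 1 + w14 * x1 + w24 * x2
a3, a4 = g(in3), g(in4)                       # ~0.522, ~0.530
in5 = w65 * 1 + w35 * a3 + w45 * a4
a5 = g(in5)                                   # ~0.961

delta5 = g_prime(in5) * (target - a5)         # ~ -0.108
delta3 = g_prime(in3) * (w35 * delta5)        # ~ -0.028
delta4 = g_prime(in4) * (w45 * delta5)        # ~ -0.036

# Output-layer update with learning rate 0.3, hidden layer with 0.1.
w65 += 0.3 * 1 * delta5                       # ~0.618
w35 += 0.3 * a3 * delta5                      # ~0.333
w45 += 0.3 * a4 * delta5                      # ~0.433
w03 += 0.1 * 1 * delta3                       # ~0.027
w04 += 0.1 * 1 * delta4                       # ~0.036
```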
Bayesian networks
• Syntax:
• a set of nodes, one per variable
• a directed, acyclic graph (link ≈ "directly influences")
• a conditional distribution for each node given its parents: P(Xi | Parents(Xi)) – a conditional probability table (CPT)
Calculation of Joint Probability
• Given its parents, each node is conditionally independent of everything except its descendants
• Thus, the full joint distribution factors as P(x1, x2, …, xn) = Π_{i=1..n} P(xi | parents(Xi))
• Every BN over a domain implicitly represents some joint distribution over that domain
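A minimal sketch of reading a joint probability off the CPTs for a toy two-node network A → B; the CPT numbers are invented for illustration:

```python
# CPTs for a toy network A -> B: P(A) and P(B | A).
p_a = {True: 0.3, False: 0.7}
p_b_given_a = {True: {True: 0.9, False: 0.1},
               False: {True: 0.2, False: 0.8}}

def joint(a, b):
    """P(a, b) = P(a) * P(b | a), i.e. the product over nodes of P(x_i | parents)."""
    return p_a[a] * p_b_given_a[a][b]

# The factored form implicitly defines the full joint distribution:
total = sum(joint(a, b) for a in (True, False) for b in (True, False))
assert abs(total - 1.0) < 1e-12
```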