Artificial Intelligence

Presentation Transcript


  1. Artificial Intelligence Rehearsal Lesson Ram Meshulam 2004

  2. Solving Problems with Search Algorithms • Input: a problem P. • Preprocessing: • Define states and a state space • Define operators • Define a start state and a set of goal states. • Processing: • Activate a search algorithm to find a path from the start state to one of the goal states.

  3. Uninformed Search • Uninformed search methods use only information available in the problem definition. • Breadth First Search (BFS) • Depth First Search (DFS) • Depth-First Iterative Deepening (DFID) • Bi-directional search • Uniform Cost Search (a.k.a. Dijkstra's algorithm)

  4. Breadth-First-Search Attributes • Completeness – yes • Optimality – yes, if the graph is unweighted. • Time Complexity: $O(b^d)$ • Memory Complexity: $O(b^d)$ • Where b is the branching factor and d is the solution depth

  5. Depth-First-Search Attributes [Figure: search tree with nodes numbered 1 to 5 and the optimal solution marked] • Completeness – No. Infinite loops or infinite depth can occur. • Optimality – No. • Time Complexity: $O(b^m)$ • Memory Complexity: $O(bm)$ • Where b is the branching factor and m is the maximum depth of the search tree

  6. Limited DFS Attributes • Completeness – Yes, if d ≤ l • Optimality – No. • Time Complexity: $O(b^l)$ • If l > d, it is larger than in BFS • Memory Complexity: $O(bl)$ • Where b is the branching factor, d is the solution depth and l is the depth limit.

  7. Depth-First Iterative-Deepening [Figure: search tree in which each node is labelled with the positions at which DFID generates it, e.g. 1,3,9 and 2,6,16] • The numbers represent the order generated by DFID

  8. Iterative-Deepening Attributes • Completeness – Yes • Optimality – yes, if the graph is unweighted. • Time Complexity: $O(b^d)$ • Memory Complexity: $O(bd)$ • Where b is the branching factor and d is the solution depth
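The iterative-deepening idea can be sketched as a depth-limited DFS that is restarted with a growing limit. This is a minimal illustration, not the lecture's own code: the function names, the `max_depth` cap and the toy `GRAPH` are assumptions made for the example.

```python
def depth_limited(node, goal, successors, limit, path):
    """DFS that stops expanding once `limit` steps are used up."""
    if node == goal:
        return path
    if limit == 0:
        return None
    for child in successors(node):
        if child in path:  # skip states already on the current branch
            continue
        found = depth_limited(child, goal, successors, limit - 1, path + [child])
        if found is not None:
            return found
    return None


def iterative_deepening(start, goal, successors, max_depth=50):
    """Run depth-limited DFS with limits 0, 1, 2, ... (max_depth is an assumed cap)."""
    for limit in range(max_depth + 1):
        found = depth_limited(start, goal, successors, limit, [start])
        if found is not None:
            return found
    return None


# Hypothetical toy graph: the goal "g" is two steps from "s" via "b".
GRAPH = {"s": ["a", "b"], "a": ["c"], "b": ["g"], "c": ["g"], "g": []}
path = iterative_deepening("s", "g", lambda n: GRAPH[n])
```

Because the limit grows one level at a time, the first path found uses the fewest edges, which is why the method is optimal on unweighted graphs.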

  9. State Redundancies • Closed list – a hash table which holds the visited nodes. • For example, BFS: [Figure: BFS search tree showing the Closed List and the Open List (Frontier)]

  10. Uniform Cost Search Attributes • Completeness: yes, for positive weights • Optimality: yes • Time & Memory complexity: $O(b^{c/e})$ • Where b is the branching factor, c is the optimal solution cost and e is the minimum edge cost

  11. Best First Search Algorithms • Principle: Expand the node n with the best evaluation function value f(n). • Implemented via a priority queue • Algorithms differ in the definition of f: • Greedy Search: f(n) = h(n) • A*: f(n) = g(n) + h(n) • IDA*: iterative deepening version of A* • Etc.

  12. Best-FS Algorithm Pseudo code • Start with open = [initial-state]. • While open is not empty do • Pick the best node on open. • If it is the goal node then return with success. Otherwise find its successors. • Assign the successor nodes a score using the evaluation function and add the scored nodes to open

  13. General Framework using Closed-list (Graph-Search) • GraphSearch(Graph graph, Node start, Vector goals) • O ← make_data_structure(start) // open list • C ← make_hash_table() // closed list • While O not empty loop • n ← O.remove_front() • If goal(n) return n • If n is found on C, continue • // otherwise • O ← O ∪ successors(n) • C ← C ∪ {n} • Return null // no goal found
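The GraphSearch framework above can be sketched in Python as follows. This is one possible reading of the pseudocode, not the lecturer's implementation: the open list stores whole paths so that the solution can be returned, and the `lifo` flag (an assumed parameter) switches the open list between a FIFO queue, which gives BFS, and a stack, which gives DFS. A priority queue ordered by f(n) would give a best-first search in the same skeleton.

```python
from collections import deque


def graph_search(start, goals, successors, lifo=False):
    """Generic graph search with an open list O and a closed list C."""
    open_list = deque([[start]])  # O: each entry is a path ending in a frontier node
    closed = set()                # C: hash set of nodes already expanded
    while open_list:
        path = open_list.pop() if lifo else open_list.popleft()
        n = path[-1]
        if n in goals:
            return path
        if n in closed:           # already expanded through another path
            continue
        closed.add(n)
        for child in successors(n):
            open_list.append(path + [child])
    return None                   # no goal found


# Hypothetical graph with a cycle between "s" and "a".
GRAPH = {"s": ["a", "b"], "a": ["s", "c"], "b": ["g"], "c": ["g"], "g": []}
bfs_path = graph_search("s", {"g"}, lambda n: GRAPH[n])
dfs_path = graph_search("s", {"g"}, lambda n: GRAPH[n], lifo=True)
```

The closed list is what keeps the cycle between "s" and "a" from being expanded again.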

  14. Greedy Search Attributes [Figure: example graph with nodes s, a, b and g, edge costs 1, 3, 2 and 1, and heuristic values h=1 and h=2] • Completeness: No. Inaccurate heuristics can cause loops (unless a closed list is used) or lead the search down an infinite path • Optimality: No. Inaccurate heuristics can lead to a non-optimal solution. • Time & Memory complexity: $O(b^m)$, where m is the maximum depth of the search tree

  15. A* Algorithm (1) • Combines greedy h(n) and uniform cost g(n) approaches. • Evaluation function: f(n)=g(n)+h(n) • Completeness: • In a finite graph: Yes • In an infinite graph: if all edge costs are finite and have a minimum positive value, and all heuristic values are finite and non-negative. • Optimality: • In tree-search: if h(n) is admissible • In graph-search: if it is also consistent
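A compact sketch of A* with f(n) = g(n) + h(n), using a priority queue as the open list and a closed set. The graph `EDGES` and heuristic table `H` are hypothetical values chosen so that h is consistent; the function signature is an assumption for this example.

```python
import heapq


def a_star(start, goal, neighbors, h):
    """Return (cost, path) of a cheapest path, or None if the goal is unreachable."""
    open_heap = [(h(start), 0, start, [start])]  # entries are (f, g, node, path)
    closed = set()
    while open_heap:
        f, g, n, path = heapq.heappop(open_heap)
        if n == goal:
            return g, path
        if n in closed:
            continue
        closed.add(n)
        for m, w in neighbors(n):
            if m not in closed:
                heapq.heappush(open_heap, (g + w + h(m), g + w, m, path + [m]))
    return None


# Hypothetical weighted graph and a consistent heuristic for it.
EDGES = {"s": [("a", 1), ("b", 3)], "a": [("g", 4)], "b": [("g", 1)], "g": []}
H = {"s": 3, "a": 3, "b": 1, "g": 0}
result = a_star("s", "g", lambda n: EDGES[n], lambda n: H[n])
```

Here the route through "a" looks cheaper at first (g = 1) but ends up costing 5, so the search returns the route through "b" with cost 4.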

  16. Heuristic Function h(n) • Admissible/Underestimate: h(n) never overestimates the actual cost from n to the goal • Consistent/monotonic (desirable): h(m) - h(n) ≤ w(n,m), where m is the parent of n. This ensures f(n) ≥ f(m).
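Both properties can be checked mechanically on a small explicit graph. The sketch below assumes the true cost-to-goal of every node is already known (`TRUE_COST`); the graph and both heuristic tables are hypothetical.

```python
def is_admissible(h, true_cost):
    """h never overestimates the actual cost from n to the goal."""
    return all(h[n] <= true_cost[n] for n in h)


def is_consistent(edges, h):
    """h(m) - h(n) <= w(m, n) for every edge from a parent m to a child n."""
    return all(h[m] - h[n] <= w for m, children in edges.items() for n, w in children)


# Hypothetical graph, true costs to the goal "g", and two heuristics.
EDGES = {"s": [("a", 1), ("b", 3)], "a": [("g", 4)], "b": [("g", 1)], "g": []}
TRUE_COST = {"s": 4, "a": 4, "b": 1, "g": 0}
H_GOOD = {"s": 3, "a": 3, "b": 1, "g": 0}
H_BAD = {"s": 3, "a": 0, "b": 1, "g": 0}  # admissible, but h(s) - h(a) = 3 > w(s, a) = 1
```

`H_BAD` shows that admissibility does not imply consistency: it never overestimates, yet it drops by more than the edge cost between "s" and "a".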

  17. A* Algorithm (2) • Optimally efficient: A* expands the minimal number of nodes possible with any given (consistent) heuristic. • Time and space complexity: • Worst case: Cost function f(n) = g(n) • Best case: Cost function f(n) = g(n) + h*(n)

  18. Duplicate Pruning • Do not enter the parent of the current state • With or without using a closed list • Using a closed list, check the closed list before adding new nodes to the open list • Note: in A*, h has to be consistent! • Do not remove the original check • Using a stack, check the current branch and stack status before adding new nodes

  19. IDA* Algorithm • Each iteration is a depth-first search that keeps track of the cost evaluation f = g + h of each node generated. • The cost threshold is initialized to the heuristic of the initial state. • If a node is generated whose cost exceeds the threshold for that iteration, its path is cut off.

  20. IDA* Attributes • The cost threshold increases in each iteration to the total cost of the lowest-cost node that was pruned during the previous iteration. • The algorithm terminates when a goal state is reached whose total cost does not exceed the current threshold. • Completeness and Optimality: Like A* • Space complexity: • Time complexity*:
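The threshold mechanics described in the two IDA* slides can be sketched as follows. The function layout, the toy graph `EDGES` and the heuristic `H` are assumptions for illustration; the search returns only the path.

```python
import math


def ida_star(start, goal, neighbors, h):
    """Iterative-deepening A*: repeated DFS bounded by a threshold on f = g + h."""
    threshold = h(start)  # initial threshold: heuristic of the initial state
    path = [start]

    def search(g):
        n = path[-1]
        f = g + h(n)
        if f > threshold:
            return f                  # cut off; report the f-value that was pruned
        if n == goal:
            return True
        smallest = math.inf
        for m, w in neighbors(n):
            if m in path:             # do not revisit states on the current branch
                continue
            path.append(m)
            result = search(g + w)
            if result is True:
                return True
            smallest = min(smallest, result)
            path.pop()
        return smallest

    while True:
        result = search(0)
        if result is True:
            return list(path)
        if result == math.inf:
            return None               # nothing was pruned, so no solution exists
        threshold = result            # lowest f-value pruned in the last iteration


# Hypothetical graph and consistent heuristic.
EDGES = {"s": [("a", 1), ("b", 3)], "a": [("g", 4)], "b": [("g", 1)], "g": []}
H = {"s": 3, "a": 3, "b": 1, "g": 0}
ida_path = ida_star("s", "g", lambda n: EDGES[n], lambda n: H[n])
```

On this graph the first iteration (threshold 3) prunes both children of "s" at f = 4, and the second iteration (threshold 4) reaches the goal through "b".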

  21. Local Search – Cont. • In order to avoid local maxima and plateaus, we permit moves to states with lower values with probability p. • The different algorithms differ in p.

  22. Hill Climbing • Always choose the next best successor • Stop when no improvement is possible • In order to avoid plateaus and local maxima: • Sideways move • Stochastic hill climbing • Random-restart algorithm

  23. Simulated Annealing – Pseudo code Cont. • Acceptor func. example: • Schedule func. example:
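The slide's own acceptor and schedule examples did not survive in the transcript. The sketch below therefore uses two common choices as assumptions, not as the lecture's definitions: an acceptor that takes a worsening move with probability exp(Δ/T), and a geometric cooling schedule T = t0 · alpha^t. The objective function, neighbourhood and all parameter values are hypothetical.

```python
import math
import random


def acceptor(delta, temperature):
    """Probability of accepting a move that changes the value by `delta` (assumed form)."""
    if delta >= 0:
        return 1.0
    return math.exp(delta / temperature)


def schedule(t, t0=10.0, alpha=0.95):
    """Geometric cooling (assumed form): temperature after t steps."""
    return t0 * alpha ** t


def simulated_annealing(start, value, neighbors, steps=500, seed=0):
    rng = random.Random(seed)
    current = best = start
    for t in range(steps):
        temperature = schedule(t)
        candidate = rng.choice(neighbors(current))
        delta = value(candidate) - value(current)
        if rng.random() < acceptor(delta, temperature):
            current = candidate
        if value(current) > value(best):
            best = current
    return best


# Hypothetical one-dimensional problem: maximise -(x - 3)^2 over the integers.
value = lambda x: -(x - 3) ** 2
best = simulated_annealing(0, value, lambda x: [x - 1, x + 1])
```

Early on the temperature is high and downhill moves are accepted often; as it cools, the acceptor's probability for worsening moves shrinks towards zero and the search behaves like hill climbing.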

  24. Search Algorithms Hierarchy

  25. Exercise • What are the different data structures used to implement the open list in BFS, DFS and Best-FS?

  26. Minimax • Perfect play for deterministic games • Idea: choose move to position with highest minimax value = best achievable payoff against best play • E.g., 2-ply game:

  27. Properties of minimax • Complete? (= will not run forever) Yes (if the tree is finite) • Optimal? (= will find the optimal response) Yes (against an optimal opponent) • Time complexity? $O(b^m)$ • Space complexity? $O(bm)$ (depth-first exploration), $O(b^m)$ for saving the optimal response • For chess, b ≈ 35 and m ≈ 100 for "reasonable" games, so an exact solution is completely infeasible
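A minimal recursive sketch of minimax on an explicit game tree. The tree encoding (nested lists with numeric leaves) and the leaf values are assumptions made for this example; the recursion explores the tree depth-first, which is where the $O(bm)$ space bound comes from.

```python
def minimax(node, maximizing):
    """Minimax value of `node`; a leaf is a number, an inner node is a list of children."""
    if not isinstance(node, list):
        return node
    values = [minimax(child, not maximizing) for child in node]
    return max(values) if maximizing else min(values)


# Hypothetical 2-ply game: MAX moves at the root, MIN replies at the second level.
TREE = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]
root_value = minimax(TREE, True)
best_move = max(range(len(TREE)), key=lambda i: minimax(TREE[i], False))
```

MIN would answer the three root moves with 3, 2 and 2 respectively, so MAX picks the first move and the root value is 3.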

  28. α-β pruning example

  29. α-β pruning example

  30. α-β pruning example

  31. α-β pruning example

  32. α-β pruning example
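The figures for the α-β example slides are not in the transcript, so the following sketch runs α-β pruning on a hypothetical tree rather than on the one from the lecture. The nested-list encoding and the `visited` list (used only to show which leaves were evaluated) are assumptions for illustration.

```python
import math


def alphabeta(node, maximizing, alpha=-math.inf, beta=math.inf, visited=None):
    """Minimax value with alpha-beta pruning; leaves are numbers, inner nodes are lists."""
    if not isinstance(node, list):
        if visited is not None:
            visited.append(node)      # record every leaf that is actually evaluated
        return node
    if maximizing:
        value = -math.inf
        for child in node:
            value = max(value, alphabeta(child, False, alpha, beta, visited))
            alpha = max(alpha, value)
            if alpha >= beta:         # MIN above will never allow this branch
                break
        return value
    value = math.inf
    for child in node:
        value = min(value, alphabeta(child, True, alpha, beta, visited))
        beta = min(beta, value)
        if alpha >= beta:             # MAX above already has a better option
            break
    return value


# Hypothetical 2-ply tree.
TREE = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]
seen = []
pruned_value = alphabeta(TREE, True, visited=seen)
```

After the first subtree gives MAX a guaranteed 3, the second subtree is abandoned as soon as its first leaf (2) is seen, so the leaves 4 and 6 are never evaluated.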

  33. Planning • Traditional search methods do not fit large, real-world problems • We want to use general knowledge • We need a general heuristic • Problem decomposition

  34. STRIPS – Representation • States and goal – sentences in FOL. • Operators – consist of 3 parts: • Operator name • Preconditions – a sentence describing the conditions that must hold so that the operator can be executed. • Effect – a sentence describing how the world has changed as a result of executing the operator. Has 2 parts: • Add-list • Delete-list • Optionally, a set of (simple) variable constraints
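One way to picture this representation is a small data class holding the three parts of an operator. The sketch below is a simplification under stated assumptions: states are sets of ground atoms written as strings (no variables or variable constraints), and the `Stack(A,B)` operator and its atoms are hypothetical blocks-world examples.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Operator:
    name: str
    preconditions: frozenset  # atoms that must hold before execution
    add_list: frozenset       # atoms that become true
    delete_list: frozenset    # atoms that stop being true

    def applicable(self, state):
        return self.preconditions <= state

    def apply(self, state):
        if not self.applicable(state):
            raise ValueError(f"{self.name}: preconditions not satisfied")
        return (state - self.delete_list) | self.add_list


# Hypothetical blocks-world operator: put block A on top of block B.
stack_a_on_b = Operator(
    name="Stack(A,B)",
    preconditions=frozenset({"Clear(A)", "Clear(B)", "On(A,Table)"}),
    add_list=frozenset({"On(A,B)"}),
    delete_list=frozenset({"Clear(B)", "On(A,Table)"}),
)
state = frozenset({"Clear(A)", "Clear(B)", "On(A,Table)", "On(B,Table)"})
new_state = stack_a_on_b.apply(state)
```

Applying the operator removes the delete-list atoms and then adds the add-list atoms; everything else in the state is carried over unchanged.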

  35. Choosing an attribute • Idea: a good attribute splits the examples into subsets that are (ideally) "all positive" or "all negative" • Patrons? is a better choice

  36. Using information theory • To implement Choose-Attribute in the DTL algorithm • Information Content of an answer (Entropy): $I(P(v_1), \ldots, P(v_n)) = \sum_{i=1}^{n} -P(v_i) \log_2 P(v_i)$ • For a training set containing p positive examples and n negative examples: $I\left(\frac{p}{p+n}, \frac{n}{p+n}\right) = -\frac{p}{p+n} \log_2 \frac{p}{p+n} - \frac{n}{p+n} \log_2 \frac{n}{p+n}$

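  37. Information gain • A chosen attribute A divides the training set E into subsets E1, … , Ev according to their values for A, where A has v distinct values. The subset Ei contains $p_i$ positive and $n_i$ negative examples: $remainder(A) = \sum_{i=1}^{v} \frac{p_i + n_i}{p + n} \, I\left(\frac{p_i}{p_i + n_i}, \frac{n_i}{p_i + n_i}\right)$ • Information Gain (IG) or reduction in entropy from the attribute test: $IG(A) = I\left(\frac{p}{p+n}, \frac{n}{p+n}\right) - remainder(A)$ • Choose the attribute with the largest IG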

  38. Information gain For the training set, p = n = 6, I(6/12, 6/12) = 1 bit. Consider the attributes Patrons and Type (and others too): Patrons has the highest IG of all attributes and so is chosen by the DTL algorithm as the root
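The entropy and information-gain computation can be sketched as below. The per-value (positive, negative) counts for Patrons and Type are not given in the transcript; the ones used here are an assumption, taken from the standard 12-example restaurant data set that this slide appears to refer to.

```python
import math


def entropy(*probs):
    """I(P(v1), ..., P(vn)) in bits; terms with probability 0 contribute nothing."""
    return sum(-p * math.log2(p) for p in probs if p > 0)


def information_gain(subsets):
    """`subsets` holds one (positives, negatives) pair per attribute value."""
    p = sum(pi for pi, _ in subsets)
    n = sum(ni for _, ni in subsets)
    total = p + n
    remainder = sum(
        (pi + ni) / total * entropy(pi / (pi + ni), ni / (pi + ni))
        for pi, ni in subsets
    )
    return entropy(p / total, n / total) - remainder


# Assumed split counts (positives, negatives) for a training set with p = n = 6.
PATRONS = [(0, 2), (4, 0), (2, 4)]       # None, Some, Full
TYPE = [(1, 1), (1, 1), (2, 2), (2, 2)]  # four restaurant types
ig_patrons = information_gain(PATRONS)
ig_type = information_gain(TYPE)
```

Under these counts Patrons yields a gain of roughly 0.54 bits, while every Type subset is an even split and so gains nothing.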

  39. Bayes’ Rule P(B|A) = P(A|B) * P(B) / P(A)

  40. Computing the denominator: #1 approach - compute relative likelihoods: • If M (meningitis) and W (whiplash) are two possible explanations: P(M|S) / P(W|S) = P(S|M) * P(M) / (P(S|W) * P(W)) #2 approach - using M & ~M: • Checking the probability of M, ~M when S • P(M|S) = P(S|M) * P(M) / P(S) • P(~M|S) = P(S|~M) * P(~M) / P(S) • P(M|S) + P(~M|S) = 1 (must sum to 1) • Hence P(S) = P(S|M) * P(M) + P(S|~M) * P(~M)
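The second approach amounts to normalisation: compute the unnormalised products for M and ~M, and divide by their sum, which is P(S). The sketch below illustrates this; the probability values are hypothetical and not taken from the lecture.

```python
def posterior(likelihoods, priors):
    """Return P(h | S) for every hypothesis h via normalisation."""
    unnormalized = {h: likelihoods[h] * priors[h] for h in priors}
    evidence = sum(unnormalized.values())  # P(S), the denominator of Bayes' rule
    return {h: value / evidence for h, value in unnormalized.items()}


# Hypothetical numbers: P(S|M), P(S|~M) and the prior P(M).
LIKELIHOODS = {"M": 0.5, "~M": 0.05}
PRIORS = {"M": 0.00002, "~M": 0.99998}
post = posterior(LIKELIHOODS, PRIORS)
```

The same function also covers the first approach: the ratio `post["M"] / post["~M"]` does not depend on P(S) at all.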

  41. Perceptrons • Linear separability • A set of (2D) patterns (x1, x2) of two classes is linearly separable if there exists a line on the (x1, x2) plane • w0 + w1x1 + w2x2 = 0 • that separates all patterns of one class from the other class • A perceptron can be built with • 3 inputs x0 = 1, x1, x2 with weights w0, w1, w2 • n-dimensional patterns (x1,…, xn) • Hyperplane w0 + w1x1 + w2x2 +…+ wnxn = 0 dividing the space into two regions
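A perceptron with fixed weights simply reports on which side of the hyperplane a pattern lies. In the sketch below the weights for logical AND are a hand-picked assumption (any line separating (1,1) from the other three points would do), and the step threshold at 0 is also an assumed convention.

```python
def perceptron(weights, pattern):
    """Return 1 if w0 + w1*x1 + ... + wn*xn > 0, else 0 (the input x0 = 1 is implicit)."""
    total = weights[0] + sum(w * x for w, x in zip(weights[1:], pattern))
    return 1 if total > 0 else 0


# Hand-picked weights (w0, w1, w2): the line -1.5 + x1 + x2 = 0 separates AND.
AND_WEIGHTS = (-1.5, 1.0, 1.0)
PATTERNS = [(0, 0), (0, 1), (1, 0), (1, 1)]
and_outputs = [perceptron(AND_WEIGHTS, p) for p in PATTERNS]
```

XOR, by contrast, is not linearly separable, which is why it needs a network with a hidden layer.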

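  42. Backpropagation example [Figure: network with inputs x1 and x2, hidden nodes x3 and x4, and output node x5, connected by weights w13, w14, w23, w24, w35, w45] • Sigmoid as activation function with x=3: • $g(in) = 1 / (1 + e^{-3 \cdot in})$ • $g'(in) = 3\,g(in)\,(1 - g(in))$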

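  43. Adding the threshold [Figure: the same network extended with constant inputs x0 = 1 and x6 = 1; weights w03 and w04 connect x0 to the hidden nodes x3 and x4, and w65 connects x6 to the output node x5]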

  44. Training Set • Logical XOR (exclusive OR) function, (x1, x2) → output: (0, 0) → 0, (0, 1) → 1, (1, 0) → 1, (1, 1) → 0 • Choose random weights • (w03, w04, w13, w14, w23, w24, w65, w35, w45) = (0.03, 0.04, 0.13, 0.14, -0.23, -0.24, 0.65, 0.35, 0.45) • Learning rate: 0.1 for the hidden layer, 0.3 for the output layer

  45. First Example • Compute the outputs • a0 = 1 , a1= 0 , a2 = 0 • a3 = g(1*0.03 + 0*0.13 + 0*-0.23) = 0.522 • a4 = g(1*0.04 + 0*0.14 + 0*-0.24) = 0.530 • a6 = 1, a5 = g(0.65*1 + 0.35*0.522 + 0.45*0.530) = 0.961 • Calculate ∆5 = 3*g(1.0712)*(1-g(1.0712))*(0-0.961) = -0.108 • Calculate ∆6, ∆3, ∆4 • ∆6 = 3*g(1)*(1-g(1))*(0.65*-0.108) = -0.010 • ∆3 = 3*g(0.03)*(1-g(0.03))*(0.35*-0.108) = -0.028 • ∆4 = 3*g(0.04)*(1-g(0.04))*(0.45*-0.108) = -0.036 • Update weights for the output layer • w65 = 0.65 + 0.3*1*-0.108 = 0.618 • w35 = 0.35 + 0.3*0.522*-0.108 = 0.333 • w45 = 0.45 + 0.3*0.530*-0.108 = 0.433

  46. First Example (cont) • Calculate ∆0, ∆1, ∆2 • ∆0 = 3*g(1)*(1-g(1))*(0.03*-0.028 + 0.04*-0.036) = -0.0003 • ∆1 = 3*g(0)*(1-g(0))*(0.13*-0.028 + 0.14*-0.036) = -0.007 • ∆2 = 3*g(0)*(1-g(0))*(-0.23*-0.028 + -0.24*-0.036) = 0.011 • Update weights for the hidden layer • w03 = 0.03 + 0.1*1*-0.028 = 0.027 • w04 = 0.04 + 0.1*1*-0.036 = 0.036 • w13 = 0.13 + 0.1*0*-0.028 = 0.13 • w14 = 0.14 + 0.1*0*-0.036 = 0.14 • w23 = -0.23 + 0.1*0*-0.028 = -0.23 • w24 = -0.24 + 0.1*0*-0.036 = -0.24
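The first training example can be replayed in code as a check on the arithmetic. This sketch follows the slides' network (bias inputs a0 = a6 = 1, sigmoid with steepness 3, learning rates 0.1 and 0.3); the dictionary keys and the `train_step` function are naming assumptions. It keeps full floating-point precision, so the third decimal can differ slightly from the rounded values on the slides.

```python
import math


def g(x):
    """Sigmoid with steepness 3, as defined on the slides."""
    return 1.0 / (1.0 + math.exp(-3.0 * x))


def g_prime(x):
    return 3.0 * g(x) * (1.0 - g(x))


def train_step(w, x1, x2, target, lr_hidden=0.1, lr_output=0.3):
    """One forward and backward pass; returns the output a5 and the updated weights."""
    a0 = a6 = 1.0                                    # constant threshold inputs
    in3 = w["03"] * a0 + w["13"] * x1 + w["23"] * x2
    in4 = w["04"] * a0 + w["14"] * x1 + w["24"] * x2
    a3, a4 = g(in3), g(in4)
    in5 = w["65"] * a6 + w["35"] * a3 + w["45"] * a4
    a5 = g(in5)
    d5 = g_prime(in5) * (target - a5)                # output-layer delta
    d3 = g_prime(in3) * w["35"] * d5                 # hidden deltas use the old weights
    d4 = g_prime(in4) * w["45"] * d5
    new = dict(w)
    for key, a in (("65", a6), ("35", a3), ("45", a4)):
        new[key] += lr_output * a * d5
    for key, a, d in (("03", a0, d3), ("13", x1, d3), ("23", x2, d3),
                      ("04", a0, d4), ("14", x1, d4), ("24", x2, d4)):
        new[key] += lr_hidden * a * d
    return a5, new


W0 = {"03": 0.03, "04": 0.04, "13": 0.13, "14": 0.14, "23": -0.23,
      "24": -0.24, "65": 0.65, "35": 0.35, "45": 0.45}
out1, W1 = train_step(W0, 0, 0, target=0)            # first example: (0, 0) -> 0
out2, W2 = train_step(W1, 0, 1, target=1)            # second example: (0, 1) -> 1
```

Calling `train_step` repeatedly over the four XOR patterns is the natural way to continue the exercise beyond the two examples worked by hand.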

  47. Second Example • Compute the outputs • a0 = 1, a1= 0 , a2 = 1 • a3 = g(1*0.027 + 0*0.13 + 1*-0.23) = 0.352 • a4 = g(1*0.036 + 0*0.14 + 1*-0.24) = 0.352 • a6 = 1, a5 = g(0.618*1 + 0.333*0.352 + 0.433*0.352) = 0.935 • Calculate ∆5 = 3*g(0.888)*(1-g(0.888))*(1-0.935) = 0.012 • Calculate ∆6, ∆3, ∆4 • ∆6 = 3*g(1)*(1-g(1))*(0.618*0.012) = 0.001 • ∆3 = 3*g(-0.203)*(1-g(-0.203))*(0.333*0.012) = 0.003 • ∆4 = 3*g(-0.204)*(1-g(-0.204))*(0.433*0.012) = 0.004 • Update weights for the output layer • w65 = 0.618 + 0.3*1*0.012 = 0.622 • w35 = 0.333 + 0.3*0.352*0.012 = 0.334 • w45 = 0.433 + 0.3*0.352*0.012 = 0.434

  48. Second Example (cont) • Calculate ∆0, ∆1, ∆2 • Skipped, we do not use them • Update weights for the hidden layer • w03 = 0.027 + 0.1*1*0.003 = 0.027 • w04 = 0.036 + 0.1*1*0.004 = 0.036 • w13 = 0.13 + 0.1*0*0.003 = 0.13 • w14 = 0.14 + 0.1*0*0.004 = 0.14 • w23 = -0.23 + 0.1*1*0.003 = -0.23 • w24 = -0.24 + 0.1*1*0.004 = -0.24

  49. Bayesian networks • Syntax: • a set of nodes, one per variable • a directed, acyclic graph (link ≈ "directly influences") • a conditional distribution for each node given its parents: P(Xi | Parents(Xi)), stored as a conditional probability table (CPT)

  50. P(x1x2…xn) = Pi=1,…,nP(xi|parents(Xi))  full joint distribution table Calculation of Joint Probability • Given its parents, each node is conditionally independent of everything except its descendants • Thus, • Every BN over a domain implicitly represents some joint distribution over that domain Ram Meshulam 2004
