Notes for CS3310 Artificial Intelligence, Part 10: Learning and conclusions. Prof. Neil C. Rowe, Naval Postgraduate School. Version of January 2006.
Improving artificial-intelligence programs The ability to learn is essential to intelligence. Learning by computers (or “machine learning”) is a big area of research incorporating many different methods. Two approaches are used: • Supervised learning, where the computer is given “training examples” and told what to conclude (or not to conclude) about them. It then extrapolates or generalizes from the examples to handle new examples. It may require work to create the training examples. • Unsupervised learning, where the computer is just given some data and asked to discover interesting things in it.
Supervised learning method 1: caching A simple learning method is to cache situations and their associated conclusions (as in case-based reasoning). Then when you see a new case with most of the same values, assume the other values are the same too. Advantage: Versatile. Disadvantages: Has big storage needs; has difficult indexing problems; has the uncomfortably subjective problem of measuring similarity; doesn’t generalize from experience.
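A minimal Prolog sketch of this method; the predicates case/2, similarity/3, and best_match/2, and the cached cases themselves, are invented for illustration, not taken from the course software:

% Cached training cases: case(Conclusion, FeatureList).
case(apple, [red, spherical, shiny]).
case(ball,  [blue, spherical, shiny]).
case(lemon, [yellow, ovoid, fragrant]).

% similarity(+Features1, +Features2, -N): N is the number of shared features.
similarity(F1, F2, N) :-
    intersection(F1, F2, Common),
    length(Common, N).

% best_match(+NewFeatures, -Conclusion): assume the conclusion of the
% cached case that shares the most features with the new case.
best_match(New, Conclusion) :-
    findall(N-C, (case(C, Old), similarity(New, Old, N)), Pairs),
    msort(Pairs, Sorted),          % ascending by similarity count
    last(Sorted, _-Conclusion).

Here best_match([green, spherical, shiny], C) picks whichever cached case shares the most features; the choice of similarity/3 is exactly the subjective similarity measurement the disadvantage above refers to.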
Supervised learning method 2: decision tree extension Given a decision tree and a new known case, follow the path in the decision tree for the new case, and modify the leaf if the conclusion is incorrect. Advantage: Simple to use. Disadvantages: Can’t handle variables; can produce unbalanced trees. Suppose we are told that when a, c, and d are false, we should conclude s. Extend the rightmost node to ask about d; if false, conclude s, else stop. [Figure: a decision tree whose root asks a?; its branches lead to a c? node and a b? node, with leaves “stop” and “conclude r”; the rightmost leaf is extended with a d? node whose “no” branch concludes s and whose “yes” branch stops.]
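To make the representation concrete, here is a minimal Prolog sketch assuming nested terms node(Test, YesBranch, NoBranch) and leaf(Outcome); the branch layout is approximated from the figure, and the names tree/1 and classify/3 are invented for illustration:

% The extended tree as a nested term (layout approximated from the figure):
% the rightmost path, a=no and b=no, now asks d.
tree(node(a, node(c, leaf(stop), leaf(conclude(r))),
          node(b, leaf(conclude(r)),
               node(d, leaf(stop), leaf(conclude(s)))))).

% classify(+TrueFacts, +Tree, -Outcome): follow yes-branches for facts
% present in the case and no-branches otherwise.
classify(_, leaf(Outcome), Outcome).
classify(Facts, node(Test, Yes, _), Outcome) :-
    member(Test, Facts), !,
    classify(Facts, Yes, Outcome).
classify(Facts, node(_, _, No), Outcome) :-
    classify(Facts, No, Outcome).

The query tree(T), classify([], T, O), for a case where a, b, c, and d are all false, follows the rightmost path and gives O = conclude(s), as the slide requires.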
Example for building a decision graph Change the graph as little as possible so as to accommodate the given cases in succession. Facts not mentioned should be assumed false. Case 1: Given b and c, conclude a1. Case 2: Given b and d, conclude a2. Case 3: Given b and c and e, conclude a1. Case 4: Given b and e, conclude a1. Case 5: Given b and c and d and f, conclude a1 and a2.
Supervised learning method 3: Discrete concept learning The computer is given a sequence of examples and nonexamples of some concept (“cases”), in an order suggested by a teacher. Create a starting rule whose “if” part is the conjunction of the conditions in the first example. Then modify the rule as necessary to handle each new case correctly, preferring the minimum change to the rules that can still explain the new case. Typical changes: add a term, add a negated term, delete a term, or generalize a term. Advantage: Generalizes well from data. Disadvantages: Cannot handle errors or uncertainty, and the order of the sequence can matter.
Example of learning of rules Change the rules as little as possible so as to accommodate the given cases in succession. Facts not mentioned should be assumed false. Case 1: Given b and c, conclude a1. Case 2: Given b and d, conclude a2. Case 3: Given b and c and e, conclude a1. Case 4: Given b and e, conclude a1. Case 5: Given b and c and d and f, conclude a1 and a2. Case 6: Given b and d and h, conclude nothing. Case 7: Given e and f and g, conclude a2. Case 8: Given f and g, conclude a2. Case 9: Given f, conclude nothing.
An algorithm for learning a set of boolean rules • Given a set of “cases” (examples and nonexamples) of a concept c, each described by a list of facts true for it. • Create a rule in which c is implied by the conjunction of all the facts true for the first example. • For each subsequent case taken in order: • If it’s an example of c, then if no rule concludes c with these facts, pick the most-similar rule and remove terms, generalize terms, or widen the range of numeric terms, until it does. If there is no sufficiently similar rule, create a new rule for c with the conjunction of the facts in this example. • If it’s a nonexample of c, then for every rule that concludes c for this case and every conjunct p in that rule, add a conjunct of not(p), and combine similar terms if you can. This results in a set of rules with only conjuncts and negations; disjuncts are accomplished by multiple rules.
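As a rough sketch of the covering test at the heart of the third step, the following Prolog represents rules as rule(Conclusion, ConjunctList), where conjuncts may be negated with not/1; the predicate names rule/2, holds/2, and covers/2 are invented for illustration:

% Learned rules: rule(Conclusion, ConjunctList).
rule(a1, [b, c]).
rule(a1, [b, e, not(d)]).

% holds(+Conjunct, +Facts): a conjunct holds for a case; facts not
% mentioned are assumed false (the closed-world assumption).
holds(not(P), Facts) :- !, \+ member(P, Facts).
holds(P, Facts) :- member(P, Facts).

% covers(+Conclusion, +Facts): some rule concludes it for this case.
covers(Conclusion, Facts) :-
    rule(Conclusion, Conjuncts),
    forall(member(T, Conjuncts), holds(T, Facts)).

A new example whose facts fail covers/2 would trigger the generalization step of the algorithm; a nonexample that satisfies it would trigger specialization.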
Example concept-learning situations • Given rule: “p is true if q and r are true”. If we see a case of p where q and s are true but r is false, we should change the rule to “p is true if q” if r and s are unrelated (or “p is true if q and (r or s)” otherwise). • Given rule: “p is true if q and r are true”. If we see a case of not(p) where q and r are true but s is false, we should change the rule to “p is true if q and r and not(s)”. • Given two rules: “p is true if q and r are true” and “p is true if q and s are true”. If we see a case of not(p) where s is true but q and r are false, we should modify the second rule to “p is true if s is true and r is false” since the second rule is more similar to the case in its conditions. (Check only positive conditions.)
Input for a Java multiple-concept learning program:
e(apple,[red,spherical,shiny,fragrant]).
e(apple,[red,spherical,shiny]).
e(apple,[green,spherical,shiny]).
e(apple,[red,spherical,heavy]).
e(apple,[green,spherical,fragrant]).
e(apple,[yellow,ovoid,fragrant]).
e(ball,[blue,spherical,shiny]).
e(ball,[green,spherical,heavy]).
e(ball,[black,spherical]).
e(ball,[green,spherical,shiny]).
e(lemon,[yellow,spherical,fragrant]).
e(lemon,[yellow,ovoid,heavy,fragrant]).
e(lemon,[yellow,ovoid,fragrant]).
e(light,[red,spherical,glowing]).
e(light,[white,spherical,glowing,heavy]).
e(light,[white,cubic,glowing]).
e(box,[red,cubic,heavy,shiny,fragrant]).
e(box,[white,cubic]).
e(box,[blue,paralleliped]).
e(box,[yellow,paralleliped,heavy]).
Successive theories found for that input:
Conclude apple if red, spherical, shiny, fragrant, ~green, ~heavy, ~yellow, ~ovoid, ~blue, ~black, ~glowing, ~white, ~cubic, and ~paralleliped.
Conclude apple if red, spherical, shiny, ~green, ~heavy, ~yellow, ~ovoid, ~blue, ~black, ~glowing, ~white, ~cubic, and ~paralleliped.
Conclude apple if spherical, shiny, ~heavy, ~yellow, ~ovoid, ~blue, ~black, ~glowing, ~white, ~cubic, and ~paralleliped.
Final theory set after all 20 examples:
(1) apple: spherical ~yellow ~ovoid ~blue ~black ~glowing ~white ~cubic ~paralleliped
(2) apple: yellow ovoid fragrant ~red ~spherical ~shiny ~green ~heavy ~blue ~black ~glowing ~white ~cubic ~paralleliped
(3) ball: spherical ~red ~fragrant ~yellow ~ovoid ~white ~cubic ~glowing ~paralleliped
(4) lemon: yellow fragrant ~red ~shiny ~green ~blue ~black ~glowing ~white ~paralleliped
(5) light: glowing ~shiny ~fragrant ~green ~yellow ~ovoid ~blue ~black ~paralleliped
(6) box: red cubic heavy shiny fragrant ~spherical ~green ~yellow ~ovoid ~blue ~black ~glowing ~white ~paralleliped
(7) box: white cubic ~red ~spherical ~shiny ~fragrant ~green ~heavy ~blue ~black ~glowing ~paralleliped
(8) box: paralleliped ~red ~spherical ~shiny ~fragrant ~green ~ovoid ~black ~glowing ~white ~cubic
Supervised learning method 4: Analogy Find a previous case with similarities to a new situation, and define a logical mapping onto the new case. Advantage: An intelligent kind of learning, so it can learn things the other methods can't. Disadvantage: Slow and hard to implement. Example: In firefighting, "desmoke" is like "dewater" except with smoke instead of water, so infer specifications for it: Dewater requires water present and no fire; desmoke requires smoke present and no fire. Dewater removes water entirely; desmoke removes smoke entirely. Dewater requires a water pump and a drain; desmoke requires a fan and a vent. Dewater duration depends on the amount of water present and the plumbing skill of the agent dewatering; desmoke duration depends on the amount of smoke present and the electrical skill of the agent desmoking.
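A minimal sketch of applying such a mapping mechanically over Prolog terms; the maps/2 pairs and the predicate name analogize/2 are invented for illustration:

% The term-to-term mapping for the dewater/desmoke example (pairs assumed):
maps(water, smoke).
maps(water_pump, fan).
maps(drain, vent).
maps(plumbing, electrical).
maps(dewater, desmoke).

% analogize(+Term, -NewTerm): rewrite a specification by applying the
% mapping everywhere, leaving unmapped terms unchanged.
analogize(Term, New) :-
    maps(Term, New), !.
analogize(Term, New) :-
    compound(Term), !,
    Term =.. [F|Args],
    ( maps(F, G) -> true ; G = F ),
    maplist(analogize, Args, NewArgs),
    New =.. [G|NewArgs].
analogize(Term, Term).

For example, analogize(requires(dewater, [present(water), not(fire)]), X) gives X = requires(desmoke, [present(smoke), not(fire)]).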
Supervised learning method 5: Keep statistics for Bayesian conditional probabilities For a training set, keep counts on how often a clue was associated with a given conclusion. Use these to construct a Naïve Bayes multiplication of factors. Advantage: Easy to do. Disadvantages: Requires a big training set, and doesn’t generalize from experience.
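As a sketch, suppose count/3 and total/2 hold counts taken from the earlier fruit data (3 of the 6 apples are fragrant and 5 are spherical; all 4 balls are spherical and none fragrant); the predicates p/3, product/2, and score/3 are invented for illustration:

% Counts from the training set: count(Clue, Conclusion, N), total(Conclusion, N).
count(fragrant, apple, 3).
count(spherical, apple, 5).
total(apple, 6).
count(fragrant, ball, 0).
count(spherical, ball, 4).
total(ball, 4).

% p(+Clue, +Class, -P): estimated conditional probability of the clue
% given the conclusion.
p(Clue, Class, P) :-
    count(Clue, Class, N),
    total(Class, T),
    P is N / T.

product([], 1.0).
product([X|Xs], Prod) :- product(Xs, Rest), Prod is X * Rest.

% score(+Clues, +Class, -S): the Naive Bayes multiplication of factors
% (the class prior is omitted for brevity).
score(Clues, Class, S) :-
    findall(P, (member(Clue, Clues), p(Clue, Class, P)), Ps),
    product(Ps, S).

Here score([spherical, fragrant], apple, S) gives S = (5/6)*(3/6), about 0.417, while the same clues give 0.0 for ball.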
Some word clues for online captions and their measured statistics in a training set
Supervised learning method 6: Adjust weights in neural networks The backpropagation algorithm is the most common way to do this. It’s a form of nonlinear optimization and can exploit methods from operations research. Advantages: Can learn in noisy environments without experts; can handle unexpected situations; little storage required. Disadvantages: May not find the best solution, and doesn’t generalize much from experience.
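Backpropagation chains many small gradient steps through the network's layers; the following Prolog sketches a single such step for one linear unit (the delta rule), with all predicate names invented for illustration:

% weighted_sum(+Weights, +Inputs, -Sum): the unit's output.
weighted_sum([], [], 0.0).
weighted_sum([W|Ws], [X|Xs], S) :-
    weighted_sum(Ws, Xs, S0),
    S is S0 + W * X.

% adjust(+Ws, +Xs, +Delta, -NewWs): move each weight along its input.
adjust([], [], _, []).
adjust([W|Ws], [X|Xs], Delta, [W2|Ws2]) :-
    W2 is W + Delta * X,
    adjust(Ws, Xs, Delta, Ws2).

% One gradient step toward a target output, at learning rate Rate.
update_weights(Ws, Xs, Target, Rate, NewWs) :-
    weighted_sum(Ws, Xs, Out),
    Delta is Rate * (Target - Out),
    adjust(Ws, Xs, Delta, NewWs).

For example, update_weights([0.5, -0.2], [1.0, 1.0], 1.0, 0.1, Ws) computes output 0.3, error 0.7, and new weights Ws = [0.57, -0.13].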
Supervised learning method 7: Support vector machines • These are very popular today. They apply to perceptrons, where output is a linear weighted sum of the inputs. • Support vector “machines” are programs that compute the optimal formulas (actually, surfaces in hyperspace) to choose the best classification for a case. • They use methods of mathematical optimization. • Advantages: They can handle large amounts of data well. • Disadvantages: Perceptrons cannot model complex situations.
Support vector machines try to draw a boundary between two populations of cases. [Figure: a scatter of squares and circles separated by a boundary line.]
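In perceptron terms, the boundary in the figure is the set of points where the weighted sum crosses zero. A minimal sketch, with weighted_sum/3 and classify/3 invented for illustration and the last input fixed at 1 so its weight acts as a threshold term:

% weighted_sum(+Weights, +Inputs, -Sum): linear weighted sum of the inputs.
weighted_sum([], [], 0.0).
weighted_sum([W|Ws], [X|Xs], S) :-
    weighted_sum(Ws, Xs, S0),
    S is S0 + W * X.

% classify(+Weights, +Inputs, -Class): which side of the boundary the case is on.
classify(Ws, Xs, Class) :-
    weighted_sum(Ws, Xs, S),
    ( S >= 0.0 -> Class = square ; Class = circle ).

With weights [1.0, 1.0, -1.5] the boundary is the line x + y = 1.5, so classify([1.0, 1.0, -1.5], [2.0, 1.0, 1.0], C) gives C = square. A support vector machine chooses, among all such separating boundaries, one maximizing the margin to the nearest cases (the “support vectors”).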
Supervised learning method 8: Set covering • First try to find conjunctive rules that have high “precision”, i.e., few false positives. • Then find the set of such rules for the same conclusion that has the highest mean of precision and recall (coverage). It’s common in data mining to use the “F-score”, the harmonic mean of precision and recall. • Precision = # true positives / (# true positives + # false positives) • F-score = # true positives / (# true positives + 0.5*(# false positives + # false negatives)) • It’s useful to set a minimum precision and a minimum F-score. Higher minimums give faster results but more errors.
(Earlier) test data set: the same e(...) facts shown above for the apples, balls, lemons, lights, and boxes.
Example conjunctive rules found for “box” • box if heavy: precision 0.333 • box if blue: precision 0.500 • box if white: precision 0.333 • box if cubic: precision 0.667 • box if paralleliped: precision 1.000 • box if not spherical: precision 0.500 • box if heavy and not spherical: precision 0.667 • box if cubic and heavy: precision 1.000
Example rule sets for “box” and their F-scores • box if cubic and heavy: F=0.400 • box if paralleliped: F=0.667 • box if heavy and not spherical: F=0.571 • box if (heavy and not spherical) or paralleliped: F=0.750 • box if cubic or (heavy and not spherical): F=0.667
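The two metrics are simple enough to state directly as arithmetic; precision/3 and f_score/4 below are invented names wrapping the formulas given earlier:

% precision(+TP, +FP, -P): fraction of the rule's matches that are correct.
precision(TP, FP, P) :-
    P is TP / (TP + FP).

% f_score(+TP, +FP, +FN, -F): the F-score formula from the slide above.
f_score(TP, FP, FN, F) :-
    F is TP / (TP + 0.5 * (FP + FN)).

For “box if cubic and heavy”, the rule matches one box and no non-boxes in the test data while missing the other three boxes, so precision(1, 0, P) gives P = 1.0 and f_score(1, 0, 3, F) gives F = 0.4, matching the numbers above.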
Unsupervised learning method 1: Anomaly discovery Try to find something surprising in your data. "Surprise" means something that deviates from a "model". The surprise is usually statistical, like unexpectedly large counts for some pattern of conditions; for instance, unexpectedly high occurrences of a disease near nuclear power plants among patients who took aspirin. Advantage: There’s lots of data around. Disadvantage: Rarely discovers anything.
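A minimal sketch of a statistical-surprise test; the predicates observed/2 and expected/2, their counts, and the threshold factor are all invented for illustration:

% Counts for a pattern of conditions, observed vs. predicted by a model.
observed(disease_near_plant_and_aspirin, 40).
expected(disease_near_plant_and_aspirin, 8).

% A pattern is surprising when its observed count is far above what the
% model expects (threshold factor assumed for illustration).
surprising(Pattern) :-
    observed(Pattern, O),
    expected(Pattern, E),
    O > 3 * E.

A real anomaly detector would compute expected/2 from a statistical model of the data rather than asserting it as a fact.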
Unsupervised learning method 2: Genetic algorithms Genetic algorithms search heuristically for new ideas. They try mutating and crossing existing ideas to create new ideas, and they need a metric of “goodness” to measure success. Famous early example: the AM (Automated Mathematician) program. Advantages: Computers are getting faster all the time, and it might be worth turning them loose to see what they discover. Disadvantages: Very slow, and rarely discovers anything.
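A sketch of the two operators on ideas represented as equal-length bit lists; the representation and the names crossover/3 and mutate/2 are invented for illustration:

% crossover(+Parent1, +Parent2, -Child): splice the front half of one
% parent onto the back half of the other.
crossover(P1, P2, Child) :-
    length(P1, L),
    Cut is L // 2,
    length(Front, Cut),
    append(Front, _, P1),
    length(Skip, Cut),
    append(Skip, Back, P2),
    append(Front, Back, Child).

% mutate(+Bits, -Mutant): flip one randomly chosen bit.
mutate(Bits, Mutant) :-
    length(Bits, L),
    random_between(1, L, I),
    nth1(I, Bits, B, Rest),
    Flip is 1 - B,
    nth1(I, Mutant, Flip, Rest).

For example, crossover([1,1,1,1], [0,0,0,0], C) gives C = [1,1,0,0]; the “goodness” metric then decides which offspring survive into the next generation.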
Unsupervised learning method 3: Clustering Automatically infer classes ("clusters") by grouping similar items together. This improves the value of case-based reasoning. One method (“K-Means”): Pick initial cluster centers at random. Assign each case to the nearest cluster center. Recalculate cluster centers using the assigned cases. Iterate until no changes. Advantages: Easy to do. Disadvantages: You need a good “distance” calculation, and clusters may not represent anything important in the real world. (Self-organizing maps are a good way to visualize clusters.)
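A sketch of the assignment step on two-dimensional points written as X-Y pairs; distance/3 and nearest_center/3 are invented names, and the choice of distance/3 is exactly the “distance” calculation the disadvantage mentions:

% distance(+Point1, +Point2, -D): Euclidean distance between X-Y pairs.
distance(X1-Y1, X2-Y2, D) :-
    D is sqrt((X1 - X2)^2 + (Y1 - Y2)^2).

% nearest_center(+Point, +Centers, -Best): the K-Means assignment step;
% recomputing centers and iterating are not shown.
nearest_center(Point, Centers, Best) :-
    findall(D-C, (member(C, Centers), distance(Point, C, D)), Ds),
    msort(Ds, [_-Best|_]).

For example, nearest_center(1.0-1.0, [0.0-0.0, 5.0-5.0], C) gives C = 0.0-0.0; the quality of the resulting clusters depends directly on this distance predicate.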
Currently important applications of AI • Intelligent game and simulation players • Intelligent "agents" for Web information-finding • Automated "help desks” • Smarter user interfaces for software • Software specification and analysis tools • Self-diagnosing machines like aircraft • Intelligent robots that plan and analyze the world • Sensor networks for monitoring the world • Nanotechnology • Speech recognition • Language translation • Tutors for complex processes • Scheduling and allocation problems
The form of future AI applications 1. AI module embedded in a big program (C++ or Java): --enemy behavior in battlefield simulators --command information centers that combine threat data --path following for a robot reconnaissance vehicle 2. Stand-alone AI systems on a PC. Examples: --automatic scheduling for 10 people using A* search --a tutor for firefighting using means-ends analysis --an intelligent inventory manager --consistency checker for software engineering 3. Big AI system that calls C++ or Java modules: --speech recognition for aircraft cockpits --identification of objects in aerial photographs --automatic scheduling for 100 people using search
Final thoughts • No fundamental obstacles to intelligent behavior by computers are apparent (though practical ones remain). • But many details need to be worked out; maybe 50 years of work is needed. • Speed increases in computers are making possible many AI methods that were previously impractical. • Not every aspect of human activity should be or needs to be modeled, however (e.g., emotions). • Humans still should be “in the loop” on important decisions: total automation is not usually desirable. • Be cautious with techniques that are hard to evaluate, like neural networks, fuzzy sets, and genetic algorithms.
Review question #1 You must schedule two 50-minute meetings, A and B. The times 800, 900, 1000, and 1100 are available. A must be scheduled first, and A's meeting time must be before B's. The only operator is schedule(X,T), where X is a meeting name and T is a time. Pretend there is no goal state. (a) Draw the complete search graph. (b) Give the order in which the states will be visited by depth-first search with the heuristic "Prefer the state whose last-scheduled meeting is later.” (c) Give the order in which the states will be visited by breadth-first search with that heuristic. Assume a queue is used and the heuristic controls the order in which successors are added to the queue.
Review question #2 Suppose there is one operator schedule(C,T,S), where C is the name of the class, T is the time of day, and S is a list of students signed up for it. Assume classes meet every day at the same time, and there is only one section of each class. Assume there are facts that say what classes each student wants to take. (a) Give preconditions of this operator. (b) Describe a data structure adequate to describe states for this search problem. (c) Is bidirectional search good for this problem?
Review question #3, on successor functions What successors do these specifications find for the state [l(t1, d1), l(t2, s), o(d1, p33), o(d3, p77), c(t1, p33)]? (Here l=location, o=ordered, c=carrying, d=delivered, t1=truck1, t2=truck2, d1=depot1, d3=depot3, s=supply, p33=pallet33, and p77=pallet77). Assume the rules are tried in the order given with usual Prolog backtracking among choices. -- A truck can go either to Supply or to some place to which something it is carrying is ordered. -- If a truck is at Supply, load L is ordered, and load L is not already carried, the truck can carry load L. -- If a truck is carrying load L, and is where L is ordered, it can unload L there; this erases the order and the fact of carrying it, and adds a delivery fact.
Review question #4, on means-ends analysis Given these operators for an elevator: • call(F): push the button on floor F outside the elevator to make it come to you; recommended for [destination(elevator,F)]. • forward(D): move forward through the elevator entrance (D=in or D=out); recommended for [elevator_status(you,D)]. • pushfloor(F): push the button in the elevator for floor F; recommended for [floor(you,F)]. • wait: wait until the elevator doors open; recommended for [open(doors)]. Define preconditions and postconditions so that means-ends analysis will work. Use only these facts: • floor(X,F): X is at floor F, where X=you or X=elevator • elevator_status(you,S): S=in or S=out for you and the elevator • destination(elevator,F):the elevator is trying to go to floor F • open(doors): the elevator doors are open
Review question #5: Use concept learning to learn a rule with “and”s and “or”s • Example of terrorist: Middle Eastern, carrying a gun, nervous, heavy • Example of terrorist: European, carrying a gun, nervous, heavy • Nonexample of terrorist: Middle Eastern, not carrying a gun, nervous, heavy • Nonexample of terrorist: European, carrying a gun, nervous, heavy, police • Example of terrorist: American, carrying a gun, nervous, light • Example of terrorist: Middle Eastern, not carrying a gun, carrying a bomb, nervous, heavy