1. Notes for CS3310 Artificial Intelligence, Part 10: Learning and conclusions. Prof. Neil C. Rowe
Naval Postgraduate School
Version of January 2006
2. Improving artificial-intelligence programs The ability to learn is essential to intelligence.
Learning by computers (or “machine learning”) is a big area of research incorporating many different methods.
Two approaches are used:
Supervised learning, where the computer is given “training examples” and told what to conclude (or not to conclude) about them. It then extrapolates or generalizes from the examples to handle new examples. It may require work to create the training examples.
Unsupervised learning, where the computer is just given some data and asked to discover interesting things in it.
3. Supervised learning method 1: caching A simple learning method is to cache situations and their associated conclusions (as in case-based reasoning).
Then when you see a new case with most of the same values, assume the other values are the same too.
Advantage: Versatile.
Disadvantages: Has big storage needs; has difficult indexing problems; has the uncomfortably subjective problem of measuring similarity; doesn’t generalize from experience.
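A minimal Java sketch of the caching idea, assuming cases are represented as sets of feature words and similarity is just the count of shared features (the class and method names are illustrative, not from any actual program):

import java.util.*;

public class CaseCache {
    // Each stored case maps a set of observed features to a conclusion.
    private final Map<Set<String>, String> cases = new HashMap<>();

    public void store(Set<String> features, String conclusion) {
        cases.put(features, conclusion);
    }

    // Retrieve the conclusion of the most similar stored case,
    // measuring similarity as the number of shared features.
    public String mostSimilarConclusion(Set<String> newCase) {
        String best = null;
        int bestOverlap = -1;
        for (Map.Entry<Set<String>, String> e : cases.entrySet()) {
            Set<String> shared = new HashSet<>(e.getKey());
            shared.retainAll(newCase);
            if (shared.size() > bestOverlap) {
                bestOverlap = shared.size();
                best = e.getValue();
            }
        }
        return best;
    }

    public static void main(String[] args) {
        CaseCache cache = new CaseCache();
        cache.store(Set.of("red", "spherical", "shiny"), "apple");
        cache.store(Set.of("blue", "spherical", "shiny"), "ball");
        System.out.println(cache.mostSimilarConclusion(Set.of("red", "spherical", "heavy")));
    }
}

The linear scan over stored cases illustrates the indexing problem above: with many cases, retrieval needs a smarter index.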
4. Supervised learning method 2: decision tree extension Given a decision tree and a new known case, follow path in the decision tree for the new case and modify the leaf if the conclusion is incorrect.
Advantage: Simple to use.
Disadvantages: Can’t handle variables; can produce unbalanced trees.
Suppose we are told that when a, c, and d are false, we should conclude s. Extend the rightmost node to ask about d; if false, conclude s, else stop.
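A Java sketch of that leaf-extension step, assuming a binary tree over boolean attributes (all names here are illustrative). In the example above, newTest would be d and knownAnswer would be s:

import java.util.Map;

class Node {
    String attribute;            // question asked at an internal node
    String conclusion;           // non-null only at a leaf
    Node trueBranch, falseBranch;
    Node(String conclusion) { this.conclusion = conclusion; }
    boolean isLeaf() { return conclusion != null; }
}

public class TreeExtension {
    // Follow the tree for the new case; if the leaf's conclusion is wrong,
    // turn the leaf into a node that asks one more question, keeping the
    // old conclusion on the other branch.
    static void extend(Node root, Map<String, Boolean> newCase,
                       String knownAnswer, String newTest) {
        Node node = root;
        while (!node.isLeaf())
            node = newCase.getOrDefault(node.attribute, false)
                   ? node.trueBranch : node.falseBranch;
        if (!knownAnswer.equals(node.conclusion)) {
            Node oldLeaf = new Node(node.conclusion);
            node.conclusion = null;
            node.attribute = newTest;
            if (newCase.getOrDefault(newTest, false)) {
                node.trueBranch = new Node(knownAnswer);
                node.falseBranch = oldLeaf;
            } else {
                node.falseBranch = new Node(knownAnswer);
                node.trueBranch = oldLeaf;
            }
        }
    }
}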
5. Supervised learning method 3: Discrete concept learning Given a sequence of examples and nonexamples of some concept (“cases”) in an order suggested by a teacher. Create a starting rule whose “if” part is a conjunction of the conditions in the first example. Then modify the rule as necessary to handle each new case correctly. Prefer the minimum change to the rules that can still explain the new case. Typical changes: Add a term, add a negated term, delete a term, or generalize a term.
Advantages: Generalizes well from data.
Disadvantages: Cannot handle errors or uncertainty, and sequence order can be important.
6. Example of learning of rules Change the rules as little as possible so as to accommodate the given cases in succession.
Facts not mentioned should be assumed false.
Case 1: Given b and c, conclude a1.
Case 2: Given b and d, conclude a2.
Case 3: Given b and c and e, conclude a1.
Case 4: Given b and e, conclude a1.
Case 5: Given b and c and d and f, conclude a1 and a2.
Case 6: Given b and d and h, conclude nothing.
Case 7: Given e and f and g, conclude a2.
Case 8: Given f and g, conclude a2.
Case 9: Given f, conclude nothing.
7. An algorithm for learning conjunctive rules Given a set of “cases” (examples and nonexamples) of a concept c (described by a list of facts true for them).
First write a rule where c is implied by the conjunction of all the facts true for the first example case.
For each subsequent case taken in order:
If it’s an example of c: if no rule concludes c with these facts, pick the most similar rule and remove terms, generalize terms, or widen the ranges of numeric terms until it does. If no rule is sufficiently similar, create a new rule for c whose “if” part is the conjunction of the facts in this example.
If it’s a nonexample of c, then for every rule that concludes c for this case, add a conjunct which is a negation of some fact present in the nonexample (generalizing to ranges for numeric terms if you can).
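A simplified Java sketch of this algorithm for a single concept, where a rule is a set of required facts plus a set of forbidden (negated) facts; generalization is only by dropping terms, and the “sufficiently similar” threshold and conflict handling are omitted (the structure and names are mine, not the original program’s):

import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class RuleLearner {
    // A rule: conclude the concept if all required facts hold
    // and no forbidden (negated) fact holds.
    static class Rule {
        Set<String> required = new HashSet<>();
        Set<String> forbidden = new HashSet<>();
    }

    List<Rule> rules = new ArrayList<>();

    boolean fires(Rule r, Set<String> facts) {
        return facts.containsAll(r.required) && Collections.disjoint(r.forbidden, facts);
    }

    void seeCase(Set<String> facts, boolean isExample) {
        if (isExample) {
            for (Rule r : rules) if (fires(r, facts)) return;   // already handled
            if (rules.isEmpty()) {
                Rule r = new Rule();
                r.required.addAll(facts);                        // starting rule
                rules.add(r);
                return;
            }
            // Minimal change: generalize the rule sharing the most terms
            // with this example by dropping the terms the example lacks.
            Rule best = Collections.max(rules, Comparator.comparingInt((Rule r) -> {
                Set<String> shared = new HashSet<>(r.required);
                shared.retainAll(facts);
                return shared.size();
            }));
            best.required.retainAll(facts);
        } else {
            // Nonexample: every rule that wrongly fires gets a negated
            // conjunct taken from the facts of the nonexample.
            for (Rule r : rules)
                if (fires(r, facts))
                    for (String f : facts)
                        if (!r.required.contains(f)) { r.forbidden.add(f); break; }
        }
    }
}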
8. Example concept-learning situations Given rule: “p is true if q and r are true”. If we see a case of p where q and s are true but r is false, we should change the rule to “p is true if q” if r and s are unrelated (or “p is true if q and (r or s)” otherwise).
Given rule: “p is true if q and r are true”. If we see a case of not(p) where q and r are true but s is false, we should change the rule to “p is true if q and r and not(s)”.
Given two rules: “p is true if q and r are true” and “p is true if q and s are true”. If we see a case of not(p) where s is true but q and r are false, we should modify the second rule to “p is true if s is true and r is false” since the second rule is more similar to the case in its conditions. (Check only positive conditions.)
9. Input for a Java multiple-concept learning program
e(apple,[red,spherical,shiny,fragrant]).
e(apple,[red,spherical,shiny]).
e(apple,[green,spherical,shiny]).
e(apple,[red,spherical,heavy]).
e(apple,[green,spherical,fragrant]).
e(apple,[yellow,ovoid,fragrant]).
e(ball,[blue,spherical,shiny]).
e(ball,[green,spherical,heavy]).
e(ball,[black,spherical]).
e(ball,[green,spherical,shiny]).
e(lemon,[yellow,spherical,fragrant]).
e(lemon,[yellow,ovoid,heavy,fragrant]).
e(lemon,[yellow,ovoid,fragrant]).
e(light,[red,spherical,glowing]).
e(light,[white,spherical,glowing,heavy]).
e(light,[white,cubic,glowing]).
e(box,[red,cubic,heavy,shiny,fragrant]).
e(box,[white,cubic]).
e(box,[blue,paralleliped]).
e(box,[yellow,paralleliped,heavy]).
10. Successive theories found for that input Conclude apple if red, spherical, shiny, fragrant, ~green, ~heavy, ~yellow, ~ovoid, ~blue, ~black, ~glowing, ~white, ~cubic, and ~paralleliped
Conclude apple if red, spherical, shiny, ~green, ~heavy, ~yellow, ~ovoid, ~blue, ~black, ~glowing, ~white, ~cubic, and ~paralleliped
Conclude apple if spherical, shiny, ~heavy, ~yellow, ~ovoid, ~blue, ~black, ~glowing, ~white, ~cubic, and ~paralleliped
11. Final theory set after all the examples (1) apple: spherical ~yellow ~ovoid ~blue ~black ~glowing ~white ~cubic ~paralleliped
(2) apple: yellow ovoid fragrant ~red ~spherical ~shiny ~green ~heavy ~blue ~black ~glowing ~white ~cubic ~paralleliped
(3) ball: spherical ~red ~fragrant ~yellow ~ovoid ~white ~cubic ~glowing ~paralleliped
(4) lemon: yellow fragrant ~red ~shiny ~green ~blue ~black ~glowing ~white ~paralleliped
(5) light: glowing ~shiny ~fragrant ~green ~yellow ~ovoid ~blue ~black ~paralleliped
(6) box: red cubic heavy shiny fragrant ~spherical ~green ~yellow ~ovoid ~blue ~black ~glowing ~white ~paralleliped
(7) box: white cubic ~red ~spherical ~shiny ~fragrant ~green ~heavy ~blue ~black ~glowing ~paralleliped
(8) box: paralleliped ~red ~spherical ~shiny ~fragrant ~green ~ovoid ~black ~glowing ~white ~cubic
12. Supervised learning method 4: Analogy Find a previous case with similarities to a new situation, then define a logical mapping from it onto the new case.
Advantage: An intelligent kind of learning, so it can learn things the other methods can't.
Disadvantage: Slow and hard to implement.
Example: In firefighting, "desmoke" is like "dewater" except with smoke instead of water, so infer specifications for it:
Dewater requires water present and no fire; desmoke requires smoke present and no fire.
Dewater removes water entirely; desmoke removes smoke entirely.
Dewater requires a water pump and a drain; desmoke requires a fan and a vent.
Dewater duration depends on amount of water present and plumbing skill of agent dewatering;
Desmoke duration depends on amount of smoke present and electrical skill of agent desmoking.
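A toy Java sketch of the mapping idea, treating specifications as text and the analogy as a word-for-word substitution taken from the dewater/desmoke example (this representation is mine; a real system would map structured predicates):

import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class AnalogyMapper {
    public static void main(String[] args) {
        // Longer phrases go first so "water pump" is mapped before "water";
        // "dewater" becomes "desmoke" automatically via the "water" rule.
        Map<String, String> mapping = new LinkedHashMap<>();
        mapping.put("water pump", "fan");
        mapping.put("water", "smoke");
        mapping.put("drain", "vent");
        mapping.put("plumbing", "electrical");

        List<String> dewaterSpec = List.of(
            "dewater requires water present and no fire",
            "dewater requires a water pump and a drain",
            "dewater duration depends on amount of water and plumbing skill");

        for (String line : dewaterSpec) {
            String mapped = line;
            for (Map.Entry<String, String> e : mapping.entrySet())
                mapped = mapped.replace(e.getKey(), e.getValue());
            System.out.println(mapped);  // e.g. "desmoke requires smoke present and no fire"
        }
    }
}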
13. Supervised learning method 5: Keep statistics for Bayesian conditional probabilities For a training set, keep counts on how often a clue was associated with a given conclusion.
Use these to construct a Naïve Bayes multiplication of factors.
Advantages: Easy to do.
Disadvantages: Requires a big training set, and doesn’t generalize from experience.
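A Java sketch of the counting scheme, assuming each training item pairs a set of clue words with a known conclusion; the score multiplies p(conclusion) by p(clue | conclusion) for each clue, with add-one smoothing so an unseen clue does not zero the product (the structure and names are mine):

import java.util.*;

public class NaiveBayesCounts {
    Map<String, Integer> classCounts = new HashMap<>();
    Map<String, Map<String, Integer>> clueCounts = new HashMap<>();  // class -> clue -> count
    int total = 0;

    void train(Set<String> clues, String conclusion) {
        total++;
        classCounts.merge(conclusion, 1, Integer::sum);
        Map<String, Integer> m = clueCounts.computeIfAbsent(conclusion, k -> new HashMap<>());
        for (String clue : clues) m.merge(clue, 1, Integer::sum);
    }

    // Naive Bayes score: p(class) times the product of p(clue | class),
    // with add-one smoothing for clues never seen with this class.
    double score(Set<String> clues, String conclusion) {
        int n = classCounts.getOrDefault(conclusion, 0);
        double p = (double) n / total;
        Map<String, Integer> m = clueCounts.getOrDefault(conclusion, Map.of());
        for (String clue : clues)
            p *= (m.getOrDefault(clue, 0) + 1.0) / (n + 2.0);
        return p;
    }
}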
14. Some word clues for online captions and their measured statistics in a training set
15. Supervised learning method 6: Adjust weights in neural networks The backpropagation algorithm is the most common way to do this. It’s a form of nonlinear optimization and can exploit methods from operations research.
Advantages: Can learn in noisy environments without experts; can handle unexpected situations; little storage required.
Disadvantages: May not find the best solution, and doesn’t generalize much from experience.
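Full backpropagation propagates errors backward through several layers; the sketch below shows only its core step in Java: gradient descent on a single sigmoid unit (the "delta rule"), here learning logical AND:

public class DeltaRule {
    public static void main(String[] args) {
        double[][] x = {{0, 0}, {0, 1}, {1, 0}, {1, 1}};   // training inputs
        double[] target = {0, 0, 0, 1};                    // learn logical AND
        double[] w = {0.1, -0.1};
        double bias = 0.0, rate = 0.5;
        for (int epoch = 0; epoch < 5000; epoch++) {
            for (int i = 0; i < x.length; i++) {
                double sum = bias + w[0] * x[i][0] + w[1] * x[i][1];
                double out = 1.0 / (1.0 + Math.exp(-sum));            // sigmoid output
                double delta = (target[i] - out) * out * (1.0 - out); // error times slope
                bias += rate * delta;                                 // adjust weights
                for (int j = 0; j < w.length; j++) w[j] += rate * delta * x[i][j];
            }
        }
        for (int i = 0; i < x.length; i++) {
            double out = 1.0 / (1.0 + Math.exp(-(bias + w[0] * x[i][0] + w[1] * x[i][1])));
            System.out.printf("%.0f AND %.0f -> %.2f%n", x[i][0], x[i][1], out);
        }
    }
}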
16. Supervised learning method 7: Support vector machines These are very popular today. They build on perceptrons, where the output is a thresholded linear weighted sum of the inputs.
Support vector “machines” are programs that compute the optimal formulas (actually, surfaces in hyperspace) to choose the best classification for a case.
They use methods of mathematical optimization.
Advantages: They can handle large amounts of data well.
Disadvantages: Perceptrons cannot model complex situations.
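Production SVM packages solve a quadratic-programming problem; the Java sketch below only fits the linear decision function w·x + b by stochastic sub-gradient descent on the hinge loss, a commonly used approximation (the data and parameters are arbitrary):

public class LinearSvmSketch {
    public static void main(String[] args) {
        double[][] x = {{2, 2}, {3, 3}, {-2, -1}, {-3, -2}};
        int[] y = {1, 1, -1, -1};                 // two populations of cases
        double[] w = {0, 0};
        double b = 0, rate = 0.01, lambda = 0.01; // lambda controls margin width
        for (int epoch = 0; epoch < 2000; epoch++) {
            for (int i = 0; i < x.length; i++) {
                double margin = y[i] * (w[0] * x[i][0] + w[1] * x[i][1] + b);
                for (int j = 0; j < 2; j++) {
                    // Regularization gradient, plus hinge-loss gradient
                    // when the case is inside the margin.
                    double grad = lambda * w[j] - (margin < 1 ? y[i] * x[i][j] : 0);
                    w[j] -= rate * grad;
                }
                if (margin < 1) b += rate * y[i];
            }
        }
        System.out.printf("boundary: %.2f*x1 + %.2f*x2 + %.2f = 0%n", w[0], w[1], b);
    }
}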
17. Support vector machines try to draw a boundary between two populations of cases
18. Supervised learning method 8: Set covering First try to find a set of conjunctive rules that have high “precision”, or few false positives.
Then find the set of such rules for the same conclusion that has the highest mean of precision and recall (coverage). It’s common in data mining to use the “F-score”, the harmonic mean of the two.
Precision = # true positives / (# true positives + # false positives)
F-score = # true positives / (# true positives + 0.5 * (# false positives + # false negatives))
It’s useful to set a minimum precision and a minimum F-score; higher minimums give faster results but more errors.
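A small Java rendering of the two formulas, with counts taken from the candidate rule "box if cubic" on the data set repeated below (3 cubic items, 2 of them boxes, and 4 boxes overall, so 2 true positives, 1 false positive, and 2 false negatives):

public class RuleScores {
    static double precision(int tp, int fp) { return (double) tp / (tp + fp); }
    static double fScore(int tp, int fp, int fn) { return tp / (tp + 0.5 * (fp + fn)); }

    public static void main(String[] args) {
        // Counts for "box if cubic" on the fruit/box data set.
        int tp = 2, fp = 1, fn = 2;
        System.out.printf("precision = %.3f%n", precision(tp, fp)); // 0.667
        System.out.printf("F-score = %.3f%n", fScore(tp, fp, fn));  // 0.571
    }
}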
19. (Earlier) test data set
e(apple,[red,spherical,shiny,fragrant]).
e(apple,[red,spherical,shiny]).
e(apple,[green,spherical,shiny]).
e(apple,[red,spherical,heavy]).
e(apple,[green,spherical,fragrant]).
e(apple,[yellow,ovoid,fragrant]).
e(ball,[blue,spherical,shiny]).
e(ball,[green,spherical,heavy]).
e(ball,[black,spherical]).
e(ball,[green,spherical,shiny]).
e(lemon,[yellow,spherical,fragrant]).
e(lemon,[yellow,ovoid,heavy,fragrant]).
e(lemon,[yellow,ovoid,fragrant]).
e(light,[red,spherical,glowing]).
e(light,[white,spherical,glowing,heavy]).
e(light,[white,cubic,glowing]).
e(box,[red,cubic,heavy,shiny,fragrant]).
e(box,[white,cubic]).
e(box,[blue,paralleliped]).
e(box,[yellow,paralleliped,heavy]).
20. Rules learned from a set-covering program Conclude apple if red, not cubic, and not glowing.
Conclude apple if spherical, not black, not blue, not glowing, not heavy, and not yellow.
Conclude ball if black.
Conclude ball if blue, spherical, and not fragrant.
Conclude ball if green, spherical, and not fragrant.
Conclude lemon if fragrant and yellow.
Conclude light if glowing.
Conclude box if cubic.
Conclude box if paralleliped.
F-scores of above rule sets: apple .833, ball .889, lemon .857, light 1.000, box .889
21. Example conjunctive rules found for “box” box if heavy and not(spherical): precision 0.667
box if cubic and heavy: precision 1.0
box if heavy: precision 0.333
box if blue: precision 0.500
box if white: precision 0.333
box if cubic: precision 0.667
box if paralleliped: precision 1.000
box if not spherical: precision 0.500
22. Example rule sets for “box” and their F-scores box if cubic and heavy: F=0.400
box if paralleliped: F=0.667
box if heavy and not spherical: F=0.571
box if (heavy and not spherical) or paralleliped: F=0.750
box if cubic or (heavy and not spherical): F=0.667
23. Unsupervised learning method 1: Anomaly discovery Try to find something surprising in your data. "Surprise" means something that deviates from a "model".
The surprise is usually statistical, like unexpectedly large counts for some pattern of conditions: for instance, an unexpectedly high occurrence of a disease near nuclear power plants among patients who took aspirin.
Advantages: There’s lots of data around.
Disadvantages: Rarely discovers anything.
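One simple statistical version of "surprise" in Java, assuming a model that predicts an expected count for a pattern and treating counts as roughly Poisson, so the standard deviation is the square root of the expectation (the threshold and names are mine):

public class CountAnomaly {
    // Standard-deviation units by which an observed count exceeds
    // the model's expectation (Poisson approximation).
    static double surprise(long observed, double expected) {
        return (observed - expected) / Math.sqrt(expected);
    }

    public static void main(String[] args) {
        // 40 cases observed where the model predicts 12 on average.
        double z = surprise(40, 12.0);
        if (z > 3.0) System.out.printf("anomaly: z = %.1f%n", z);
    }
}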
24. Unsupervised learning method 2: Genetic algorithms Genetic algorithms can be used to search heuristically for new ideas.
They can try mutating and crossing existing ideas to create new ideas. They need a metric of “goodness” to measure success.
Famous early example: the AM (Automated Mathematician) program
Advantages: Computers are getting faster all the time, and it might be worth turning them loose to see what they discover.
Disadvantages: Very slow, and rarely discovers anything.
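A minimal Java sketch of the mechanics: bit-string "ideas", a toy goodness metric (the number of 1 bits), one-point crossover, and occasional mutation (all parameters are arbitrary):

import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Random;

public class GeneticSketch {
    static final int LEN = 20, POP = 30;
    static final Random rng = new Random(42);

    // The "goodness" metric to be maximized: here just the number of 1 bits.
    static int goodness(boolean[] g) {
        int s = 0;
        for (boolean b : g) if (b) s++;
        return s;
    }

    public static void main(String[] args) {
        List<boolean[]> pop = new ArrayList<>();
        for (int i = 0; i < POP; i++) {
            boolean[] g = new boolean[LEN];
            for (int j = 0; j < LEN; j++) g[j] = rng.nextBoolean();
            pop.add(g);
        }
        for (int gen = 0; gen < 100; gen++) {
            pop.sort(Comparator.comparingInt(GeneticSketch::goodness).reversed());
            List<boolean[]> next = new ArrayList<>(pop.subList(0, POP / 2)); // keep the fitter half
            while (next.size() < POP) {
                // Cross two survivors at a random cut point, then maybe mutate.
                boolean[] a = pop.get(rng.nextInt(POP / 2)), b = pop.get(rng.nextInt(POP / 2));
                boolean[] child = new boolean[LEN];
                int cut = rng.nextInt(LEN);
                for (int j = 0; j < LEN; j++) child[j] = j < cut ? a[j] : b[j];
                if (rng.nextDouble() < 0.3) child[rng.nextInt(LEN)] ^= true;  // mutation
                next.add(child);
            }
            pop = next;
        }
        pop.sort(Comparator.comparingInt(GeneticSketch::goodness).reversed());
        System.out.println("best goodness: " + goodness(pop.get(0)));
    }
}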
25. Unsupervised learning method 3: Clustering Automatically infer classes (“clusters”) by grouping similar items together. This improves the value of case-based reasoning.
One method (“K-Means”): Pick initial cluster centers at random. Assign each case to the nearest cluster center. Recalculate cluster centers using assigned cases. Iterate until no changes.
Advantages: Easy to do.
Disadvantages: You need a good “distance” calculation, and clusters may not represent anything important in the real world.
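A Java sketch of the K-Means loop just described, on two-dimensional points with Euclidean distance (K, the data, and the fixed initial centers are arbitrary; centers would normally be picked at random):

import java.util.Arrays;

public class KMeansSketch {
    static double dist(double[] a, double[] b) {
        return Math.hypot(a[0] - b[0], a[1] - b[1]);
    }

    public static void main(String[] args) {
        double[][] points = {{1, 1}, {1.5, 2}, {8, 8}, {9, 9}, {1, 0.5}, {8.5, 9.5}};
        double[][] centers = {points[0].clone(), points[3].clone()};  // initial guesses
        int[] assign = new int[points.length];
        boolean changed = true;
        while (changed) {
            changed = false;
            // Assign each case to the nearest cluster center.
            for (int i = 0; i < points.length; i++) {
                int best = 0;
                for (int c = 1; c < centers.length; c++)
                    if (dist(points[i], centers[c]) < dist(points[i], centers[best])) best = c;
                if (assign[i] != best) { assign[i] = best; changed = true; }
            }
            // Recalculate each center as the mean of its assigned cases.
            for (int c = 0; c < centers.length; c++) {
                double sx = 0, sy = 0;
                int n = 0;
                for (int i = 0; i < points.length; i++)
                    if (assign[i] == c) { sx += points[i][0]; sy += points[i][1]; n++; }
                if (n > 0) { centers[c][0] = sx / n; centers[c][1] = sy / n; }
            }
        }
        System.out.println(Arrays.toString(assign));  // e.g. [0, 0, 1, 1, 0, 1]
    }
}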
26. Currently important applications of AI Intelligent game and simulation players
Intelligent "agents" for Web information-finding
Automated “help desks”
Smarter user interfaces for software
Software specification and analysis tools
Self-diagnosing machines like aircraft
Intelligent robots that plan and analyze the world
Sensor networks for monitoring the world
Nanotechnology
Speech recognition
Language translation
Tutors for complex processes
Scheduling and allocation problems
27. The form of future AI applications 1. AI module embedded in a big program (C++ or Java):
--enemy behavior in battlefield simulators
--command information centers that combine threat data
--path following of robot reconnaissance vehicle
2. Stand-alone AI systems on a PC. Examples:
--automatic scheduling for 10 people using A* search
--a tutor for firefighting using means-ends analysis
--an intelligent inventory manager
--consistency checker for software engineering
3. Big AI system that calls C++ or Java modules:
--speech recognition for aircraft cockpits
--identification of objects in aerial photographs
--automatic scheduling for 100 people using search
28. Final thoughts No fundamental obstacles to intelligent behavior by computers are apparent (though there are practical ones).
But many details need to be worked out – maybe 50 years of work is needed.
Speed increases in computers are making possible many AI methods that were previously impractical.
Not every aspect of human activity should or needs to be modeled, however (e.g. emotions).
Humans still should be “in the loop” on important decisions: Total automation is not usually desirable.
Be cautious with techniques that are hard to evaluate like neural networks, fuzzy sets, and genetic algorithms.
29. Review question #1 You must schedule two 50-minute meetings, A and B. The times 800, 900, 1000, and 1100 are available. A must be scheduled first, and A's meeting time must be before B's meeting time. The only operator is schedule(X,T), where X is a meeting name and T is a time. Pretend there is no goal state.
(a) Draw the complete search graph.
(b) Give the order in which the states will be visited by depth-first search with the heuristic “Prefer the state whose last-scheduled meeting is later.”
(c) Give the order in which the states will be visited by breadth-first search with that heuristic. Assume a queue is used and the heuristic controls the order in which successors are added to the queue.
30. Review question #2 Suppose there is one operator schedule(C,T,S) where C is the name of the class, T is the time of day, and S is a list of students signed up for it. Assume classes meet every day at the same time, and there is only one section of each class. Assume there are facts that say what classes each student wants to take.
(a) Give preconditions of this operator.
(b) Describe a data structure adequate to describe states for this search problem.
(c) Is bidirectional search good for this problem?
31. Review question #3, on successor functions What successors do these specifications find for the state [l(t1, d1), l(t2, s), o(d1, p33), o(d3, p77), c(t1, p33)]? (Here l=location, o=ordered, c=carrying, d=delivered, t1=truck1, t2=truck2, d1=depot1, d3=depot3, s=supply, p33=pallet33, and p77=pallet77). Assume the rules are tried in the order given with usual Prolog backtracking among choices.
-- A truck can go either to Supply or to some place to which something it is carrying is ordered.
-- If a truck is at Supply, load L is ordered, and load L is not already carried, the truck can carry load L.
-- If a truck is carrying load L, and is where L is ordered, it can unload L there; this erases the order and the fact of carrying it, and adds a delivery fact.
32. Review question #4, on means-ends analysis Given these operators for an elevator:
call(F): push the button on floor F outside the elevator to make it come to you; recommended for [travelling(elevator)].
forward(D): move forward through the elevator entrance (D=in or D=out); recommended for [elevator_status(you,D)].
pushfloor(F): push the button in the elevator for floor F; recommended for [floor(you,F)].
wait: wait until the elevator doors open; recommended for [open(doors)].
Define preconditions and postconditions so that means-ends analysis will work. Use only these facts:
floor(X,F): X is at floor F, where X=you or X=elevator
elevator_status(you,S): S=in or S=out for you and the elevator
travelling(elevator): the elevator is moving
open(doors): the elevator doors are open
33. Review question #5: Use concept learning to learn a rule with “and”s and “or”s Example of terrorist: Middle Eastern, carrying a gun, nervous, heavy
Example of terrorist: European, carrying a gun, nervous, heavy
Nonexample of terrorist: Middle Eastern, not carrying a gun, nervous, heavy
Nonexample of terrorist: European, carrying a gun, nervous, heavy, police
Example of terrorist: American, carrying a gun, nervous, light
Example of terrorist: Middle Eastern, not carrying a gun, carrying a bomb, nervous, heavy