Understand the process of learning in artificial intelligence, including examples such as rote learning, performance enhancement, and classification. Learn how to design a learning agent by determining which components to learn, available feedback, and representation used.
Learning CPSC 386 Artificial Intelligence Ellen Walker Hiram College
What is learning? • Process which changes a system to enable it to do the same task or tasks drawn from the same population more efficiently next time (improving performance). • Examples (increasing abstraction) • Rote learning • Performance enhancement (problem solving) • Classification • Knowledge acquisition
Designing a Learning Agent • Which components of the performance element are to be learned? • What feedback is available to learn this? • What representation is used?
Symbolic vs. Non-Symbolic learning • If you “open the system up” after it has learned, can the knowledge be easily expressed? • Symbolic uses accessible internal representations • Non-symbolic uses inaccessible internal representations
Inductive learning • Given a set of examples (x, y) where x is input, y is output • Learn a function y=f(x) that • Returns correct results for all (x,y) pairs in the training set of examples • Generalizes well -- returns correct results for x values not in the training set
Ockham’s Razor • If two functions fit, pick the simplest • There is an inevitable tradeoff between the complexity of the hypothesis function and the degree of fit to the data.
Decision Trees • Each node is a question • Each leaf is a decision • [Figure: example decision tree with question nodes “hair?”, “pet?”, “legs?” and leaves frog, snake, cat, lion]
Learning Decision Trees from Examples • Silly example: should I buy this car? 1. red VW (foreign, small, red) YES 2. green Cadillac (domestic, large, green) NO 3. blue Subaru (foreign, small, blue) YES 4. blue Mercedes (foreign, large, blue) NO 5. red Saturn (domestic, small, red) YES
Three types of learning • Supervised • The system learns a function from examples of inputs and outputs • Correct outputs must be available during training • Unsupervised • The system learns without feedback, based on a global optimization criterion • Reinforcement • The system is rewarded (or punished) for its decisions • This is the most general type, and it models most human learning (except school).
Recursive Splitting • Start with one big class • If there are some yes, some no, choose an attribute to split them (we now have 2 recursive problems) • Otherwise, we are done • When all recursive problems are solved, the remaining classes will have all YES or all NO • Each decision used for a split is a branch on the tree.
Recursive splitting example • Initial class { (foreign,small,red,yes), (domestic, large, green, no), (foreign, small, blue, yes), (foreign,large,blue,no), (domestic,small,red,yes) } • Split on size: { (foreign,small,red,yes), (foreign, small, blue, yes), (domestic,small,red,yes) } {(domestic, large, green, no), (foreign,large,blue,no)}
Choosing an attribute to split on • We want to split on an attribute that gives us information • If an attribute splits the class into all pos/all neg that’s best! • Otherwise: if an attribute splits the class roughly evenly, and one subclass is mostly pos, one mostly neg, that’s pretty good
A formal notion of “best” • Goal is to maximize information gain • Gain = number of “bits” of information needed before the split – number of bits still needed after the split • Information • I(p,n) = –( (p/(p+n)) lg(p/(p+n)) + (n/(p+n)) lg(n/(p+n)) ) • The bits still needed after the split are the sum of the informations of the subsets, each weighted by its fraction of the items • Example: (4,2) -> (3,0) and (1,2) • Value is I(4,2) - 1/2 * I(3,0) - 1/2 * I(1,2)
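The example on this slide can be checked numerically. The following is a minimal sketch, not part of the original deck; the function names `information` and `information_gain` are made up for illustration, and 0 * lg(0) is treated as 0 by convention.

```python
from math import log2

def information(p, n):
    """Bits needed to classify an item drawn from p positives and n negatives."""
    total = p + n
    bits = 0.0
    for count in (p, n):
        if count:  # convention: 0 * lg(0) contributes nothing
            fraction = count / total
            bits -= fraction * log2(fraction)
    return bits

def information_gain(parent, children):
    """Bits needed before the split minus the weighted bits still needed after it."""
    total = sum(parent)
    remainder = sum((p + n) / total * information(p, n) for p, n in children)
    return information(*parent) - remainder

# The slide's example: (4,2) split into (3,0) and (1,2).
gain = information_gain((4, 2), [(3, 0), (1, 2)])
print(round(gain, 3))  # 0.459
```

The pure subset (3,0) needs no further bits, so only the mixed subset (1,2) contributes to the remainder.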
Updating our recursive algorithm • Defun tree(examples) • If all examples are positive (or negative), return examples • Else • Choose the best attribute using information gain • Divide examples into sublists based on their values for that attribute • Return • (cons attribute (mapcar #'tree (list of sublists))) • Result will be a tree with each element being an attribute and a list of branches.
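One way the recursive algorithm might look in runnable form, applied to the five cars from the earlier slide. This is a sketch under assumptions: the attribute names `origin`, `size`, and `color` are labels chosen here, leaves return the class label rather than the list of examples, and the majority-vote fallback for when attributes run out is an addition the slides do not mention.

```python
from collections import Counter
from math import log2

ATTRIBUTES = ("origin", "size", "color")  # hypothetical attribute names
EXAMPLES = [
    ({"origin": "foreign", "size": "small", "color": "red"}, True),      # red VW
    ({"origin": "domestic", "size": "large", "color": "green"}, False),  # green Cadillac
    ({"origin": "foreign", "size": "small", "color": "blue"}, True),     # blue Subaru
    ({"origin": "foreign", "size": "large", "color": "blue"}, False),    # blue Mercedes
    ({"origin": "domestic", "size": "small", "color": "red"}, True),     # red Saturn
]

def entropy(examples):
    """Bits needed to classify one of these examples."""
    counts = Counter(label for _, label in examples)
    return -sum(c / len(examples) * log2(c / len(examples)) for c in counts.values())

def split(examples, attribute):
    """Group examples into sublists by their value for the attribute."""
    groups = {}
    for features, label in examples:
        groups.setdefault(features[attribute], []).append((features, label))
    return groups

def gain(examples, attribute):
    """Information gain: bits before the split minus weighted bits after it."""
    groups = split(examples, attribute).values()
    remainder = sum(len(g) / len(examples) * entropy(g) for g in groups)
    return entropy(examples) - remainder

def tree(examples, attributes=ATTRIBUTES):
    labels = {label for _, label in examples}
    if len(labels) == 1:  # all positive or all negative: done
        return labels.pop()
    if not attributes:    # assumption: fall back to the majority label
        return Counter(label for _, label in examples).most_common(1)[0][0]
    best = max(attributes, key=lambda a: gain(examples, a))
    rest = tuple(a for a in attributes if a != best)
    return (best, {value: tree(group, rest) for value, group in split(examples, best).items()})

print(tree(EXAMPLES))  # ('size', {'small': True, 'large': False})
```

Size wins because it separates the YES cars from the NO cars in a single split, matching the recursive splitting example.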
Assessing a Learning System • Collect a large set of examples • Divide into test and training sets (disjoint) • Apply learning algorithm to training set (only) • Measure its performance on test set (only) • Repeat for different sizes of training sets • Repeat for different randomly selected training sets of each size
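The procedure above can be sketched as a small experiment. Everything here is a stand-in chosen for illustration: the synthetic data (an integer labeled by whether it is at least 50), the toy threshold learner, and the function names are all assumptions, not material from the course.

```python
import random

def make_examples(count, rng):
    """Synthetic stand-in data: x is an integer, the label is whether x >= 50."""
    return [(x, x >= 50) for x in (rng.randrange(100) for _ in range(count))]

def learn_threshold(training):
    """Toy learner: pick the threshold that classifies the training set best."""
    return max(range(101), key=lambda t: sum((x >= t) == y for x, y in training))

def accuracy(threshold, test):
    """Fraction of test examples the learned threshold classifies correctly."""
    return sum((x >= threshold) == y for x, y in test) / len(test)

def learning_curve(examples, sizes, trials, rng):
    """Average test accuracy for each training-set size over random splits."""
    curve = {}
    for size in sizes:
        scores = []
        for _ in range(trials):
            shuffled = rng.sample(examples, len(examples))
            training, test = shuffled[:size], shuffled[size:]  # disjoint sets
            scores.append(accuracy(learn_threshold(training), test))
        curve[size] = sum(scores) / trials
    return curve

rng = random.Random(0)
curve = learning_curve(make_examples(200, rng), sizes=(5, 20, 80), trials=20, rng=rng)
for size, score in curve.items():
    print(size, round(score, 3))
```

Plotting the averaged scores against training-set size gives the learning curve; the learner only ever sees the training portion of each split.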
Learning Curve • [Figure: learning curve; y-axis: % correct, x-axis: training set size (% of total)]
Learning Depends on Training • If the test set and the training set are not both random samples of the same overall set of examples, strange results can occur! • What if test set contains only small cars, training set only large cars? • If the overall set of examples doesn’t “cover the space” the wrong concept will be learned • Tank and weather example
Overfitting is Bad • An algorithm is fully trained if it classifies every training case perfectly • But what if every leaf is a set with only one element? • Training set is perfectly classified • Each element of test set creates a new category -- we have no experience! • Avoid by requiring minimum information gain value in order to split a set
One example at a time • At any given point we have a current hypothesis that explains the examples • Positive examples (that were incorrectly classified as negative) extend the hypothesis until it includes the new example • Negative examples (that were incorrectly classified as positive) restrict the hypothesis until it does not include the new example
Extending and Restricting • To extend a hypothesis, “add in” the new information • Extended hypothesis = hypothesis | pos. example • To restrict a hypothesis, “subtract out” the new information • Restricted hypothesis = hypothesis & not(neg. ex.)
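The two update rules can be written directly as operations on predicates. This is a minimal sketch, assuming a hypothesis is represented as a function from a car (a tuple of attribute values) to True or False; the names `extend`, `restrict`, `nothing`, and `everything` are made up here.

```python
def extend(hypothesis, positive_example):
    # Extended hypothesis = hypothesis | pos. example
    return lambda car: hypothesis(car) or car == positive_example

def restrict(hypothesis, negative_example):
    # Restricted hypothesis = hypothesis & not(neg. example)
    return lambda car: hypothesis(car) and car != negative_example

nothing = lambda car: False    # most specific starting point: no car is good
everything = lambda car: True  # most general starting point: every car is good

minimal = extend(nothing, ("foreign", "small", "red"))
maximal = restrict(everything, ("domestic", "large", "green"))

print(minimal(("foreign", "small", "red")), minimal(("foreign", "small", "blue")))     # True False
print(maximal(("domestic", "large", "green")), maximal(("foreign", "large", "blue")))  # False True
```

Extending only ever adds cars to what the hypothesis accepts, and restricting only ever removes them.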
Candidate elimination (car example) • red VW (foreign, small, red) YES Min hypothesis: all foreign, small red things are good cars Max hypothesis: everything is a good car • green Cadillac (domestic, large, green) NO Min hypothesis: all foreign, small red things are good cars Max hypothesis: everything foreign or small or not green is a good car • blue Subaru (foreign, small, blue) YES Min hypothesis: all foreign, small (red or blue) things are good cars Max hypothesis: everything foreign or small or not green is a good car
Candidate elimination (car example) cont. • blue Mercedes (foreign, large, blue) NO Min hypothesis: all foreign, small, (red or blue) things are good cars Max hypothesis: everything small or (domestic and not green) or (foreign and not blue) or red is a good car • red Saturn (domestic, small, red) YES Min hypothesis: all small, (red or blue) things are good cars Max hypothesis: everything small or (domestic and not green) or (foreign and not blue) or red is a good car
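The final hypotheses in this trace can be checked by brute force over every possible car. This is a sketch under stated assumptions: the only colors are red, green, and blue (which is what lets "not green and not blue" collapse to "red"), a hypothesis is represented as the set of cars it accepts, and the names `run`, `slide_min`, and `slide_max` are made up for illustration.

```python
from itertools import product

# Assumption: the whole space is 2 origins x 2 sizes x 3 colors = 12 cars.
ALL_CARS = set(product(("foreign", "domestic"), ("small", "large"), ("red", "green", "blue")))
EXAMPLES = [
    (("foreign", "small", "red"), True),      # red VW
    (("domestic", "large", "green"), False),  # green Cadillac
    (("foreign", "small", "blue"), True),     # blue Subaru
    (("foreign", "large", "blue"), False),    # blue Mercedes
    (("domestic", "small", "red"), True),     # red Saturn
]

def run(examples):
    """Track the smallest and largest sets of cars consistent with the examples."""
    minimum, maximum = set(), set(ALL_CARS)
    for car, is_good in examples:
        if is_good:
            minimum.add(car)      # extend: hypothesis | pos. example
        else:
            maximum.discard(car)  # restrict: hypothesis & not(neg. example)
    return minimum, maximum

def slide_min(car):
    """Final min hypothesis from the slide: small and (red or blue)."""
    origin, size, color = car
    return size == "small" and color in ("red", "blue")

def slide_max(car):
    """Final max hypothesis from the slide."""
    origin, size, color = car
    return (size == "small" or (origin == "domestic" and color != "green")
            or (origin == "foreign" and color != "blue") or color == "red")

minimum, maximum = run(EXAMPLES)
slide_min_set = {car for car in ALL_CARS if slide_min(car)}
slide_max_set = {car for car in ALL_CARS if slide_max(car)}
print(slide_max_set == maximum)             # True
print(minimum <= slide_min_set <= maximum)  # True
```

The slide's max hypothesis accepts exactly the cars not ruled out by a negative example. Its min hypothesis is slightly more general than the bare set of positive examples (it also accepts a small, blue, domestic car), but it still lies between the two bounds.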
Version Space Learning • Consider the set of all hypotheses consistent with the examples • This will be the “range” from min to max in the prior examples • This is called a version space, and is updated after each example • Least-commitment algorithm • We take no great leaps, but only make the minimal changes required for the concept to fit the examples.
Evaluating these algorithms • Decision Tree learning is faster • ... But you need to have all examples in advance • Decision trees make disjunctions easier to express • Both are highly dependent on having the right attributes available • Both are highly susceptible to noise (incorrect training examples)