
CS 9633 Machine Learning Concept Learning



  1. CS 9633 Machine Learning: Concept Learning
  References:
  • Machine Learning, Tom Mitchell, 1997, Chapter 2
  • Artificial Intelligence: A Modern Approach, Russell and Norvig, Second Edition, 2003, pages 678–686
  • Elements of Machine Learning, Pat Langley, 1996, Chapter 2

  2. Concept Learning
  • Inferring a Boolean-valued function from training examples
  • Training examples are labeled as members or non-members of the concept

  3. Concept Learning Task Defined By
  • Set of instances over which the target function is defined
  • Target function
  • Set of candidate hypotheses considered by the learner
  • Set of available training examples

  4. Example Concept
  • Days when you would enjoy water sports

  5. Instances X
  Possible days, each described by the attributes:
  • Sky (Sunny, Cloudy, Rainy)
  • AirTemp (Warm, Cold)
  • Humidity (Normal, High)
  • Wind (Strong, Weak)
  • Water (Warm, Cold)
  • Forecast (Same, Change)

  6. Hypotheses H
  • Each hypothesis is a vector of 6 constraints, specifying the values of the 6 attributes
  • For each attribute, the constraint is:
  • ? if any value is acceptable for this attribute
  • A single required value for the attribute
  • 0 if no value is acceptable
  • Sample hypothesis: <Rainy, ?, ?, ?, Warm, ?>
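
A minimal sketch (not from the slides; names are mine) of this representation in Python, encoding a hypothesis as a 6-tuple with "?" for any-value and "0" for no-value:

```python
# A hypothesis is a 6-tuple of constraints over
# (Sky, AirTemp, Humidity, Wind, Water, Forecast).
# "?" accepts any value, "0" accepts none, and any other
# string requires that exact attribute value.

def match(h, x):
    """Return True iff instance x satisfies hypothesis h."""
    return all(c == "?" or c == v for c, v in zip(h, x))

h = ("Rainy", "?", "?", "?", "Warm", "?")   # sample hypothesis from the slide
x = ("Rainy", "Cold", "High", "Strong", "Warm", "Change")
print(match(h, x))   # True: Sky and Water match, the rest are unconstrained
```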

  7. General and Specific Hypotheses
  • Most general hypothesis: <?, ?, ?, ?, ?, ?>
  • Most specific hypothesis: <0, 0, 0, 0, 0, 0>

  8. Target Concept c
  EnjoySport : X → {0, 1}

  9. Training Examples D
  Example | Sky | AirTemp | Humidity | Wind | Water | Forecast | EnjoySport
  1 | Sunny | Warm | Normal | Strong | Warm | Same | Yes
  2 | Sunny | Warm | High | Strong | Warm | Same | Yes
  3 | Rainy | Cold | High | Strong | Warm | Change | No
  4 | Sunny | Warm | High | Strong | Cool | Change | Yes

  10. Determine: A hypothesis h in H such that h(x) = c(x) for all x in X

  11. Inductive Learning Hypothesis
  Any hypothesis found to approximate the target function well over a sufficiently large set of training examples will also approximate the target function well over other unobserved examples.

  12. Concept Learning as Search
  • Concept learning can be viewed as searching through a large space of hypotheses implicitly defined by the hypothesis representation.

  13. Instance and hypothesis space size
  How many instances? How many syntactically distinct hypotheses? How many semantically distinct hypotheses? (A worked calculation follows below.)
  • Sky (Sunny, Cloudy, Rainy)
  • AirTemp (Warm, Cold)
  • Humidity (Normal, High)
  • Wind (Strong, Weak)
  • Water (Warm, Cold)
  • Forecast (Same, Change)
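
The counts can be worked out directly from the attribute domains above; a sketch of the arithmetic:

```python
from math import prod

values = [3, 2, 2, 2, 2, 2]                # values per attribute (Sky has 3, the rest 2)
instances = prod(values)                   # 3*2*2*2*2*2 = 96 distinct instances
syntactic = prod(v + 2 for v in values)    # add "?" and "0" per attribute: 5*4*4*4*4*4 = 5120
# Any hypothesis containing a "0" classifies every instance negative,
# so all of those collapse to one semantic hypothesis:
semantic = 1 + prod(v + 1 for v in values)  # 1 + 4*3*3*3*3*3 = 973
print(instances, syntactic, semantic)       # 96 5120 973
```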

  14. Searching hypothesis space
  • Goal is to efficiently search the hypothesis space to find the hypothesis that best fits the training data
  • The hypothesis space is potentially:
  • Very large
  • Possibly infinite

  15. General to Specific Ordering of Hypotheses
  • It is often possible to use a natural general-to-specific ordering of the hypothesis space to organize the search
  • Can often exhaustively search all of the space without explicitly enumerating every hypothesis

  16. Example
  h1 = <Sunny, ?, ?, Strong, ?, ?>
  h2 = <Sunny, ?, ?, ?, ?, ?>
  Which is more general?

  17. Notation
  • For any instance x in X and hypothesis h in H, we say that x satisfies h if and only if h(x) = 1.

  18. Definition
  • Let hj and hk be Boolean-valued functions defined over X. Then hj is more general than or equal to hk (written hj ≥g hk) if and only if
  (∀x ∈ X) [(hk(x) = 1) → (hj(x) = 1)]
  • hj is strictly more general than hk (written hj >g hk) if and only if (hj ≥g hk) and not (hk ≥g hj)
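
For conjunctive hypotheses in the "?"/value/"0" encoding, the ≥g test reduces to a per-attribute comparison; a sketch (helper names are mine) that also answers slide 16's question:

```python
def constraint_geq(cj, ck):
    """True iff constraint cj admits every value that constraint ck admits."""
    return cj == "?" or cj == ck or ck == "0"

def more_general_or_equal(hj, hk):
    """hj >=_g hk: every instance satisfying hk also satisfies hj."""
    return all(constraint_geq(cj, ck) for cj, ck in zip(hj, hk))

def strictly_more_general(hj, hk):
    """hj >_g hk."""
    return more_general_or_equal(hj, hk) and not more_general_or_equal(hk, hj)

# Slide 16: h2 drops the Wind constraint that h1 imposes, so h2 >_g h1.
h1 = ("Sunny", "?", "?", "Strong", "?", "?")
h2 = ("Sunny", "?", "?", "?", "?", "?")
print(strictly_more_general(h2, h1))   # True
```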

  19. Partially Ordered Sets
  • Properties of a partial order:
  • Reflexive
  • Transitive
  • Antisymmetric
  • The hypotheses form a lattice

  20. Important Point
  • The ≥g and >g relations depend only on which instances satisfy the hypotheses, not on the target concept.
  • We will now consider algorithms that take advantage of this partial order among hypotheses to organize the search space

  21. FIND-S Algorithm
  • Approach: start with the most specific hypothesis and generalize it whenever it does not cover a positive training example
  • A hypothesis "covers" a training example if it correctly classifies the example as true

  22. FIND-S
  • Initialize h to the most specific hypothesis in H
  • For each positive training instance x:
      For each attribute constraint ai in h:
        If the constraint ai is satisfied by x, then do nothing
        Else replace ai in h by the next more general constraint that is satisfied by x
  • Output hypothesis h
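
A runnable sketch of FIND-S under the tuple encoding used earlier (function names are mine). Running it on the four EnjoySport examples reproduces the maximally specific consistent hypothesis:

```python
def find_s(examples, n_attrs=6):
    """FIND-S: the maximally specific conjunctive hypothesis consistent
    with the positive examples (negative examples are ignored)."""
    h = ["0"] * n_attrs                      # start at the most specific hypothesis
    for x, label in examples:
        if not label:                        # FIND-S skips negative examples
            continue
        for i, v in enumerate(x):
            if h[i] == "0":                  # first positive example: adopt its values
                h[i] = v
            elif h[i] != v:                  # conflicting value: generalize to "?"
                h[i] = "?"
    return tuple(h)

D = [
    (("Sunny", "Warm", "Normal", "Strong", "Warm", "Same"),   True),
    (("Sunny", "Warm", "High",   "Strong", "Warm", "Same"),   True),
    (("Rainy", "Cold", "High",   "Strong", "Warm", "Change"), False),
    (("Sunny", "Warm", "High",   "Strong", "Cool", "Change"), True),
]
print(find_s(D))   # ('Sunny', 'Warm', '?', 'Strong', '?', '?')
```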

  23. Apply to Training Examples D
  h0 = <0, 0, 0, 0, 0, 0>
  h1 = <Sunny, Warm, Normal, Strong, Warm, Same> (after example 1)
  h2 = <Sunny, Warm, ?, Strong, Warm, Same> (after example 2)
  h3 = h2 (example 3 is negative, so FIND-S ignores it)
  h4 = <Sunny, Warm, ?, Strong, ?, ?> (after example 4)

  24. Traversing the lattice
  [Diagram: FIND-S climbs the partial order from the Specific end toward the General end.]

  25. Properties of FIND-S
  • Each hypothesis is a conjunction of attribute constraints
  • Guaranteed to output the most specific hypothesis within H that is consistent with the positive training examples
  • The final hypothesis is also consistent with the negative examples if:
  • The correct target concept is contained in H
  • The training examples are correct

  26. Consider this example

  27. Issues
  • Has the learner converged to the correct target concept? Are there other consistent hypotheses?
  • Why prefer the most specific hypothesis?
  • Are the training examples consistent?
  • What if there are several maximally specific consistent hypotheses?

  28. Candidate Elimination Algorithm
  • Goal is to output a description of all hypotheses consistent with the training examples
  • Computes this description without explicitly enumerating all members
  • Also called Least Commitment Search
  • Like FIND-S, it uses the more-general-than partial ordering

  29. Definition
  • A hypothesis h is consistent with a set of training examples D iff h(x) = c(x) for each example <x, c(x)> in D.
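
This definition translates directly into code; a short sketch (names mine):

```python
def match(h, x):
    """True iff instance x satisfies hypothesis h."""
    return all(c == "?" or c == v for c, v in zip(h, x))

def consistent(h, D):
    """True iff h(x) = c(x) for every training example <x, c(x)> in D."""
    return all(match(h, x) == label for x, label in D)
```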

  30. Definition
  • The version space, denoted VS_{H,D}, with respect to hypothesis space H and training examples D, is the subset of hypotheses from H consistent with the training examples in D:
  VS_{H,D} = {h ∈ H | Consistent(h, D)}

  31. List-Then-Eliminate Algorithm
  • VersionSpace ← a list of every hypothesis in H
  • For each training example <x, c(x)>:
      Remove from VersionSpace any hypothesis h for which h(x) ≠ c(x)
  • Output the list of hypotheses in VersionSpace
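
A sketch of List-Then-Eliminate for the EnjoySport space, enumerating the 973 semantically distinct hypotheses (one reject-everything hypothesis stands in for all variants containing a "0"; names mine):

```python
from itertools import product

DOMAINS = [["Sunny", "Cloudy", "Rainy"], ["Warm", "Cold"], ["Normal", "High"],
           ["Strong", "Weak"], ["Warm", "Cold"], ["Same", "Change"]]

def match(h, x):
    return all(c == "?" or c == v for c, v in zip(h, x))

def all_hypotheses():
    """All semantically distinct conjunctive hypotheses: the single
    reject-all hypothesis plus every '?'/value combination."""
    yield ("0",) * 6
    yield from product(*[vals + ["?"] for vals in DOMAINS])

def list_then_eliminate(D):
    vs = list(all_hypotheses())
    for x, label in D:
        vs = [h for h in vs if match(h, x) == label]   # eliminate inconsistent h
    return vs

D = [(("Sunny", "Warm", "Normal", "Strong", "Warm", "Same"),   True),
     (("Sunny", "Warm", "High",   "Strong", "Warm", "Same"),   True),
     (("Rainy", "Cold", "High",   "Strong", "Warm", "Change"), False),
     (("Sunny", "Warm", "High",   "Strong", "Cool", "Change"), True)]
print(len(list_then_eliminate(D)))   # 6 hypotheses survive all four examples
```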

  32. More compact representation of VS
  • The candidate-elimination algorithm uses a more compact representation of the version space
  • VS is represented by its most general and most specific members
  • These members form boundaries that delimit the version space within the partially ordered hypothesis space
  • Also called Least Commitment Search

  33. S: {<Sunny, Warm, ?, Strong, ?, ?>}
  (between the boundaries: <Sunny, ?, ?, Strong, ?, ?>, <Sunny, Warm, ?, ?, ?, ?>, <?, Warm, ?, Strong, ?, ?>)
  G: {<Sunny, ?, ?, ?, ?, ?>, <?, Warm, ?, ?, ?, ?>}

  34. [Diagram: the version space lies between the G-set {G1, G2, G3, ..., Gm} and the S-set {S1, S2, S3, ..., Sn}; hypotheses outside these boundaries fall in the inconsistent regions.]

  35. Definitions of general and specific boundaries
  • Definition: The general boundary G, with respect to hypothesis space H and training data D, is the set of maximally general members of H consistent with D.
  • Definition: The specific boundary S, with respect to hypothesis space H and training data D, is the set of minimally general (maximally specific) members of H consistent with D.
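
Given an explicit version space (for instance, the list List-Then-Eliminate returns), the boundary sets can be read off with the ≥g relation; a sketch (names mine):

```python
def geq(hj, hk):
    """hj >=_g hk for the conjunctive '?'/value/'0' encoding."""
    return all(cj == "?" or cj == ck or ck == "0" for cj, ck in zip(hj, hk))

def boundaries(vs):
    """Split an explicit version space into its G-set and S-set."""
    def strictly(hj, hk):                     # hj >_g hk
        return geq(hj, hk) and not geq(hk, hj)
    G = [h for h in vs if not any(strictly(h2, h) for h2 in vs)]  # maximally general
    S = [h for h in vs if not any(strictly(h, h2) for h2 in vs)]  # maximally specific
    return G, S
```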

  36. Theorem 2.1: Version space representation theorem
  Let X be an arbitrary set of instances, H a set of Boolean-valued hypotheses defined over X, c : X → {0, 1} an arbitrary target concept defined over X, and D an arbitrary set of training examples {<x, c(x)>}. For all X, H, c, and D such that S and G are well defined:
  VS_{H,D} = {h ∈ H | (∃s ∈ S)(∃g ∈ G)(g ≥g h ≥g s)}

  37. Candidate-Elimination Learning Algorithm
  • Initialize G to most general and S to most specific
  • Use examples to refine

  38. • Initialize G to the set of maximally general hypotheses in H
  • Initialize S to the set of maximally specific hypotheses in H
  • For each training example d, do:
    • If d is a positive example:
      • Remove from G any hypothesis inconsistent with d
      • For each hypothesis s in S that is not consistent with d:
        • Remove s from S
        • Add to S all minimal generalizations h of s such that h is consistent with d, and some member of G is more general than h
        • Remove from S any hypothesis that is more general than another hypothesis in S
    • If d is a negative example:
      • Remove from S any hypothesis inconsistent with d
      • For each hypothesis g in G that is not consistent with d:
        • Remove g from G
        • Add to G all minimal specializations h of g such that h is consistent with d, and some member of S is more specific than h
        • Remove from G any hypothesis that is less general than another hypothesis in G
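
A compact sketch of the whole algorithm for this conjunctive hypothesis language (names and the DOMAINS table are mine; here the minimal generalization of s is unique, and minimal specializations constrain one "?" at a time). Running it on the four training examples reproduces S4 and G4 from slide 42:

```python
DOMAINS = [["Sunny", "Cloudy", "Rainy"], ["Warm", "Cold"], ["Normal", "High"],
           ["Strong", "Weak"], ["Warm", "Cold"], ["Same", "Change"]]

def match(h, x):
    return all(c == "?" or c == v for c, v in zip(h, x))

def geq(hj, hk):
    """hj >=_g hk for conjunctive hypotheses."""
    return all(cj == "?" or cj == ck or ck == "0" for cj, ck in zip(hj, hk))

def min_generalizations(s, x):
    """Minimal generalizations of s covering positive instance x
    (unique in this hypothesis language)."""
    h = list(s)
    for i, v in enumerate(x):
        if h[i] == "0":
            h[i] = v             # adopt the attribute value
        elif h[i] != v:
            h[i] = "?"           # conflicting values: generalize away
    return [tuple(h)]

def min_specializations(g, x):
    """Minimal specializations of g excluding negative instance x."""
    out = []
    for i, c in enumerate(g):
        if c == "?":
            out += [g[:i] + (v,) + g[i + 1:]
                    for v in DOMAINS[i] if v != x[i]]
    return out

def candidate_elimination(D):
    S = [("0",) * 6]             # most specific boundary
    G = [("?",) * 6]             # most general boundary
    for x, positive in D:
        if positive:
            G = [g for g in G if match(g, x)]
            S = ([s for s in S if match(s, x)] +
                 [h for s in S if not match(s, x)
                    for h in min_generalizations(s, x)
                    if any(geq(g, h) for g in G)])
            # drop S members more general than another S member
            S = [s for s in S if not any(s2 != s and geq(s, s2) for s2 in S)]
        else:
            S = [s for s in S if not match(s, x)]
            G = ([g for g in G if not match(g, x)] +
                 [h for g in G if match(g, x)
                    for h in min_specializations(g, x)
                    if any(geq(h, s) for s in S)])
            # drop G members less general than another G member
            G = [g for g in G if not any(g2 != g and geq(g2, g) for g2 in G)]
    return S, G

D = [(("Sunny", "Warm", "Normal", "Strong", "Warm", "Same"),   True),
     (("Sunny", "Warm", "High",   "Strong", "Warm", "Same"),   True),
     (("Rainy", "Cold", "High",   "Strong", "Warm", "Change"), False),
     (("Sunny", "Warm", "High",   "Strong", "Cool", "Change"), True)]
S, G = candidate_elimination(D)
print("S:", S)   # [('Sunny', 'Warm', '?', 'Strong', '?', '?')]
print("G:", G)   # [('Sunny', '?', '?', '?', '?', '?'), ('?', 'Warm', '?', '?', '?', '?')]
```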

  39. Training Example 1: <Sunny, Warm, Normal, Strong, Warm, Same>, EnjoySport = Yes
  S0: {<0, 0, 0, 0, 0, 0>}
  S1: {<Sunny, Warm, Normal, Strong, Warm, Same>}
  G0 = G1: {<?, ?, ?, ?, ?, ?>}

  40. Training Example 2: <Sunny, Warm, High, Strong, Warm, Same>, EnjoySport = Yes
  S1: {<Sunny, Warm, Normal, Strong, Warm, Same>}
  S2: {<Sunny, Warm, ?, Strong, Warm, Same>}
  G1 = G2: {<?, ?, ?, ?, ?, ?>}

  41. Training Example 3: <Rainy, Cold, High, Strong, Warm, Change>, EnjoySport = No
  S2 = S3: {<Sunny, Warm, ?, Strong, Warm, Same>}
  G2: {<?, ?, ?, ?, ?, ?>}
  G3: {<Sunny, ?, ?, ?, ?, ?>, <?, Warm, ?, ?, ?, ?>, <?, ?, ?, ?, ?, Same>}

  42. Training Example 4: <Sunny, Warm, High, Strong, Cool, Change>, EnjoySport = Yes
  S3: {<Sunny, Warm, ?, Strong, Warm, Same>}
  S4: {<Sunny, Warm, ?, Strong, ?, ?>}
  G3: {<Sunny, ?, ?, ?, ?, ?>, <?, Warm, ?, ?, ?, ?>, <?, ?, ?, ?, ?, Same>}
  G4: {<Sunny, ?, ?, ?, ?, ?>, <?, Warm, ?, ?, ?, ?>}

  43. Questions
  • Does the order of presentation of examples matter?
  • How will you know when the concept has been learned?
  • How do you know when you have presented enough training data?
  • What happens if incorrectly labeled examples are presented?
  • If the learner can request examples, which example should be requested next?

  44. S: {<Sunny, Warm, ?, Strong, ?, ?>}
  (between the boundaries: <Sunny, ?, ?, Strong, ?, ?>, <Sunny, Warm, ?, ?, ?, ?>, <?, Warm, ?, Strong, ?, ?>)
  G: {<Sunny, ?, ?, ?, ?, ?>, <?, Warm, ?, ?, ?, ?>}
  Select an example that would be classified as positive by some hypotheses and negative by others.
  Training example possibility: <Sunny, Warm, Normal, Light, Warm, Same>, Positive

  45. Partially Learned Concepts
  • Can we classify unseen examples even though we still have multiple hypotheses?
  • Yes, for some instances: if every hypothesis in S covers the instance, every version-space member classifies it positive; if no hypothesis in G covers it, every member classifies it negative (see the sketch below).
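
A sketch of boundary-set classification (names mine), applied to the S and G boundaries from slide 33; the trailing comments anticipate instances A–C from slides 47–49:

```python
def match(h, x):
    return all(c == "?" or c == v for c, v in zip(h, x))

def classify(S, G, x):
    """Classify x from the boundary sets alone: unanimous answers are
    safe even though the version space holds many hypotheses."""
    if all(match(s, x) for s in S):
        return True        # every VS member generalizes some s, so all say positive
    if not any(match(g, x) for g in G):
        return False       # every VS member specializes some g, so all say negative
    return None            # members of the version space disagree

S = [("Sunny", "Warm", "?", "Strong", "?", "?")]
G = [("Sunny", "?", "?", "?", "?", "?"), ("?", "Warm", "?", "?", "?", "?")]
print(classify(S, G, ("Sunny", "Warm", "Normal", "Strong", "Cool", "Change")))  # True  (A)
print(classify(S, G, ("Rainy", "Cold", "Normal", "Light", "Warm", "Same")))     # False (B)
print(classify(S, G, ("Sunny", "Warm", "Normal", "Light", "Warm", "Same")))     # None  (C)
```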

  46. Optimal strategy
  • Generate instances that satisfy exactly half of the hypotheses in the current version space
  • The correct target concept can then be found in ⌈log2 |VS|⌉ experiments (see the sketch below)
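
A sketch that searches the 96 instances for queries splitting the six-member version space of slide 33 exactly in half (hypothesis list transcribed from that slide; helper names mine):

```python
from itertools import product
from math import ceil, log2

DOMAINS = [["Sunny", "Cloudy", "Rainy"], ["Warm", "Cold"], ["Normal", "High"],
           ["Strong", "Weak"], ["Warm", "Cold"], ["Same", "Change"]]

VS = [("Sunny", "Warm", "?", "Strong", "?", "?"),   # S boundary
      ("Sunny", "?", "?", "Strong", "?", "?"),
      ("Sunny", "Warm", "?", "?", "?", "?"),
      ("?", "Warm", "?", "Strong", "?", "?"),
      ("Sunny", "?", "?", "?", "?", "?"),           # G boundary
      ("?", "Warm", "?", "?", "?", "?")]

def match(h, x):
    return all(c == "?" or c == v for c, v in zip(h, x))

# instances classified positive by exactly half of the version space
splits = [x for x in product(*DOMAINS)
          if sum(match(h, x) for h in VS) == len(VS) // 2]
print(splits[0])            # e.g. ('Sunny', 'Warm', 'Normal', 'Weak', 'Warm', 'Same')
print(ceil(log2(len(VS))))  # at most 3 well-chosen queries pin down the concept
```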

  47. S: {<Sunny, Warm, ?, Strong, ?, ?>}
  (between the boundaries: <Sunny, ?, ?, Strong, ?, ?>, <Sunny, Warm, ?, ?, ?, ?>, <?, Warm, ?, Strong, ?, ?>)
  G: {<Sunny, ?, ?, ?, ?, ?>, <?, Warm, ?, ?, ?, ?>}
  A: <Sunny, Warm, Normal, Strong, Cool, Change> ?

  48. S: {<Sunny, Warm, ?, Strong, ?, ?>}
  (between the boundaries: <Sunny, ?, ?, Strong, ?, ?>, <Sunny, Warm, ?, ?, ?, ?>, <?, Warm, ?, Strong, ?, ?>)
  G: {<Sunny, ?, ?, ?, ?, ?>, <?, Warm, ?, ?, ?, ?>}
  B: <Rainy, Cold, Normal, Light, Warm, Same> ?

  49. S: {<Sunny, Warm, ?, Strong, ?, ?>}
  (between the boundaries: <Sunny, ?, ?, Strong, ?, ?>, <Sunny, Warm, ?, ?, ?, ?>, <?, Warm, ?, Strong, ?, ?>)
  G: {<Sunny, ?, ?, ?, ?, ?>, <?, Warm, ?, ?, ?, ?>}
  C: <Sunny, Warm, Normal, Light, Warm, Same> ?

  50. S: {<Sunny, Warm, ?, Strong, ?, ?>}
  (between the boundaries: <Sunny, ?, ?, Strong, ?, ?>, <Sunny, Warm, ?, ?, ?, ?>, <?, Warm, ?, Strong, ?, ?>)
  G: {<Sunny, ?, ?, ?, ?, ?>, <?, Warm, ?, ?, ?, ?>}
  D: <Sunny, Cold, Normal, Strong, Warm, Same> ?
