CS 9633 Machine Learning: Concept Learning

References:
• Machine Learning, by Tom Mitchell, 1997, Chapter 2
• Artificial Intelligence: A Modern Approach, by Russell and Norvig, Second Edition, 2003, pages 678-686
• Elements of Machine Learning, by Pat Langley, 1996, Chapter 2
Concept Learning
• Inferring a Boolean-valued function from training examples
• Training examples are labeled as members or non-members of the concept
A Concept Learning Task Is Defined By
• The set of instances over which the target function is defined
• The target function
• The set of candidate hypotheses considered by the learner
• The set of available training examples
Example Concept
• Days when you would enjoy water sports
Instances X
Possible days, each described by six attributes:
• Sky (Sunny, Cloudy, Rainy)
• AirTemp (Warm, Cold)
• Humidity (Normal, High)
• Wind (Strong, Weak)
• Water (Warm, Cool)
• Forecast (Same, Change)
Hypotheses H
• Each hypothesis is a vector of 6 constraints, one specifying the value of each of the 6 attributes
• For each attribute, the constraint is one of:
  • ?, meaning any value is acceptable for this attribute
  • A single required value for the attribute
  • 0, meaning no value is acceptable
• Sample hypothesis: <Rainy, ?, ?, ?, Warm, ?>
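As a concrete sketch (the tuple encoding and the `satisfies` helper below are illustrative choices, not from the slides), hypotheses and instances can be written as Python 6-tuples:

```python
# A hypothesis is a 6-tuple of constraints: a specific attribute value,
# '?' (any value acceptable), or '0' (no value acceptable).
def satisfies(x, h):
    """Return True iff instance x satisfies hypothesis h, i.e. h(x) = 1."""
    return all(c == '?' or c == a for c, a in zip(h, x))

h = ('Rainy', '?', '?', '?', 'Warm', '?')               # the sample hypothesis
x = ('Rainy', 'Cold', 'High', 'Weak', 'Warm', 'Same')   # one possible day
print(satisfies(x, h))                                  # True
```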
General and Specific Hypotheses
• Most general hypothesis: <?, ?, ?, ?, ?, ?>
• Most specific hypothesis: <0, 0, 0, 0, 0, 0>
Target Concept c
EnjoySport : X → {0, 1}
Training Examples D

Example | Sky | AirTemp | Humidity | Wind | Water | Forecast | EnjoySport
1 | Sunny | Warm | Normal | Strong | Warm | Same | Yes
2 | Sunny | Warm | High | Strong | Warm | Same | Yes
3 | Rainy | Cold | High | Strong | Warm | Change | No
4 | Sunny | Warm | High | Strong | Cool | Change | Yes
Determine: A hypothesis h in H such that h(x) = c(x) for all x in X
Inductive Learning Hypothesis
Any hypothesis found to approximate the target function well over a sufficiently large set of training examples will also approximate the target function well over other unobserved examples.
Concept Learning as Search
• Concept learning can be viewed as searching through a large space of hypotheses implicitly defined by the hypothesis representation.
Sample and Hypothesis Space Size
How many instances? How many hypotheses? How many semantically distinct hypotheses?
• Sky (Sunny, Cloudy, Rainy)
• AirTemp (Warm, Cold)
• Humidity (Normal, High)
• Wind (Strong, Weak)
• Water (Warm, Cool)
• Forecast (Same, Change)
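A worked count (a quick sketch; the arithmetic follows directly from the attribute sizes listed above):

```python
from math import prod

sizes = [3, 2, 2, 2, 2, 2]                  # values per attribute

instances = prod(sizes)                     # 3*2*2*2*2*2 = 96 distinct instances
syntactic = prod(n + 2 for n in sizes)      # '?' and '0' added per attribute: 5120
semantic = 1 + prod(n + 1 for n in sizes)   # every hypothesis containing a '0'
                                            # classifies all instances negative,
                                            # so they collapse into one: 973
print(instances, syntactic, semantic)       # 96 5120 973
```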
Searching the Hypothesis Space
• Goal: efficiently search the hypothesis space to find the hypothesis that best fits the training data
• The hypothesis space is potentially very large, possibly infinite
General-to-Specific Ordering of Hypotheses
• It is often possible to use a natural general-to-specific ordering of the hypothesis space to organize the search
• The ordering often lets us exhaustively search the entire space without explicitly enumerating every hypothesis
Example
h1 = <Sunny, ?, ?, Strong, ?, ?>
h2 = <Sunny, ?, ?, ?, ?, ?>
Which is more general? (h2: it imposes fewer constraints, so every instance that satisfies h1 also satisfies h2.)
Notation
• For any instance x in X and hypothesis h in H, we say that x satisfies h if and only if h(x) = 1.
Definition
• Let hj and hk be Boolean-valued functions defined over X. Then hj is more general than or equal to hk (written hj ≥g hk) iff
  (∀x ∈ X) [(hk(x) = 1) → (hj(x) = 1)]
• hj is strictly more general than hk (written hj >g hk) iff (hj ≥g hk) ∧ ¬(hk ≥g hj)
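For the conjunctive representation, the relation reduces to a per-attribute check. A minimal sketch (the function names are mine), reusing the tuple encoding from the `satisfies` helper above:

```python
def more_general_or_equal(hj, hk):
    """hj >=_g hk: every instance that satisfies hk also satisfies hj."""
    if '0' in hk:                 # hk covers no instance, so anything is >=_g hk
        return True
    return all(cj == '?' or cj == ck for cj, ck in zip(hj, hk))

def strictly_more_general(hj, hk):
    return more_general_or_equal(hj, hk) and not more_general_or_equal(hk, hj)

# h2 from the example above is strictly more general than h1:
h1 = ('Sunny', '?', '?', 'Strong', '?', '?')
h2 = ('Sunny', '?', '?', '?', '?', '?')
print(strictly_more_general(h2, h1))   # True
```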
Partially Ordered Sets
• Properties of a partial order:
  • Reflexive
  • Transitive
  • Antisymmetric
• The hypotheses ordered by ≥g form a lattice
Important Point
• The ≥g and >g relations depend only on which instances satisfy the hypotheses, not on the target concept.
• We will now consider algorithms that take advantage of this partial order among hypotheses to organize the search
FIND-S Algorithm
• Approach: start with the most specific hypothesis and generalize it whenever it fails to cover a positive training example
• A hypothesis "covers" a positive training example when it correctly classifies the example as true
FIND-S
• Initialize h to the most specific hypothesis in H
• For each positive training instance x
    For each attribute constraint ai in h
      If the constraint ai is satisfied by x
      Then do nothing
      Else replace ai in h by the next more general constraint that is satisfied by x
• Output hypothesis h
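A runnable sketch of FIND-S under the tuple encoding above (the function name and the inline training set are mine):

```python
def find_s(examples, n_attrs=6):
    """FIND-S: start from the most specific hypothesis <0,...,0> and
    minimally generalize it to cover each positive example."""
    h = ['0'] * n_attrs
    for x, positive in examples:
        if not positive:
            continue                # FIND-S ignores negative examples
        for i, a in enumerate(x):
            if h[i] == '0':
                h[i] = a            # '0' generalizes to the observed value
            elif h[i] != a:
                h[i] = '?'          # conflicting values generalize to '?'
    return tuple(h)

D = [(('Sunny', 'Warm', 'Normal', 'Strong', 'Warm', 'Same'), True),
     (('Sunny', 'Warm', 'High',   'Strong', 'Warm', 'Same'), True),
     (('Rainy', 'Cold', 'High',   'Strong', 'Warm', 'Change'), False),
     (('Sunny', 'Warm', 'High',   'Strong', 'Cool', 'Change'), True)]
print(find_s(D))   # ('Sunny', 'Warm', '?', 'Strong', '?', '?')
```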
Apply to Training Examples D
h0 = <0, 0, 0, 0, 0, 0>
h1 = <Sunny, Warm, Normal, Strong, Warm, Same>   (after example 1)
h2 = <Sunny, Warm, ?, Strong, Warm, Same>        (after example 2)
h3 = h2                                          (example 3 is negative; ignored)
h4 = <Sunny, Warm, ?, Strong, ?, ?>              (after example 4)
Traversing the lattice: FIND-S moves from the specific end of the partial order toward the general end.
Properties of FIND-S
• The hypothesis space is described as a conjunction of attribute constraints
• Guaranteed to output the most specific hypothesis within H that is consistent with the positive training examples
• The final hypothesis is also consistent with the negative examples provided that:
  • The correct target concept is contained in H
  • The training examples are correct
Consider this example
Issues
• Has the learner converged to the correct target concept? Are there other consistent hypotheses?
• Why prefer the most specific hypothesis?
• Are the training examples consistent?
• What if there are several maximally specific consistent hypotheses?
Candidate Elimination Algorithm
• Goal is to output a description of all hypotheses consistent with the training examples
• Computes the description without explicitly enumerating all members
• Also called Least Commitment Search
• Like FIND-S, it uses the more-general-than partial ordering
Definition
• A hypothesis h is consistent with a set of training examples D iff h(x) = c(x) for each example <x, c(x)> in D.
Definition
• The version space, denoted VS_{H,D}, with respect to hypothesis space H and training examples D, is the subset of hypotheses from H consistent with the training examples in D
List-Then-Eliminate Algorithm
• VersionSpace ← a list of every hypothesis in H
• For each training example <x, c(x)>
    Remove from VersionSpace any hypothesis h for which h(x) ≠ c(x)
• Output the list of hypotheses in VersionSpace
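A brute-force sketch of List-Then-Eliminate, reusing `satisfies` and the training set `D` from the sketches above; enumerating all 5120 syntactic hypotheses is feasible only for a space this small:

```python
from itertools import product

def consistent(h, examples):
    """h is consistent with D iff h(x) = c(x) for every <x, c(x)> in D."""
    return all(satisfies(x, h) == label for x, label in examples)

def list_then_eliminate(domains, examples):
    """Enumerate every hypothesis and keep those consistent with D."""
    candidates = product(*[list(vals) + ['?', '0'] for vals in domains])
    return [h for h in candidates if consistent(h, examples)]

domains = [('Sunny', 'Cloudy', 'Rainy'), ('Warm', 'Cold'), ('Normal', 'High'),
           ('Strong', 'Weak'), ('Warm', 'Cool'), ('Same', 'Change')]
version_space = list_then_eliminate(domains, D)
print(len(version_space))   # 6 consistent hypotheses survive
```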
More Compact Representation of VS
• The candidate-elimination algorithm uses a more compact representation of the version space
• VS is represented by its most general and most specific members
• These members form boundaries that delimit the version space within the partially ordered hypothesis space
A version space with its boundary sets:
S: {<Sunny, Warm, ?, Strong, ?, ?>}
   <Sunny, ?, ?, Strong, ?, ?>   <Sunny, Warm, ?, ?, ?, ?>   <?, Warm, ?, Strong, ?, ?>
G: {<Sunny, ?, ?, ?, ?, ?>, <?, Warm, ?, ?, ?, ?>}
[Diagram: the version space lies between the G-set {G1, G2, G3, ..., Gm} and the S-set {S1, S2, S3, ..., Sn}; hypotheses outside these boundaries fall in the inconsistent regions.]
Definitions of General and Specific Boundaries
• Definition: The general boundary G, with respect to hypothesis space H and training data D, is the set of maximally general members of H consistent with D.
• Definition: The specific boundary S, with respect to hypothesis space H and training data D, is the set of minimally general (maximally specific) members of H consistent with D.
Theorem 2.1 (Version Space Representation Theorem)
Let X be an arbitrary set of instances and H a set of Boolean-valued hypotheses defined over X. Let c : X → {0, 1} be an arbitrary target concept defined over X, and let D be an arbitrary set of training examples {<x, c(x)>}. For all X, H, c, and D such that S and G are well defined,
VS_{H,D} = { h ∈ H | (∃s ∈ S)(∃g ∈ G) (g ≥g h ≥g s) }
Candidate-Elimination Learning Algorithm
• Initialize G to the most general and S to the most specific hypotheses
• Use the training examples to refine the boundaries
• Initialize G to the set of maximally general hypotheses in H
• Initialize S to the set of maximally specific hypotheses in H
• For each training example d, do
  • If d is a positive example
    • Remove from G any hypothesis inconsistent with d
    • For each hypothesis s in S that is not consistent with d
      • Remove s from S
      • Add to S all minimal generalizations h of s such that h is consistent with d, and some member of G is more general than h
      • Remove from S any hypothesis that is more general than another hypothesis in S
  • If d is a negative example
    • Remove from S any hypothesis inconsistent with d
    • For each hypothesis g in G that is not consistent with d
      • Remove g from G
      • Add to G all minimal specializations h of g such that h is consistent with d, and some member of S is more specific than h
      • Remove from G any hypothesis that is less general than another hypothesis in G
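A sketch of the algorithm for the conjunctive representation, reusing `satisfies`, `more_general_or_equal`, `D`, and `domains` from the sketches above (the helper names and pruning style are mine):

```python
def min_generalizations(s, x):
    """For conjunctive hypotheses there is a single minimal
    generalization of s that covers instance x."""
    return [tuple(a if c == '0' else (c if c == a else '?')
                  for c, a in zip(s, x))]

def min_specializations(g, x, domains):
    """All minimal specializations of g that exclude instance x."""
    out = []
    for i, c in enumerate(g):
        if c == '?':
            for v in domains[i]:
                if v != x[i]:
                    out.append(g[:i] + (v,) + g[i + 1:])
    return out

def candidate_elimination(examples, domains):
    n = len(domains)
    S = [tuple(['0'] * n)]        # most specific boundary
    G = [tuple(['?'] * n)]        # most general boundary
    for x, positive in examples:
        if positive:
            G = [g for g in G if satisfies(x, g)]
            new_S = []
            for s in S:
                if satisfies(x, s):
                    new_S.append(s)
                else:
                    for h in min_generalizations(s, x):
                        if any(more_general_or_equal(g, h) for g in G):
                            new_S.append(h)
            # keep only the maximally specific members of S
            S = [h for h in new_S
                 if not any(h != h2 and more_general_or_equal(h, h2)
                            for h2 in new_S)]
        else:
            S = [s for s in S if not satisfies(x, s)]
            new_G = []
            for g in G:
                if not satisfies(x, g):
                    new_G.append(g)
                else:
                    for h in min_specializations(g, x, domains):
                        if any(more_general_or_equal(h, s) for s in S):
                            new_G.append(h)
            # keep only the maximally general members of G
            G = [h for h in new_G
                 if not any(h != h2 and more_general_or_equal(h2, h)
                            for h2 in new_G)]
    return S, G

S, G = candidate_elimination(D, domains)
print(S)   # [('Sunny', 'Warm', '?', 'Strong', '?', '?')]
print(G)   # [('Sunny', '?', '?', '?', '?', '?'), ('?', 'Warm', '?', '?', '?', '?')]
```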
Training Example 1: <Sunny, Warm, Normal, Strong, Warm, Same>, EnjoySport = Yes
S0: {<0, 0, 0, 0, 0, 0>}  →  S1: {<Sunny, Warm, Normal, Strong, Warm, Same>}
G0: {<?, ?, ?, ?, ?, ?>}  →  G1: {<?, ?, ?, ?, ?, ?>}
Training Example 2: <Sunny, Warm, High, Strong, Warm, Same>, EnjoySport = Yes
S1: {<Sunny, Warm, Normal, Strong, Warm, Same>}  →  S2: {<Sunny, Warm, ?, Strong, Warm, Same>}
G1: {<?, ?, ?, ?, ?, ?>}  →  G2: {<?, ?, ?, ?, ?, ?>}
Training Example 3: <Rainy, Cold, High, Strong, Warm, Change>, EnjoySport = No
S2: {<Sunny, Warm, ?, Strong, Warm, Same>}  →  S3: {<Sunny, Warm, ?, Strong, Warm, Same>}
G2: {<?, ?, ?, ?, ?, ?>}  →  G3: {<Sunny, ?, ?, ?, ?, ?>, <?, Warm, ?, ?, ?, ?>, <?, ?, ?, ?, ?, Same>}
Training Example 4: <Sunny, Warm, High, Strong, Cool, Change>, EnjoySport = Yes
S3: {<Sunny, Warm, ?, Strong, Warm, Same>}  →  S4: {<Sunny, Warm, ?, Strong, ?, ?>}
G3: {<Sunny, ?, ?, ?, ?, ?>, <?, Warm, ?, ?, ?, ?>, <?, ?, ?, ?, ?, Same>}  →  G4: {<Sunny, ?, ?, ?, ?, ?>, <?, Warm, ?, ?, ?, ?>}
Questions
• Does the order of presentation of examples matter?
• How will you know when the concept has been learned?
• How do you know when you have presented enough training data?
• What happens if incorrectly labeled examples are presented?
• If the learner can request examples, which example should be requested next?
The current version space:
S: {<Sunny, Warm, ?, Strong, ?, ?>}
   <Sunny, ?, ?, Strong, ?, ?>   <Sunny, Warm, ?, ?, ?, ?>   <?, Warm, ?, Strong, ?, ?>
G: {<Sunny, ?, ?, ?, ?, ?>, <?, Warm, ?, ?, ?, ?>}
Select an example that would be classified as positive by some hypotheses and negative by others.
Training example possibility: <Sunny, Warm, Normal, Weak, Warm, Same>  (labeled Positive)
Partially Learned Concepts
• Can we classify unseen examples even though we still have multiple hypotheses?
• Yes, for some instances: an instance is classified positive if it satisfies every member of S, and negative if it satisfies no member of G.
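A sketch of boundary-based classification (the voting convention is mine): checking all of S suffices for a unanimous positive and all of G for a unanimous negative, because every version-space member lies between the two boundaries.

```python
def classify(x, S, G):
    """+1 if every version-space member labels x positive,
    -1 if every member labels it negative, 0 if they disagree."""
    if all(satisfies(x, s) for s in S):      # covers S => covers the whole VS
        return +1
    if not any(satisfies(x, g) for g in G):  # misses G => missed by the whole VS
        return -1
    return 0                                 # partially learned: members disagree

print(classify(('Sunny', 'Warm', 'Normal', 'Strong', 'Cool', 'Change'), S, G))  # +1
print(classify(('Rainy', 'Cold', 'Normal', 'Weak', 'Warm', 'Same'), S, G))      # -1
```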
Optimal Strategy
• Generate instances that satisfy exactly half of the hypotheses in the current VS
• The correct target concept can then be found in ⌈log2 |VS|⌉ experiments
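A worked instance of the bound (assuming the six-hypothesis version space above):

```python
import math
# With |VS| = 6 and each ideal query eliminating half the hypotheses,
# about ceil(log2(6)) = 3 membership queries identify the target concept.
print(math.ceil(math.log2(6)))   # 3
```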
Using the version space above, how is each instance classified?
A: <Sunny, Warm, Normal, Strong, Cool, Change> ?
B: <Rainy, Cold, Normal, Weak, Warm, Same> ?
C: <Sunny, Warm, Normal, Weak, Warm, Same> ?
D: <Sunny, Cold, Normal, Strong, Warm, Same> ?