CS 9633 Machine Learning: Concept Learning

References:
• Machine Learning, by Tom Mitchell, 1997, Chapter 2
• Artificial Intelligence: A Modern Approach, by Russell and Norvig, Second Edition, 2003, pages 678-686
• Elements of Machine Learning, by Pat Langley, 1996, Chapter 2
Concept Learning
• Inferring a Boolean-valued function from training examples
• Training examples are labeled as members or non-members of the concept
A Concept Learning Task Is Defined By
• The set of instances over which the target function is defined
• The target function
• The set of candidate hypotheses considered by the learner
• The set of available training examples
Example Concept
• Days when you would enjoy water sports
Instances X
Possible days, each described by six attributes:
• Sky (Sunny, Cloudy, Rainy)
• AirTemp (Warm, Cold)
• Humidity (Normal, High)
• Wind (Strong, Weak)
• Water (Warm, Cool)
• Forecast (Same, Change)
Hypotheses H
• Each hypothesis is a vector of 6 constraints, one specifying the value of each of the 6 attributes
• For each attribute, the constraint is one of:
  • ?, meaning any value is acceptable for this attribute
  • A single required value for the attribute
  • 0, meaning no value is acceptable
• Sample hypothesis: <Rainy, ?, ?, ?, Warm, ?>
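As a concrete sketch (the tuple encoding and the `satisfies` helper below are illustrative choices, not from the slides), hypotheses and instances can be written as Python 6-tuples:

```python
# A hypothesis is a 6-tuple of constraints: a specific attribute value,
# '?' (any value acceptable), or '0' (no value acceptable).
def satisfies(x, h):
    """Return True iff instance x satisfies hypothesis h, i.e. h(x) = 1."""
    return all(c == '?' or c == a for c, a in zip(h, x))

h = ('Rainy', '?', '?', '?', 'Warm', '?')               # the sample hypothesis
x = ('Rainy', 'Cold', 'High', 'Weak', 'Warm', 'Same')   # one possible day
print(satisfies(x, h))                                  # True
```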
General and Specific Hypotheses
• Most general hypothesis: <?, ?, ?, ?, ?, ?>
• Most specific hypothesis: <0, 0, 0, 0, 0, 0>
Target Concept c
EnjoySport : X → {0, 1}
Training Examples D

Example | Sky | AirTemp | Humidity | Wind | Water | Forecast | EnjoySport
1 | Sunny | Warm | Normal | Strong | Warm | Same | Yes
2 | Sunny | Warm | High | Strong | Warm | Same | Yes
3 | Rainy | Cold | High | Strong | Warm | Change | No
4 | Sunny | Warm | High | Strong | Cool | Change | Yes
Determine: A hypothesis h in H such that h(x) = c(x) for all x in X
Inductive Learning Hypothesis
Any hypothesis found to approximate the target function well over a sufficiently large set of training examples will also approximate the target function well over other unobserved examples.
Concept Learning as Search
• Concept learning can be viewed as searching through a large space of hypotheses implicitly defined by the hypothesis representation.
Sample and Hypothesis Space Size
How many instances? How many hypotheses? How many semantically distinct hypotheses?
• Sky (Sunny, Cloudy, Rainy)
• AirTemp (Warm, Cold)
• Humidity (Normal, High)
• Wind (Strong, Weak)
• Water (Warm, Cool)
• Forecast (Same, Change)
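A worked count (a quick sketch; the arithmetic follows directly from the attribute sizes listed above):

```python
from math import prod

sizes = [3, 2, 2, 2, 2, 2]                  # values per attribute

instances = prod(sizes)                     # 3*2*2*2*2*2 = 96 distinct instances
syntactic = prod(n + 2 for n in sizes)      # '?' and '0' added per attribute: 5120
semantic = 1 + prod(n + 1 for n in sizes)   # every hypothesis containing a '0'
                                            # classifies all instances negative,
                                            # so they collapse into one: 973
print(instances, syntactic, semantic)       # 96 5120 973
```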
Searching the Hypothesis Space
• Goal: efficiently search the hypothesis space to find the hypothesis that best fits the training data
• The hypothesis space is potentially very large, possibly infinite
General-to-Specific Ordering of Hypotheses
• It is often possible to use a natural general-to-specific ordering of the hypothesis space to organize the search
• The ordering often lets us exhaustively search the entire space without explicitly enumerating every hypothesis
Example
h1 = <Sunny, ?, ?, Strong, ?, ?>
h2 = <Sunny, ?, ?, ?, ?, ?>
Which is more general? (h2: it imposes fewer constraints, so every instance that satisfies h1 also satisfies h2.)
Notation
• For any instance x in X and hypothesis h in H, we say that x satisfies h if and only if h(x) = 1.
Definition
• Let hj and hk be Boolean-valued functions defined over X. Then hj is more general than or equal to hk (written hj ≥g hk) iff
  (∀x ∈ X) [(hk(x) = 1) → (hj(x) = 1)]
• hj is strictly more general than hk (written hj >g hk) iff (hj ≥g hk) ∧ ¬(hk ≥g hj)
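For the conjunctive representation, the relation reduces to a per-attribute check. A minimal sketch (the function names are mine), reusing the tuple encoding from the `satisfies` helper above:

```python
def more_general_or_equal(hj, hk):
    """hj >=_g hk: every instance that satisfies hk also satisfies hj."""
    if '0' in hk:                 # hk covers no instance, so anything is >=_g hk
        return True
    return all(cj == '?' or cj == ck for cj, ck in zip(hj, hk))

def strictly_more_general(hj, hk):
    return more_general_or_equal(hj, hk) and not more_general_or_equal(hk, hj)

# h2 from the example above is strictly more general than h1:
h1 = ('Sunny', '?', '?', 'Strong', '?', '?')
h2 = ('Sunny', '?', '?', '?', '?', '?')
print(strictly_more_general(h2, h1))   # True
```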
Partially Ordered Sets
• Properties of a partial order:
  • Reflexive
  • Transitive
  • Antisymmetric
• The hypotheses ordered by ≥g form a lattice
Important Point
• The ≥g and >g relations depend only on which instances satisfy the hypotheses, not on the target concept.
• We will now consider algorithms that take advantage of this partial order among hypotheses to organize the search
FIND-S Algorithm
• Approach: start with the most specific hypothesis and generalize it whenever it fails to cover a positive training example
• A hypothesis "covers" a positive training example when it correctly classifies the example as true
FIND-S
• Initialize h to the most specific hypothesis in H
• For each positive training instance x
    For each attribute constraint ai in h
      If the constraint ai is satisfied by x
      Then do nothing
      Else replace ai in h by the next more general constraint that is satisfied by x
• Output hypothesis h
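A runnable sketch of FIND-S under the tuple encoding above (the function name and the inline training set are mine):

```python
def find_s(examples, n_attrs=6):
    """FIND-S: start from the most specific hypothesis <0,...,0> and
    minimally generalize it to cover each positive example."""
    h = ['0'] * n_attrs
    for x, positive in examples:
        if not positive:
            continue                # FIND-S ignores negative examples
        for i, a in enumerate(x):
            if h[i] == '0':
                h[i] = a            # '0' generalizes to the observed value
            elif h[i] != a:
                h[i] = '?'          # conflicting values generalize to '?'
    return tuple(h)

D = [(('Sunny', 'Warm', 'Normal', 'Strong', 'Warm', 'Same'), True),
     (('Sunny', 'Warm', 'High',   'Strong', 'Warm', 'Same'), True),
     (('Rainy', 'Cold', 'High',   'Strong', 'Warm', 'Change'), False),
     (('Sunny', 'Warm', 'High',   'Strong', 'Cool', 'Change'), True)]
print(find_s(D))   # ('Sunny', 'Warm', '?', 'Strong', '?', '?')
```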
Apply to Training Examples D
h0 = <0, 0, 0, 0, 0, 0>
h1 = <Sunny, Warm, Normal, Strong, Warm, Same>   (after example 1)
h2 = <Sunny, Warm, ?, Strong, Warm, Same>        (after example 2)
h3 = h2                                          (example 3 is negative; ignored)
h4 = <Sunny, Warm, ?, Strong, ?, ?>              (after example 4)
Traversing the lattice: FIND-S moves from the specific end of the partial order toward the general end.
Properties of FIND-S
• The hypothesis space is described as a conjunction of attribute constraints
• Guaranteed to output the most specific hypothesis within H that is consistent with the positive training examples
• The final hypothesis is also consistent with the negative examples provided that:
  • The correct target concept is contained in H
  • The training examples are correct
Consider this example
Issues
• Has the learner converged to the correct target concept? Are there other consistent hypotheses?
• Why prefer the most specific hypothesis?
• Are the training examples consistent?
• What if there are several maximally specific consistent hypotheses?
Candidate Elimination Algorithm
• Goal is to output a description of all hypotheses consistent with the training examples
• Computes the description without explicitly enumerating all members
• Also called Least Commitment Search
• Like FIND-S, it uses the more-general-than partial ordering
Definition
• A hypothesis h is consistent with a set of training examples D iff h(x) = c(x) for each example <x, c(x)> in D.
Definition
• The version space, denoted VS_{H,D}, with respect to hypothesis space H and training examples D, is the subset of hypotheses from H consistent with the training examples in D
List-Then-Eliminate Algorithm
• VersionSpace ← a list of every hypothesis in H
• For each training example <x, c(x)>
    Remove from VersionSpace any hypothesis h for which h(x) ≠ c(x)
• Output the list of hypotheses in VersionSpace
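A brute-force sketch of List-Then-Eliminate, reusing `satisfies` and the training set `D` from the sketches above; enumerating all 5120 syntactic hypotheses is feasible only for a space this small:

```python
from itertools import product

def consistent(h, examples):
    """h is consistent with D iff h(x) = c(x) for every <x, c(x)> in D."""
    return all(satisfies(x, h) == label for x, label in examples)

def list_then_eliminate(domains, examples):
    """Enumerate every hypothesis and keep those consistent with D."""
    candidates = product(*[list(vals) + ['?', '0'] for vals in domains])
    return [h for h in candidates if consistent(h, examples)]

domains = [('Sunny', 'Cloudy', 'Rainy'), ('Warm', 'Cold'), ('Normal', 'High'),
           ('Strong', 'Weak'), ('Warm', 'Cool'), ('Same', 'Change')]
version_space = list_then_eliminate(domains, D)
print(len(version_space))   # 6 consistent hypotheses survive
```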
More Compact Representation of VS
• The candidate-elimination algorithm uses a more compact representation of the version space
• VS is represented by its most general and most specific members
• These members form boundaries that delimit the version space within the partially ordered hypothesis space
A version space with its boundary sets:
S: {<Sunny, Warm, ?, Strong, ?, ?>}
   <Sunny, ?, ?, Strong, ?, ?>   <Sunny, Warm, ?, ?, ?, ?>   <?, Warm, ?, Strong, ?, ?>
G: {<Sunny, ?, ?, ?, ?, ?>, <?, Warm, ?, ?, ?, ?>}
[Diagram: the version space lies between the G-set {G1, G2, G3, ..., Gm} and the S-set {S1, S2, S3, ..., Sn}; hypotheses outside these boundaries fall in the inconsistent regions.]
Definitions of General and Specific Boundaries
• Definition: The general boundary G, with respect to hypothesis space H and training data D, is the set of maximally general members of H consistent with D.
• Definition: The specific boundary S, with respect to hypothesis space H and training data D, is the set of minimally general (maximally specific) members of H consistent with D.
Theorem 2.1 (Version Space Representation Theorem)
Let X be an arbitrary set of instances and H a set of Boolean-valued hypotheses defined over X. Let c : X → {0, 1} be an arbitrary target concept defined over X, and let D be an arbitrary set of training examples {<x, c(x)>}. For all X, H, c, and D such that S and G are well defined,
VS_{H,D} = { h ∈ H | (∃s ∈ S)(∃g ∈ G) (g ≥g h ≥g s) }
Candidate-Elimination Learning Algorithm
• Initialize G to the most general and S to the most specific hypotheses
• Use the training examples to refine the boundaries
• Initialize G to the set of maximally general hypotheses in H
• Initialize S to the set of maximally specific hypotheses in H
• For each training example d, do
  • If d is a positive example
    • Remove from G any hypothesis inconsistent with d
    • For each hypothesis s in S that is not consistent with d
      • Remove s from S
      • Add to S all minimal generalizations h of s such that h is consistent with d, and some member of G is more general than h
      • Remove from S any hypothesis that is more general than another hypothesis in S
  • If d is a negative example
    • Remove from S any hypothesis inconsistent with d
    • For each hypothesis g in G that is not consistent with d
      • Remove g from G
      • Add to G all minimal specializations h of g such that h is consistent with d, and some member of S is more specific than h
      • Remove from G any hypothesis that is less general than another hypothesis in G
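A sketch of the algorithm for the conjunctive representation, reusing `satisfies`, `more_general_or_equal`, `D`, and `domains` from the sketches above (the helper names and pruning style are mine):

```python
def min_generalizations(s, x):
    """For conjunctive hypotheses there is a single minimal
    generalization of s that covers instance x."""
    return [tuple(a if c == '0' else (c if c == a else '?')
                  for c, a in zip(s, x))]

def min_specializations(g, x, domains):
    """All minimal specializations of g that exclude instance x."""
    out = []
    for i, c in enumerate(g):
        if c == '?':
            for v in domains[i]:
                if v != x[i]:
                    out.append(g[:i] + (v,) + g[i + 1:])
    return out

def candidate_elimination(examples, domains):
    n = len(domains)
    S = [tuple(['0'] * n)]        # most specific boundary
    G = [tuple(['?'] * n)]        # most general boundary
    for x, positive in examples:
        if positive:
            G = [g for g in G if satisfies(x, g)]
            new_S = []
            for s in S:
                if satisfies(x, s):
                    new_S.append(s)
                else:
                    for h in min_generalizations(s, x):
                        if any(more_general_or_equal(g, h) for g in G):
                            new_S.append(h)
            # keep only the maximally specific members of S
            S = [h for h in new_S
                 if not any(h != h2 and more_general_or_equal(h, h2)
                            for h2 in new_S)]
        else:
            S = [s for s in S if not satisfies(x, s)]
            new_G = []
            for g in G:
                if not satisfies(x, g):
                    new_G.append(g)
                else:
                    for h in min_specializations(g, x, domains):
                        if any(more_general_or_equal(h, s) for s in S):
                            new_G.append(h)
            # keep only the maximally general members of G
            G = [h for h in new_G
                 if not any(h != h2 and more_general_or_equal(h2, h)
                            for h2 in new_G)]
    return S, G

S, G = candidate_elimination(D, domains)
print(S)   # [('Sunny', 'Warm', '?', 'Strong', '?', '?')]
print(G)   # [('Sunny', '?', '?', '?', '?', '?'), ('?', 'Warm', '?', '?', '?', '?')]
```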
Training Example 1: <Sunny, Warm, Normal, Strong, Warm, Same>, EnjoySport = Yes
S0: {<0, 0, 0, 0, 0, 0>}  →  S1: {<Sunny, Warm, Normal, Strong, Warm, Same>}
G0: {<?, ?, ?, ?, ?, ?>}  →  G1: {<?, ?, ?, ?, ?, ?>}
Training Example 2: <Sunny, Warm, High, Strong, Warm, Same>, EnjoySport = Yes
S1: {<Sunny, Warm, Normal, Strong, Warm, Same>}  →  S2: {<Sunny, Warm, ?, Strong, Warm, Same>}
G1: {<?, ?, ?, ?, ?, ?>}  →  G2: {<?, ?, ?, ?, ?, ?>}
Training Example 3: <Rainy, Cold, High, Strong, Warm, Change>, EnjoySport = No
S2: {<Sunny, Warm, ?, Strong, Warm, Same>}  →  S3: {<Sunny, Warm, ?, Strong, Warm, Same>}
G2: {<?, ?, ?, ?, ?, ?>}  →  G3: {<Sunny, ?, ?, ?, ?, ?>, <?, Warm, ?, ?, ?, ?>, <?, ?, ?, ?, ?, Same>}
Training Example 4: <Sunny, Warm, High, Strong, Cool, Change>, EnjoySport = Yes
S3: {<Sunny, Warm, ?, Strong, Warm, Same>}  →  S4: {<Sunny, Warm, ?, Strong, ?, ?>}
G3: {<Sunny, ?, ?, ?, ?, ?>, <?, Warm, ?, ?, ?, ?>, <?, ?, ?, ?, ?, Same>}  →  G4: {<Sunny, ?, ?, ?, ?, ?>, <?, Warm, ?, ?, ?, ?>}
Questions
• Does the order of presentation of examples matter?
• How will you know when the concept has been learned?
• How do you know when you have presented enough training data?
• What happens if incorrectly labeled examples are presented?
• If the learner can request examples, which example should be requested next?
The current version space:
S: {<Sunny, Warm, ?, Strong, ?, ?>}
   <Sunny, ?, ?, Strong, ?, ?>   <Sunny, Warm, ?, ?, ?, ?>   <?, Warm, ?, Strong, ?, ?>
G: {<Sunny, ?, ?, ?, ?, ?>, <?, Warm, ?, ?, ?, ?>}
Select an example that would be classified as positive by some hypotheses and negative by others.
Training example possibility: <Sunny, Warm, Normal, Weak, Warm, Same>  (labeled Positive)
Partially Learned Concepts
• Can we classify unseen examples even though we still have multiple hypotheses?
• Yes, for some instances: an instance is classified positive if it satisfies every member of S, and negative if it satisfies no member of G.
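A sketch of boundary-based classification (the voting convention is mine): checking all of S suffices for a unanimous positive and all of G for a unanimous negative, because every version-space member lies between the two boundaries.

```python
def classify(x, S, G):
    """+1 if every version-space member labels x positive,
    -1 if every member labels it negative, 0 if they disagree."""
    if all(satisfies(x, s) for s in S):      # covers S => covers the whole VS
        return +1
    if not any(satisfies(x, g) for g in G):  # misses G => missed by the whole VS
        return -1
    return 0                                 # partially learned: members disagree

print(classify(('Sunny', 'Warm', 'Normal', 'Strong', 'Cool', 'Change'), S, G))  # +1
print(classify(('Rainy', 'Cold', 'Normal', 'Weak', 'Warm', 'Same'), S, G))      # -1
```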
Optimal Strategy
• Generate instances that satisfy exactly half of the hypotheses in the current VS
• The correct target concept can then be found in ⌈log2 |VS|⌉ experiments
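A worked instance of the bound (assuming the six-hypothesis version space above):

```python
import math
# With |VS| = 6 and each ideal query eliminating half the hypotheses,
# about ceil(log2(6)) = 3 membership queries identify the target concept.
print(math.ceil(math.log2(6)))   # 3
```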
Using the version space above, how is each instance classified?
A: <Sunny, Warm, Normal, Strong, Cool, Change> ?
B: <Rainy, Cold, Normal, Weak, Warm, Same> ?
C: <Sunny, Warm, Normal, Weak, Warm, Same> ?
D: <Sunny, Cold, Normal, Strong, Warm, Same> ?