Lecture 3 Data Mining Basics Monday, January 22, 2001 William H. Hsu Department of Computing and Information Sciences, KSU http://www.cis.ksu.edu/~bhsu Readings: Chapters 1-2, Witten and Frank; Sections 2.7-2.8, Mitchell
Lecture Outline • Read Chapters 1-2, Witten and Frank; 2.7-2.8, Mitchell • Homework 1: Due Friday, February 2, 2001 (before 12 AM CST) • Paper Commentary 1: Due This Friday (in class) • U. Fayyad, “From Data Mining to Knowledge Discovery” • See guidelines in course notes • Supervised Learning (continued) • Version spaces • Candidate elimination algorithm • Derivation • Examples • The Need for Inductive Bias • Representations (hypothesis languages): a worst-case scenario • Change of representation • Computational Learning Theory
Representing Version Spaces • Hypothesis Space • A finite meet semilattice (partial ordering Less-Specific-Than) • Every pair of hypotheses has a greatest lower bound (GLB) • VS_{H,D} ≡ the consistent poset (partially ordered subset of H) • Definition: General Boundary • General boundary G of version space VS_{H,D}: set of most general members • Most general (minimal) elements of VS_{H,D} ≡ "set of necessary conditions" • Definition: Specific Boundary • Specific boundary S of version space VS_{H,D}: set of most specific members • Most specific (maximal) elements of VS_{H,D} ≡ "set of sufficient conditions" • Version Space • Every member of the version space lies between S and G • VS_{H,D} ≡ { h ∈ H | ∃ s ∈ S . ∃ g ∈ G . g ≥_P h ≥_P s }, where ≥_P denotes Less-Specific-Than (more general than or equal to)
Candidate Elimination Algorithm [1]
1. Initialization
   G ← (singleton) set containing the most general hypothesis in H, denoted {<?, …, ?>}
   S ← set of most specific hypotheses in H, denoted {<Ø, …, Ø>}
2. For each training example d
   If d is a positive example (Update-S)
      Remove from G any hypotheses inconsistent with d
      For each hypothesis s in S that is not consistent with d
         Remove s from S
         Add to S all minimal generalizations h of s such that
            1. h is consistent with d
            2. Some member of G is more general than h
         (These are the greatest lower bounds, or meets, s ∧ d, in VS_{H,D})
         Remove from S any hypothesis that is more general than another hypothesis in S (remove any dominated elements)
Candidate Elimination Algorithm [2]
(continued)
   If d is a negative example (Update-G)
      Remove from S any hypotheses inconsistent with d
      For each hypothesis g in G that is not consistent with d
         Remove g from G
         Add to G all minimal specializations h of g such that
            1. h is consistent with d
            2. Some member of S is more specific than h
         (These are the least upper bounds, or joins, g ∨ d, in VS_{H,D})
         Remove from G any hypothesis that is less general than another hypothesis in G (remove any dominating elements)
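The pseudocode above translates fairly directly into code. Below is a minimal Python sketch for the conjunctive hypothesis language used in this lecture (tuples of attribute constraints, with '?' for "any value" and 'Ø' for the empty value). The helper names (consistent, more_general, min_generalize, min_specializations) are illustrative, not from the lecture or any particular library.

def consistent(h, x, label):
    """A hypothesis predicts positive iff every attribute matches or is '?'."""
    predicts_pos = all(hi == '?' or hi == xi for hi, xi in zip(h, x))
    return predicts_pos == label

def more_general(h1, h2):
    """True if h1 is more general than (or equal to) h2."""
    return all(a == '?' or (a == b) or b == 'Ø' for a, b in zip(h1, h2))

def min_generalize(s, x):
    """Minimal generalization of s that covers positive example x (the meet s ^ x)."""
    return tuple(xi if si == 'Ø' else (si if si == xi else '?')
                 for si, xi in zip(s, x))

def min_specializations(g, x, values):
    """Minimal specializations of g that exclude negative example x."""
    specs = []
    for i, gi in enumerate(g):
        if gi == '?':
            for v in values[i]:
                if v != x[i]:                      # rules out the negative example
                    specs.append(g[:i] + (v,) + g[i + 1:])
    return specs

def candidate_elimination(examples, values):
    n = len(values)
    S = {('Ø',) * n}                               # most specific boundary
    G = {('?',) * n}                               # most general boundary
    for x, label in examples:
        if label:                                  # positive example: Update-S
            G = {g for g in G if consistent(g, x, True)}
            # generalizing a hypothesis that already covers x leaves it unchanged,
            # so minimal generalization can be applied to every member of S
            S = {min_generalize(s, x) for s in S}
            S = {s for s in S if any(more_general(g, s) for g in G)}
            # (for this language S stays a singleton, so no dominance check needed)
        else:                                      # negative example: Update-G
            S = {s for s in S if consistent(s, x, False)}
            newG = set()
            for g in G:
                if consistent(g, x, False):
                    newG.add(g)
                else:
                    newG.update(h for h in min_specializations(g, x, values)
                                if any(more_general(h, s) for s in S))
            # drop any member of G dominated by another member
            G = {g for g in newG
                 if not any(h != g and more_general(h, g) for h in newG)}
    return S, G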
Example Trace
Training examples (last component is the label):
   d1: <Sunny, Warm, Normal, Strong, Warm, Same, Yes>
   d2: <Sunny, Warm, High, Strong, Warm, Same, Yes>
   d3: <Rainy, Cold, High, Strong, Warm, Change, No>
   d4: <Sunny, Warm, High, Strong, Cool, Change, Yes>
Boundary sets after each example:
   S0 = {<Ø, Ø, Ø, Ø, Ø, Ø>}
   S1 = {<Sunny, Warm, Normal, Strong, Warm, Same>}
   S2 = S3 = {<Sunny, Warm, ?, Strong, Warm, Same>}
   S4 = {<Sunny, Warm, ?, Strong, ?, ?>}
   G0 = G1 = G2 = {<?, ?, ?, ?, ?, ?>}
   G3 = {<Sunny, ?, ?, ?, ?, ?>, <?, Warm, ?, ?, ?, ?>, <?, ?, ?, ?, ?, Same>}
   G4 = {<Sunny, ?, ?, ?, ?, ?>, <?, Warm, ?, ?, ?, ?>}
Remaining version space (between S4 and G4) also contains:
   <Sunny, ?, ?, Strong, ?, ?>, <Sunny, Warm, ?, ?, ?, ?>, <?, Warm, ?, Strong, ?, ?>
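Running the sketch above on these four examples reproduces S4 and G4. The per-attribute value sets below are assumptions chosen to match the values appearing in these slides (for instance, Light appears as a Wind value in the next slide's query instances), not the full EnjoySport definition from Mitchell.

values = [('Sunny', 'Rainy'), ('Warm', 'Cold'), ('Normal', 'High'),
          ('Strong', 'Light'), ('Warm', 'Cool'), ('Same', 'Change')]
examples = [
    (('Sunny', 'Warm', 'Normal', 'Strong', 'Warm', 'Same'),  True),    # d1
    (('Sunny', 'Warm', 'High',   'Strong', 'Warm', 'Same'),  True),    # d2
    (('Rainy', 'Cold', 'High',   'Strong', 'Warm', 'Change'), False),  # d3
    (('Sunny', 'Warm', 'High',   'Strong', 'Cool', 'Change'), True),   # d4
]
S, G = candidate_elimination(examples, values)
# Expected: S = {('Sunny', 'Warm', '?', 'Strong', '?', '?')}
#           G = {('Sunny', '?', '?', '?', '?', '?'), ('?', 'Warm', '?', '?', '?', '?')}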
What Next Training Example? • S: {<Sunny, Warm, ?, Strong, ?, ?>}; G: {<Sunny, ?, ?, ?, ?, ?>, <?, Warm, ?, ?, ?, ?>}; version space also contains <Sunny, ?, ?, Strong, ?, ?>, <Sunny, Warm, ?, ?, ?, ?>, <?, Warm, ?, Strong, ?, ?> • What Query Should The Learner Make Next? • How Should These Be Classified? (see the classification sketch below) • <Sunny, Warm, Normal, Strong, Cool, Change> • <Rainy, Cold, Normal, Light, Warm, Same> • <Sunny, Warm, Normal, Light, Warm, Same>
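One way to answer "how should these be classified?" is the boundary-set rule from Mitchell: predict positive when every member of S covers the instance, negative when no member of G covers it, and "don't know" otherwise. A small sketch (covers and classify are illustrative helper names, not lecture code):

def covers(h, x):
    """A conjunctive hypothesis covers x iff every attribute matches or is '?'."""
    return all(hi == '?' or hi == xi for hi, xi in zip(h, x))

def classify(S, G, x):
    if all(covers(s, x) for s in S):
        return True    # every hypothesis in the version space predicts positive
    if not any(covers(g, x) for g in G):
        return False   # no hypothesis in the version space predicts positive
    return None        # boundary members disagree: "don't know"

With S4 and G4 from the trace, the three query instances above come out positive, negative, and "don't know" (the version space is split), respectively.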
What Justifies This Inductive Leap? • Example: Inductive Generalization • Positive example: <Sunny, Warm, Normal, Strong, Cool, Change, Yes> • Positive example: <Sunny, Warm, Normal, Light, Warm, Same, Yes> • Induced S: <Sunny, Warm, Normal, ?, ?, ?> • Why Believe We Can Classify The Unseen? • e.g., <Sunny, Warm, Normal, Strong, Warm, Same> • When is there enough information (in a new case) to make a prediction?
An Unbiased Learner • Example of A Biased H • Conjunctive concepts with don't-cares • What concepts can H not express? (Hint: what are its syntactic limitations?) • Idea • Choose H' that expresses every teachable concept • i.e., H' is the power set of X • Recall: | A → B | = | B |^| A | (A = X; B = {labels}; H' = A → B) • H' = ({Rainy, Sunny} × {Warm, Cold} × {Normal, High} × {None, Mild, Strong} × {Cool, Warm} × {Same, Change}) → {0, 1} • An Exhaustive Hypothesis Language • Consider: H' = disjunctions (∨), conjunctions (∧), negations (¬) over previous H • | H' | = 2^(2 · 2 · 2 · 3 · 2 · 2) = 2^96; | H | = 1 + (3 · 3 · 3 · 4 · 3 · 3) = 973 • What Are S, G For The Hypothesis Language H'? • S ≡ disjunction of all positive examples • G ≡ conjunction of all negated negative examples
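As a quick sanity check on these counts, the snippet below recomputes |X|, |H'|, and |H| from the attribute value-set sizes listed on this slide; the sizes are the only inputs, everything else is arithmetic.

sizes = [2, 2, 2, 3, 2, 2]           # value-set sizes for the six attributes above

num_instances = 1
for k in sizes:
    num_instances *= k               # |X| = 2*2*2*3*2*2 = 96 distinct instances

num_unbiased = 2 ** num_instances    # |H'| = |power set of X| = 2^96 (about 7.9e28)

num_conjunctive = 1
for k in sizes:
    num_conjunctive *= (k + 1)       # each attribute: a specific value or '?'
num_conjunctive += 1                 # plus the single all-Ø hypothesis

print(num_instances, num_unbiased, num_conjunctive)   # 96, 2**96, 973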
Inductive Bias • Components of An Inductive Bias Definition • Concept learning algorithm L • Instances X, target concept c • Training examples D_c = {<x, c(x)>} • L(x_i, D_c) = classification assigned to instance x_i by L after training on D_c • Definition • The inductive bias of L is any minimal set of assertions B such that, for any target concept c and corresponding training examples D_c, ∀ x_i ∈ X . [(B ∧ D_c ∧ x_i) ⊢ L(x_i, D_c)], where A ⊢ B means A logically entails B • Informal idea: preference for (i.e., restriction to) certain hypotheses by structural (syntactic) means • Rationale • Prior assumptions regarding target concept • Basis for inductive generalization
Inductive Systems and Equivalent Deductive Systems • Inductive system: the candidate elimination algorithm, using hypothesis space H, takes the training examples and a new instance as input and outputs a classification of the new instance (or "don't know") • Equivalent deductive system: a theorem prover takes the same training examples and new instance, together with the assertion { c ∈ H }, and produces the same classification (or "don't know") • The assertion { c ∈ H } is the inductive bias made explicit
Three Learners with Different Biases • Rote Learner • Weakest bias: anything seen before, i.e., no bias • Store examples • Classify x if and only if it matches a previously observed example • Version Space Candidate Elimination Algorithm • Stronger bias: concepts belonging to conjunctive H • Store extremal generalizations and specializations • Classify x if and only if it "falls within" S and G boundaries (all members agree) • Find-S • Even stronger bias: most specific hypothesis • Prior assumption: any instance not observed to be positive is negative • Classify x based on S set
Hypothesis Space: A Syntactic Restriction • Recall: 4-variable concept learning problem — inputs x1, x2, x3, x4 feed an unknown function y = f(x1, x2, x3, x4) • Bias: Simple Conjunctive Rules • Only 16 simple conjunctive rules of the form y = x_i ∧ x_j ∧ x_k • y = Ø, x1, …, x4, x1 ∧ x2, …, x3 ∧ x4, x1 ∧ x2 ∧ x3, …, x2 ∧ x3 ∧ x4, x1 ∧ x2 ∧ x3 ∧ x4 • Example above: no simple rule explains the data (counterexamples?) • Similarly for simple clauses (conjunction and disjunction allowed)
Hypothesis Space:m-of-n Rules • m-of-n Rules • 32 possible rules of the form: “y = 1 iff at least m of the following n variables are 1” • Found A Consistent Hypothesis!
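A brief sketch of this restricted space: enumerating every (m, subset) pair over the four inputs yields exactly the 32 m-of-n rules mentioned above, and each rule can be checked for consistency with labelled data. The training set at the bottom is hypothetical; the data table the lecture refers to is not reproduced here.

from itertools import combinations

def m_of_n_rule(m, subset):
    """Classifier: y = 1 iff at least m of the variables indexed in `subset` are 1."""
    return lambda x: int(sum(x[i] for i in subset) >= m)

# Enumerate all m-of-n rules over x1..x4: 4*1 + 6*2 + 4*3 + 1*4 = 32 rules
rules = [(m, subset)
         for n in range(1, 5)
         for subset in combinations(range(4), n)
         for m in range(1, n + 1)]
assert len(rules) == 32

def consistent_rules(data):
    """Return the (m, subset) pairs whose rule fits every labelled example."""
    return [(m, s) for m, s in rules
            if all(m_of_n_rule(m, s)(x) == y for x, y in data)]

# Hypothetical training set in the format ((x1, x2, x3, x4), y)
data = [((1, 0, 1, 1), 1), ((0, 1, 0, 0), 0), ((1, 1, 0, 1), 1), ((0, 0, 1, 0), 0)]
print(consistent_rules(data))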
Views of Learning • Removal of (Remaining) Uncertainty • Suppose the unknown function were known to be an m-of-n Boolean function • Could use training data to infer the function • Learning and Hypothesis Languages • Possible approach to guessing a good, small hypothesis language: • Start with a very small language • Enlarge until it contains a hypothesis that fits the data • Inductive bias • Preference for certain languages • Analogous to data compression (removal of redundancy) • Later: coding the "model" versus coding the "uncertainty" (error) • We Could Be Wrong! • Prior knowledge could be wrong (e.g., y = x4 ∧ one-of(x1, x3) is also consistent) • If the guessed language was wrong, errors will occur on new cases
Two Strategies for Machine Learning • Develop Ways to Express Prior Knowledge • Role of prior knowledge: guides search for hypotheses / hypothesis languages • Expression languages for prior knowledge • Rule grammars; stochastic models; etc. • Restrictions on computational models; other (formal) specification methods • Develop Flexible Hypothesis Spaces • Structured collections of hypotheses • Agglomeration: nested collections (hierarchies) • Partitioning: decision trees, lists, rules • Neural networks; cases, etc. • Hypothesis spaces of adaptive size • Either Case: Develop Algorithms for Finding A Hypothesis That Fits Well • Ideally, will generalize well • Later: Bias Optimization (Meta-Learning, Wrappers)
Terminology • The Version Space Algorithm • Version space: constructive definition • S and G boundaries characterize learner’s uncertainty • Version space can be used to make predictions over unseen cases • Algorithms: Find-S, List-Then-Eliminate, candidate elimination • Consistent hypothesis - one that correctly predicts observed examples • Version space - space of all currently consistent (or satisfiable) hypotheses • Inductive Bias • Strength of inductive bias: how few hypotheses? • Specific biases: based on specific languages • Hypothesis Language • “Searchable subset” of the space of possible descriptors • m-of-n, conjunctive, disjunctive, clauses • Ability to represent a concept
Summary Points • Introduction to Supervised Concept Learning • Inductive Leaps Possible Only if Learner Is Biased • Futility of learning without bias • Strength of inductive bias: proportional to restrictions on hypotheses • Modeling Inductive Learners with Equivalent Deductive Systems • Representing inductive learning as theorem proving • Equivalent learning and inference problems • Syntactic Restrictions • Example: m-of-n concept • Other examples? • Views of Learning and Strategies • Removing uncertainty (“data compression”) • Role of knowledge • Next Lecture: More on Knowledge Discovery in Databases (KDD)