Learning Agents Laboratory Computer Science Department George Mason University

CS 782 Machine Learning
3. Inductive Learning from Examples: Version space learning
Prof. Gheorghe Tecuci
Learning Agents Laboratory, Computer Science Department, George Mason University

Overview
• Instances, concepts and generalization
• Concept learning from examples
• Version spaces and the candidate elimination algorithm
• The LEX system
• The learning bias
• Discussion
• Recommended reading

Basic ontological elements: instances and concepts
An instance is a representation of a particular entity from the application domain.
A concept is a representation of a set of instances.
[Figure: government_of_US_1943 and government_of_Britain_1943, each linked by instance_of to state_government.]
“instance_of” is the relationship between an instance and the concept to which it belongs.
“state_government” represents the set of all entities that are governments of states. This set includes “government_of_US_1943” and “government_of_Britain_1943”, which are called positive examples. An entity which is not an instance of a concept is called a negative example of that concept.

Concept generality
A concept P is more general than another concept Q if and only if the set of instances represented by P includes the set of instances represented by Q.
Example: [Figure: hierarchy fragment with state_government, democratic_government, totalitarian_government, representative_democracy and parliamentary_democracy.]
“subconcept_of” is the relationship between a concept and a more general concept, for example: democratic_government subconcept_of state_government.
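The "more general than" relation can be checked by climbing subconcept_of links. The following is a minimal sketch, assuming the parent links shown in the slide's example figure; the names `SUBCONCEPT_OF` and `is_more_general` are hypothetical, and a concept is treated as more general than itself because the definition uses set inclusion.

```python
# Parent links read off the slide's example hierarchy (an assumption about the
# figure); the dict and function names are hypothetical.
SUBCONCEPT_OF = {
    "democratic_government": "state_government",
    "totalitarian_government": "state_government",
    "representative_democracy": "democratic_government",
    "parliamentary_democracy": "democratic_government",
}


def is_more_general(p, q):
    """Return True if concept p is q itself or an ancestor of q."""
    while q is not None:
        if q == p:
            return True
        q = SUBCONCEPT_OF.get(q)  # climb one subconcept_of link
    return False


print(is_more_general("state_government", "parliamentary_democracy"))      # True
print(is_more_general("democratic_government", "totalitarian_government"))  # False
```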

A generalization hierarchy
[Figure: generalization hierarchy with the following nodes.]
Concepts: governing_body, ad_hoc_governing_body, established_governing_body, other_type_of_governing_body, state_government, group_governing_body, feudal_god_king_government, other_state_government, dictator, other_group_governing_body, democratic_government, monarchy, deity_figure, representative_democracy, parliamentary_democracy, democratic_council_or_board, autocratic_leader, totalitarian_government, chief_and_tribal_council, theocratic_government, police_state, military_dictatorship, fascist_state, religious_dictatorship, theocratic_democracy, communist_dictatorship
Instances: government_of_Italy_1943, government_of_US_1943, government_of_Britain_1943, government_of_Germany_1943, government_of_USSR_1943

Empirical inductive concept learning from examples
Illustration
Given
  Positive examples of cups: P1, P2, ...
  Negative examples of cups: N1, ...
Learn
  A description of the cup concept: has-handle(x), ...
Approach: Compare the positive and the negative examples of a concept, in terms of their similarities and differences, and learn the concept as a generalized description of the similarities of the positive examples.
Why is concept learning important? Concept learning allows the agent to recognize other entities as being instances of the learned concept.

The learning problem
Given
• a language of instances;
• a language of generalizations;
• a set of positive examples (E1, ..., En) of a concept;
• a set of negative examples (C1, ..., Cm) of the same concept;
• a learning bias;
• other background knowledge.
Determine
• a concept description which is a generalization of the positive examples that does not cover any of the negative examples.
Purpose of concept learning
Predict if an instance is an example of the learned concept.

Generalization and specialization rules Learning a concept from examples is based on generalization and specialization rules. A generalization rule is a rule that transforms an expression into a more general expression. A specialization rule is a rule that transforms an expression into a less general expression. The reverse of any generalization rule is a specialization rule.

Discussion
Indicate several generalizations of the following sentence: Students who have lived in Fairfax for more than 3 years.
Indicate several specializations of the following sentence: Students who have lived in Fairfax for more than 3 years.

Generalization (and specialization) rules
• Turning constants into variables
• Climbing the generalization hierarchy
• Dropping conditions
• Generalizing numbers
• Adding alternatives

Turning constants into variables
Generalizes an expression by replacing a constant with a variable.
The set of multi_group_forces with 5 subgroups:
  ?O1 is multi_group_force
      number_of_subgroups 5
Generalization: 5 --> ?N1. Specialization: ?N1 --> 5.
The set of multi_group_forces with any number of subgroups:
  ?O1 is multi_group_force
      number_of_subgroups ?N1
[Figure: example forces Japan_1944_Armed_Forces, Axis_forces_Sicily and Allied_forces_operation_Husky.]

Climbing the generalization hierarchies
Generalizes an expression by replacing a concept with a more general one.
[Figure: democratic_government with subconcepts representative_democracy and parliamentary_democracy.]
The set of single state forces governed by representative democracies:
  ?O1 is single_state_force
      has_as_governing_body ?O2
  ?O2 is representative_democracy
Generalization: representative_democracy --> democratic_government. Specialization: democratic_government --> representative_democracy.
The set of single state forces governed by democracies:
  ?O1 is single_state_force
      has_as_governing_body ?O2
  ?O2 is democratic_government

Dropping conditions
Generalizes an expression by removing a constraint from its description.
The set of multi-member forces that have international legitimacy:
  ?O1 is multi_member_force
      has_international_legitimacy “yes”
Generalization drops the condition; specialization adds it back.
The set of multi-member forces (that may or may not have international legitimacy):
  ?O1 is multi_member_force

Extending intervals
Generalizes an expression by replacing a number with an interval, or by replacing an interval with a larger interval.
The set of multi_group_forces with exactly 5 subgroups:
  ?O1 is multi_group_force
      number_of_subgroups 5
Generalization: 5 --> [3 .. 7]. Specialization: [3 .. 7] --> 5.
The set of multi_group_forces with at least 3 subgroups and at most 7 subgroups:
  ?O1 is multi_group_force
      number_of_subgroups ?N1
  ?N1 is-in [3 .. 7]
Generalization: [3 .. 7] --> [2 .. 10]. Specialization: [2 .. 10] --> [3 .. 7].
The set of multi_group_forces with at least 2 subgroups and at most 10 subgroups:
  ?O1 is multi_group_force
      number_of_subgroups ?N1
  ?N1 is-in [2 .. 10]

Adding alternatives
Generalizes an expression by replacing a concept C1 with the union (C1 U C2), which is a more general concept.
The set of alliances:
  ?O1 is alliance
      has_as_member ?O2
Generalization: alliance --> alliance OR coalition. Specialization: alliance OR coalition --> alliance.
The set including both the alliances and the coalitions:
  ?O1 is alliance OR coalition
      has_as_member ?O2

Generalization and specialization rules
Generalization rule | Specialization rule
Turning constants into variables | Turning variables into constants
Climbing the generalization hierarchies | Descending the generalization hierarchies
Dropping conditions | Adding conditions
Extending intervals | Reducing intervals
Adding alternatives | Dropping alternatives
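Three of these generalization rules can be sketched as small functions over a concept description. This is only an illustration under stated assumptions: a concept is encoded as a dictionary from feature names to constraints, and the names `ANY`, `covers`, `constant_to_variable`, `drop_condition` and `extend_interval` are hypothetical, not part of the course material.

```python
ANY = object()  # plays the role of an unconstrained variable such as ?N1


def satisfies(value, constraint):
    """Check one feature value against a constant, an interval, or ANY."""
    if constraint is ANY:
        return True
    if isinstance(constraint, tuple):  # (low, high) closed interval
        low, high = constraint
        return low <= value <= high
    return value == constraint


def covers(concept, instance):
    """An instance is covered if it satisfies every condition of the concept."""
    return all(f in instance and satisfies(instance[f], c)
               for f, c in concept.items())


def constant_to_variable(concept, feature):
    """Turning constants into variables."""
    return {**concept, feature: ANY}


def drop_condition(concept, feature):
    """Dropping conditions."""
    return {f: c for f, c in concept.items() if f != feature}


def extend_interval(concept, feature, low, high):
    """Extending intervals; assumes the current constraint is a number or an interval."""
    old = concept[feature]
    old_low, old_high = old if isinstance(old, tuple) else (old, old)
    return {**concept, feature: (min(low, old_low), max(high, old_high))}


five_subgroups = {"is": "multi_group_force", "number_of_subgroups": 5}
force_with_7 = {"is": "multi_group_force", "number_of_subgroups": 7}

print(covers(five_subgroups, force_with_7))                                             # False
print(covers(extend_interval(five_subgroups, "number_of_subgroups", 3, 7), force_with_7))  # True
print(covers(constant_to_variable(five_subgroups, "number_of_subgroups"), force_with_7))   # True
```

Each rule returns a concept that covers everything the original covered, which is what makes it a generalization rule; reading each function in reverse gives the matching specialization rule.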

Types of generalizations and specializations
• Operational definition of generalization/specialization
• Generalization/specialization of two concepts
• Minimally general generalization of two concepts
• Maximally general specialization of two concepts
• Least general generalization of two concepts

Operational definition of generalization
Non-operational definition: A concept P is said to be more general than another concept Q if and only if the set of instances represented by P includes the set of instances represented by Q.
Why isn’t this an operational definition? It is not operational because it requires showing that each instance I from a potentially infinite set Q is also in the set P.
Operational definition: A concept P is said to be more general than another concept Q if and only if Q can be transformed into P by applying a sequence of generalization rules.

Generalization of two concepts
Definition: The concept Cg is a generalization of the concepts C1 and C2 if and only if Cg is more general than C1 and Cg is more general than C2.
Example: MANEUVER-UNIT is a generalization of ARMORED-UNIT and INFANTRY-UNIT.
Is the above definition operational? How would you define this operationally?
Operational definition: The concept Cg is a generalization of the concepts C1 and C2 if and only if both C1 and C2 can be transformed into Cg by applying generalization rules (assuming the existence of a complete set of rules).

Generalization of two concepts: example
C1: ?O1 IS COURSE-OF-ACTION
        TOTAL-NUMBER-OF-OFFENSIVE-ACTIONS 10
        TYPE OFFENSIVE
C2: ?O1 IS COURSE-OF-ACTION
        TOTAL-NUMBER-OF-OFFENSIVE-ACTIONS 5
From C1: generalize 10 to [5 .. 10] and drop “?O1 TYPE OFFENSIVE”.
From C2: generalize 5 to [5 .. 10].
C:  ?O1 IS COURSE-OF-ACTION
        TOTAL-NUMBER-OF-OFFENSIVE-ACTIONS ?N1
    ?N1 IS-IN [5 .. 10]
Remark: COA = Course of Action
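The two rules used in this example (extending numbers to an interval and dropping a condition) can be combined into a small routine that generalizes two descriptions. This is a sketch under assumptions: concepts are encoded as dictionaries, intervals as (low, high) tuples, and the function name `generalize` is hypothetical.

```python
def generalize(c1, c2):
    """Generalize two concept descriptions given as feature -> value dicts."""
    result = {}
    # Conditions present in only one concept are dropped.
    for feature in c1.keys() & c2.keys():
        v1, v2 = c1[feature], c2[feature]
        if v1 == v2:
            result[feature] = v1                          # shared condition is kept
        elif isinstance(v1, (int, float)) and isinstance(v2, (int, float)):
            result[feature] = (min(v1, v2), max(v1, v2))  # numbers become an interval
        # Differing symbolic values: the condition is dropped in this sketch.
    return result


C1 = {"IS": "COURSE-OF-ACTION",
      "TOTAL-NUMBER-OF-OFFENSIVE-ACTIONS": 10,
      "TYPE": "OFFENSIVE"}
C2 = {"IS": "COURSE-OF-ACTION",
      "TOTAL-NUMBER-OF-OFFENSIVE-ACTIONS": 5}

C = generalize(C1, C2)
print(C["TOTAL-NUMBER-OF-OFFENSIVE-ACTIONS"])  # (5, 10)
```

With a generalization hierarchy available, differing symbolic values could instead be replaced by a common ancestor (climbing the hierarchy) rather than dropped.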

Specialization of two concepts
Definition: The concept Cs is a specialization of the concepts C1 and C2 if and only if Cs is less general than C1 and Cs is less general than C2.
Example: PENETRATE-MILITARY-TASK is a specialization of MILITARY-MANEUVER and MILITARY-ATTACK.
Operational definition: The concept Cs is a specialization of the concepts C1 and C2 if and only if both C1 and C2 can be transformed into Cs by applying specialization rules (or Cs can be transformed into both C1 and C2 by applying generalization rules). This assumes a complete set of rules.

Other useful definitions
Minimally general generalization: The concept G is a minimally general generalization of A and B if and only if G is a generalization of A and B, and G is not more general than any other generalization of A and B.
Least general generalization: If there is only one minimally general generalization of two concepts A and B, then this generalization is called the least general generalization of A and B.
Maximally general specialization: The concept C is a maximally general specialization of two concepts A and B if and only if C is a specialization of A and B and no other specialization of A and B is more general than C.
Specialization of a concept with a negative example

Concept learning: another illustration
Positive examples:
  Allied_Forces_1943 is equal_partner_multi_state_alliance
      has_as_member US_1943
  European_Axis_1943 is dominant_partner_multi_state_alliance
      has_as_member Germany_1943
Negative examples:
  Somali_clans_1992 is equal_partner_multi_group_coalition
      has_as_member Isasq_somali_clan_1992
Cautious learner
Learned concept:
  ?O1 is multi_state_alliance
      has_as_member ?O2
  ?O2 is single_state_force
A multi-state alliance that has as member a single state force.

Discussion
[Figure: the concept to be learned and the concept learned by a cautious learner.]
What could be said about the predictions of a cautious learner?

Concept learning: yet another illustration
Positive examples:
  Allied_Forces_1943 is equal_partner_multi_state_alliance
      has_as_member US_1943
  European_Axis_1943 is dominant_partner_multi_state_alliance
      has_as_member Germany_1943
Negative examples:
  Somali_clans_1992 is equal_partner_multi_group_coalition
      has_as_member Isasq_somali_clan_1992
Aggressive learner
Learned concept:
  ?O1 is multi_member_force
      has_as_member ?O2
  ?O2 is single_state_force
A multi-member force that has as member a single state force.

Discussion
[Figure: the concept to be learned and the concept learned by an aggressive learner.]
What could be said about the predictions of an aggressive learner?

Discussion
[Figure: the concept to be learned, compared with the concept learned by a cautious learner and the concept learned by an aggressive learner.]
How could one synergistically integrate a cautious learner with an aggressive learner to take advantage of their qualities to compensate for each other’s weaknesses?

Basic idea of version space concept learning
Consider the examples E1, ..., En in sequence.
• Initialize the lower bound to the first positive example (LB = E1) and the upper bound (UB) to the most general generalization of E1.
• If the next example is a positive one, then generalize LB as little as possible to cover it.
• If the next example is a negative one, then specialize UB as little as possible to uncover it and to remain more general than LB.
• Repeat the above two steps with the rest of the examples until UB = LB. This is the learned concept.
[Figure: positive (+) and negative (_) examples, with LB growing and UB shrinking until UB = LB.]

The candidate elimination algorithm (Mitchell, 1978) Let us suppose that we have an example e1 of a concept to be learned. Then, any sentence of the representation language which is more general than this example, is a plausible hypothesis for the concept. The version space is: H = { h | h is more general than e1 }

The candidate elimination algorithm (cont.)
[Figure: the version space H, bounded by UB on the more general side and by LB on the more specific side.]
As new examples and counterexamples are presented to the program, candidate concepts are eliminated from H. This is practically done by updating the set G (which is the set of the most general elements in H) and the set S (which is the set of the most specific elements in H).

The candidate elimination algorithm
1. Initialize S to the first positive example and G to its most general generalization.
2. Accept a new training instance I.
   • If I is a positive example then
     - remove from G all the concepts that do not cover I;
     - generalize the elements in S as little as possible to cover I but remain less general than some concept in G;
     - keep in S the minimally general concepts.
   • If I is a negative example then
     - remove from S all the concepts that cover I;
     - specialize the elements in G as little as possible to uncover I and be more general than at least one element from S;
     - keep in G the maximally general concepts.
3. Repeat 2 until G = S and they contain a single concept C (this is the learned concept).

Illustration of the candidate elimination algorithm
Language of instances: (shape, size)
  shape: {ball, brick, cube}
  size: {large, small}
Language of generalizations: (shape, size)
  shape: {ball, brick, cube, any-shape}
  size: {large, small, any-size}
Input examples:
  shape   size    class
  ball    large   +
  brick   small   –
  cube    large   –
  ball    small   +
Learning process:
  1. +(ball, large):  S = {(ball, large)}, G = {(any-shape, any-size)}
  2. -(brick, small): G = {(ball, any-size), (any-shape, large)}
  3. -(cube, large):  G = {(ball, any-size)}
  4. +(ball, small):  S = {(ball, any-size)}, so S = G
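The run on the (shape, size) language can be reproduced with a short program. This is a minimal sketch, assuming hypotheses are tuples in which any-shape and any-size act as wildcards; all function and variable names are hypothetical, and with this conjunctive language S always holds a single hypothesis, so the "keep the minimally general concepts" step is trivial.

```python
ANY = ("any-shape", "any-size")                          # wildcard per position
DOMAINS = (("ball", "brick", "cube"), ("large", "small"))


def covers(h, x):
    """h covers x if every position is a wildcard or matches x."""
    return all(hv == a or hv == xv for hv, a, xv in zip(h, ANY, x))


def more_general_or_equal(h1, h2):
    return all(v1 == a or v1 == v2 for v1, a, v2 in zip(h1, ANY, h2))


def min_generalization(s, x):
    """Keep matching values, replace mismatches by the wildcard."""
    return tuple(sv if sv == xv else a for sv, a, xv in zip(s, ANY, x))


def min_specializations(g, x):
    """Replace one wildcard of g by a value that differs from x's value."""
    return [g[:i] + (value,) + g[i + 1:]
            for i, (gv, a) in enumerate(zip(g, ANY)) if gv == a
            for value in DOMAINS[i] if value != x[i]]


def candidate_elimination(examples):
    (first, _), rest = examples[0], examples[1:]         # first example must be positive
    S, G = {first}, {ANY}
    trace = [(set(S), set(G))]
    for x, label in rest:
        if label == "+":
            G = {g for g in G if covers(g, x)}
            S = {min_generalization(s, x) for s in S}
            S = {s for s in S if any(more_general_or_equal(g, s) for g in G)}
        else:
            S = {s for s in S if not covers(s, x)}
            candidates = set()
            for g in G:
                if covers(g, x):
                    candidates.update(
                        h for h in min_specializations(g, x)
                        if any(more_general_or_equal(h, s) for s in S))
                else:
                    candidates.add(g)
            # Keep only the maximally general candidates.
            G = {g for g in candidates
                 if not any(h != g and more_general_or_equal(h, g) for h in candidates)}
        trace.append((set(S), set(G)))
    return S, G, trace


EXAMPLES = [(("ball", "large"), "+"), (("brick", "small"), "-"),
            (("cube", "large"), "-"), (("ball", "small"), "+")]
S, G, trace = candidate_elimination(EXAMPLES)
print(S == G == {("ball", "any-size")})  # True
```

The `trace` list holds the (S, G) pair after each example, so the intermediate boundary sets can be compared with the learning process shown on the slide.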

The LEX system
LEX is a system that uses the version space method to learn heuristics for suggesting when the integration operators should be applied for solving symbolic integration problems.
The problem of learning control heuristics
Given
Operators for symbolic integration:
  OP1: ∫ r f(x) dx --> r ∫ f(x) dx
  OP2: ∫ u dv --> uv - ∫ v du, where u = f1(x) and dv = f2(x) dx
  OP3: 1 f(x) --> f(x)
  OP4: ∫ (f1(x) + f2(x)) dx --> ∫ f1(x) dx + ∫ f2(x) dx
  OP5: ∫ sin(x) dx --> -cos(x) + C
  OP6: ∫ cos(x) dx --> sin(x) + C
Find
Heuristics for applying the operators as, for instance, the following one:
  To solve ∫ rx transc(x) dx apply OP2 with u = rx and dv = transc(x) dx

Remarks
The integration operators assure a satisfactory level of competence to the LEX system. That is, LEX is able in principle to solve a significant class of symbolic integration problems. However, in practice, it may not be able to solve many of these problems because this would require too many resources of time and space.
The description of an operator shows when the operator is applicable, while a heuristic associated with an operator shows when the operator should be applied, in order to solve a problem.
LEX tries to discover, for each operator OPi, the definition of the concept: situations in which OPi should be used.

The architecture of LEX
[Figure: the four modules PROBLEM GENERATOR, PROBLEM SOLVER, CRITIC and LEARNER.]
Questions raised by the architecture:
1. What search strategy to use for problem solving?
2. How to characterize individual problem solving steps?
3. How to learn from these steps? How is the initial VS defined?
4. How to generate a new problem?
Example:
PROBLEM GENERATOR: ∫ 3x cos(x) dx
PROBLEM SOLVER:
  ∫ 3x cos(x) dx
  --OP2 with u = 3x, dv = cos(x) dx--> 3x sin(x) - ∫ 3 sin(x) dx
  --OP1--> 3x sin(x) - 3 ∫ sin(x) dx
  --OP5--> 3x sin(x) + 3cos(x) + C
CRITIC: one of the suggested positive training instances:
  ∫ 3x cos(x) dx --> Apply OP2 with u = 3x, dv = cos(x) dx
LEARNER: version space of a proposed heuristic:
  G: ∫ f1(x) f2(x) dx --> Apply OP2 with u = f1(x), dv = f2(x) dx
  S: ∫ 3x cos(x) dx --> Apply OP2 with u = 3x, dv = cos(x) dx
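The final expression in the solver's example, 3x sin(x) + 3cos(x) + C, can be sanity-checked numerically: its derivative should equal the integrand 3x cos(x). The sketch below is only a check of that arithmetic, not part of LEX; the function names and the sample points are hypothetical choices.

```python
import math


def integrand(x):
    return 3 * x * math.cos(x)


def antiderivative(x):
    # Result of applying OP2, OP1 and OP5 in sequence (constant C omitted).
    return 3 * x * math.sin(x) + 3 * math.cos(x)


def derivative(f, x, h=1e-6):
    """Central finite-difference approximation of f'(x)."""
    return (f(x + h) - f(x - h)) / (2 * h)


max_error = max(abs(derivative(antiderivative, x) - integrand(x))
                for x in (0.0, 0.5, 1.0, 2.0, 3.0))
print(max_error < 1e-6)  # True
```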

Illustration of the learning process
Continue learning of the heuristic for applying OP2:
• The problem generator generates a new problem to solve that is useful for learning.
• The problem solver solves this problem.
• The critic extracts positive and negative examples from the problem solving tree.
• The learner refines the version space of the heuristic.

The learning bias A bias is any basis for choosing one generalization over another, other than strict consistency with the observed training examples. Types of bias: - restricted hypothesis space bias; - preference bias.

Restricted hypothesis space bias The hypothesis space H (i.e. the space containing all the possible concept descriptions) is defined by the generalization language. This language may not be capable of expressing all possible classes of instances. Consequently, the hypothesis space in which the concept description is searched is restricted. Some of the restricted spaces investigated: - logical conjunctions (i.e. the learning system will look for a concept description in the form of a conjunction); - linear threshold functions (for exemplar-based representations); - three-layer neural networks with a fixed number of hidden units.

Restricted hypothesis space bias: example
The language of instances consists of triples of bits as, for example: (0, 1, 1), (1, 0, 1).
How many concepts are in this space? There are 2^3 = 8 instances, so the total number of subsets of instances is 2^8 = 256.
The language of generalizations consists of triples of 0, 1, and *, where * means any bit, for example: (0, *, 1), (*, 0, 1).
How many concepts could be represented in this language? This hypothesis space consists of 3 x 3 x 3 = 27 elements.
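The two counts can be confirmed by brute-force enumeration. The sketch below is illustrative; the helper name `extension` and the sample concept are hypothetical.

```python
from itertools import product

instances = list(product((0, 1), repeat=3))        # the 8 bit triples
num_concepts = 2 ** len(instances)                 # every subset of instances is a concept

hypotheses = list(product((0, 1, "*"), repeat=3))  # triples over 0, 1 and *


def extension(h):
    """The set of instances covered by hypothesis h."""
    return frozenset(x for x in instances
                     if all(hv == "*" or hv == xv for hv, xv in zip(h, x)))


expressible = {extension(h) for h in hypotheses}

print(num_concepts)      # 256
print(len(hypotheses))   # 27
print(len(expressible))  # 27

# A concept that the restricted language cannot express:
print(frozenset({(0, 0, 0), (1, 1, 1)}) in expressible)  # False
```

Only 27 of the 256 possible concepts are expressible, so a learner restricted to this language cannot represent, for example, the concept containing exactly (0, 0, 0) and (1, 1, 1).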

Preference bias A preference bias places a preference ordering over the hypotheses in the hypothesis space H. The learning algorithm can then choose the most preferred hypothesis f in H that is consistent with the training examples, and produce this hypothesis as its output. Most preference biases attempt to minimize some measure of syntactic complexity of the hypothesis representation (e.g. shortest logical expression, smallest decision tree). These are variants of Occam's Razor, which is the bias first defined by William of Occam (1300-1349): Given two explanations of data, all other things being equal, the simpler explanation is preferable.

Preference bias: representation How could the preference bias be represented? In general, the preference bias may be implemented as an order relationship 'better(f1, f2)' over the hypothesis space H. Then, the system will choose the "best" hypothesis f, according to the "better" relationship. An example of such a relationship: "less-general-than" which produces the least general expression consistent with the data.
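As an illustration of the "less-general-than" preference, the sketch below picks the least general consistent hypothesis in a language of bit triples with * as a wildcard. The training examples and all names (`less_general`, `consistent`, `best`) are hypothetical and chosen only for this illustration.

```python
from itertools import product

HYPOTHESES = list(product((0, 1, "*"), repeat=3))


def covers(h, x):
    return all(hv == "*" or hv == xv for hv, xv in zip(h, x))


def less_general(f1, f2):
    """better(f1, f2) under this bias: f1 is strictly less general than f2."""
    return f1 != f2 and all(b == "*" or a == b for a, b in zip(f1, f2))


# Hypothetical training data.
positives = [(0, 1, 1), (0, 0, 1)]
negatives = [(1, 1, 1)]

consistent = [h for h in HYPOTHESES
              if all(covers(h, p) for p in positives)
              and not any(covers(h, n) for n in negatives)]

# The preferred hypotheses are those with no better consistent alternative.
best = [h for h in consistent
        if not any(less_general(other, h) for other in consistent)]

print(consistent)  # [(0, '*', 1), (0, '*', '*')]
print(best)        # [(0, '*', 1)]
```

Because "less-general-than" is only a partial order, `best` may in general contain several incomparable hypotheses; here it contains one.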

Problem
Language of instances: An instance is defined by a triplet of the form (specific-color, specific-shape, specific-size)
Language of generalization: (color-concept, shape-concept, size-concept)
Set of examples:
  id   color    shape      size    class
  i1   orange   square     large   +
  i2   blue     ellipse    small   -
  i3   red      triangle   small   +
  i4   green    rectangle  small   -
  i5   yellow   circle     large   +
Background knowledge: [Figure: generalization hierarchies for color, shape and size.]
Task: Apply the candidate elimination algorithm to learn the concept represented by the above examples.

Solution:
+i1: (color = orange) & (shape = square) & (size = large)
  S: {[(color = orange) & (shape = square) & (size = large)]}
  G: {[(color = any-color) & (shape = any-shape) & (size = any-size)]}
-i2: (color = blue) & (shape = ellipse) & (size = small)
  S: {[(color = orange) & (shape = square) & (size = large)]}
  G: {[(color = warm-color) & (shape = any-shape) & (size = any-size)],
      [(color = any-color) & (shape = polygon) & (size = any-size)],
      [(color = any-color) & (shape = any-shape) & (size = large)]}
+i3: (color = red) & (shape = triangle) & (size = small)
  S: {[(color = warm-color) & (shape = polygon) & (size = any-size)]}
  G: {[(color = warm-color) & (shape = any-shape) & (size = any-size)],
      [(color = any-color) & (shape = polygon) & (size = any-size)]}
-i4: (color = green) & (shape = rectangle) & (size = small)
  S: {[(color = warm-color) & (shape = polygon) & (size = any-size)]}
  G: {[(color = warm-color) & (shape = any-shape) & (size = any-size)]}
+i5: (color = yellow) & (shape = circle) & (size = large)
  S: {[(color = warm-color) & (shape = any-shape) & (size = any-size)]}
  G: {[(color = warm-color) & (shape = any-shape) & (size = any-size)]}
The concept is: (color = warm-color) & (shape = any-shape) & (size = any-size); a warm color object
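The solution can be checked with a version of candidate elimination that works over tree-structured attribute hierarchies. This is a sketch under an explicit assumption about the background knowledge: warm-color covers orange, red and yellow, polygon covers square, triangle and rectangle, and every other value sits directly under any-color, any-shape or any-size. The hierarchy encoding and all function names are hypothetical.

```python
# Assumed background knowledge: one child -> parent map per attribute.
PARENT = (
    {"warm-color": "any-color", "blue": "any-color", "green": "any-color",
     "orange": "warm-color", "red": "warm-color", "yellow": "warm-color"},
    {"polygon": "any-shape", "ellipse": "any-shape", "circle": "any-shape",
     "square": "polygon", "triangle": "polygon", "rectangle": "polygon"},
    {"large": "any-size", "small": "any-size"},
)
TOP = ("any-color", "any-shape", "any-size")


def ancestors(i, v):
    """v followed by its ancestors in hierarchy i, most specific first."""
    chain = [v]
    while chain[-1] in PARENT[i]:
        chain.append(PARENT[i][chain[-1]])
    return chain


def more_general(h1, h2):
    """True if each component of h1 equals or is an ancestor of h2's."""
    return all(a in ancestors(i, b) for i, (a, b) in enumerate(zip(h1, h2)))


def generalize(s, x):
    """Minimal generalization of s that covers x (lowest common ancestors)."""
    return tuple(next(a for a in ancestors(i, sv) if a in ancestors(i, xv))
                 for i, (sv, xv) in enumerate(zip(s, x)))


def exclude(i, v, xv):
    """Most general descendants of v in hierarchy i that do not cover xv."""
    out = []
    for child in (c for c, p in PARENT[i].items() if p == v):
        out += exclude(i, child, xv) if child in ancestors(i, xv) else [child]
    return out


def specialize(g, x):
    """Minimal specializations of g that no longer cover x."""
    return [g[:i] + (c,) + g[i + 1:]
            for i in range(len(g)) for c in exclude(i, g[i], x[i])]


def candidate_elimination(examples):
    (first, _), rest = examples[0], examples[1:]  # first example must be positive
    S, G = {first}, {TOP}
    for x, positive in rest:
        if positive:
            G = {g for g in G if more_general(g, x)}
            S = {generalize(s, x) for s in S}
            S = {s for s in S if any(more_general(g, s) for g in G)}
        else:
            S = {s for s in S if not more_general(s, x)}
            cand = set()
            for g in G:
                if more_general(g, x):
                    cand.update(h for h in specialize(g, x)
                                if any(more_general(h, s) for s in S))
                else:
                    cand.add(g)
            # Keep only the maximally general candidates.
            G = {g for g in cand
                 if not any(h != g and more_general(h, g) for h in cand)}
    return S, G


EXAMPLES = [
    (("orange", "square", "large"), True),     # i1
    (("blue", "ellipse", "small"), False),     # i2
    (("red", "triangle", "small"), True),      # i3
    (("green", "rectangle", "small"), False),  # i4
    (("yellow", "circle", "large"), True),     # i5
]
S, G = candidate_elimination(EXAMPLES)
print(S == G == {("warm-color", "any-shape", "any-size")})  # True
```

An instance is treated as a maximally specific hypothesis, so `more_general(h, x)` doubles as the coverage test.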

Does the order of the examples count? Why and how?
Consider the following order:
  id   color    shape      size    class
  i1   orange   square     large   +
  i3   red      triangle   small   +
  i5   yellow   circle     large   +
  i2   blue     ellipse    small   -
  i4   green    rectangle  small   -
