IT/CS 811 Principles of Machine Learning and Inference

Learning by analogy. Prof. Gheorghe Tecuci, Learning Agents Laboratory, Computer Science Department, George Mason University

Overview • Learning by analogy: definition • Design issues • The structure mapping theory • Determinations • Problem solving by analogy • Exercises • Recommended reading

Learning by analogy: definition Learning by analogy means acquiring new knowledge about an input entity by transferring it from a known similar entity. For example, one may infer, by analogy, that the laws of hydraulics are similar to Kirchhoff's laws and Ohm's law. Which is the central intuition supporting the learning by analogy paradigm?

Discussion Central intuition supporting learning by analogy: If two entities are similar in some respects then they could be similar in other respects as well. Examples of analogies: • Pressure drop is like voltage drop. • A variable in a programming language is like a box. Provide other examples of analogies.

Learning by analogy: illustration Illustration: The hydrogen atom is like our solar system. The Sun has a greater mass than the Earth and attracts it, causing the Earth to revolve around the Sun. The nucleus also has a greater mass than the electron and attracts it. Therefore it is plausible that the electron also revolves around the nucleus.

Learning by analogy: the learning problem Given: • A partially known target entity T and a goal concerning it (e.g., the partially understood structure of the hydrogen atom under study). • Background knowledge containing known entities (e.g., knowledge from different domains, including astronomy, geography, etc.). Find: • New knowledge about T obtained from a source entity S belonging to the background knowledge (e.g., in a hydrogen atom the electron revolves around the nucleus, in a way similar to that in which a planet revolves around the sun).

Learning by analogy: the learning method • ACCESS: find a known entity that is analogous with the input entity. In Rutherford’s analogy, access is not necessary because the source entity (the solar system) is already given. • MATCHING: match the two entities and hypothesize knowledge. One may map the nucleus to the sun and the electron to the planet, allowing one to infer that the electron revolves around the nucleus because the nucleus attracts the electron and the mass of the nucleus is greater than the mass of the electron. • EVALUATION: test the hypotheses. A specially designed experiment shows that the electron indeed revolves around the nucleus. • LEARNING: store or generalize the new knowledge. Store that, in a hydrogen atom, the electron revolves around the nucleus. By generalization from the solar system and the hydrogen atom, learn the abstract concept that a central force can cause revolution.

Discussion How does analogy help? Why not just study the structure of the hydrogen atom to discover that new knowledge? We need to perform an experiment anyway to test that the electron revolves around the nucleus.

Learning by analogy: Design issues • ACCESS: find a known entity that is analogous with the input entity. Given a target, how to identify a few potential sources in a very large storage? • MATCHING: match the two entities and hypothesize knowledge. Given a potential source, how to identify the knowledge to hypothesize? • EVALUATION: test the hypotheses. Why and how to test the hypothesized knowledge? • LEARNING: store or generalize the new knowledge. How to learn?

Learning by analogy: Formalization Given: - a target entity T; - a universe of potential sources U; - an access function f1 with a threshold value τ1; - a matching function f2 with a threshold value τ2. Find: - new knowledge about T (using analogical learning).

Learning by analogy: Access Find potential sources for T in U: f1(Sk, T) > τ1. This should result in S1, …, Sn.

Learning by analogy: Matching Find the best match between one of S1, …, Sn and T. Let Sk = A & B & C and T = A' & D, where f2(Sk, T) > τ2 gives the best match. A and A' are the parts of Sk and T that make them analogous: f2(Sk, T) = f2(A, A'). B, C, and D are the other parts of Sk and T. As a side effect of partially matching Sk with T (or totally matching A with A'), one obtains a correspondence (substitution) list σ = (o1 ← o1', ..., on ← on'), where oi is an element of A and oi' is the corresponding element of A'. By applying the substitution σ to Sk one obtains: σ(Sk) = σ(A) & σ(B) & σ(C) = A' & σ(B) & σ(C) = A' & B' & C'. By analogy with Sk one concludes that T might also have the features B' & C'.
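The substitution step can be sketched in Python. This is a minimal illustration under stated assumptions, not the course's implementation: facts are assumed to be tuples of the form (predicate, arg1, arg2), and the names `substitute` and `candidate_inferences` are hypothetical.

```python
def substitute(term, sigma):
    """Apply the substitution sigma (source symbol -> target symbol) to a term."""
    if isinstance(term, tuple):
        return tuple(substitute(part, sigma) for part in term)
    return sigma.get(term, term)  # symbols without a correspondence are kept


def candidate_inferences(source, target, sigma):
    """Return sigma(Sk) minus the facts T already has: the hypothesized B' & C'."""
    image = {substitute(fact, sigma) for fact in source}
    return image - set(target)


# Assumed toy encoding of a fragment of the solar system / hydrogen atom example.
source = {
    ("attracts", "sun", "planet"),         # plays the role of A
    ("revolves-around", "sun", "planet"),  # plays the role of B
    ("color", "sun", "yellow"),            # plays the role of C
}
target = {("attracts", "nucleus", "electron")}  # plays the role of A'
sigma = {"sun": "nucleus", "planet": "electron"}

hypotheses = candidate_inferences(source, target, sigma)
```

Here `hypotheses` contains the images of the B and C facts; deciding which of them actually hold for T is left to the evaluation phase.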

Learning by analogy: Evaluation and learning • By analogy with Sk one concludes that T might also have the features B' & C'. • However, the evaluation phase shows that T has the features B' but does not have the features C'. • Therefore: - B represents the part of Sk that is transferred to T because of the similarity between A and A'; - C is the part of Sk that is not transferred to T; - D represents the features that are specific to T.

Case study discussion: Rutherford’s analogy "The hydrogen atom is like our solar system". In this case, the fact that S and T are analogous is already known. Therefore, the access part is solved, and the only remaining purpose of the matching function is to identify the correct correspondence between the elements of the solar system and those of the hydrogen atom. This is an example of a special (simpler) form of analogy: “A T is like an S.” This is useful mostly in teaching based on analogy.

Case study discussion: potential matchings Which are the possible matchings between the elements of S and the elements of T? [Figure: semantic networks of the solar system (sun and planet, with color, mass, temperature, attracts, greater, causes, and revolves-around links) and of the hydrogen atom (nucleus and electron, with mass, attracts, and greater links).]

Case study discussion: potential matchings There are several possible matchings between the elements of S and the elements of T, and one has to select the best one:
Matching1: sun ↔ nucleus, planet ↔ electron, Msun ↔ Mnucleus, Mplanet ↔ Melectron, which is supported by the following correspondences: mass(sun, Msun) ↔ mass(nucleus, Mnucleus), mass(planet, Mplanet) ↔ mass(electron, Melectron), greater(Msun, Mplanet) ↔ greater(Mnucleus, Melectron), attracts(sun, planet) ↔ attracts(nucleus, electron).
Matching2: sun ↔ nucleus, planet ↔ electron, Tsun ↔ Mnucleus, Tplanet ↔ Melectron, which is supported by the following correspondences: greater(Tsun, Tplanet) ↔ greater(Mnucleus, Melectron), attracts(sun, planet) ↔ attracts(nucleus, electron).
Matching3: sun ↔ electron, planet ↔ nucleus, Msun ↔ Melectron, Mplanet ↔ Mnucleus.

Similarity estimation issues and sample solutions 1. How to search the space of all possible matchings? Exhaustive search. Other solutions? 2. How to measure the similarity of two elements? Two elements are similar if they represent the same concept or are subconcepts of the same concept. In such a case their similarity may be considered 1 (on a 0-1 scale). Other solutions? 3. How to combine the estimated similarities of the parts in order to obtain the similarity between S and T? The similarity of two entities is the sum of the similarities of their elements. Other solutions? 4. How to define the similarity threshold? The similarity threshold is defined by the designer (a hard, critical issue). Other solutions?
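The sample solutions above (exhaustive search, similarity 1 per match, sum of similarities) can be sketched as follows. This is only an illustration under assumptions: facts are encoded as (predicate, arg1, arg2) tuples, the similarity of a matching is taken to be the number of source facts whose image is a target fact, and the function names are hypothetical.

```python
from itertools import permutations

# Assumed encoding of the case-study facts.
SOURCE = {
    ("mass", "sun", "Msun"),
    ("mass", "planet", "Mplanet"),
    ("temperature", "sun", "Tsun"),
    ("temperature", "planet", "Tplanet"),
    ("greater", "Msun", "Mplanet"),
    ("greater", "Tsun", "Tplanet"),
    ("attracts", "sun", "planet"),
    ("color", "sun", "yellow"),
}
TARGET = {
    ("mass", "nucleus", "Mnucleus"),
    ("mass", "electron", "Melectron"),
    ("greater", "Mnucleus", "Melectron"),
    ("attracts", "nucleus", "electron"),
}


def symbols(facts):
    """All argument symbols that occur in a set of facts."""
    return sorted({arg for _, *args in facts for arg in args})


def score(mapping, source, target):
    """Sum of similarities: 1 for each source fact whose image is a target fact."""
    total = 0
    for pred, *args in source:
        if all(a in mapping for a in args):
            if (pred, *[mapping[a] for a in args]) in target:
                total += 1
    return total


def best_matching(source, target):
    """Exhaustive search over one-to-one assignments of source to target symbols."""
    src, tgt = symbols(source), symbols(target)
    best, best_score = {}, -1
    for chosen in permutations(src, len(tgt)):
        mapping = dict(zip(chosen, tgt))
        s = score(mapping, source, target)
        if s > best_score:
            best, best_score = mapping, s
    return best, best_score


matching, similarity = best_matching(SOURCE, TARGET)
```

On this encoding the search selects the correspondences of Matching1. Exhaustive search grows factorially with the number of symbols, which is why the slide asks for other solutions.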

Case study discussion: Matching result The best matching is Matching1 (because it leads to the highest number of common features of the solar system and the hydrogen atom), which gives the following substitution: σ = (sun ← nucleus, planet ← electron, Msun ← Mnucleus, Mplanet ← Melectron) By applying the substitution to the solar system, one obtains a new structure. [Figure: the solar-system network after the substitution, with the transferable features shown in light color.] The features that could be transferred to the hydrogen atom as a result of the analogy with the solar system are: • color(nucleus, yellow) • temperature(nucleus, Tn) • temperature(electron, Te) • greater(Tn, Te) • revolves-around(nucleus, electron) • causes((attracts(nucleus, electron), greater(Mnucleus, Melectron)), revolves-around(nucleus, electron))

Case study discussion: Evaluation The evaluation phase shows that the hydrogen atom has the features: • revolves-around(nucleus, electron) • causes((attracts(nucleus, electron), greater(Mnucleus, Melectron)), revolves-around(nucleus, electron)) The hydrogen atom does not have the features: • color(nucleus, yellow) • temperature(nucleus, Tn) • temperature(electron, Te) • greater(Tn, Te) Which is, in your opinion, the most critical issue in analogical learning?

Discussion Which is the most critical issue in analogical learning? What kind of features may be transferred from the source to the target so as to make sound analogical inferences ?

Case study discussion: transfer of causal relation

Case study discussion: Learning Store the newly acquired knowledge about the hydrogen atom: • revolves-around(nucleus, electron) • causes((attracts(nucleus, electron), greater(Mnucleus, Melectron)), revolves-around(nucleus, electron)) By generalization from the solar system and the hydrogen atom one may learn the abstract concept that a central force can cause revolution: • causes((attracts(x, y), greater(Mx, My)), revolves-around(x, y)) Question: When to store the acquired knowledge and when to generalize it?
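One way to obtain such a generalization is to turn into variables the places where the source and target expressions differ (anti-unification). The sketch below is an assumption about how this could be done, not the method used in the course; terms are encoded as nested tuples and the variable names ?x1, ?x2, ... are generated for illustration.

```python
def generalize(s, t, table=None):
    """Keep what two terms share; replace differing sub-terms by variables."""
    if table is None:
        table = {}
    if s == t:
        return s
    if isinstance(s, tuple) and isinstance(t, tuple) and len(s) == len(t):
        return tuple(generalize(a, b, table) for a, b in zip(s, t))
    # The same pair of differing sub-terms always receives the same variable.
    if (s, t) not in table:
        table[(s, t)] = f"?x{len(table) + 1}"
    return table[(s, t)]


solar_rule = (
    "causes",
    (("attracts", "sun", "planet"), ("greater", "Msun", "Mplanet")),
    ("revolves-around", "sun", "planet"),
)
atom_rule = (
    "causes",
    (("attracts", "nucleus", "electron"), ("greater", "Mnucleus", "Melectron")),
    ("revolves-around", "nucleus", "electron"),
)

abstract_rule = generalize(solar_rule, atom_rule)
```

The result has the same shape as the abstract concept on the slide, with ?x1 and ?x2 in place of x and y and ?x3 and ?x4 in place of Mx and My.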

Analogy in Disciple [Figure: an initial example and a similar example, related through their explanations and an analogy criterion.]
Analogy criterion: ?O1 instance_of multi_member_force, ?O2 instance_of force, ?O1 has_as_member ?O2.
Explanation of the initial example (less general than the criterion): Allied_Forces_1943 has_as_member US_1943.
Similar explanation (less general than the criterion): European_Axis_1943 has_as_member Germany_1943.
Initial example: I need to identify and test a strategic COG candidate corresponding to a member of a force. The force is Allied_Forces_1943. Therefore I need to identify and test a strategic COG candidate for a force. The force is US_1943.
Similar example: I need to identify and test a strategic COG candidate corresponding to a member of a force. The force is European_Axis_1943. Therefore I need to identify and test a strategic COG candidate for a force. The force is Germany_1943.

Causal networks of relations An important result of the research on learning by analogy is that analogy involves mapping some underlying causal network of relations between analogous situations. By causal network of relations one generally means a set of relations related by special higher-order relations such as 'physical-cause(ri, rj)', 'logically-implies(ri, rj)', 'enables(ri, rj)', 'justifies(ri, rj)', 'determines(ri, rj)', etc. The idea is that similar causes are expected to have similar effects. [Figure: the basic scheme of analogy.]

Gentner’s structure mapping theory The main claim of this theory is that relations between objects, rather than attributes of objects, are mapped from source to target. Moreover, a relation that belongs to a mappable system of mutually interconnecting relationships is more likely to be imported into the target than is an isolated relation (the systematicity principle). See: Gentner D., The mechanisms of analogical reasoning, in J.W.Shavlik, T.G.Dietterich (eds), Readings in Machine Learning, Morgan Kaufmann, 1990.

Gentner’s structure mapping theory (cont.) Analogy maps the objects of the source onto the objects of the target: s1 ↔ t1, ..., sn ↔ tn. These object correspondences are used to generate the candidate set of inferences in the target domain. Predicates from the source are carried across to the target, using the node substitutions dictated by the object correspondences, according to the following rules: 1. Discard attributes of objects: A(si) -/-> A(ti). For instance, the yellow color of the sun is not transferred to the hydrogen nucleus. 2. Try to preserve relations between objects: R(si, sj) -?-> R(ti, tj). That is, some relations are transferred to the target, while others are not. 3. The systematicity principle: the relations that are most likely to be transferred are those belonging to systems of interconnected relations: R'(R1(si, sj), R2(sk, sl)) → R'(R1(ti, tj), R2(tk, tl)).
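The three rules can be illustrated with a small Python sketch. It is a simplification under stated assumptions, not Gentner's algorithm: facts are nested tuples, the set of attribute predicates is supplied by hand, conjunction is written with an explicit "and" node, and "belonging to a system" is approximated by taking part in a higher-order relation.

```python
# Assumption: which predicates describe attributes of single objects.
ATTRIBUTE_PREDICATES = {"color"}

SOURCE_FACTS = [
    ("color", "sun", "yellow"),
    ("attracts", "sun", "planet"),
    ("greater", "Msun", "Mplanet"),
    ("greater", "Tsun", "Tplanet"),
    ("revolves-around", "sun", "planet"),
    ("causes",
     ("and", ("attracts", "sun", "planet"), ("greater", "Msun", "Mplanet")),
     ("revolves-around", "sun", "planet")),
]


def sub_relations(fact):
    """Yield every relation nested, at any depth, inside a higher-order fact."""
    for arg in fact[1:]:
        if isinstance(arg, tuple):
            yield arg
            yield from sub_relations(arg)


def rank_for_transfer(facts):
    """Split the relations into (preferred, isolated) transfer candidates."""
    # Rule 1: attributes of objects are discarded.
    relations = [f for f in facts if f[0] not in ATTRIBUTE_PREDICATES]
    # Rule 3: relations connected through a higher-order relation form a system.
    systematic = set()
    for fact in relations:
        nested = list(sub_relations(fact))
        if nested:
            systematic.add(fact)
            systematic.update(nested)
    preferred = [f for f in relations if f in systematic]
    # Rule 2: isolated relations remain possible, but less likely, candidates.
    isolated = [f for f in relations if f not in systematic]
    return preferred, isolated


preferred, isolated = rank_for_transfer(SOURCE_FACTS)
```

On this encoding the temperature comparison greater(Tsun, Tplanet) ends up isolated, while the relations tied together by causes are preferred, which agrees with the outcome of the case study.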

Literal similarity, analogy, and abstraction Gentner's theory distinguishes between literal similarity, analogy, and abstraction. One says that a target T is literally similar to a source S if and only if a large number of predicates are mapped from source to target, relative to the number of nonmapped predicates, and the mapped predicates include both attributes of objects and relations between objects. For instance, 'Kool-Aid' is literally similar to 'water' since it has most of the features of 'water' (both attributes of objects and relations between objects). Give other examples of literally similar entities.

Literal similarity, analogy, and abstraction One says that a target T is analogous to a source S if and only if relations between objects, but few or no attributes of objects, can be mapped from source to target. For instance, 'heat' is analogous to 'water'. One says that a source S is an abstraction of a target T if and only if the source is an abstract relational structure and each predicate (a relation between objects or an attribute of an object) from the abstract source is mapped into a less abstract predicate of the target; there are no nonmapped predicates. For instance, 'through-variable' is an abstraction of 'heat', where by 'through-variable' we mean something that flows across a difference in potential. Give other examples of abstractions.

Similarity, analogy, and abstraction: discussion Given that two entities overlap in relations, they are more literally similar to the extent that their object attributes also overlap. Therefore, literal similarity might be seen as a particular case of analogy. Abstraction may also be seen as a special case of analogy in which all the predicates of the source entity are mapped into the target entity. What could we conclude from these observations?

Similarity, analogy, and abstraction: discussion What could we conclude from these observations? Overlap in relations is necessary for any perception of similarity, analogy or abstraction. The contrast between literal similarity, analogy, and abstraction is a continuum.

Gentner’s theory: implementation and discussion An implementation of the Structure-Mapping theory is the Structure-Mapping Engine (Falkenhainer, Forbus & Gentner, 1989: “The Structure-Mapping Engine: Algorithm and Examples,” Artificial Intelligence, 41:1-63. Also in Readings in Knowledge Acquisition and Learning). Given the descriptions of a source and a target, the Structure-Mapping Engine constructs all syntactically consistent analogical mappings between them. Each mapping consists of pairwise matches between predicates and objects in the source and target, plus a list of predicates that exist in the source but not in the target. This list of predicates is the set of candidate inferences sanctioned by the analogy. The Structure-Mapping Engine evaluates each possible analogy syntactically to find the best one.

Gentner’s theory: implementation and discussion The Structure-Mapping Engine needs to be given the descriptions of a source and a target. This requires the ACCESS problem to be solved first: How do we find potential sources for a target? MAC/FAC (Forbus, Gentner, Law, 1995: “MAC/FAC: A model of similarity-based retrieval,” Cognitive Science, 19(2):141-205) is a system that addresses the access problem. The MAC stage uses a simple, nonstructural matcher to filter out a few promising candidates from a large memory of structured descriptions. The FAC stage evaluates each candidate using SME to provide a structural match. MAC/FAC was scaled up in DARPA’s HPKB and RKF programs. What is, however, a problem with Gentner’s theory?
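The two-stage idea (a cheap filter followed by an expensive structural comparison) can be sketched as below. This is not the MAC/FAC code: the nonstructural matcher is assumed here to be a dot product of predicate counts, the structural scorer is passed in as a parameter, and `toy_structural_score` is only a stand-in for SME used to make the example runnable.

```python
from collections import Counter


def content_vector(facts):
    """Count how often each predicate occurs, ignoring its arguments."""
    return Counter(pred for pred, *_ in facts)


def mac_score(probe, candidate):
    """Cheap nonstructural score: dot product of the predicate counts."""
    p, c = content_vector(probe), content_vector(candidate)
    return sum(p[k] * c[k] for k in p)


def retrieve(probe, memory, structural_score, keep=2):
    """Filter with the cheap score, then pick the best structural match."""
    shortlist = sorted(
        memory, key=lambda name: mac_score(probe, memory[name]), reverse=True
    )[:keep]
    return max(shortlist, key=lambda name: structural_score(probe, memory[name]))


def toy_structural_score(probe, candidate):
    """Stand-in for SME (an assumption): number of distinct shared predicates."""
    return len({f[0] for f in probe} & {f[0] for f in candidate})


# Hypothetical memory of structured descriptions.
memory = {
    "solar_system": [
        ("mass", "sun", "Ms"), ("mass", "planet", "Mp"),
        ("greater", "Ms", "Mp"), ("attracts", "sun", "planet"),
        ("revolves-around", "planet", "sun"),
    ],
    "magnets": [
        ("attracts", "magnet", "iron"), ("attracts", "magnet", "nickel"),
        ("attracts", "magnet", "cobalt"),
    ],
    "family": [("parent", "ann", "bob"), ("older", "ann", "bob")],
}
atom = [
    ("mass", "nucleus", "Mn"), ("mass", "electron", "Me"),
    ("greater", "Mn", "Me"), ("attracts", "nucleus", "electron"),
]

best_source = retrieve(atom, memory, toy_structural_score)
```

The design point is that the expensive scorer runs only on the `keep` candidates that survive the cheap filter, so the cost of structural matching does not grow with the size of the memory.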

Gentner’s theory: discussion Could you think of a different representation where the following expression is no longer a second-order relation? causes((attracts(nucleus, electron), greater(Mnucleus, Melectron)), revolves-around(nucleus, electron)) What is a problem with Gentner’s theory? Gentner’s interpretation rules depend only on the syntactic properties of the knowledge representation, and not on the specific content of the domain. Why is this a problem? Consider these equivalent representations: Book1-on-Table and On(Book1, Table). The first is an atomic proposition and the second is a relation between objects, so purely syntactic rules treat them differently although they state the same fact.

Determinations: Definition Instead of giving a general criterion for the validity of analogical knowledge transfer (higher-order relations or a causal network of relations), Russell and Davies propose to specify explicitly what knowledge can be transferred. The rules for specifying this are called "determination rules". P(x, y) >- Q(x, z) (P plausibly determines Q) meaning ∀S, ∀T { if ∃y [P(S, y) & P(T, y)] then it is probably true that ∃z [Q(S, z) & Q(T, z)] } where P and Q are first-order logical expressions.

Determinations: Definition (cont.) A determination rule is an expression of the following form: U(x1,...,xn,y1,...,ym) >- V(x1,...,xn,z1,...,zp) One says that U determines V. That is, whenever the arguments of U have certain values, the arguments of V are very likely to have corresponding values. Example: Rainfall(x, y) >- Water-in-soil(x, z) Rainfall(Philippines, heavy), Water-in-soil(Philippines, high)

Analogical reasoning based on determinations Given: Rainfall(x, y) >- Water-in-soil(x, z) Rainfall(Philippines, heavy), Water-in-soil(Philippines, high) Rainfall(Vietnam, heavy) Conclude: Water-in-soil(Vietnam, high) What is the difference between a determination rule and a deductive rule?

Determinations: Discussion A determination rule is different from a deductive rule. The form of a deductive rule is: U(x1,...,xn,y1,...,ym) --> V(x1,...,xn,y1,...,ym) That is, the variables which appear in the left hand side of a rule also appear in the right hand side. Therefore, if we know that 'U(a1,...,an,b1,...,bm)' is true, we could apply modus ponens to infer that 'V(a1,...,an,b1,...,bm)' is also true. This type of reasoning is not possible in the case of a determination U(x1,...,xn,y1,...,ym) >- V(x1,...,xn,z1,...,zp) because we do not know the values of the variables z1,...,zp. In order to apply a determination rule, one would need a source entity, as will be illustrated in the following.

Analogy based on determinations: Method The basic procedure for answering the query V(T, ?z) by analogy:
1. Find a determination such that U(?x, ?y) >- V(?x, ?z) (i.e. decide which determinations could be relevant for T: U(T, ?y) >- V(T, ?z))
2. Find 'a' such that U(T, a) (i.e. find how the facts are instantiated in the target)
3. Find a source S such that U(S, a) (i.e. find a suitable source)
4. Find 'b' such that V(S, b) (i.e. find the answer to the query from the source: U(S, a) >- V(S, b))
5. Return 'b' as the solution to the query (U(T, a) >- V(T, b))
[Figure: U(S, a) matches U(T, a); the value b is transferred from V(S, b) to answer V(T, ?z).]

Analogy based on determinations: Illustration Let us consider the following target: Nationality(Jack, UK), Male(Jack), Height(Jack, 6'), ... and the problem of answering the following question by analogy: What is the native language of Jack? (i.e. Native-language(Jack, ?z))
1. Find a determination such that U(x, y) >- Native-language(x, z). Such a determination is: Nationality(x, y) >- Native-language(x, z)
2. Find 'a' such that Nationality(Jack, a). Nationality(Jack, UK), so a = UK
3. Find a source S such that Nationality(S, UK). Nationality(Jill, UK), Female(Jill), Height(Jill, 5'10"), Native-language(Jill, English), so S = Jill
4. Find 'b' in S such that Native-language(Jill, b). Native-language(Jill, English), so b = English
5. Return 'English' as the solution to the query: Native-language(Jack, English)
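The five steps can be traced with a short Python sketch. It is an illustration only: facts are assumed to be (predicate, entity, value) triples, determinations are assumed to be (U, V) predicate pairs, unary facts such as Male(Jack) are left out, and the function name is hypothetical.

```python
# Assumed encoding of the binary facts from the illustration.
FACTS = [
    ("Nationality", "Jack", "UK"),
    ("Height", "Jack", "6'"),
    ("Nationality", "Jill", "UK"),
    ("Height", "Jill", "5'10\""),
    ("Native-language", "Jill", "English"),
]
# Each pair (U, V) stands for the determination U(x, y) >- V(x, z).
DETERMINATIONS = [("Nationality", "Native-language")]


def answer_by_analogy(query_pred, target, determinations, facts):
    """Answer query_pred(target, ?z); return (answer, source) or None."""
    for u, v in determinations:
        if v != query_pred:              # step 1: keep relevant determinations
            continue
        # step 2: find the values a such that U(target, a)
        target_values = {val for p, e, val in facts if p == u and e == target}
        # step 3: find a source S such that U(S, a)
        for p, source, val in facts:
            if p == u and source != target and val in target_values:
                # step 4: find b such that V(S, b)
                for q, entity, b in facts:
                    if q == v and entity == source:
                        return b, source  # step 5: return b as the answer
    return None


result = answer_by_analogy("Native-language", "Jack", DETERMINATIONS, FACTS)
```

Returning the source together with the answer keeps the justification available: the conclusion is plausible rather than deductively certain, so it is useful to know which entity it was transferred from.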

Determinations: Discussion Consider the determination rule: U(x1,...,xn,y1,...,ym) >- V(x1,...,xn,z1,...,zp) Should U and V be terms or could they be arbitrary logical expressions? Why? What if we cannot find a source S for applying the determination?

Determinations: Discussion • U and V may be arbitrary logical expressions. • Example: The rainfall of a flat area determines the quantity of water in the soil of the area: Rainfall(x, y) & Terrain(x, flat) >- Water-in-soil(x, z) • Source: Rainfall(Philippines, heavy), Terrain(Philippines, flat), Water-in-soil(Philippines, high) • Target: Rainfall(Vietnam, heavy), Terrain(Vietnam, flat) • Query: Water-in-soil(Vietnam, ?t)

Determinations: Discussion What if we cannot find a source S for applying the determination? Sometimes there is no source S such that U(S, a) is true, but one may find S' such that U(S', a') is true. In such a situation one needs a way to decide whether a' and a are similar enough to infer V(T, b). Therefore, even in the case of determinations one may need a matching function. Example: The latitude of an area determines the climate of the area: Latitude(x, y) >- Climate(x, z) Latitude(Romania, 45°), Climate(Romania, temperate) Latitude(France, 47°)
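A matching function for this case could be as simple as a numeric tolerance. The sketch below assumes that U takes numeric values (latitude in degrees) and that the tolerance is chosen by the designer; the function name and the tolerance values are hypothetical.

```python
def answer_with_tolerance(target_value, sources, tolerance):
    """Return (v_value, source) for the closest source within tolerance, else None.

    sources is a list of (name, u_value, v_value) triples.
    """
    close = [
        (abs(u - target_value), name, v)
        for name, u, v in sources
        if abs(u - target_value) <= tolerance
    ]
    if not close:
        return None
    _, name, v = min(close)  # the smallest difference wins
    return v, name


# Latitude(Romania, 45), Climate(Romania, temperate); Latitude(France, 47).
sources = [("Romania", 45.0, "temperate")]

loose = answer_with_tolerance(47.0, sources, tolerance=5.0)
strict = answer_with_tolerance(47.0, sources, tolerance=1.0)
```

With the looser tolerance the climate of France is inferred from Romania; with the stricter one no source qualifies, which shows how strongly the conclusion depends on the chosen threshold.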

Problem solving by analogy Analogy means deriving new knowledge about an input entity by transferring it from a known similar entity. How could we define problem solving by analogy?

Problem solving by analogy: definition Problem solving by analogy is the process of transferring knowledge from past problem-solving episodes to new problems that share significant aspects with corresponding past experience and using the transferred knowledge to construct solutions to the new problems. What could be the overall structure of a problem solving by analogy method?

The problem solving by analogy method Let P be a problem to solve. First, look into the knowledge base for a previous problem solving episode that shares significant aspects with the problem to solve. Next, transform the past episode to obtain a solution to the current problem. What questions need to be answered to develop such a method? What does it mean for problems to share significant aspects? How is the past problem solving episode transformed so as to obtain the solution to the current problem?

The derivational analogy method (Carbonell) • Two problems share significant aspects if they match within a certain threshold, according to a given similarity metric. • The solution to the retrieved problem is perturbed incrementally until it satisfies the requirements of the new problem.
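The first point, retrieval under a similarity threshold, can be sketched as follows. The slide does not fix the similarity metric, so Jaccard similarity over sets of problem features is assumed here; the episodes and feature names are hypothetical. The incremental perturbation of the retrieved solution is domain specific and is not shown.

```python
def jaccard(a, b):
    """Assumed similarity metric: shared features over all features."""
    a, b = set(a), set(b)
    if not a | b:
        return 1.0
    return len(a & b) / len(a | b)


def retrieve_episode(problem, episodes, threshold):
    """Return the name of the most similar past episode, or None if none passes."""
    scored = [(jaccard(problem, features), name) for name, features in episodes.items()]
    if not scored:
        return None
    best_score, best_name = max(scored)
    return best_name if best_score >= threshold else None


# Hypothetical past problem-solving episodes described by feature sets.
episodes = {
    "fix-bicycle-tire": {"vehicle", "flat-tire", "has-pump"},
    "bake-bread": {"kitchen", "flour", "oven"},
}
new_problem = {"vehicle", "flat-tire", "has-jack"}

accepted = retrieve_episode(new_problem, episodes, threshold=0.4)
rejected = retrieve_episode(new_problem, episodes, threshold=0.6)
```

Here the bicycle episode shares two of four features with the new problem (similarity 0.5), so it is retrieved under the lower threshold and rejected under the higher one.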
