
Inference in First-Order Logic with Additional Rules

Learn about additional inference rules in First-Order Logic (FOL), such as Universal Elimination and Existential Elimination, and see examples of how they can be applied to derive new conclusions from premises.

Presentation Transcript


  1. Inference in FOL All PL inference rules hold in FOL, as well as the following additional ones: • Universal elimination rule: ∀x A ===> Subst({x / g}, A). Here x is a variable, A is a wff, g is a ground term, and Subst(θ, A) is the result of applying the substitution θ to A. Example: Consider the formula ∀y likes(Bob, y). From it, we can infer by the UE rule likes(Bob, Tom) with {y / Tom}, likes(Bob, Bob) with {y / Bob}, or likes(Bob, father(Bob)) with {y / father(Bob)}. • Existential elimination rule: ∃x A ===> Subst({x / k}, A). Here k is a constant term that does not appear elsewhere in the KB. Example: Consider the formula ∃z hates(y, z). Let foe be a new function constant. Then, we can infer by the EE rule hates(y, foe(y)), where foe(y) designates a particular object hated by y. Note that in performing existential elimination, it is important to avoid constants and functions that have already been used; otherwise we could infer hates(Bob, Bob) from the fact ∃z hates(Bob, z), which is a much stronger claim than intended.
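
The effect of Subst can be illustrated with a small sketch. This is not from the slides: it assumes atoms and terms encoded as nested Python tuples (functor first), with variables marked by a leading '?' — both conventions are mine.

```python
def is_var(t):
    """A variable is a string starting with '?' (a convention of this sketch)."""
    return isinstance(t, str) and t.startswith('?')

def subst(theta, t):
    """Apply a substitution {variable: term} to a term or atom."""
    if is_var(t):
        return theta.get(t, t)
    if isinstance(t, tuple):  # compound term/atom: (functor, arg1, ...)
        return tuple(subst(theta, a) for a in t)
    return t  # constant

# Universal elimination on ∀y likes(Bob, y): substitute a ground term for ?y.
body = ('likes', 'Bob', '?y')
print(subst({'?y': 'Tom'}, body))              # ('likes', 'Bob', 'Tom')
print(subst({'?y': ('father', 'Bob')}, body))  # ('likes', 'Bob', ('father', 'Bob'))
```

With this encoding, universal elimination is just: pick any ground term g and apply subst({'?y': g}, …) to the matrix of the quantified formula.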

  2. Inference in FOL (cont.) • Existential introduction rule: A ===> ∃x Subst({g / x}, A). Example: From hates(Bob, apple), we can infer by the EI rule ∃x hates(Bob, x). Given any set of inference rules, we say that a conclusion α is derivable from a set of premises Δ iff: • α ∈ Δ, or • α is the result of applying a rule of inference to sentences derivable from Δ. A derivation of α from Δ is a sequence of sentences in which each sentence is either a member of Δ, or the result of applying a rule of inference to sentences generated earlier in the sequence.

  3. Example We know that horses are faster than dogs, and that there is a greyhound that is faster than any rabbit. We also know that Harry is a horse, and Ralph is a rabbit. We want to derive the fact that Harry is faster than Ralph. Step 1: Build the initial knowledge base (i.e. the set Δ) 1. ∀x,y horse(x) & dog(y) => faster(x, y) 2. ∃y greyhound(y) & (∀z rabbit(z) => faster(y, z)) 3. ∀y greyhound(y) => dog(y) Note that this piece of knowledge is not explicitly stated in the problem description. 4. ∀x,y,z faster(x, y) & faster(y, z) => faster(x, z) We must explicitly say that faster is a transitive relation. 5. horse(Harry) 6. rabbit(Ralph)

  4. Example (cont.) Step 2: State the goal, i.e. the formula that must be derived from the initial set of premises 1 to 6: faster(Harry, Ralph) Step 3: Apply inference rules to Δ and all newly derived formulas, until the goal is reached. • From 2.) and the existential elimination rule, we get 7. greyhound(Greg) & (∀z rabbit(z) => faster(Greg, z)) • From 7.) and the and-elimination rule, we get 8. greyhound(Greg) 9. ∀z rabbit(z) => faster(Greg, z) • From 9.) and the universal elimination rule, we get 10. rabbit(Ralph) => faster(Greg, Ralph) • From 10.), 6.) and MP, we get 11. faster(Greg, Ralph)

  5. Example (cont.) • From 3.) and the universal elimination rule, we get 12. greyhound(Greg) => dog(Greg) • From 12.), 8.) and MP, we get 13. dog(Greg) • From 1.) and the universal elimination rule applied twice, we get 14. horse(Harry) & dog(Greg) => faster(Harry, Greg) • From 5.), 13.) and the and-introduction rule, we get 15. horse(Harry) & dog(Greg) • From 14.), 15.) and MP, we get 16. faster(Harry, Greg) • From 4.) and the universal elimination rule applied three times, we get 17. faster(Harry, Greg) & faster(Greg, Ralph) => faster(Harry, Ralph) • From 16.), 11.) and the and-introduction rule, we get 18. faster(Harry, Greg) & faster(Greg, Ralph) • From 17.), 18.) and MP, we get 19. faster(Harry, Ralph)
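
The whole derivation in slides 3–5 can be mechanized (anticipating the generalized modus ponens of slides 7–8). The sketch below runs naive forward chaining over the KB, with Greg already introduced by existential elimination; the tuple encoding and the '?' variable convention are assumptions of this sketch, not the slides'.

```python
from itertools import permutations

def is_var(t):
    return isinstance(t, str) and t.startswith('?')

def subst(theta, t):
    if is_var(t):
        return subst(theta, theta[t]) if t in theta else t
    if isinstance(t, tuple):
        return tuple(subst(theta, a) for a in t)
    return t

def unify(x, y, theta):
    """Extend theta to unify x with y, or return None on failure."""
    if theta is None:
        return None
    x, y = subst(theta, x), subst(theta, y)
    if x == y:
        return theta
    if is_var(x):
        return {**theta, x: y}
    if is_var(y):
        return {**theta, y: x}
    if isinstance(x, tuple) and isinstance(y, tuple) and len(x) == len(y):
        for a, b in zip(x, y):
            theta = unify(a, b, theta)
        return theta
    return None

# KB rules 1-4 (with Greg from existential elimination) and facts 5, 6, 8.
rules = [
    ((('horse', '?x'), ('dog', '?y')), ('faster', '?x', '?y')),
    ((('rabbit', '?z'),), ('faster', 'Greg', '?z')),
    ((('greyhound', '?g'),), ('dog', '?g')),
    ((('faster', '?a', '?b'), ('faster', '?b', '?c')), ('faster', '?a', '?c')),
]
facts = {('horse', 'Harry'), ('rabbit', 'Ralph'), ('greyhound', 'Greg')}

def forward_chain(facts, rules):
    """Apply rules to facts until no new fact can be derived."""
    facts = set(facts)
    changed = True
    while changed:
        changed = False
        for prem, concl in rules:
            for combo in permutations(facts, len(prem)):
                theta = {}
                for p, f in zip(prem, combo):
                    theta = unify(p, f, theta)
                if theta is not None:
                    new = subst(theta, concl)
                    if new not in facts:
                        facts.add(new)
                        changed = True
    return facts

print(('faster', 'Harry', 'Ralph') in forward_chain(facts, rules))  # True
```

This terminates here because the KB has no function symbols, so only finitely many ground facts can be derived.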

  6. Inference in FOL (cont.) There are two important conclusions that we can draw from these examples: 1. The derivation process is completely mechanical: each conclusion follows from previous conclusions by a mechanical application of a rule of inference. 2. The derivation process can be viewed as a search process, where inference rules are the operators transforming one state of the search space into another. A search problem of this type has exponential complexity (the universal elimination operator alone has an enormous branching factor). Notice, however, that in the derivation process we used 3 inference rules (universal elimination, and-introduction and MP) in combination. If we can replace them with a single rule, the derivation process will become much more efficient.

  7. Towards developing a more efficient inference procedure utilizing a single inference rule Consider the following sentences: 1. missile(M1) 2. ∀y owns(y, M1) 3. ∀x missile(x) & owns(Nono, x) => sells(West, Nono, x). In order to infer sells(West, Nono, M1), we must first apply the universal elimination rule with substitution {y / Nono} to 2.) to infer owns(Nono, M1). Then, we apply the same rule with substitution {x / M1} to 3.) to infer missile(M1) & owns(Nono, M1) => sells(West, Nono, M1). Then, by applying the and-introduction rule to missile(M1) and owns(Nono, M1), we get missile(M1) & owns(Nono, M1), which finally lets us derive sells(West, Nono, M1) by applying the MP rule. All of this can be combined in a single inference rule, called generalized modus ponens.

  8. Generalized Modus Ponens Given a set of n atomic sentences p1’, p2’, …, pn’, and one implication p1 & p2 & … & pn => q, the generalized MP rule states that: p1’, p2’, …, pn’, (p1 & p2 & … & pn => q) ===> Subst(θ, q), where Subst(θ, pi’) = Subst(θ, pi) for all i. The process of computing the substitution θ is called unification. In the “Colonel West” example, we have the following: p1’: missile(M1) p1: missile(x) p2’: owns(y, M1) p2: owns(Nono, x) θ: {x / M1, y / Nono} q: sells(West, Nono, x) Subst(θ, q): sells(West, Nono, M1)
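
A minimal sketch of computing θ by unification, assuming the same tuple encoding with '?'-prefixed variables (a convention of this sketch) and omitting the occurs-check for brevity:

```python
def is_var(t):
    return isinstance(t, str) and t.startswith('?')

def subst(theta, t):
    if is_var(t):
        return subst(theta, theta[t]) if t in theta else t
    if isinstance(t, tuple):
        return tuple(subst(theta, a) for a in t)
    return t

def unify(x, y, theta=None):
    """Return a unifier of x and y extending theta, or None (no occurs-check)."""
    if theta is None:
        theta = {}
    x, y = subst(theta, x), subst(theta, y)
    if x == y:
        return theta
    if is_var(x):
        return {**theta, x: y}
    if is_var(y):
        return {**theta, y: x}
    if isinstance(x, tuple) and isinstance(y, tuple) and len(x) == len(y):
        for a, b in zip(x, y):
            theta = unify(a, b, theta)
            if theta is None:
                return None
        return theta
    return None

# Colonel West example: unify p1 with p1' and p2 with p2' under one theta.
theta = unify(('missile', '?x'), ('missile', 'M1'))
theta = unify(('owns', 'Nono', '?x'), ('owns', '?y', 'M1'), theta)
print(theta)  # {'?x': 'M1', '?y': 'Nono'}
```

The caller must check for None before chaining further unifications; a failed unification means the GMP premises do not match the given facts.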

  9. Canonical form of a set of FOL formulas Generalized MP (GMP) can only be applied to sets of Horn formulas. Recall that these are formulas of the form A1 & A2 & ... & An => B, where A1, A2, ..., An, B are positive literals (a literal is an atomic formula or its negation). To utilize GMP we must first convert the initial knowledge base into a canonical / normal form, where all sentences are Horn formulas, and then perform forward or backward reasoning. Note that there are different (equivalent) ways to represent the same logical formula; the canonical form establishes a standardized format for knowledge representation to simplify automated reasoning. There are different normal forms, but the conjunctive normal form (CNF) is the most commonly used. It is a conjunction of disjunctions of literals. Example: Consider the following set of formulas {P => Q, ¬(S & T), R}. Its CNF is {¬P v Q, ¬S v ¬T, R}.

  10. Converting formulas into a Conjunctive Normal Form Consider the following FOL formula stating that a brick is an object which is on another object which is not a pyramid, and there is nothing that a brick is on and at the same time that object is on the brick, and there is nothing that is not a brick and also is the same thing as the brick. ∀x [brick(x) => (∃y [on(x, y) & ¬pyramid(y)] & ¬∃y [on(x, y) & on(y, x)] & ∀y [¬brick(y) => ¬equal(x, y)])] To convert this formula into a normal form, we go through the following steps: Step 1: Eliminate implications, i.e. substitute any formula A => B with ¬A v B. ∀x [¬brick(x) v (∃y [on(x, y) & ¬pyramid(y)] & ¬∃y [on(x, y) & on(y, x)] & ∀y [¬(¬brick(y)) v ¬equal(x, y)])]

  11. Converting formulas into a normal form (cont.) Step 2: Move negations down to the atomic formulas. This step requires the following transformations: ¬(A v B) ≡ ¬A & ¬B, ¬(A & B) ≡ ¬A v ¬B, ¬¬A ≡ A, ¬∀x A(x) ≡ ∃x ¬A(x), ¬∃x A(x) ≡ ∀x ¬A(x). ∀x [¬brick(x) v (∃y [on(x, y) & ¬pyramid(y)] & ∀y [¬on(x, y) v ¬on(y, x)] & ∀y [brick(y) v ¬equal(x, y)])]
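
Step 2 can be sketched as a recursive rewrite into negation normal form. The tuple encoding ('not', f), ('and', f, g), ('or', f, g), ('forall', x, f), ('exists', x, f) is an assumption of this sketch, and predicate names are assumed not to collide with these tags.

```python
def nnf(f):
    """Push negations down to atoms using the five equivalences of Step 2."""
    if f[0] == 'not':
        g = f[1]
        if g[0] == 'not':        # double negation: ¬¬A ≡ A
            return nnf(g[1])
        if g[0] == 'and':        # De Morgan: ¬(A & B) ≡ ¬A v ¬B
            return ('or', nnf(('not', g[1])), nnf(('not', g[2])))
        if g[0] == 'or':         # De Morgan: ¬(A v B) ≡ ¬A & ¬B
            return ('and', nnf(('not', g[1])), nnf(('not', g[2])))
        if g[0] == 'forall':     # quantifier duality: ¬∀x A ≡ ∃x ¬A
            return ('exists', g[1], nnf(('not', g[2])))
        if g[0] == 'exists':     # quantifier duality: ¬∃x A ≡ ∀x ¬A
            return ('forall', g[1], nnf(('not', g[2])))
        return f                 # negated atom: already in NNF
    if f[0] in ('and', 'or'):
        return (f[0], nnf(f[1]), nnf(f[2]))
    if f[0] in ('forall', 'exists'):
        return (f[0], f[1], nnf(f[2]))
    return f                     # atom

print(nnf(('not', ('exists', 'y', ('on', 'x', 'y')))))
# ('forall', 'y', ('not', ('on', 'x', 'y')))
```

This is exactly the rewrite that turns ¬∃y [on(x, y) & on(y, x)] into ∀y [¬on(x, y) v ¬on(y, x)] in the slide's formula.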

  12. Converting formulas into a normal form (cont.) Step 3: Eliminate existential quantifiers. Here, the only existential quantifier belongs to the sub-formula ∃y [on(x, y) & ¬pyramid(y)]. This formula says that, given some x, there is a function that takes x as an input and returns a suitable y. Let us call this function support(x). Functions that eliminate the need for existential quantifiers are called Skolem functions. Here support(x) is a Skolem function. Note that the universal quantifiers determine the arguments of Skolem functions: there must be one argument for each universally quantified variable whose scope contains the Skolem function. ∀x [¬brick(x) v ((on(x, support(x)) & ¬pyramid(support(x))) & ∀y [¬on(x, y) v ¬on(y, x)] & ∀y [brick(y) v ¬equal(x, y)])]
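
A hedged sketch of Step 3, generating fresh Skolem function names sk1, sk2, … (the names and the tuple encoding are assumptions of the sketch, and variables are assumed already renamed apart, so no quantifier shadows another):

```python
import itertools

_fresh = itertools.count(1)  # counter for fresh Skolem function names

def replace_var(f, var, term):
    """Replace every occurrence of variable var in f by term."""
    if f == var:
        return term
    if isinstance(f, tuple):
        return tuple(replace_var(a, var, term) for a in f)
    return f

def skolemize(f, universals=()):
    """Replace each ∃y by a Skolem term over the enclosing universal variables."""
    if f[0] == 'exists':
        sk = ('sk%d' % next(_fresh),) + tuple(universals)
        return skolemize(replace_var(f[2], f[1], sk), universals)
    if f[0] == 'forall':
        return ('forall', f[1], skolemize(f[2], universals + (f[1],)))
    if f[0] in ('and', 'or'):
        return (f[0], skolemize(f[1], universals), skolemize(f[2], universals))
    if f[0] == 'not':
        return ('not', skolemize(f[1], universals))
    return f  # atom

# ∀x ∃y on(x, y)  becomes  ∀x on(x, sk1(x)) — sk1 plays the role of support.
print(skolemize(('forall', 'x', ('exists', 'y', ('on', 'x', 'y')))))
# ('forall', 'x', ('on', 'x', ('sk1', 'x')))
```

An existential with no enclosing universals gets a zero-argument Skolem term, i.e. a Skolem constant, matching the constant-term case of the EE rule on slide 1.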

  13. Converting formulas into a normal form (cont.) Step 4: Rename variables, if necessary, so that no two quantifiers bind the same variable name. ∀x [¬brick(x) v ((on(x, support(x)) & ¬pyramid(support(x))) & ∀y [¬on(x, y) v ¬on(y, x)] & ∀z [brick(z) v ¬equal(x, z)])] Step 5: Move the universal quantifiers to the left. ∀x ∀y ∀z [¬brick(x) v ((on(x, support(x)) & ¬pyramid(support(x))) & [¬on(x, y) v ¬on(y, x)] & [brick(z) v ¬equal(x, z)])]

  14. Converting formulas into a normal form (cont.) Step 6: Move disjunctions down to the literals. This requires the following transformation: A v (B & C & D) ≡ (A v B) & (A v C) & (A v D). ∀x ∀y ∀z [(¬brick(x) v on(x, support(x))) & (¬brick(x) v ¬pyramid(support(x))) & (¬brick(x) v ¬on(x, y) v ¬on(y, x)) & (¬brick(x) v brick(z) v ¬equal(x, z))] Step 7: Eliminate the conjunctions, i.e. each conjunct becomes a separate axiom. ∀x (¬brick(x) v on(x, support(x))) ∀x (¬brick(x) v ¬pyramid(support(x))) ∀x ∀y (¬brick(x) v ¬on(x, y) v ¬on(y, x)) ∀x ∀z (¬brick(x) v brick(z) v ¬equal(x, z))
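
The distribution in Step 6 can be sketched as follows, assuming an NNF input built from binary ('and', f, g) / ('or', f, g) tuples (the encoding is a convention of this sketch):

```python
def distribute(f):
    """Distribute v over & until every disjunction is over literals only."""
    if f[0] == 'and':
        return ('and', distribute(f[1]), distribute(f[2]))
    if f[0] == 'or':
        a, b = distribute(f[1]), distribute(f[2])
        if a[0] == 'and':  # (B & C) v A ≡ (B v A) & (C v A)
            return ('and', distribute(('or', a[1], b)), distribute(('or', a[2], b)))
        if b[0] == 'and':  # A v (B & C) ≡ (A v B) & (A v C)
            return ('and', distribute(('or', a, b[1])), distribute(('or', a, b[2])))
        return ('or', a, b)
    return f  # literal

# A v (B & C)  becomes  (A v B) & (A v C)
print(distribute(('or', ('A',), ('and', ('B',), ('C',)))))
# ('and', ('or', ('A',), ('B',)), ('or', ('A',), ('C',)))
```

After this rewrite the formula is a conjunction of clauses, so Step 7 is just splitting the top-level conjunction into separate axioms.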

  15. Converting formulas into a normal form (cont.) Step 8: Rename the variables so that no two axioms share a variable name. ∀x (¬brick(x) v on(x, support(x))) ∀w (¬brick(w) v ¬pyramid(support(w))) ∀u ∀y (¬brick(u) v ¬on(u, y) v ¬on(y, u)) ∀v ∀z (¬brick(v) v brick(z) v ¬equal(v, z)) Step 9: Eliminate the universal quantifiers. Note that this is possible because all variables are now universally quantified, therefore the quantifiers can be ignored. ¬brick(x) v on(x, support(x)) ¬brick(w) v ¬pyramid(support(w)) ¬brick(u) v ¬on(u, y) v ¬on(y, u) ¬brick(v) v brick(z) v ¬equal(v, z)

  16. Converting formulas into a normal form (cont.) Step 10 (optional): Convert disjunctions back to implications if you use the “implication” form of the resolution rule. This requires the following transformation: (¬A v ¬B v C v D) ≡ ((A & B) => (C v D)). brick(x) => on(x, support(x)) brick(w) & pyramid(support(w)) => False brick(u) & on(u, y) & on(y, u) => False brick(v) & equal(v, z) => brick(z)

  17. Converting formulas into a normal form: summary The procedure for translating FOL sentences into a normal form is carried out as follows: • Eliminate all of the implications. • Move the negation down to the atomic formulas. • Eliminate all existential quantifiers. • Rename variables • Move the universal quantifiers to the left. • Move the disjunctions down to literals • Eliminate the conjunctions • Rename the variables • Eliminate the universal quantifiers. • (Optional) Convert disjunctions back to implications.

  18. Completeness of the chaining algorithms Consider the following set of sentences: ∀x hungry(x) => likes(x, apple) ∀x ¬hungry(x) => likes(x, grapes) ∀x likes(x, apple) => likes(x, fruits) ∀x likes(x, grapes) => likes(x, fruits) Assume that we want to prove likes(Bob, fruits). Obviously, this is true if likes(Bob, apple) v likes(Bob, grapes) is true, which is always the case, because one of hungry(Bob) and ¬hungry(Bob) must hold: the first disjunct follows from hungry(Bob) and the second from ¬hungry(Bob). None of the chaining algorithms, however, will allow us to infer likes(Bob, fruits). The reason is that ∀x ¬hungry(x) => likes(x, grapes) is not a Horn formula. Chaining algorithms, which use generalized MP as the only inference rule, are incomplete for non-Horn KBs.

  19. Why do we need a stronger inference rule? Consider the following example: Bob wants to take CS 501 next semester. This class will meet either MW 6:45 -- 8:00, or TR 6:45 -- 8:00. Bob has to be at his soccer sessions MTF 5:30 -- 8:30. Can he take CS 501? Initial KB: MW(CS501, 645--800) v TR(CS501, 645--800) MW(CS501, 645--800) & Busy(Bob, M, 530--830) => nogood-class(Bob) TR(CS501, 645--800) & Busy(Bob, T, 530--830) => nogood-class(Bob) Busy(Bob, M, 530--830) Busy(Bob, T, 530--830) Possible inferences: MW(CS501, 645--800) => nogood-class(Bob) …. (A) TR(CS501, 645--800) => nogood-class(Bob) ….. (B)

  20. The resolution rule revisited (See section 9.5) We can draw more inferences if we look at MW(CS501, 645--800) v TR(CS501, 645--800) as describing two different cases: • Case 1: MW(CS501, 645--800) is true, in which case nogood-class(Bob) is true by means of (A). • Case 2: TR(CS501, 645--800) is true, in which case nogood-class(Bob) is true by means of (B). The answer to the initial query is derived no matter which is the right case. This type of reasoning is called case analysis, and it can be carried out by means of the resolution rule as follows: ¬MW(CS501, 645--800) v nogood-class(Bob) and MW(CS501, 645--800) v TR(CS501, 645--800) resolve to nogood-class(Bob) v TR(CS501, 645--800); this and ¬TR(CS501, 645--800) v nogood-class(Bob) resolve to nogood-class(Bob) v nogood-class(Bob) ≡ nogood-class(Bob).
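
The case analysis above can be replayed with a clause-set sketch, where a clause is a frozenset of literals and '~' marks negation (a convention of this sketch; MW, TR, nogood abbreviate the full ground atoms of the slide):

```python
def neg(lit):
    """Complement a literal: P <-> ~P."""
    return lit[1:] if lit.startswith('~') else '~' + lit

def resolvents(c1, c2):
    """All clauses obtainable by resolving c1 with c2 on one complementary pair."""
    return [frozenset((c1 - {lit}) | (c2 - {neg(lit)})) for lit in c1 if neg(lit) in c2]

a = frozenset({'~MW', 'nogood'})   # ¬MW(CS501, 645--800) v nogood-class(Bob)
b = frozenset({'MW', 'TR'})        # MW(CS501, 645--800) v TR(CS501, 645--800)
c = frozenset({'~TR', 'nogood'})   # ¬TR(CS501, 645--800) v nogood-class(Bob)

step1 = resolvents(a, b)[0]        # nogood v TR
step2 = resolvents(step1, c)[0]    # nogood v nogood, i.e. nogood
print(step2)                       # frozenset({'nogood'})
```

Representing clauses as sets makes the final step automatic: nogood-class(Bob) v nogood-class(Bob) collapses to a single literal because a set cannot contain duplicates.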

  21. The resolution rule (cont.) Recall the resolution rule for propositional logic: from A v B and ¬B v C infer A v C, or equivalently, from ¬A => B and B => C infer ¬A => C. There are two different ways to interpret this rule: • As describing two cases: • B is true and ¬B is false, in which case C must be true. • ¬B is true and B is false, in which case A must be true. In both cases, A v C is true. • Because the implication operation is transitive, the resolution rule lets us link the premise of one implication with the conclusion of the other, ignoring the intermediate sentence B, i.e. it lets us derive a new implication. Neither of these can be done with MP, because MP derives only atomic conclusions.

  22. Consider the following propositional version of generalized MP: A1 & A2 & … & Am => B D1 & D2 & … & Dn => C From these two formulas, making use of the monotonicity of PL, we can infer: A1 & A2 & … & Am & D1 & D2 & … & Dn => B Assume now that B = B1 v B2 v … v Bk, and C = Ai: A1 & A2 & … & Am => B1 v B2 v … v Bk D1 & D2 & … & Dn => C A1 & A2 & … & A(i-1) & D1 & D2 & … & Dn & A(i+1) & … & Am => B1 v B2 v … v Bk Consider the following cases: • If all the A’s hold, then at least one B holds. • If m = 0, then the formula degenerates to the form B1 v B2 v … v Bk. • If k = 1, then the formula has the form A1 & A2 & … & Am => B1.

  23. If k = 0, then A1 & A2 & … & Am => False, which is equivalent to ¬(A1&A2&…&Am), which is in turn equivalent to ¬A1 v … v ¬Am. If, at the same time, m = 1, then we can represent negated formulas such as ¬student(Bob), or in its equivalent form, student(Bob) => False. If m = 0, then we have True => False, which represents a contradiction. That is, formulas of the form A1 & A2 & … & Am => B1 v … v Bk are general enough to represent any logical formula. If a KB is comprised of only formulas of this type, we say that it is in a normal form. Therefore, we need a generalized version of the resolution rule to work with such KBs.

  24. The generalized resolution rule: definition Generalized resolution is the following rule of inference: A1 & … & Ai & … & Am => B1 v B2 v … v Bk D1 & D2 & … & Dn => C1 v … v Cj v … v Cx A1 & … & A(i-1) & A(i+1) & … & Am & D1 & … & Dn => B1 v … v Bk v C1 v … v C(j-1) v C(j+1) v … v Cx Here Ai = Cj, which is why we can drop them from the l.h.s. and the r.h.s. of the implication, respectively. Here is the alternative (disjunctive) version of generalized resolution: A1 v A2 v … v Ai v … v Ap C1 v C2 v … v Cj v … v Cx A1 v … v A(i-1) v A(i+1) v … v Ap v C1 v … v C(j-1) v C(j+1) v … v Cx Here Ai = ¬Cj, which is why we can drop them both.

  25. Proving formulas by the resolution rule If we have a FOL KB, then Ai and Cj (in the first case, or ¬Cj in the second case) will be the same if there is a substitution θ such that Subst(θ, Ai) = Subst(θ, Cj) (or Subst(θ, Ai) = Subst(θ, ¬Cj), respectively). Assuming that all formulas in the KB are in a normal form, we can apply the resolution rule in forward or backward chaining algorithms. Example: Consider the following KB: ¬P(w) v Q(w), P(x) v R(x), ¬Q(y) v S(y), ¬R(z) v S(z). Using forward chaining, the following conclusions (called resolvents) can be derived: from ¬P(w) v Q(w) and ¬Q(y) v S(y) with {y / w} we get ¬P(w) v S(w); from ¬P(w) v S(w) and P(x) v R(x) with {w / x} we get S(x) v R(x); from S(x) v R(x) and ¬R(z) v S(z) with {x / A, z / A} we get S(A) v S(A) ≡ S(A).

  26. Completeness of the chaining process with the resolution rule Chaining with the resolution rule is still an incomplete inference procedure. To see why, consider an empty KB from which we want to derive P v ¬P. Note that this is a valid formula, therefore it follows from any KB, including the empty KB. However, using only the resolution rule for chaining, we cannot prove it. Assume instead that we add ¬(P v ¬P) ≡ (¬P & P), i.e. the clauses {¬P, P}, to the KB. Adding the negation of a valid formula to the existing KB introduces a contradiction into that KB. If KB |= A, then KB & ¬A is contradictory, so if we can derive False from KB & ¬A, we have proved that KB |– A. In the example above, {P, ¬P} resolve to Nil, which proves P v ¬P.

  27. The refutation method The inference procedure that proves a formula by showing that its negation, if added to the KB, leads to a contradiction is called refutation. The following procedure implements the refutation method: • Negate the theorem to be proved, and add the result to the set of axioms. • Put the list of axioms into a normal form. • Until there is no resolvable pair of clauses do: • Find resolvable clauses, and resolve them. • Add the result of the resolution to the list of clauses. • If Nil is produced, stop (the theorem has been proved by refutation). • Stop (the theorem cannot be derived from the given axioms). The refutation method is a complete inference procedure.
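
The procedure can be sketched propositionally: saturate the clause set with resolvents until the empty clause (Nil) appears or nothing new can be derived. Literal strings with '~' for negation are a convention of this sketch; the loop terminates because only finitely many clauses exist over finitely many propositional symbols.

```python
def neg(lit):
    return lit[1:] if lit.startswith('~') else '~' + lit

def refute(clauses):
    """Return True iff the empty clause (Nil) is derivable, i.e. the set is unsatisfiable."""
    clauses = set(clauses)
    while True:
        new = set()
        for c1 in clauses:
            for c2 in clauses:
                if c1 == c2:
                    continue
                for lit in c1:
                    if neg(lit) in c2:
                        r = frozenset((c1 - {lit}) | (c2 - {neg(lit)}))
                        if not r:          # empty clause: contradiction found
                            return True
                        new.add(r)
        if new <= clauses:                 # saturated: no new resolvents
            return False
        clauses |= new

# Prove P v ~P from the empty KB by refuting its negation {¬P, P}:
print(refute([frozenset({'P'}), frozenset({'~P'})]))  # True
# A satisfiable set cannot be refuted:
print(refute([frozenset({'P'})]))                     # False
```

This mirrors slide 26: the valid formula P v ¬P is unreachable by plain chaining, but its negation {¬P, P} resolves to Nil in one step.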

  28. Example Consider the following set of axioms: ¬hungry(w) v likes(w, apple) hungry(x) v likes(x, grapes) ¬likes(y, apple) v likes(y, fruits) ¬likes(z, grapes) v likes(z, fruits) Assume that we want to prove likes(Bob, fruits). Therefore, we add ¬likes(Bob, fruits) to the set of axioms. From ¬hungry(w) v likes(w, apple) and ¬likes(y, apple) v likes(y, fruits) with {y / w} we get ¬hungry(w) v likes(w, fruits); from this and hungry(x) v likes(x, grapes) with {w / x} we get likes(x, fruits) v likes(x, grapes); from this and ¬likes(z, grapes) v likes(z, fruits) with {z / x} we get likes(x, fruits); from this and ¬likes(Bob, fruits) with {x / Bob} we get Nil.

  29. Problems with the resolution rule Resolution proofs can be exponentially long, which is why we must use strategies to direct the search. The following two ideas can help here: • Every resolution step involves the negated theorem or a derived clause which has used the negated theorem directly or indirectly. • Always remember what your goal is, so that given what you currently have, you can find the difference and try to reduce this difference to get closer to the goal. These ideas can be implemented as resolution strategies working at a meta-logical level. Among the most popular strategies are the following: 1. Unit preference. Always prefer a single-literal (unit) clause when doing resolution. This strategy was found efficient only for relatively small problems. 2. Set of support. Each resolution involves the negated theorem or new clauses directly or indirectly derived from it. The set of such clauses plus the negated theorem is called the support set. Initially, the support set contains only the negated theorem.

  30. Resolution strategies (cont.) 3. Breadth-first strategy. First resolve all possible pairs of the initial clauses, then resolve all possible pairs of the resulting set together with the initial set, and so on. 4. Input resolution. Every resolution uses one of the initial clauses or the negated theorem. If we also allow resolutions in which one clause is an ancestor of another clause, we have the strategy called linear resolution. 5. Subsumption. Eliminate all sentences subsumed by other sentences. For example, if P(x) ∈ KB, then P(A) will not be added if inferred, because it is subsumed by P(x). Recall that in order to apply the refutation method, we must first convert the initial set of sentences into a normal form.
