COMBINING COMPATIBLE STATES DURING LR(1) PARSER CONSTRUCTION
The LR(0) algorithm for constructing parsers does not evaluate contexts: two states are considered identical if they consist of the same set of marked productions.
This algorithm is insufficient for actual programming languages, however, because it produces parsers with numerous conflicts.
On the other hand, the LR(1) algorithm, which you used in your last assignment, when applied to the large grammars of real programming languages such as Java or C++, produces parsing machines that are an order of magnitude or more larger than those produced by the LR(0) algorithm for the same grammar.
As a compromise, various methods, including the one employed by Yacc, have been devised for subsets of the LR(1) languages, using a hybrid approach.
This works well for most programming languages, but imposes a greater responsibility on the compiler writer, to come up with a grammar that does not lead to conflicts (i.e. to cases where more than one action is defined at a parsing machine state for the same next input symbol). These methods only work for a subset of the LR(1) grammars, and there are applications, including ones involving natural language processing, for which they are inadequate.
However, one can employ a definition of compatibility between states that works for all LR(1) languages and that produces parsers of the same size as those produced by the hybrid methods just mentioned.
DEFINITION. The nucleus of a state consists of the configurations in the state in which the marker is in a position greater than zero.
Example. A configuration in a state of the form A → bc.d, {x,y} would be a member of its nucleus, but a configuration such as A → .bcd, {x,y} would not be a member.
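The definition of the nucleus can be sketched in a few lines of Python. The `Config` record and the `nucleus` function are hypothetical names chosen for this illustration, not part of the course material; the marker position is stored as an integer index into the right-hand side.

```python
from typing import NamedTuple

class Config(NamedTuple):
    """One LR(1) configuration (assumed representation for this sketch)."""
    lhs: str              # left-hand side of the production
    rhs: tuple            # right-hand side symbols
    dot: int              # marker position, 0 = before the first symbol
    contexts: frozenset   # context (lookahead) symbols

def nucleus(state):
    """Return the configurations whose marker position is greater than zero."""
    return [c for c in state if c.dot > 0]

state = [
    Config("A", ("b", "c", "d"), 2, frozenset("xy")),  # A → bc.d, {x,y}
    Config("A", ("b", "c", "d"), 0, frozenset("xy")),  # A → .bcd, {x,y}
]

for c in nucleus(state):
    print(c.lhs, c.rhs, c.dot)
```

Only the first configuration is kept, since its marker is at position 2; the second has the marker at position 0.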
DEFINITION OF COMPATIBILITY BETWEEN LR(1) STATES
Let S and S′ be two states in an LR(1) parsing machine whose nuclei consist of the same marked productions, which we will denote as P_1, …, P_n. For 1 ≤ t ≤ n, let U_t denote the set of contexts associated with marked production P_t in state S, and let U′_t denote the set of contexts associated with that marked production in state S′. Then states S and S′ are compatible if, for all 1 ≤ i < j ≤ n, at least one of the following conditions holds:
(a) U_i ∩ U′_j = ∅ and U′_i ∩ U_j = ∅ (∅ is the empty set, i.e. the intersections involved are both empty)
(b) U_i ∩ U_j ≠ ∅
(c) U′_i ∩ U′_j ≠ ∅
Note. If states S and S′ are as described above, and their nuclei each consist of only a single configuration, then according to the above definition they are compatible, since there is no pair of configurations to test.
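The compatibility test can be written down almost directly from the definition. The following is a minimal sketch, assuming each nucleus is represented as a Python dict that maps a marked production (here simply a string) to its set of contexts; the function name `compatible` and this representation are assumptions made for illustration.

```python
from itertools import combinations

def compatible(s, s_prime):
    """Return True if the two nuclei pass the pairwise test of the definition.

    s and s_prime map each marked production to its set of contexts.
    """
    if s.keys() != s_prime.keys():
        return False  # the nuclei must consist of the same marked productions
    for p, q in combinations(list(s), 2):
        # (a) both cross intersections are empty
        cond_a = not (s[p] & s_prime[q]) and not (s_prime[p] & s[q])
        # (b) the two context sets already intersect within S
        cond_b = bool(s[p] & s[q])
        # (c) the two context sets already intersect within S'
        cond_c = bool(s_prime[p] & s_prime[q])
        if not (cond_a or cond_b or cond_c):
            return False
    return True

# A nucleus with a single configuration has no pairs to test.
print(compatible({"A → ab.c": {"x"}}, {"A → ab.c": {"y"}}))  # True
```

With a single configuration in each nucleus the loop body never runs, which is why such states are always compatible.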
In the case where S and S′ as described above are compatible, one can combine the states into a single state whose nucleus consists of the same marked productions listed above, while for 1 ≤ t ≤ n, the set of contexts associated with marked production P_t is U_t ∪ U′_t.
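Combining two compatible states amounts to taking the union of the context sets production by production. A minimal sketch, assuming the same dict representation (marked production mapped to a set of contexts) and a hypothetical function name `combine`; the two small nuclei are made up for the illustration, and compatibility is assumed to have been checked beforehand:

```python
def combine(s, s_prime):
    """Union the context sets of two nuclei over the same marked productions."""
    return {p: s[p] | s_prime[p] for p in s}

# Hypothetical single-configuration nuclei, which are always compatible.
s = {"A → ab.c": {"x", "y"}}
s_prime = {"A → ab.c": {"d"}}

merged = combine(s, s_prime)
print(sorted(merged["A → ab.c"]))  # ['d', 'x', 'y']
```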
One way of looking at the definition is to say that every pair of configurations in the nuclei must pass a test, and that two states are compatible only if all pairs do in fact pass.
Fortunately, in grammars for actual programming languages such as Java, C++, etc., there are at most 6 configurations in the nucleus of any state. The states may be large, with many immediate successors, but the nuclei are all quite small.
EXAMPLES We show only the nucleus of the states in these examples, since, according to the definition, states are compatible if and only if their nuclei are.
State S:
A → ab.c {x,y}
B → b.n {s,t}
C → rb.ed {u,v}

State S′:
A → ab.c {d}
B → b.n {s}
C → rb.ed {x,v}

The above two states are not compatible because the pair consisting of the first and last configurations fails the test. For this pair, condition (a) of the defn. is not true, since the context of the first configuration of S contains an x, and so does the context of the third configuration of S′. In addition, neither condition (b) nor condition (c) is true.
State S:
A → ab.c {x,y}
B → b.n {s,t}
C → rb.ed {x,v}

State S′:
A → ab.c {x,y,d}
B → b.n {s}
C → rb.ed {x,v}

The first and third configurations in this case pass the test because condition (b) of the defn. applies to the first and third configurations of S: both of these configurations contain x in their set of contexts. The states in this case are compatible. Remember that while every pair of configurations in the nucleus must pass the test, only one of conditions (a), (b) or (c) needs to be true for a given pair for it to pass.
Since the states are compatible, they can be combined to form one whose nucleus is:
A → ab.c {x,y,d}
B → b.n {s,t}
C → rb.ed {x,v}
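The two examples above can be checked mechanically. The sketch below assumes each nucleus is a dict from marked production (a string) to its set of contexts; the names `compatible`, `s1`, `s1_prime`, `s2` and `s2_prime` are hypothetical. It applies the pairwise test to both examples and then computes the union of contexts for the second one.

```python
from itertools import combinations

def compatible(s, s_prime):
    """Pairwise test from the definition of compatibility."""
    return all(
        (not (s[p] & s_prime[q]) and not (s_prime[p] & s[q]))  # condition (a)
        or bool(s[p] & s[q])                                    # condition (b)
        or bool(s_prime[p] & s_prime[q])                        # condition (c)
        for p, q in combinations(list(s), 2)
    )

# First example: the first and third configurations fail the test.
s1 = {"A → ab.c": {"x", "y"}, "B → b.n": {"s", "t"}, "C → rb.ed": {"u", "v"}}
s1_prime = {"A → ab.c": {"d"}, "B → b.n": {"s"}, "C → rb.ed": {"x", "v"}}

# Second example: condition (b) rescues the first and third configurations.
s2 = {"A → ab.c": {"x", "y"}, "B → b.n": {"s", "t"}, "C → rb.ed": {"x", "v"}}
s2_prime = {"A → ab.c": {"x", "y", "d"}, "B → b.n": {"s"}, "C → rb.ed": {"x", "v"}}

print(compatible(s1, s1_prime))  # False
print(compatible(s2, s2_prime))  # True

# Nucleus of the combined state for the second example.
merged = {p: s2[p] | s2_prime[p] for p in s2}
for production, contexts in merged.items():
    print(production, sorted(contexts))
```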
Note. In the figure on the next slide, where we omit the context sets of various configurations (i.e. show only the marked production involved), the implication is that those contexts are irrelevant to the assertions being made about the figure.
States 2 and 8 are not compatible, since the first configuration of state 2 has d as a context in common with the second configuration of state 8. In fact, if we were to combine states 2 and 8, the result would have a combination of states 3 and 9 as its u-successor. This state would have a conflict, in that it would have reduce actions for both Z → tu and V → ε when the next input symbol is d.
Now consider the altered machine obtained if the production X → aYd were replaced by (say) X → aYa. In this case the first configuration of state 2 would be Y → t.W {a}. It would then follow that states 2 and 8 are compatible and could safely be combined to form:
Y → t.W {a, e}
Z → t.u {c, d}
W → .uV
The journal paper describing this method of combining states contains a formal proof of its correctness. But since ours is a practically oriented course, we will just consider an informal justification, based on a few examples, to give a flavor of the reasoning involved.
The main argument is that if the parsing machine containing the states S and S′, as described in the defn. of compatibility, has no conflicts, and S and S′ are compatible, then the parsing machine obtained by combining them will also have no conflicts.
The argument is by contradiction. Let's consider examples of the various ways that two configurations in the combination of S and S′ could have conflicts, or could lead to conflicts between other pairs of configurations in states reachable from the combined state. In each case we hope to show that either the parsing machine as it was before S and S′ were combined contained conflicts in the first place, or that S and S′ could not in fact have been compatible.
Case 1. Let configs 1 and 2 of the combined state formed from states S and S′ be:
A → r B.uv {a,b}
C → t B.uv {a,c}
Since the machine as it was before the combination contained no conflicts, and specifically did not contain a conflict in the uv-successor of these states, either state S must have contained the a in its version of config 1 while state S′ contained the a in its version of config 2, or vice versa.
In either case, condition (a) of the defn. would not be true for the two configs, and neither would condition (b), since S itself cannot have had a in both contexts. Since condition (c) is also not true, for the same reason applied to S′, states S and S′ could not have been compatible in the first place.
Case 2. Let configs 1 and 2 of the combined state be:
A → r B.uv {a,b}
D → t B.Ca
C → .uv {a}
Either S or S′ must contain A → r B.uv {a..}, in which case the original parsing machine would have had a conflict at its uv-successor. This contradicts our assumption that the original parsing machine was conflict-free.
Case 3. Let configs 1 and 2 be:
A → s B.Ea
E → .uv {a}
D → t B.Ca
C → .uv {a}
Here again the original parsing machine would have had conflicts in the uv-successors of both S and S′.
Case 4. Let configs 1 and 2 be:
A → r B.uv
D → t B.uvr
Here too the original parsing machine would have had conflicts in the uv-successors of both S and S′. In this case the conflict would have been between a reduction and a transition.
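The conflicts that the four cases appeal to can be detected mechanically in any single state. The following is a rough sketch, assuming a configuration is a tuple (lhs, rhs, marker position, contexts); the function name `find_conflicts` is hypothetical, and the contexts used for Case 4 are assumed for illustration, since the slides omit them.

```python
def find_conflicts(state):
    """List reduce-reduce and shift-reduce conflicts in one state.

    state is a list of (lhs, rhs, dot, contexts) tuples.
    """
    conflicts = []
    # Configurations with the marker at the end call for a reduction.
    reduces = [(lhs, rhs, ctx) for lhs, rhs, dot, ctx in state if dot == len(rhs)]
    # Symbols immediately after the marker label the outgoing transitions.
    shifts = {rhs[dot] for lhs, rhs, dot, ctx in state if dot < len(rhs)}
    for i, (lhs1, rhs1, ctx1) in enumerate(reduces):
        for lhs2, rhs2, ctx2 in reduces[i + 1:]:
            common = ctx1 & ctx2
            if common:
                conflicts.append(("reduce-reduce", sorted(common)))
        common = ctx1 & shifts
        if common:
            conflicts.append(("shift-reduce", sorted(common)))
    return conflicts

# uv-successor for Case 1: both reductions are called for on input a.
case1 = [
    ("A", ("r", "B", "u", "v"), 4, {"a", "b"}),
    ("C", ("t", "B", "u", "v"), 4, {"a", "c"}),
]
# uv-successor for Case 4, with assumed contexts {r} and {x}.
case4 = [
    ("A", ("r", "B", "u", "v"), 4, {"r"}),
    ("D", ("t", "B", "u", "v", "r"), 4, {"x"}),
]

print(find_conflicts(case1))  # [('reduce-reduce', ['a'])]
print(find_conflicts(case4))  # [('shift-reduce', ['r'])]
```

The shift-reduce entry corresponds to what the slides call a conflict between a reduction and a transition.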
EXERCISE
Construct an LR(1) parsing machine for the grammar on the next slide, combining compatible states as you encounter them.
program → main ; statement_list end main;
statement_list → statement_list statement | statement
statement → assign_statement | while_statement | do_statement
assign_statement → identifier = identifier
while_statement → while ( condition ) statement_list wend
condition → identifier = identifier
do_statement → do identifier = number to number ; statement_list end do ;