170 likes | 316 Views
Increasing power of LL(k) parsers. Bc. Jozef Lang (xlangj01) Bc. Zolt án Zemko (xzemko01). Outline. LL(1) parsers Why increasing the power of LL(k) parsers? LL(k) parsers Linear approximate LL(k) parsing LL-regular parsing Parse tree grammars Extended LL(1) grammars Conclusion.
E N D
Increasing power of LL(k) parsers Bc. Jozef Lang (xlangj01) Bc. Zoltán Zemko (xzemko01)
Outline • LL(1) parsers • Why increasing the power of LL(k) parsers? • LL(k) parsers • Linear approximate LL(k) parsing • LL-regular parsing • Parse tree grammars • Extended LL(1) grammars • Conclusion
LL(1) parsing • Deterministic top-down parsing • Prediction is made only of one symbol, thus LL(1) is an 1 look-ahead parsing • The starting terminal symbol of every non-terminal symbol is needed when a parse table is constructed • FIRSTk set -- set of terminals that are at the first k positions of strings that non-terminal can be derived to • FOLLOW set – set of all terminal symbols that can follow non-terminal symbol in any sequential form derivable from S#
LL(1) versus strong-LL(1) • If all entries of the parse table have at most one element then the grammar is called strong-LL(1) • Every LL(1) grammar is also a strong-LL(1) grammar • When a parse table entry contains more than one entry then it is a LL(1) conflict • A parser with LL(1) conflict is not deterministic and thus is less efficient
LL(1) versus strong-LL(1) (2) • LL(1) conflicts can be solved by • Left recursion elimination • Left factoring • Conflict resolvers
Why increasing the power of LL(k) parsers? • Let’s have a following grammar where idf produces identifiers: • This fragment defines expression elements like x, sin(0.41), T[2,3] • First token is common for all expressions, but the second token distinguish between alternatives • Look ahead of only one token is not enough • It would be handy to increase power of deterministic LL parsing.
LL(k) parsers • It is sometimes handy to look ahead of k symbols, where k > 1 • Need to define FIRSTksets • Let’s have sequential form x. FIRSTk(x) is a set of terminals where: where y is some sequential form
LL(k) parsers (2) • Assume that we have following rule in a grammar G • Grammar G is a LL(k) grammar iff the sets FIRSTk(a1x#k) … FIRSTk(akx#k) are pairwise disjoint. • Symbol #k represents the number of look-ahead symbols • It is obvious that every LL(k) grammar is a subset of LL(k+1) grammars, this does not hold vice versa.
LL(k) parsers (3) • Similarly as by LL(1) parsers, producing parse tables for LL(k) parsers is difficult • FOLLOW set for a LL(k) grammar is defined as an union of FIRSTk(x#k) for any prediction Ax#k • As by LL(1) parsers, the parse table will be indexed by with a pair consisting of a non-terminal symbol and string of terminals with the length equal to k • If a parse table has for every entry at most one element then the grammar denoted by this parse table is strong-LL(k) • For k > 1 there are grammars that are LL(k) but not strong-LL(k)
LL(k) parsers (4) • Strong-LL(k) parsers are only seldom used in practice • Similar effect can be obtained by using conflict resolvers
Linear-approximate LL(k) parsing • Difficult constructing of LL(k) parse tables can be avoided by a simple trick • In addition to FIRST set, introduce SECOND, THIRD etc set • The size complexity is reduced from O(tk) to k tables of O(t), where k is the number of sets • Linear-approximate LL(k) grammar is weaker than LL(k) grammar because it breaks the relationship between tokens • Let’s assume that we have LL(2) grammar that has look ahead sets of { ab, cd }{ad, cb } • Linear-approximate LL(2) grammar has FIRST set { ac } and SECOND {bd} – there are not disjoint
LL -regular parsing • LL(k) provides bounded look-ahead • There are grammars where a discriminating token can be arbitrarily far away • Unbounded look-ahead is needed • Unbounded look-ahead forms its own context-free grammar • Context-free grammar can be approximated by regular grammar • There is no algorithm to approximate context-free grammar, but there are several heuristics
Parse tree grammar from LL(1) • A straightforward process • Basic idea is to create new rule for every prediction • The non-terminals are numbered by an increasing global counter • Then are inserted into prediction stack • New created rules forms parse tree grammar • As far as the parser is deterministic, the parse tree grammar is obtained instead of parse forest grammar
Extended LL(1) grammars • Some parsers accept Extended LL(1) grammars instead of ordinary one • To accept Extended LL(1) grammar parser must transform it to ordinary one without introducing LL(1) conflicts • An advantage of Extended LL(1) grammars is that they allow a more efficient implementation in recursive descent parsers
Conclusion • LL(1) is very intuitive, makes its steps according to prediction of one token • There are situations where look-ahead only of one symbol is not sufficient • The power of LL parsers can be improved by extending the bounding look-ahead to • a bounded length resulting in LL(k) parsing • a unbounded length resulting in LL – regular parsing • Linear-approximate LL(2) parsing is a convenient and simplified form of a LL(2) parsing