Design Patterns for Recursive Descent Parsing
Dung Nguyen, Mathias Ricken & Stephen Wong
Rice University

RDP in CS2?
• Context: an objects-first intro curriculum that already covers
  • Polymorphism
  • Recursion
  • Design patterns (visitors, factories, etc.)
  • OOD principles
• Want a good OOP/OOD example
• Want a relevant CS topic
• Recursive Descent Parsing:
  • Smooth transitions from simple to complex examples, developing an abstract model
  • ∆ change in grammar ⇒ ∆ change in code

The Problem of Teaching RDP
• “A complex, isolated, advanced topic for upper division only”
• Mutual recursion!
• Global analysis
[Figure: New grammar → ? ? → New code, bridged only by global analysis or a parser generator]

Object-Oriented Approach
• Grammar must drive any processing related to it, e.g. parsing.
• Model the grammar first:
  • Terminal symbols (tokens)
  • Non-terminal symbols (incl. start symbol)
  • Rules
• Driving forces:
  • Decouple intelligent tokens from rules ⇒ visitors to tokens
  • Extensible system: open-ended number of tokens ⇒ extended visitors
• Then parsing will come!

Representing Tokens
• Intelligent tokens ⇒ no type checking!
• Decoupled from processing ⇒ Visitor pattern
• For LL(1) grammars, in any given situation, the token determines the parsing action taken
• Parsing is done by visitors to tokens

Processing Tokens with Visitors
• Standard Visitor pattern:
[Figure: Token A visits Visitor and calls caseA; Token B visits Visitor and calls caseB]
• But we want to be able to add an unbounded number of tokens!

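A minimal Java sketch of the standard Visitor pattern applied to tokens, assuming two hypothetical token types (`NumToken`, `IdToken`) and a hypothetical visitor interface `TokVisitor` with one case per type. Each token calls its own case, so the processing code needs no `instanceof` checks or casts, but the set of cases is fixed by the interface.

```java
// Hypothetical visitor interface: one case per token type (standard Visitor pattern).
interface TokVisitor<R> {
    R caseNum(NumToken t);
    R caseId(IdToken t);
}

// "Intelligent" tokens call their own case, so no instanceof checks or casts are needed.
abstract class Token {
    abstract <R> R execute(TokVisitor<R> v);
}

final class NumToken extends Token {
    final int value;
    NumToken(int value) { this.value = value; }
    <R> R execute(TokVisitor<R> v) { return v.caseNum(this); }
}

final class IdToken extends Token {
    final String name;
    IdToken(String name) { this.name = name; }
    <R> R execute(TokVisitor<R> v) { return v.caseId(this); }
}

// Example processing, decoupled from the tokens: describe each token.
final class Describe implements TokVisitor<String> {
    public String caseNum(NumToken t) { return "num " + t.value; }
    public String caseId(IdToken t) { return "id " + t.name; }
}
```

Adding a `PlusToken` to this sketch would mean adding a `casePlus` method to `TokVisitor` and to every class that implements it, which is exactly the closed-set limitation the slide points out.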
Processing Tokens with Visitors
• Visitor pattern modified with Chain-of-Responsibility:
[Figure: Token A visits VisitorA and calls caseA. Token B also visits VisitorA, whose defaultCase delegates to the chain; VisitorB then handles it and caseB is called.]
• Handles any type of token!

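One possible Java sketch of the chain-of-responsibility variant, under the assumption that each token carries a name and that an extended visitor keeps a table of named cases plus a successor. `Token`, `TokVisitor`, `addCase`, and `execute` are hypothetical names; the name lookup stands in for the per-type `caseA`/`caseB` methods of the figure, and delegating to the successor plays the role of `defaultCase`.

```java
import java.util.*;
import java.util.function.Function;

// Hypothetical token: it carries a name and hands itself to a visitor.
final class Token {
    final String name, text;
    Token(String name, String text) { this.name = name; this.text = text; }
    <R> R execute(TokVisitor<R> v) { return v.visit(this); }
}

// Hypothetical extended visitor: named cases plus a successor for every other token.
final class TokVisitor<R> {
    private final Map<String, Function<Token, R>> cases = new HashMap<>();
    private final TokVisitor<R> successor; // next visitor in the chain, or null

    TokVisitor(TokVisitor<R> successor) { this.successor = successor; }

    TokVisitor<R> addCase(String name, Function<Token, R> c) {
        cases.put(name, c);
        return this;
    }

    R visit(Token t) {
        Function<Token, R> c = cases.get(t.name);
        if (c != null) return c.apply(t);                 // this visitor knows the token
        if (successor != null) return successor.visit(t); // default: delegate down the chain
        throw new IllegalArgumentException("no case for token " + t.name);
    }
}
```

For example, a visitor with a `num` case whose successor has an `id` case handles both kinds of token. Supporting a new token kind then means adding a case or appending another visitor to the chain, with no edits to existing classes.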
Modeling an LL(1) Grammar
• Left-factoring
• Make grammar predictively parsable

  E → F | F + E      ⇒      E → F E1
                            E1 → empty | + E
  F → num | id              F → num | id

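For contrast only, here is a conventional procedural recursive-descent recognizer for the left-factored grammar, written in Java as an illustration. It is not the object-oriented design these slides develop; the class `Recognizer` and its lexical rules for num (digits) and id (a letter followed by word characters) are assumptions. Before left-factoring, both alternatives of E → F | F + E begin with F, so one lookahead token cannot choose between them; after left-factoring, `parseE1` decides with a single `peek()`.

```java
import java.util.*;

// Procedural recognizer for E -> F E1, E1 -> empty | + E, F -> num | id (illustration only).
final class Recognizer {
    private final List<String> toks; // whitespace-separated tokens, e.g. ["1", "+", "x"]
    private int pos = 0;

    Recognizer(List<String> toks) { this.toks = toks; }

    private String peek() { return pos < toks.size() ? toks.get(pos) : "$"; } // "$" = end

    static boolean accepts(String src) {
        Recognizer r = new Recognizer(Arrays.asList(src.trim().split("\\s+")));
        return r.parseE() && r.peek().equals("$"); // whole input must be consumed
    }

    // E -> F E1
    private boolean parseE() { return parseF() && parseE1(); }

    // E1 -> + E | empty: the single lookahead token picks the branch
    private boolean parseE1() {
        if (peek().equals("+")) { pos++; return parseE(); }
        return true; // empty: consume nothing
    }

    // F -> num | id
    private boolean parseF() {
        String t = peek();
        if (t.matches("\\d+") || t.matches("[A-Za-z]\\w*")) { pos++; return true; }
        return false;
    }
}
```

Note that the grammar is spread across hand-written methods that call each other; changing the grammar means editing this control flow, which is the coupling the object model avoids.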
Modeling an LL(1) Grammar
• In rules with multiple branches, replace sequences and tokens with unique non-terminal symbols
• Branches then contain only non-terminals

  E → F E1                  E → F E1
  E1 → empty | + E    ⇒     E1 → empty | E1a
                            E1a → + E
  F → num | id              F → F1 | F2
                            F1 → num
                            F2 → id

Modeling an LL(1) Grammar
• Branches are modeled by inheritance (“is-a”): in A → B | C, B and C are subclasses of A
• Sequences are modeled by composition (“has-a”): in S → X Y, S has an X and a Y

Object Model of Grammar
  E → F E1
  E1 → empty | E1a
  E1a → + E
  F → F1 | F2
  F1 → num
  F2 → id
• Grammar structure = class structure

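A minimal Java sketch of this object model, assuming one class per grammar symbol: the branch rules (E1, F) become abstract superclasses, and the sequence rules (E, E1a) become classes holding a field for each part. The class names mirror the grammar; the `toString` rendering and the choice to leave the `+` token implicit in `E1a` are assumptions of this sketch.

```java
// Hypothetical classes mirroring the grammar: one class per symbol.

// E -> F E1 : a sequence, so E "has-a" F and an E1 (composition)
final class E {
    final F f;
    final E1 e1;
    E(F f, E1 e1) { this.f = f; this.e1 = e1; }
    @Override public String toString() { return f.toString() + e1.toString(); }
}

// E1 -> empty | E1a : a branch, so each alternative "is-a" E1 (inheritance)
abstract class E1 {}

final class Empty extends E1 {
    @Override public String toString() { return ""; }
}

// E1a -> + E : the "+" token is implied by the class, so only E is stored
final class E1a extends E1 {
    final E e;
    E1a(E e) { this.e = e; }
    @Override public String toString() { return " + " + e; }
}

// F -> F1 | F2, with F1 -> num and F2 -> id
abstract class F {}

final class F1 extends F {
    final int num;
    F1(int num) { this.num = num; }
    @Override public String toString() { return Integer.toString(num); }
}

final class F2 extends F {
    final String id;
    F2(String id) { this.id = id; }
    @Override public String toString() { return id; }
}
```

For example, `new E(new F1(1), new E1a(new E(new F2("x"), new Empty())))` models the sentence `1 + x` and renders back to that string. No parsing logic appears anywhere: the classes only describe the grammar.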
Modeling an LL(1) Grammar
• No predictive parsing table!
• Declarative, not procedural
• Model the grammar, not the parsing!

[Figure: the grammar E → F E1, E1 → empty | E1a, E1a → + E, F → F1 | F2, F1 → num, F2 → id]

Detailed and Global Analysis
• To process E, we must first know about F and E1…
• But to process F, we must first know about F1 and F2…
• …but to process F1, we must first know about num!
• The processing of one rule requires deep knowledge of the whole grammar!
• Or does it??...

Abstract and Local Analysis!
• To process E, we must have the ability to process F and E1, independent of how either F or E1 is processed!
• Since parsing is done with visitors to tokens, all we need to parse E are the visitors to parse F and E1.
• But E doesn’t know what it takes to make the F and E1 parsing visitors…
• We need abstract construction of the visitors…
• Abstract factories decouple rules

Factory Model of Parser
  E → F E1
  E1 → empty | E1a
  E1a → + E
  F → F1 | F2
  F1 → num
  F2 → id
• Parser structure = factory structure
• Grammar represented purely with composition

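A minimal Java 16+ sketch of how abstract factories might decouple the rules: each rule's factory only asks its parts' factories for visitors and never needs to know how those visitors are built. All names (`Token`, `Tokens`, `TokVisitor`, `VisitorFactory`, `Rules`, `Forward`, `Demo`) are hypothetical and are not the authors' code or Kooprey's output. The forward reference and the lazy construction of sub-visitors are assumptions of this sketch, used to handle the mutual recursion between E and E1a; branches reuse the chain-of-responsibility idea, with the empty production acting as the default case.

```java
import java.util.*;
// Hypothetical token: its kind alone selects the parsing action (LL(1)).
record Token(String kind, String text) {}
// Hypothetical cursor over the tokens; kind "$" marks end of input.
final class Tokens {
    private final List<Token> toks; private int pos = 0;
    Tokens(List<Token> toks) { this.toks = toks; }
    Token peek() { return pos < toks.size() ? toks.get(pos) : new Token("$", ""); }
    Token next() { Token t = peek(); pos++; return t; }
}
// Parsing visitor: handles the lookahead kinds it accepts, else delegates down its chain.
abstract class TokVisitor {
    private TokVisitor successor;
    abstract boolean accepts(String kind);
    abstract String parse(Tokens in);
    boolean handles(String k) { return accepts(k) || (successor != null && successor.handles(k)); }
    TokVisitor chain(TokVisitor s) { // append s at the end of this chain
        TokVisitor t = this;
        while (t.successor != null) t = t.successor;
        t.successor = s;
        return this;
    }
    String visit(Tokens in) {
        String k = in.peek().kind();
        if (accepts(k)) return parse(in);
        if (successor != null) return successor.visit(in);
        throw new IllegalStateException("unexpected token: " + k);
    }
}
interface VisitorFactory { TokVisitor make(); } // abstract factory for parsing visitors

final class Rules {
    // Terminal: accepts and consumes one token of the given kind.
    static VisitorFactory terminal(String kind) {
        return () -> new TokVisitor() {
            boolean accepts(String k) { return k.equals(kind); }
            String parse(Tokens in) { return in.next().text(); }
        };
    }
    // Empty production: accepts any lookahead and consumes nothing.
    static VisitorFactory empty() {
        return () -> new TokVisitor() {
            boolean accepts(String k) { return true; }
            String parse(Tokens in) { return ""; }
        };
    }
    // Branch A -> B | C: chain the alternatives' visitors (chain of responsibility).
    static VisitorFactory branch(VisitorFactory... alts) {
        return () -> {
            TokVisitor head = null;
            for (int i = alts.length - 1; i >= 0; i--) head = alts[i].make().chain(head);
            return head;
        };
    }
    // Sequence S -> X Y: composition; sub-visitors are made lazily to survive recursion.
    static VisitorFactory sequence(VisitorFactory... parts) {
        return () -> new TokVisitor() {
            private final TokVisitor[] subs = new TokVisitor[parts.length];
            private TokVisitor sub(int i) { return subs[i] != null ? subs[i] : (subs[i] = parts[i].make()); }
            boolean accepts(String k) { return sub(0).handles(k); }
            String parse(Tokens in) { // returns a bracketed tree of the non-empty parts
                List<String> out = new ArrayList<>();
                for (int i = 0; i < parts.length; i++) {
                    String r = sub(i).visit(in);
                    if (!r.isEmpty()) out.add(r);
                }
                return out.size() == 1 ? out.get(0) : "[" + String.join(" ", out) + "]";
            }
        };
    }
}

// Forward reference: lets E appear inside E1a before E's own factory is set.
final class Forward implements VisitorFactory {
    private VisitorFactory target;
    void set(VisitorFactory f) { target = f; }
    public TokVisitor make() { return target.make(); }
}

final class Demo {
    static List<Token> lex(String src) { // hypothetical lexer: digits -> num, words -> id
        List<Token> out = new ArrayList<>();
        for (String w : src.trim().split("\\s+"))
            out.add(new Token(w.matches("\\d+") ? "num" : w.matches("[A-Za-z]\\w*") ? "id" : w, w));
        return out;
    }
    static String parseE(String src) {
        Forward e = new Forward();
        VisitorFactory f = Rules.branch(Rules.terminal("num"), Rules.terminal("id")); // F -> num | id
        VisitorFactory e1a = Rules.sequence(Rules.terminal("+"), e);                  // E1a -> + E
        VisitorFactory e1 = Rules.branch(e1a, Rules.empty());                         // E1 -> E1a | empty
        e.set(Rules.sequence(f, e1));                                                 // E -> F E1
        Tokens in = new Tokens(lex(src));
        String tree = e.make().visit(in);
        if (!in.peek().kind().equals("$")) throw new IllegalStateException("trailing input");
        return tree;
    }
}
```

In this sketch, `Demo.parseE("1 + x + 2")` yields the right-nested tree `[1 [+ [x [+ 2]]]]`, while `1 + + 2` fails at the second `+`. Only the four factory lines inside `parseE` mention the grammar, so extending the grammar would, under this sketch, amount to adding factory lines rather than changing `Rules` or `TokVisitor`.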
Extending the Grammar
• Adding new tokens and rules
• Highly localized impact on code
• No re-computing of prediction tables

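Original grammar:
  E → F E1
  E1 → empty | E1a
  E1a → + E
  F → F1 | F2
  F1 → num
  F2 → id

Extended grammar (adds * and parentheses):
  E → S E1
  E1 → empty | E1a
  E1a → + E
  S → P | T
  P → ( E )
  T → F T1
  T1 → empty | T1a
  T1a → * S
  F → F1 | F2
  F1 → num
  F2 → id
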
Parser Demo (if time permits)
• We change your grammar in two minutes while you wait!

Automatic Parser Generator
• No additional theory needed for generalization
• No fixed points, FIRST and FOLLOW sets
• Kooprey
  • Parser generator: BNF ⇒ Java
  • kou·prey (noun): “a rare short-haired ox (Bos sauveli) of forests of Indochina […]” (Merriam-Webster Online)
• Extensions
  • Skip generation of source, create parser at runtime

Conclusion
• Simple enough to introduce in a CS2 course (@Rice – near the end of CS2)
• Teaches an abstraction of grammars and parsing
• Reinforces foundational OO principles
  • Abstract representations
  • Abstract construction
  • Decoupled systems
  • Recursion
http://www.exciton.cs.rice.edu/research/sigcse05