Design Patterns for Recursive Descent Parsing



  1. Design Patterns for Recursive Descent Parsing Dung Nguyen, Mathias Ricken & Stephen Wong Rice University

  2. RDP in CS2? • Context: an objects-first intro curriculum that already covers • Polymorphism • Recursion • Design patterns (visitors, factories, etc.) • OOD principles • Want a good OOP/D example • Want a relevant CS topic • Recursive descent parsing: • Smooth transitions from simple to complex examples, developing an abstract model • Δ change in grammar → Δ change in code

  3. The Problem of Teaching RDP • “A complex, isolated, advanced topic for upper division only” • Mutual recursion! • Global analysis • New grammar → new code (rerun a parser generator?)

  4. Object-Oriented Approach • The grammar must drive any processing related to it, e.g. parsing. • → Model the grammar first: • Terminal symbols (tokens) • Non-terminal symbols (incl. start symbol) • Rules • Driving forces: • Decouple intelligent tokens from rules → visitors to tokens • Extensible system: open-ended number of tokens → extended visitors • Then parsing will come!

  5. Representing Tokens • Intelligent tokens → no type checking! • Decoupled from processing → Visitor pattern • For LL(1) grammars, in any given situation, the token determines the parsing action taken • → Parsing is done by visitors to tokens
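The slide's idea of intelligent tokens processed by visitors can be sketched as plain double dispatch. This is a minimal illustration, not the paper's code; the names (Token, NumToken, IdToken, TokenVisitor) are ours:

```java
// Each case of the visitor handles one concrete token type.
interface TokenVisitor<R> {
    R numCase(NumToken t);
    R idCase(IdToken t);
}

abstract class Token {
    // Double dispatch: the token picks the case, so no client ever type-checks.
    abstract <R> R accept(TokenVisitor<R> v);
}

class NumToken extends Token {
    final int value;
    NumToken(int value) { this.value = value; }
    <R> R accept(TokenVisitor<R> v) { return v.numCase(this); }
}

class IdToken extends Token {
    final String name;
    IdToken(String name) { this.name = name; }
    <R> R accept(TokenVisitor<R> v) { return v.idCase(this); }
}
```

Because the token calls back the matching case, processing code never inspects token types; it only supplies a visitor.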

  6. Processing Tokens with Visitors • Standard Visitor pattern: the Visitor (with caseA, caseB) visits Token A and Token B; each token calls back the case that matches it. • But we want to be able to add an unbounded number of tokens!

  7. Processing Tokens with Visitors • Visitor pattern modified with Chain of Responsibility: the visitor keeps its specific cases (caseA) plus a defaultCase; a token the visitor does not know (Token B) falls into defaultCase, which delegates along a chain of visitors (VisitorA, VisitorB, ...) until one supplies the matching case (caseB). • Handles any types of tokens!
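One way to realize this hybrid in plain Java is sketched below. It is an illustration under our own naming (Tok, AToken, ITokVisitor, ...), not the paper's implementation: each token type gets its own visitor interface, the token offers its specific case when the visitor supports it (a guarded cast here), and otherwise falls back to defaultCase, which delegates down the chain:

```java
interface ITokVisitor {
    Object defaultCase(Tok host);            // fallback for tokens this visitor does not know
}
interface IAVisitor extends ITokVisitor { Object aCase(AToken host); }
interface IBVisitor extends ITokVisitor { Object bCase(BToken host); }

abstract class Tok {
    abstract Object execute(ITokVisitor v);  // double dispatch entry point
}
class AToken extends Tok {
    Object execute(ITokVisitor v) {
        return (v instanceof IAVisitor) ? ((IAVisitor) v).aCase(this)
                                        : v.defaultCase(this);
    }
}
class BToken extends Tok {
    Object execute(ITokVisitor v) {
        return (v instanceof IBVisitor) ? ((IBVisitor) v).bCase(this)
                                        : v.defaultCase(this);
    }
}

// A visitor that knows only AToken; anything else is handed to the next
// visitor in the chain, so adding a new token type never forces edits here.
class AVisitor implements IAVisitor {
    private final ITokVisitor next;
    AVisitor(ITokVisitor next) { this.next = next; }
    public Object aCase(AToken host) { return "A handled"; }
    public Object defaultCase(Tok host) { return host.execute(next); }
}
class BVisitor implements IBVisitor {
    public Object bCase(BToken host) { return "B handled"; }
    public Object defaultCase(Tok host) { return "unhandled"; }
}
```

Chaining `new AVisitor(new BVisitor())` then handles both token types, and a third token type only requires one new link in the chain.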

  8. Modeling an LL(1) Grammar • Left-factoring • Make the grammar predictively parsable: E → F | F + E becomes E → F E1, E1 → empty | + E • F → num | id (unchanged)

  9. Modeling an LL(1) Grammar • In multiple rules (branches), replace sequences and tokens with unique non-terminal symbols • Branches then only contain non-terminals: E → F E1, E1 → empty | E1a, E1a → + E, F → num | id becomes E → F E1, E1 → empty | E1a, E1a → + E, F → F1 | F2, F1 → num, F2 → id

  10. Modeling an LL(1) Grammar • Branches are modeled by inheritance (“is-a”): A → B | C • Sequences are modeled by composition (“has-a”): S → X Y

  11. Object Model of Grammar • E → F E1 • E1 → empty | E1a • E1a → + E • F → F1 | F2 • F1 → num • F2 → id • Grammar Structure = Class Structure
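The class structure for this grammar can be sketched directly from the two mapping rules: branches become inheritance, sequences become composition. The class names follow the grammar symbols; the name MTE1 for the empty branch is our choice:

```java
abstract class F { }                      // F -> F1 | F2 : branch = inheritance
class F1 extends F {                      // F1 -> num
    final int num;
    F1(int num) { this.num = num; }
}
class F2 extends F {                      // F2 -> id
    final String id;
    F2(String id) { this.id = id; }
}

abstract class E1 { }                     // E1 -> empty | E1a
class MTE1 extends E1 { }                 // the empty branch (no data)

class E {                                 // E -> F E1 : sequence = composition
    final F f;
    final E1 e1;
    E(F f, E1 e1) { this.f = f; this.e1 = e1; }
}
class E1a extends E1 {                    // E1a -> + E
    final E e;
    E1a(E e) { this.e = e; }
}
```

An instance of E is then exactly a parse tree of the grammar, e.g. `3 + x` is `new E(new F1(3), new E1a(new E(new F2("x"), new MTE1())))`.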

  12. Modeling an LL(1) Grammar • No predictive parsing table! • Declarative, not procedural • Model the grammar, not the parsing!

  13. Abstract Factories Decouple Rules • Detailed and global analysis: to process E, we must first know about F and E1... but to process F, we must first know about F1 and F2... and to process F1, we must first know about num! The processing of one rule requires deep knowledge of the whole grammar. • Abstract and local analysis: to process E, we only need the ability to process F and E1, independent of how either F or E1 is processed. Since parsing is done with visitors to tokens, all we need to parse E are the visitors that parse F and E1. • But E doesn’t know what it takes to make the F and E1 parsing visitors... or does it? We need abstract construction of the visitors: Abstract Factories Decouple Rules!

  14. Factory Model of Parser • E → F E1 • E1 → empty | E1a • E1a → + E • F → F1 | F2 • F1 → num • F2 → id • Parser Structure = Factory Structure • Grammar represented purely with composition
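The factory idea can be sketched in miniature without the full visitor machinery. In this illustration (all names are ours, and lambdas stand in for the parsing visitors), each non-terminal gets a factory method that builds its parser on demand; because E's parser asks for F's and E1's parsers only when it actually runs, the mutual recursion between E → F E1 and E1a → + E causes no infinite construction loop:

```java
import java.util.List;

class MiniParser {
    // A parser consumes tokens starting at pos and returns the next position.
    interface Parser { int parse(List<String> toks, int pos); }

    Parser makeF() {                         // F -> num | id
        return (toks, pos) -> {
            String t = toks.get(pos);
            if (t.matches("\\d+|[a-zA-Z]+")) return pos + 1;
            throw new IllegalArgumentException("expected num or id at " + pos);
        };
    }
    Parser makeE1() {                        // E1 -> empty | + E
        return (toks, pos) ->
            (pos < toks.size() && toks.get(pos).equals("+"))
                ? makeE().parse(toks, pos + 1)   // the "+ E" branch, built lazily
                : pos;                           // the empty branch
    }
    Parser makeE() {                         // E -> F E1 : sequence = composition
        return (toks, pos) -> makeE1().parse(toks, makeF().parse(toks, pos));
    }
}
```

The point of the deferral is local reasoning: `makeE` depends only on the existence of `makeF` and `makeE1`, never on how those rules are implemented, which is exactly what the abstract factories buy in the full design.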

  15. Extending the Grammar • Adding new tokens and rules • Highly localized impact on code • No re-computing of prediction tables

  16. E S E1E1 empty | E1aE1a  + ES  P | TP  (E)T  F T1T1  empty | T1aT1a  * SF  F1 | F2F1  numF2  id E  F E1E1 empty | E1aE1a  + EF  F1 | F2F1  numF2  id

  17. Parser Demo (if time permits) • We change your grammar in two minutes while you wait!

  18. Automatic Parser Generator • No additional theory needed for generalization • No fixed points, no FIRST and FOLLOW sets • Kooprey • Parser generator: BNF → Java • kou·prey (noun): “a rare short-haired ox (Bos sauveli) of forests of Indochina […]” (Merriam-Webster Online) • Extensions • Skip generation of source; create the parser at runtime

  19. Conclusion • Simple enough to introduce in a CS2 course (at Rice: near the end of CS2) • Teaches an abstraction of grammars and parsing • Reinforces foundational OO principles • Abstract representations • Abstract construction • Decoupled systems • http://www.exciton.cs.rice.edu/research/sigcse05