Programming Language Syntax: Regex vs CFG

Learn the differences between regular expressions (regexes) and context-free grammars (CFGs) in programming language syntax. Understand how to specify token patterns with regexes and patterns of tokens with a CFG, and how to build unambiguous expression grammars using associativity and precedence.

danielsm

Presentation Transcript


  1. Programming Language Syntax 2 http://flic.kr/p/zCyMp

  2. Think-Pair-Share Activity Assuming the following INTEGER regex: Try to build a regex that matches arithmetic expressions, such as: • 55 • 2 + 5 • 4 * 5 / -3 • (9 * 4)/(2 - +4) • ((4 + 7) * 10) * (69 + 7) / (44 - (22 + 66) * +5)

  3. Here’s one way • Except this isn’t a regex • Regexes cannot have recursive constructs • It’s actually a context-free grammar (CFG) • Like regexes with recursion • Expressed (more or less) in Backus-Naur Form (BNF)
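The need for recursion can be sketched with a small recognizer. This is a minimal illustration, not the slide's grammar (which is not in the transcript): it assumes INTEGER is an optionally signed run of digits and uses two hypothetical rules, `expr : operand (OP operand)*` and `operand : INTEGER | '(' expr ')'`. The `operand` rule refers back to `expr`, which is exactly what a plain regex cannot express.

```python
import re

# Assumption: INTEGER is an optionally signed run of digits.
INTEGER = re.compile(r"[+-]?[0-9]+")


def skip_ws(s, i):
    """Return the index of the first non-whitespace character at or after i."""
    while i < len(s) and s[i].isspace():
        i += 1
    return i


def parse_operand(s, i):
    """operand : INTEGER | '(' expr ')'. Returns the index after the operand, or -1."""
    i = skip_ws(s, i)
    if i < len(s) and s[i] == "(":
        j = parse_expr(s, i + 1)  # recursion: an operand may contain a whole expr
        if j == -1:
            return -1
        j = skip_ws(s, j)
        return j + 1 if j < len(s) and s[j] == ")" else -1
    m = INTEGER.match(s, i)
    return m.end() if m else -1


def parse_expr(s, i):
    """expr : operand (OP operand)*. Returns the index after the expr, or -1."""
    i = parse_operand(s, i)
    if i == -1:
        return -1
    while True:
        j = skip_ws(s, i)
        if j < len(s) and s[j] in "+-*/":
            i = parse_operand(s, j + 1)
            if i == -1:
                return -1
        else:
            return i


def is_expression(s):
    """True if the whole string is an arithmetic expression under the rules above."""
    end = parse_expr(s, 0)
    return end != -1 and skip_ws(s, end) == len(s)


for text in ["55", "2 + 5", "4 * 5 / -3", "(9 * 4)/(2 - +4)", "(9 * 4"]:
    print(repr(text), is_expression(text))
```

This recognizer only answers yes or no; it ignores precedence and associativity entirely.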

  4. Recall from last time… ANTLR generates for you … But how do you tell ANTLR what your language is like?

  5. You specify token patterns using Regular Expressions … You specify patterns of tokens using a Context-Free Grammar

  6. Important distinctions between regexes and CFG rules in ANTLR • Naming: • Regex names start with an uppercase letter • CFG-rule names start with a lowercase letter • Character versus token handling: • Regexes process a stream of characters • CFG rules process a stream of tokens

  7. Will this regex match these strings? • “4 4 4 4”? No • “4444”? Yes

  8. Will this CFG production match these strings? • “4 4 4 4”? Yes • “4444”? Yes
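The character-versus-token distinction behind these two answers can be sketched in code. The regex and production shown on the slides are not in the transcript, so this example assumes a lexer rule `INTEGER : [0-9]+` and a hypothetical parser rule `ints : INTEGER+`, with whitespace skipped between tokens.

```python
import re

# Assumption: the lexer rule is INTEGER : [0-9]+ ;
INTEGER = re.compile(r"[0-9]+")


def regex_matches(text):
    """The regex works on characters, so a space breaks the match."""
    return INTEGER.fullmatch(text) is not None


def tokenize(text):
    """Split text into INTEGER tokens, skipping whitespace.

    Returns None if a character fits no token.
    """
    tokens, i = [], 0
    while i < len(text):
        if text[i].isspace():
            i += 1
            continue
        m = INTEGER.match(text, i)
        if m is None:
            return None
        tokens.append(m.group())
        i = m.end()
    return tokens


def rule_matches(text):
    """Assumed parser rule ints : INTEGER+ ; it sees tokens, never spaces."""
    tokens = tokenize(text)
    return tokens is not None and len(tokens) >= 1


for text in ["4 4 4 4", "4444"]:
    print(repr(text), regex_matches(text), rule_matches(text), tokenize(text))
```

Under these assumptions "4444" is a single INTEGER token and "4 4 4 4" is four of them, so the rule accepts both while the regex accepts only the first.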

  9. Backus-Naur Form (BNF) (slide labels: ANTLR, BNF)

  10. Extended BNF (EBNF) (slide labels: ANTLR, BNF, things not in BNF, EBNF)

  11. CFG Terminology terminals non-terminal productions

  12. CFG Derivation Series of replacement operations that shows how to derive a string of terminals from the start symbol

  13. Derivation Example CFG: String to derive:

  14. CFG: Derivation: String to derive:
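A derivation as a series of replacement operations can be sketched as follows. The grammar on the slide is not in the transcript, so this uses a hypothetical one: `expr : expr op expr | INT` and `op : '+' | '*'`. Each step replaces one non-terminal with the right-hand side of one of its productions until only terminals remain.

```python
# Hypothetical grammar (not taken from the slides):
#   expr : expr op expr | INT
#   op   : '+' | '*'
GRAMMAR = {
    "expr": [["expr", "op", "expr"], ["INT"]],
    "op": [["+"], ["*"]],
}


def apply_production(form, choice, leftmost=True):
    """Replace the left-most (or right-most) non-terminal in a sentential form
    with the chosen production of that non-terminal."""
    positions = [i for i, sym in enumerate(form) if sym in GRAMMAR]
    i = positions[0] if leftmost else positions[-1]
    return form[:i] + GRAMMAR[form[i]][choice] + form[i + 1:]


def derive(choices, leftmost=True):
    """Return every sentential form from the start symbol onward."""
    form = ["expr"]
    steps = [form]
    for choice in choices:
        form = apply_production(form, choice, leftmost)
        steps.append(form)
    return steps


# Derive "INT + INT" twice, expanding a different non-terminal at each step.
for leftmost in (True, False):
    print("left-most" if leftmost else "right-most")
    for form in derive([0, 1, 0, 1], leftmost):
        print("  ", " ".join(form))
```

Both runs end at the same string of terminals; they differ only in which non-terminal is expanded first.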

  15. Parse Tree: Graphical Representation of Derivation Can you think of another possible derivation? Hint: This one is a “right-most” derivation

  16. Here’s a “left-most” derivation • A grammar is ambiguous if some string has more than one parse tree (equivalently, more than one left-most derivation) • Ambiguity makes generating a parser more difficult
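One way to see ambiguity concretely is to enumerate parse trees. The sketch below assumes a hypothetical ambiguous grammar, `expr : expr OP expr | INT`, and represents each tree as a nested tuple; the function name and representation are illustrative only.

```python
def parse_trees(tokens):
    """All parse trees for tokens under the assumed grammar
    expr : expr OP expr | INT, as nested (left, op, right) tuples.

    Expects operands and operators to alternate, e.g. ["10", "-", "4"].
    """
    if len(tokens) == 1:
        return [tokens[0]]
    trees = []
    # Any operator (odd index) can serve as the top-level split.
    for i in range(1, len(tokens), 2):
        for left in parse_trees(tokens[:i]):
            for right in parse_trees(tokens[i + 1:]):
                trees.append((left, tokens[i], right))
    return trees


for tree in parse_trees(["10", "-", "4", "-", "3"]):
    print(tree)
```

For `10 - 4 - 3` this yields two trees, one grouping as `(10 - 4) - 3` and one as `10 - (4 - 3)`, so the assumed grammar is ambiguous.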

  17. Two concepts important to expressions • Associativity: Group based on L-to-R order • 10 - 4 - 3 means (10 - 4) - 3 versus 10 - (4 - 3) • Precedence: Group based on operator • 3 + 4 * 5 means 3 + (4 * 5) versus (3 + 4) * 5
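The two groupings in each bullet give different values, which is why the choice matters. A quick check in Python (whose own expression rules happen to be left-associative for `-` and give `*` higher precedence than `+`):

```python
# Associativity: the same operators grouped in different orders.
left_assoc = (10 - 4) - 3
right_assoc = 10 - (4 - 3)

# Precedence: which operator binds more tightly.
mul_first = 3 + (4 * 5)
add_first = (3 + 4) * 5

print(left_assoc, right_assoc)  # 3 9
print(mul_first, add_first)     # 23 35

# Python's built-in rules pick the conventional grouping in both cases.
print(10 - 4 - 3 == left_assoc, 3 + 4 * 5 == mul_first)  # True True
```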

  18. Think-Pair-Share Activity • Rewrite this CFG to be unambiguous • Left associative • Multiplication/division have higher precedence than addition/subtraction

  19. Solution • Create parse tree for: • 3 + 4 * 5 • 10 - 4 - 3

  20. Parse Tree for 3 + 4 * 5

  21. Parse Tree for 10 - 4 - 3
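The solution grammar itself is not in the transcript, so the sketch below assumes a common textbook form: `expr : expr ('+'|'-') term | term`, `term : term ('*'|'/') factor | factor`, `factor : INT | '(' expr ')'`. The hand-written parser replaces the left recursion with loops that build left-nested tuples, which gives left associativity, and the separate `term` level gives multiplication and division higher precedence. All names here are illustrative.

```python
import operator
import re

TOKEN = re.compile(r"\s*(\d+|[-+*/()])")
OPS = {"+": operator.add, "-": operator.sub,
       "*": operator.mul, "/": operator.truediv}


def tokenize(text):
    """Split text into integer, operator, and parenthesis tokens."""
    text = text.rstrip()
    tokens, i = [], 0
    while i < len(text):
        m = TOKEN.match(text, i)
        if m is None:
            raise SyntaxError(f"unexpected character at index {i}")
        tokens.append(m.group(1))
        i = m.end()
    return tokens


class Parser:
    def __init__(self, text):
        self.tokens = tokenize(text)
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def advance(self):
        tok = self.peek()
        self.pos += 1
        return tok

    def expr(self):
        """expr : expr ('+'|'-') term | term, written as a loop."""
        tree = self.term()
        while self.peek() in ("+", "-"):
            op = self.advance()
            tree = (tree, op, self.term())  # nest to the left
        return tree

    def term(self):
        """term : term ('*'|'/') factor | factor, written as a loop."""
        tree = self.factor()
        while self.peek() in ("*", "/"):
            op = self.advance()
            tree = (tree, op, self.factor())
        return tree

    def factor(self):
        """factor : INT | '(' expr ')'"""
        tok = self.advance()
        if tok == "(":
            tree = self.expr()
            if self.advance() != ")":
                raise SyntaxError("expected ')'")
            return tree
        if tok is None or not tok.isdigit():
            raise SyntaxError(f"expected integer, got {tok!r}")
        return int(tok)


def parse(text):
    parser = Parser(text)
    tree = parser.expr()
    if parser.peek() is not None:
        raise SyntaxError("unexpected trailing input")
    return tree


def evaluate(tree):
    """Evaluate a nested (left, op, right) tuple bottom-up."""
    if isinstance(tree, int):
        return tree
    left, op, right = tree
    return OPS[op](evaluate(left), evaluate(right))


print(parse("3 + 4 * 5"))   # (3, '+', (4, '*', 5))
print(parse("10 - 4 - 3"))  # ((10, '-', 4), '-', 3)
```

The tuples record only the grouping, not every non-terminal, so they are a condensed view of the parse trees on the slides rather than a reproduction of them.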

  22. What’s next? • Homework 1 assigned!
