Learn about the differences between regular expressions (regexes) and context-free grammars (CFGs) in programming language syntax. Understand how to specify token patterns using regexes and patterns of tokens using a CFG, and how to build unambiguous expression grammars with associativity and precedence.
Programming Language Syntax 2 http://flic.kr/p/zCyMp
Think-Pair-Share Activity Assuming the following INTEGER regex: Try to build a regex that matches arithmetic expressions, such as: • 55 • 2 + 5 • 4 * 5 / -3 • (9 * 4)/(2 - +4) • ((4 + 7) * 10) * (69 + 7) / (44 - (22 + 66) * +5)
Here’s one way • Except this isn’t a regex • Regexes cannot have recursive constructs • It’s actually a context-free grammar (CFG) • Like regexes with recursion • Expressed (more or less) in Backus-Naur Form (BNF)
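To see why recursion matters, here is a minimal Python sketch of a recognizer for the arithmetic expressions in the activity. The slide's own grammar is not reproduced here; the two rules in the comments (`expr` and `factor`) are a hypothetical grammar chosen for illustration, and INTEGER is assumed to be one or more digits, with signs handled by the grammar. The parenthesized case calls back into `expr`, which is the recursive construct a regex cannot express.

```python
import re


def matches_expression(text):
    """Return True if text is an arithmetic expression (hypothetical grammar)."""
    # Lexing: digit runs become INTEGER tokens; any other non-space
    # character becomes a one-character token. Whitespace is skipped.
    tokens = re.findall(r"[0-9]+|\S", text)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        pos += 1

    def expr():
        # expr : factor (('+' | '-' | '*' | '/') factor)* ;
        if not factor():
            return False
        while peek() in ("+", "-", "*", "/"):
            advance()
            if not factor():
                return False
        return True

    def factor():
        # factor : ('+' | '-') factor | INTEGER | '(' expr ')' ;
        tok = peek()
        if tok in ("+", "-"):
            advance()
            return factor()
        if tok is not None and tok[0] in "0123456789":
            advance()
            return True
        if tok == "(":
            advance()
            if not expr():  # recursion: an expression inside an expression
                return False
            if peek() != ")":
                return False
            advance()
            return True
        return False

    # The whole input must be consumed for a match.
    return expr() and pos == len(tokens)


print(matches_expression("(9 * 4)/(2 - +4)"))
print(matches_expression("(9 * 4"))
```

This sketch only answers yes or no; it does not yet say how operators group, which is what associativity and precedence address.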
Recall from last time… ANTLR generates for you … But how do you tell ANTLR what your language is like?
You specify token patterns using Regular Expressions … You specify patterns of tokens using a Context-Free Grammar
Important distinctions between regexes and CFG rules in ANTLR • Naming: • Regex names start with an uppercase letter • CFG-rule names start with a lowercase letter • Character versus token handling: • Regexes process a stream of characters • CFG rules process a stream of tokens
Will this regex match these strings? • “4 4 4 4”? No • “4444”? Yes
Will this CFG production match these strings? • “4 4 4 4”? Yes • “4444”? Yes
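The character-versus-token distinction can be sketched in Python. The regex and the production from the slides are not reproduced here, so this sketch assumes an INTEGER regex of one or more digits, a lexer that skips whitespace, and a hypothetical rule `ints : INTEGER+ ;`. Under those assumptions it gives the same answers as the two slides above.

```python
import re

# Assumed token pattern: one or more digits.
INTEGER = re.compile(r"[0-9]+")


def regex_matches(text):
    """Regex view: the whole character stream must be one INTEGER."""
    return INTEGER.fullmatch(text) is not None


def tokenize(text):
    """Split text into INTEGER tokens, skipping whitespace.

    Returns None if some character does not fit any token.
    """
    tokens = []
    pos = 0
    while pos < len(text):
        if text[pos].isspace():
            pos += 1
            continue
        match = INTEGER.match(text, pos)
        if match is None:
            return None
        tokens.append(match.group())
        pos = match.end()
    return tokens


def rule_matches(text):
    """CFG view of the hypothetical rule  ints : INTEGER+ ;"""
    tokens = tokenize(text)
    return tokens is not None and len(tokens) >= 1


for sample in ("4 4 4 4", "4444"):
    print(repr(sample), regex_matches(sample), tokenize(sample), rule_matches(sample))
```

The regex sees the spaces as characters that do not fit its pattern. The rule never sees them, because the lexer has already turned the input into four tokens in one case and a single token in the other.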
Backus-Naur Form (BNF) [Figure: ANTLR notation compared with BNF]
Extended BNF (EBNF) [Figure: ANTLR notation compared with BNF; the things not in BNF are EBNF]
CFG Terminology • terminals • non-terminals • productions
CFG Derivation Series of replacement operations that shows how to derive a string of terminals from the start symbol
Derivation Example CFG: String to derive:
CFG: Derivation: String to derive:
Parse Tree: Graphical Representation of Derivation Can you think of another possible derivation? Hint: This one is a “right-most” derivation
Here’s a “left-most” derivation • A grammar is ambiguous if some string has more than one parse tree (equivalently, more than one left-most derivation) • Ambiguity makes generating a parser more difficult
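Ambiguity can be made concrete by counting parse trees. The sketch below assumes a hypothetical ambiguous grammar, `expr : expr OP expr | INTEGER ;` (the grammar from the slides is not reproduced here), and enumerates every tree it allows for a token list. Trees are written as nested tuples `(operator, left, right)`.

```python
def parse_trees(tokens):
    """All parse trees for tokens under  expr : expr OP expr | INTEGER ;

    Assumes tokens alternate INTEGER, OP, INTEGER, ..., INTEGER.
    """
    if len(tokens) == 1:
        return [tokens[0]]  # a lone INTEGER is its own tree
    trees = []
    # Any operator (odd index) can be the root of the tree.
    for i in range(1, len(tokens), 2):
        for left in parse_trees(tokens[:i]):
            for right in parse_trees(tokens[i + 1:]):
                trees.append((tokens[i], left, right))
    return trees


for tree in parse_trees(["10", "-", "4", "-", "3"]):
    print(tree)
```

For `10 - 4 - 3` this yields two trees, one grouping as `10 - (4 - 3)` and one as `(10 - 4) - 3`, so the assumed grammar is ambiguous.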
Two concepts important to expressions • Associativity: Group based on L-to-R order • 10 - 4 - 3 means (10 - 4) - 3 versus 10 - (4 - 3) • Precedence: Group based on operator • 3 + 4 * 5 means 3 + (4 * 5) versus (3 + 4) * 5
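The two groupings give different values, which is why the choice matters. A short Python check, using explicit parentheses to force each grouping (Python's own arithmetic operators happen to be left associative, with `*` binding tighter than `+`):

```python
# Associativity: how repeated operators of the same precedence group.
left_assoc = (10 - 4) - 3    # group left to right
right_assoc = 10 - (4 - 3)   # group right to left

# Precedence: which operator groups first when operators differ.
mul_first = 3 + (4 * 5)
add_first = (3 + 4) * 5

print(left_assoc, right_assoc)   # 3 9
print(mul_first, add_first)      # 23 35

# Without parentheses, Python picks the conventional grouping.
print(10 - 4 - 3 == left_assoc)  # True
print(3 + 4 * 5 == mul_first)    # True
```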
Think-Pair-Share Activity • Rewrite this CFG to be unambiguous • Left associative • Multiplication/division have higher precedence than addition/subtraction
Solution • Create parse tree for: • 3 + 4 * 5 • 10 - 4 - 3
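One common way to get both properties is to stratify the grammar into one rule per precedence level. The grammar in the comments below is a textbook-style assumption, not necessarily the solution shown on the slide, and the Python parser is a minimal sketch of it. Left recursion is implemented with loops, which keeps the trees left associative. Trees are nested tuples `(operator, left, right)`.

```python
import re


def parse(text):
    """Parse text into a tree using an assumed stratified grammar."""
    tokens = re.findall(r"[0-9]+|\S", text)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take():
        nonlocal pos
        tok = peek()
        if tok is None:
            raise SyntaxError("unexpected end of input")
        pos += 1
        return tok

    def expr():
        # expr : expr ('+' | '-') term | term ;
        node = term()
        while peek() in ("+", "-"):
            op = take()
            node = (op, node, term())  # earlier result becomes the left child
        return node

    def term():
        # term : term ('*' | '/') factor | factor ;
        node = factor()
        while peek() in ("*", "/"):
            op = take()
            node = (op, node, factor())
        return node

    def factor():
        # factor : INTEGER | '(' expr ')' ;
        tok = take()
        if tok == "(":
            node = expr()
            if take() != ")":
                raise SyntaxError("expected ')'")
            return node
        if tok[0] in "0123456789":
            return int(tok)
        raise SyntaxError(f"unexpected token {tok!r}")

    tree = expr()
    if pos != len(tokens):
        raise SyntaxError(f"unexpected token {tokens[pos]!r}")
    return tree


print(parse("3 + 4 * 5"))   # ('+', 3, ('*', 4, 5))
print(parse("10 - 4 - 3"))  # ('-', ('-', 10, 4), 3)
```

Because `term` sits below `expr`, a multiplication is always completed before it can become an operand of an addition, and each input now has exactly one tree.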
What’s next? • Homework 1 assigned!