Chapter 3 Context-Free Grammars and Parsing

Gang S. Liu
College of Computer Science & Technology, Harbin Engineering University
Samuel2005@126.com

Introduction
• Parsing is the task of determining the syntax, or structure, of a program.
• It is also called syntax analysis.
• The syntax of a programming language is usually given by the grammar rules of a context-free grammar.
• The rules of a context-free grammar are recursive.
• The data structures that represent the syntactic structure are also recursive: a parse tree or a syntax tree.

The Parsing Process
• Usually the sequence of tokens is not an explicit input parameter. Instead, the parser calls a scanner procedure such as getToken to fetch the next token from the input whenever it needs one during parsing.
[Figure: sequence of tokens → parser → syntax tree]
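The on-demand interaction between parser and scanner can be sketched as follows. This is only an illustration under stated assumptions: the class `Scanner`, the method `get_token`, the pattern `TOKEN_RE`, and the helper `collect_tokens` are hypothetical names, and the token set (integers, parentheses, and the operators + - *) is borrowed from the expression grammar used in this chapter.

```python
import re

# Assumed token set: integers, parentheses, and the operators + - *
TOKEN_RE = re.compile(r"\s*(\d+|[()+\-*])")


class Scanner:
    """Hands out one token at a time instead of building a full token list."""

    def __init__(self, text):
        self.text = text
        self.pos = 0

    def get_token(self):
        """Return the next token string, or None at the end of the input."""
        match = TOKEN_RE.match(self.text, self.pos)
        if match is None:
            if self.text[self.pos:].strip() == "":
                return None  # only whitespace is left
            raise SyntaxError(f"unexpected character near position {self.pos}")
        self.pos = match.end()
        return match.group(1)


def collect_tokens(text):
    """Stand-in for a parser: it pulls tokens only when it asks for them."""
    scanner = Scanner(text)
    tokens = []
    while (token := scanner.get_token()) is not None:
        tokens.append(token)
    return tokens
```

A real parser would call `get_token` from inside its parsing procedures rather than collecting everything first; `collect_tokens` only makes the call pattern visible.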

Context-Free Grammars
• A context-free grammar is a specification for the syntactic structure of a programming language.
• A context-free grammar involves recursive rules.
• Example: integer arithmetic expressions with addition, subtraction, and multiplication operations
  exp → exp op exp | (exp) | number
  op → + | - | *

BNF
  exp → exp op exp | (exp) | number
  op → + | - | *
• Names are written in italic.
• | is the metasymbol for choice.
• Concatenation is used as a standard operation.
• There is no metasymbol for repetition (such as the * of regular expressions).
• → is used to express the definitions of names.
• Regular expressions are used as components.
• The notation was developed by John Backus and adapted by Peter Naur.
• Grammar rules in this form are said to be in Backus-Naur Form, or BNF.

Context-Free Grammar Rules
• Grammar rules are defined over an alphabet, or set of symbols.
• The symbols are usually tokens representing strings of characters.
• A context-free grammar rule consists of:
  • A name for a structure.
  • The metasymbol →.
  • A string of symbols, each of which is
    • either a symbol from the alphabet,
    • or a name for a structure,
    • or the metasymbol |.
  exp → exp op exp | (exp) | number
  op → + | - | *

Context-Free Grammar Rules (cont)
• A rule defines the structure whose name is to the left of the arrow.
• The structure is defined to consist of one of the choices on the right-hand side, which are separated by vertical bars.
  exp → exp op exp | (exp) | number
  op → + | - | *
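One possible way to hold such rules in a program is a mapping from each structure name to its list of choices, where every choice is a list of symbols. The names `GRAMMAR` and `format_rule` below are hypothetical and only illustrate the shape of a rule: a name, the arrow, and choices separated by vertical bars.

```python
# Each key is the name on the left of the arrow; each value lists the choices
# on the right-hand side, and each choice is a list of symbols.
GRAMMAR = {
    "exp": [["exp", "op", "exp"], ["(", "exp", ")"], ["number"]],
    "op": [["+"], ["-"], ["*"]],
}


def format_rule(name, choices):
    """Render one rule back into BNF text: name → choice | choice | ..."""
    return f"{name} → " + " | ".join(" ".join(choice) for choice in choices)


if __name__ == "__main__":
    for name, choices in GRAMMAR.items():
        print(format_rule(name, choices))
```

In this representation the vertical bar never appears as data; it is implied by the boundary between two entries in a list of choices.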

Legal String?
• (34 - 3) * 42 corresponds to the legal string of tokens (number - number) * number.
• (34 - 3 * 2 is not a legal expression: the left parenthesis has no matching right parenthesis.
  exp → exp op exp | (exp) | number
  op → + | - | *
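The two cases above can be checked mechanically. The sketch below does not use the rule exp → exp op exp directly, because a simple recursive procedure cannot follow a rule that begins with its own name. Instead it assumes an equivalent description of the same strings: one operand (a number or a parenthesized expression) followed by any number of "op operand" pairs. The names `is_legal`, `operand`, and `exp` are hypothetical.

```python
OPS = {"+", "-", "*"}


def is_legal(tokens):
    """Return True if the token list is a legal string of the expression language."""
    pos = 0  # index of the next unread token

    def operand():
        nonlocal pos
        if pos < len(tokens) and tokens[pos] == "number":
            pos += 1
            return True
        if pos < len(tokens) and tokens[pos] == "(":
            pos += 1
            if not exp():
                return False
            if pos < len(tokens) and tokens[pos] == ")":
                pos += 1
                return True
        return False  # missing operand or missing right parenthesis

    def exp():
        nonlocal pos
        if not operand():
            return False
        while pos < len(tokens) and tokens[pos] in OPS:
            pos += 1
            if not operand():
                return False
        return True

    # The whole input must be consumed, not just a legal prefix.
    return exp() and pos == len(tokens)
```

For example, `is_legal("( number - number ) * number".split())` accepts the first string, while the token string for (34 - 3 * 2 is rejected because no right parenthesis follows the inner expression.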

Derivations
• Grammar rules determine the legal strings of tokens by means of derivations.
• A derivation is a sequence of replacements of structure names by choices on the right-hand sides of grammar rules.
• A derivation begins with a single structure name and ends with a string of token symbols.
  exp → exp op exp | (exp) | number
  op → + | - | *

Example of Derivation
  exp → exp op exp | (exp) | number
  op → + | - | *
Derivation of (34 - 3) * 42:
  exp => exp op exp
      => exp op number
      => exp * number
      => (exp) * number
      => (exp op exp) * number
      => (exp op number) * number
      => (exp - number) * number
      => (number - number) * number
      => (34 - 3) * 42   (each number token written as its actual value)
• Grammar rules define, and use the arrow →.
• Derivation steps construct by replacement, and use the arrow =>.
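The derivation above can be replayed step by step. In this sketch each step names the position of the structure name to replace and the choice that replaces it, and the step is rejected if the choice is not one of the grammar's right-hand sides. `GRAMMAR`, `derive`, and `STEPS` are hypothetical names; the steps stop at the token string (number - number) * number.

```python
GRAMMAR = {
    "exp": [["exp", "op", "exp"], ["(", "exp", ")"], ["number"]],
    "op": [["+"], ["-"], ["*"]],
}


def derive(start, steps):
    """Apply (index, choice) replacements and return every intermediate string."""
    form = [start]
    history = [form]
    for index, choice in steps:
        name = form[index]
        if choice not in GRAMMAR.get(name, []):
            raise ValueError(f"{name} cannot be replaced by {choice}")
        # Replace the single symbol at `index` by the symbols of the choice.
        form = form[:index] + choice + form[index + 1:]
        history.append(form)
    return history


# The same order of replacements as in the derivation shown above.
STEPS = [
    (0, ["exp", "op", "exp"]),
    (2, ["number"]),
    (1, ["*"]),
    (0, ["(", "exp", ")"]),
    (1, ["exp", "op", "exp"]),
    (3, ["number"]),
    (2, ["-"]),
    (1, ["number"]),
]

if __name__ == "__main__":
    for form in derive("exp", STEPS):
        print("=>", " ".join(form))
```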

Terminology
• The set of all strings of token symbols obtained by derivations is the language defined by the grammar.
• Grammar rules are called productions because they produce the legal strings of the language via derivations.
• The structure name defined by the first rule is called the start symbol.
• Structure names are called nonterminals.
  • They must be replaced further, so they do not terminate a derivation.
• Symbols in the alphabet are called terminals.
  • They terminate a derivation.
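These terms can be read off a grammar mechanically. The sketch below assumes the expression grammar is stored as a mapping from structure names to lists of choices, with the first rule inserted first; `GRAMMAR`, `nonterminals`, `terminals`, and `start_symbol` are hypothetical names.

```python
GRAMMAR = {
    "exp": [["exp", "op", "exp"], ["(", "exp", ")"], ["number"]],
    "op": [["+"], ["-"], ["*"]],
}


def nonterminals(grammar):
    """Structure names: every name that appears on the left of an arrow."""
    return set(grammar)


def terminals(grammar):
    """Alphabet symbols: every right-hand-side symbol that has no rule of its own."""
    return {
        symbol
        for choices in grammar.values()
        for choice in choices
        for symbol in choice
        if symbol not in grammar
    }


def start_symbol(grammar):
    """The name defined by the first rule (dicts keep insertion order)."""
    return next(iter(grammar))
```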

Example 3.1
• Let G be a grammar defined by the rule
  E → ( E ) | a
• This grammar has one nonterminal, E, and three terminals: ( , ) , and a.
• This grammar generates the language
  L(G) = { a, (a), ((a)), (((a))), … }
• Derivation for ((a)):
  E => (E)
    => ((E))
    => ((a))
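Because the rule for E is recursive, both generating and recognizing the strings of L(G) fit in a few lines. The following is a minimal sketch; `generate` and `in_language` are hypothetical helper names.

```python
def generate(depth):
    """Apply E → ( E ) `depth` times, then finish the derivation with E → a."""
    form = "E"
    for _ in range(depth):
        form = form.replace("E", "(E)")
    return form.replace("E", "a")


def in_language(s):
    """Recursive check that mirrors the two choices of the rule E → ( E ) | a."""
    if s == "a":
        return True
    if len(s) >= 3 and s[0] == "(" and s[-1] == ")":
        return in_language(s[1:-1])
    return False


if __name__ == "__main__":
    print([generate(n) for n in range(4)])
```

The recursion in `in_language` follows the recursion in the grammar: stripping one pair of parentheses corresponds to undoing one application of E → ( E ).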

Example 2.3
• Σ = {a, b}
• Consider the set of strings consisting of a single b surrounded by the same number of a’s:
  S = {b, aba, aabaa, aaabaaa, …}
• The regular expression a*ba* does not work, because it does not require the number of a’s before the b to equal the number after it.
• This set of strings cannot be described by any regular expression.
• This can be proved using a famous theorem called the pumping lemma.
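A quick experiment shows why a*ba* is too permissive. The sketch below compares the regular expression with a direct membership test for S; `matches_regex` and `in_S` are hypothetical helper names, and the direct test is just one straightforward way to check for equal numbers of a's.

```python
import re


def matches_regex(s):
    """True if the whole string matches the regular expression a*ba*."""
    return re.fullmatch(r"a*ba*", s) is not None


def in_S(s):
    """True if s is n a's, one b, then the same number n of a's."""
    n, remainder = divmod(len(s) - 1, 2)
    return remainder == 0 and s == "a" * n + "b" + "a" * n


if __name__ == "__main__":
    for s in ["b", "aba", "aabaa", "aab", "abaa"]:
        print(s, matches_regex(s), in_S(s))
```

Strings such as aab and abaa match the regular expression but are not in S, so a*ba* describes a strictly larger set.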
