Syntax Analysis: Introduction, Context-Free Grammars, and Parsing Techniques

Chapter 4 Syntax Analysis

Content • Overview of this chapter • 4.1 Introduction • 4.2 Context-Free Grammars • 4.3 Writing a Grammar • 4.4 Top-Down Parsing • 4.5 Bottom-Up Parsing • 4.6 Introduction to LR Parsing: Simple LR • 4.7 More Powerful LR Parsers • 4.8 Using Ambiguous Grammars • 4.9 Parser Generators

4.1 Introduction

4.1 Introduction In this section, we • Examine the way the parser fits into a typical compiler • Look at typical grammars for arithmetic expressions • Discuss error handling

4.1.1 The Role of the Parser • Position of parser in compiler model • Types of parsers • Universal • Top-down • Bottom-up

4.1.2 Representative Grammars • Grammar 4.1 (LR, suitable for bottom-up parsing) • Grammar 4.2 (non-left-recursive, used for top-down parsing) • Grammar 4.3 (used for handling ambiguities)

4.1.3 Syntax Error Handling • Common programming errors: • Lexical errors • Syntactic errors • Semantic errors • Logical errors • Parsing methods allow syntactic errors to be detected • Goals of the error handler in a parser: • Report the presence of errors • Recover from each error • Add minimal overhead

4.1.4 Error-Recovery Strategies • Panic-Mode Recovery • Phrase-Level Recovery • Error Productions • Global Correction

4.2 Context-Free Grammars

4.2 Context-Free Grammars In this section, we • Review the definition of a context-free grammar • Introduce terminology for talking about parsing

4.2.1 The Formal Definition of a Context-Free Grammar • A context-free grammar consists of: 1. terminals: basic symbols from which strings are formed 2. nonterminals: syntactic variables that denote sets of strings 3. start symbol: one nonterminal 4. productions: specify the manner in which terminals and nonterminals can be combined to form strings; each production consists of 1) a nonterminal called the head or left side 2) the symbol -> 3) a body or right side consisting of zero or more terminals and nonterminals

4.2.1 The Formal Definition of a Context-Free Grammar • Example: terminals: id, +, - ,*, /, (, ) nonterminals: expression, term, factor start symbol: expression
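The four components above map naturally onto plain data. The sketch below is one hypothetical encoding in Python, not a standard API. The slide lists only the terminals, nonterminals, and start symbol; the productions used here are the usual textbook ones for arithmetic expressions and are an assumption.

```python
# Hypothetical encoding of a context-free grammar as four plain values.
# The productions are the usual textbook ones for arithmetic expressions;
# they are an assumption, since the slide text does not list them.
TERMINALS = {"id", "+", "-", "*", "/", "(", ")"}
NONTERMINALS = {"expression", "term", "factor"}
START = "expression"
PRODUCTIONS = {
    # head: list of bodies, each body a list of grammar symbols
    "expression": [["expression", "+", "term"],
                   ["expression", "-", "term"],
                   ["term"]],
    "term": [["term", "*", "factor"],
             ["term", "/", "factor"],
             ["factor"]],
    "factor": [["(", "expression", ")"], ["id"]],
}


def check_grammar(terminals, nonterminals, start, productions):
    """Return True if the four components form a well-formed grammar."""
    # The start symbol must be a nonterminal, and the two alphabets disjoint.
    if start not in nonterminals or terminals & nonterminals:
        return False
    for head, bodies in productions.items():
        if head not in nonterminals:
            return False
        for body in bodies:
            # A body is zero or more terminals and nonterminals.
            if any(s not in terminals and s not in nonterminals for s in body):
                return False
    return True
```

Calling `check_grammar(TERMINALS, NONTERMINALS, START, PRODUCTIONS)` returns `True`; passing a terminal such as `"id"` as the start symbol makes it return `False`.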

4.2.2 Notational Conventions • terminals: 1. Lowercase letters early in the alphabet: a, b, c… 2. Operator symbols: +, *, … 3. Punctuation symbols: parentheses, comma… 4. Digits: 0, 1, 2… 5. Boldface strings such as id, if • nonterminals: 1. Uppercase letters early in the alphabet: A, B, C… 2. The letter S 3. Lowercase, italic names such as expr or stmt 4. When discussing programming constructs: E, T, F

4.2.2 Notational Conventions • Uppercase letters late in the alphabet, such as X, Y, Z, represent grammar symbols (either nonterminals or terminals) • Lowercase letters late in the alphabet, chiefly u, v, …, z, represent strings of terminals • Lowercase Greek letters α, β, γ represent strings of grammar symbols • A set of productions A -> α1, A -> α2, …, A -> αk may be written A -> α1 | α2 | … | αk; call α1, α2, …, αk the alternatives for A • Unless stated otherwise, the head of the first production is the start symbol

4.2.3 Derivations • Consider a grammar with the production E -> - E • E ⇒ -E is read “E derives -E” • If A -> γ is a production, then αAβ ⇒ αγβ • ⇒ : derives in one step • ⇒* : derives in zero or more steps • ⇒+ : derives in one or more steps • Leftmost and rightmost derivations
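A leftmost derivation can be replayed mechanically: at each step the leftmost nonterminal in the current sentential form is replaced by one of its bodies. The sketch below assumes the grammar E -> E + E | E * E | - E | ( E ) | id, which is consistent with “E derives -E” but is not spelled out in the slide text. The function names are illustrative.

```python
# Assumed grammar: E -> E + E | E * E | - E | ( E ) | id
GRAMMAR = {
    "E": [["E", "+", "E"], ["E", "*", "E"], ["-", "E"], ["(", "E", ")"], ["id"]],
}


def leftmost_step(form, body, grammar=GRAMMAR):
    """Rewrite the leftmost nonterminal of `form` using `body` (one step)."""
    for i, symbol in enumerate(form):
        if symbol in grammar:
            if body not in grammar[symbol]:
                raise ValueError(f"{symbol} -> {' '.join(body)} is not a production")
            return form[:i] + body + form[i + 1:]
    raise ValueError("sentential form contains no nonterminal")


def derive(start, bodies):
    """Return every sentential form of the leftmost derivation, in order."""
    forms = [[start]]
    for body in bodies:
        forms.append(leftmost_step(forms[-1], body))
    return forms


# E => -E => -(E) => -(E+E) => -(id+E) => -(id+id)
steps = derive("E", [["-", "E"], ["(", "E", ")"], ["E", "+", "E"], ["id"], ["id"]])
for form in steps:
    print(" ".join(form))
```

A rightmost derivation would differ only in scanning `form` from the right.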

4.2.4 Parse Trees and Derivations • Parse tree for -(id+id) • Building a parse tree from a derivation A = α1 ⇒ α2 ⇒ … ⇒ αn: • BASIS: The tree for α1 = A is a single node labeled A • INDUCTION: Suppose a tree has been built for αi-1 = X1X2…Xk (each Xi is either a nonterminal or a terminal), and αi is derived from αi-1 by replacing Xj, a nonterminal, by β = Y1Y2…Ym. That is, at the ith step of the derivation, production Xj -> β is applied to αi-1 to derive αi = X1X2…Xj-1 β Xj+1…Xk

4.2.4 Parse Trees and Derivations • Example: Sequence of parse trees for derivation (4.8)

4.2.5 Ambiguity • Ambiguous: a grammar that produces more than one parse tree for some sentence • Example: grammar (4.3) permits two distinct leftmost derivations for the sentence id+id*id

4.2.5 Ambiguity • Figure: the two parse trees for id+id*id
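Ambiguity can be checked on a small input by counting parse trees. The sketch below assumes that grammar (4.3) is E -> E + E | E * E | ( E ) | id; the slide text does not reproduce it, so treat that as an assumption. The counting function is hard-wired to this grammar and is an illustration, not a general parser.

```python
from functools import lru_cache


def count_parse_trees(tokens):
    """Count parse trees for `tokens` under the assumed grammar
    E -> E + E | E * E | ( E ) | id."""
    tokens = tuple(tokens)

    @lru_cache(maxsize=None)
    def count(i, j):
        # Number of distinct trees in which E derives tokens[i:j].
        if j - i < 1:
            return 0
        if j - i == 1:
            return 1 if tokens[i] == "id" else 0          # E -> id
        total = 0
        if tokens[i] == "(" and tokens[j - 1] == ")":      # E -> ( E )
            total += count(i + 1, j - 1)
        for k in range(i + 1, j - 1):                      # E -> E op E
            if tokens[k] in ("+", "*"):
                total += count(i, k) * count(k + 1, j)
        return total

    return count(0, len(tokens))


print(count_parse_trees(["id", "+", "id", "*", "id"]))  # 2
```

The two trees correspond to grouping the sentence as id+(id*id) and as (id+id)*id. With explicit parentheses, as in `["(", "id", "+", "id", ")", "*", "id"]`, the count drops to 1.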

4.2.6 Verifying the Language Generated by a Grammar • A proof that a grammar G generates a language L has two parts: 1. Every string generated by G is in L 2. Every string in L can be generated by G • Example: S -> ( S ) S | ε 1. Every sentence derivable from S is balanced 2. Every balanced string is derivable from S
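The two directions of the argument can be spot-checked by brute force for short strings. This is only a sanity check under a length bound, not a substitute for the inductive proof. The helper names are illustrative.

```python
from itertools import product


def is_balanced(s):
    """True if every ')' closes an earlier '(' and nothing is left open."""
    depth = 0
    for ch in s:
        depth += 1 if ch == "(" else -1
        if depth < 0:
            return False
    return depth == 0


def generated(max_len):
    """All strings of length <= max_len derivable from S -> ( S ) S | epsilon."""
    result = {""}                      # S -> epsilon
    changed = True
    while changed:                     # iterate to a fixed point
        changed = False
        for x in list(result):
            for y in list(result):
                s = "(" + x + ")" + y  # S -> ( S ) S
                if len(s) <= max_len and s not in result:
                    result.add(s)
                    changed = True
    return result


def balanced(max_len):
    """All balanced parenthesis strings of length <= max_len."""
    return {
        "".join(p)
        for n in range(max_len + 1)
        for p in product("()", repeat=n)
        if is_balanced("".join(p))
    }


# Part 1 (generated is a subset of balanced) and part 2 (the converse), up to length 8.
print(generated(8) == balanced(8))
```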

4.2.7 Context-Free Grammars Versus Regular Expressions • Grammars are more powerful than regular expressions 1. Every construct that can be described by a regular expression can be described by a grammar, but not vice versa 2. Every regular language is a context-free language, but not vice versa • Example 1: (a|b)*abb can be described by grammar:
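One grammar that works for Example 1 comes from the usual NFA-to-grammar construction: one nonterminal per NFA state, a production Ai -> a Aj per transition, and Ai -> ε for the accepting state. The grammar below is that construction applied to a four-state NFA for (a|b)*abb; it is an assumption that this matches the grammar on the original slide. The sketch compares it against Python's `re` module on all short strings.

```python
import re
from itertools import product

# Assumed grammar, built from a four-state NFA for (a|b)*abb:
#   A0 -> a A0 | b A0 | a A1
#   A1 -> b A2
#   A2 -> b A3
#   A3 -> epsilon
RULES = {
    "A0": [("a", "A0"), ("b", "A0"), ("a", "A1")],
    "A1": [("b", "A2")],
    "A2": [("b", "A3")],
    "A3": [],
}
NULLABLE = {"A3"}  # nonterminals with an epsilon production


def derives(s, start="A0"):
    """True if `start` derives the string s under RULES."""
    current = {start}  # nonterminals that could derive the remaining input
    for ch in s:
        current = {nxt for nt in current for (t, nxt) in RULES[nt] if t == ch}
    return bool(current & NULLABLE)


pattern = re.compile(r"(a|b)*abb")
agree = all(
    derives("".join(p)) == bool(pattern.fullmatch("".join(p)))
    for n in range(8)
    for p in product("ab", repeat=n)
)
print(agree)
```

Because every body has the form "terminal followed by one nonterminal", tracking a set of candidate nonterminals is enough; general context-free grammars need a real parser.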

4.2.7 Context-Free Grammars Versus Regular Expressions • Example 2: Consider the language of strings with an equal number of a's and b's. It can be described by a grammar but not by a regular expression

4.3 Writing a Grammar

4.3 Writing a Grammar In this section, we • Discuss how to divide work between lexical analyzer and a parser • Consider several transformations • One technique that can eliminate ambiguity • Left-recursion elimination and left factoring • Consider some programming language constructs that cannot be described by any grammar

4.3.1 Lexical Versus Syntactic Analysis • Why use regular expressions for the lexical syntax? • Separating lexical from syntactic structure is a convenient way of modularizing the front end of a compiler into two manageable-sized components • Lexical rules are quite simple and do not need a notation as powerful as grammars • Regular expressions provide a more concise and easier-to-understand notation for tokens • More efficient lexical analyzers can be constructed automatically from regular expressions • Regular expressions: most useful for the structure of constructs such as identifiers, constants, keywords, whitespace • Grammars: most useful for nested structures such as balanced parentheses, matching begin-end's, corresponding if-then-else's

4.3.2 Eliminating Ambiguity • The “dangling-else” grammar is ambiguous, since the string “if E1 then if E2 then S1 else S2” has two parse trees:

4.3.2 Eliminating Ambiguity • Rewrite “dangling-else” grammar: unambiguous

4.3.3 Elimination of Left Recursion • What is left recursive? A grammar is left recursive if it has a nonterminal A such that there is a derivation A ⇒+ Aα for some string α, e.g. A -> Aα | β • Why eliminate left recursion? Top-down parsing methods cannot handle left-recursive grammars • Technique for eliminating immediate left recursion: 1. Group the productions as:

4.3.3 Elimination of Left Recursion 2. Replace the A-productions by: • Example:
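The standard transformation for immediate left recursion groups the A-productions as A -> Aα1 | … | Aαm | β1 | … | βn, where no βi begins with A, and replaces them with A -> β1A' | … | βnA' and A' -> α1A' | … | αmA' | ε. The sketch below applies that to a single nonterminal. It assumes no αi is empty and does not handle indirect left recursion. Bodies are lists of symbols, and the empty list stands for ε. The function name is illustrative.

```python
def eliminate_immediate_left_recursion(head, bodies):
    """Remove immediate left recursion from the productions of `head`.

    `bodies` is a list of bodies, each a list of symbols; [] means epsilon.
    Returns a dict mapping each head to its list of bodies.
    Assumes no body is exactly [head] (that is, no alpha is empty).
    """
    alphas = [b[1:] for b in bodies if b and b[0] == head]   # from A -> A alpha
    betas = [b for b in bodies if not b or b[0] != head]      # from A -> beta
    if not alphas:
        return {head: bodies}          # nothing to do
    new_head = head + "'"              # the fresh nonterminal A'
    return {
        head: [beta + [new_head] for beta in betas],
        new_head: [alpha + [new_head] for alpha in alphas] + [[]],
    }


# E -> E + T | T   becomes   E -> T E'   and   E' -> + T E' | epsilon
result = eliminate_immediate_left_recursion("E", [["E", "+", "T"], ["T"]])
for h, bs in result.items():
    print(h, "->", " | ".join(" ".join(b) or "ε" for b in bs))
```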

4.3.4 Left Factoring • Left factoring: a grammar transformation that is useful for producing a grammar suitable for predictive, or top-down, parsing, e.g. stmt -> if expr then stmt else stmt | if expr then stmt • Left factoring a grammar: 1. For each nonterminal A, find the longest prefix α common to two or more of its alternatives

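4.3.4 Left Factoring 2. Replace all the A-productions that begin with α by A -> αA', where A' is a new nonterminal whose alternatives are the suffixes that followed α 3. Repeatedly apply this transformation until no two alternatives for a nonterminal have a common prefix • Example: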
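A single left-factoring step can be sketched as follows, using the dangling-else productions from the previous slide as input. Bodies are lists of symbols and the empty list stands for ε. The function names and the `'` suffix for the new nonterminal are illustrative choices.

```python
def common_prefix(a, b):
    """Longest common prefix of two symbol lists."""
    n = 0
    while n < len(a) and n < len(b) and a[n] == b[n]:
        n += 1
    return a[:n]


def left_factor_once(head, bodies):
    """Apply one left-factoring step to the productions of `head`.

    Returns a dict mapping each head to its list of bodies; [] means epsilon.
    """
    # Step 1: longest prefix shared by at least two alternatives.
    best = []
    for i in range(len(bodies)):
        for j in range(i + 1, len(bodies)):
            prefix = common_prefix(bodies[i], bodies[j])
            if len(prefix) > len(best):
                best = prefix
    if not best:
        return {head: bodies}          # no common prefix, nothing to factor
    # Step 2: A -> alpha A' | (other alternatives), A' -> the suffixes.
    n = len(best)
    new_head = head + "'"
    suffixes = [b[n:] for b in bodies if b[:n] == best]
    others = [b for b in bodies if b[:n] != best]
    return {head: [best + [new_head]] + others, new_head: suffixes}


factored = left_factor_once("stmt", [
    ["if", "expr", "then", "stmt", "else", "stmt"],
    ["if", "expr", "then", "stmt"],
])
for h, bs in factored.items():
    print(h, "->", " | ".join(" ".join(b) or "ε" for b in bs))
```

Step 3 of the procedure corresponds to calling `left_factor_once` again on every head in the result until nothing changes.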

Homework 2 • 3.9.4 (2) • 4.2.1 • 4.3.1

The end of Lecture04
