
Lexical and syntax analysis


annora

Presentation Transcript


  1. CSci210.BA4 Lexical and syntax analysis

  2. Chapter 4 Topics • Introduction • Lexical and Syntax Analysis • The Parsing Problem • Recursive-Descent Parsing • Bottom-Up Parsing

  3. Introduction • Syntax analyzers are almost always based on a formal description of the syntax of the source language (a grammar) • Almost all compilers separate syntax analysis into two parts: • Lexical Analysis – low-level • Syntax Analysis – high-level

  4. Reasons to Separate Syntax and Lexical Analysis • Simplicity – lexical analysis is less complex, so both parts are simpler when separated • Efficiency – the lexical analyzer can be optimized on its own, without touching the syntax analyzer • Portability – the lexical analyzer is somewhat platform dependent, whereas the syntax analyzer is more platform independent

  5. Lexical Analysis • A pattern matcher for character strings • Performs syntax analysis at the lowest level of the program structure • Extracts lexemes from a given input string and produces the corresponding tokens

  6. Lexical Analysis (continued) • Statement: result = oldsum - value / 100; • Tokens and lexemes:
       Token       Lexeme
       IDENT       result
       ASSIGN_OP   =
       IDENT       oldsum
       SUB_OP      -
       IDENT       value
       DIV_OP      /
       INT_LIT     100
       SEMICOLON   ;
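
A minimal sketch in C of how a hand-written lexical analyzer might produce the token/lexeme pairs listed above. The names `next_token`, `Token`, `TokenKind`, `MULT_OP`, `UNKNOWN`, and `END_OF_INPUT` are assumptions made for illustration; they do not come from the slides.

```c
#include <ctype.h>
#include <stddef.h>

/* Token codes; ADD_OP, MULT_OP, UNKNOWN and END_OF_INPUT are added for
   completeness and are assumptions of this sketch. */
typedef enum { IDENT, INT_LIT, ASSIGN_OP, ADD_OP, SUB_OP, MULT_OP, DIV_OP,
               SEMICOLON, UNKNOWN, END_OF_INPUT } TokenKind;

typedef struct {
    TokenKind kind;
    char lexeme[64];   /* longer lexemes are split; a limitation of the sketch */
} Token;

/* Scan one token starting at src[*pos] and advance *pos past it. */
Token next_token(const char *src, size_t *pos) {
    Token t = { END_OF_INPUT, "" };
    size_t n = 0;

    while (isspace((unsigned char)src[*pos])) (*pos)++;   /* skip blanks */
    unsigned char c = (unsigned char)src[*pos];
    if (c == '\0') return t;

    if (isalpha(c)) {                  /* identifier: letter (letter | digit)* */
        while (isalnum((unsigned char)src[*pos]) && n < sizeof t.lexeme - 1)
            t.lexeme[n++] = src[(*pos)++];
        t.kind = IDENT;
    } else if (isdigit(c)) {           /* integer literal: digit+ */
        while (isdigit((unsigned char)src[*pos]) && n < sizeof t.lexeme - 1)
            t.lexeme[n++] = src[(*pos)++];
        t.kind = INT_LIT;
    } else {                           /* single-character tokens */
        t.lexeme[n++] = src[(*pos)++];
        switch (c) {
            case '=': t.kind = ASSIGN_OP; break;
            case '+': t.kind = ADD_OP;    break;
            case '-': t.kind = SUB_OP;    break;
            case '*': t.kind = MULT_OP;   break;
            case '/': t.kind = DIV_OP;    break;
            case ';': t.kind = SEMICOLON; break;
            default:  t.kind = UNKNOWN;   break;
        }
    }
    t.lexeme[n] = '\0';
    return t;
}
```

Calling `next_token` repeatedly on the statement above, starting with `*pos` equal to 0, yields the tokens in the listed order and then `END_OF_INPUT`.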

  7. Building a Lexical Analyzer • Write a formal description of the tokens and use a software tool that constructs lexical analyzers when given such a description • Design a state transition diagram that describes the tokens and write a program that implements the diagram • Design a state transition diagram that describes the tokens and hand-construct a table-driven implementation of the state diagram

  8. State (Transition) Diagram Design • A directed graph with nodes labeled with state names and arcs labeled with input characters • A diagram with states and transitions for each and every token pattern would be too large and complex • Transitions can be combined to simplify the state diagram (for example, one transition for any letter instead of one per letter)
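
As a rough illustration of a table-driven state diagram with combined transitions, the sketch below groups characters into classes so that one table column covers every letter and another covers every digit. The state names, the class names, and the function `run_diagram` are hypothetical, and the diagram only recognizes identifiers and integer literals.

```c
#include <ctype.h>

/* Character classes: combining transitions means the table needs one
   column per class rather than one per character. */
typedef enum { LETTER, DIGIT, OTHER, NUM_CLASSES } CharClass;

/* States of the diagram; REJECT is a dead state with no way out. */
typedef enum { START, IN_ID, IN_INT, REJECT, NUM_STATES } State;

static CharClass char_class(int c) {
    if (isalpha(c)) return LETTER;
    if (isdigit(c)) return DIGIT;
    return OTHER;
}

/* transition[state][class] gives the next state. */
static const State transition[NUM_STATES][NUM_CLASSES] = {
    /*            LETTER  DIGIT   OTHER   */
    /* START  */ { IN_ID,  IN_INT, REJECT },
    /* IN_ID  */ { IN_ID,  IN_ID,  REJECT },
    /* IN_INT */ { REJECT, IN_INT, REJECT },
    /* REJECT */ { REJECT, REJECT, REJECT },
};

/* Run the diagram over the whole string and return the final state:
   IN_ID for an identifier, IN_INT for an integer literal,
   START for an empty string, REJECT for anything else. */
State run_diagram(const char *s) {
    State state = START;
    for (; *s != '\0'; s++)
        state = transition[state][char_class((unsigned char)*s)];
    return state;
}
```

With this table, `run_diagram("oldsum")` ends in `IN_ID`, `run_diagram("100")` ends in `IN_INT`, and `run_diagram("1x")` ends in `REJECT`.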

  9. The Parsing Problem • Two goals of syntax analysis: • Check the input program for any syntax errors, produce a diagnostic message if an error is found, and recover • Produce the parse tree, or at least a trace of the parse tree, for the program • Two Classes of parsers: • Top-down • Bottom-up

  10. Top-Down Parsers • Trace or build a parse tree in preorder, which corresponds to a leftmost derivation • The most common top-down parsing algorithms: • Recursive descent • LL parsers

  11. Bottom-Up Parsers • Produce the parse tree by beginning at the leaves and progressing towards the root • Most common bottom-up parsers are in the LR family

  12. Complexity of Parsing • Parsing algorithms that work for any unambiguous grammar are complex and inefficient: O(n^3), where n is the length of the input • Compilers use parsers that only work for a subset of all unambiguous grammars, but do it in linear time: O(n)

  13. Recursive-Descent Parsing • Top-Down Parser • EBNF is ideal as the basis of a recursive-descent parser • Each nonterminal maps to a function • For a nonterminal with more than one RHS, look at the next token to determine which RHS to choose • If no RHS matches the next token, it is a syntax error

  14. Recursive-Descent Parsing • Grammar for an expression: <expr> → <term> {+ <term>} <term> → <factor> {* <factor>} <factor> → id | int_constant | ( <expr> ) • How do we parse? Expression: 1 + 2 • Derivation: <expr> => <term> + <term> => <factor> + <term> => 1 + <term> => 1 + <factor> => 1 + 2

  15. Recursive-Descent Parsing • Grammar for an expression: <expr> → <term> {+ <term>} <term> → <factor> {* <factor>} <factor> → id | int_constant | ( <expr> ) • What does code look like? void expr() { term(); while (nextToken == ADD_OP) { lex(); term(); } }
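
To show how the `expr()` function above fits into a whole parser, here is a minimal sketch in C with one function per nonterminal of the grammar. Only `expr`, `term`, `lex`, `nextToken`, and `ADD_OP` appear on the slide; the other token names, the `syntaxError` flag, and the `parse` entry point are assumptions made so the example is self-contained.

```c
#include <ctype.h>

typedef enum { ID, INT_CONSTANT, ADD_OP, MULT_OP,
               LEFT_PAREN, RIGHT_PAREN, END_OF_INPUT, BAD_CHAR } Token;

static const char *input;     /* remaining input text */
static Token nextToken;       /* one-token lookahead */
static int syntaxError;       /* set to 1 at the first error (assumption) */

static void expr(void);       /* forward declaration: factor() calls expr() */

/* Read the next token from the input into nextToken. */
static void lex(void) {
    while (isspace((unsigned char)*input)) input++;
    unsigned char c = (unsigned char)*input;
    if (c == '\0') { nextToken = END_OF_INPUT; return; }
    if (isalpha(c)) {
        while (isalnum((unsigned char)*input)) input++;
        nextToken = ID;
    } else if (isdigit(c)) {
        while (isdigit((unsigned char)*input)) input++;
        nextToken = INT_CONSTANT;
    } else {
        input++;
        nextToken = c == '+' ? ADD_OP : c == '*' ? MULT_OP
                  : c == '(' ? LEFT_PAREN : c == ')' ? RIGHT_PAREN : BAD_CHAR;
    }
}

/* <factor> -> id | int_constant | ( <expr> ) */
static void factor(void) {
    if (nextToken == ID || nextToken == INT_CONSTANT) {
        lex();
    } else if (nextToken == LEFT_PAREN) {
        lex();
        expr();
        if (nextToken == RIGHT_PAREN) lex();
        else syntaxError = 1;          /* missing closing parenthesis */
    } else {
        syntaxError = 1;               /* no RHS starts with this token */
    }
}

/* <term> -> <factor> {* <factor>} */
static void term(void) {
    factor();
    while (nextToken == MULT_OP && !syntaxError) { lex(); factor(); }
}

/* <expr> -> <term> {+ <term>} */
static void expr(void) {
    term();
    while (nextToken == ADD_OP && !syntaxError) { lex(); term(); }
}

/* Return 1 if text is a complete, valid expression, otherwise 0. */
int parse(const char *text) {
    input = text;
    syntaxError = 0;
    lex();
    expr();
    return !syntaxError && nextToken == END_OF_INPUT;
}
```

Under these assumptions, `parse("1 + 2")` and `parse("a * (b + 3)")` return 1, while `parse("1 +")` and `parse("(1 + 2")` return 0. The EBNF braces become `while` loops, which is why EBNF maps so directly onto this style of parser.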

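  16. Recursive-Descent Parsing • The Left Recursion Problem (LL parsers cannot handle left-recursive rules) • With <expr> → <expr> + <term>, expanding <expr> never consumes a token: <expr> => <expr> + <term> => <expr> + <term> + <term> => <expr> + <term> + <term> + <term> => ... • How do we fix it? • Modify the grammar to remove left recursion • Before: <expr> → <expr> + <term> | <term> • After: <expr> → <term> <expr'> and <expr'> → + <term> <expr'> | ε, with <term> → id | int_constant | ( <expr> )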
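
One standard way to remove direct left recursion is to rewrite A → A α | β as A → β A' and A' → α A' | ε. The sketch below applies that rewrite to the sum rule and parses it by recursive descent. The helper nonterminal `<expr'>` (function `expr_rest`), the entry point `parse_sum`, and the simplification that every character is one token (a letter stands for id, a digit for int_constant, and parenthesized expressions are left out) are all assumptions of this example.

```c
#include <ctype.h>

static const char *p;   /* current position; each character is one token */

/* <term> -> id | int_constant   (a single letter or digit in this sketch) */
static int term(void) {
    if (isalnum((unsigned char)*p)) { p++; return 1; }
    return 0;
}

/* <expr'> -> + <term> <expr'> | ε */
static int expr_rest(void) {
    if (*p == '+') {
        p++;                        /* consume '+' before recursing */
        return term() && expr_rest();
    }
    return 1;                       /* ε alternative: consume nothing */
}

/* <expr> -> <term> <expr'>
   A function written for <expr> -> <expr> + <term> would start by calling
   itself without consuming any input and would recurse forever. */
static int expr(void) {
    return term() && expr_rest();
}

/* Return 1 if text is a complete sum such as "a+b+1", otherwise 0. */
int parse_sum(const char *text) {
    p = text;
    return expr() && *p == '\0';
}
```

Every recursive call in the rewritten grammar happens only after a token has been consumed, so the recursion depth is bounded by the input length.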

  17. Recursive-Descent Parsing • The Pairwise Disjointness Problem • If the grammar is not pairwise disjoint, how do you know which RHS to pick based on the next token? <variable> → identifier | identifier [ <expr> ] • How do we fix it? • Left Factoring: <variable> → identifier <new> and <new> → ε | [ <expr> ]
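
A small sketch of what the left-factored rules could look like as recursive-descent code. After the shared `identifier` prefix has been consumed, a single token of lookahead is enough to pick the alternative for `<new>`. The function names, the one-character-per-token simplification, and the stand-in for `<expr>` (a single digit or a nested variable) are assumptions for illustration only.

```c
#include <ctype.h>

static const char *p;   /* each character is one token in this sketch */

static int variable(void);

/* Stand-in for <expr>: a single digit or a nested <variable>. */
static int expr(void) {
    if (isdigit((unsigned char)*p)) { p++; return 1; }
    return variable();
}

/* <new> -> ε | [ <expr> ]
   The next token alone decides which alternative applies. */
static int new_part(void) {
    if (*p != '[') return 1;        /* ε: no subscript follows */
    p++;                            /* consume '[' */
    if (!expr()) return 0;
    if (*p != ']') return 0;        /* missing closing bracket */
    p++;
    return 1;
}

/* <variable> -> identifier <new> */
static int variable(void) {
    if (!isalpha((unsigned char)*p)) return 0;
    p++;                            /* the common prefix is parsed once */
    return new_part();
}

/* Return 1 if text is a complete variable reference, otherwise 0. */
int parse_variable(const char *text) {
    p = text;
    return variable() && *p == '\0';
}
```

Here `parse_variable("a")` and `parse_variable("a[b[2]]")` return 1, and `parse_variable("a[1")` returns 0.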

  18. Bottom-Up Parsing • Parsing is based on reduction • Reverse of a rightmost derivation • At each step, find the correct RHS that reduces to the previous step in the derivation • Example • Grammar: <S> → <A>b, <A> → a, <A> → b • Input: ab • Step 1: <A>b (reduce a to <A>) • Step 2: <S> (reduce <A>b to <S>)

  19. Bottom-Up Parsing • Most bottom-up parsers are shift-reduce algorithms • Shift – move token onto the stack • Reduce – replace RHS with LHS

  20. Bottom-Up Parsing • Handles • Def: β is the handle of the right sentential form γ = αβw if and only if S =>*rm αAw =>rm αβw • The handle of a right sentential form is its leftmost simple phrase • Bottom-Up Parsing is essentially looking for handles and replacing them with their LHS

  21. Bottom-Up Parsing • Advantages of Shift-Reduce (LR) Parsers • They can be built for all programming languages • They can detect syntax errors as soon as is possible in a left-to-right scan • The LR class of grammars is a proper superset of the class parsable by LL parsers (for example, many left-recursive grammars are LR, but none are LL)

  22. Bottom-Up Parsing • Components of a Shift-Reduce Algorithm • Input Sequence – the input to be parsed • Parse Stack – input is shifted onto the parse stack • ACTION Table – specifies what the parser does • GOTO Table – holds the state symbols to be pushed onto the stack when a reduction is completed

  23. Bottom-Up Parsing • ACTION Table (or Parse Table) • Rows = State Symbols • Columns = Terminal symbols • Values • Shift – push token on stack • Reduce – replace handle with LHS • Accept – stack only has start symbol and input is empty • Error – original input is invalid

  24. Bottom-Up Parsing • GOTO Table (or Parse Table) • Rows = State Symbols • Columns = Nonterminal Symbols • Values indicate which state symbol should be pushed onto the parse stack after a reduction has been completed
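
To make the ACTION and GOTO tables concrete, here is a sketch of a table-driven shift-reduce parser in C for the small grammar S → A b, A → a, A → b. The tables were built by hand for this example using SLR(1) construction; the state numbering, the rule numbering, and every identifier in the code are assumptions and do not come from the slides. The stack holds state symbols only.

```c
enum { T_A, T_B, T_END, NUM_TERMINALS };     /* terminals: a, b, end of input */
enum { N_S, N_A, NUM_NONTERMINALS };         /* nonterminals: S, A */
enum { NUM_STATES = 6, STACK_MAX = 64 };

typedef enum { ACT_ERROR, ACT_SHIFT, ACT_REDUCE, ACT_ACCEPT } ActionKind;
typedef struct { ActionKind kind; int arg; } Action;  /* arg: state or rule */

/* Rules: 1: S -> A b    2: A -> a    3: A -> b   (index 0 is unused) */
static const int rule_lhs[] = { 0, N_S, N_A, N_A };
static const int rule_len[] = { 0, 2, 1, 1 };

/* ACTION[state][terminal]; entries not listed are zero, i.e. ACT_ERROR. */
static const Action ACTION[NUM_STATES][NUM_TERMINALS] = {
    [0] = { [T_A] = { ACT_SHIFT, 3 }, [T_B] = { ACT_SHIFT, 4 } },
    [1] = { [T_END] = { ACT_ACCEPT, 0 } },
    [2] = { [T_B] = { ACT_SHIFT, 5 } },
    [3] = { [T_B] = { ACT_REDUCE, 2 } },
    [4] = { [T_B] = { ACT_REDUCE, 3 } },
    [5] = { [T_END] = { ACT_REDUCE, 1 } },
};

/* GOTO[state][nonterminal]; only state 0 is ever consulted for this grammar. */
static const int GOTO[NUM_STATES][NUM_NONTERMINALS] = {
    [0] = { [N_S] = 1, [N_A] = 2 },
};

/* Return 1 if input (a string over 'a' and 'b') is accepted, otherwise 0. */
int lr_parse(const char *input) {
    int stack[STACK_MAX];
    int top = 0;
    stack[0] = 0;                                   /* start state */
    for (;;) {
        int tok = *input == 'a' ? T_A : *input == 'b' ? T_B
                : *input == '\0' ? T_END : -1;
        if (tok < 0) return 0;                      /* unknown character */
        Action act = ACTION[stack[top]][tok];
        switch (act.kind) {
        case ACT_SHIFT:                             /* push state, consume token */
            if (top + 1 >= STACK_MAX) return 0;
            stack[++top] = act.arg;
            input++;
            break;
        case ACT_REDUCE:                            /* pop the handle, push GOTO */
            top -= rule_len[act.arg];
            stack[top + 1] = GOTO[stack[top]][rule_lhs[act.arg]];
            top++;
            break;
        case ACT_ACCEPT:
            return 1;
        default:
            return 0;                               /* ACT_ERROR */
        }
    }
}
```

For the input `ab`, the parser shifts `a`, reduces it to A, shifts `b`, reduces `A b` to S, and accepts. An input such as `abb` reaches an empty ACTION entry and is rejected.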
