
Understanding Compilers: Translation, Syntax, and Optimization

Explore the fundamentals of compilers, translation processes, context-free grammars, lexical analysis, and tokenization. Learn about the importance of intermediate representation and syntax trees in software construction.

garlick

Presentation Transcript


  1. Module 2: Compilers and Their Working. Software Construction, Lectures 10, 11 and 12

  2. What Are Compilers? Compilers translate information from one representation to another; usually the information is a program. Typical compilers: VC, VC++, GCC, JavaC, FORTRAN, Pascal, VB. Other translators: Word to PDF, PDF to PostScript.

  3. Source Code: optimized for human readability; matches human notions of grammar; uses named constructs such as variables and procedures.

  4. How to Translate. Translation is a complex process because the source language and the generated code are very different. We need to structure the translation.

  5. Two-pass Compiler. [Diagram: source code → Front End → IR → Back End → machine code, with errors reported.] Use an intermediate representation (IR). The front end maps legal source code into IR. The back end maps IR into target machine code.

  6. The Front-End Modules: the scanner (also called the lexical analyzer) and the parser. [Diagram: source code → scanner → tokens → parser → IR, with errors reported.]

  7. Scanner. Maps the character stream into words, the basic units of syntax. Produces pairs: a word and its part of speech.

  8. Scanner Example. x = x + y becomes <id,x> <assign,=> <id,x> <op,+> <id,y>. We call the pair “<token type, word>” a “token”; in <id,x>, id is the token type and x is the word. Typical tokens: number, identifier, +, -, new, while, if.
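The scanner example above can be sketched in a few lines of Python. This is only an illustration, not the scanner from the lectures: the function name `scan` and the choice to handle just identifiers, numbers, `=`, `+` and `-` are assumptions made to keep it short.

```python
def scan(source):
    """Turn a character stream into a list of (token_type, word) pairs."""
    tokens = []
    i = 0
    while i < len(source):
        ch = source[i]
        if ch.isspace():
            # Whitespace separates words but produces no token.
            i += 1
        elif ch.isalpha() or ch == "_":
            # Identifier: a letter followed by letters, digits or underscores.
            start = i
            while i < len(source) and (source[i].isalnum() or source[i] == "_"):
                i += 1
            tokens.append(("id", source[start:i]))
        elif ch.isdigit():
            # Number: a run of digits.
            start = i
            while i < len(source) and source[i].isdigit():
                i += 1
            tokens.append(("number", source[start:i]))
        elif ch == "=":
            tokens.append(("assign", ch))
            i += 1
        elif ch in "+-":
            tokens.append(("op", ch))
            i += 1
        else:
            raise ValueError(f"unexpected character {ch!r} at position {i}")
    return tokens


print(scan("x = x + y"))
```

For the input `x = x + y` this yields the same five pairs as the slide: an id, an assign, an id, an op and an id.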

  9. Parser • Recognizes context-free syntax and reports errors • Guides context-sensitive (“semantic”) analysis • Builds IR for source program

  10. What Is Context-Free Syntax? To understand it, we first need the basics of context-free grammars. A context-free grammar is a set of rewrite rules, such as those on the following slides.

  11. Context-Free Grammars. Context-free syntax is specified with a grammar G = (S, N, T, P), where S is the start symbol, N is a set of non-terminal symbols, T is a set of terminal symbols or words, and P is a set of productions or rewrite rules.

  12. Context-Free Grammars. Grammar for expressions:
        1. goal → expr
        2. expr → expr op term
        3.      | term
        4. term → number
        5.      | id
        6. op   → +
        7.      | -

  13. The Front End. For this CFG: S = goal; T = { number, id, +, - }; N = { goal, expr, term, op }; P = { 1, 2, 3, 4, 5, 6, 7 }

  14. Context-Free Grammars. Given a CFG, we can derive sentences by repeated substitution. Consider the sentence (expression) x + 2 – y.

  15. Derivation
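Derivation by repeated substitution can be replayed mechanically. The sketch below encodes the seven productions of the expression grammar and applies a chosen sequence of production numbers, always rewriting the leftmost non-terminal. The leftmost strategy and the names `PRODUCTIONS` and `leftmost_derivation` are assumptions for illustration; the derivation drawn on the original slide may have used a different order.

```python
# Productions of the expression grammar, keyed by their number on the slide.
PRODUCTIONS = {
    1: ("goal", ["expr"]),
    2: ("expr", ["expr", "op", "term"]),
    3: ("expr", ["term"]),
    4: ("term", ["number"]),
    5: ("term", ["id"]),
    6: ("op", ["+"]),
    7: ("op", ["-"]),
}
NONTERMINALS = {"goal", "expr", "term", "op"}


def leftmost_derivation(rule_numbers):
    """Apply each production to the leftmost non-terminal; return all sentential forms."""
    form = ["goal"]
    steps = [list(form)]
    for n in rule_numbers:
        lhs, rhs = PRODUCTIONS[n]
        # Find the leftmost non-terminal in the current sentential form.
        idx = next(i for i, sym in enumerate(form) if sym in NONTERMINALS)
        if form[idx] != lhs:
            raise ValueError(f"production {n} does not rewrite {form[idx]!r}")
        form[idx:idx + 1] = rhs
        steps.append(list(form))
    return steps


# One derivation of x + 2 - y, seen as the token types id + number - id.
for step in leftmost_derivation([1, 2, 2, 3, 5, 6, 4, 7, 5]):
    print(" ".join(step))
```

The last line printed is `id + number - id`, which is the sentence x + 2 – y with each word replaced by its token type.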

  16. The Front End. To recognize a valid sentence in some CFG, we reverse this process and build up a parse. A parse can be represented by a tree: a parse tree or syntax tree.

  17. Parse

  18. Syntax Tree for x+2-y:
        goal
          expr
            expr
              expr
                term
                  <id,x>
              op
                +
              term
                <number,2>
            op
              –
            term
              <id,y>

  19. Abstract Syntax Trees. The parse tree contains a lot of unneeded information. Compilers often use an abstract syntax tree (AST).

  20. Abstract Syntax Trees. This is much more concise. An AST summarizes grammatical structure without the details of derivation. ASTs are one kind of intermediate representation (IR). [AST for x + 2 – y: the root is –; its left child is + (with children <id,x> and <number,2>) and its right child is <id,y>.]
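As a rough illustration of how compact an AST is, the tree for x + 2 – y can be written down directly as a small data structure. The class names `Leaf` and `BinOp` and the helpers `to_source` and `count_nodes` are hypothetical, chosen only for this sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Leaf:
    kind: str  # token type, e.g. "id" or "number"
    word: str  # the word itself, e.g. "x" or "2"


@dataclass(frozen=True)
class BinOp:
    op: str        # operator word, e.g. "+" or "-"
    left: object   # Leaf or BinOp
    right: object  # Leaf or BinOp


def to_source(node):
    """Render the tree back to fully parenthesized source text."""
    if isinstance(node, Leaf):
        return node.word
    return f"({to_source(node.left)} {node.op} {to_source(node.right)})"


def count_nodes(node):
    """Count the nodes in the tree."""
    if isinstance(node, Leaf):
        return 1
    return 1 + count_nodes(node.left) + count_nodes(node.right)


# AST for x + 2 - y: the root is "-", its left child is the "+" subtree.
tree = BinOp("-", BinOp("+", Leaf("id", "x"), Leaf("number", "2")), Leaf("id", "y"))
print(to_source(tree), count_nodes(tree))
```

The AST needs only five nodes (two operators and three leaves), since the goal, expr, term and op nodes of the parse tree carry no extra information once the structure is fixed.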

  21. Three-pass Compiler. [Diagram: source code → Front End → IR → Middle End → IR → Back End → machine code, with errors reported.] The middle end is an intermediate stage for code improvement or optimization. It analyzes the IR and rewrites (or transforms) it. The primary goal is to reduce the running time of the compiled code; it may also improve space usage, power consumption, ... It must preserve the “meaning” of the code.
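One simple example of an IR-to-IR rewrite that preserves meaning is constant folding. The lectures do not name a specific optimization, so this is an assumed example; the tuple-based tree encoding and the names `fold_constants` and `evaluate` are also assumptions of the sketch.

```python
# Tree encoding (an assumption of this sketch):
#   ("num", 2)        a numeric constant
#   ("id", "y")       a variable
#   ("+", left, right) or ("-", left, right)  a binary operation


def fold_constants(node):
    """Return an equivalent tree with constant subexpressions evaluated."""
    if node[0] in ("num", "id"):
        return node
    op, left, right = node
    left, right = fold_constants(left), fold_constants(right)
    if left[0] == "num" and right[0] == "num":
        a, b = left[1], right[1]
        return ("num", a + b if op == "+" else a - b)
    return (op, left, right)


def evaluate(node, env):
    """Evaluate a tree, looking up variables in env."""
    if node[0] == "num":
        return node[1]
    if node[0] == "id":
        return env[node[1]]
    op, left, right = node
    a, b = evaluate(left, env), evaluate(right, env)
    return a + b if op == "+" else a - b


before = ("-", ("+", ("num", 2), ("num", 3)), ("id", "y"))
after = fold_constants(before)
print(after)
```

Here `2 + 3` is computed once at compile time, so the rewritten tree does less work at run time, and evaluating both trees with the same value of y gives the same result.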

  22. Lexical Analysis: the scanner. [Diagram: source code → scanner → tokens → parser → IR, with errors reported.]

  23. Lexical Analysis. The task of the scanner is to take a program written in some programming language as a stream of characters and break it into a stream of tokens. This activity is called lexical analysis. The lexical analyzer partitions the input string into substrings, called words, and classifies them according to their role. The output of lexical analysis is a stream of tokens.

  24. Tokens. Example: if( i == j ) z = 0; else z = 1; To the scanner, this input is just a sequence of characters.

  25. Tokens. Goal: partition the input string into substrings and classify them according to their role. A token is a syntactic category. Natural language: “He wrote the program”; words: “He”, “wrote”, “the”, “program”. Programming language: “if(b == 0) a = b”; words: “if”, “(”, “b”, “==”, “0”, “)”, “a”, “=”, “b”.

  26. Tokens
        Identifiers: x y11 maxsize
        Keywords: if else while for
        Integers: 2 1000 -44 5L
        Floats: 2.0 0.0034 1e5
        Symbols: ( ) + * / { } < > ==
        Strings: “enter x” “error”
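Classifying a word into one of these token classes can be sketched with one pattern per class. The patterns below are assumptions tuned only to the examples on this slide (they use straight double quotes for strings, and they treat keywords as identifiers that appear in a reserved-word table); a real language definition would be more precise.

```python
import re

# Reserved words are looked up after a word matches the identifier pattern.
KEYWORDS = {"if", "else", "while", "for"}

# Order matters: floats are tried before integers so "2.0" is not split.
PATTERNS = [
    ("float", r"\d+\.\d+|\d+[eE]-?\d+"),
    ("integer", r"-?\d+L?"),
    ("identifier", r"[A-Za-z_][A-Za-z0-9_]*"),
    ("string", r'"[^"]*"'),
    ("symbol", r"==|[()+*/{}<>]"),
]


def classify(word):
    """Return the token class of a single word, or "unknown"."""
    for name, pattern in PATTERNS:
        if re.fullmatch(pattern, word):
            if name == "identifier" and word in KEYWORDS:
                return "keyword"
            return name
    return "unknown"


for w in ["maxsize", "while", "5L", "1e5", "==", '"enter x"']:
    print(w, "->", classify(w))
```

The keyword table keeps the identifier pattern simple: `while` and `maxsize` match the same pattern and are told apart only by the lookup.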

  27. How to Describe Tokens? • Regular Languages are the most popular for specifying tokens • Simple and useful theory • Easy to understand • Efficient implementations

  28. Example of Languages. Alphabet = English characters, Language = English sentences. Alphabet = ASCII, Language = C++ programs, Java, C#.

  29. Recap. Tokens: strings of characters representing lexical units of programs, such as identifiers, numbers, operators. Regular expressions: a concise description of tokens; a regular expression describes a set of strings. Language L(R): the set of strings represented by a regular expression R, that is, the language denoted by R.

  30. Regular Expression
        R|S   = either R or S
        RS    = R followed by S (concatenation)
        R*    = concatenation of R zero or more times (R* = ε | R | RR | RRR ...)
        R?    = ε | R (zero or one R)
        R+    = RR* (one or more R)
        [abc] = a|b|c (any of listed)
        [a-z] = a|b|...|z (range)
        [^ab] = c|d|... (anything but ‘a’, ‘b’)
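Each operator in this list can be tried out with Python's standard `re` module, which happens to use the same notation. The helper `matches` is a hypothetical name; it wraps `re.fullmatch` so that a pattern must describe the whole string, which corresponds to asking whether the string is in L(R).

```python
import re


def matches(pattern, text):
    """True if the whole of text belongs to the language of pattern."""
    return re.fullmatch(pattern, text) is not None


print(matches("a|b", "b"))      # alternation: either a or b
print(matches("ab", "ab"))      # concatenation: a followed by b
print(matches("a*", ""))        # star: zero repetitions are allowed
print(matches("a?", "aa"))      # optional: at most one a
print(matches("a+", ""))        # plus: at least one a is required
print(matches("[abc]", "b"))    # any of the listed characters
print(matches("[a-z]", "Q"))    # range: lowercase letters only
print(matches("[^ab]", "c"))    # anything but a or b

# R+ = RR*: both patterns accept exactly the same strings.
print(all(matches("a+", s) == matches("aa*", s) for s in ["", "a", "aa", "b"]))
```

Using `fullmatch` rather than `search` matters here: `re.search("a?", "aa")` would succeed on a prefix, which answers a different question than membership in L(R).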

  31. How to Use REs • We need a mechanism to determine whether an input string w belongs to L(R), the language denoted by regular expression R.

  32. Acceptor • Such a mechanism is called an acceptor. [Diagram: the acceptor takes an input string w and a language L, and answers yes if w ∈ L, no if w ∉ L.]

  33. Finite Automata (FA) • Specification: Regular Expressions • Implementation: Finite Automata. A finite automaton accepts a string if we can follow transitions labelled with the characters of the string from the start state to some accepting state.
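The acceptance rule can be made concrete with a small table-driven automaton. As an assumed example, the sketch below accepts identifiers of the form letter (letter | digit)*; the state names, the `char_class` helper and the restriction to ASCII characters are choices made for this illustration.

```python
def char_class(ch):
    """Map a character to the label used on the transitions."""
    if ch.isascii() and ch.isalpha():
        return "letter"
    if ch.isascii() and ch.isdigit():
        return "digit"
    return "other"


# Transition table: (state, character class) -> next state.
# Any missing entry means there is no transition to follow.
TRANSITIONS = {
    ("start", "letter"): "ident",
    ("ident", "letter"): "ident",
    ("ident", "digit"): "ident",
}
ACCEPTING = {"ident"}


def accepts(text):
    """Follow transitions from the start state; accept if we end in an accepting state."""
    state = "start"
    for ch in text:
        state = TRANSITIONS.get((state, char_class(ch)))
        if state is None:
            return False
    return state in ACCEPTING


for w in ["y11", "maxsize", "1y", ""]:
    print(repr(w), accepts(w))
```

The empty string is rejected because the start state is not accepting, and `1y` is rejected because no transition leaves the start state on a digit.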

  34. SYNTACTIC VS SEMANTIC ANALYSIS

  35. Syntactic Analysis • Natural language analogy: consider the word sequences “He wrote program the” and “He wrote the program”. [Diagram labels for the latter: noun, verb, article, noun; subject, predicate, object; sentence.]

  36. Syntactic Analysis • Programming language: if ( b <= 0 ) a = b. [Diagram labels: “b <= 0” is a bool expr, “a = b” is an assignment, and the whole is an if-statement.]

  37. Syntactic Analysis. int* foo(int i, int j)) { for(k=0; i j; ) fi( i > j ) return j; } Annotations: extra parenthesis (the second “)” after the parameter list); missing expression (in the for header); not a keyword (fi).

  38. Semantic Analysis • Grammatically correct: “He wrote the computer”. [Diagram labels: noun, verb, article, noun; subject, predicate, object; sentence.] The sentence is well formed, yet it does not make sense.

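  39. Semantic Analysis. int* foo(int i, int j) { for(k=0; i < j; j++ ) if( i < j-2 ) sum = sum+i; return sum; } Annotations: undeclared var (k and sum are never declared); return type mismatch (at “return sum”, where the function is declared to return int*).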
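The undeclared-variable check is the kind of question a grammar cannot answer but a symbol table can. The sketch below is deliberately minimal: the function name `undeclared_names` is hypothetical, and the lists of declared and used names are written out by hand from the slide's code, whereas a real compiler would collect them by walking the IR.

```python
def undeclared_names(declared, used):
    """Return the names that are used but never declared, in order of first use."""
    table = set(declared)  # a minimal stand-in for a symbol table
    missing = []
    for name in used:
        if name not in table and name not in missing:
            missing.append(name)
    return missing


# Names taken by hand from foo on the slide: the parameters are declared,
# and the body uses k, i, j and sum.
declared_in_foo = ["i", "j"]
used_in_foo = ["k", "i", "j", "j", "i", "j", "sum", "sum", "i", "sum"]
print(undeclared_names(declared_in_foo, used_in_foo))
```

Every token of `foo` is individually fine for the scanner and the statement shapes are fine for the parser; only this extra pass over names reveals that `k` and `sum` were never declared.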

  40. Role of the Parser • Not all sequences of tokens are programs. • The parser must distinguish between valid and invalid sequences of tokens. What we need: an expressive way to describe the syntax, and an acceptor mechanism that determines whether the input token stream satisfies the syntax. Parsing is the process of discovering a derivation for some sentence. This requires a mathematical model of syntax (a grammar G) and an algorithm for testing membership in L(G).
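For the small expression grammar from the earlier slides (expr → expr op term | term), membership in L(G) can be tested with a simple loop. Note the swap: instead of following the left-recursive rule literally, this sketch uses the equivalent shape "a term followed by zero or more (op term) pairs". The function name and the convention that tokens are given as token-type strings are assumptions.

```python
TERMS = {"number", "id"}
OPS = {"+", "-"}


def is_valid_expression(tokens):
    """True if the token-type sequence is a sentence of the expression grammar."""
    # The sentence must start with a term.
    if not tokens or tokens[0] not in TERMS:
        return False
    pos = 1
    # After that, only complete (op, term) pairs may follow.
    while pos < len(tokens):
        if tokens[pos] not in OPS:
            return False
        if pos + 1 >= len(tokens) or tokens[pos + 1] not in TERMS:
            return False
        pos += 2
    return True


print(is_valid_expression(["id", "+", "number", "-", "id"]))  # x + 2 - y
print(is_valid_expression(["id", "+"]))                       # missing operand
```

This only answers yes or no. A full parser would also record which productions were used, so that it can build the parse tree or AST.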

  41. Backus-Naur Form (BNF) • Context-free grammars are (often) given by BNF expressions. • Grammar rules in a similar form were first used in the description of the Algol60 language. • The notation was developed by John Backus and adapted by Peter Naur for the Algol60 report, thus the term Backus-Naur Form (BNF). • The meta-symbols of BNF are: ::= meaning "is defined as"; | meaning "or"; < > angle brackets used to surround category names. • Optional items are enclosed in meta symbols [ and ].

  42. Meta-symbols of BNF • Optional items are enclosed in meta symbols [ and ], example: <if_statement> ::= if <boolean_expression> then <statement_sequence> [ else <statement_sequence> ] end if ; • Repetitive items (zero or more times) are enclosed in meta symbols { and }, example: <identifier> ::= <letter> { <letter> | <digit> } • Terminals of only one character are surrounded by quotes (") to distinguish them from meta-symbols, example: <statement_sequence> ::= <statement> { ";" <statement> } • In recent textbooks, terminal and non-terminal symbols are distinguished by using boldface for terminals and suppressing < and > around non-terminals. This greatly improves readability. • The example then becomes: if_statement ::= if boolean_expression then statement_sequence [ else statement_sequence ] end if ";"

  43. More Useful Grammar

  44. Derivation: x – 2 * y

  45. Derivation • Such a process of rewrites is called a derivation. • The process of discovering a derivation is called parsing. • At each step, we choose a non-terminal to replace. • Different choices can lead to different derivations. • Two derivations are of interest: the leftmost derivation and the rightmost derivation.

  46. Derivations • Leftmost derivation: replace the leftmost non-terminal (NT) at each step • Rightmost derivation: replace the rightmost NT at each step • The example on the preceding slides was a leftmost derivation • There is also a rightmost derivation

  47. Rightmost Derivation

  48. Derivations • The two derivations shown produce different parse trees, which is possible only because the grammar is ambiguous. • The parse trees imply different evaluation orders!

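  49. Parse Trees: leftmost derivation. [Parse tree: G at the root over E; the top-level operator is –, with x as its left operand and the subtree for 2 * y as its right operand.] Evaluation order: x – ( 2 * y )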

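  50. Parse Trees: rightmost derivation. [Parse tree: G at the root over E; the top-level operator is *, with the subtree for x – 2 as its left operand and y as its right operand.] Evaluation order: ( x – 2 ) * y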
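The two evaluation orders are not just a matter of style: they give different answers. The sketch below encodes the two trees as nested tuples and evaluates them with sample values. The tuple encoding, the `evaluate` helper and the values x = 10 and y = 3 are all assumptions chosen for illustration.

```python
import operator

OPS = {"-": operator.sub, "*": operator.mul}


def evaluate(tree, env):
    """Evaluate a tree: tuples are (op, left, right), strings are variables, ints are constants."""
    if isinstance(tree, tuple):
        op, left, right = tree
        return OPS[op](evaluate(left, env), evaluate(right, env))
    if isinstance(tree, str):
        return env[tree]
    return tree


leftmost_tree = ("-", "x", ("*", 2, "y"))   # x - (2 * y)
rightmost_tree = ("*", ("-", "x", 2), "y")  # (x - 2) * y

env = {"x": 10, "y": 3}
print(evaluate(leftmost_tree, env))   # 10 - (2 * 3)
print(evaluate(rightmost_tree, env))  # (10 - 2) * 3
```

With these values the first tree evaluates to 4 and the second to 24, so a compiler has to commit to one tree for the sentence x – 2 * y.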
