Chapter V: Compiler
Overview: To study the design and operation of compilers for high-level programming languages.
Contents
• Basic compiler (one-pass compiler) functions
• Machine-dependent extensions (object-code generation and code optimization)
• Compiler design alternatives: multi-pass compilers, interpreters, P-code compilers, and compiler-compilers
Basic compiler functions: example [figure]
Basic compiler functions (cont.)
• Source program: regard each statement as a sequence of tokens.
• The task of scanning the source statement, recognizing and classifying the various tokens, is known as lexical analysis. The part of the compiler that performs it is the scanner.
• The tokens are then recognized as language constructs described by the grammar. This process is called syntactic analysis or parsing, and it is performed by the parser.
• Finally, object code is generated.
Compilation process
• Scanning (lexical analysis)
• Parsing (syntactic analysis)
• Code generation
Note: all three steps can be carried out in a single pass.
Grammars
• A grammar for a programming language is a formal description of the syntax of programs and individual statements written in the language.
• Syntax and semantics are different things. E.g., consider
  I := J + K
  X := Y + I
  where X, Y : Real and I, J, K : Integer.
• The two statements have identical syntax, but their semantics are quite different: the first is an integer addition, while the second is a real addition in which I must first be converted to real.
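A minimal sketch of how the same syntax can lead to different operations once the declared types are taken into account. The type table mirrors the declarations in the example; the function name `plan_assignment` and the wording of the steps are hypothetical and only meant for illustration.

```python
# Declared types from the example: X, Y are Real; I, J, K are Integer.
TYPES = {"I": "integer", "J": "integer", "K": "integer",
         "X": "real", "Y": "real"}


def plan_assignment(target, left, right):
    """Return the (hypothetical) steps for `target := left + right`."""
    steps = []
    # The addition is done in real arithmetic if either operand is real.
    operand_types = [TYPES[left], TYPES[right]]
    result_type = "real" if "real" in operand_types else "integer"
    # Any operand of the other type has to be converted first.
    for name in (left, right):
        if TYPES[name] != result_type:
            steps.append(f"convert {name} to {result_type}")
    steps.append(f"{result_type} add {left}, {right}")
    # The result may also need converting before it is stored.
    if TYPES[target] != result_type:
        steps.append(f"convert result to {TYPES[target]}")
    steps.append(f"store into {target}")
    return steps


print(plan_assignment("I", "J", "K"))
print(plan_assignment("X", "Y", "I"))
```

Both calls receive a statement of the form id := id + id, yet the second plan contains an extra conversion step. That difference comes from the declarations, not from the syntax.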
Grammars (cont.)
• BNF (Backus-Naur Form) is a notation for describing syntax.
• It is simple and widely used.
• It provides capabilities that are sufficient for most purposes.
• A BNF grammar consists of a set of rules, each of which defines the syntax of some construct in the programming language.
• E.g., <read> ::= READ ( <id-list> )
Grammars (cont.)
• <read> ::= READ ( <id-list> )
• <id-list> ::= id | <id-list> , id
• Character strings enclosed between < and > are called nonterminal symbols.
• Character strings not enclosed between < and > are called terminal symbols (i.e., tokens).
• E.g., READ(value, sum, x, y)
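The two rules above can be turned into a small recognizer. The following is a sketch only: the function names (`tokenize`, `parse_read`, `parse_id_list`, `is_id`) are hypothetical, identifiers are assumed to be a letter followed by letters or digits, and the left-recursive <id-list> rule is handled with a loop rather than with recursion.

```python
import re

# One token: an identifier-like word, or one of ( ) ,
TOKEN_RE = re.compile(r"\s*([A-Za-z][A-Za-z0-9]*|[(),])")


def tokenize(text):
    """Split a statement into words and the punctuation ( ) ,"""
    tokens, pos = [], 0
    text = text.rstrip()
    while pos < len(text):
        match = TOKEN_RE.match(text, pos)
        if match is None:
            raise SyntaxError(f"unexpected character at position {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


def is_id(token):
    """Assumption: any word other than READ counts as the terminal id."""
    return token[0].isalpha() and token != "READ"


def parse_id_list(tokens):
    """<id-list> ::= id | <id-list> , id   (left recursion done as a loop)"""
    if not tokens or not is_id(tokens[0]):
        raise SyntaxError("expected an identifier")
    ids, rest = [tokens[0]], tokens[1:]
    while rest:
        if len(rest) < 2 or rest[0] != "," or not is_id(rest[1]):
            raise SyntaxError("expected ', id'")
        ids.append(rest[1])
        rest = rest[2:]
    return ids


def parse_read(tokens):
    """<read> ::= READ ( <id-list> ); returns the identifiers read."""
    if tokens[:2] != ["READ", "("] or tokens[-1:] != [")"]:
        raise SyntaxError("expected READ ( <id-list> )")
    return parse_id_list(tokens[2:-1])


print(parse_read(tokenize("READ(value, sum, x, y)")))
```

A statement such as READ(value,) is rejected, because the comma is not followed by another id.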
Simplified Pascal grammar [figure]
Simplified Pascal grammar (cont.) [figure]
Simplified Pascal grammar (cont.)
• The analysis of a source statement in terms of a grammar can be displayed as a tree (a parse tree or syntax tree).
The parse tree for VARIANCE := SUMSQ DIV 100 - MEAN * MEAN [figure]
Grammars (cont.)
• Exercise: draw the parse tree for ALPHA - BETA * GAMMA.
• If there is more than one possible parse tree for a given statement, the grammar is said to be ambiguous.
• An ambiguous grammar leaves doubt about what object code should be generated.
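To see why ambiguity matters, assume a hypothetical ambiguous grammar <exp> ::= <exp> - <exp> | <exp> * <exp> | id (this is not the simplified Pascal grammar of the slides). Under that assumption ALPHA - BETA * GAMMA has two parse trees, and the sketch below shows that they compute different values. The tuple encoding of trees and the sample values are illustrative choices.

```python
def evaluate(tree, env):
    """Evaluate a parse tree given as an id string or (op, left, right)."""
    if isinstance(tree, str):
        return env[tree]
    op, left, right = tree
    a, b = evaluate(left, env), evaluate(right, env)
    return a - b if op == "-" else a * b


# Tree 1: the multiplication is grouped first, ALPHA - (BETA * GAMMA).
tree_mul_first = ("-", "ALPHA", ("*", "BETA", "GAMMA"))
# Tree 2: the subtraction is grouped first, (ALPHA - BETA) * GAMMA.
tree_sub_first = ("*", ("-", "ALPHA", "BETA"), "GAMMA")

env = {"ALPHA": 10, "BETA": 2, "GAMMA": 3}
print(evaluate(tree_mul_first, env))  # 10 - 2 * 3 = 4
print(evaluate(tree_sub_first, env))  # (10 - 2) * 3 = 24
```

Since the two trees give different results, a compiler working from such a grammar would have no basis for choosing which object code to generate.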
Lexical analysis (scanning)
• Scanning the program to be compiled and recognizing the tokens that make up the source statements.
• Scanners are usually designed to recognize keywords, operators, identifiers, integers, floating-point numbers, character strings, etc.
• An identifier might be defined by the rules:
  <ident> ::= <letter> | <ident> <letter> | <ident> <digit>
  <letter> ::= A | B | C | D | … | Z
  <digit> ::= 0 | 1 | 2 | 3 | … | 9
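A small scanner along these lines might look as follows. It is a sketch under several assumptions: the keyword set, the operator set and the token class names (INT, IDENT, OP, KEYWORD) are hypothetical, and only integers, identifiers and a few operators are handled. The IDENT pattern follows the <ident> rules above (an uppercase letter followed by letters or digits).

```python
import re

# Hypothetical keyword set; a real scanner would list every keyword.
KEYWORDS = {"READ", "WRITE", "DIV"}

# Token classes, tried in this order at each position.
TOKEN_SPEC = [
    ("INT", r"[0-9]+"),
    ("IDENT", r"[A-Z][A-Z0-9]*"),
    ("OP", r":=|[-+*(),;]"),
    ("SKIP", r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))


def scan(source):
    """Return a list of (token class, text) pairs for one source statement."""
    tokens, pos = [], 0
    while pos < len(source):
        match = MASTER.match(source, pos)
        if match is None:
            raise SyntaxError(f"unrecognized character {source[pos]!r}")
        kind, text = match.lastgroup, match.group()
        pos = match.end()
        if kind == "SKIP":
            continue  # blanks only separate tokens
        if kind == "IDENT" and text in KEYWORDS:
            kind = "KEYWORD"  # keywords look like identifiers, so reclassify
        tokens.append((kind, text))
    return tokens


for token in scan("VARIANCE := SUMSQ DIV 100 - MEAN * MEAN"):
    print(token)
```

Recognizing keywords by first scanning an identifier and then looking it up in a table is one common design choice. Giving each keyword its own rule is another.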
Token coding scheme [figure]
Lexical scan [figure]
The lexical scanning
• The scanner must deal with language-specific cases such as the following.
• FORTRAN ignores blanks in a statement, so
  DO 10 I = 1, 100
  is a DO statement, while
  DO 10 I = 1
  is an assignment to the variable DO10I.
• FORTRAN keywords are not reserved, so they may also be used as identifiers:
  IF (THEN .EQ. ELSE) THEN
    IF = THEN
  ELSE
    THEN = IF
  ENDIF
• A number of tools have been developed for automatically constructing lexical scanners from specifications stated in a special-purpose language.
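The DO example can be sketched in code. This is a deliberately simplified illustration, not how a production FORTRAN scanner works: the function name `classify` and the returned tuples are hypothetical, and the comma test would be fooled by a comma inside parentheses (for example in an array subscript).

```python
import re

# After blanks are removed, a DO statement looks like DO<label><var>=<start>,<limit>
DO_LOOP = re.compile(r"DO(\d+)([A-Z][A-Z0-9]*)=([^,]+),(.+)")
# An assignment looks like <var>=<expression>
ASSIGN = re.compile(r"([A-Z][A-Z0-9]*)=(.+)")


def classify(statement):
    """Classify a statement as a DO loop or an assignment (simplified)."""
    compact = statement.replace(" ", "").upper()  # blanks are not significant
    match = DO_LOOP.fullmatch(compact)
    if match:
        label, var, start, limit = match.groups()
        return ("DO", label, var, start, limit)
    match = ASSIGN.fullmatch(compact)
    if match:
        var, expr = match.groups()
        return ("ASSIGN", var, expr)
    raise SyntaxError("unrecognized statement")


print(classify("DO 10 I = 1, 100"))
print(classify("DO 10 I = 1"))
```

The scanner cannot decide that DO is a keyword until it has looked ahead past the = sign for a comma, which is why this case needs special handling.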
Modeling Scanners as Finite Automata
• The tokens of most programming languages can be recognized by a finite automaton.
• An automaton has a starting state and one or more final states.
• If the automaton stops in a final state, we say that it recognizes (or accepts) the string being scanned; otherwise, it fails to recognize the string.
The implementation of finite automata
• Using algorithmic code (for Fig. 5.8 (b))
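Fig. 5.8 (b) is not reproduced here, so the sketch below assumes a simple two-state automaton that accepts identifiers (a letter followed by any number of letters or digits). It may differ from the automaton in the figure. The state numbers and the function name `accepts_identifier` are illustrative. In the algorithmic style, each state and transition is written out directly as code.

```python
import string

LETTERS = set(string.ascii_letters)
DIGITS = set(string.digits)


def accepts_identifier(text):
    """Run the assumed automaton: state 1 is the start, state 2 is final."""
    state = 1
    for ch in text:
        if state == 1 and ch in LETTERS:
            state = 2          # the first character must be a letter
        elif state == 2 and (ch in LETTERS or ch in DIGITS):
            state = 2          # later characters: letters or digits
        else:
            return False       # no transition defined: the automaton fails
    return state == 2          # accept only if we stopped in the final state


for sample in ["SUMSQ", "X1", "1X", ""]:
    print(repr(sample), accepts_identifier(sample))
```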
The implementation of finite automata (cont.)
• Using a tabular representation
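A table-driven version can be sketched as follows. The automaton is an assumed one for identifiers (a letter followed by letters or digits) and is not taken from the slides' figures. The names `TRANSITIONS`, `FINAL_STATES`, `char_class` and `run_automaton` are hypothetical.

```python
import string


def char_class(ch):
    """Map a character to the column of the transition table it selects."""
    if ch in string.ascii_letters:
        return "letter"
    if ch in string.digits:
        return "digit"
    return "other"


# Transition table: TRANSITIONS[state][character class] -> next state.
# A missing entry means that no transition is defined.
TRANSITIONS = {
    1: {"letter": 2},
    2: {"letter": 2, "digit": 2},
}
START_STATE = 1
FINAL_STATES = {2}


def run_automaton(text):
    """Return True if the automaton stops in a final state."""
    state = START_STATE
    for ch in text:
        state = TRANSITIONS[state].get(char_class(ch))
        if state is None:
            return False  # no transition: the string is not recognized
    return state in FINAL_STATES


for sample in ["MEAN", "X1", "1X", ""]:
    print(repr(sample), run_automaton(sample))
```

With this representation the driver loop stays the same for every automaton. Recognizing a different kind of token only requires a different table.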