Chapter 7 Introduction to Languages and Compiler SEG2101 Chapter 7
Contents
• Computer architecture
• Compiler
• Grammars
• Formal languages
• Parse trees
• Ambiguity
• Regular expressions
Von Neumann Architecture
Compiler
A compiler is a program that reads a program written in one language (the source language) and translates it into an equivalent program in another language (the target language).
The Compilation Process
Grammars
• A grammar is defined as a 4-tuple (Σ, N, P, S): the alphabet Σ, the set of nonterminals N, the set of productions P, and a goal symbol S.
• Σ, N, and P are sets; S is a particular element of N.
Alphabets and Strings
• Σ is the alphabet, or set of terminals.
• It is a finite set consisting of all the input characters or symbols that can be arranged to form sentences in the language.
• English: the letters A to Z and, in our definition, punctuation and space symbols
• Programming language: usually some well-defined character set such as ASCII
Alphabets and Strings (II)
• A compiler is usually defined with two grammars.
• The alphabet for the scanner grammar is ASCII or some subset of it.
• The alphabet for the parser grammar is the set of tokens generated by the scanner, not ASCII at all.
An Example of Strings
• Σ = {a, b, c, d}
• Possible strings of terminals from Σ include aaa, aabbccdd, d, cba, abab, ccccccccccacccc, and so on.
Formal Languages
• Σ: the alphabet, a finite set consisting of all input characters or symbols.
• Σ*: the closure of the alphabet, the set of all possible strings over Σ, including the empty string ε.
• A (formal) language is some specified subset of Σ*.
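The closure Σ* is infinite, so it can only be enumerated up to some length. The sketch below lists every string over the example alphabet {a, b, c, d} of length at most 2 and then picks out a language as a subset. The helper name `strings_up_to` and the sample language (strings that start with "a") are assumptions made for illustration.

```python
from itertools import product

def strings_up_to(alphabet, max_len):
    """Yield every string over `alphabet` of length 0..max_len, shortest first."""
    for length in range(max_len + 1):
        for symbols in product(sorted(alphabet), repeat=length):
            yield "".join(symbols)

SIGMA = {"a", "b", "c", "d"}

# A finite prefix of Sigma*; the first element is the empty string.
prefix = list(strings_up_to(SIGMA, 2))

# A hypothetical language: the strings in that prefix that start with "a".
language = [s for s in prefix if s.startswith("a")]

print(len(prefix))   # 1 + 4 + 16 = 21 strings
print(language)      # ['a', 'aa', 'ab', 'ac', 'ad']
```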
Nonterminals
• The nonterminal set N is a finite set of symbols not in the alphabet.
• A particular nonterminal, the goal symbol S, represents exactly the set of all strings in the language.
• The goal symbol is also often called the start symbol because derivations start with it.
• The set of terminals and the set of nonterminals, taken together, are called the vocabulary of the grammar.
Productions
• The productions P of a grammar are a set of rewriting rules, each written as two strings of symbols separated by an arrow.
• The symbols on each side of the arrow may be drawn from both terminals and nonterminals, subject to restrictions that depend on the form of the grammar.
An Example Grammar
• G1 = ({a, b, c}, {A, B}, {A → aB, A → bB, A → cB, B → a, B → b, B → c}, A)
• The grammar generates 9 two-letter strings.
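To check the count of nine strings, the 4-tuple can be written down as plain data and expanded mechanically. This is a minimal sketch: the function `generate` and its breadth-first, leftmost-nonterminal strategy are assumptions for illustration, and the length bound only works because no production in G1 shortens a string.

```python
from collections import deque

# G1 = (Sigma, N, P, S) written as plain Python values.
SIGMA = {"a", "b", "c"}
NONTERMINALS = {"A", "B"}
PRODUCTIONS = {
    "A": ["aB", "bB", "cB"],
    "B": ["a", "b", "c"],
}
START = "A"

def generate(productions, nonterminals, start, max_len):
    """Return all terminal strings of length <= max_len derivable from `start`."""
    sentences = set()
    seen = {start}
    queue = deque([start])
    while queue:
        form = queue.popleft()
        # Locate the leftmost nonterminal in the current sentential form.
        pos = next((i for i, ch in enumerate(form) if ch in nonterminals), None)
        if pos is None:
            sentences.add(form)          # only terminals left: a sentence
            continue
        for rhs in productions[form[pos]]:
            new_form = form[:pos] + rhs + form[pos + 1:]
            if len(new_form) <= max_len and new_form not in seen:
                seen.add(new_form)
                queue.append(new_form)
    return sorted(sentences)

sentences = generate(PRODUCTIONS, NONTERMINALS, START, max_len=2)
print(sentences)  # ['aa', 'ab', 'ac', 'ba', 'bb', 'bc', 'ca', 'cb', 'cc']
```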
Syntax and Semantics
• Syntax: the syntax of a programming language is the form of its expressions, statements, and program units.
• Semantics: the meaning of those expressions, statements, and program units.
• Example: if (<expr>) <statement>
Sentences, Lexeme, Token
• Sentences: the strings of a language are called sentences or statements.
• Lexeme: the lexemes of a programming language include its identifiers, literals, operators, and special words.
• Token: a token of a language is a category of its lexemes.
Lexeme and Token
Index = 2 * count + 17;
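One way to see the split between lexemes and tokens is to run the statement through a small scanner. The sketch below is only an illustration: the token category names (`identifier`, `int_literal`, `equal_sign`, and so on) and the `tokenize` function are assumptions, not names taken from the slides.

```python
import re

# Hypothetical token categories, each with a regular expression for its lexemes.
TOKEN_SPEC = [
    ("int_literal", r"\d+"),
    ("identifier",  r"[A-Za-z_]\w*"),
    ("equal_sign",  r"="),
    ("mult_op",     r"\*"),
    ("plus_op",     r"\+"),
    ("semicolon",   r";"),
    ("skip",        r"\s+"),   # whitespace separates lexemes and is discarded
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))

def tokenize(text):
    """Return a list of (token, lexeme) pairs for `text`."""
    tokens = []
    pos = 0
    while pos < len(text):
        match = MASTER.match(text, pos)
        if match is None:
            raise ValueError(f"unexpected character {text[pos]!r} at position {pos}")
        if match.lastgroup != "skip":
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens

tokens = tokenize("Index = 2 * count + 17;")
for token, lexeme in tokens:
    print(f"{lexeme:<6} {token}")
```

Both `Index` and `count` are different lexemes that fall into the same token category, which is the sense in which a token is a category of lexemes.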
The Role of Grammars
• The grammar of a language defines the correct form for sentences in that language.
• Grammars are the formal language-generation mechanism most commonly used to describe the syntax of programming languages.
BNF: Backus-Naur Form
• Backus presented a new formal notation for specifying programming language syntax.
• Naur modified the notation slightly.
• The result is known as Backus-Naur Form, or BNF.
• BNF is a very natural notation for describing syntax.
• The terms BNF and context-free grammar (or simply grammar) are used interchangeably.
BNF
• Metalanguage: a language used to describe another language. BNF is a metalanguage for programming languages.
• Abstraction: the symbol on the left-hand side of the arrow
• Definition: the text to the right of the arrow
• Rule (production): the abstraction and its definition together are called a rule.
BNF Description (A simple C assignment statement)
Nonterminal and Terminal
• Nonterminal symbol: the abstraction in a BNF description or grammar
• Terminal symbol: the lexemes and tokens of the rules
• A BNF description or grammar is simply a collection of rules.
• Nonterminals can have two or more distinct definitions.
• Multiple definitions can be written as a single rule, with the different definitions separated by |, meaning logical OR.
<if_stmt> → if <logic_expr> then <stmt>
          | if <logic_expr> then <stmt> else <stmt>
List of Syntactic Elements
• BNF does not include the ellipsis (…) for describing lists.
• BNF uses recursion instead.
• A rule is recursive if its LHS appears in its RHS.
• e.g., <ident_list> → identifier | identifier , <ident_list>
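The recursive rule for <ident_list> maps directly onto a recursive function, one branch per alternative. The sketch below assumes the input is already a list of lexemes and uses Python's `str.isidentifier` as a stand-in for the `identifier` token; the function name `is_ident_list` is hypothetical.

```python
def is_ident_list(tokens):
    """Return True if `tokens` matches
    <ident_list> -> identifier | identifier , <ident_list>."""
    if not tokens or not tokens[0].isidentifier():
        return False
    if len(tokens) == 1:
        return True  # first alternative: a single identifier
    # Second alternative: identifier , <ident_list> (the recursive case).
    return tokens[1] == "," and is_ident_list(tokens[2:])

print(is_ident_list(["a"]))                      # True
print(is_ident_list(["a", ",", "b", ",", "c"]))  # True
print(is_ident_list(["a", ","]))                 # False: nothing after the comma
```

The function calls itself exactly where the rule mentions its own left-hand side, so a list of any length is accepted without an ellipsis.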
A Grammar
A Derivation of a Program
Another Grammar
A Derivation of a Statement
Parse Tree
Grammars naturally describe the hierarchical syntactic structure of the sentences of the languages they define. These hierarchical structures are called parse trees.
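As a small illustration, the derivation A ⇒ aB ⇒ ab in the example grammar G1 can be stored as a nested structure whose leaves, read left to right, spell the sentence. The tuple representation and the helper name `frontier` are assumptions chosen for this sketch.

```python
# A parse tree node is (symbol, children); a leaf is a plain terminal string.
# This tree records the derivation A => aB => ab in grammar G1.
tree = ("A", ["a", ("B", ["b"])])

def frontier(node):
    """Concatenate the leaves of a parse tree from left to right."""
    if isinstance(node, str):
        return node  # a terminal contributes itself
    _symbol, children = node
    return "".join(frontier(child) for child in children)

print(frontier(tree))  # ab
```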
Ambiguous Grammar
• A grammar that generates a sentence for which there are two or more distinct parse trees is said to be ambiguous.
Ambiguity
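Ambiguity can be checked for a particular sentence by counting its parse trees. The sketch below uses a standard textbook grammar, E → E + E | E * E | id, which is an assumption here and not necessarily the grammar shown on the slide. Each choice of top-level operator gives a different tree, so the count is a sum over split points.

```python
from functools import lru_cache

def count_parse_trees(tokens):
    """Count distinct parse trees for `tokens` under E -> E + E | E * E | id."""
    tokens = tuple(tokens)

    @lru_cache(maxsize=None)
    def count(lo, hi):
        # Number of ways tokens[lo:hi] can be derived from E.
        if hi - lo == 1:
            return 1 if tokens[lo] == "id" else 0
        total = 0
        # Try every operator position as the root of the tree.
        for mid in range(lo + 1, hi - 1):
            if tokens[mid] in ("+", "*"):
                total += count(lo, mid) * count(mid + 1, hi)
        return total

    return count(0, len(tokens))

print(count_parse_trees(["id", "+", "id"]))             # 1
print(count_parse_trees(["id", "+", "id", "*", "id"]))  # 2, so the grammar is ambiguous
```

The two trees for `id + id * id` correspond to grouping the addition first or the multiplication first, which is why ambiguity matters for operator precedence.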
Regular Expressions
A regular expression is a notation for describing a set of strings.
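A brief sketch using Python's `re` module shows the idea. The three patterns are illustrative assumptions: one for identifiers, one for integer literals, and one that describes the same nine two-letter strings as the example grammar G1.

```python
import re

IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")
INT_LITERAL = re.compile(r"[0-9]+")
G1_LANGUAGE = re.compile(r"[abc][abc]")  # same strings as grammar G1 generates

def describes(pattern, text):
    """Return True if the whole of `text` is in the set described by `pattern`."""
    return pattern.fullmatch(text) is not None

print(describes(IDENTIFIER, "count"))   # True
print(describes(IDENTIFIER, "2count"))  # False: cannot start with a digit
print(describes(INT_LITERAL, "17"))     # True
print(describes(G1_LANGUAGE, "cb"))     # True
print(describes(G1_LANGUAGE, "abc"))    # False: three letters
```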