Course: ICS313 Fundamentals of Programming Languages. Instructor: Abdul Wahid Wali a.wali@uoh.edu.sa Lecturer, College of Computer Science and Engineering University of Hail. Chapter 3: Describing Syntax and Semantics. Objectives:. - Introduction
Course: ICS313 • Fundamentals of Programming Languages. • Instructor: • Abdul Wahid Wali • a.wali@uoh.edu.sa • Lecturer, • College of Computer Science and Engineering • University of Hail
Chapter 3: Describing Syntax and Semantics Objectives: - Introduction - The General Problem of Describing Syntax - Formal Methods of Describing Syntax - Attribute Grammars - Describing the Meanings of Programs: Dynamic Semantics
Introduction
• A language is a set of sentences (statements). A sentence is a valid string of the language: a sequence of symbols from the language's alphabet arranged in a way that is consistent with the grammar of the language.
• Who must use language definitions?
- Language designers
- Implementers
- Programmers (the users of the language)
• A formal description of a language is essential for learning, writing, and implementing the language. The description must therefore be both precise and understandable.
- Languages are described by their syntax and their semantics.
• What is the meaning of syntax?
Syntax is the form or structure of the expressions, statements, and program units. The syntax rules of a language specify which strings of characters from the language's alphabet are in the language.
• What is the meaning of semantics?
Semantics is the meaning of the expressions, statements, and program units.
- Syntax describes what the language looks like. Semantics describes what a particular construct actually does.
- Syntax is much easier to describe formally than semantics.
3.2 The General Problem of Describing Syntax: Terminology
• For simplicity's sake, formal descriptions of the syntax of programming languages often do not include descriptions of the lowest-level syntactic units. These small units are called lexemes.
• A lexeme is the basic unit handled during lexical analysis. It is a sequence of characters from the language's alphabet that belongs together (e.g., numeric literals, operators, and special words).
• Lexemes are partitioned into groups, such as the names of variables, methods, classes, and so forth. Each of these groups is represented by a name, called a token.
• A token is a category of related lexemes, and a lexeme is an instance of a token (e.g., index and count are both instances of the token identifier).
Example: consider the following Java statement:
index = 2 * count + 17;
The lexemes and tokens of this statement are:
Lexeme     Token
index      identifier
=          equal_sign
2          int_literal
*          mult_op
count      identifier
+          plus_op
17         int_literal
;          semicolon
In general, languages can be formally defined in two distinct ways: by (1) generation or by (2) recognition.
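The split of a statement into lexemes and tokens can be sketched with a small regular-expression scanner. This is only an illustration: the token names and patterns below are assumptions chosen to cover this one statement, not Java's full lexical grammar.

```python
import re

# Illustrative token categories (assumed names); order matters only for overlapping patterns.
TOKEN_SPEC = [
    ("int_literal", r"\d+"),
    ("identifier",  r"[A-Za-z_]\w*"),
    ("equal_sign",  r"="),
    ("mult_op",     r"\*"),
    ("plus_op",     r"\+"),
    ("semicolon",   r";"),
    ("skip",        r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))

def tokenize(text):
    """Return a list of (lexeme, token) pairs, skipping whitespace."""
    pairs = []
    pos = 0
    while pos < len(text):
        m = MASTER.match(text, pos)
        if m is None:
            raise ValueError(f"unexpected character {text[pos]!r} at position {pos}")
        if m.lastgroup != "skip":
            pairs.append((m.group(), m.lastgroup))
        pos = m.end()
    return pairs

for lexeme, token in tokenize("index = 2 * count + 17;"):
    print(f"{lexeme:8} {token}")
```

Each pair printed is one lexeme together with the token (category) it is an instance of.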
3.2.1 Language Recognizers
A recognizer of a language distinguishes the strings that are in the language from those that are not. The lexical analyzer and the parser of a compiler together form the recognizer for the language the compiler translates. The lexical analyzer recognizes tokens, and the parser recognizes the syntactic structure.
3.2.2 Language Generators
A generator produces the valid sentences of a language. In some cases it is more useful than a recognizer, because a reader can learn the syntax by studying the sentences it generates ("watch and learn").
3.3 Formal Methods of Describing Syntax
The formal notation used to describe the syntax of a programming language is called a grammar.
3.3.1 BNF (Backus-Naur Form) and Context-Free Grammars
BNF is a widely accepted way to describe the syntax of a programming language.
3.3.1.1 Context-Free Grammars
Regular expressions and context-free grammars are both useful in describing the syntax of a programming language. A regular expression describes how a token is built from characters of the alphabet, and a context-free grammar describes how tokens are put together to form a valid sentence.
3.3 Formal Methods of Describing Syntax (contd.)
3.3.1.2 Origins of Backus-Naur Form (BNF)
John Backus introduced the notation to describe ALGOL 58, and Peter Naur modified it slightly to describe ALGOL 60. BNF is nearly identical to context-free grammars.
3.3.1.3 BNF Fundamentals
• BNF is a metalanguage, i.e., a language used to describe another language; here it describes a programming language.
• A BNF description consists of rules (or productions). A rule has a left-hand side (LHS), the abstraction being defined, and a right-hand side (RHS), its definition. For example, a Java assignment statement could be defined as follows:
• <assign> → <var> = <expression>
The LHS is a single non-terminal. The RHS is a sequence of terminals (lexemes or tokens) and non-terminals.
• The rule indicates that wherever the non-terminal on the LHS appears, it can be replaced with the RHS, just like expanding a non-terminal into its children in a tree.
3.3.1.4 Describing Lists
Recursion is used in BNF to describe lists: a rule is recursive if its LHS also appears in its RHS. For example, <ident_list> → identifier | identifier , <ident_list>
3.3.1.5 Grammars and Derivations
- BNF is a generator of the language. The sentences of a language are generated from the start symbol by applying a sequence of rules to it. The process of going from the start symbol to a final sentence is called a derivation.
- Replacing a non-terminal with different RHSs may derive different sentences.
- Each string of symbols in a derivation, including the start symbol and the final sentence, is a sentential form. If the non-terminal replaced at every step is the leftmost one in the sentential form, the derivation is a leftmost derivation. The set of sentences generated by the grammar does not depend on the derivation order.
Example (3.1): a grammar for a small language
<program> → begin <stmt_list> end
<stmt_list> → <stmt> | <stmt> ; <stmt_list>
<stmt> → <var> = <expression>
<var> → A | B | C
<expression> → <var> + <var> | <var> - <var> | <var>
A derivation of a program in this language follows:
<program> => begin <stmt_list> end
=> begin <stmt> ; <stmt_list> end
=> begin <var> = <expression> ; <stmt_list> end
=> begin A = <expression> ; <stmt_list> end
=> begin A = <var> + <var> ; <stmt_list> end
=> begin A = B + <var> ; <stmt_list> end
=> begin A = B + C ; <stmt_list> end
=> begin A = B + C ; <stmt> end
=> begin A = B + C ; <var> = <expression> end
=> begin A = B + C ; B = <expression> end
=> begin A = B + C ; B = <var> end
=> begin A = B + C ; B = C end
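The idea of a grammar as a generator can be sketched in a few lines of Python. The encoding below (a dictionary from non-terminal to its list of RHSs, and a list of RHS indices called CHOICES) is an assumption made for illustration; it replays a leftmost derivation of begin A = B + C ; B = C end in the small language of Example 3.1.

```python
# Example 3.1, encoded as: non-terminal -> list of alternative RHSs.
GRAMMAR = {
    "<program>":    [["begin", "<stmt_list>", "end"]],
    "<stmt_list>":  [["<stmt>"], ["<stmt>", ";", "<stmt_list>"]],
    "<stmt>":       [["<var>", "=", "<expression>"]],
    "<var>":        [["A"], ["B"], ["C"]],
    "<expression>": [["<var>", "+", "<var>"], ["<var>", "-", "<var>"], ["<var>"]],
}

def leftmost_derivation(start, choices):
    """Replace the leftmost non-terminal with the chosen RHS at every step.

    Returns the list of sentential forms, from the start symbol to the end.
    """
    form = [start]
    steps = [" ".join(form)]
    for choice in choices:
        # Index of the leftmost symbol that is a non-terminal.
        i = next(k for k, sym in enumerate(form) if sym in GRAMMAR)
        form = form[:i] + GRAMMAR[form[i]][choice] + form[i + 1:]
        steps.append(" ".join(form))
    return steps

# RHS index picked at each step (hypothetical encoding of the derivation).
CHOICES = [0, 1, 0, 0, 0, 1, 2, 0, 0, 1, 2, 2]

for sentential_form in leftmost_derivation("<program>", CHOICES):
    print("=>", sentential_form)
```

Changing the entries of CHOICES selects different RHSs and therefore derives a different sentence of the same language.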
Example (3.2): a grammar for simple assignment statements
<assign> → <id> = <expr>
<id> → A | B | C
<expr> → <id> + <expr> | <id> * <expr> | ( <expr> ) | <id>
A = B * (A + C) is generated by the leftmost derivation:
<assign> => <id> = <expr>
=> A = <expr>
=> A = <id> * <expr>
=> A = B * <expr>
=> A = B * ( <expr> )
=> A = B * ( <id> + <expr> )
=> A = B * ( A + <expr> )
=> A = B * ( A + <id> )
=> A = B * ( A + C )
3.3.1.6 Parse Trees: A derivation can be represented by a hierarchical structure called a parse tree. The root of the tree is the start symbol, and applying a rule corresponds to expanding a non-terminal in the tree into its children. Every internal node is a non-terminal and every leaf is a terminal.
A parse tree for the simple statement A = B * (A + C), with children indented under their parent:
<assign>
  <id>
    A
  =
  <expr>
    <id>
      B
    *
    <expr>
      (
      <expr>
        <id>
          A
        +
        <expr>
          <id>
            C
      )
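A parse tree for the grammar of Example 3.2 can be built by a small recursive-descent parser, sketched below. The nested-tuple tree representation and the helper names (parse_assign, leaves, accepts) are assumptions made for this illustration; each function mirrors one grammar rule.

```python
IDS = ("A", "B", "C")

def parse_id(tokens, pos):
    # <id> -> A | B | C
    if pos < len(tokens) and tokens[pos] in IDS:
        return ("<id>", tokens[pos]), pos + 1
    raise SyntaxError(f"expected an identifier at position {pos}")

def parse_expr(tokens, pos):
    # <expr> -> ( <expr> ) | <id> + <expr> | <id> * <expr> | <id>
    if pos < len(tokens) and tokens[pos] == "(":
        inner, pos = parse_expr(tokens, pos + 1)
        if pos >= len(tokens) or tokens[pos] != ")":
            raise SyntaxError(f"expected ')' at position {pos}")
        return ("<expr>", "(", inner, ")"), pos + 1
    left, pos = parse_id(tokens, pos)
    if pos < len(tokens) and tokens[pos] in ("+", "*"):
        op = tokens[pos]
        right, pos = parse_expr(tokens, pos + 1)
        return ("<expr>", left, op, right), pos
    return ("<expr>", left), pos

def parse_assign(text):
    # <assign> -> <id> = <expr>; every lexeme here is a single character.
    tokens = [ch for ch in text if not ch.isspace()]
    target, pos = parse_id(tokens, 0)
    if pos >= len(tokens) or tokens[pos] != "=":
        raise SyntaxError("expected '='")
    expr, pos = parse_expr(tokens, pos + 1)
    if pos != len(tokens):
        raise SyntaxError(f"unexpected input at position {pos}")
    return ("<assign>", target, "=", expr)

def leaves(tree):
    """Read the terminals off the tree from left to right."""
    if isinstance(tree, str):
        return [tree]
    return [leaf for child in tree[1:] for leaf in leaves(child)]

def accepts(text):
    """Use the parser as a recognizer: True if text is in the language."""
    try:
        parse_assign(text)
        return True
    except SyntaxError:
        return False

tree = parse_assign("A = B * (A + C)")
print(tree)
print(" ".join(leaves(tree)))
```

Reading the leaves from left to right gives back the original sentence, and accepts shows the same parser acting as a recognizer in the sense of Section 3.2.1.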
Formal Method of Describing Syntax (contd.)
3.3.1.7 Ambiguity: A grammar is ambiguous if some sentence it generates has more than one parse tree (equivalently, more than one leftmost derivation).
Two distinct parse trees for the same sentence, A = B + A * C, arise when the expression rules have the form <expr> → <expr> + <expr> | <expr> * <expr> | <id>.
Tree 1 (the + is at the top, so A * C is grouped first):
<assign>
  <id>
    A
  =
  <expr>
    <expr>
      <id>
        B
    +
    <expr>
      <expr>
        <id>
          A
      *
      <expr>
        <id>
          C
Tree 2 (the * is at the top, so B + A is grouped first):
<assign>
  <id>
    A
  =
  <expr>
    <expr>
      <expr>
        <id>
          B
      +
      <expr>
        <id>
          A
    *
    <expr>
      <id>
        C
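Ambiguity can be made concrete by counting parse trees. The sketch below assumes the ambiguous expression rules <expr> → <expr> + <expr> | <expr> * <expr> | <id> (with <id> → A | B | C) and counts how many distinct parse trees derive a given expression; the function name count_parse_trees is hypothetical.

```python
from functools import lru_cache

def count_parse_trees(text):
    """Count parse trees for text under the ambiguous rules
    <expr> -> <expr> + <expr> | <expr> * <expr> | <id>."""
    tokens = tuple(ch for ch in text if not ch.isspace())

    @lru_cache(maxsize=None)
    def count(i, j):
        # Number of trees that derive tokens[i:j] from <expr>.
        if j - i == 1:
            return 1 if tokens[i] in ("A", "B", "C") else 0
        total = 0
        # Try every operator position k as the root of the subtree.
        for k in range(i + 1, j - 1):
            if tokens[k] in ("+", "*"):
                total += count(i, k) * count(k + 1, j)
        return total

    return count(0, len(tokens))

print(count_parse_trees("B + A * C"))
```

A count greater than 1 for any sentence is enough to show that the grammar is ambiguous.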
3.3.1.8 Operator Precedence: Operator precedence can be built into a grammar by arranging the rules so that operators with higher precedence are grouped with their operands first, and therefore appear lower in the parse tree.
Associativity of Operators: Operator associativity can also be indicated by a grammar.
<expr> → <expr> + <expr> | const (ambiguous)
<expr> → <expr> + const | const (unambiguous, left associative)
Parse tree for const + const + const under the unambiguous rule, with children indented under their parent:
<expr>
  <expr>
    <expr>
      const
    +
    const
  +
  const
3.3.2 Extended BNF (EBNF)
- EBNF has the same expressive power as BNF; the extensions only improve readability and writability.
- The extensions are optional parts, repetition, and multiple choice, very much like regular expressions.
- Optional parts are placed in brackets ([ ]):
<proc_call> → ident [ ( <expr_list> ) ]
- Alternative parts of RHSs are placed in parentheses and separated by vertical bars:
<term> → <term> (+ | -) const
- Repetitions (0 or more) are placed inside braces ({ }):
<ident> → letter {letter | digit}
3.3.2 Extended BNF (EBNF) (contd.)
BNF:
<expr> → <expr> + <term> | <expr> - <term> | <term>
<term> → <term> * <factor> | <term> / <factor> | <factor>
EBNF:
<expr> → <term> {(+ | -) <term>}
<term> → <factor> {(* | /) <factor>}
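The EBNF form maps almost directly onto code: each braces repetition becomes a loop. The following sketch evaluates expressions using the two EBNF rules above. The slides do not define <factor>, so the rule <factor> → integer | ( <expr> ) is an assumption made here, and characters that are not part of a token are silently ignored by the simple scanner.

```python
import re

def evaluate(text):
    """Evaluate text using
    <expr> -> <term> {(+ | -) <term>} and <term> -> <factor> {(* | /) <factor>}."""
    tokens = re.findall(r"\d+|[-+*/()]", text)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        return tok

    def expr():
        value = term()
        while peek() in ("+", "-"):      # the {(+ | -) <term>} repetition
            if advance() == "+":
                value += term()
            else:
                value -= term()
        return value

    def term():
        value = factor()
        while peek() in ("*", "/"):      # the {(* | /) <factor>} repetition
            if advance() == "*":
                value *= factor()
            else:
                value /= factor()
        return value

    def factor():
        # Assumed rule: <factor> -> integer | ( <expr> )
        tok = peek()
        if tok is None:
            raise SyntaxError("unexpected end of input")
        if tok == "(":
            advance()
            value = expr()
            if peek() != ")":
                raise SyntaxError("expected ')'")
            advance()
            return value
        if tok.isdigit():
            return int(advance())
        raise SyntaxError(f"unexpected token {tok!r}")

    result = expr()
    if pos != len(tokens):
        raise SyntaxError(f"unexpected token {tokens[pos]!r}")
    return result

print(evaluate("2 + 3 * 4"))
print(evaluate("10 - 4 - 3"))
```

Because <term> sits below <expr> in the grammar, * and / bind tighter than + and -, and because each loop accumulates from left to right, all four operators come out left associative.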
3.3.4 Attribute Grammars
An attribute grammar is an extension of a context-free grammar. It is used to describe more of the structure of a programming language than can be described with a context-free grammar alone. The extension allows certain language rules, such as type compatibility, to be described.
3.3.4.1 Static Semantics
- Many restrictions in programming languages are difficult or impossible to describe in BNF. They can be described, however, by adding attributes to the terminals and non-terminals of the grammar.
- The added attributes can be computed at compile time, hence the name static semantics.
3.3.4 Attribute Grammars
3.3.4.1 Static Semantics (contd.)
- Consider the rule that a variable must be declared before it is referenced.
-- It cannot be specified in a context-free grammar.
-- It can be checked at compile time.
- Some rules can be specified in the grammar of a language, but doing so would complicate the grammar unnecessarily, e.g., the rule in Java that a string literal cannot be assigned to a variable declared with type int.
3.3.4 Attribute Grammars
3.3.4.2 Basic Concepts
In addition to its BNF rules, an attribute grammar has attributes attached to grammar symbols, a set of attribute computation functions (semantic functions), and predicate functions that determine whether a rule may be applied. The functions and predicates are associated with the grammar rules. The basic concepts are:
- Attribute grammars are grammars to which attributes, attribute computation functions, and predicate functions have been added.
- Attributes are associated with grammar symbols. They are similar to variables in the sense that values can be assigned to them.
- Attribute computation functions, also called semantic functions, are associated with grammar rules. They specify how attribute values are computed.
- Predicate functions, which state some of the syntax and static semantic rules of the language, are also associated with grammar rules.
3.3.4.3 Attribute Grammars Defined
Definition: An attribute grammar is a context-free grammar (CFG) with the following additions:
- Associated with each grammar symbol X is a set of attributes A(X).
- Each rule has a set of functions that define certain attributes of the non-terminals in the rule.
- Each rule has a (possibly empty) set of predicates to check for attribute consistency.
- A(X) consists of two disjoint sets S(X) and I(X), called the synthesized and inherited attributes, respectively.
Attribute Grammars (continued)
- Synthesized attributes are used to pass semantic information up a parse tree.
- Inherited attributes pass semantic information down and across a parse tree.
- Let X0 → X1 ... Xn be a rule.
- Functions of the form S(X0) = f(A(X1), ..., A(Xn)) define synthesized attributes.
- Functions of the form I(Xj) = f(A(X0), ..., A(Xn)), for 1 <= j <= n, define inherited attributes.
Attribute Grammars (continued)
• Example: expressions of the form id + id
• The ids can be either int_type or real_type
• The types of the two ids must be the same
• The type of the expression must match its expected type
• BNF:
<expr> → <var> + <var>
<var> → id
• Attributes:
- actual_type: a synthesized attribute associated with the non-terminals <var> and <expr>
- expected_type: an inherited attribute associated with the non-terminal <expr>
An attribute grammar for simple assignment statements
1. Syntax rule: <assign> → <var> = <expr>
Semantic rule: <expr>.expected_type ← <var>.actual_type
2. Syntax rule: <expr> → <var>[2] + <var>[3]
Semantic rule: <expr>.actual_type ← if (<var>[2].actual_type = int) and (<var>[3].actual_type = int) then int else real end if
Predicate: <expr>.actual_type = <expr>.expected_type
3. Syntax rule: <expr> → <var>
Semantic rule: <expr>.actual_type ← <var>.actual_type
Predicate: <expr>.actual_type = <expr>.expected_type
4. Syntax rule: <var> → A | B | C
Semantic rule: <var>.actual_type ← look-up(<var>.string)
The look-up function looks up a given name in the symbol table and returns the variable's type.
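The four rules above can be traced with a short Python sketch. The symbol table contents and the function names (lookup, check_assign) are hypothetical; the sketch only covers the two statement shapes the grammar allows, <var> = <var> and <var> = <var> + <var>.

```python
# Hypothetical symbol table standing in for look-up(<var>.string).
SYMBOL_TABLE = {"A": "int", "B": "int", "C": "real"}

def lookup(name):
    """Rule 4: <var>.actual_type <- look-up(<var>.string)."""
    return SYMBOL_TABLE[name]

def check_assign(target, operands):
    """Compute the attributes for `target = operands[0] [+ operands[1]]`.

    Returns (expected_type, actual_type, predicate_holds).
    """
    # Rule 1: <expr>.expected_type is inherited from the target <var>.
    expected_type = lookup(target)
    # Rule 4, applied to each operand <var>.
    operand_types = [lookup(name) for name in operands]
    if len(operand_types) == 2:
        # Rule 2: int only if both operands are int, otherwise real.
        actual_type = "int" if operand_types == ["int", "int"] else "real"
    else:
        # Rule 3: <expr>.actual_type is synthesized from the single <var>.
        actual_type = operand_types[0]
    # Predicate of rules 2 and 3.
    return expected_type, actual_type, actual_type == expected_type

print(check_assign("A", ["A", "B"]))
print(check_assign("A", ["A", "C"]))
```

In the second call the predicate is false, which corresponds to a static semantic (type) error that a compiler would report.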
Describing the Meanings of Programs: Dynamic Semantics
• - There is no single widely accepted notation or formalism for describing semantics.
• - The dynamic semantics of a program is the meaning of its expressions, statements, and program units, that is, the meaning of its constructs during execution.
• - Accurately describing semantics is essential so that...
• - ...users writing a program can understand precisely how the various language constructs work.
• - ...compilers will be implemented consistently with respect to one another.
• - Three methods are used to describe semantics formally:
• - Operational semantics
• - Axiomatic semantics
• - Denotational semantics
1- Operational Semantics
• - Describes the meaning of a program by executing its statements on a machine, either simulated or actual. The change in the state of the machine (memory, registers, etc.) defines the meaning of the statement.
• - Example: the for loop in C++ or Java:
• for (expr1; expr2; expr3) { stmt }
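One common way to give an operational meaning to the for loop is to lower it to simpler test-and-goto steps and observe how each step changes the machine state. The sketch below is an illustration under that assumption, not a translation taken from the slides; the label names and the dictionary used as the machine state are hypothetical.

```python
def run_for(expr1, expr2, expr3, stmt, state):
    """Execute the lowered form of `for (expr1; expr2; expr3) { stmt }`:

              expr1
        loop: if expr2 is false goto out
              stmt
              expr3
              goto loop
        out:

    Returns a trace of (next_label, state_snapshot) pairs, one per step.
    """
    trace = []
    pc = "init"
    while pc != "out":
        if pc == "init":
            expr1(state)
            pc = "loop"
        elif pc == "loop":
            pc = "body" if expr2(state) else "out"
        elif pc == "body":
            stmt(state)
            expr3(state)
            pc = "loop"
        trace.append((pc, dict(state)))
    return trace

def demo():
    """Meaning of: for (i = 0; i < 3; i++) { total += i; }"""
    state = {"i": 0, "total": 0}

    def init(s): s["i"] = 0
    def cond(s): return s["i"] < 3
    def step(s): s["i"] += 1
    def body(s): s["total"] += s["i"]

    trace = run_for(init, cond, step, body, state)
    return state, trace

final_state, trace = demo()
for label, snapshot in trace:
    print(label, snapshot)
```

The sequence of state snapshots in the trace is the operational meaning of the loop for this starting state.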
2- Axiomatic Semantics
• - Axiomatic semantics was defined in conjunction with the development of a method to prove the correctness of programs.
• - Axiomatic semantics is based on mathematical logic. The logical expressions are called predicates, or assertions.
• - An assertion immediately before a statement describes the constraints on the program variables at that point; an assertion immediately following a statement describes the new constraints on those variables after the statement has executed.
• - These assertions are called the precondition and the postcondition of the statement, respectively.
• - Developing an axiomatic description or proof of a given program requires that every statement in the program have both a precondition and a postcondition.
- Based on formal logic (first-order predicate calculus)
• - Original purpose: formal program verification
• - Approach: define axioms or inference rules for each statement type in the language (to allow transformations of logic expressions into other logic expressions)
• - The logic expressions are called assertions
• - An assertion before a statement (a precondition) states the relationships and constraints among variables that are true at that point in execution
• - An assertion following a statement is a postcondition
Example: Show that the program segment
• y := 2
• z := x + y
• is correct with respect to the initial assertion {P}: x = 1 and the final assertion {Q}: z = 3.
• Solution: Suppose that P is true, so that x = 1 as the program begins.
• Then y is assigned 2, and z is assigned the sum of x and y, which is 1 + 2 = 3. Hence Q holds when the segment finishes, and the program segment is correct.
• x = 1, y = 2, z = 1 + 2 = 3, thus {P} y := 2; z := x + y {Q} is true.
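The same argument can be run backwards with the assignment axiom: the weakest precondition of x := E for a postcondition Q is Q with x replaced by E. The sketch below models assertions as Python predicates over a state dictionary, which is an assumption made for illustration. Checking P against the computed precondition on a finite sample of states is a sanity check only, not a proof.

```python
def wp_assign(var, expr, post):
    """Weakest precondition of `var := expr` with respect to post.

    Semantically, post must hold in the state obtained by giving var
    the value of expr, which is the effect of substituting expr for var.
    """
    def pre(state):
        new_state = dict(state)
        new_state[var] = expr(state)
        return post(new_state)
    return pre

P = lambda s: s["x"] == 1            # initial assertion
Q = lambda s: s["z"] == 3            # final assertion

# Work backwards through the segment: y := 2 ; z := x + y
wp_second = wp_assign("z", lambda s: s["x"] + s["y"], Q)   # x + y = 3
wp_first = wp_assign("y", lambda s: 2, wp_second)          # x + 2 = 3

def implies_on(states, p, q):
    """Check p => q on every sample state (a test, not a proof)."""
    return all(q(s) for s in states if p(s))

SAMPLE_STATES = [
    {"x": x, "y": y, "z": z}
    for x in range(-3, 4) for y in range(-3, 4) for z in range(-3, 4)
]

print(implies_on(SAMPLE_STATES, P, wp_first))
```

Here wp_first is equivalent to x + 2 = 3, that is x = 1, which is exactly P, so {P} y := 2; z := x + y {Q} holds.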
3- Denotational Semantics
- The most rigorous widely known method for describing the meaning of programs
• - Based on recursive function theory
• - The most abstract semantics description method
• - Originally developed by Scott and Strachey (1970)
• - The process of building a denotational specification for a language (not necessarily easy):
• Define a mathematical object for each language entity
• Define a function that maps instances of the language entities onto instances of the corresponding mathematical objects
• - The meaning of a language construct is defined only by the values of the program's variables
The difference between denotational and operational semantics: • In operational semantics, the state changes are defined by coded algorithms; in denotational semantics, they are defined by rigorous mathematical functions • The state of a program is the values of all its current variables • s = {<i1, v1>, <i2, v2>, …, <in, vn>} • - Let VARMAP be a function that, when given a variable name and a state, returns the current value of the variable • VARMAP(ij, s) = vj
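The state and VARMAP can be sketched directly in Python. Representing the state as a tuple of (name, value) pairs follows the set of pairs above; the helper m_assign, which gives a possible denotation for an assignment as a function from states to states, is an assumption added for illustration.

```python
def varmap(name, state):
    """VARMAP(i_j, s) = v_j: return the current value of a variable."""
    for ident, value in state:
        if ident == name:
            return value
    raise KeyError(f"{name} is not in the state")

def m_assign(name, expr, state):
    """Hypothetical denotation of `name = expr`: a function from a state
    to a new state. The old state is left unchanged."""
    value = expr(state)
    return tuple((ident, value if ident == name else old)
                 for ident, old in state)

s = (("x", 1), ("y", 2))
print(varmap("y", s))

# Meaning of: x = x + y
s2 = m_assign("x", lambda st: varmap("x", st) + varmap("y", st), s)
print(s2)
```

No machine is simulated here: the meaning of the assignment is simply the mathematical mapping from s to s2.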
(e.g.) Decimal Numbers
<dec_num> → '0' | '1' | '2' | '3' | '4' | '5' | '6' | '7' | '8' | '9' | <dec_num> ('0' | '1' | '2' | '3' | '4' | '5' | '6' | '7' | '8' | '9')
Mdec('0') = 0, Mdec('1') = 1, …, Mdec('9') = 9
Mdec(<dec_num> '0') = 10 * Mdec(<dec_num>)
Mdec(<dec_num> '1') = 10 * Mdec(<dec_num>) + 1
…
Mdec(<dec_num> '9') = 10 * Mdec(<dec_num>) + 9
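The mapping Mdec translates almost line for line into a recursive Python function. The function name m_dec and the use of a string to stand for the syntactic numeral are assumptions made for this sketch.

```python
DIGITS = "0123456789"

def m_dec(numeral):
    """Map a decimal numeral (syntax, a string) to the number it denotes."""
    if not numeral or any(ch not in DIGITS for ch in numeral):
        raise ValueError(f"{numeral!r} is not a <dec_num>")
    if len(numeral) == 1:
        # Mdec('0') = 0, ..., Mdec('9') = 9
        return DIGITS.index(numeral)
    # Mdec(<dec_num> 'd') = 10 * Mdec(<dec_num>) + Mdec('d')
    return 10 * m_dec(numeral[:-1]) + m_dec(numeral[-1])

print(m_dec("305"))
```

The recursion follows the left-recursive grammar rule: the last digit is split off and the remaining prefix is itself a <dec_num>.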
Summary
• BNF and context-free grammars are equivalent metalanguages
• They are well suited for describing the syntax of programming languages
• An attribute grammar is a descriptive formalism that can describe both the syntax and the static semantics of a language
• There are three primary methods of semantics description: operational, axiomatic, and denotational