Principles of Programming Languages

P. S. Suryateja, Asst. Professor, CSE Dept, Vishnu Institute of Technology

UNIT – 1 SYNTAX & SEMANTICS

General Problem of Describing Syntax • A language is a set of strings of characters from some alphabet. • The strings of a language are called sentences or statements. • The syntax rules specify which strings belong to the language. • The lowest-level syntactic units are known as lexemes. • Lexemes of a programming language include numeric literals, operators, special words, and so on.

General Problem of Describing Syntax (cont...) • Lexemes are partitioned into groups like identifiers, keywords, literals etc. • A token of a language is a category of its lexemes.

General Problem of Describing Syntax (cont...) • Consider the following statement: index = 2 * count + 17;

Lexemes   Tokens
index     identifier
=         equal_sign
2         int_literal
*         mult_op
count     identifier
+         plus_op
17        int_literal
;         semicolon
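The lexeme/token split above can be sketched in Python. This is a minimal illustration, not a production lexer: the token names come from the slide, while the regular expressions and the function name `tokenize` are assumptions made for the example.

```python
import re

# Token categories follow the slide's table; the patterns are assumptions.
TOKEN_SPEC = [
    ("int_literal", r"\d+"),
    ("identifier", r"[A-Za-z_]\w*"),
    ("equal_sign", r"="),
    ("mult_op", r"\*"),
    ("plus_op", r"\+"),
    ("semicolon", r";"),
    ("skip", r"\s+"),  # whitespace separates lexemes but is not a token
]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))


def tokenize(source):
    """Return a list of (lexeme, token) pairs for the given source text."""
    pairs = []
    pos = 0
    while pos < len(source):
        match = MASTER.match(source, pos)
        if match is None:
            raise ValueError(f"unexpected character {source[pos]!r} at {pos}")
        if match.lastgroup != "skip":
            pairs.append((match.group(), match.lastgroup))
        pos = match.end()
    return pairs


for lexeme, token in tokenize("index = 2 * count + 17;"):
    print(f"{lexeme:<6} {token}")
```

Each lexeme is the matched text and each token is the name of the category that matched it, which mirrors the two columns of the table.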

Language Recognizers • A language can be defined in two ways: by recognition and by generation. • For a language L that uses an alphabet Σ of characters, we need to construct a mechanism R, called a recognition device. • The recognition device indicates whether a given string of characters from Σ is in the language L or not. • The syntax analysis part of a compiler is a recognizer for the language the compiler translates.

Language Generators • A generator is a device used to generate the sentences of a language. • A generator is of limited usefulness as a language descriptor, because the particular sentence it generates is unpredictable. • An example of a language recognizer is a finite state automaton (FSA), and an example of a language generator is a context-free grammar (CFG).
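As a small illustration of a recognizer, the sketch below hand-codes a two-state FSA that accepts identifiers. The choice of language (a letter or underscore followed by letters, digits, or underscores) and the name `accepts_identifier` are assumptions for this example.

```python
def accepts_identifier(text):
    """Two-state FSA: START moves to IN_ID on a letter or underscore;
    IN_ID loops on letters, digits, and underscores. IN_ID is accepting."""
    state = "START"
    for ch in text:
        if state == "START" and (ch.isalpha() or ch == "_"):
            state = "IN_ID"
        elif state == "IN_ID" and (ch.isalnum() or ch == "_"):
            state = "IN_ID"
        else:
            return False  # no transition defined: reject
    return state == "IN_ID"


for candidate in ["count", "s1", "2count", ""]:
    print(repr(candidate), accepts_identifier(candidate))
```

The function only answers yes or no for a given string, which is exactly the role of the recognition device R.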

Formal Methods of Describing Syntax – Context-Free Grammars • Two of Chomsky's four classes of grammars, namely regular grammars and context-free grammars, are used to describe the syntax of programming languages. • Regular grammars describe tokens. • Context-free grammars describe the syntax of whole programming languages.

Formal Methods of Describing Syntax – Backus-Naur Form (BNF) • John Backus presented a paper describing ALGOL 58 which introduced a new formal notation for specifying programming language syntax. • Later, Peter Naur slightly modified the notation proposed by Backus for the description of ALGOL 60. This revised notation is called Backus-Naur Form (BNF).

BNF - Fundamentals • A meta-language is a language that is used to describe another language. BNF is a meta-language for programming languages. • BNF uses abstractions for syntactic structures. Abstraction names are enclosed in angle brackets (< >). For example, the abstraction for an assignment statement can be <assign>, defined as follows:
<assign> -> <var> = <expression>
The text on the left side of the arrow, called the left-hand side (LHS), is the abstraction being defined. The text to the right of the arrow, called the right-hand side (RHS), is the definition of the LHS and can contain a mixture of tokens, lexemes, and other abstractions.

BNF – Fundamentals (cont...) • The LHS and RHS together are called a rule or production. • An example sentence described by the <assign> rule: total = s1 + s2 • The abstractions in a BNF description, or grammar, are often called non-terminals, and the lexemes and tokens of the rules are called terminals. • A BNF description, or grammar, is a collection of rules.

BNF – Fundamentals (cont...) • A Java if statement can be described with the following rules:
<if_stmt> -> if (<logic_expr>) <stmt>
<if_stmt> -> if (<logic_expr>) <stmt> else <stmt>
The two rules above can be combined as follows:
<if_stmt> -> if (<logic_expr>) <stmt> | if (<logic_expr>) <stmt> else <stmt>

BNF – Fundamentals (cont...) • BNF has no ellipsis (...) notation for variable-length lists. Instead, it uses recursion in the rules. • A rule is said to be recursive if its LHS appears in its RHS, as shown below:
<iden_list> -> identifier | identifier, <iden_list>
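The recursive <iden_list> rule translates almost directly into a recursive function. The sketch below is an illustration only: it assumes the input has already been split into a list of token strings, and the name `is_iden_list` is hypothetical.

```python
def is_iden_list(tokens):
    """Check tokens against <iden_list> -> identifier | identifier, <iden_list>."""
    if not tokens or not tokens[0].isidentifier():
        return False
    if len(tokens) == 1:
        return True  # first alternative: a single identifier
    # Second alternative: identifier , <iden_list> (the recursive call).
    return tokens[1] == "," and is_iden_list(tokens[2:])


print(is_iden_list(["a", ",", "b", ",", "c"]))
print(is_iden_list(["a", ","]))
```

The recursion in the function mirrors the recursion in the rule: each call consumes one identifier and one comma, then asks whether the rest is again an <iden_list>.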

Grammars and Derivations • A grammar is a generative device for defining languages. • Sentences are generated by a sequence of rule applications, beginning with a special non-terminal symbol known as the start symbol. • The sequence of rule applications is called a derivation. • For a programming language, the start symbol often represents the entire program and is denoted as <program>.

Grammars and Derivations (cont...) Adopted from Concepts of Programming Languages - Sebesta

Grammars and Derivations (cont...) • A derivation of a program is as follows: Adopted from Concepts of Programming Languages - Sebesta

Grammars and Derivations (cont...) • The symbol => is read as “derives”. • Each of the strings in the derivation, including <program>, is called a sentential form. • Derivations in which the leftmost non-terminal is the one replaced at every step are known as leftmost derivations. • The sentential form consisting of only terminals, or lexemes, is the generated sentence.
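A leftmost derivation can be simulated mechanically. The sketch below assumes a small assignment grammar in the style of Sebesta's examples (the grammar figure itself is not reproduced in these notes), and the names `GRAMMAR` and `leftmost_derivation` are hypothetical.

```python
# Assumed grammar for illustration:
#   <assign> -> <id> = <expr>
#   <id>     -> A | B | C
#   <expr>   -> <id> + <expr> | <id> * <expr> | ( <expr> ) | <id>
GRAMMAR = {
    "<assign>": [["<id>", "=", "<expr>"]],
    "<id>": [["A"], ["B"], ["C"]],
    "<expr>": [
        ["<id>", "+", "<expr>"],
        ["<id>", "*", "<expr>"],
        ["(", "<expr>", ")"],
        ["<id>"],
    ],
}


def leftmost_derivation(start, choices):
    """Apply one rule per step, always to the leftmost non-terminal.

    `choices` gives, for each step, the index of the alternative to use.
    Returns the list of sentential forms, starting with the start symbol.
    """
    form = [start]
    steps = [" ".join(form)]
    for choice in choices:
        i = next(k for k, sym in enumerate(form) if sym in GRAMMAR)
        form = form[:i] + GRAMMAR[form[i]][choice] + form[i + 1:]
        steps.append(" ".join(form))
    return steps


for n, sentential_form in enumerate(leftmost_derivation("<assign>", [0, 0, 0, 1, 3, 2])):
    print(("   " if n == 0 else "=> ") + sentential_form)
```

Every printed line is a sentential form; the last one contains only terminals and is therefore the generated sentence.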

Parse Trees • Grammars naturally describe the hierarchical structure of sentences. These hierarchical structures are known as parse trees. • Every internal node in a parse tree is a non-terminal symbol. • Every leaf node is a terminal symbol. • Every sub-tree describes one instance of an abstraction in the sentence.

Parse Trees (cont...) Adopted from Concepts of Programming Languages - Sebesta

Ambiguity • A grammar is said to be ambiguous if it generates a sentence that has two or more distinct parse trees. Adopted from Concepts of Programming Languages - Sebesta

Ambiguity (cont...) Parse trees for the string A = B + C * A Adopted from Concepts of Programming Languages - Sebesta
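Ambiguity can be demonstrated by enumerating parse trees. The sketch below assumes the ambiguous rule <expr> -> <expr> + <expr> | <expr> * <expr> | <id> with <id> -> A | B | C (the grammar figure is not reproduced in these notes) and looks only at the right-hand side B + C * A of the assignment. Trees are represented as nested tuples, and the name `expr_trees` is hypothetical.

```python
IDS = {"A", "B", "C"}
OPERATORS = ("+", "*")


def expr_trees(tokens):
    """Return every parse tree of `tokens` under the assumed ambiguous rule
    <expr> -> <expr> + <expr> | <expr> * <expr> | <id>."""
    if len(tokens) == 1:
        return [tokens[0]] if tokens[0] in IDS else []
    trees = []
    for i, tok in enumerate(tokens):
        if tok in OPERATORS:
            # Try this operator as the root; both sides must be <expr>.
            for left in expr_trees(tokens[:i]):
                for right in expr_trees(tokens[i + 1:]):
                    trees.append((tok, left, right))
    return trees


for tree in expr_trees("B + C * A".split()):
    print(tree)
```

Two different trees are found for the same string, one with + at the root and one with * at the root, so the assumed grammar is ambiguous.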

Operator Precedence • Operator precedence is the mechanism that determines which of several operators in an expression is evaluated first. • Ambiguous grammars make it difficult to choose one operator over another. • As a general rule, an operator that is lower in the parse tree is evaluated first, so it has precedence over operators higher in the tree.

Operator Precedence (cont...) Parse trees for the string A = B + C * A Adopted from Concepts of Programming Languages - Sebesta In one parse tree * is lower and in another + is lower. Which one to choose?

Operator Precedence (cont...) • The correct ordering is specified by using separate non-terminals to represent the operands of operators that have different precedence. • The previous grammar can be rewritten as an unambiguous grammar as follows: Adopted from Concepts of Programming Languages - Sebesta

Operator Precedence (cont...) Adopted from Concepts of Programming Languages - Sebesta
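The effect of separate non-terminals can be seen in a small parser. The sketch below assumes the usual layered grammar <expr> -> <expr> + <term> | <term>, <term> -> <term> * <factor> | <factor>, <factor> -> ( <expr> ) | <id> (the grammar figure is not reproduced in these notes). The left-recursive rules are implemented with loops, and all function names are hypothetical.

```python
IDS = ("A", "B", "C")


def parse_expr(tokens):
    """<expr> -> <expr> + <term> | <term>; returns (tree, remaining tokens)."""
    tree, rest = parse_term(tokens)
    while rest and rest[0] == "+":
        right, rest = parse_term(rest[1:])
        tree = ("+", tree, right)
    return tree, rest


def parse_term(tokens):
    """<term> -> <term> * <factor> | <factor>."""
    tree, rest = parse_factor(tokens)
    while rest and rest[0] == "*":
        right, rest = parse_factor(rest[1:])
        tree = ("*", tree, right)
    return tree, rest


def parse_factor(tokens):
    """<factor> -> ( <expr> ) | <id>."""
    if not tokens:
        raise ValueError("unexpected end of input")
    if tokens[0] == "(":
        tree, rest = parse_expr(tokens[1:])
        if not rest or rest[0] != ")":
            raise ValueError("expected ')'")
        return tree, rest[1:]
    if tokens[0] in IDS:
        return tokens[0], tokens[1:]
    raise ValueError(f"unexpected token {tokens[0]!r}")


tree, leftover = parse_expr("B + C * A".split())
print(tree)
```

Because * can only be introduced below <term>, it always ends up lower in the tree than +, so only one tree is possible for B + C * A.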

Associativity • The rule that specifies the order of evaluation of adjacent operators with the same precedence is known as associativity. • If the LHS of a rule also appears at the beginning of its RHS, the rule is said to be left recursive. Adopted from Concepts of Programming Languages - Sebesta

Associativity (cont...) • If the LHS of a rule appears at the right end of its RHS, the rule is said to be right recursive. • Left recursion specifies left associativity and right recursion specifies right associativity.
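The difference between the two groupings shows up as soon as the operator is not associative. The sketch below builds the tree shape that a left-recursive rule and a right-recursive rule would each produce for a chain of operands, then evaluates both. The helper names and the use of subtraction and exponentiation are assumptions for the example.

```python
import operator

OPS = {"-": operator.sub, "**": operator.pow}


def group_left(operands, op):
    """Tree shape produced by a left-recursive rule: ((a op b) op c)."""
    tree = operands[0]
    for operand in operands[1:]:
        tree = (op, tree, operand)
    return tree


def group_right(operands, op):
    """Tree shape produced by a right-recursive rule: (a op (b op c))."""
    if len(operands) == 1:
        return operands[0]
    return (op, operands[0], group_right(operands[1:], op))


def evaluate(tree):
    """Evaluate a nested-tuple tree bottom-up."""
    if not isinstance(tree, tuple):
        return tree
    op, left, right = tree
    return OPS[op](evaluate(left), evaluate(right))


print(group_left([10, 4, 3], "-"), evaluate(group_left([10, 4, 3], "-")))
print(group_right([2, 3, 2], "**"), evaluate(group_right([2, 3, 2], "**")))
```

Subtraction is conventionally left associative and exponentiation right associative, which is why grammars typically use left recursion for the former and right recursion for the latter.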

Extended BNF (EBNF) • Due to shortcomings in BNF, it was extended. The extended version is known as Extended BNF or simply EBNF. • Three extensions are commonly included in the various versions of EBNF. • The first extension denotes an optional part of the RHS using square brackets. Ex: <if_stmt> -> if (<expr>) <stmt> [ else <stmt> ]

Extended BNF (EBNF) (cont...) • Second extension is the use of braces in the RHS to indicate that the enclosed part can be repeated indefinitely. Ex: <iden_list> -> <identifier> {, <identifier> }
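Braces in an EBNF rule map naturally onto a loop in a hand-written parser. The sketch below parses <iden_list> -> <identifier> {, <identifier> } over a pre-split token list; the function name `parse_iden_list` is hypothetical.

```python
def parse_iden_list(tokens):
    """Parse <iden_list> -> <identifier> {, <identifier> } and return the names."""
    if not tokens or not tokens[0].isidentifier():
        raise ValueError("expected an identifier")
    names = [tokens[0]]
    pos = 1
    # The braces become a loop that runs zero or more times.
    while pos < len(tokens) and tokens[pos] == ",":
        if pos + 1 >= len(tokens) or not tokens[pos + 1].isidentifier():
            raise ValueError("expected an identifier after ','")
        names.append(tokens[pos + 1])
        pos += 2
    if pos != len(tokens):
        raise ValueError(f"unexpected token {tokens[pos]!r}")
    return names


print(parse_iden_list(["a", ",", "b", ",", "c"]))
```

Where plain BNF needs a recursive rule for the list, the EBNF form expresses the repetition directly, and the parser needs no recursion.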

Extended BNF (EBNF) (cont...) • Third extension deals with multiple-choice options using the parentheses and OR operator, |. Ex: <term> -> <term> (* | / | % ) <factor> • The brackets, braces and parentheses are known as metasymbols.

Extended BNF (EBNF) (cont...) Adopted from Concepts of Programming Languages - Sebesta

Attribute Grammars • An attribute grammar is used to describe more of the structure of a programming language than can be described with a context-free grammar (CFG). • An attribute grammar is an extension to a CFG. • An attribute grammar allows certain language rules, such as type compatibility, to be conveniently described.

Attribute Grammars – Static Semantics • Some characteristics of programming languages, such as type compatibility, cannot be specified using BNF. • Another rule that cannot be specified using BNF is that all variables must be declared before they are used. • These are examples of static semantic rules. Static semantics can be checked at compile time. • Attribute grammars are one mechanism for describing static semantics. They were designed by Knuth.

Attribute Grammars – Basic Concepts • Attribute grammars are CFGs along with attributes, attribute computation functions and predicate functions. • Attributes are associated with grammar symbols (terminals and non-terminals) and are similar to variables. • Attribute computation functions are associated with grammar rules. They are used to specify how attribute values are computed. • Predicate functions, which state the static semantic rules, are associated with grammar rules.

Attribute Grammars – Definition • Associated with each grammar symbol X is a set of attributes A(X). • The set A(X) consists of two disjoint sets S(X) and I(X), called synthesized attributes and inherited attributes. • Synthesized attributes are used to pass semantic information up the parse tree. • Inherited attributes pass semantic information down and across the tree.

Attribute Grammars – Definition (cont...) • Associated with each grammar rule is a set of semantic functions. • For a rule X0 -> X1 ... Xn, the synthesized attributes of X0 are computed with semantic functions of the form S(X0) = f(A(X1), ..., A(Xn)). So the value of a synthesized attribute on a node depends only upon the values of the attributes of that node’s child nodes. • Inherited attributes of symbols Xj, 1 <= j <= n, are computed with a semantic function of the form I(Xj) = f(A(X0), ..., A(Xn)). So the value of an inherited attribute on a node depends on the attribute values of that node’s parent node and those of its sibling nodes.

Attribute Grammars – Definition (cont...) • A predicate function has the form of a Boolean expression on the union of the attribute set {A(X0), ..., A(Xn)} and a set of literal attribute values. • The only derivations allowed with an attribute grammar are those in which every predicate associated with every non-terminal is true.

Intrinsic Attributes • Intrinsic attributes are synthesized attributes of leaf nodes whose values are determined outside the parse tree (for example, the type of a variable, obtained from the symbol table). • Given the intrinsic attribute values on a parse tree, the semantic functions can be used to compute the remaining attribute values.

Attribute Grammar – Example 1 Adopted from Concepts of Programming Languages - Sebesta Attribute grammar that describes the rule that the name on the end of an Ada procedure must match the procedure’s name. (This rule cannot be stated using BNF). Note: Numbers represented as subscripts are used to denote the instances of an abstraction.

Attribute Grammar – Example 2 actual_type: Synthesized Attribute expected_type: Inherited Attribute Adopted from Concepts of Programming Languages - Sebesta

Attribute Grammar – Example 2 (cont...) Adopted from Concepts of Programming Languages - Sebesta
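The interplay of actual_type, expected_type and the predicate can be sketched in Python. This is an illustration under stated assumptions: it assumes the grammar <assign> -> <var> = <expr>, <expr> -> <var> + <var> | <var> with the types int and real, as in Sebesta's textbook example (the figure is not reproduced in these notes). The symbol table contents and the names `lookup` and `check_assign` are hypothetical.

```python
# Hypothetical declarations; in a compiler these come from the symbol table.
SYMBOL_TABLE = {"A": "real", "B": "int", "C": "int"}


def lookup(name):
    """Intrinsic attribute: actual_type of a <var> leaf, taken from the symbol table."""
    return SYMBOL_TABLE[name]


def check_assign(target, operands):
    """Decorate <assign> -> <var> = <expr> for one or two operand variables."""
    # Inherited: <expr>.expected_type comes from the <var> on the left side.
    expected_type = lookup(target)
    operand_types = [lookup(name) for name in operands]
    # Synthesized: <expr>.actual_type is computed from the children.
    if len(operand_types) == 2:
        actual_type = "int" if operand_types == ["int", "int"] else "real"
    else:
        actual_type = operand_types[0]
    # Predicate: the static semantic rule that must hold.
    return {
        "expected_type": expected_type,
        "actual_type": actual_type,
        "predicate": actual_type == expected_type,
    }


print(check_assign("A", ["A", "B"]))
print(check_assign("B", ["A", "C"]))
```

The first assignment satisfies the predicate; the second does not, because a real-valued expression is assigned to an int variable.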

Dynamic Semantics • Dynamic semantics deals with the meaning of expressions, statements, and program units. • No universally accepted notation or approach has been devised for dynamic semantics.

Operational Semantics • Operational semantics specifies the meaning of a program in terms of its implementation on a real or virtual machine. • The change in the state of the machine defines the meaning of a statement. • To use operational semantics for a high-level language, a virtual machine is needed. • The highest-level use, which considers only the final result of executing a complete program, is known as natural operational semantics; the lowest-level use, which examines the complete sequence of state changes, is known as structural operational semantics.
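The idea of defining meaning through state changes on a virtual machine can be sketched as follows. The instruction set ("set", "goto", "if_false_goto"), the function `run`, and the example loop are all assumptions made for this illustration; the program is a hand translation of the C-style loop `for (i = 0; i < 3; i++) total = total + i;`.

```python
def run(program, state):
    """Execute low-level instructions; the machine state is a dict of variables."""
    pc = 0  # program counter
    while pc < len(program):
        op, *args = program[pc]
        if op == "set":
            name, compute = args
            state[name] = compute(state)  # the state change is the meaning
            pc += 1
        elif op == "goto":
            (pc,) = args
        elif op == "if_false_goto":
            condition, target = args
            pc = pc + 1 if condition(state) else target
        else:
            raise ValueError(f"unknown instruction {op!r}")
    return state


PROGRAM = [
    ("set", "i", lambda s: 0),                        # 0: i = 0
    ("if_false_goto", lambda s: s["i"] < 3, 5),       # 1: if not (i < 3) goto 5
    ("set", "total", lambda s: s["total"] + s["i"]),  # 2: total = total + i
    ("set", "i", lambda s: s["i"] + 1),               # 3: i = i + 1
    ("goto", 1),                                      # 4: goto 1
]

final_state = run(PROGRAM, {"total": 0})
print(final_state)
```

The meaning of the high-level loop is given by the sequence of simpler instructions and the state they leave behind, not by any mathematical definition.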

Operational Semantics - Ex

Operational Semantics - Evaluation
• Advantages:
  - May be simple for small examples
  - Good if used informally
  - Useful for implementation
• Disadvantages:
  - Very complex for large programs
  - Lacks mathematical rigor
• Uses:
  - Vienna Definition Language (VDL), used to define PL/I
  - Compiler work

Denotational Semantics • A formal method for specifying the meaning of programs. Denotational semantics is based on recursive function theory. • Key idea is to define a function that maps a program (a syntactic object) to its meaning (a semantic object). • The domain of the mapping function is called the syntactic domain and the range is called semantic domain. • The method is named denotational because the mathematical objects denote the meaning of their corresponding entities.
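A mapping function from a syntactic domain to a semantic domain can be sketched for binary numerals, a common textbook example. Here the syntactic domain is strings of the characters 0 and 1, and the semantic domain is the non-negative integers. The function name `m_bin` and the use of a Python function to stand in for a mathematical function are assumptions for the illustration.

```python
def m_bin(numeral):
    """Map a binary numeral (syntactic object) to the integer it denotes.

    M('0') = 0
    M('1') = 1
    M(<bin_num> '0') = 2 * M(<bin_num>)
    M(<bin_num> '1') = 2 * M(<bin_num>) + 1
    """
    if numeral == "0":
        return 0
    if numeral == "1":
        return 1
    if not numeral or numeral[-1] not in "01":
        raise ValueError(f"not a binary numeral: {numeral!r}")
    # The meaning of the whole is built from the meanings of its parts.
    return 2 * m_bin(numeral[:-1]) + m_bin(numeral[-1])


print(m_bin("110"))
```

The definition follows the structure of the syntax: one equation per grammar alternative, with no reference to a machine or to execution steps.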

Denotational vs. Operational
• Denotational semantics is similar to operational semantics except:
  - There is no virtual machine
  - The language is mathematics (lambda calculus)
• Difference between denotational and operational semantics:
  - In operational semantics, the state changes are defined by coded algorithms for a virtual machine
  - In denotational semantics, they are defined by rigorous mathematical functions
