1 / 143

Chapter 3

Chapter 3. Describing Syntax and Semantics. Syntax and Semantics. Syntax of a programming language : the form of its expressions, statements, and program units Semantics: the meaning of the expressions, statements, and program units Syntax and semantics provide a language’s definition.

Download Presentation

Chapter 3

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Chapter 3 Describing Syntax and Semantics

  2. Syntax and Semantics • Syntax of a programming language: the form of its expressions, statements, and program units • Semantics: the meaning of the expressions, statements, and program units • Syntax and semantics provide a language’s definition

  3. An Example of Syntax and Semantic • The syntax of a Java while statement: while (<boolean_expr>) <statement> • The semantic of the above statement: when the current value of the Boolean expression is true, the embedded statement is executed; otherwise, control continues after the while construct. Then control implicitly returns to the Boolean expression to repeat the process

  4. Description Easiness • Describing syntax is easier than describing semantics • partly because a concise and universally accepted notation is available for syntax description • but none has yet been developed for semantics

  5. character string character string character string character string character string character string Definitions of Languages and Sentences • A language, whether natural (such as English) or artificial (such as Java), is a set of strings of characters from some alphabet. • The strings of a language are called sentences or statements. language

  6. Syntax Rules of a Language • The syntax rules of a language specify which strings of characters from the language’s alphabet are in the language.

  7. Definition of Literal [Word Origin & History] • A letter or symbol that stands for itself as opposed to a feature or function in a programming language: • Example: • $ can be a symbol that refers to the end of a line, but as a literal, it is a dollar sign.

  8. Definitions of Lexemes • The lexeme of a programming language include its • numeric literals • operators • and special words, among others. • (e.g., *, sum, begin) • Alexemeis the lowest level syntactic unit of a language: • for simplicity’s sake, formal descriptions of the syntax of programming languages often do not include description of them • However the description of lexemes can be given by a lexical specification which is usually separate from the syntactic description of the language.

  9. An Informal Definition of a Program • A program could be thought as a string of lexemes. (P.S.: Ignore the definition of the text book.) … lexeme lexeme lexeme … lexeme lexeme lexeme lexeme string … lexeme lexeme lexeme … lexeme lexeme lexeme : lexeme lexeme lexeme lexeme lexeme lexeme program

  10. Lexeme Group • Lexemes are partitioned into groups. • For example, in a programming language, the names of variables, methods, classes, and so forth form a group called identifier. • Each of these groups is represented by a name, or token. lexeme lexeme : lexeme lexeme token

  11. Definitions of Tokens • A tokenis a category of lexemes (e.g., identifier) • a token may have only a single lexeme • For example, the token for the arithmetic operator symbol +, which may have the name plus_op, has just one possible lexeme. • a lexeme of a token can also be called an instance of the token

  12. Example of Tokens • Identifier is a token that can have lexemes, or instances, such as sum and total.

  13. Lexemes and Tokens [csusb], [Lee] • Lexemes: a string of characters in a language that is treated as a single unit in the syntax and semantics. • For example identifiers, numbers, and operators are often lexemes • In a programming language, there are a very large number of lexemes, perhaps even an infinite number; however, there are only a small number of tokens.

  14. Examples of Lexemes and Tokens [Lee] • while (y  <=  t) y  =  y - 3 ; will be represented by the set of pairs: a token with only one lexeme

  15. Summary language program character string token lexeme character

  16. Ways to Define a Language • In general, languages can be formally defined in two distinct ways: by recognition and by generation. • P.S.: Although neither provides a definition that is practical by itself for people trying to learn or even use a programming language.

  17. Language Recognizer • a recognition device that • reads strings of characters from the alphabet of a language and • decides whether an input string was or was not in the language • is not used to enumerate all of the sentences of a language • Example: • syntax analysis part of a compiler is a recognizer for the language the compiler translates

  18. Language Generators • a device that generates the sentences of a language. • We can think of the generator as having a button that produces a sentence of the language every time it is pushed • However, the particular sentence that is produced by a generator when its button is pushed is unpredictable • One can determine if the syntax of a particular sentence is correct by comparing it to the structure of the generator • People learn a language from examples of its sentences

  19. Grammars [wiki] • In computer science and linguistics, a formal grammar, or sometimes simply grammar, is a precise description of a formal language — that is, of a set of strings. • Commonly used to describe the syntax of programming languages

  20. Formal Grammars [wiki] • A formal grammar, or sometimes simply grammar, consists of: • a finite set of terminal symbols; • a finite set of nonterminal symbols; • a finite set of production rules with a left- and a right-hand side consisting of a sequence of the above symbols • a start symbol.

  21. Grammar Example [wiki] • Nonterminals are usually represented by uppercase letters • terminals by lowercase letters • the start symbol by S. • For example, the grammar with • terminals {a,b}, • nonterminals {S,A,B}, • production rules • SABS • S ε (where ε is the empty string) • BAAB • BS b • Bb bb • Abab • Aa aa • start symbol S, • defines the language of all words of the form anbn (i.e. n copies of a followed by n copies of b).

  22. Formal Grammars and Formal Languages [wiki] • A formal grammar defines (or generates) a formal language, which is a (possibly infinite) set of sequences of terminal symbolsthat may be constructed by applying production rules to a sequence of symbols which initially contains just the start symbol.

  23. Grammar Classes [wiki] • In the mid-1950s, according to the format of the production rules, Chomsky described four classes of generative devices or grammars that define four classes of languages: • recursively enumerable • context-sensitive • context-free • regular

  24. Application of Context-free and Regular Grammars • The tokens of programming languages can be described by regular grammars. • Whole programming languages, with minor exceptions, can be described by context-free grammars.

  25. Review of Context-Free Grammars • Context-Free Grammars • Developed by Noam Chomsky in the mid-1950s • Language generators, meant to describe the syntax of natural languages • Define a class of languages called context-free languages

  26. Context-Free Grammar [McCallum] • G = <T, N, S, R> • T is a set of terminals • N is a set of non-terminals • S is a start symbol (one of the nonterminals) • R is a set of production rules of the form X → γ where X is a nonterminal and γ is a sequence of terminals and nonterminals (may be empty). • A grammar G generates a language L.

  27. Terminology - metalanguage • A metalanguage is a language that is used to describe another language.

  28. Backus-Naur Form (BNF) • Backus-Naur Form (1959) • Invented by John Backus to describe Algol 58 • Most widely known method for describing programming language syntax • BNF is equivalent to context-free grammars • BNF is a metalanguage used to describe another language

  29. Extended BNF • Extended BNF • Improves readability and writability of BNF

  30. BNF Abstraction • BNF uses abstractions for syntactic structures • For example, • A simple Javaassignment statementmight be represented by the abstraction <assign>. • (pointed brackets are often used to delimit names of abstractions) • Tokens are also abstractions.

  31. Synonym Nonterminals symbols (nonterminals) BNF abstractions Terminals Lexemes/alphabet of a language

  32. BNF Rules • A rule, also called a production, has a left-hand side (LHS) and a right-hand side (RHS), and consists of terminal and nonterminal symbols. • An abstraction is defined by a rule. • Examples of BNF rules: <ident_list> → identifier | identifier,<ident_list> <if_stmt> → if <logic_expr> then <stmt> • A grammar contains a finite nonempty set of rules.

  33. Multiple Definitions of a Nonterminal • Nonterminal symbols can have two or more distinct definitions, representing two or more possible syntactic forms in the language. • Multiple definitions can be written as a single rule, with different definitions separated by the symbol |.

  34. Example of Multiple Definitions of a Nonterminal • For example, an Adaif statement can be described with the rules: <if_stmt> if <logic_expr> then <stmt> <if_stmt> if <logic_expr> then <stmt> else <stmt> Or with the rule <if_stmt> if <logic_expr> then <stmt> | if <logic_expr> then <stmt> else <stmt>

  35. A Rule Example <assign>  <var> = <expression> • The above rule specifies that the abstraction <assign> is defined as an instance of the abstraction <var>, followed by the lexeme =, followed by an instance of the abstraction <expression>. • One example whose syntactic structure is described by the rule is: total = subtotal1 + subtotal2

  36. Expressive Capability of BNF • Although BNF is simple, it is sufficiently powerful to describe nearly all of the syntax of programming languages. • BNF can describe • Lists of similar construct • The order in which different constructs must appear • Nested structures to any depth • Imply operator precedence • Imply operator associativity

  37. Variable-length Lists and Recursive Rules • Variable-length lists in mathematics are often written using an ellipsis (…). • 1,2, … is an example. • BNF does not include the ellipsis, it uses recursive rules as an alternative. • A rule is recursive if its LHS appears in its RHS.

  38. Example of a Recursive Rule • Syntactic lists are described using recursion <ident_list>  ident | ident, <ident_list> the above rule defines <ident_list> as either a single token (identifier) or an identifier followed by a comma followed by another instance of <ident_list>.

  39. Grammar and Derivation • A grammar is a generative device for defining languages. • The sentences of a languages are created through deviations. • A derivation is a repeated application of rules, starting with a special nonterminal of the grammar called the start symbol and ending with a sentence (all terminal symbols)

  40. An Example of Start Symbols • In a grammar for a complete programming language, the start symbol represents a complete program and is usually named <program>.

  41. Example 3.1 - An Example Grammar • What follows is a grammar for a small language. P.S.:<program> is the start symbol.

  42. Language Defined by the Grammar in Example 3.1 • The language in Example 3.1 has only one statement form: assignment. • A program consists of the special word begin, followed by a list of statements separated by semicolons, followed by the special word end. • An expression is either a single variable, or two variables separated by either a + or – operator. • The only variable names in this language are A, B, and C.

  43. An Example Derivation <program> => begin <stmt_list> end => begin <stmt> ; <stmt_list> end => begin <var> = <expression>; <stmt_list> end => begin A = <expression> ; <stmt_list> end => begin A = <var> + < var > ; <stmt_list> end => begin A = B + <var> ; <stmt_list> end => begin A = B + C ; <stmtjist> end => begin A = B + C ; <stmt> end => begin A = B + C ; <var> := <expression> end => begin A=B + C;B=<expression> end => begin A = B + C ;B= <var> end => begin A = B + C ; B = C end derive

  44. Sentential Form • In the previous slide, each successive string in the sequence is derived from the previous string by replacing one of the nonterminals with one of that nonterminal’s definitions. • Every string of symbols in the derivation is a sentential form.

  45. Order of Derivation • A leftmost derivation is one in which the leftmost nonterminal in each sentential form is the one that is expanded. • The derivation continues until the sentential form contains no nonterminals. • The sentential form, consisting of only terminals, or lexemes, is the generated sentence. • Rightmost derivation • A derivation may be neither leftmost nor rightmost. • However the derivation order has no effect on the language generated by a grammar.

  46. Using the Grammar in Example 3.1 to Generate Different Sentences • By choosing alternative RHSs of rules with which to replace nonterminals in the derivation, different sentences in the language can be generated. • By exhaustively choosing all combinations of choices, the entire language can be generated. • This language, like most others, is infinite, so one cannot generate all the sentences in the language in finite time.

  47. 63 Example 3.2 64 • This grammar describes assignment statements whose right sides are arithmetic expressions with multiplication and addition operators and parentheses.

  48. The Leftmost Derivation for A=B*(A+C) <assign> => <id> = <expr> <assign> => <id> = <expr> => A = <expr> => A = <id> * <expr> => A = B * <expr> => A = B * (< expr >) => A = B * (<id> + <expr>) => A = B * (A + <expr>) => A = B * (A + <id>) => A = B * (A + C) is also a sentential form

  49. Parse Trees • One of the most attractive features of grammars is that they naturally describe the hierarchical syntactic structure of the sentences of the languages they define. • These hierarchical structures are called parse trees.

  50. 63 Figure 3.1 - a Parse Tree for A=B*(A+C) • The right side figure is a parse tree. • This parse tree shows the structure of the assignment statement derived previously.

More Related