Understanding Programming Language Syntax and Semantics

Learn the syntax and semantics of programming languages, the rules that define languages, lexemes and tokens, and ways to define a language formally through recognition and generation. Explore formal grammars and language generators.

butner

Presentation Transcript


  1. Chapter 3 Describing Syntax and Semantics

  2. Syntax and Semantics • Syntax of a programming language: the form of its expressions, statements, and program units • Semantics: the meaning of the expressions, statements, and program units • Syntax and semantics provide a language’s definition

  3. An Example of Syntax and Semantics • The syntax of a Java while statement: while (<boolean_expr>) <statement> • The semantics of the above statement: when the current value of the Boolean expression is true, the embedded statement is executed, and control then implicitly returns to the Boolean expression to repeat the process; otherwise, control continues after the while construct.

  4. Description Easiness • Describing syntax is easier than describing semantics • partly because a concise and universally accepted notation is available for syntax description • but none has yet been developed for semantics

  5. Definitions of Languages and Sentences • A language, whether natural (such as English) or artificial (such as Java), is a set of strings of characters from some alphabet. • The strings of a language are called sentences or statements.

  6. Syntax Rules of a Language • The syntax rules of a language specify which strings of characters from the language’s alphabet are in the language.

  7. Definitions of Lexemes • The lexemes of a programming language include its • numeric literals • operators • and special words, among others • (e.g., *, sum, begin) • A lexeme is the lowest-level syntactic unit of a language: • for simplicity’s sake, formal descriptions of the syntax of programming languages often do not include descriptions of lexemes • However, lexemes can be described by a lexical specification, which is usually separate from the syntactic description of the language.

  8. An Informal Definition of a Program • A program can be thought of as a string of lexemes rather than of characters.

  9. Lexeme Group • Lexemes are partitioned into groups. • For example, in a programming language, the names of variables, methods, classes, and so forth form a group called identifier. • Each of these groups is represented by a name, or token.

  10. Definitions of Tokens • A token is a category of lexemes (e.g., identifier) • A token may have only a single lexeme • For example, the token for the arithmetic operator symbol +, which may have the name plus_op, has just one possible lexeme. • A lexeme of a token can also be called an instance of the token

  11. Example of Tokens • Identifier is a token that can have lexemes, or instances, such as sum and total.

  12. Lexemes and Tokens [csusb], [Lee] • A lexeme is a string of characters in a language that is treated as a single unit in the syntax and semantics. • For example, identifiers, numbers, and operators are often lexemes • In a programming language, there is a very large number of lexemes, perhaps even an infinite number; however, there are only a small number of tokens.

  13. Examples of Lexemes and Tokens [Lee] • The statement while (y >= t) y = y - 3 ; is represented by a set of (lexeme, token) pairs. • [Table of pairs from the original slide; one entry is labeled as a token with only one lexeme.]
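
The pairing of lexemes with tokens can be sketched with a small regular-expression scanner. This is only an illustration: the token names (while_kw, ge_op, and so on) and the function tokenize are hypothetical choices made here, not names taken from the slide.

```python
import re

# Hypothetical token names, chosen for this sketch only.
TOKEN_SPEC = [
    ("while_kw",    r"while\b"),
    ("identifier",  r"[A-Za-z_][A-Za-z0-9_]*"),
    ("int_literal", r"\d+"),
    ("ge_op",       r">="),
    ("assign_op",   r"="),
    ("minus_op",    r"-"),
    ("left_paren",  r"\("),
    ("right_paren", r"\)"),
    ("semicolon",   r";"),
    ("skip",        r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))

def tokenize(source):
    """Return a list of (lexeme, token) pairs for the source string."""
    pairs = []
    pos = 0
    while pos < len(source):
        match = MASTER.match(source, pos)
        if match is None:
            raise ValueError(f"unexpected character {source[pos]!r} at position {pos}")
        if match.lastgroup != "skip":      # whitespace separates lexemes but is not one
            pairs.append((match.group(), match.lastgroup))
        pos = match.end()
    return pairs

for lexeme, token in tokenize("while (y >= t) y = y - 3 ;"):
    print(f"{lexeme:6} {token}")
```

In this sketch the lexeme y appears three times but always maps to the single token identifier, while a token such as semicolon has exactly one possible lexeme.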

  14. Ways to Define a Language • In general, languages can be formally defined in two distinct ways: by recognition and by generation. • Note: neither approach provides a definition that is practical by itself for people trying to learn or use a programming language.

  15. Language Recognizer • A recognition device • reads strings of characters from the alphabet of a language and • decides whether each input string is or is not in the language • Example: the syntax analysis part of a compiler is a recognizer for the language the compiler translates • Recognizers are not used to enumerate all of the sentences of a language

  16. Language Generators • a device that generates the sentences of a language. • We can think of the generator as having a button that produces a sentence of the language every time it is pushed • However, the particular sentence that is produced by a generator when its button is pushed is unpredictable • One can determine if the syntax of a particular sentence is correct by comparing it to the structure of the generator • People learn a language from examples of its sentences

  17. Grammars [wiki] • In computer science and linguistics, a formal grammar, or sometimes simply grammar, is a precise description of a formal language — that is, of a set of strings. • Commonly used to describe the syntax of programming languages

  18. Formal Grammars [wiki] • A formal grammar, or sometimes simply grammar, consists of: • a finite set of terminal symbols; • a finite set of nonterminal symbols; • a finite set of production rules with a left- and a right-hand side consisting of a sequence of these symbols • a start symbol.

  19. Grammar Example [wiki] • Nonterminals are usually represented by uppercase letters • terminals by lowercase letters • the start symbol by S. • For example, the grammar with • terminals {a, b}, • nonterminals {S, A, B}, • production rules • S → ABS • S → ε (where ε is the empty string) • BA → AB • BS → b • Bb → bb • Ab → ab • Aa → aa • start symbol S, • defines the language of all words of the form a^n b^n (i.e., n copies of a followed by n copies of b).
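
A derivation in this grammar can be traced by treating each production as a string rewrite. The sketch below is a minimal illustration, assuming the production rules read S → ABS, S → ε, BA → AB, BS → b, Bb → bb, Ab → ab, Aa → aa; the names RULES, apply_rule, and derive are hypothetical helpers introduced here.

```python
# Production rules written as (left-hand side, right-hand side).
RULES = {
    "S->ABS": ("S", "ABS"),
    "S->e":   ("S", ""),       # right-hand side is the empty string
    "BA->AB": ("BA", "AB"),
    "BS->b":  ("BS", "b"),
    "Bb->bb": ("Bb", "bb"),
    "Ab->ab": ("Ab", "ab"),
    "Aa->aa": ("Aa", "aa"),
}

def apply_rule(form, rule_name):
    """Rewrite the leftmost occurrence of the rule's left-hand side."""
    lhs, rhs = RULES[rule_name]
    if lhs not in form:
        raise ValueError(f"{rule_name} does not apply to {form!r}")
    return form.replace(lhs, rhs, 1)

def derive(rule_names, start="S"):
    """Apply the named rules in order and return every intermediate string."""
    forms = [start]
    for name in rule_names:
        forms.append(apply_rule(forms[-1], name))
    return forms

steps = derive(["S->ABS", "S->ABS", "BA->AB", "BS->b", "Bb->bb", "Ab->ab", "Aa->aa"])
print(" => ".join(steps))
# S => ABS => ABABS => AABBS => AABb => AAbb => Aabb => aabb
```

The final string aabb is the n = 2 case of a^n b^n. Note that several left-hand sides contain more than one symbol, so this particular grammar is not context-free even though the language it defines is.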

  20. Formal Grammars and Formal Languages [wiki] • A formal grammar defines (or generates) a formal language, which is a (possibly infinite) set of sequences of symbols that may be constructed by applying production rules to a sequence of symbols which initially contains just the start symbol.

  21. Grammar Classes [wiki] • In the mid-1950s, according to the format of the production rules, Chomsky described four classes of generative devices or grammars that define four classes of languages: • recursively enumerable • context-sensitive • context-free • regular

  22. Application of Context-free and Regular Grammars • The tokens of programming languages can be described by regular grammars. • Whole programming languages, with minor exceptions, can be described by context-free grammars.

  23. Review of Context-Free Grammars • Context-Free Grammars • Developed by Noam Chomsky in the mid-1950s • Language generators, meant to describe the syntax of natural languages • Define a class of languages called context-free languages

  24. Terminology - metalanguage • A metalanguage is a language that is used to describe another language.

  25. Backus-Naur Form (BNF) • Backus-Naur Form (1959) • Invented by John Backus to describe Algol 58 • Most widely known method for describing programming language syntax • BNF is equivalent to context-free grammars • BNF is a metalanguage used to describe another language

  26. Extended BNF • Extended BNF • Improves readability and writability of BNF

  27. BNF Abstraction • BNF uses abstractions for syntactic structures • For example, a simple Java assignment statement might be represented by the abstraction <assign>. • (pointed brackets are often used to delimit names of abstractions) • In fact, tokens are also abstractions.

  28. Synonym • Nonterminal symbols (nonterminals): BNF abstractions • Terminals: lexemes and tokens

  29. BNF Rules • A rule, also called a production, has a left-hand side (LHS) and a right-hand side (RHS), and consists of terminal and nonterminal symbols. • An abstraction is defined by a rule. • Examples of BNF rules: • <ident_list> → identifier | identifier, <ident_list> • <if_stmt> → if <logic_expr> then <stmt> • A grammar is a finite nonempty set of rules.

  30. Multiple Definitions of a Nonterminal • Nonterminal symbols can have two or more distinct definitions, representing two or more possible syntactic forms in the language. • Multiple definitions can be written as a single rule, with different definitions separated by the symbol |.

  31. Example of Multiple Definitions of a Nonterminal • For example, an Ada if statement can be described with the rules: • <if_stmt> → if <logic_expr> then <stmt> • <if_stmt> → if <logic_expr> then <stmt> else <stmt> • Or with the single rule: • <if_stmt> → if <logic_expr> then <stmt> | if <logic_expr> then <stmt> else <stmt>

  32. A Rule Example • <assign> → <var> = <expression> • The above rule specifies that the abstraction <assign> is defined as an instance of the abstraction <var>, followed by the lexeme =, followed by an instance of the abstraction <expression>. • One example whose syntactic structure is described by the rule is: total = subtotal1 + subtotal2

  33. Expressive Capability of BNF • Although BNF is simple, it is sufficiently powerful to describe nearly all of the syntax of programming languages. • BNF can describe • lists of similar constructs • the order in which different constructs must appear • nested structures to any depth • BNF can even imply • operator precedence • operator associativity

  34. Variable-length Lists and Recursive Rules • Variable-length lists in mathematics are often written using an ellipsis (…). • 1, 2, … is an example. • BNF does not include the ellipsis; it uses recursive rules as an alternative. • A rule is recursive if its LHS appears in its RHS.

  35. Example of a Recursive Rule • Syntactic lists are described using recursion: <ident_list> → identifier | identifier, <ident_list> • The above rule defines <ident_list> as either a single token (identifier) or an identifier followed by a comma followed by another instance of <ident_list>.
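
The recursion in the rule maps directly onto a recursive function. The following is a minimal sketch of a recognizer for <ident_list>, assuming the input has already been scanned into a list of token names; the function name is_ident_list and the token name comma are hypothetical.

```python
def is_ident_list(tokens):
    """Recognize <ident_list> -> identifier | identifier , <ident_list>.

    `tokens` is a list of token names, e.g. ["identifier", "comma", "identifier"].
    """
    if not tokens or tokens[0] != "identifier":
        return False
    if len(tokens) == 1:
        return True                        # first alternative: a single identifier
    # Second alternative: identifier, then a comma, then another <ident_list>.
    return tokens[1] == "comma" and is_ident_list(tokens[2:])

print(is_ident_list(["identifier", "comma", "identifier"]))  # True
print(is_ident_list(["identifier", "comma"]))                # False
```

Each recursive call consumes one identifier and one comma, so a list of any length is accepted without the grammar needing an ellipsis.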

  36. Grammar and Derivation • A grammar is a generative device for defining languages. • The sentences of a language are generated through derivations. • A derivation is a repeated application of rules, starting with a special nonterminal of the grammar called the start symbol and ending with a sentence (all terminal symbols)

  37. An Example of Start Symbols • In a grammar for a complete programming language, the start symbol represents a complete program and is usually named <program>.

  38. Example 3.1 - An Example Grammar • What follows is a grammar for a small language. • Note: <program> is the start symbol.

  39. Language Defined by the Grammar in Example 3.1 • The language in Example 3.1 has only one statement form: assignment. • A program consists of the special word begin, followed by a list of statements separated by semicolons, followed by the special word end. • An expression is either a single variable, or two variables separated by either a + or – operator. • The only variable names in this language are A, B, and C.
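
The language described above can be explored with a small sentence generator. The grammar table below is reconstructed from the prose description on this slide, so it is an assumption rather than a copy of the original figure; the names GRAMMAR and generate are hypothetical, and a plain hyphen stands in for the minus operator.

```python
import random

# Grammar reconstructed from the slide's description (an assumption: the
# original grammar figure is not part of this transcript).
GRAMMAR = {
    "<program>":    [["begin", "<stmt_list>", "end"]],
    "<stmt_list>":  [["<stmt>"], ["<stmt>", ";", "<stmt_list>"]],
    "<stmt>":       [["<var>", "=", "<expression>"]],
    "<var>":        [["A"], ["B"], ["C"]],
    "<expression>": [["<var>", "+", "<var>"], ["<var>", "-", "<var>"], ["<var>"]],
}

def generate(rng, start="<program>"):
    """Return one sentence via a leftmost derivation with random rule choices."""
    form = [start]
    while True:
        # Locate the leftmost nonterminal in the current sentential form.
        index = next((i for i, sym in enumerate(form) if sym in GRAMMAR), None)
        if index is None:
            return " ".join(form)          # only terminals remain: a sentence
        form[index:index + 1] = rng.choice(GRAMMAR[form[index]])

print(generate(random.Random()))
```

Each call behaves like pushing the button on a language generator: the sentence produced is unpredictable, but it always begins with begin, ends with end, and contains only assignments over A, B, and C.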

  40. An Example Derivation
     <program> => begin <stmt_list> end
               => begin <stmt> ; <stmt_list> end
               => begin <var> = <expression> ; <stmt_list> end
               => begin A = <expression> ; <stmt_list> end
               => begin A = <var> + <var> ; <stmt_list> end
               => begin A = B + <var> ; <stmt_list> end
               => begin A = B + C ; <stmt_list> end
               => begin A = B + C ; <stmt> end
               => begin A = B + C ; <var> = <expression> end
               => begin A = B + C ; B = <expression> end
               => begin A = B + C ; B = <var> end
               => begin A = B + C ; B = C end

  41. Sentential Form • In the previous slide, each successive string in the sequence is derived from the previous string by replacing one of the nonterminals with one of that nonterminal’s definitions. • Every string of symbols in the derivation is a sentential form.

  42. Order of Derivation • A leftmost derivation is one in which the leftmost nonterminal in each sentential form is the one that is expanded. • The derivation continues until the sentential form contains no nonterminals. • The sentential form consisting of only terminals, or lexemes, is the generated sentence. • A rightmost derivation is one in which the rightmost nonterminal in each sentential form is the one that is expanded. • A derivation may be neither leftmost nor rightmost. • However, the derivation order has no effect on the language generated by a grammar.
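
The claim that derivation order does not change the result can be checked on a small case. The sketch below expands the same set of rule applications, stored as a nested tuple, once leftmost-first and once rightmost-first. The tuple representation and the names derivation, render, and tree are hypothetical; the rules used are of the form <stmt> → <var> = <expression> and <expression> → <var> + <var>.

```python
def render(form):
    """Show a sentential form: nonterminal labels for subtrees, terminals as is."""
    return " ".join(node[0] if isinstance(node, tuple) else node for node in form)

def derivation(tree, leftmost=True):
    """List the sentential forms produced by expanding nonterminals in order.

    A tuple is (nonterminal, child, child, ...); a plain string is a terminal.
    """
    form = [tree]
    steps = [render(form)]
    while any(isinstance(node, tuple) for node in form):
        indices = [i for i, node in enumerate(form) if isinstance(node, tuple)]
        i = indices[0] if leftmost else indices[-1]
        form[i:i + 1] = form[i][1:]        # replace the nonterminal by its RHS
        steps.append(render(form))
    return steps

tree = ("<stmt>", ("<var>", "A"), "=",
        ("<expression>", ("<var>", "B"), "+", ("<var>", "C")))

print("\n=> ".join(derivation(tree, leftmost=True)))
print("\n=> ".join(derivation(tree, leftmost=False)))
```

The two runs pass through different intermediate sentential forms (A = <expression> versus <var> = <var> + <var>, for instance) but both end with the sentence A = B + C.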

  43. Using the Grammar in Example 3.1 to Generate Different Sentences • By choosing alternative RHSs of rules with which to replace nonterminals in the derivation, different sentences in the language can be generated. • By exhaustively choosing all combinations of choices, the entire language can be generated. • This language, like most others, is infinite, so one cannot generate all the sentences in the language in finite time.

  44. Example 3.2 • This grammar describes assignment statements whose right sides are arithmetic expressions with multiplication and addition operators and parentheses.

  45. The Leftmost Derivation for A = B * (A + C)
     <assign> => <id> = <expr>
              => A = <expr>
              => A = <id> * <expr>
              => A = B * <expr>
              => A = B * ( <expr> )
              => A = B * ( <id> + <expr> )
              => A = B * ( A + <expr> )
              => A = B * ( A + <id> )
              => A = B * ( A + C )

  46. Parse Trees • One of the most attractive features of grammars is that they naturally describe the hierarchical syntactic structure of the sentences of the languages they define. • These hierarchical structures are called parse trees.

  47. Figure 3.1 - a Parse Tree for A = B * (A + C) • The figure on the right side of the slide is a parse tree. • This parse tree shows the structure of the assignment statement derived previously.

  48. A Derivation and Its Corresponding Parse Tree
     <assign> => <id> = <expr>
              => A = <expr>
              => A = <id> * <expr>
              => A = B * <expr>
              => A = B * ( <expr> )
              => A = B * ( <id> + <expr> )
              => A = B * ( A + <expr> )
              => A = B * ( A + <id> )
              => A = B * ( A + C )

  49. Mapping between a Grammar and Its Corresponding Parse Tree • Every internal node of a parse tree is labeled with a nonterminal symbol. • Every leaf is labeled with a terminal symbol. • Every subtree of a parse tree describes one instance of an abstraction in the statement.
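
The mapping between a grammar and its parse trees can be illustrated by building the tree for A = B * (A + C) in code. The rules used below (<assign> → <id> = <expr>, <id> → A | B | C, <expr> → <id> + <expr> | <id> * <expr> | ( <expr> ) | <id>) are inferred from the derivation steps in the preceding slides and are therefore an assumption; the function names and the nested-tuple tree format are hypothetical.

```python
def _id(tokens):
    if tokens and tokens[0] in ("A", "B", "C"):
        return ("<id>", tokens[0]), tokens[1:]
    raise SyntaxError(f"expected an identifier at {tokens}")

def _expr(tokens):
    if tokens and tokens[0] == "(":                    # <expr> -> ( <expr> )
        inner, rest = _expr(tokens[1:])
        if not rest or rest[0] != ")":
            raise SyntaxError("expected ')'")
        return ("<expr>", "(", inner, ")"), rest[1:]
    left, rest = _id(tokens)
    if rest and rest[0] in ("+", "*"):                 # <expr> -> <id> op <expr>
        right, remaining = _expr(rest[1:])
        return ("<expr>", left, rest[0], right), remaining
    return ("<expr>", left), rest                      # <expr> -> <id>

def parse_assign(tokens):
    """Parse <assign> -> <id> = <expr>; return the tree as nested tuples."""
    target, rest = _id(tokens)
    if not rest or rest[0] != "=":
        raise SyntaxError("expected '='")
    value, rest = _expr(rest[1:])
    if rest:
        raise SyntaxError(f"unexpected tokens: {rest}")
    return ("<assign>", target, "=", value)

def leaves(tree):
    """Read the terminal symbols off the tree from left to right."""
    if isinstance(tree, str):
        return [tree]
    return [leaf for child in tree[1:] for leaf in leaves(child)]

tree = parse_assign("A = B * ( A + C )".split())
print(tree)
print(" ".join(leaves(tree)))   # A = B * ( A + C )
```

In this representation every tuple is an internal node whose first element is a nonterminal, every plain string is a leaf holding a terminal, and reading the leaves from left to right recovers the original sentence.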

  50. Ambiguity in Grammars • A grammar is ambiguous if and only if it generates a sentential form that has two or more distinct parse trees.
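
One way to see this definition in action is to count the parse trees of a sentence. The sketch below uses a deliberately ambiguous grammar, <expr> → <expr> + <expr> | <expr> * <expr> | <id> with <id> → A | B | C, which is a hypothetical example chosen here and not a grammar taken from the slides above; count_parse_trees is likewise an illustrative name.

```python
from functools import lru_cache

def count_parse_trees(tokens):
    """Count parse trees of `tokens` under the hypothetical ambiguous grammar
    <expr> -> <expr> + <expr> | <expr> * <expr> | <id>,  <id> -> A | B | C."""
    tokens = tuple(tokens)

    @lru_cache(maxsize=None)
    def count(lo, hi):
        # Number of distinct ways tokens[lo:hi] can be derived from <expr>.
        if hi - lo == 1:
            return 1 if tokens[lo] in ("A", "B", "C") else 0
        total = 0
        for mid in range(lo + 1, hi - 1):
            if tokens[mid] in ("+", "*"):
                # tokens[mid] is the operator at the root of the tree.
                total += count(lo, mid) * count(mid + 1, hi)
        return total

    return count(0, len(tokens))

print(count_parse_trees("A + B".split()))      # 1
print(count_parse_trees("A + B * C".split()))  # 2
```

The sentence A + B * C has two distinct parse trees under this grammar, one with + at the root and one with * at the root, so the grammar is ambiguous.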
