
Syntax:



Presentation Transcript


  1. Programming Languages (formal languages). Syntax: describes the structure of programs. How do we describe them? How are they used (by machines and by humans)? Grammars are sometimes ambiguous; solution: use only unambiguous grammars. Semantics: describes the meaning of programs. Textbooks and manuals are confusing (always); solution: denotational semantics (for nuts only). IT 327

  2. English Grammar
     The man (subject) saw (verb) the girl (object) with a telescope.
     The man (subject) hit (verb) the ball (object).
     The purpose of grammar: to tell whether a sentence is valid (the old-fashioned view). Chomsky: to have a device that can generate all valid sentences of the target language (from a root).

  3. Noam Chomsky (1928 - ), Syntactic Structures (1957). Generative grammar: a valid sentence is generated from the root according to some fixed rules (the grammar).

  4. Directly copied from Chomsky’s book

  5. Syntactic Structures: the phrase structure of “the man hit the ball”
     S
       NP
         T: the
         N: man
       VP
         Verb: hit
         NP
           T: the
           N: ball

  6. A generative grammar in Syntactic Structures (root: S)
       S → NP + VP
       NP → T + N
       VP → Verb + NP
       T → the | a
       N → man | ball | car …..
       Verb → hit | take | took | run | ran …..
     Non-terminal symbols: S, NP, VP, T, N, Verb. Terminal symbols: the words on the right-hand sides (the, a, man, ball, hit, …).

  7. Backus-Naur Form, BNF
     Grammar 1:
       <S> ::= <NP> <VP>
       <NP> ::= <T> <N>
       <VP> ::= <V> <NP>
       <T> ::= the
       <N> ::= man | car | ball
       <V> ::= hit | took
     Grammar 2:
       <S> ::= <NP> <V> <NP>
       <NP> ::= <A> <N>
       <V> ::= loves | hates | eats
       <A> ::= a | the
       <N> ::= dog | cat | rat
     Grammars 1 and 2 combined:
       <S> ::= <NP> <V> <NP> | <NP> <VP>
       <NP> ::= <T> <N> | <A> <N>
       <VP> ::= <V> <NP>
       <V> ::= loves | hates | eats | hit | took
       <A> ::= a | the
       <T> ::= the
       <N> ::= dog | cat | rat | man | car | ball
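The idea of a grammar as a device that generates all valid sentences can be sketched in a few lines of Python. This is only an illustration: the dictionary encoding `GRAMMAR_1` and the helper `sentences` are names invented here, not part of the slides, and the approach only terminates because Grammar 1 has no recursive rules.

```python
from itertools import product

# Grammar 1 from the slide, encoded as: non-terminal -> list of alternatives.
# (This encoding is an assumption made for the sketch.)
GRAMMAR_1 = {
    "<S>": [["<NP>", "<VP>"]],
    "<NP>": [["<T>", "<N>"]],
    "<VP>": [["<V>", "<NP>"]],
    "<T>": [["the"]],
    "<N>": [["man"], ["car"], ["ball"]],
    "<V>": [["hit"], ["took"]],
}

def sentences(grammar, symbol):
    """Yield every terminal word list derivable from `symbol`.

    Terminates only for non-recursive grammars such as Grammar 1.
    """
    if symbol not in grammar:          # a terminal symbol stands for itself
        yield [symbol]
        return
    for alternative in grammar[symbol]:
        # All expansions of each part, then every combination of them.
        parts = [list(sentences(grammar, part)) for part in alternative]
        for combo in product(*parts):
            yield [word for piece in combo for word in piece]

all_sentences = [" ".join(words) for words in sentences(GRAMMAR_1, "<S>")]
print(len(all_sentences))
print(all_sentences[0])
```

Grammar 1 generates 3 noun phrases and 2 verbs, so the list holds 3 × 2 × 3 = 18 sentences, including "the man hit the ball".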

  8. Derivation: the sequence of rule applications that generates a sentence
     Grammar 1:
       <S> ::= <NP> <VP>
       <NP> ::= <T> <N>
       <VP> ::= <V> <NP>
       <T> ::= the
       <N> ::= man | car | ball
       <V> ::= hit | took
     Derivation of “the man hit the ball”:
       <S>
       => <NP> <VP>
       => <T> <N> <VP>
       => the <N> <VP>
       => the man <VP>
       => the man <V> <NP>
       => the man hit <NP>
       => the man hit <T> <N>
       => the man hit the <N>
       => the man hit the ball
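A derivation like the one on this slide can be replayed mechanically: at each step, replace the leftmost non-terminal using one chosen alternative. The sketch below assumes a dictionary encoding of Grammar 1; `GRAMMAR_1`, `leftmost_derivation`, and the `choices` list are hypothetical names introduced only for illustration.

```python
# Grammar 1, encoded as: non-terminal -> list of alternatives (assumed encoding).
GRAMMAR_1 = {
    "<S>": [["<NP>", "<VP>"]],
    "<NP>": [["<T>", "<N>"]],
    "<VP>": [["<V>", "<NP>"]],
    "<T>": [["the"]],
    "<N>": [["man"], ["car"], ["ball"]],
    "<V>": [["hit"], ["took"]],
}

def leftmost_derivation(grammar, start, choices):
    """Return the sentential forms of a leftmost derivation.

    `choices` lists, step by step, the index of the alternative used to
    expand the leftmost non-terminal.
    """
    form = [start]
    steps = [" ".join(form)]
    remaining = iter(choices)
    while True:
        # Position of the leftmost non-terminal, or None when only words remain.
        idx = next((i for i, sym in enumerate(form) if sym in grammar), None)
        if idx is None:
            return steps
        alternative = grammar[form[idx]][next(remaining)]
        form = form[:idx] + alternative + form[idx + 1:]
        steps.append(" ".join(form))

# Alternative 0 everywhere, except the last <N>, which picks "ball" (index 2).
steps = leftmost_derivation(GRAMMAR_1, "<S>", [0, 0, 0, 0, 0, 0, 0, 0, 2])
for step in steps:
    print(step)
```

The printed forms run from `<S>` to `the man hit the ball`, one line per step of the slide's derivation.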

  9. Parse (American Heritage Dict.): v. To break (a sentence) down into its component parts of speech with an explanation of the form, function, and syntactical relationship of each part.
     Valid: the dog loves the cat
     × the loves dog the cat
     × loves the dog the cat

  10. A Parse Tree
     Grammar:
       <S> ::= <NP> <V> <NP>
       <NP> ::= <A> <N>
       <V> ::= loves | hates | eats
       <A> ::= a | the
       <N> ::= dog | cat | rat
     Parse tree for “the dog loves the cat”:
       <S>
         <NP>
           <A>: the
           <N>: dog
         <V>: loves
         <NP>
           <A>: the
           <N>: cat
     “the loves dog the cat” doesn’t have a parse tree.
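To make the notion of "has a parse tree" concrete, here is a minimal recursive-descent sketch for this grammar. It is an illustration under assumptions: the names `GRAMMAR_2`, `parse`, and `parse_sentence` are invented here, and the parser commits to the first alternative that matches, which is enough for this small grammar but not a general parsing algorithm.

```python
# The slide's grammar, encoded as: non-terminal -> list of alternatives.
GRAMMAR_2 = {
    "<S>": [["<NP>", "<V>", "<NP>"]],
    "<NP>": [["<A>", "<N>"]],
    "<V>": [["loves"], ["hates"], ["eats"]],
    "<A>": [["a"], ["the"]],
    "<N>": [["dog"], ["cat"], ["rat"]],
}

def parse(symbol, words, pos=0):
    """Match `symbol` at words[pos:]; return (tree, next_pos) or None."""
    if symbol not in GRAMMAR_2:                      # terminal: must match the word
        if pos < len(words) and words[pos] == symbol:
            return symbol, pos + 1
        return None
    for alternative in GRAMMAR_2[symbol]:
        children, cur = [], pos
        for part in alternative:
            result = parse(part, words, cur)
            if result is None:
                break                                # this alternative fails
            subtree, cur = result
            children.append(subtree)
        else:
            return (symbol, children), cur           # every part matched
    return None

def parse_sentence(text):
    """Return the parse tree of `text`, or None if it has none."""
    words = text.split()
    result = parse("<S>", words)
    if result is None or result[1] != len(words):
        return None
    return result[0]

print(parse_sentence("the dog loves the cat"))
print(parse_sentence("the loves dog the cat"))
```

The first call prints a nested tuple mirroring the parse tree; the second prints `None`, matching the slide's observation that the scrambled sentence has no parse tree.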

  11. Restrictions on Grammars (diagram in terms of the sizes of the sets of restrictions), from fewest to most restrictions: Unrestricted Grammars (type-0), Context-Sensitive (type-1), Context-Free (type-2), Right/Left Linear Grammars (type-3). Question: why do context-sensitive grammars have fewer restrictions than context-free grammars?

  12. Chomsky Hierarchy (diagram in terms of the sizes of the language families), from smallest to largest family: Regular languages, those described by regular expressions (type-3); Context-free languages (type-2); Context-sensitive languages (type-1); Computable, i.e. recursively enumerable, formal languages (type-0).

  13. A grammar for Arithmetic Expressions
       <exp> ::= <exp> + <exp> | <exp> * <exp> | ( <exp> ) | a | b | c
     Example: ((a+b)*c). Is this expression valid?
       <exp>
       => ( <exp> )
       => ( <exp> * <exp> )
       => ( ( <exp> ) * <exp> )
       => ( ( <exp> + <exp> ) * <exp> )
       => ( ( a + <exp> ) * <exp> )
       => ( ( a + b ) * <exp> )
       => ( ( a + b ) * c )
     Yes.

  14. A Parse Tree for ((a+b)*c)
       <exp>
         (
         <exp>
           <exp>
             (
             <exp>
               <exp>: a
               +
               <exp>: b
             )
           *
           <exp>: c
         )

  15. Parse Trees for a+b*c? The grammar allows two.
     Tree 1 (root is +):
       <exp>
         <exp>: a
         +
         <exp>
           <exp>: b
           *
           <exp>: c
     Tree 2 (root is *):
       <exp>
         <exp>
           <exp>: a
           +
           <exp>: b
         *
         <exp>: c
     What is the meaning of a+b*c?
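The ambiguity can be checked by counting parse trees. The sketch below counts, for every span of the token list, how many trees the grammar `<exp> ::= <exp> + <exp> | <exp> * <exp> | ( <exp> ) | a | b | c` allows. The function name `count_parse_trees` and the span-based counting strategy are choices made for this illustration, not something taken from the slides.

```python
from functools import lru_cache

def count_parse_trees(tokens):
    """Count the parse trees that derive `tokens` from <exp>."""
    tokens = tuple(tokens)

    @lru_cache(maxsize=None)
    def count(i, j):
        """Number of parse trees deriving tokens[i:j] from <exp>."""
        span = tokens[i:j]
        if len(span) == 1:
            # <exp> ::= a | b | c
            return 1 if span[0] in ("a", "b", "c") else 0
        total = 0
        # <exp> ::= ( <exp> )
        if len(span) >= 3 and span[0] == "(" and span[-1] == ")":
            total += count(i + 1, j - 1)
        # <exp> ::= <exp> + <exp> | <exp> * <exp>, trying every operator
        # position k as the root of the tree.
        for k in range(i + 1, j - 1):
            if tokens[k] in ("+", "*"):
                total += count(i, k) * count(k + 1, j)
        return total

    return count(0, len(tokens))

print(count_parse_trees("a+b*c"))        # two trees: the grammar is ambiguous
print(count_parse_trees("((a+b)*c)"))    # fully parenthesized: one tree
```

Because each character is one token here, a plain string can be passed directly. A count above 1 for any input shows that the grammar is ambiguous.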

  16. Grammars in BNF (Backus-Naur Form)
     • A BNF grammar consists of four parts:
       • The finite set of tokens (terminal symbols)
       • The finite set of non-terminal symbols
       • The start symbol
       • The finite set of production rules
     Example:
       <S> ::= <NP> <VP>
       <NP> ::= <T> <N>
       <VP> ::= <V> <NP>
       <T> ::= the
       <N> ::= man | ball
       <V> ::= hit | took

  17. Constructing Grammars. Target: a grammar for <var-dec> that covers declarations such as
       float a;
       boolean a, b, c;
       int a, b;
     • Use divide and conquer to simplify the job:
       • Data types, variable names (identifiers)
       • Statements, programs
     • (One variable, one type; but it is not the grammar’s job to make sure of that.)

  18. <var-dec>, using divide and conquer
       <var-dec> ::= <type-name> <declarator-list> ;
       <type-name> ::= boolean | byte | short | int | long | char | float | double   (primitive type names)
       <declarator-list> ::= <declarator> | <declarator> , <declarator-list>
       <declarator> ::= <variable-name> | <variable-name> = <expr>
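One way to see how each production maps onto code is a small recursive-descent recognizer with one function per non-terminal. This is a sketch under loud assumptions: the slides do not define `<variable-name>` or `<expr>`, so here a variable name is taken to be any Python-style identifier and an initializer is taken to be a single token. All function names are hypothetical.

```python
TYPE_NAMES = {"boolean", "byte", "short", "int", "long", "char", "float", "double"}

def parse_var_dec(tokens):
    """Parse <var-dec> ::= <type-name> <declarator-list> ;

    Returns (type_name, [(variable_name, initializer_or_None), ...]).
    """
    pos = 0

    def expect(predicate, what):
        nonlocal pos
        if pos < len(tokens) and predicate(tokens[pos]):
            pos += 1
            return tokens[pos - 1]
        raise SyntaxError(f"expected {what} at token {pos}")

    def declarator():
        # <declarator> ::= <variable-name> | <variable-name> = <expr>
        nonlocal pos
        name = expect(lambda t: t.isidentifier() and t not in TYPE_NAMES,
                      "a variable name")
        init = None
        if pos < len(tokens) and tokens[pos] == "=":
            pos += 1
            # Assumption: <expr> is simplified to exactly one token.
            init = expect(lambda t: t not in {",", ";", "="}, "an expression")
        return name, init

    def declarator_list():
        # <declarator-list> ::= <declarator> | <declarator> , <declarator-list>
        nonlocal pos
        first = declarator()
        if pos < len(tokens) and tokens[pos] == ",":
            pos += 1
            return [first] + declarator_list()
        return [first]

    type_name = expect(lambda t: t in TYPE_NAMES, "a type name")
    declarators = declarator_list()
    expect(lambda t: t == ";", "';'")
    if pos != len(tokens):
        raise SyntaxError(f"unexpected token at {pos}")
    return type_name, declarators

print(parse_var_dec("boolean a , b , c ;".split()))
print(parse_var_dec("int a = 1 , b ;".split()))
```

Note that `int a , a ;` is accepted too: as the previous slide says, checking "one variable, one type" is not the grammar's job.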

  19. Tokens: tokens are the atoms of the program. Programs stored in files are just sequences of characters, but we want to group those characters into tokens before further analysis. • How do we divide a program (a sequence of characters in a file) into a sequence of tokens? E.g. • identifiers (const, x, fact) • keywords, i.e. reserved words (if, const) • operators (==) • constants (123.4), etc.

  20. Lexical Structure & Phrase Structure • Grammars so far have defined phrase structure: how a program is built from a sequence of tokens • We also need to use grammars to define lexical structure: how a text file is divided into tokens

  21. Separate Grammars
     • Usually there are two separate grammars:
       • one to construct a sequence of tokens from a file of characters (Lexical Structure)
       • one to construct a parse tree from a sequence of tokens (Phrase Structure)
     Example of a lexical grammar:
       <program-file> ::= <end-of-file> | <element> <program-file>
       <element> ::= <token> | <one-white-space> | <comment>
       <one-white-space> ::= <space> | <tab> | <end-of-line>
       <token> ::= <identifier> | <operator> | <constant> | …

  22. Separate Compiler Passes • Scanner → token string • Parser → parse tree • (more to do afterwards)
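The scanner pass can be sketched with Python's standard `re` module. Everything specific below is an assumption made for illustration: the token categories follow the earlier slide on tokens, but the exact regular expressions, the `//` comment syntax, the operator set, and the names `KEYWORDS`, `TOKEN_RE`, and `scan` are not from the slides.

```python
import re

KEYWORDS = {"if", "const"}  # the reserved words named on the slides

# One named alternative per kind of element; order matters (comments are
# tried before operators so that "//" is not read as two "/" operators).
TOKEN_RE = re.compile(r"""
    (?P<space>\s+)                     # white space: skipped
  | (?P<comment>//[^\n]*)              # assumed comment syntax, skipped
  | (?P<constant>\d+(?:\.\d+)?)        # e.g. 123.4
  | (?P<identifier>[A-Za-z_]\w*)       # e.g. x, fact
  | (?P<operator>==|[-+*/=<>(){};,])   # assumed operator and punctuation set
""", re.VERBOSE)

def scan(text):
    """Turn a string of characters into a list of (kind, lexeme) tokens."""
    tokens = []
    pos = 0
    while pos < len(text):
        match = TOKEN_RE.match(text, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {text[pos]!r} at {pos}")
        kind, lexeme = match.lastgroup, match.group()
        pos = match.end()
        if kind in ("space", "comment"):
            continue                      # not passed on to the parser
        if kind == "identifier" and lexeme in KEYWORDS:
            kind = "keyword"              # reserved words are not identifiers
        tokens.append((kind, lexeme))
    return tokens

for token in scan("if (x == 123.4) fact = x; // done"):
    print(token)
```

The resulting token list is what a separate parser pass would consume; white space and comments never reach it.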

  23. Historical Note #1 • Early languages sometimes did not separate lexical structure from phrase structure • Early Fortran and Algol dialects allowed spaces anywhere, even in the middle of a keyword • Other languages, like PL/I or early Fortran, allow keywords to be used as identifiers • This makes them difficult to scan and parse • It also reduces readability

  24. Historical Note #2 • Some languages have a fixed-format lexical structure: column positions are significant. Examples: • One statement per line (i.e. per card) • First few columns for statement label • Etc. • This applies to early dialects of Fortran, Cobol, and Basic • Almost all modern languages are free-format: column positions are ignored (exception: Python)

  25. Other Grammar Forms • BNF variations • EBNF variations • Syntax diagrams

  26. BNF Variations • Some use → or = instead of ::= • Some leave out the angle brackets and use a distinct typeface for tokens • Some allow single quotes around tokens, for example to distinguish ‘|’ as a token from | as a meta-symbol. (Cartoon caption: “Interesting operator!! Or not! Sir, please step away from the ASR-33.”)

  27. EBNF Variations • Additional syntax to simplify some grammar chores: • {x} to mean zero or more repetitions of x • [x] to mean x is optional (i.e. x | <empty>) • () for grouping • | anywhere to mean a choice among alternatives • Quotes around tokens, if necessary, to distinguish from meta-symbols

  28. EBNF Examples
       <if-stmt> ::= if <expr> then <stmt> [else <stmt>]
       <stmt-list> ::= {<stmt> ;}
       <thing-list> ::= { (<stmt> | <declaration>) ;}
     • Anything that extends BNF this way is called an Extended BNF: EBNF
     • There are many variations

  29. Syntax Diagrams • Syntax diagrams (“railroad diagrams”) <if-stmt> ::= if <expr> then <stmt> else <stmt> (Diagram for if-stmt: a single track through if, expr, then, stmt, else, stmt)

  30. Bypasses <if-stmt> ::= if <expr> then <stmt> [else <stmt>] (Diagram for if-stmt: a track through if, expr, then, stmt, with a bypass around else, stmt)

  31. Branching <exp> ::= <exp> + <exp> | <exp> * <exp> | ( <exp> ) | a | b | c

  32. Loops <exp> ::= <addend> {+ <addend>}
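The EBNF repetition `{...}` corresponds directly to a loop in a hand-written parser, just as it corresponds to a loop in the syntax diagram. The sketch below shows this for the rule on this slide. Since the slides do not define `<addend>`, it is assumed here to be `a | b | c`; the name `parse_exp` and the tuple tree format are also inventions for this example.

```python
def parse_exp(tokens):
    """Parse <exp> ::= <addend> {+ <addend>} and return a left-nested tree.

    Assumption: <addend> ::= a | b | c (not defined on the slide).
    """
    pos = 0

    def addend():
        nonlocal pos
        if pos < len(tokens) and tokens[pos] in ("a", "b", "c"):
            pos += 1
            return tokens[pos - 1]
        raise SyntaxError(f"expected an addend at position {pos}")

    tree = addend()
    # The {...} part of the rule: zero or more repetitions of "+ <addend>".
    while pos < len(tokens) and tokens[pos] == "+":
        pos += 1
        tree = ("+", tree, addend())
    if pos != len(tokens):
        raise SyntaxError(f"unexpected token at position {pos}")
    return tree

print(parse_exp(list("a+b+c")))
```

Unlike the ambiguous rule `<exp> ::= <exp> + <exp>`, the loop form yields exactly one tree per input; building it left-nested is a design choice of this sketch that makes `+` group to the left.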

  33. Syntax Diagrams, Pro and Con • Easier for humans to read (follow) • Harder to perceive the phrase structure (syntax tree)? • Harder for machines to read (for automatic parser-generators)

  34. Conclusion • We use grammars to define programming language syntax, both lexical structure and phrase structure • Connection between theory and practice • Two grammars, two compiler passes • Parser-generators can produce code for those two passes automatically from grammars (compiler tools)
