Programming Languages (formal languages)
Syntax: describes the structure of programs
-- How to describe it?
-- How to use it? (machine and human)
Grammars -- ambiguous (sometimes)
solution: use only unambiguous grammars
Semantics: describes the meaning of programs
Textbooks, manuals -- confusing (always)
solution: denotational semantics (for nuts only)
IT 327

English Grammar
The man saw the girl with a telescope. (subject, verb, object)
The man hit the ball. (subject, verb, object)
The purpose of grammar: to tell whether a sentence is valid (the old-fashioned view).
Chomsky: to have a device that generates all valid sentences in the target language (from a root).

Noam Chomsky (1928 - )
Syntactic Structures (1957)
Generative Grammar: a valid sentence is generated from a root according to some fixed rules (the grammar).
http://www.canada.com/nationalpost/news/issuesideas/story.html?id=1385b76d-6c34-4c22-942a-18b71f2c4a44

A generative grammar in Syntactic Structures
S → NP + VP    (S is the root)
NP → T + N
VP → Verb + NP
T → the | a
N → man | ball | car …..
Verb → hit | take | took | run | ran …..
Non-terminal symbols: S, NP, VP, T, N, Verb
Terminal symbols: the, a, man, ball, car, hit, take, took, run, ran, …..

Syntactic Structures: the structure of "the man hit the ball"
S
  NP
    T: the
    N: man
  VP
    Verb: hit
    NP
      T: the
      N: ball

Backus-Naur Form, BNF
Grammar 1:
<S> ::= <NP> <VP>
<NP> ::= <T> <N>
<VP> ::= <V> <NP>
<T> ::= the
<N> ::= man | ball
<V> ::= hit | took
Grammar 2:
<S> ::= <NP> <V> <NP>
<NP> ::= <A> <N>
<V> ::= loves | hates | eats
<A> ::= a | the
<N> ::= dog | cat | rat
Grammar 1 and Grammar 2 combined:
<S> ::= <NP> <V> <NP> | <NP> <VP>
<NP> ::= <T> <N> | <A> <N>
<VP> ::= <V> <NP>
<V> ::= loves | hates | eats | hit | took
<A> ::= a | the
<T> ::= the
<N> ::= dog | cat | rat | man | ball

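Grammar 1 is small enough that "generating all valid sentences from a root" can be carried out literally. The following is only a sketch: the dictionary encoding and the helper name `generate` are assumptions made for illustration, not part of the slides, and the approach only terminates for grammars without recursion.

```python
from itertools import product

# Grammar 1 from the slide: non-terminal -> list of alternatives.
# (This dict encoding is an illustrative assumption.)
GRAMMAR_1 = {
    "<S>": [["<NP>", "<VP>"]],
    "<NP>": [["<T>", "<N>"]],
    "<VP>": [["<V>", "<NP>"]],
    "<T>": [["the"]],
    "<N>": [["man"], ["ball"]],
    "<V>": [["hit"], ["took"]],
}

def generate(symbol, grammar):
    """Return every token sequence derivable from symbol.

    Only safe for non-recursive grammars such as Grammar 1.
    """
    if symbol not in grammar:              # terminal symbol
        return [[symbol]]
    results = []
    for alternative in grammar[symbol]:
        # Expand each symbol of the alternative, then combine the pieces.
        parts = [generate(s, grammar) for s in alternative]
        for combo in product(*parts):
            results.append([tok for part in combo for tok in part])
    return results

sentences = [" ".join(s) for s in generate("<S>", GRAMMAR_1)]
for sentence in sentences:
    print(sentence)
```

Every printed line is a valid sentence of Grammar 1, including ones that make little sense in English, such as "the ball took the man".
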
Derivation: the sequence of steps that generates a sentence
Grammar 1:
<S> ::= <NP> <VP>
<NP> ::= <T> <N>
<VP> ::= <V> <NP>
<T> ::= the
<N> ::= man | ball
<V> ::= hit | took
Deriving "the man hit the ball":
<S>
⇒ <NP> <VP>
⇒ <T> <N> <VP>
⇒ the <N> <VP>
⇒ the man <VP>
⇒ the man <V> <NP>
⇒ the man hit <NP>
⇒ the man hit <T> <N>
⇒ the man hit the <N>
⇒ the man hit the ball

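The derivation on this slide always rewrites the leftmost non-terminal. A minimal sketch of that process follows; the function name `leftmost_derivation` and the dict encoding are hypothetical, and the rule-picking shortcut relies on a property of Grammar 1 (any non-terminal with several alternatives rewrites directly to one word), so it is not a general parser.

```python
GRAMMAR_1 = {
    "<S>": [["<NP>", "<VP>"]],
    "<NP>": [["<T>", "<N>"]],
    "<VP>": [["<V>", "<NP>"]],
    "<T>": [["the"]],
    "<N>": [["man"], ["ball"]],
    "<V>": [["hit"], ["took"]],
}

def leftmost_derivation(target, grammar, start="<S>"):
    """List the sentential forms of a leftmost derivation of target.

    Assumption: a non-terminal with several alternatives rewrites to a
    single terminal, so the next word of the target selects the rule.
    """
    words = target.split()
    form = [start]
    steps = [form[:]]
    while any(sym in grammar for sym in form):
        # Index of the leftmost non-terminal in the current form.
        i = next(k for k, sym in enumerate(form) if sym in grammar)
        alternatives = grammar[form[i]]
        if len(alternatives) == 1:
            chosen = alternatives[0]
        else:
            # Everything left of position i is already terminal,
            # so position i lines up with words[i].
            chosen = next(alt for alt in alternatives if alt == [words[i]])
        form = form[:i] + chosen + form[i + 1:]
        steps.append(form[:])
    return steps

steps = leftmost_derivation("the man hit the ball", GRAMMAR_1)
for step in steps:
    print(" ".join(step))
```

The printed forms follow the same order as the derivation listed on the slide.
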
Parse (American Heritage Dict.): v. To break (a sentence) down into its component parts of speech with an explanation of the form, function, and syntactical relationship of each part.
the dog loves the cat
× the loves dog the cat
× loves the dog the cat

A Parse Tree
Grammar:
<S> ::= <NP> <V> <NP>
<NP> ::= <A> <N>
<V> ::= loves | hates | eats
<A> ::= a | the
<N> ::= dog | cat | rat
Parse tree for “the dog loves the cat”:
<S>
  <NP>
    <A> the
    <N> dog
  <V> loves
  <NP>
    <A> the
    <N> cat
“the loves dog the cat” doesn’t have a parse tree

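Whether a sentence has a parse tree under this grammar can be checked mechanically. The sketch below is a simple recursive-descent matcher; the names `parse` and `parse_tree` and the tuple representation of trees are assumptions for illustration. It takes the first alternative that matches and never backtracks, which is enough for this grammar but not for grammars in general (left-recursive rules, for instance, would loop forever).

```python
GRAMMAR_2 = {
    "<S>": [["<NP>", "<V>", "<NP>"]],
    "<NP>": [["<A>", "<N>"]],
    "<V>": [["loves"], ["hates"], ["eats"]],
    "<A>": [["a"], ["the"]],
    "<N>": [["dog"], ["cat"], ["rat"]],
}

def parse(symbol, tokens, pos, grammar):
    """Try to match symbol starting at tokens[pos].

    Returns (tree, next_pos) on success, or None on failure.
    """
    if symbol not in grammar:                    # terminal: must equal next token
        if pos < len(tokens) and tokens[pos] == symbol:
            return symbol, pos + 1
        return None
    for alternative in grammar[symbol]:
        children, cur = [], pos
        for sym in alternative:
            result = parse(sym, tokens, cur, grammar)
            if result is None:
                break
            child, cur = result
            children.append(child)
        else:                                    # every symbol in the alternative matched
            return (symbol, children), cur
    return None

def parse_tree(sentence, grammar, start="<S>"):
    """Return a parse tree for the whole sentence, or None if there is none."""
    tokens = sentence.split()
    result = parse(start, tokens, 0, grammar)
    if result is not None and result[1] == len(tokens):
        return result[0]
    return None

print(parse_tree("the dog loves the cat", GRAMMAR_2))
print(parse_tree("the loves dog the cat", GRAMMAR_2))
```

The first call yields a nested tuple mirroring the tree on the slide; the second yields None because no rule lets a verb follow an article.
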
A grammar for Arithmetic Expressions
<exp> ::= <exp> + <exp> | <exp> * <exp> | ( <exp> ) | a | b | c
Example: ((a+b)*c)
Is this expression valid?
<exp>
⇒ ( <exp> )
⇒ ( <exp> * <exp> )
⇒ (( <exp> ) * <exp> )
⇒ (( <exp> + <exp> ) * <exp> )
⇒ ((a+ <exp> ) * <exp> )
⇒ ((a+b) * <exp> )
⇒ ((a+b)*c)
Yes

A Parse Tree for ((a+b)*c)
<exp>
  (
  <exp>
    <exp>
      (
      <exp>
        <exp>
          a
        +
        <exp>
          b
      )
    *
    <exp>
      c
  )

Parse Trees for a+b*c ?
Tree 1:
<exp>
  <exp>
    a
  +
  <exp>
    <exp>
      b
    *
    <exp>
      c
Tree 2:
<exp>
  <exp>
    <exp>
      a
    +
    <exp>
      b
  *
  <exp>
    c
What is the meaning of a+b*c?

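The question matters because the two trees give different answers once values are plugged in. The sketch below encodes each tree as a nested tuple (an encoding chosen for illustration, as are the helper names and the sample values a=1, b=2, c=3) and evaluates both.

```python
# The two parse trees for a+b*c, written as (operator, left, right).
TREE_1 = ("+", "a", ("*", "b", "c"))      # multiplication grouped first
TREE_2 = ("*", ("+", "a", "b"), "c")      # addition grouped first

def leaves(tree):
    """Read the leaves left to right, recovering the original string."""
    if isinstance(tree, str):
        return tree
    op, left, right = tree
    return leaves(left) + op + leaves(right)

def evaluate(tree, env):
    """Evaluate a tree bottom-up using the variable values in env."""
    if isinstance(tree, str):
        return env[tree]
    op, left, right = tree
    lhs, rhs = evaluate(left, env), evaluate(right, env)
    return lhs + rhs if op == "+" else lhs * rhs

env = {"a": 1, "b": 2, "c": 3}            # sample values, assumed for the demo
for tree in (TREE_1, TREE_2):
    print(leaves(tree), "=", evaluate(tree, env))
```

Both trees read back as a+b*c, yet the first evaluates to 7 and the second to 9, which is why an ambiguous grammar leaves the meaning of a program undetermined.
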
Restrictions on Grammars
Diagram in terms of the sizes of the sets of restrictions, from fewest restrictions to most:
Unrestricted Grammars (type-0)
Context Sensitive (type-1)
Context Free (type-2)
Right/Left Linear Grammars (type-3)
Why do context-sensitive grammars have fewer restrictions than context-free grammars?

Chomsky Hierarchy
Diagram in terms of the sizes of the language families, from smallest family to largest:
Regular languages (regular expressions) (type-3)
Context-free languages (type-2)
Context-sensitive languages (type-1)
Computable (formal) languages (type-0)

Grammars in BNF (Backus-Naur Form)
• A BNF grammar consists of four parts:
  • The finite set of tokens (terminal symbols)
  • The finite set of non-terminal symbols
  • The start symbol
  • The finite set of production rules
<S> ::= <NP> <VP>
<NP> ::= <T> <N>
<VP> ::= <V> <NP>
<T> ::= the
<N> ::= man | ball
<V> ::= hit | took

Constructing Grammars
Examples of <var-dec>:
float a;
boolean a, b, c;
int a, b;
• Use divide and conquer to simplify the job.
• Data types, variable names (identifiers)
• One variable, one type (enforcing this is not the grammar’s job)

Using divide and conquer
<var-dec> ::= <type-name> <declarator-list> ;
<type-name> ::= boolean | byte | short | int | long | char | float | double    (primitive type names)
<declarator-list> ::= <declarator> | <declarator> , <declarator-list>
<declarator> ::= <variable-name> | <variable-name> = <expr>

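Each rule above can be turned into one small function, which is the divide-and-conquer idea applied to checking rather than writing a grammar. This is a rough sketch working on an already-split token list. The slides do not define <variable-name> or <expr>, so the identifier pattern and the stand-in for <expr> (a single name or number) are assumptions, as are all function names.

```python
import re

TYPE_NAMES = {"boolean", "byte", "short", "int", "long", "char", "float", "double"}
IDENT = re.compile(r"[A-Za-z_]\w*")          # assumed shape of <variable-name>
NUMBER = re.compile(r"\d+(?:\.\d+)?")        # assumed numeric literal

def is_variable_name(tok):
    return IDENT.fullmatch(tok) is not None and tok not in TYPE_NAMES

def is_expr(tok):
    # Assumption: a lone name or number stands in for the undefined <expr>.
    return is_variable_name(tok) or NUMBER.fullmatch(tok) is not None

def parse_declarator(tokens, pos):
    """<declarator> ::= <variable-name> | <variable-name> = <expr>"""
    if pos < len(tokens) and is_variable_name(tokens[pos]):
        pos += 1
        if pos + 1 < len(tokens) and tokens[pos] == "=" and is_expr(tokens[pos + 1]):
            pos += 2
        return pos
    return None

def parse_declarator_list(tokens, pos):
    """<declarator-list> ::= <declarator> | <declarator> , <declarator-list>"""
    pos = parse_declarator(tokens, pos)
    if pos is not None and pos < len(tokens) and tokens[pos] == ",":
        return parse_declarator_list(tokens, pos + 1)
    return pos

def is_var_dec(tokens):
    """<var-dec> ::= <type-name> <declarator-list> ;"""
    if not tokens or tokens[0] not in TYPE_NAMES:
        return False
    pos = parse_declarator_list(tokens, 1)
    return pos is not None and pos == len(tokens) - 1 and tokens[pos] == ";"

print(is_var_dec("boolean a , b , c ;".split()))
print(is_var_dec("int a b ;".split()))
print(is_var_dec("int a , a ;".split()))
```

The last call is accepted even though it declares a twice: the grammar only describes the shape of a declaration, and ruling out duplicates is left to a later check.
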
Tokens
Programs stored in files are just sequences of characters, but we want to divide them into tokens before further analysis.
Tokens are the atoms of the program
• How is such a program file (a sequence of characters) divided into a sequence of tokens?
• e.g.
  • identifiers (const, x, fact)
  • keywords (if, const), i.e. reserved words
  • operators (==)
  • constants (123.4), etc.

Lexical Structure And Phrase Structure
• Grammars so far have defined phrase structure: how a program is built from a sequence of tokens
• We also need to define lexical structure: how a text file is divided into tokens

Separate Grammars
• Usually there are two separate grammars
  • one to construct a sequence of tokens from a file of characters (Lexical Structure)
  • one to construct a parse tree from a sequence of tokens (Phrase Structure)
<program-file> ::= <end-of-file> | <element> <program-file>
<element> ::= <token> | <one-white-space> | <comment>
<one-white-space> ::= <space> | <tab> | <end-of-line>
<token> ::= <identifier> | <operator> | <constant> | …

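The lexical grammar above maps naturally onto a small scanner: each <element> is matched in turn, white space and comments are discarded, and tokens are kept. The sketch below uses Python's `re` module. The concrete patterns are assumptions, since the slides do not define them: comments are taken to run from // to the end of the line, and punctuation such as parentheses is lumped in with operators.

```python
import re

KEYWORDS = {"if", "const"}   # the reserved words named on the Tokens slide

# One named alternative per kind of <element>; the patterns are assumed.
TOKEN_PATTERN = re.compile(r"""
    (?P<white_space>\s+)
  | (?P<comment>//[^\n]*)
  | (?P<constant>\d+(?:\.\d+)?)
  | (?P<identifier>[A-Za-z_]\w*)
  | (?P<operator>==|[=+\-*/<>();{}])
""", re.VERBOSE)

def scan(text):
    """Split text into (kind, lexeme) tokens, dropping white space and comments."""
    tokens, pos = [], 0
    while pos < len(text):
        m = TOKEN_PATTERN.match(text, pos)
        if m is None:
            raise ValueError(f"unexpected character {text[pos]!r} at position {pos}")
        kind, lexeme = m.lastgroup, m.group()
        if kind == "identifier" and lexeme in KEYWORDS:
            kind = "keyword"                 # reserved words look like identifiers
        if kind not in ("white_space", "comment"):
            tokens.append((kind, lexeme))
        pos = m.end()
    return tokens

for token in scan("if (x == 123.4) fact = x; // done"):
    print(token)
```

A parser built from the phrase-structure grammar would then consume this token list instead of raw characters.
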
Separate Compiler Passes
• Scanner: string → tokens
• Parser: tokens → parse tree
• (more to do afterwards)

Historical Note #1
• Early languages sometimes did not separate lexical structure from phrase structure
  • Early Fortran and Algol dialects allowed spaces anywhere, even in the middle of a keyword
  • Other languages like PL/I or early Fortran allow keywords to be used as identifiers
• This makes them difficult to scan and parse
• It also reduces readability

Historical Note #2
• Some languages have a fixed-format lexical structure -- column positions are significant
  • One statement per line (i.e. per card)
  • First few columns for statement label
  • Etc.
• Early dialects of Fortran, Cobol, and Basic
• Almost all modern languages are free-format: column positions are ignored

Other Grammar Forms
• BNF variations
• EBNF variations
• Syntax diagrams

BNF Variations
• Some use → or = instead of ::=
• Some leave out the angle brackets and use a distinct typeface for tokens
• Some allow single quotes around tokens, for example to distinguish ‘|’ as a token from | as a meta-symbol
Interesting operator!! Or not!
Sir, please step away from the ASR-33

EBNF Variations
• Additional syntax to simplify some grammar chores:
  • {x} to mean zero or more repetitions of x
  • [x] to mean x is optional (i.e. x | <empty>)
  • () for grouping
  • | anywhere to mean a choice among alternatives
  • Quotes around tokens, if necessary, to distinguish from meta-symbols

EBNF Examples
<if-stmt> ::= if <expr> then <stmt> [else <stmt>]
<stmt-list> ::= {<stmt> ;}
<thing-list> ::= { (<stmt> | <declaration>) ;}
• Anything that extends BNF this way is called an Extended BNF: EBNF
• There are many variations

Syntax Diagrams
• Syntax diagrams (“railroad diagrams”)
<if-stmt> ::= if <expr> then <stmt> else <stmt>
[Diagram: if-stmt, a single track through if, expr, then, stmt, else, stmt]

Bypasses
<if-stmt> ::= if <expr> then <stmt> [else <stmt>]
[Diagram: if-stmt, a track through if, expr, then, stmt, with a bypass around else, stmt]

Branching
<exp> ::= <exp> + <exp> | <exp> * <exp> | ( <exp> ) | a | b | c

Loops
<exp> ::= <addend> {+ <addend>}

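The loop in the diagram corresponds directly to a loop in a hand-written parser: the EBNF braces {…} become a while statement. The sketch below assumes <addend> is just a | b | c, since the slide does not define it, and it groups additions to the left, which is a design choice rather than something the EBNF rule dictates. The function name `parse_exp` is hypothetical.

```python
def parse_exp(tokens):
    """<exp> ::= <addend> {+ <addend>}, with <addend> assumed to be a | b | c."""

    def addend(pos):
        if pos < len(tokens) and tokens[pos] in ("a", "b", "c"):
            return tokens[pos], pos + 1
        raise SyntaxError(f"expected an addend at position {pos}")

    tree, pos = addend(0)
    # The {+ <addend>} repetition: zero or more times, hence a while loop.
    while pos < len(tokens) and tokens[pos] == "+":
        right, pos = addend(pos + 1)
        tree = ("+", tree, right)          # left-associative grouping (a choice)
    if pos != len(tokens):
        raise SyntaxError(f"unexpected token {tokens[pos]!r} at position {pos}")
    return tree

print(parse_exp(list("a+b+c")))
```

Because the repetition is handled by iteration instead of the recursive rule <exp> + <exp>, each input has exactly one resulting tree.
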
Syntax Diagrams, Pro and Con
• Easier for humans to read (follow)
• Difficult to perceive the phrase structure (syntax tree)?
• Harder for machines to read (for automatic parser-generators)

Conclusion
• We use grammars to define programming language syntax, both lexical structure and phrase structure
• Connection between theory and practice
  • Two grammars, two compiler passes
  • Parser-generators can produce code for those two passes automatically from grammars (compiler tools)