10. High Level Languages. Java (Object Oriented) ASP RDF (Horn Clause Deduction, Semantic Web). This Course. Jython in Java. Relation. Programming Languages. Lexical and Syntactic Analysis Chomsky Grammar Hierarchy Lexical Analysis – Tokenizing Syntactic Analysis – Parsing
10 High Level Languages Java (Object Oriented) ASP RDF (Horn Clause Deduction, Semantic Web) This Course Jython in Java Relation
Programming Languages • Lexical and Syntactic Analysis • Chomsky Grammar Hierarchy • Lexical Analysis – Tokenizing • Syntactic Analysis – Parsing • Hmm Concrete Syntax • Hmm Abstract Syntax Noam Chomsky
Chomsky Hierarchy • Regular grammar – used for tokenizing • Context-free grammar (BNF) – used for parsing • Context-sensitive grammar – not really used for programming languages
Regular Grammar • Simplest; least powerful • Equivalent to: • Regular expression (think of Perl) • Finite-state automaton • Right regular grammar: • ω ∈ Terminal*, A and B ∈ Nonterminal • A → ω B • A → ω • Example: Integer → 0 Integer | 1 Integer | ... | 9 Integer | 0 | 1 | ... | 9
Regular Grammar • Less powerful than context-free grammars • The following is not a regular language { aⁿ bⁿ | n ≥ 1 } i.e., cannot balance: ( ), { }, begin end
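The limitation above comes down to counting: recognizing aⁿ bⁿ needs memory that grows with n, which a finite-state automaton does not have. The sketch below is illustrative only; the function name is hypothetical, and an explicit counter stands in for the unbounded memory.

```python
def is_anbn(s):
    """Return True if s has the form a^n b^n for some n >= 1."""
    count = 0
    i = 0
    # Count the leading a's.
    while i < len(s) and s[i] == "a":
        count += 1
        i += 1
    if count == 0:
        return False
    # Cancel one a for every b that follows.
    while i < len(s) and s[i] == "b":
        count -= 1
        i += 1
    # Accept only if the whole input was consumed and the counts match.
    return i == len(s) and count == 0


print(is_anbn("aaabbb"))  # True
print(is_anbn("aab"))     # False
```

The counter can grow without bound, which is exactly what a fixed, finite set of states cannot model.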
Regular Expressions
x          a character x
\x         an escaped character, e.g., \n
{ name }   a reference to a name
M | N      M or N
M N        M followed by N
M*         zero or more occurrences of M
M+         one or more occurrences of M
M?         zero or one occurrence of M
[aeiou]    the set of vowels
[0-9]      the set of digits
.          any single character
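Most of this notation carries over directly to Python's re module, so a small tokenizer can be sketched with it. The token classes (INT, ID, OP, SKIP) and the function name are assumptions made for illustration; they are not taken from the slides.

```python
import re

# Hypothetical token classes, each defined by a regular expression.
TOKEN_SPEC = [
    ("INT", r"[0-9]+"),                # M+ applied to the set of digits
    ("ID", r"[A-Za-z][A-Za-z0-9]*"),   # a letter followed by M* of letters/digits
    ("OP", r"[-+*/%=()]"),             # a set of single-character operators
    ("SKIP", r"\s+"),                  # whitespace, discarded
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def tokenize(text):
    """Split text into (token_class, lexeme) pairs."""
    tokens = []
    pos = 0
    while pos < len(text):
        match = MASTER.match(text, pos)
        if match is None:
            raise ValueError(f"unexpected character {text[pos]!r} at position {pos}")
        if match.lastgroup != "SKIP":
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens


print(tokenize("z = x + 2 * y"))
```

Each alternative in the combined pattern is a named group, so match.lastgroup reports which token class matched.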
Finite State Automaton for Identifiers (S, a2i$) ├ (I, 2i$) ├ (I, i$) ├ (I, $) ├ (F, ) Thus: (S, a2i$) ├* (F, )
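The trace above can be reproduced with a small simulator. The transition rules below are inferred from the trace (a letter moves S to I, letters and digits stay in I, and the end marker $ moves I to F); the automaton diagram itself is not in the text, so treat them as an assumption.

```python
def step(state, ch):
    """Return the next state, or None if no transition applies (assumed rules)."""
    if state == "S" and ch.isalpha():
        return "I"
    if state == "I" and (ch.isalpha() or ch.isdigit()):
        return "I"
    if state == "I" and ch == "$":
        return "F"
    return None


def run(text):
    """Return (accepted, trace), where trace lists (state, remaining input) pairs."""
    state, rest = "S", text
    trace = [(state, rest)]
    while rest:
        state = step(state, rest[0])
        if state is None:
            return False, trace
        rest = rest[1:]
        trace.append((state, rest))
    return state == "F", trace


accepted, trace = run("a2i$")
print(accepted)  # True
print(trace)     # [('S', 'a2i$'), ('I', '2i$'), ('I', 'i$'), ('I', '$'), ('F', '')]
```

Each pair in the trace is one configuration, so consecutive pairs correspond to one ├ step.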
Context-Free Grammar Production: α → β, where α ∈ Nonterminal and β ∈ (Nonterminal ∪ Terminal)*, i.e., the left-hand side is a single nonterminal, and the right-hand side is a string of nonterminals and/or terminals (possibly empty).
Context-Sensitive Grammar Production: α → β, where |α| ≤ |β| and α, β ∈ (Nonterminal ∪ Terminal)*, i.e., the left-hand side can be a string of terminals and nonterminals, but the number of symbols on the left must not exceed the number of symbols on the right.
Syntax • The syntax of a programming language is a precise description of all its grammatically correct programs. • Precise syntax was first used with Algol 60, and has been used ever since. • Three levels: • Lexical syntax - all the basic symbols of the language (names, values, operators, etc.) • Concrete syntax - rules for writing expressions, statements and programs. • Abstract syntax - internal representation of the program, favoring content over form.
Grammars • Grammars: Metalanguages used to define the concrete syntax of a language. • Backus Normal Form – Backus Naur Form (BNF) • Stylized version of a context-free grammar (cf. Chomsky hierarchy) • First used to define syntax of Algol 60 • Now used to define syntax of most major languages • Production: • α → β • α ∈ Nonterminal • β ∈ (Nonterminal ∪ Terminal)* • i.e., the left-hand side is a single nonterminal, and β is a string of nonterminals and/or terminals (possibly empty). • Example • Integer → Digit | Integer Digit • Digit → 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
Extended BNF (EBNF) • Additional metacharacters • { } a series of zero or more • ( ) must pick one from a list • [ ] pick none or one from a list • Example • Expression -> Term { ( + | - ) Term } • IfStatement -> if ( Expression ) Statement [ else Statement ] • EBNF is no more powerful than BNF, but its production rules are often simpler and clearer. • JavaCC EBNF • ( … )* a series of zero or more • ( … )+ a series of one or more • [ … ] optional
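EBNF metacharacters map naturally onto control flow in a hand-written parser: { } becomes a loop and ( + | - ) becomes a choice. The sketch below parses Expression -> Term { ( + | - ) Term }. It assumes Term is a single integer token, since the slide does not define Term here, and the function name is hypothetical.

```python
def parse_expression(tokens):
    """Parse a list of string tokens into a nested (op, left, right) tuple."""
    pos = 0

    def term():
        # Assumption: a Term is one integer token.
        nonlocal pos
        if pos >= len(tokens) or not tokens[pos].isdigit():
            raise SyntaxError(f"expected an integer at position {pos}")
        value = int(tokens[pos])
        pos += 1
        return value

    tree = term()
    # { ( + | - ) Term } : zero or more repetitions become a while loop.
    while pos < len(tokens) and tokens[pos] in ("+", "-"):
        op = tokens[pos]
        pos += 1
        tree = (op, tree, term())
    if pos != len(tokens):
        raise SyntaxError(f"unexpected token {tokens[pos]!r} at position {pos}")
    return tree


print(parse_expression(["5", "-", "4", "+", "3"]))  # ('+', ('-', 5, 4), 3)
```

Because the loop folds each new Term into the tree built so far, the result nests to the left.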
For more details, see Chapter 2 of “Programming Language Pragmatics, Third Edition (Paperback)” Michael L. Scott (Author)
Instance of a Programming Language: int main () { return 0 ; }
Internal Parse Tree
Program (abstract syntax):
Function = main; Return type = int
params =
Block:
Return:
Variable: return#main, LOCAL addr=0
IntValue: 0
Abstract Syntax
Parse Trees • Integer → Digit | Integer Digit • Digit → 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 Parse Tree for 352 as an Integer
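Since the tree figure is not reproduced in the text, here is a sketch that builds the parse tree for 352 as nested tuples. It follows the left-recursive Integer rule for the Integer/Digit grammar on this slide; the tuple representation and the function name are assumptions for illustration.

```python
def integer_tree(digits):
    """Build the parse tree for a digit string as nested tuples."""
    if not digits or any(c not in "0123456789" for c in digits):
        raise ValueError("expected a non-empty string of digits")
    # Base case of the grammar: an Integer that is a single Digit.
    tree = ("Integer", ("Digit", digits[0]))
    # Recursive case: Integer followed by Digit, applied once per remaining digit.
    for d in digits[1:]:
        tree = ("Integer", tree, ("Digit", d))
    return tree


print(integer_tree("352"))
# ('Integer', ('Integer', ('Integer', ('Digit', '3')), ('Digit', '5')), ('Digit', '2'))
```

The left recursion in the grammar shows up as a tree that grows down its left edge, with the last digit nearest the root.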
Arithmetic Expression Grammar • Expr → Expr + Term | Expr - Term | Term • Term → 0 | ... | 9 | ( Expr ) Parse of 5 - 4 + 3
Associativity and Precedence • A grammar can be used to define associativity and precedence among the operators in an expression. E.g., + and - are left-associative operators in mathematics; * and / have higher precedence than + and - . • Consider the following grammar:
Expr -> Expr + Term | Expr - Term | Term
Term -> Term * Factor | Term / Factor | Term % Factor | Factor
Factor -> Primary ** Factor | Primary
Primary -> 0 | ... | 9 | ( Expr )
Associativity and Precedence Parse of 4**2**3 + 5 * 6 + 7
Associativity and Precedence
Precedence   Associativity   Operators
3            right           **
2            left            * / %
1            left            + -
Note: These relationships are shown by the structure of the parse tree: highest precedence at the bottom, and left-associativity on the left at each level.
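The Expr/Term/Factor/Primary grammar can be turned into a small recursive-descent parser to check these relationships. The sketch below is not a literal transcription: the left-recursive rules are rewritten as loops (a standard adjustment, since recursive descent cannot follow left recursion directly), the right-recursive Factor rule stays recursive, and / is treated as integer division. All function names are hypothetical.

```python
import operator
import re

OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul,
       "/": operator.floordiv, "%": operator.mod, "**": operator.pow}


def parse(text):
    """Parse single-digit arithmetic into nested (op, left, right) tuples."""
    tokens = re.findall(r"\*\*|[-+*/%()]|[0-9]", text)
    if "".join(tokens) != "".join(text.split()):
        raise SyntaxError("unrecognized character in input")
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        return tok

    def expr():    # Expr -> Term { (+ | -) Term }, left-associative
        tree = term()
        while peek() in ("+", "-"):
            tree = (advance(), tree, term())
        return tree

    def term():    # Term -> Factor { (* | / | %) Factor }, left-associative
        tree = factor()
        while peek() in ("*", "/", "%"):
            tree = (advance(), tree, factor())
        return tree

    def factor():  # Factor -> Primary ** Factor | Primary, right-associative
        base = primary()
        if peek() == "**":
            advance()
            return ("**", base, factor())
        return base

    def primary():  # Primary -> 0 | ... | 9 | ( Expr )
        tok = peek()
        if tok == "(":
            advance()
            tree = expr()
            if peek() != ")":
                raise SyntaxError("expected ')'")
            advance()
            return tree
        if tok is not None and tok.isdigit():
            return int(advance())
        raise SyntaxError(f"unexpected token {tok!r}")

    tree = expr()
    if pos != len(tokens):
        raise SyntaxError(f"unexpected token {tokens[pos]!r}")
    return tree


def evaluate(tree):
    """Evaluate a tree produced by parse."""
    if isinstance(tree, int):
        return tree
    op, left, right = tree
    return OPS[op](evaluate(left), evaluate(right))


tree = parse("4**2**3 + 5 * 6 + 7")
print(tree)            # ('+', ('+', ('**', 4, ('**', 2, 3)), ('*', 5, 6)), 7)
print(evaluate(tree))  # 65573
```

The ** operator nests to the right and sits deepest in the tree, while the two + operators nest to the left, matching the table.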
Ambiguous Grammars • A grammar is ambiguous if one of its strings has two or more different parse trees. • Example: Expr -> Expr Op Expr | ( Expr ) | Integer Op -> + | - | * | / | % | ** • Equivalent to the previous grammar, but ambiguous
Ambiguous Grammars Ambiguous Parse of 5 – 4 + 3
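One way to see the ambiguity is to enumerate every parse tree the grammar allows for 5 - 4 + 3. The sketch below handles only the Expr -> Expr Op Expr | Integer alternatives (the parenthesized alternative is omitted for brevity), treats / as integer division, and uses hypothetical function names.

```python
import operator

OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul,
       "/": operator.floordiv, "%": operator.mod, "**": operator.pow}


def all_trees(tokens):
    """Yield every parse tree of tokens under Expr -> Expr Op Expr | Integer."""
    if len(tokens) == 1:
        if tokens[0].isdigit():
            yield int(tokens[0])
        return
    # Try each operator as the root, then parse both sides in every possible way.
    for i in range(1, len(tokens) - 1):
        if tokens[i] in OPS:
            for left in all_trees(tokens[:i]):
                for right in all_trees(tokens[i + 1:]):
                    yield (tokens[i], left, right)


def evaluate(tree):
    """Evaluate a nested (op, left, right) tuple."""
    if isinstance(tree, int):
        return tree
    op, left, right = tree
    return OPS[op](evaluate(left), evaluate(right))


for tree in all_trees(["5", "-", "4", "+", "3"]):
    print(tree, "=", evaluate(tree))
# ('-', 5, ('+', 4, 3)) = -2
# ('+', ('-', 5, 4), 3) = 4
```

The two trees give different values, which is why an ambiguous grammar is a practical problem and not just a formal one.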
Dangling Else (Ambiguous Grammars)
IfStatement -> if ( Expression ) Statement | if ( Expression ) Statement else Statement
Statement -> Assignment | IfStatement | Block
Block -> { Statements }
Statements -> Statements Statement | Statement
With which ‘if’ does the following ‘else’ associate?
if (x < 0) if (y < 0) y = y - 1; else y = 0;
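A common resolution, used by C and Java, is to attach each else to the nearest unmatched if. A recursive-descent parser gets this behaviour by greedily consuming an else as soon as it sees one. The sketch below uses a simplified, hypothetical token format in which each condition and each assignment is a single token; the function name is also an assumption.

```python
def parse_statement(tokens, pos=0):
    """Return (tree, next_pos) for a statement starting at tokens[pos].

    Simplified grammar (an assumption for illustration):
    Statement -> 'if' COND Statement [ 'else' Statement ] | ATOM
    """
    if tokens[pos] == "if":
        cond = tokens[pos + 1]
        then_branch, pos = parse_statement(tokens, pos + 2)
        else_branch = None
        # Greedy choice: an 'else' seen here belongs to this (innermost) 'if'.
        if pos < len(tokens) and tokens[pos] == "else":
            else_branch, pos = parse_statement(tokens, pos + 1)
        return ("if", cond, then_branch, else_branch), pos
    return tokens[pos], pos + 1


tokens = ["if", "x<0", "if", "y<0", "y=y-1", "else", "y=0"]
tree, _ = parse_statement(tokens)
print(tree)
# ('if', 'x<0', ('if', 'y<0', 'y=y-1', 'y=0'), None)
```

In the result the else branch belongs to the inner if, and the outer if has none.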
Hmm BNF (i.e., Concrete Syntax)
Program : { [ Declaration ] | retType Identifier Function | MyClass | MyObject }
Function : ( ) Block
MyClass : Class Identifier { { retType Identifier Function } Constructor { retType Identifier Function } }
MyObject : Identifier Identifier = create Identifier callArgs
Constructor : Identifier ( [ { Parameter } ] ) block
Declaration : Type Identifier [ [ Literal ] ] { , Identifier [ [ Literal ] ] }
Type : int | bool | float | list | tuple | object | string | void
Statements : { Statement }
Statement : ; | Declaration | Block | ForEach | Assignment | IfStatement | WhileStatement | CallStatement | ReturnStatement
Block : { Statements }
ForEach : for ( Expression <- Expression ) Block
Assignment : Identifier [ [ Expression ] ] = Expression ;
Parameter : Type Identifier
IfStatement : if ( Expression ) Block [ elseifStatement | Block ]
WhileStatement : while ( Expression ) Block
Hmm BNF (i.e., Concrete Syntax)
Expression : Conjunction { || Conjunction }
Conjunction : Equality { && Equality }
Equality : Relation [ EquOp Relation ]
EquOp : == | !=
Relation : Addition [ RelOp Addition ]
RelOp : < | <= | > | >=
Addition : Term { AddOp Term }
AddOp : + | -
Term : Factor { MulOp Factor }
MulOp : * | / | %
Factor : [ UnaryOp ] Primary
UnaryOp : - | !
Primary : callOrLambda | IdentifierOrArrayRef | Literal | subExpressionOrTuple | ListOrListComprehension | ObjFunction
callOrLambda : Identifier callArgs | LambdaDef
callArgs : ( [ Expression | passFunc { , Expression | passFunc } ] )
passFunc : Identifier ( Type Identifier { Type Identifier } )
LambdaDef : ( \\ Identifier { , Identifier } -> Expression )
Hmm BNF (i.e., Concrete Syntax)
IdentifierOrArrayRef : Identifier [ [ Expression ] ]
subExpressionOrTuple : ( [ Expression [ , [ Expression { , Expression } ] ] ] )
ListOrListComprehension : [ Expression { , Expression } ] | | Expression [ <- Expression ] { , Expression [ <- Expression ] } ]
ObjFunction : Identifier . Identifier . Identifier callArgs
Identifier : ( a | b | … | z | A | B | … | Z ) { ( a | b | … | z | A | B | … | Z ) | ( 0 | 1 | … | 9 ) }
Literal : Integer | True | False | ClFloat | ClString
Integer : Digit { Digit }
ClFloat : 0 | 1 | … | 9 { 0 | 1 | … | 9 } . { 0 | 1 | … | 9 }
ClString : ”{~[“] }”
Associativity and Precedence for Hmm Clite
Operator      Associativity
Unary - !     none
* /           left
+ -           left
< <= > >=     none
== !=         none
&&            left
||            left
Hmm Parse Tree Example z = x + 2 * y;
Hmm Parse Tree z = x + 2 * y; [Figure: parse tree rooted at =]
Very Approximate Hmm Abstract Syntax
Assignment = Variable target; Expression source
Expression = VariableRef | Value | Binary | Unary
VariableRef = Variable | ArrayRef
Variable = String id
ArrayRef = String id; Expression index
Value = IntValue | BoolValue | FloatValue | CharValue
Binary = Operator op; Expression term1, term2
Unary = UnaryOp op; Expression term
Operator = ArithmeticOp | RelationalOp | BooleanOp
IntValue = Integer intValue
…
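Hmm Abstract Syntax – Binary Example: z = x + 2 * y [Figure: abstract syntax tree for the assignment; the source is a Binary node with Operator +, Variable x, and a nested Binary node with Operator *, Value 2, and Variable y.]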
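The abstract syntax for z = x + 2 * y can be written down as plain data. The sketch below mirrors only a few of the abstract syntax classes (Assignment, Variable, IntValue, Binary) as Python dataclasses; the Python field names, the evaluate helper, and the choice to store the operator as a string are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Variable:
    id: str


@dataclass(frozen=True)
class IntValue:
    int_value: int


@dataclass(frozen=True)
class Binary:
    op: str                 # assumption: operator kept as a plain string
    term1: "Expression"
    term2: "Expression"


Expression = Union[Variable, IntValue, Binary]


@dataclass(frozen=True)
class Assignment:
    target: Variable
    source: Expression


def evaluate(expr, env):
    """Evaluate an expression tree given variable values in env (hypothetical helper)."""
    if isinstance(expr, IntValue):
        return expr.int_value
    if isinstance(expr, Variable):
        return env[expr.id]
    left, right = evaluate(expr.term1, env), evaluate(expr.term2, env)
    if expr.op == "+":
        return left + right
    if expr.op == "*":
        return left * right
    raise ValueError(f"unsupported operator {expr.op!r}")


# z = x + 2 * y
tree = Assignment(
    Variable("z"),
    Binary("+", Variable("x"), Binary("*", IntValue(2), Variable("y"))),
)
print(evaluate(tree.source, {"x": 1, "y": 3}))  # 7
```

Unlike the parse tree, this structure carries no punctuation or intermediate nonterminals, only the operators and operands.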