Programming Languages G22.2110-001 Walter Williams
Administrative Stuff • Homework, Exams, etc. • Weekly assignments • Programming projects • Mid-Term & Final Exams • No cheating • Join the mailing list: http://www.cs.nyu.edu/mailman/listinfo/g22_2110_001_su04 • Recitation
What’s covered in Lectures & Texts • The purpose of the course is for you to understand: • The issues involved in programming language design • The various strategies for programming and how languages support those strategies • Type systems, OO support, abstraction, concurrent & generic programming • Not just learning to program in different languages • Textbooks: • Scott – covers both compilers and programming languages. You can skip the compiler stuff. • Barnes – Ada language, used by the Defense Dept and in other critical applications. • Paulson – ML language, widely used in AI, language theory, etc. • Others: • Stanley Lippman, The C++ Object Model • Bjarne Stroustrup, The Design and Evolution of C++ • The Little Schemer • Java Language Specification
Language & Communication • Human (Natural) Language • Problem Domain Language • Algorithmic Language • Documentation Language • Programming Language • Language as a tool for thought
Programming Language Stakeholders • Software Developers • Specification & Design • Coding • Compiler Writers • Maintenance Programmers • Quality Control & Support • Management
Language Attributes • Expressiveness • APL for arrays, Lisp for lists, etc. • All major languages are Turing complete • Efficiency • Of coding, compilation or execution • Readability • By programming experts, domain experts and non-experts • Scalability • Communicating parallel programmers • Modules, separate compilation and information hiding • Safety and Security • Market Attributes • Popularity => availability of programmers, tools, libraries, etc.
Models (Styles) of Computation • Imperative (Procedural) • Mutable storage – modified by assignment • Fortran, Algol, C++, Java • Functional (Applicative) • Pure mathematical functions – no side effects • ML, Haskell • Declarative • Programs are sets of (logical) assertions • Prolog, SQL • Object Oriented • Orthogonal to the three models above • Inheritance, Polymorphism, Encapsulation • Smalltalk
Compilers & Interpreters • Compiling vs. Interpreting • Compilers translate at compile time, once • Interpreters translate at runtime, every time • Front End • Syntactic Analysis: Lexical Analysis & Parsing • Semantic Analysis & Error Checking • Generates Intermediate Code • Back End • Most optimizations • Turns Intermediate Code into Executable
Programming Environments • Development Environment • Interactive Development Environments • Smalltalk browser environment • Microsoft IDE • Development Frameworks • Swing, MFC • Language-aware Editors • Libraries • Java Swing classes • C++ Standard Template Library (STL) • Libraries change much more quickly than the language • Libraries are usually very different for different languages
Lexical Issues • Lexical Elements are Tokens • Keywords, operators, punctuation, names, numbers, etc. • Tokens are described by regular expressions (Type 3 grammars) • Examples • Identifiers: letter (letter or digit)* • Integer: digit digit* • Terminal symbols of the lexical grammar are usually characters • ASCII, Unicode, etc. • Escape sequences and trigraphs
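The two token patterns on this slide can be tried out directly. The following is a minimal sketch in Python, assuming ASCII letters and digits only; the token class names (INT, ID, OP, SKIP) and the operator set are illustrative choices, not part of the slides.

```python
import re

# Token classes written as regular expressions.
# The class names and the operator set are illustrative assumptions.
TOKEN_SPEC = [
    ("INT", r"[0-9][0-9]*"),            # Integer: digit digit*
    ("ID", r"[A-Za-z][A-Za-z0-9]*"),    # Identifier: letter (letter or digit)*
    ("OP", r"[-+*/=(){};]"),            # single-character operators and punctuation
    ("SKIP", r"\s+"),                   # whitespace, discarded
]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))

def tokenize(text):
    """Split text into (token_class, lexeme) pairs, left to right."""
    tokens = []
    pos = 0
    while pos < len(text):
        m = MASTER.match(text, pos)
        if m is None:
            raise SyntaxError(f"unexpected character {text[pos]!r} at offset {pos}")
        if m.lastgroup != "SKIP":
            tokens.append((m.lastgroup, m.group()))
        pos = m.end()
    return tokens
```

For instance, `tokenize("x1 = 42;")` yields an ID, an OP, an INT and an OP token in that order, and a character outside the listed classes raises `SyntaxError`.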
Syntax & Semantics • Syntax • Deals with Form • Gives structure to a stream of lexical elements • Semantics • Deals with meaning • Meaning often depends on context • Both syntax and semantics can be represented by grammars – attribute grammars are used for semantics. • Distinction is somewhat artificial • Syntax is that which can be conveniently expressed using a context free grammar • Semantics is everything else
Language and Grammar • An Alphabet Σ is a finite set of lexical symbols • Formal languages use letters of the alphabet as lexical symbols • Programming languages use Tokens • L systems use lines to draw realistic images of trees and flowers • Language L is a subset of the strings in Σ* • A grammar G defines the subset of Σ* that belongs to L, and excludes the subset that does not belong to L. • A grammar can be used to generate new strings in L or to accept (or reject) strings in (or not in) L.
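The two uses of a grammar, generating and accepting, can be compared on a toy language. This is a sketch under assumptions not taken from the slides: the alphabet is {a, b}, and L is the language of the grammar S ::= a S b | λ, where λ is the empty string.

```python
from itertools import product

SIGMA = ("a", "b")  # a small alphabet, chosen for illustration

def sigma_star(max_len):
    """Yield every string over SIGMA of length 0..max_len (a finite slice of Σ*)."""
    for n in range(max_len + 1):
        for symbols in product(SIGMA, repeat=n):
            yield "".join(symbols)

def generate(max_len):
    """Generate strings of L by repeatedly applying S ::= a S b, ending with S ::= λ."""
    s = ""
    while len(s) <= max_len:
        yield s
        s = "a" + s + "b"

def accepts(s):
    """Accept s exactly when it is n a's followed by n b's."""
    half = len(s) // 2
    return len(s) % 2 == 0 and s == "a" * half + "b" * half

generated = set(generate(4))
accepted = {s for s in sigma_star(4) if accepts(s)}
```

Both routes pick out the same subset of Σ*: here `generated` and `accepted` are equal, and every other string of length up to 4 is rejected.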
CFG Example
Block:
    { BlockStatements_opt }
BlockStatements:
    BlockStatement
    BlockStatements BlockStatement
BlockStatement:
    LocalVariableDeclarationStatement
    Statement
LocalVariableDeclarationStatement:
    LocalVariableDeclaration ;
LocalVariableDeclaration:
    TypeName VariableDeclaratorId
Statement:
    while ( expr ) BlockStatement ;
Context Free Grammars • Substitution rules of the form A ::= ω, where A is a non-terminal symbol and ω is a string of terminal and non-terminal symbols • A simple CFG for a language E • S ::= EXPR • S ::= EXPR S • EXPR ::= EXPR ‘+’ EXPR • EXPR ::= EXPR ‘–’ EXPR • EXPR ::= ‘(’ EXPR ‘)’ • EXPR ::= digit • At least one rule must have only terminal symbols on its RHS • Every rule must have exactly one non-terminal on its LHS • Terminal symbols: digit + – ( ) • Non-terminal symbols: EXPR S • Examples of statements in E: 1, 1+1, (1+1) - 1
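To see the substitution rules at work, the grammar for E can be written as a table and expanded mechanically. The sketch below is one possible encoding, not a standard one: it enumerates leftmost derivations breadth-first and keeps only sentential forms of at most `max_symbols` tokens. The name `derive`, the length bound, and the use of ASCII `-` for the minus sign are assumptions made for illustration.

```python
from collections import deque

# The grammar E; "digit" stands for any single digit token.
RULES = {
    "S": [["EXPR"], ["EXPR", "S"]],
    "EXPR": [
        ["EXPR", "+", "EXPR"],
        ["EXPR", "-", "EXPR"],
        ["(", "EXPR", ")"],
        ["digit"],
    ],
}

def derive(start="S", max_symbols=5):
    """Return all sentences derivable from start using at most max_symbols tokens."""
    sentences = []
    seen = {(start,)}
    queue = deque([(start,)])
    while queue:
        form = queue.popleft()
        # Locate the leftmost non-terminal, if any remains.
        idx = next((i for i, sym in enumerate(form) if sym in RULES), None)
        if idx is None:
            sentences.append(" ".join(form))  # only terminals left: a sentence
            continue
        for rhs in RULES[form[idx]]:
            new_form = form[:idx] + tuple(rhs) + form[idx + 1:]
            if len(new_form) <= max_symbols and new_form not in seen:
                seen.add(new_form)
                queue.append(new_form)
    return sentences
```

The search terminates because no rule shortens a sentential form, so only finitely many forms fit under the bound. With `max_symbols=3` the result includes `digit + digit` and `( digit )`, matching the slide's examples 1+1 and (1).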
Formal CFG • A CFG, G, is a 4-tuple G = (Σ, N, S, δ) • Σ is an alphabet of terminal symbols • N is a set of non-terminal symbols • S is a distinguished element of N, called the start symbol, which represents all strings in the language. • δ is a set of rules of the form A ::= ω, where A ∈ N and ω ∈ (Σ ∪ N)+
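The 4-tuple maps naturally onto a small record type. The following is a hypothetical representation, not one prescribed by the course: the class name `CFG`, its field names and the `check` method are illustrative, and the requirement that Σ and N be disjoint is a standard assumption that the slide does not state.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class CFG:
    terminals: frozenset      # Σ
    nonterminals: frozenset   # N
    start: str                # S
    rules: tuple              # δ, as (lhs, rhs) pairs with rhs a tuple of symbols

    def check(self):
        """Return True when the tuple satisfies the constraints of the definition."""
        if self.terminals & self.nonterminals:
            return False  # assumed: Σ and N share no symbols
        if self.start not in self.nonterminals:
            return False  # S must be an element of N
        symbols = self.terminals | self.nonterminals
        # Each rule has a non-terminal on the left and a non-empty
        # string of terminals and non-terminals on the right.
        return all(
            lhs in self.nonterminals and len(rhs) > 0 and all(s in symbols for s in rhs)
            for lhs, rhs in self.rules
        )

# The grammar for E, written as a 4-tuple (ASCII "-" used for the minus sign).
E = CFG(
    terminals=frozenset({"digit", "+", "-", "(", ")"}),
    nonterminals=frozenset({"S", "EXPR"}),
    start="S",
    rules=(
        ("S", ("EXPR",)),
        ("S", ("EXPR", "S")),
        ("EXPR", ("EXPR", "+", "EXPR")),
        ("EXPR", ("EXPR", "-", "EXPR")),
        ("EXPR", ("(", "EXPR", ")")),
        ("EXPR", ("digit",)),
    ),
)
```

`E.check()` holds for this encoding, while a tuple whose start symbol is missing from N, or whose rules mention an undeclared symbol, fails the check.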
CFG Idioms • L ::= a L | a makes a list of one or more ‘a’s • L ::= a , L | a makes a comma-separated list of ‘a’s • L ::= a L | λ makes a list of zero or more ‘a’s • λ is the null symbol (the empty string) • L ::= L L | a | λ is another way to make a list • P ::= ( P ) nests P’s within parentheses to arbitrary depth (given a non-recursive alternative for P)
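Two of these idioms translate almost line for line into recursive functions. This is a rough sketch, assuming single-character tokens; the function names are hypothetical, and the `a` alternative in the parenthesis rule is an assumed base case that the slide does not give.

```python
def comma_list(tokens, i=0):
    """L ::= a , L | a. Return the index just past the list, or -1 on failure."""
    if i < len(tokens) and tokens[i] == "a":
        if i + 1 < len(tokens) and tokens[i + 1] == ",":
            return comma_list(tokens, i + 2)  # first alternative: a , L
        return i + 1                          # second alternative: a
    return -1

def nested(tokens, i=0):
    """P ::= ( P ) | a. The 'a' alternative is an assumed base case."""
    if i < len(tokens) and tokens[i] == "a":
        return i + 1
    if i < len(tokens) and tokens[i] == "(":
        j = nested(tokens, i + 1)
        if j != -1 and j < len(tokens) and tokens[j] == ")":
            return j + 1
    return -1

def matches(rule, text):
    """True when rule consumes the whole of text, one character per token."""
    tokens = list(text)
    return rule(tokens) == len(tokens)
```

Each recursive call corresponds to one use of the recursive alternative, so `matches(comma_list, "a,a,a")` and `matches(nested, "((a))")` succeed, while a trailing comma or an unbalanced parenthesis is rejected.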
Backus-Naur Form (BNF) • Most language specifications use some variation of BNF • Non-terminal symbols are identified by angle brackets, e.g. <stmt> • Terminal symbols are token names or literal symbols • “::=” is definitional equivalence • ‘|’ indicates “or” • Many variations • [ ] for optional elements • Parentheses for grouping • + (one or more) and * (Kleene star, zero or more) • Superscripts for n occurrences • Subscripts, e.g. opt in Java • Italics or lowercase for non-terminal symbols • Example:
<stmt> ::= while ( <exp> ) <stmt> | if ( <exp> ) <stmt> [ else <stmt> ] | ID = <exp> | <stmt_list> ;
<stmt_list> ::= <stmt> | <stmt_list> <stmt> ;
<exp> ::= <exp> <op> <exp> | ID | NUMBER ;
<op> ::= + | - | * | / ;
Derivation & Parse Tree • Parse tree represents the structure of a parse • Leaf nodes are terminal symbols • Intermediate nodes are non-terminal symbols • Root node is the start symbol of the grammar • Derivation tree also records which rules were used to build the tree • Each node represents a specific production • Example • (1 + 2 + 3) - 2
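A parse tree for the example expression can be built with a small hand-written parser. The sketch below makes several assumptions: it works on single-digit operands, it represents each node as a tuple whose first element is the non-terminal name, and it replaces the ambiguous, left-recursive EXPR rules with a loop that groups operators to the left. That is one reasonable reading, not the only tree the original grammar allows.

```python
def parse(text):
    """Parse an expression over digits, +, - and parentheses into a nested tuple."""
    tokens = [c for c in text if not c.isspace()]
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def term():
        # digit, or a parenthesised expression
        nonlocal pos
        tok = peek()
        if tok is not None and tok.isdigit():
            pos += 1
            return ("EXPR", tok)
        if tok == "(":
            pos += 1
            inner = expr()
            if peek() != ")":
                raise SyntaxError("expected ')'")
            pos += 1
            return ("EXPR", "(", inner, ")")
        raise SyntaxError(f"unexpected token {tok!r}")

    def expr():
        # term followed by any number of (+|-) term, grouped to the left
        nonlocal pos
        node = term()
        while peek() in ("+", "-"):
            op = tokens[pos]
            pos += 1
            node = ("EXPR", node, op, term())
        return node

    tree = expr()
    if pos != len(tokens):
        raise SyntaxError("trailing input")
    return tree

def leaves(node):
    """Collect the terminal symbols at the leaves, left to right."""
    if isinstance(node, str):
        return [node]
    return [leaf for child in node[1:] for leaf in leaves(child)]

tree = parse("(1 + 2 + 3) - 2")
```

Every interior node of `tree` is labelled EXPR and every leaf is a terminal, so reading the leaves left to right with `leaves(tree)` spells out the original token sequence.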
Grammars – Chomsky Hierarchy • Type 0 – Unrestricted • Can express anything that can be computed • Membership is undecidable in general, so no general parsing algorithm exists • Type 1 – Context Sensitive • Difficult to parse • Attribute grammars used for programming language semantics • Type 2 – Context Free • CFGs used for describing programming language syntax • Type 3 – Regular • Used to describe lexical elements of programming languages
Grammatical Problems • Programming languages use restricted grammars, such as LL or LR, which are not as powerful as general CFGs • Dangling else: not LR, produces a shift/reduce conflict • S ::= if E then S • S ::= if E then S else S • Solutions: • Always choose shift, which binds each else to the nearest unmatched if • Specify an end marker, e.g. endif • Left recursion: not LL • Ambiguity • Foo(A) (in C): declaration or use of function Foo? • Requires lookahead in the parser or a more complex grammar
Programming Language History