Modern Programming Languages (Cs-432) Lecture # 04
Implementation Methods • Programming languages can be implemented by any of three general methods • Compilation • Pure interpretation • Hybrid implementation systems
Compilation • At one extreme, programs can be translated into machine language, which can be executed directly on the computer. This method is called a "compiler implementation" and has the advantage of very fast program execution once the translation process is completed • Most production implementations of languages such as C, COBOL, C++, and Ada are by compilers
Compilation • The language that a compiler translates is called the "source language" • The compilation process and program execution take place in several phases
Compilation • The lexical analyzer gathers the characters of the source program into lexical units • The lexical units of a program are identifiers, special words, operators, and punctuation symbols • The lexical analyzer ignores comments in the source program because the compiler has no use for them
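As a rough illustration of this gathering step (a minimal assumed sketch in Java, not any real compiler's lexer; the class name and input statement are hypothetical), a regular expression can be used to split a statement into its lexical units:

    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    public class TinyLexer {
        public static void main(String[] args) {
            String source = "index = 2 * count + 17;";  // hypothetical input statement
            // One alternative per kind of lexical unit: identifiers, integer literals,
            // and single-character operators/punctuation
            Pattern lexemes = Pattern.compile("[A-Za-z_][A-Za-z_0-9]*|[0-9]+|[=+\\-*/();]");
            Matcher m = lexemes.matcher(source);
            while (m.find()) {
                System.out.println(m.group());  // prints one lexical unit per line
            }
        }
    }

Running it prints index, =, 2, *, count, +, 17, and ; on separate lines.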
Compilation • The syntax analyzer takes the lexical units from the lexical analyzer and uses them to construct hierarchical structures called "parse trees" • These parse trees represent the syntactic structure of the program
Compilation • The intermediate code generator produces a program in a different language, at an intermediate level between the source program and the final output of the compiler • Intermediate code sometimes looks very much like assembly language; in fact, it sometimes is assembly code
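For instance (an assumed illustration using a simple three-address style of intermediate code; t1 and t2 are compiler-generated temporaries), the statement A = B * (A + C) might be translated into:

    t1 = A + C
    t2 = B * t1
    A  = t2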
Compilation • The semantic analyzer is an integral part of the intermediate code generator • The semantic analyzer checks for errors, such as type errors, that are difficult to detect during syntax analysis
Compilation • Optimization improves programs by making them smaller, faster, or both • Because many kinds of optimization are difficult to do on machine language, most optimization is done on intermediate code
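A small assumed example of one such optimization, constant folding applied to intermediate code (t1 and t2 are hypothetical temporaries):

    Before optimization:        After optimization:
        t1 = 2 * 8                  t2 = 16 + count
        t2 = t1 + count

The multiplication of two constants is done once, at compile time, instead of every time the program runs.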
Compilation • The code generator translates the optimized intermediate code version of the program into an equivalent machine language program
Compilation • The symbol table serves as a database for the compilation process • The primary contents of the symbol table are the type and attribute information of each user-defined name in the program • This information is placed in the symbol table by the lexical and syntax analyzers and is used by the semantic analyzer and the code generator
Compilation • Most user programs also require programs from the operating system, such as those for input and output • The compiler builds calls to required system programs where they are needed by the user program • Before the machine language programs produced by a compiler can be executed, the required programs from the operating system must be found and linked to the user program
Compilation • The process of collecting system programs and linking them to user programs is called "linking and loading" • Linking is performed by a system program called a "linker" • The linker does not link only system programs; it can also link user programs that reside in libraries • The user and system code together are sometimes called a "load module" or "executable image"
Compilation • The speed of the connection between a computer's memory and its processor usually determines the speed of the computer, because instructions often can be executed faster than they can be moved to the processor for execution • This connection is called the von Neumann bottleneck • It is the primary limiting factor in the speed of von Neumann architecture computers
Pure Interpretation • Pure interpretation lies at the opposite end (from compilation) of implementation methods • With this approach, programs are interpreted by another program called an interpreter, with no translation whatsoever • The interpreter program acts as a software simulation of a machine whose fetch-execute cycle deals with high-level language program statements rather than machine instructions • This software simulation obviously provides a virtual machine for the language
Pure Interpretation • Pure interpretation has the advantage of allowing easy implementation of many source-level debugging operations, because all run-time error messages can refer to source-level units • This approach has the serious disadvantage that execution is 10 to 100 times slower than in compiled systems • One source of this slowness is statement decoding: a statement must be decoded every time it is executed, no matter how many times it has been executed before • Pure interpretation often requires more space; in addition to the source program, the symbol table must be present during interpretation, which is performed every time the source program is executed • PHP is an example of pure interpretation
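A minimal sketch of the interpreter idea (assuming Java; the toy language, class name, and statements are hypothetical): the software fetch-execute cycle decodes each source statement as text every time it is executed.

    import java.util.HashMap;
    import java.util.Map;

    public class TinyInterpreter {
        public static void main(String[] args) {
            // A toy program: assignments of integer literals and print statements
            String[] program = { "x = 17", "y = 25", "print x", "print y" };
            Map<String, Integer> symbolTable = new HashMap<>();
            for (String stmt : program) {                    // software fetch-execute cycle
                String[] parts = stmt.trim().split("\\s+");  // decode the statement text
                if (parts[0].equals("print")) {
                    System.out.println(symbolTable.get(parts[1]));
                } else {                                     // assignment: name = literal
                    symbolTable.put(parts[0], Integer.parseInt(parts[2]));
                }
            }
        }
    }

If the same statement appeared inside a loop, it would be split and decoded again on every iteration.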
Hybrid implementation systems • Some language implementation systems are a compromise between compilers and pure interpreters: they translate high-level language programs to an intermediate language designed to allow easy interpretation • This method is faster than pure interpretation because the source language statements are decoded only once
Hybrid implementation systems • The Java implementation is hybrid; its intermediate form, called "byte code", provides portability to any machine that has a byte code interpreter and an associated run-time system • Together, these are called a "Java Virtual Machine" • Just-In-Time (JIT) compilation in .NET also translates programs to an intermediate language • A Just-In-Time (JIT) implementation system initially translates programs to an intermediate language; then, during execution, it compiles intermediate language methods into machine code when they are called
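For example (assuming the standard JDK tools and a hypothetical file Hello.java containing a class Hello):

    javac Hello.java     compiles the source into Hello.class, which contains byte code
    java Hello           starts the Java Virtual Machine, which interprets the byte code and may JIT-compile frequently executed methods into machine code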
Preprocessors • A preprocessor is a program that processes a program immediately before the program is compiled • Preprocessor instructions are embedded in programs • Preprocessor instructions are commonly used to specify that the code from another file is to be included • For example: #include "iostream.h" • Other preprocessor instructions are used to define symbols to represent expressions
Preprocessors • For example: #define max(A, B) ((A)>(B)?(A):(B)) • This determines the larger of two given expressions • For example: x = max(2*y, z/1.73)
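The preprocessor expands the call textually, so the last statement becomes x = ((2*y)>(z/1.73)?(2*y):(z/1.73)) before compilation; note that each argument expression appears twice in the expansion, so an argument with a side effect (such as i++) would be evaluated twice.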
Programming Environments • A programming environment is the collection of tools used in the development of software • It consists of, at a minimum, a file system, a text editor, a linker, and a compiler
Programming Environments • JBuilder is a programming environment that provides an integrated compiler, editor, debugger, and file system in one GUI for Java development • Microsoft Visual Studio .NET is another programming environment; it consists of a large collection of software development tools • This system can be used to develop software in any of five languages: C#, Visual Basic, JScript, F# (a functional language), and C++ • NetBeans is a development environment that is primarily used for Java application development but also supports JavaScript, Ruby, and PHP
Summary • Introduction to MPL • Programming domains • Language evaluation criteria • Language trade-offs • Influences on language design • Programming design methodologies • Language categories • Implementation methods • Preprocessors • Programming environments
Assignment • Please find the implementation details of any language available these days (Java, C#, Visual Basic, PHP, etc.) [Submit a hard copy of not more than 02 pages] [Please avoid copy/paste and submit whatever you understand] • Plagiarism will be treated strictly
Syntax and Semantics • The study of programming languages can be divided into the examination of syntax and semantics • The syntax of a programming language is the form of its expressions, statements, and program units • Semantics is the meaning of those expressions, statements, and program units • Although they are often separated for discussion purposes, syntax and semantics are closely related
The General Problem of Describing Syntax • A language, whether natural (English) or artificial (Java), is a set of strings of characters from some alphabet • The strings of a language are called sentences or statements • The syntax rules of a language specify which strings of characters from the language's alphabet are in the language • In comparison to natural languages, programming languages are syntactically very simple and concrete
The General Problem of Describing Syntax • The lowest-level syntactic units are called lexemes • The description of lexemes can be given by a lexical specification, which is usually separate from the syntactic description of the language • The lexemes of a programming language include its numeric literals, operators, and special words • A program is a string of lexemes rather than of characters
The General Problem of Describing Syntax • Lexemes are partitioned into groups; for example, the names of variables, methods, and classes in a programming language form a group called identifiers • Each lexeme group is represented by a name, called a token • A token of a language is a category of its lexemes; for example, an identifier is a token that can have lexemes, or instances, such as sum and total • In some cases a token has only a single possible lexeme, for example the "+" arithmetic operator
The General Problem of Describing Syntax • For example, consider the following statement: index = 2*count+17; • The lexemes and tokens of this statement are:
    Lexeme    Token
    index     identifier
    =         equal_sign
    2         int_literal
    *         mult_op
    count     identifier
    +         plus_op
    17        int_literal
    ;         semicolon
Language Definitions • Languages can be formally defined in two distinct ways • By recognition • By generation
Language Recognizer • Suppose we have a language L that uses an alphabet ∑ of characters; to define L formally using the recognition method, we would need to construct a mechanism R, called a recognition device, capable of reading strings and indicating whether a given input string is in L or not • If R, when fed any string of characters over ∑, accepts it only if it is in L, then R is a description of L • This might seem like a lengthy and ineffective process
Language Recognizer • In practice, the syntax analysis part of a compiler is a recognizer for the language the compiler translates • In this role, the recognizer need not test all possible strings of characters from some set to determine whether each is in the language; rather, it need only determine whether given programs are in the language • In effect, then, the syntax analyzer determines whether the given programs are syntactically correct • The syntax analyzer is also known as the parser
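As a toy illustration (a minimal assumed sketch in Java, not a compiler's parser), here is a recognition device for the language L = { a^n b^n | n >= 1 } over the alphabet {a, b}:

    public class Recognizer {
        // Accepts a string only if it consists of one or more a's
        // followed by exactly the same number of b's
        static boolean accepts(String s) {
            int as = 0;
            while (as < s.length() && s.charAt(as) == 'a') { as++; }   // count leading a's
            for (int i = as; i < s.length(); i++) {
                if (s.charAt(i) != 'b') { return false; }              // only b's may follow
            }
            int bs = s.length() - as;
            return as >= 1 && as == bs;                                // equal, non-zero counts
        }

        public static void main(String[] args) {
            System.out.println(accepts("aabb"));   // true:  aabb is in L
            System.out.println(accepts("aab"));    // false: aab is not in L
        }
    }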
Language generators • A language generator is a device that can be used to generate the sentences of a language • There is a close connection between formal generation and recognition devices for the same language; we will discuss it later
Formal Methods for Describing Syntax • The formal language-generation mechanisms are called grammars; they are commonly used to describe the syntax of programming languages
Backus-Naur Form and Context-Free Grammars • In the middle-to-late 1950s, Noam Chomsky and John Backus, in separate research efforts, developed the same syntax description formalism, which became the most widely used method for describing programming language syntax
Context-Free Grammars • Chomsky described four classes of generative devices, or grammars, that define four classes of languages • Two of these grammar classes, named context-free and regular, turned out to be useful for describing the syntax of programming languages • The forms of the tokens of programming languages can be described by regular grammars • The syntax of whole programming languages, with minor exceptions, can be described by context-free grammars • His work was later applied to programming languages
Backus-Naur Form (BNF) • John Backus introduced a new formal notation for specifying programming language syntax • A metalanguage is a language that is used to describe another language; BNF is a metalanguage for programming languages
Backus-Naur Form (BNF) • BNF uses abstractions for syntactic structures; for example, a simple assignment statement might be represented by an abstraction like <assign> -> <var> = <expression> • The text on the left side of the arrow is the abstraction being defined • The text on the right side of the arrow is the definition of the abstraction • The right side consists of a mixture of tokens, lexemes, and references to other abstractions • Altogether, the definition is called a rule or production • For example, the statement Total = subtotal1 + subtotal2 is an instance of the <assign> abstraction
Backus-Naur Form (BNF) • The abstractions in a BNF description, or grammar, are often called non-terminal symbols, and the lexemes and tokens of the rules are called terminal symbols • A BNF description, or grammar, is a collection of rules, for example: • S -> AB • S -> ASB • A -> a • B -> b
Grammars and Derivations • The sentences of the language are generated through a sequence of applications of the rules, beginning with a special non-terminal of the grammar called the start symbol • This sequence of rule applications is called a derivation • S -> AB • S -> ASB • A -> a • B -> b • Using the alternation bar, the same grammar can be written more compactly: S -> AB | ASB • A -> a • B -> b
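For example, with S as the start symbol, the sentence aabb can be derived by replacing one non-terminal at each step:

    S => A S B => A A B B => a A B B => a a B B => a a b B => a a b b

This grammar generates exactly the strings of n a's followed by n b's, for n >= 1.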
Grammars and Derivations • Solve: what language is generated by the grammar S -> 01S | 0S1 | S01 | 10S | 1S0 | S10 | ε (where ε is the empty string)?
Grammars and Derivations • The derivation of a program in this language is as follows:
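As an illustration, assume a small grammar with <program> as its start symbol, in the style of the examples that follow (this is an assumed sketch, not necessarily the lecture's exact grammar):

    <program>    -> begin <stmt_list> end
    <stmt_list>  -> <stmt> | <stmt> ; <stmt_list>
    <stmt>       -> <var> = <expression>
    <var>        -> A | B | C
    <expression> -> <var> + <var> | <var> - <var> | <var>

One derivation of the program  begin A = B + C ; B = C end  is:

    <program> => begin <stmt_list> end
              => begin <stmt> ; <stmt_list> end
              => begin <var> = <expression> ; <stmt_list> end
              => begin A = <expression> ; <stmt_list> end
              => begin A = <var> + <var> ; <stmt_list> end
              => begin A = B + <var> ; <stmt_list> end
              => begin A = B + C ; <stmt_list> end
              => begin A = B + C ; <stmt> end
              => begin A = B + C ; <var> = <expression> end
              => begin A = B + C ; B = <expression> end
              => begin A = B + C ; B = <var> end
              => begin A = B + C ; B = C end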
Grammars and Derivations • The derivation begins with the start symbol <program> • Each successive string in the sequence is derived from the previous string by replacing one of the non-terminals with one of its definitions • Each of the strings in the derivation, including <program>, is called a sentential form • A sentential form consisting of only terminals, or lexemes, is the generated sentence
Grammars and Derivations • By choosing alternative RHSs of rules in the derivation, different sentences in the language can be generated • By exhaustively choosing all combinations of choices, the entire language can be generated • This language, like most others, is infinite, so one cannot generate all of its sentences in finite time
Grammars and Derivations • Let's have another example
Grammars and Derivations • The grammar describes assignment statements whose right sides are arithmetic expressions with multiplication and addition operators and parentheses, for example A = B * (A + C)
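A grammar of this kind might be written as follows (an assumed sketch consistent with the description above, not necessarily the slide's exact rules):

    <assign> -> <id> = <expr>
    <id>     -> A | B | C
    <expr>   -> <id> + <expr> | <id> * <expr> | ( <expr> ) | <id>

A leftmost derivation of  A = B * ( A + C )  in this grammar:

    <assign> => <id> = <expr>
             => A = <expr>
             => A = <id> * <expr>
             => A = B * <expr>
             => A = B * ( <expr> )
             => A = B * ( <id> + <expr> )
             => A = B * ( A + <expr> )
             => A = B * ( A + <id> )
             => A = B * ( A + C )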
Parse Trees • One of the most attractive features of grammars is that they naturally describe the hierarchical syntactic structure of the sentences of the languages they define • These hierarchical structures are called parse trees
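Using the assumed assignment grammar sketched above, the parse tree for A = B * ( A + C ) places <assign> at the root; each level of indentation below is one level deeper in the tree:

    <assign>
      <id>   A
      =
      <expr>
        <id>   B
        *
        <expr>
          (
          <expr>
            <id>   A
            +
            <expr>
              <id>   C
          )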