Explore the evolution of programming languages from machine language to high-level languages, including the factors that contribute to their success and the considerations in language design. Discover the different types of programming languages and their purposes.
Programming Languages (프로그래밍 언어론), Lecture Note #2 • 1st Semester, 2012 Academic Year • Yongju Cho (조용주) • ycho.smu@gmail.com
Introduction • Programming in Old Days • Earlier computers were monstrous devices • Filling several rooms • Consuming as much electricity as a good-size factory • Costing millions of 1940s dollars • Programmers were cheap • Programming in machine language • Machine language: • Sequence of bits that directly controls a processor, causing it to add, compare, move data from one place to another, and so on • E.g. calculating the greatest common divisor (GCD) of two integers (machine words shown in hexadecimal): 27bdffd0 afbf0014 0c1002a8 00000000 …
Programming in Old Days • The next step was writing in assembly language • Allow operations to be expressed with mnemonic abbreviations E.g. GCD program in MIPS assembly language addiu sp, sp, -32 sw ra, 20(sp) jal getint nop jal getint sw v0, 28(sp) lw a0, 28(sp) …
Programming in Old Days • Assembly language • One-to-one mapping between mnemonics and machine language instructions • Translation from mnemonics to machine language done by an assembler • Assemblers eventually augmented with “macro expansion” for common sequences of instructions • Easier than writing programs in machine language, but still difficult and machine-centered • People dreamed of machine-independent programming languages, particularly for numerical computation → Fortran • It caught on slowly, though • More high-level languages followed, such as Lisp and Algol
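The one-to-one mapping between machine words and mnemonics can be sketched with a tiny decoder. This is only an illustration, assuming the standard MIPS32 I-type layout (6-bit opcode, two 5-bit register numbers, 16-bit signed immediate). The function name `decode_itype` and the three-entry opcode table are hypothetical and cover just `addiu`, `lw`, and `sw`.

```python
# Hypothetical helper: decodes a few MIPS32 I-type instructions.
# Opcode numbers follow the MIPS32 encoding (addiu = 9, lw = 35, sw = 43).
OPCODES = {9: "addiu", 35: "lw", 43: "sw"}

def decode_itype(word):
    opcode = (word >> 26) & 0x3F   # top 6 bits
    rs = (word >> 21) & 0x1F       # base / source register number
    rt = (word >> 16) & 0x1F       # target register number
    imm = word & 0xFFFF            # low 16 bits
    if imm & 0x8000:               # sign-extend the immediate
        imm -= 0x10000
    name = OPCODES[opcode]         # KeyError for opcodes outside the table
    if name in ("lw", "sw"):
        return f"{name} ${rt}, {imm}(${rs})"
    return f"{name} ${rt}, ${rs}, {imm}"

print(decode_itype(0xAFBF0014))  # prints sw $31, 20($29)
```

Register 31 is ra and register 29 is sp, so the word afbf0014 from the machine-language listing corresponds to the assembly line sw ra, 20(sp).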
The Art of Language Design • Why are there so many programming languages? • Evolution – we’ve learned better ways of doing things over time • Steady progress in how things are done: goto-based control flow (Fortran, Cobol, Basic) → structured programming based on while loops and case statements (Algol, Pascal, Ada) → object-oriented programming (Smalltalk, C++, Eiffel) • Special Purpose • Designed for a specific problem domain • Lisp dialects – symbolic data and complex data structures • C – low-level systems programming • Prolog – reasoning about logical relationships among data • Can be used for other general work, but the focus is on their specialty • Orientation toward special hardware
The Art of Language Design • Personal Preference • Different people like different things • This makes it unlikely that anyone will ever develop a universally acceptable programming language • Diverse ideas about what is pleasant to use • Socio-economic factors • proprietary interests • commercial advantage
The Art of Language Design • What makes a language successful? • Expressive power • Debate over whether one language is more “powerful” than another • Language features have a huge impact on the programmer’s ability to write clear, concise, and maintainable code, especially for very large systems • Ease of use for the novice • Basic, Pascal, LOGO, Scheme • Java? • Ease of implementation • Basic’s success is also due to the fact that it can be implemented easily even on tiny machines with limited resources • Forth and Squeak • Pascal and Java
The Art of Language Design • Open Source • Widely available • C language and UNIX operating system • Excellent Compilers • Fortran’s success is largely due to extremely good compilers • Common Lisp is successful in part because it has good compilers and supporting tools • Economics, Patronage, and Inertia • Factors other than technical merit influence success • The backing of a powerful sponsor • Cobol and PL/I by IBM, Ada by DoD • C# backed by Microsoft • Some languages remain widely used because of their base of installed software and programmer expertise • wide dissemination at minimal cost (Pascal, Java)
The Art of Language Design • No one factor determines whether a language is “good” • Need to consider issues from several points of view, especially those of both the programmer and the language implementer • Why do we have programming languages? What is a language for? • way of thinking – way of expressing algorithms • languages from the user's point of view • abstraction of virtual machine – way of specifying what you want the hardware to do without getting down into the bits • languages from the implementer's point of view
The Programming Language Spectrum • The top-level division of programming languages • declarative • The focus is on what the computer is to do • properties are declared, but no execution sequence is specified • higher level – more in tune with the programmer’s point of view, and less with the implementer's point of view • imperative • focus is on how the computer should do it • action oriented • computation is viewed as a sequence of actions • predominate mainly for performance reasons
The Programming Language Spectrum • Declarative • functional: Lisp/Scheme, ML, Haskell • dataflow: Id, Val • logic: Prolog • constraint-based: VisiCalc, Excel • Imperative • von Neumann: Fortran, Pascal, Basic, C, … • object-oriented: Smalltalk, Eiffel, C++, Java
The Programming Language Spectrum • Functional languages • employ a computational model based on the recursive definition of functions • based on lambda calculus • a program is considered a function from inputs to outputs, defined in terms of simpler functions through a process of refinement • Iteration is often accomplished through recursion • Lisp, ML, Haskell
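As a small illustration of iteration expressed through recursion, the subtraction-based GCD can be written with no loop and no assignment. This is a sketch in Python rather than in Lisp, ML, or Haskell, and it assumes both arguments are positive integers.

```python
def gcd(a, b):
    """Subtraction-based GCD for positive integers, with no loop or assignment."""
    if a == b:
        return a
    if a > b:
        return gcd(a - b, b)   # the recursive call plays the role of the next iteration
    return gcd(a, b - a)

print(gcd(12, 18))  # prints 6
```

Each recursive call corresponds to one trip around a loop. Functional language implementations typically turn such tail calls into jumps. Python does not, so very deep recursion would hit its recursion limit.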
The Programming Language Spectrum • Dataflow languages • model computation as the flow of information (tokens) among primitive functional nodes • Conceptually or physically implements a directed graph of the data flowing between operations • nodes are triggered by the arrival of input tokens, and can operate concurrently • Id, Val, LabView
The Programming Language Spectrum • Logic or constraint-based languages • model computation as an attempt to find values that satisfy certain specified relationships, using goal-directed search through a list of logical rules • Prolog – best-known logic language • VisiCalc, Excel, Lotus 1-2-3
The Programming Language Spectrum • von Neumann languages • the basic means of computation is the modification of variables • sometimes described as computing via side effects • based on statements (assignments in particular) that influence subsequent computation by changing the value of memory • Fortran, Pascal, Basic, C • Side effects (from Wikipedia) • A function or expression is said to produce a side effect if it modifies some state in addition to returning a value
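A minimal sketch of the difference between a function with a side effect and one without. The names `counter`, `next_id`, and `square` are made up for this illustration.

```python
counter = 0  # state that lives outside the function

def next_id():
    """Returns a value and, as a side effect, modifies the global counter."""
    global counter
    counter += 1
    return counter

def square(x):
    """No side effect: the result depends only on the argument."""
    return x * x

first = next_id()
second = next_id()
print(first, second)          # prints 1 2: the same call gives different results
print(square(3), square(3))   # prints 9 9
```

Because `next_id` changes memory that later calls read, the order and number of calls matter. This is the sense in which von Neumann languages compute via side effects.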
The Programming Language Spectrum • Scripting Languages • A subset of the von Neumann languages • Distinguished by their emphasis on “gluing together” components that were originally developed as independent programs • csh, bash – input language for job control • awk – text manipulation • PHP and JavaScript – intended for the generation of web pages with dynamic content • Other script languages, such as Python, Perl, Ruby, Tcl – more generic
The Programming Language Spectrum • Object-oriented languages • closely related to von Neumann languages • picture computations as interactions among semi-independent objects, each of which has both its own internal state and executable functions to manage that state • Smalltalk, C++, Java
Why Study Programming Languages? • Interest and Practicality • Want to learn what goes on “under the hood” • Help you choose a language • Select the language that is most suitable for your needs • E.g. • C vs. C++ vs. C# for systems programming? • Fortran vs. C for scientific (or numerical) computations? • PHP or Ruby for a web-based application? • Ada vs. C for embedded systems? • Visual Basic or Java or C# for a graphical user interface?
Why Study Programming Languages? • Make it easier to learn new languages • Some languages are similar; easy to walk down the family tree (see http://oreilly.com/news/graphics/prog_lang_poster.pdf) • Many languages are closely related • Java and C# • Common Lisp and Scheme • Haskell if you already know ML • Basic concepts underlie all programming languages, such as types, control (iteration, selection, recursion, concurrency), abstraction, and naming • Easier to assimilate the syntax (form) and semantics (meaning) of new languages
Possible Answers to “Why Study Programming Languages – Design and Implementation?” • Understanding obscure features • An understanding of basic concepts makes it easier to understand these features when you look up the details in the manual • In C, help you understand unions, arrays & pointers, separate compilation, varargs, catch and throw • In Common Lisp, help you understand first-class functions/closures, streams, catch and throw, symbol internals • In C++, unions, multiple inheritance, variable numbers of arguments
Possible Answers to “Why Study Programming Languages – Design and Implementation?” • Choose among alternative ways to express things, or make better use of whatever language you use • Understand implementation costs: choose between alternative ways of doing things, based on knowledge of what will be done underneath: • use simple arithmetic equivalents (x*x instead of x**2) in Basic • use C pointers or the Pascal "with" statement to factor address calculations • avoid call by value with large data items in Pascal • avoid the use of call by name in Algol 60 • choose between computation and table lookup (e.g. for the cardinality operator in C or C++)
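The choice between computation and table lookup can be sketched for a cardinality (set-bit count) operation on a set stored as a bit vector. This is an illustration in Python, assuming non-negative integers. The names `cardinality_loop`, `cardinality_table`, and `BYTE_COUNTS` are hypothetical.

```python
def cardinality_loop(x):
    """Counts set bits by clearing the lowest set bit until none remain."""
    count = 0
    while x:
        x &= x - 1
        count += 1
    return count

# Precomputed table: number of set bits in every 8-bit value.
BYTE_COUNTS = [cardinality_loop(b) for b in range(256)]

def cardinality_table(x):
    """Counts set bits one byte at a time using the lookup table."""
    count = 0
    while x:
        count += BYTE_COUNTS[x & 0xFF]
        x >>= 8
    return count

print(cardinality_loop(0b10110010), cardinality_table(0b10110010))  # prints 4 4
```

The loop costs one iteration per set bit, while the table costs one memory access per byte plus 256 entries of storage. Which is cheaper depends on the machine, which is exactly the kind of implementation knowledge this slide refers to.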
Possible Answers to “Why Study Programming Languages – Design and Implementation?” • Help you make better use of whatever language you use • Figure out how to do things in languages that don't support them explicitly: • lack of suitable control structures in Fortran: use comments and programmer discipline for control structures • lack of recursion in Fortran etc.: write a recursive algorithm, then use mechanical recursion elimination (even for things that aren't quite tail recursive) • iterators of Clu, C#, Python, and Ruby can be imitated with subroutines and static variables
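One common hand transformation for removing recursion replaces the call stack with an explicit stack. The sketch below sums a nested list both ways. The function names are made up for this illustration, and the example is a simplified stand-in for the general technique.

```python
def total_recursive(items):
    """Sums a nested list of numbers recursively."""
    result = 0
    for item in items:
        if isinstance(item, list):
            result += total_recursive(item)
        else:
            result += item
    return result

def total_iterative(items):
    """Same computation with the call stack replaced by an explicit stack."""
    result = 0
    stack = [items]                  # pending lists stand in for activation records
    while stack:
        current = stack.pop()
        for item in current:
            if isinstance(item, list):
                stack.append(item)   # defer the sublist instead of recursing
            else:
                result += item
    return result

data = [1, [2, 3, [4]], 5]
print(total_recursive(data), total_iterative(data))  # prints 15 15
```

The iterative version visits the sublists in a different order, which is harmless here because addition is commutative. An order-sensitive computation would need the stack to record the resume position as well.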
Possible Answers to “Why Study Programming Languages – Design and Implementation?” • Help you make better use of whatever language you use • Figure out how to do things in languages that don't support them explicitly: • lack of named constants and enumerations in Fortran: use variables that are initialized once, then never changed • lack of modules in C and Pascal: use comments and programmer discipline • lack of iterators in just about everything: fake them with (member?) functions
Possible Answers to “Why Study Programming Languages – Design and Implementation?” • Make good use of debuggers, assemblers, linkers, and related tools • Simulate useful features in languages that lack them • Iterators can be implemented using static variables and functions • In Fortran 77, which lacks recursion, an iterative program can be derived via mechanical hand transformations, starting with recursive pseudocode • Understand the interactions of languages with operating systems and architectures • Make better use of language technology wherever it appears • Code to parse, analyze, generate, optimize, and otherwise manipulate structured data can be found in many other programs
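A sketch of an iterator imitated with plain functions and "static" state. Python has real iterators, so this only mimics what one would write in a language without them. All names (`range_start`, `range_has_next`, `range_next`) are hypothetical.

```python
# "Static" state kept outside the functions, as one would do in a language
# without built-in iterators.
_current = 0
_limit = 0

def range_start(low, high):
    """Initializes the iteration over low, low+1, ..., high-1."""
    global _current, _limit
    _current, _limit = low, high

def range_has_next():
    """Reports whether another element is available."""
    return _current < _limit

def range_next():
    """Returns the current element and advances the saved position."""
    global _current
    value = _current
    _current += 1
    return value

range_start(3, 6)
collected = []
while range_has_next():
    collected.append(range_next())
print(collected)  # prints [3, 4, 5]
```

Because the position lives in shared static variables, only one such iteration can be active at a time. True iterators avoid that limitation by keeping the state per iterator.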
Possible Answers to “Why Study Programming Languages – Design and Implementation?” • Many system programs and applications have a language-like flavor to them • Knowing about “real” languages will make it easier to use these language-like relatives, and to design things like them • command interpreters (Unix shells, DOS’ command line interface) • report-generating systems (RPG, Awk) • programmable editors (emacs) • programmable applications (HyperCard, VisiCalc, Excel) • configuration files and command-line options
Compilation and Interpretation • Compilation vs. interpretation • not opposites • not a clear-cut distinction • Pure Compilation • The compiler translates the high-level source program into an equivalent target program (typically in machine language), and then goes away:
Compilation and Interpretation • Pure Interpretation • Interpreter stays around for the execution of the program • Interpreter is the locus of control during execution
Compilation and Interpretation • Interpretation • Greater flexibility • Better diagnostics (error messages) • types and sizes of variables may depend on the input data • even which names refer to which variables may depend on the input data • In Smalltalk, all type checking is delayed until run time. References to objects of arbitrary types (classes) can then be assigned into arbitrary named variables • Compilation • Better performance
Compilation and Interpretation • Common case is compilation or simple pre-processing, followed by interpretation • Most language implementations include a mixture of both compilation and interpretation • Two characteristics differentiating compilation from interpretation • Thorough analysis • Nontrivial transformation – no strong resemblance to the source
Compilation and Interpretation – Implementation Strategies • Most interpreted languages employ an initial translator (a preprocessor) • removes comments and white spaces • groups characters together into tokens such as keywords, identifiers, numbers, and symbols • may expand abbreviations in the style of a macro assembler • may identify higher-level syntactic structures, such as loops and subroutines => to produce an intermediate form that mirrors the structure of the source, but can be interpreted more efficiently
Compilation and Interpretation – Implementation Strategies • A typical Fortran implementation comes close to pure compilation • the compiler translates Fortran source into some intermediate machine code (object code) • it relies on the existence of a library of subroutines that are not part of the original program, such as mathematical functions and I/O • the linker merges the appropriate library routines into the final program
Compilation and Interpretation – Implementation Strategies • Note that compilation does NOT have to produce machine language for some sort of hardware • Compilation is translation from one language into another, with full analysis of the meaning of the input • Compilation entails semantic understanding of what is being processed; pre-processing does not • A pre-processor will often let errors through • A compiler hides further steps; a pre-processor does not
Compilation and Interpretation – Implementation Strategies • Compilers may generate assembly language instead of machine language
Compilation and Interpretation – Implementation Strategies • Compilers for the C language begin with a preprocessor that removes comments and expands macros • The preprocessor can also be instructed to delete portions of the code itself, using the conditional compilation facility
Compilation and Interpretation – Implementation Strategies • Early C++ implementations (at AT&T Bell Labs) generated an intermediate program in C • The C++ compiler was nevertheless a true compiler: it performed a complete analysis of the syntax and semantics of the C++ source program
Compilation and Interpretation – Implementation Strategies • Many early Pascal compilers were built around a set of tools distributed by Niklaus Wirth including • a Pascal compiler, written in Pascal, that would generate output in P-code, a simple stack-based language • the same compiler, already translated into P-code • a P-code interpreter, written in Pascal
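To give a feel for what interpreting a simple stack-based language involves, here is a minimal sketch of a stack machine. The three instructions (`push`, `add`, `mul`) are hypothetical and are not actual P-code. The sketch only shows the general shape of such an interpreter.

```python
def run(program):
    """Interprets a list of (opcode, operand) pairs on an operand stack."""
    stack = []
    for op, arg in program:
        if op == "push":
            stack.append(arg)
        elif op == "add":
            b = stack.pop()
            a = stack.pop()
            stack.append(a + b)
        elif op == "mul":
            b = stack.pop()
            a = stack.pop()
            stack.append(a * b)
        else:
            raise ValueError(f"unknown opcode: {op}")
    return stack.pop()

# (2 + 3) * 4 in postfix order
program = [("push", 2), ("push", 3), ("add", None), ("push", 4), ("mul", None)]
print(run(program))  # prints 20
```

An interpreter this small is easy to rewrite for a new machine, which suggests why shipping a compiler already translated into a stack code plus a small interpreter made the toolset easy to port.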
Compilation and Interpretation – Implementation Strategies • Compilation of Interpreted Languages • The compiler generates code that makes assumptions about decisions that won’t be finalized until runtime. If these assumptions are valid, the code runs very fast. If not, a dynamic check will revert to the interpreter. • Dynamic and Just-in-Time Compilation • In some cases a programming system may deliberately delay compilation until the last possible moment. • Lisp or Prolog invoke the compiler on the fly, to translate newly created source into machine language, or to optimize the code for a particular input set. • The Java language definition defines a machine-independent intermediate form known as byte code. Byte code is the standard format for distribution of Java programs. • The main C# compiler produces .NET Common Intermediate Language (CIL), which is then translated into machine code immediately prior to execution.
Compilation and Interpretation – Implementation Strategies • Microcode • Assembly-level instruction set is not implemented in hardware; it runs on an interpreter. • Interpreter is written in low-level instructions (microcode or firmware), which are stored in read-only memory and executed by the hardware. • Compilers exist for some interpreted languages, but they aren't pure: • selective compilation of compilable pieces and extra-sophisticated pre-processing of remaining source. • Unconventional compilers • text formatters (TeX, troff, PostScript) • silicon compilers • query language processors
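Invoking a compiler on the fly can be tried directly in Python, whose built-in `compile` translates source text created at run time. Note the assumption: CPython produces byte code for its own virtual machine, not machine language, so this only illustrates the idea of delayed compilation. The function `double` is made up for the example.

```python
import dis

source = "def double(x):\n    return x + x\n"   # source text created at run time

code = compile(source, "<generated>", "exec")   # translate to a code object now
namespace = {}
exec(code, namespace)                           # run the definitions it contains

print(namespace["double"](21))   # prints 42
dis.dis(namespace["double"])     # lists the byte code (output varies by Python version)
```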
Programming Environment Tools • Tools
Compilation • Typical compilation proceeds through a series of well-defined phases • The first few phases (up through semantic analysis) serve to figure out the meaning of the source program; they are sometimes called the front end • The last few phases serve to construct an equivalent target program; they are sometimes called the back end • A pass is a phase or set of phases that is serialized with respect to the rest of compilation: • a new pass is not started until the previous phases have completed • Two reasons for using passes in compilation • Share the front end and/or back end • Reduce memory requirements
Compilation Phases of Compilation
Lexical and Syntax Analysis • Scanning: • divides the program into "tokens", which are the smallest meaningful units; this saves time, since character-by-character processing is slow • we can tune the scanner better if its job is simple; it also saves complexity (lots of it) for later stages • you can design a parser to take characters instead of tokens as input, but it isn't pretty • scanning is recognition of a regular language, e.g., via DFA (Deterministic Finite Automaton) that recognizes the tokens of a programming language
Lexical and Syntax Analysis • Parsing is recognition of a context-free language, e.g., via PDA (Push-down automaton) • Parsing discovers the "context free" structure of the program • Informally, it finds the structure you can describe with syntax diagrams (the "circles and arrows" in a Pascal manual)
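A recursive-descent parser is one simple way to recognize a context-free language. The recursion of the host language plays the role of the push-down stack. The sketch below assumes a made-up two-rule grammar for parenthesized sums and differences, and takes an already-scanned list of tokens. It is not the Pascal grammar.

```python
def parse_expr(tokens, pos=0):
    """expr -> term { ('+' | '-') term }; returns (tree, next position)."""
    tree, pos = parse_term(tokens, pos)
    while pos < len(tokens) and tokens[pos] in ("+", "-"):
        op = tokens[pos]
        right, pos = parse_term(tokens, pos + 1)
        tree = (op, tree, right)
    return tree, pos

def parse_term(tokens, pos):
    """term -> NUMBER | '(' expr ')'"""
    if pos >= len(tokens):
        raise SyntaxError("unexpected end of input")
    tok = tokens[pos]
    if tok == "(":
        tree, pos = parse_expr(tokens, pos + 1)   # nesting handled by recursion
        if pos >= len(tokens) or tokens[pos] != ")":
            raise SyntaxError("expected ')'")
        return tree, pos + 1
    if tok.isdigit():
        return int(tok), pos + 1
    raise SyntaxError(f"unexpected token {tok!r}")

def parse(tokens):
    """Parses a complete token list and rejects trailing tokens."""
    tree, pos = parse_expr(tokens, 0)
    if pos != len(tokens):
        raise SyntaxError(f"unexpected token {tokens[pos]!r}")
    return tree

print(parse(["1", "-", "(", "2", "+", "3", ")"]))  # prints ('-', 1, ('+', 2, 3))
```

Arbitrarily deep nesting of parentheses is what a finite automaton cannot track, which is why parsing needs more machinery than scanning.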
Lexical and Syntax Analysis • Semantic analysis is the discovery of meaning in the program • The compiler actually does what is called STATIC semantic analysis. That's the meaning that can be figured out at compile time • Some things (e.g., array subscript out of bounds) can't be figured out until run time. Things like that are part of the program's DYNAMIC semantics • Intermediate form (IF) is generated after semantic analysis (if the program passes all checks) • IFs are often chosen for machine independence, ease of optimization, or compactness (these are somewhat contradictory) • They often resemble machine code for some imaginary idealized machine; e.g. a stack machine, or a machine with arbitrarily many registers • Many compilers actually move the code through more than one IF
Lexical and Syntax Analysis • Optimization takes an intermediate-code program and produces another one that does the same thing faster, or in less space • The term is a misnomer; we just improve code • The optimization phase is optional • Code generation phase produces assembly language or (sometimes) relocatable machine language • Certain machine-specific optimizations (use of special instructions or addressing modes, etc.) may be performed during or after target code generation
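One simple example of such a code improvement is constant folding, sketched here on a tuple-based expression tree that stands in for an intermediate form. Both the tree shape `(op, left, right)` and the function name `fold` are assumptions made for this illustration. Real compilers use richer intermediate forms.

```python
def fold(tree):
    """Replaces operator nodes whose operands are both constants with their value."""
    if not isinstance(tree, tuple):
        return tree                      # a constant or a variable name
    op, left, right = tree
    left, right = fold(left), fold(right)
    if isinstance(left, int) and isinstance(right, int):
        if op == "+":
            return left + right
        if op == "*":
            return left * right
    return (op, left, right)

# x * (2 + 3)  becomes  x * 5
print(fold(("*", "x", ("+", 2, 3))))  # prints ('*', 'x', 5)
```

The output computes the same thing as the input with one operation fewer, which matches the point above that the code is improved rather than made optimal.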
Lexical and Syntax Analysis • GCD program in C++:
#include <iostream>
using namespace std;
int i, j;
int main() {
    cin >> i >> j;
    while (i != j) {
        if (i > j) i = i - j;
        else j = j - i;
    }
    cout << i;
}
• The same program in Pascal:
program gcd(input, output);
var i, j : integer;
begin
    read(i, j);
    while i <> j do
        if i > j then i := i - j
        else j := j - i;
    writeln(i)
end.
• Symbol table: all phases rely on a symbol table that keeps track of all the identifiers in the program and what the compiler knows about them • This symbol table may be retained (in some form) for use by a debugger, even after compilation has completed
Lexical and Syntax Analysis • Scanning (lexical analysis) and parsing • GCD Program Tokens • the scanner reads characters (‘p’, ‘r’, ‘o’, ‘g’, ‘r’, ‘a’, ‘m’, ‘ ’, ‘g’, ‘c’, ‘d’, etc.) and divides the program into “tokens”, which are the smallest meaningful units of the program: program gcd ( input , output ) ; var i , j : integer ; begin • the purpose of the scanner is to simplify the task of the parser, by reducing the size of the input and by removing extraneous characters and comments • It may produce a listing if desired, and tag tokens with line and column numbers, to make it easier to generate good diagnostics in later phases
Lexical and Syntax Analysis • Context-Free Grammar and Parsing • Parsing organizes tokens into a parse tree that represents higher-level constructs in terms of their constituents • Potentially recursive rules known as a context-free grammar define the ways in which these constituents combine • A Pascal program consists of the keyword program, followed by an identifier (the program name), a parenthesized list of files, a semicolon, a series of definitions, and the main begin … end block, terminated by a period:
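A scanner for the opening of the GCD program can be sketched with Python's `re` module, since regular expressions describe exactly the kind of regular language a scanner recognizes. The token classes below are an assumption covering only a small Pascal-like subset, and `scan` is a made-up name. A production scanner would also record line and column numbers.

```python
import re

# Token classes for a small Pascal-like subset (an assumption for this sketch).
TOKEN_RE = re.compile(r"""
    (?P<space>\s+)                      # white space is discarded
  | (?P<ident>[A-Za-z][A-Za-z0-9]*)     # keywords and identifiers
  | (?P<number>[0-9]+(?![A-Za-z0-9]))   # digits not followed by a letter or digit
  | (?P<symbol>:=|<>|[();,:.<>-])
""", re.VERBOSE)

def scan(text):
    """Splits text into tokens; raises SyntaxError on a malformed token."""
    tokens = []
    pos = 0
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if m is None:
            raise SyntaxError(f"malformed token at position {pos}")
        if m.lastgroup != "space":
            tokens.append(m.group())
        pos = m.end()
    return tokens

print(scan("program gcd(input, output); var i, j : integer; begin"))
```

The printed list is program, gcd, (, input, a comma, output, ), ;, var, i, a comma, j, :, integer, ;, begin. An input such as 123abc is rejected with an error instead of being silently split into a number and an identifier.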
Lexical and Syntax Analysis • Context-Free Grammar and Parsing • Example (Pascal program)
Lexical and Syntax Analysis • A context-free grammar is said to define the syntax of the language; parsing is therefore known as syntactic analysis • In the process of scanning and parsing, the compiler checks that all of the program’s tokens are well-formed, and that the sequence of tokens conforms to the syntax defined by the context-free grammar • Any malformed token (e.g., 123abc or $@foo in Pascal) should cause the scanner to produce an error message • Any syntactically invalid token sequence (e.g., A := B C D in Pascal) should lead to an error message from the parser