COMP30330 2009-2010
Compiler Construction
Lecturer: Dr. Arthur Cater
Teaching Assistant: Santiago Villalba
Demonstrator: Zeeshan Ahmed
http://csiweb.ucd.ie/staff/acater/comp30330.html

Admin issues
• 24 lectures, 3 assignments, 11 practical sessions
• 1st assignment set on Thursday 21 Jan, due on Friday 12 Feb, worth 10%
• 2nd assignment set on Thursday 4 Feb, due on Friday 26 March, worth 20%
• 3rd assignment set on Thursday 4 March, due on Friday 23 April, worth 20%
• A 2-hour exam will take place after the end of the semester.
• Book: “Compilers: Principles, Techniques and Tools” by Aho, Lam, Sethi & Ullman, 2nd edition
• Each student should register for either the Monday practicals or the Tuesday practicals.
• Attendance records will be kept.
• A Moodle page for the module exists at http://csimoodle.ucd.ie/moodle/course/view.php?id=98

What does a compiler do?
• Compilers translate programs written in a “high-level language” (HLL) into some other form
• That other form may be
  • machine code that can be directly executed by computer hardware
  • relocatable binary, needing more work on address references
  • assembly code, which still needs assembling, etc.
  • code for a virtual machine, such as the JVM for Java
  • equivalent code in another HLL, such as C

Front-end and Back-end
• The job of translating a program from one form to another is often broken down into two major stages:
  • Front-end: Analysing the source program, determining
    • how its characters form words,
    • how its words form statements, procedures, class definitions, etc.
    • how its statements etc. conform to language rules, such as
      • using only declared variables,
      • using operands of proper type for operators
    • … and reporting statically detectable errors in the program if they exist
  • Back-end: Generating an equivalent program in the target language
• Multiple implementations of a source language for different computers may share a front end and match it with different back ends.

Compilers vs Interpreters
Interpreters do not translate programs; rather, they simulate them.
[Diagram. Compiler: source program → Compiler (&c) → target program; input → target program → output. Interpreter: source program + input → Interpreter → output.]
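
The contrast can be sketched in a few lines of Python. This is only an illustration, not material from the slides: the tuple-based expression form, the tiny stack machine and the names `interpret`, `compile_expr` and `run` are all assumptions made for the example.

```python
# Expressions are nested tuples (a hypothetical source form):
#   ("num", 3), ("var", "x"), ("add", a, b), ("mul", a, b)

def interpret(expr, env):
    """Interpreter: walk the source form and compute the output directly."""
    kind = expr[0]
    if kind == "num":
        return expr[1]
    if kind == "var":
        return env[expr[1]]
    left, right = interpret(expr[1], env), interpret(expr[2], env)
    return left + right if kind == "add" else left * right


def compile_expr(expr):
    """Compiler: translate the source form into code for a tiny stack machine."""
    kind = expr[0]
    if kind == "num":
        return [("PUSH", expr[1])]
    if kind == "var":
        return [("LOAD", expr[1])]
    op = "ADD" if kind == "add" else "MUL"
    return compile_expr(expr[1]) + compile_expr(expr[2]) + [(op,)]


def run(code, env):
    """Stand-in for the target machine: executes target code, never sees the source."""
    stack = []
    for instr in code:
        if instr[0] == "PUSH":
            stack.append(instr[1])
        elif instr[0] == "LOAD":
            stack.append(env[instr[1]])
        else:
            right, left = stack.pop(), stack.pop()
            stack.append(left + right if instr[0] == "ADD" else left * right)
    return stack.pop()


source = ("add", ("num", 2), ("mul", ("var", "x"), ("num", 3)))  # 2 + x * 3
target = compile_expr(source)  # translated once, can be run on many inputs
```

Here `interpret(source, env)` needs the source program and the input together, whereas `run(target, env)` needs only the target program produced earlier by `compile_expr`.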

Some special varieties of compiler
• Cross compiler
• Debugging compiler
• Optimizing compiler
• Batch compiler
• Load-and-go compiler
It is quite common for a compiler for a language to be written in that same language.

Phases of a typical compiler
source program
→ lexical analyzer → token stream
→ syntax analyzer → syntax tree
→ semantic analyzer → syntax tree
→ intermediate code generator → intermediate representation
→ Machine-Independent Code Optimizer → intermediate representation
→ Code Generator → target-machine code
→ Machine-Dependent Code Optimizer → target-machine code
Symbol table: shown alongside the phases

Software relatives
Various other software tools perform analysis functions similar to a compiler’s
• Syntax-directed editors
  • automatically insert text fragments near reserved words
• Prettyprinters and colorizers
• Static checkers
  • look for e.g.
    • unreachable code,
    • undeclared / unused variables,
    • datatype mismatches
• HTML / XML browsers

Metalanguages
Chomsky hierarchy of types of language, distinguished by what restrictions may be placed on “productions” in an adequately descriptive “generative grammar”:
• Type 0 (unrestricted): any LHS may be replaced by any RHS
  aXbYc → Pqr
• Type 1 (context-sensitive): a single nonterminal in the context of a LHS may be replaced by anything else in the same context
  aXbYc → aZwbYc
• Type 2 (context-free): a LHS may mention only a single nonterminal
  X → pQr
• Type 3 (regular): a LHS may mention only a single nonterminal and a RHS may mention at most one terminal followed by at most one nonterminal
  X → pY
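
The restrictions above can be checked mechanically for a single production. The following Python sketch is an illustration only: the function name `production_type` is hypothetical, and it assumes the common convention that uppercase letters are nonterminals and lowercase letters are terminals. It reports the most restrictive type whose rule the production satisfies.

```python
import re


def production_type(lhs, rhs):
    """Return 3, 2, 1 or 0: the most restrictive type this production fits."""
    if len(lhs) == 1 and lhs.isupper():
        # Type 3: RHS is at most one terminal followed by at most one nonterminal.
        if re.fullmatch(r"[a-z]?[A-Z]?", rhs):
            return 3
        # Type 2: LHS is a single nonterminal, RHS is unrestricted.
        return 2
    # Type 1: some nonterminal in the LHS is rewritten to a non-empty string
    # while the symbols to its left and right (its context) stay unchanged.
    for i, symbol in enumerate(lhs):
        if symbol.isupper():
            left, right = lhs[:i], lhs[i + 1:]
            middle_length = len(rhs) - len(left) - len(right)
            if middle_length >= 1 and rhs.startswith(left) and rhs.endswith(right):
                return 1
    # Type 0: no restriction satisfied.
    return 0
```

For instance, `production_type("aXbYc", "aZwbYc")` finds that X is rewritten to Zw between the unchanged contexts a and bYc, so the production fits the Type 1 rule.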

Relevance of types of language
• Regular languages are often used to describe the word-level syntax of a HLL
  • rules for valid identifiers, numbers, strings, reserved words, etc.
  • finite-state automata can recognise regular languages
  • tools such as ‘lex’, ‘Flex’, ‘ANTLR’, ‘JavaCC’ can build a lexical analyzer program (lexer, scanner) when supplied with a regular grammar describing the desired regular language; or hand coding can be used
• Context-free languages are used to describe the phrase-level syntax of a HLL
  • rules for expressions, statements, compound statements, conditionals, etc.
  • push-down automata can recognise context-free languages
  • tools such as ‘Yacc’, ‘Bison’, ‘ANTLR’, ‘JavaCC’ can build a syntax analyzer program (parser) when supplied with a context-free grammar describing the desired context-free language; or hand coding can be used
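
As a hand-coded illustration of a finite-state automaton recognising word-level syntax, the Python sketch below drives a small transition table over the input. The state names, the character classes, the token kinds ("id", "num", "op") and the function `tokenize` are assumptions chosen for this example; a generated lexer from a tool such as Flex would be organised differently.

```python
# Transition table of a small deterministic finite-state automaton.
TRANSITIONS = {
    ("start", "letter"): "ident",
    ("start", "digit"): "number",
    ("ident", "letter"): "ident",
    ("ident", "digit"): "ident",
    ("number", "digit"): "number",
}
# Accepting states and the kind of token each one recognises.
ACCEPTING = {"ident": "id", "number": "num"}


def char_class(ch):
    """Map a character to the input class used by the transition table."""
    if ch.isalpha() or ch == "_":
        return "letter"
    if ch.isdigit():
        return "digit"
    return "other"


def tokenize(text):
    """Split text into (kind, lexeme) pairs, taking the longest match each time."""
    tokens, pos = [], 0
    while pos < len(text):
        if text[pos].isspace():
            pos += 1
            continue
        state, end = "start", pos
        # Follow transitions for as long as the automaton can move.
        while end < len(text) and (state, char_class(text[end])) in TRANSITIONS:
            state = TRANSITIONS[(state, char_class(text[end]))]
            end += 1
        if state in ACCEPTING:
            tokens.append((ACCEPTING[state], text[pos:end]))
            pos = end
        else:
            # No transition from the start state: emit a one-character token.
            tokens.append(("op", text[pos]))
            pos += 1
    return tokens
```

Calling `tokenize("count1 = 42+x")` classifies `count1` and `x` as identifiers and `42` as a number, with `=` and `+` passed through as single-character tokens.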

Beyond Recognition
• A Finite State Automaton can classify character sequences as numbers, ids, etc.
• A parser can operate simply at the level of token sequences.
• But mere yes/no judgements are not what is required of lexers and parsers.
• Associating “semantic actions” with grammar productions allows lexers and parsers to
  • maintain a symbol table, distinguishing different identifiers
  • build the values of numeric expressions
  • generate simple code as a by-product of parsing
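
To illustrate building the values of numeric expressions as a by-product of parsing, here is a minimal hand-written recursive-descent parser in Python. The grammar (integers, `+`, `-`, `*`, parentheses) and the function name `evaluate` are assumptions chosen for the sketch; each parsing function returns a value where a pure recogniser would only return yes or no. For brevity the regular-expression tokenizer silently skips characters it does not recognise.

```python
import re


def evaluate(text):
    """Parse an arithmetic expression and compute its value while parsing."""
    tokens = re.findall(r"\d+|[-+*()]", text)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        pos += 1
        return tokens[pos - 1]

    def expr():
        # expr -> term { ("+" | "-") term }   action: add or subtract
        value = term()
        while peek() in ("+", "-"):
            op = advance()
            right = term()
            value = value + right if op == "+" else value - right
        return value

    def term():
        # term -> factor { "*" factor }       action: multiply
        value = factor()
        while peek() == "*":
            advance()
            value = value * factor()
        return value

    def factor():
        # factor -> NUMBER | "(" expr ")"     action: convert or pass through
        token = peek()
        if token == "(":
            advance()
            value = expr()
            if peek() != ")":
                raise SyntaxError("expected ')'")
            advance()
            return value
        if token is not None and token.isdigit():
            advance()
            return int(token)
        raise SyntaxError("expected a number or '('")

    result = expr()
    if peek() is not None:
        raise SyntaxError("unexpected token " + peek())
    return result
```

Replacing the arithmetic in each action with code that appends instructions to a list would turn the same parser into a simple code generator.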

Symbol table
• An unsophisticated symbol table may have the following form:
  0:0 1:4 2:9 3:\\ 4:\\ 5:\\ 6:\\
Semantic actions associated with state transitions in a finite state automaton can accumulate characters in a buffer, then, at an accepting state, look the buffered text up in the symbol table and insert a new entry if no match is found.
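
The accumulate, look up, insert behaviour described above can be sketched in Python as follows. This is an assumption-laden illustration rather than the layout shown on the slide: the class `SymbolTable`, its `lookup_or_insert` method and the function `scan_identifiers` are hypothetical names, and a dictionary stands in for whatever search structure a real table would use.

```python
class SymbolTable:
    """Maps each distinct identifier to a small integer entry number."""

    def __init__(self):
        self.names = []   # entry number -> identifier text
        self.index = {}   # identifier text -> entry number

    def lookup_or_insert(self, name):
        """Return the entry for name, inserting a new one if no match is found."""
        if name not in self.index:
            self.index[name] = len(self.names)
            self.names.append(name)
        return self.index[name]


def scan_identifiers(text, table):
    """Return the symbol-table entry number of each identifier in text."""
    entries, buffer = [], []
    for ch in text + " ":  # the trailing space flushes a final identifier
        # A letter or underscore may start or continue an identifier;
        # a digit may only continue one that has already started.
        in_identifier = ch.isalpha() or ch == "_" or (bool(buffer) and ch.isdigit())
        if in_identifier:
            buffer.append(ch)  # action on the transition: accumulate
        elif buffer:
            # Action on leaving the identifier state: look up, insert if new.
            entries.append(table.lookup_or_insert("".join(buffer)))
            buffer.clear()
    return entries
```

Scanning `"sum = sum + x1 * y"` with a fresh table yields the entries 0, 0, 1, 2: the second occurrence of `sum` matches the existing entry instead of creating a new one.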