Lesson 1 CDT301 – Compiler Theory, Spring 2011 Teacher: Linus Källberg
Outline • Introduction to compilers • Regular languages • Regular expressions • Finite automata • Grammars
Why study compiler theory? • Easily create language processors • Parser for configuration files (e.g. XML) • Translator from one language to another • Command line interpreter • Etc...
Why study compiler theory? • Deeper understanding of compilers • How to write efficient code • Understand design decisions in a language
Why study compiler theory? • Improve your programming skills • Top-down design • Thinking “recursively” • Processing of data structures • Easier to learn new languages
History • 1st generation: programming in binary • 2nd generation: assembly code • 3rd generation: structured languages • Fortran, Ada, C, … • Increased productivity • Reduced logical errors
History • 3rd generation required compilers • First Fortran compiler took 18 man-years to complete • Today: a student can develop a compiler in a 10-week course!
Lexical analysis • Characters → tokens • Lexemes
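The step from characters to tokens can be sketched in a few lines of Python. This is only an illustration: the token classes (NUMBER, IDENT, OP) and the tiny input language are assumptions made for the example, not part of the course material. Each token is reported as a pair of token class and lexeme.

```python
import re

# Hypothetical token classes for a tiny illustrative language.
TOKEN_SPEC = [
    ("NUMBER", r"\d+"),
    ("IDENT", r"[A-Za-z_][A-Za-z0-9_]*"),
    ("OP", r"[+\-*/=()]"),
    ("SKIP", r"\s+"),  # whitespace separates lexemes but yields no token
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def tokenize(text):
    """Return a list of (token class, lexeme) pairs for text."""
    tokens = []
    pos = 0
    while pos < len(text):
        match = MASTER.match(text, pos)
        if match is None:
            raise ValueError(f"unexpected character {text[pos]!r} at position {pos}")
        if match.lastgroup != "SKIP":
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens


print(tokenize("tmp1 = x + 42"))
# [('IDENT', 'tmp1'), ('OP', '='), ('IDENT', 'x'), ('OP', '+'), ('NUMBER', '42')]
```

Here "tmp1" and "42" are lexemes, while IDENT and NUMBER are the token classes they belong to.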
Syntactical analysis (parsing) • Tokens → syntax tree
Definition: alphabet • Finite set of symbols • Examples: • Latin alphabet: { a, …, z, A, …, Z } • Decimal digits: { 0, …, 9 } • Binary digits: { 0, 1 } • Often denoted Σ
Definition: string • Sequence of symbols from an alphabet • Examples: • ”Hello” over { a, …, z, A, …, Z } • ”16332” over { 0, …, 9 } • Length of a string: |Hello| = 5 • The empty string: ε, |ε| = 0
Definition: language • (In)finite set of strings • Examples: • { January, …, December } • Alphabet: { a, …, z, A, …, Z } • { 0, …, 9, 10, …, 19, 20, … } • Alphabet: { 0, …, 9 } • The empty language: Ø • Does not even contain ε
Operations on languages Let L1 = { ab, cd } and L2 = { ij, kl } • Concatenation: • L1L2 = { abij, abkl, cdij, cdkl } • L2L1 = { ijab, ijcd, klab, klcd } • Union: • L1 ∪ L2 = { ab, cd, ij, kl } • Kleene closure: • L1* = { ε, ab, cd, abab, abcd, cdab, cdcd, abcdab, cdcdab, abcdcdcdab, ... }
Examples of regular languages • Keywords • if, while, public, … • Identifiers • x, tmp1, tmp2, my_func, main, … • Numeric literals • 142, 0x23A0F, 23.8, … • Operators • +, -, +=, (, ), …
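For finite languages, these operations can be tried out directly with Python sets. The sketch below is illustrative only: the helper names `concat` and `kleene_up_to` are made up for the example, and since the Kleene closure is infinite, only the strings built from a bounded number of repetitions are computed.

```python
from itertools import product


def concat(a, b):
    """Concatenation: every string of a followed by every string of b."""
    return {x + y for x, y in product(a, b)}


def kleene_up_to(lang, max_repeats):
    """Finite part of lang*: concatenations of at most max_repeats strings."""
    result = {""}  # zero repetitions give the empty string ε
    layer = {""}
    for _ in range(max_repeats):
        layer = concat(layer, lang)
        result |= layer
    return result


L1 = {"ab", "cd"}
L2 = {"ij", "kl"}

print(sorted(concat(L1, L2)))       # ['abij', 'abkl', 'cdij', 'cdkl']
print(sorted(concat(L2, L1)))       # ['ijab', 'ijcd', 'klab', 'klcd']
print(sorted(L1 | L2))              # ['ab', 'cd', 'ij', 'kl']
print(sorted(kleene_up_to(L1, 2)))  # ['', 'ab', 'abab', 'abcd', 'cd', 'cdab', 'cdcd']
```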
Regular expressions • Specify regular languages • Used in e.g. sed, grep, Visual Studio • Mixes symbols and operators: • ab* • (bla)+ • K(ä|je|a|ae)llberg
Operators in regular expressions • Concatenation • Kleene star: * • Union: | • Syntactic sugar • Character classes, +, ?, etc. • Operator precedence: • * and + • Concatenation • Union
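The precedence rules can be checked with Python's `re` module, whose syntax for these basic operators agrees with the notation used here. This is a small sketch; the helper `matches` is a made-up name, and `re.fullmatch` is used so that the whole string must belong to the language.

```python
import re


def matches(pattern, text):
    """True if the entire text is in the language described by pattern."""
    return re.fullmatch(pattern, text) is not None


# The star binds tighter than concatenation: ab* means a(b*), not (ab)*.
print(matches(r"ab*", "abbb"))    # True
print(matches(r"ab*", "abab"))    # False
print(matches(r"(ab)*", "abab"))  # True

# Union binds loosest: a|bc means a|(bc), not (a|b)c.
print(matches(r"a|bc", "bc"))     # True
print(matches(r"a|bc", "ac"))     # False

# + is syntactic sugar: (bla)+ is the same as bla(bla)*.
print(matches(r"(bla)+", "blabla"))  # True
print(matches(r"(bla)+", ""))        # False

# Union inside a group selects one of several spellings.
spellings = ["Källberg", "Kjellberg", "Kallberg", "Kaellberg"]
print(all(matches(r"K(ä|je|a|ae)llberg", s) for s in spellings))  # True
```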
Examples of regular expressions • Date strings, e.g. “2011-04-04” • Regular definition: D → 0 | ... | 9 • Regular expression: DDDD-DD-DD
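A regular definition works like a named abbreviation that is substituted into the expression. A minimal Python sketch, assuming the `re` module and a made-up helper `is_date_string`:

```python
import re

# Regular definition D → 0 | ... | 9, written as a character class.
D = "[0-9]"

# Regular expression DDDD-DD-DD, with the definition substituted in.
DATE = re.compile(f"{D}{D}{D}{D}-{D}{D}-{D}{D}")


def is_date_string(text):
    """True if text has the shape DDDD-DD-DD."""
    return DATE.fullmatch(text) is not None


print(is_date_string("2011-04-04"))  # True
print(is_date_string("2011-4-4"))    # False
print(is_date_string("2011-99-99"))  # True: only the shape is checked
```

The last call shows that the expression describes the form of a date string, not whether the date exists in the calendar.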
Examples of regular expressions • E-mail addresses: s1@s2, where s1 and s2 are strings of letters, digits, and periods, where a period may not appear at the beginning or the end, and two periods may not appear in succession...
Examples of regular expressions • Regular definition: S→ a | … | z | A | … | Z | 0 | … | 9 • Regular expression: S+(.S+)*@S+(.S+)*
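The same expression can be tried in Python. One detail differs from the notation above: in Python's `re` syntax an unescaped period matches any character, so the literal period has to be written `\.`. The addresses below are invented test data, and `is_address` is a made-up helper name.

```python
import re

# Regular definition S → a | … | z | A | … | Z | 0 | … | 9
S = "[A-Za-z0-9]"

# S+(.S+)*@S+(.S+)* with the literal period escaped for Python's re syntax.
EMAIL = re.compile(rf"{S}+(\.{S}+)*@{S}+(\.{S}+)*")


def is_address(text):
    """True if text matches the simplified address format s1@s2."""
    return EMAIL.fullmatch(text) is not None


print(is_address("first.last@example.com"))  # True
print(is_address(".first@example.com"))      # False: period at the beginning
print(is_address("first..last@example.com")) # False: two periods in succession
print(is_address("first@example."))          # False: period at the end
```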
Exercise (1) • Write regular expressions for • valid identifier names in e.g. C, C#, or Java – a letter or an underscore (“_”) followed by zero or more letters, digits, or underscores. • strings over the alphabet { a, b } that begin and end with the same letter. • numbers evenly divisible by 2. Write expressions for both the decimal alphabet and the binary alphabet.
Regular expressions on the web http://www.regular-expressions.info/
DFA • Deterministic Finite Automata • States and state transitions • Consumes strings • Initial and final states
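A DFA can be simulated with a transition table and a single current state. The sketch below uses a hypothetical two-state automaton for the regular expression ab*; the state names q0 and q1 are made up for the example, and a missing table entry is treated as an implicit dead state.

```python
# Hypothetical DFA for ab*: q0 is the initial state, q1 the only final state.
START = "q0"
FINAL = {"q1"}
DELTA = {
    ("q0", "a"): "q1",
    ("q1", "b"): "q1",
}


def dfa_accepts(text):
    """Consume text one symbol at a time; accept if we end in a final state."""
    state = START
    for symbol in text:
        state = DELTA.get((state, symbol))
        if state is None:  # no transition defined: implicit dead state
            return False
    return state in FINAL


print(dfa_accepts("abbb"))  # True
print(dfa_accepts("a"))     # True
print(dfa_accepts(""))      # False
print(dfa_accepts("abab"))  # False
```

Because there is at most one transition per (state, input symbol), the simulation never has to backtrack and runs in time linear in the length of the string.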
NFA • Nondeterministic Finite Automata • More than one transition per (state, input symbol) • Several initial states • ε transitions
Exercise (2) • Create DFAs that accept • the languages Ø, { ε }, { a }, S+, and S* • e-mail addresses. • Recall: S+(.S+)*@S+(.S+)* • strings over { a, b } that start and end with the same letter.
Limitations of regular languages • Example: The language of well-formed parenthesis expressions: { ε, (), (())(), ()(()()), (()()((()))()), ... } • Try to create a finite automaton...
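One common way to simulate an NFA is to track the set of all states it could be in, extending that set along ε transitions after every step. The automaton below is a hypothetical example accepting strings over { a, b } that end in "a" or "ab"; the state names and the use of the empty string as the ε label are assumptions of this sketch.

```python
EPS = ""  # label used here for ε transitions

# Hypothetical NFA: (q0, a) has two targets, and q1 has an ε transition.
START_STATES = {"q0"}
FINAL = {"q2"}
DELTA = {
    ("q0", "a"): {"q0", "q1"},
    ("q0", "b"): {"q0"},
    ("q1", "b"): {"q2"},
    ("q1", EPS): {"q2"},
}


def epsilon_closure(states):
    """All states reachable from states using only ε transitions."""
    closure = set(states)
    stack = list(states)
    while stack:
        state = stack.pop()
        for target in DELTA.get((state, EPS), set()):
            if target not in closure:
                closure.add(target)
                stack.append(target)
    return closure


def nfa_accepts(text):
    """Accept if at least one possible run ends in a final state."""
    current = epsilon_closure(START_STATES)
    for symbol in text:
        moved = set()
        for state in current:
            moved |= DELTA.get((state, symbol), set())
        current = epsilon_closure(moved)
    return bool(current & FINAL)


print(nfa_accepts("bba"))  # True
print(nfa_accepts("aab"))  # True
print(nfa_accepts("abb"))  # False
```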
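A short recognizer hints at why the attempt fails. The sketch below (the function name `is_well_formed` is made up) needs a counter for the current nesting depth, and that counter can grow without bound. A finite automaton has a fixed number of states, so it cannot distinguish arbitrarily many depths.

```python
def is_well_formed(text):
    """True if text consists only of properly nested parentheses."""
    depth = 0  # number of currently open parentheses; has no upper bound
    for ch in text:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:  # a ")" with no matching "("
                return False
        else:
            return False  # symbol outside the alphabet { (, ) }
    return depth == 0


print(is_well_formed("(()()((()))())"))  # True
print(is_well_formed(""))                # True: ε is in the language
print(is_well_formed("(()"))             # False
print(is_well_formed(")("))              # False
```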
Grammars • More powerful • Example: • Language: { a, ab, abb, abbb, ... } • Grammar:
Grammars S → a B B → ε B → b B • a and b = terminals • S and B = nonterminals • S = starting symbol
Grammars S → a B B → ε B → b B • Derivation of ”abb”: S ⇒ a B ⇒ a b B ⇒ a b b B ⇒ a b b
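The derivation can also be found mechanically. The following sketch encodes the grammar as a Python dictionary and searches breadth-first, always rewriting the leftmost nonterminal. The function name `derive` and the encoding are assumptions of this example, and the search is only meant for small grammars like this one, not as a general parsing method.

```python
from collections import deque

# S → a B,  B → ε,  B → b B   (the empty tuple stands for ε)
GRAMMAR = {
    "S": [("a", "B")],
    "B": [(), ("b", "B")],
}


def derive(target, start="S"):
    """Return the sentential forms of a derivation of target, or None."""
    queue = deque([[(start,)]])
    seen = {(start,)}
    while queue:
        path = queue.popleft()
        form = path[-1]
        # Position of the leftmost nonterminal, if any.
        index = next((i for i, sym in enumerate(form) if sym in GRAMMAR), None)
        if index is None:
            if "".join(form) == target:
                return path
            continue
        for body in GRAMMAR[form[index]]:
            new_form = form[:index] + body + form[index + 1:]
            terminals = sum(1 for sym in new_form if sym not in GRAMMAR)
            # Terminals never disappear, so longer forms cannot yield target.
            if terminals <= len(target) and new_form not in seen:
                seen.add(new_form)
                queue.append(path + [new_form])
    return None


steps = derive("abb")
print(" => ".join(" ".join(form) for form in steps))
# S => a B => a b B => a b b B => a b b
print(derive("ba"))  # None: "ba" is not in the language
```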
Conclusion • Parts of a compiler • Regular languages • Regular expressions • Finite automata • Grammars
Next time • Context-free languages • More grammars • Parse trees • Pushdown automata