1 / 35

Compiler construction in4020 – lecture 2

Compiler construction in4020 – lecture 2. Koen Langendoen Delft University of Technology The Netherlands. program in some source language. executable code for target machine. semantic represen- tation. front-end analysis. back-end synthesis. compiler. Summary of lecture 1.

amina
Download Presentation

Compiler construction in4020 – lecture 2

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Compiler constructionin4020 – lecture 2 Koen Langendoen Delft University of Technology The Netherlands

  2. program in some source language executable code for target machine semantic represen- tation front-end analysis back-end synthesis compiler Summary of lecture 1 • compiler is a structured toolbox • front-end: program text  annotated AST • back-end: annotated AST  executable code • lexical analysis: program text  tokens • token specifications • implementation by hand

  3. Quiz 2.7 What does the regular expression a?* mean? And a** ? Are these expressions erroneous? Are they ambiguous?

  4. program text token description scanner generator lexical analysis tokens syntax analysis AST context handling annotated AST Overview • Generating a lexical analyzer • generic methods • specific tool lex

  5. regular descriptions regular expressions + actions auxiliary C-code Token description • (f)lex: scanner generator for UNIX • token description  C code • format of the lex input file: definitions %% rules %% user code

  6. Lex description to recognize integers • an integer is a non-zero sequence of digits optionally followed by a letter denoting the base class (b for binary and o for octal). • base  [bo] integer  digit+ base? • rule = expr + action • {} signal application of a description %{ #include "lex.h" %} base [bo] digit [0-9] %% {digit}+ {base}? {return INTEGER;} %%

  7. Lexresulting C-code • char yytext[]; /* token representation */ • int yylex(void); /* returns type of next token */ • wrapper function to add token attributes %% \n {line_number++;} %% void get_next_token(void) { Token.class = yylex(); if (Token.class == 0) { Token.class = EOF; Token.repr = "<EOF>"; return; } Token.pos.line_number = line_number; Token.repr = strdup(yytext); }

  8. program text token description scanner generator lexical analysis digit tokens digit S1 S0 syntax analysis AST ‘.’ ‘.’ context handling digit digit annotated AST finite state automaton S2 S3 automatic generation

  9. ‘f’ ‘i’ S1 S2 S0 Finite-state automaton • Recognize input character by character • Transfer between states • FSA • Initial state S0 • set of accepting states • transition function: State x Char  State

  10. digit digit S0 S1 digit digit ‘.’ digit S0 S2 S3 FSA examples • integral_number  [0-9]+ • fixed_point_number  [0-9]* ‘.’ [0-9]+

  11. digit digit S0 S1 digit digit ‘.’ digit S0 S2 S3 Concurrent recognition • integral_number  [0-9]+ • fixed_point_number  [0-9]* ‘.’ [0-9]+ • recognize both tokens in one pass

  12. digit digit digit S0 S2 S3 Concurrent recognition • integral_number  [0-9]+ • fixed_point_number  [0-9]* ‘.’ [0-9]+ • naïve approach: merge initial states digit digit S1 ‘.’

  13. digit digit S0 S1 ‘.’ ‘.’ digit digit S2 S3 Concurrent recognition • integral_number  [0-9]+ • fixed_point_number  [0-9]* ‘.’ [0-9]+ • correct approach: share common prefix transitions

  14. digit digit S0 S1 ‘.’ ‘.’ digit digit S2 S3 FSA implementation:transition table • concurrent recognition of integers and fixed point numbers

  15. FSA exercise (6 min.) • draw an FSA to recognize integers base  [bo] integer  digit+ base? • draw an FSA to recognize the regular expression (a|b)*bab

  16. Answers

  17. a a b b b a S0 S1 S2 S3 a b Answers digit • integer • (a|b)*bab digit [bo] S0 S1 S2

  18. Break

  19. Automatic generation:description  FSA • start with initial set (S0) of all token descriptions to be recognized • for each character (ch) • find the set (Sch) of descriptions that can start with ch • extend the FSA with transition (S0,ch, Sch) • repeat adding transitions (to Sch ) until no new set is generated

  20. Dotted items • keeping track of matched characters in a token description: T  R input   regular expression already matched still to be matched T    

  21. Types of dotted items • shift item: dot in front of a basic pattern • if   ‘i’ ‘f’ • if  ‘i’  ‘f’ • identifier   [a-z] [a-z0-9]* • reduce item: dot at the end • if  ‘i’ ‘f’  • identifier  [a-z] [a-z0-9]*  • non-basic item: dot in front of repeated pattern or parenthesis • identifier  [a-z]  [a-z0-9]*

  22. c input T    c  c T   c   input Character moves

  23. c input T    c  c T   c   input c T   c   c  class T   [class]   T   .   Character moves • T    c  • T    [class]  • T    . 

  24.  moves

  25.  moves

  26. FSA construction • a state corresponds to a set of basic items • a character move yields a new set • expand non-basic items into basic items using  moves • see if the resulting set was produced before, if not introduce a new state • add transition

  27. S0 I  ( D)+ F  ( D)* ‘.’ (D)+ F  (D)*  ‘.’ (D)+ I   (D)+ F   (D)* ‘.’ (D)+  moves ExampleFSA construction • tokens • integer: I  (D)+ • fixed-point: F  (D)* ‘.’ (D)+ • initial state

  28. D S0 S1 I  ( D)+ F  ( D)* ‘.’ (D)+ F  (D)*  ‘.’ (D)+ D ‘.’ ‘.’ D S2 S3 D ExampleFSA construction • character moves I  (D )+ F  (D )* ‘.’ (D)+ I  (D)+  I  ( D)+ F  (D )* ‘.’ (D)+ I  (D)+  I  ( D)+ F  (D)*  ‘.’ (D)+ F  ( D)* ‘.’ (D)+ F  (D)* ‘.’  (D)+ F  (D)* ‘.’ ( D)+ F  (D)* ‘.’ (D)+  F  (D)* ‘.’ ( D)+ F  (D)* ‘.’ (D )+

  29. ExerciseFSA construction (7 min.) • draw the FSA (with item sets) for recognizing an identifier: identifier  letter(letter_or_digit_or_und*letter_or_digit+)? • extend the above FSA to recognize the keyword ‘if’ as well. if  ‘i’ ‘f’

  30. Answers

  31. Answers S3 ‘i’ U ID   L ((LDU)* (LD)+)? S0 LD L LD ‘f’ S4 ID  L ((LDU)* (LD)+)?  ID  L (( LDU)* (LD)+)? ID  L ((LDU)* ( LD)+)? LD S1 U LD U U ID  L (( LDU)* (LD)+)? ID  L ((LDU)* ( LD)+)? S2 accepting states S1 and S4

  32. Transition table compression

  33. S1 S2 S4 S0 S3 row displacement S3 S1 S1 S1 S1 S2 S1 S4 S1 S1 S1 S2 Transition table compression • redundant rows • empty transitions

  34. program text token description scanner generator lexical analysis tokens syntax analysis AST context handling annotated AST Summary: generating a lexical analyzer • tool: lex • token descriptions + actions • wrapper interface • FSA construction • dotted items • character moves •  moves

  35. Homework • study sections 2.1.10 – 2.1.12 • lexical identification of tokens • symbol tables • macro processing • print handout lecture3 [blackboard] • find a partner for the “practicum” • register your group • send e-mail to koen@pds.twi.tudelft.nl

More Related