
Tools for building compilers

Tools for building compilers. Clara Benac Earle. Tools to help build a compiler. C: lexical analyzer generators Lex and flex; syntax analyzer generator yacc. Java: lexical analyzer generators JLex and JFlex; syntax analyzer generator CUP.



Presentation Transcript


  1. Tools for building compilers Clara Benac Earle

  2. Tools to help build a compiler
     • C
       • Lexical analyzer generators: Lex, flex
       • Syntax analyzer generator: yacc
     • Java
       • Lexical analyzer generators: JLex, JFlex
       • Syntax analyzer generator: CUP
     These tools and their documentation can be found on the internet.

  3. Lex: Lexical Analyzer Generator
     example.l -> Lex compiler -> lex.yy.c -> C compiler -> a.exe

  4. Description
     • A tool for generating scanners
     • The scanner is described as pairs of regular expressions and C code
     • Flex generates as output a C source file, lex.yy.c, which defines a routine yylex(). This file is compiled and linked to produce an executable
     • When the executable is run, it analyzes its input for occurrences of the regular expressions. Whenever it finds one, it executes the corresponding C code

  5. Format of the input file
     The flex input file consists of three sections separated by %%
       Definitions
       %%
       Rules
       %%
       User Code

  6. Skeleton of a lex specification (.l file)
       %{
       < C global variables, prototypes, comments >
       %}
       [DEFINITION SECTION]
       %%
       [RULES SECTION]
       %%
       < C auxiliary subroutines >
     • %{ ... %}: this part will be embedded into *.c
     • Definition section: substitutions, code and start states; will be copied into *.c
     • Rules section: defines how to scan and what action to take for each token
     • Auxiliary subroutines: any user code, for example a main function that calls the scanning function yylex()

  7. The definition section
     • Contains name definitions and declarations of start conditions
     • Name definitions have the form: name definition
     • Examples:
         DIGIT   [0-9]
         ID      [a-z][a-z0-9]*

  8. The rules section
     • Form:
         %%
         <pattern>   { <action to take when matched> }
         <pattern>   { <action to take when matched> }
         …
         %%
     • Patterns are specified by regular expressions
     • Example (using + rather than *, so that the pattern cannot match the empty string):
         %%
         [A-Za-z]+   { printf("this is a word"); }
         %%

  9. Extended regular expressions
       x        match the character "x"
       .        any character except newline
       []       a character class
       [xy]     match either an "x" or a "y"
       [a-z]    match any letter from "a" to "z"
       [^a-z]   any character but those in the class
       r*       zero or more r's
       r+       one or more r's
       r?       zero or one r
       {name}   the expansion of the name definition

  10. Extended regular expressions
       x|y      x or y
       x/y      x, only if followed by y (y not removed from input)
       x{m,n}   m to n occurrences of x
       ^x       x, but only at beginning of line
       x$       x, but only at end of line
       "s"      exactly what is in the quotes (except for "\" and following character)
      A regular expression finishes with a space, tab or newline

  11. Meta-characters
      • Meta-characters do not match themselves, because they are used in the preceding regular expressions:
          ( ) [ ] { } < > + / , ^ * | . \ " $ ? - %
      • To match a meta-character, prefix it with "\"
      • To match a backslash, tab or newline, use \\, \t, or \n

  12. Regular Expression Examples
      • an integer: 12345
          [1-9][0-9]*
      • a word: cat
          [a-zA-Z]+
      • a (possibly) signed integer: 12345 or -12345
          [-+]?[1-9][0-9]*
      • a floating point number: 1.2345
          [0-9]*"."[0-9]+
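
The patterns on this slide can be tried outside lex. The following is a minimal sketch that assumes a POSIX system with <regex.h>; it is not how lex itself works, and the names matches_whole, INTEGER_RE, WORD_RE, SIGNED_INTEGER_RE and FLOAT_RE are made up for illustration. POSIX syntax has no "..." quoting, so the lex pattern "." is written \. here, and ^ and $ are added so that the whole string must match.

```c
#include <regex.h>
#include <stddef.h>

/* The slide's patterns, rewritten in POSIX extended syntax and anchored. */
const char INTEGER_RE[]        = "^[1-9][0-9]*$";
const char WORD_RE[]           = "^[a-zA-Z]+$";
const char SIGNED_INTEGER_RE[] = "^[-+]?[1-9][0-9]*$";
const char FLOAT_RE[]          = "^[0-9]*\\.[0-9]+$";

/* Returns 1 if text matches pattern, 0 if it does not,
 * and -1 if pattern is not a valid extended regular expression. */
int matches_whole(const char *pattern, const char *text)
{
    regex_t re;
    int found;

    /* REG_NOSUB: only a yes/no answer is needed, not match positions. */
    if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
        return -1;
    found = (regexec(&re, text, 0, NULL, 0) == 0);
    regfree(&re);
    return found;
}
```

For example, matches_whole(SIGNED_INTEGER_RE, "-12345") returns 1, while matches_whole(INTEGER_RE, "0") returns 0: the integer pattern requires a first digit between 1 and 9, so it does not accept 0 on its own.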

  13. Two Rules
      1. lex will always match the longest (number of characters) token possible.
      2. If two or more possible tokens are of the same length, then the token with the regular expression that is defined first in the lex specification is favored.
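
The two rules can be sketched in plain C. This is an illustration only, assuming three hand-written matchers that stand in for the patterns "if", [a-z][a-z0-9]* and [0-9]+; a real lex scanner uses generated tables instead, and every name below (next_token, match_if, the TOK_ constants) is hypothetical.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

enum token { TOK_NONE, TOK_IF, TOK_ID, TOK_NUMBER };

/* Each matcher returns how many leading characters of s it accepts
 * (0 means the pattern does not match at this position). */
static size_t match_if(const char *s)
{
    return strncmp(s, "if", 2) == 0 ? 2 : 0;
}

static size_t match_id(const char *s)
{
    size_t n = 0;
    if (!islower((unsigned char)s[0]))
        return 0;
    while (islower((unsigned char)s[n]) || isdigit((unsigned char)s[n]))
        n++;
    return n;
}

static size_t match_number(const char *s)
{
    size_t n = 0;
    while (isdigit((unsigned char)s[n]))
        n++;
    return n;
}

struct rule {
    enum token tok;
    size_t (*match)(const char *);
};

/* The order matters: earlier rules win ties (rule 2). */
static const struct rule rules[] = {
    { TOK_IF,     match_if     },   /* "if"            */
    { TOK_ID,     match_id     },   /* [a-z][a-z0-9]*  */
    { TOK_NUMBER, match_number },   /* [0-9]+          */
};

/* Picks the token at the start of input and stores its length. */
enum token next_token(const char *input, size_t *length)
{
    enum token best = TOK_NONE;
    size_t best_len = 0;

    for (size_t i = 0; i < sizeof rules / sizeof rules[0]; i++) {
        size_t n = rules[i].match(input);
        /* Strictly longer wins (rule 1); on a tie the earlier rule
         * is kept because ">" does not replace it (rule 2). */
        if (n > best_len) {
            best_len = n;
            best = rules[i].tok;
        }
    }
    *length = best_len;
    return best;
}
```

With this sketch, the input "iffy" yields TOK_ID with length 4 (the longest match), while "if(" yields TOK_IF with length 2: both "if" and the identifier matcher accept two characters, and the rule listed first wins.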

  14. How the input is matched
      Once the match is determined, the text corresponding to the match is made available in the global character pointer yytext, and its length in the global integer yyleng. The action corresponding to the matched pattern is then executed, and then the remaining input is scanned for another match.

  15. Actions
      • Can be any arbitrary C statement
      • Normally they are written between {}
      • If the action is empty, then when the pattern is matched the input token is simply discarded
      • The action "|" means "same as the action for the next rule"

  16. Actions: examples
        %%
        [ \t\n]+    ;
        ":="        return ASIG;
        "<"         return MINOR;
        "if"        return IF;

  17. Start conditions
      A mechanism for conditionally activating rules
        %s comment
        %%
        "/*"             { BEGIN comment; }
        <comment>"*/"    { BEGIN INITIAL; /* same as BEGIN 0; */ }
        <comment>.|\n    { }
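
The idea behind a start condition can be sketched in plain C as a state variable that decides which rules are active. This is a hand-written illustration, not code that lex generates; the function name strip_comments and the enum names are assumptions chosen to mirror the slide.

```c
#include <stddef.h>

/* The two "start conditions" of this toy scanner. */
enum start_condition { INITIAL, COMMENT };

/* Copies src to dst, leaving out C-style comments.
 * dst must have room for strlen(src) + 1 bytes.
 * Returns the number of characters written (without the final '\0'). */
size_t strip_comments(const char *src, char *dst)
{
    enum start_condition state = INITIAL;
    size_t out = 0;

    while (*src != '\0') {
        if (state == INITIAL && src[0] == '/' && src[1] == '*') {
            state = COMMENT;          /* like: BEGIN comment; */
            src += 2;
        } else if (state == COMMENT && src[0] == '*' && src[1] == '/') {
            state = INITIAL;          /* like: BEGIN INITIAL; */
            src += 2;
        } else if (state == COMMENT) {
            src++;                    /* like: <comment>.|\n { } */
        } else {
            dst[out++] = *src++;      /* ordinary text is kept */
        }
    }
    dst[out] = '\0';
    return out;
}
```

Called on "a /* x */ b", the function writes "a  b" (both spaces around the comment survive) and returns 4. The sequence "*/" is only special while the state is COMMENT, which is the point of the <comment> prefix in the lex rules.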

  18. Special Functions
      • yytext: where the most recently matched text is stored
      • yyleng: number of characters in the most recently matched text
      • yylval: associated value of the current token
      • yymore(): append the next matched string to the current contents of yytext
      • yyless(n): remove from yytext all but the first n characters
      • unput(c): return character c to the input stream
      • yywrap(): may be replaced by the user. The yywrap method is called by the lexical analyzer whenever it reads an EOF as the first character when trying to match a regular expression
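
To make unput(c) and yyless(n) more concrete, here is a toy model in plain C. It assumes a simple pushback stack in front of a string and is not the real lex or flex implementation; struct input, input_get, input_unput and toy_yyless are hypothetical names.

```c
#include <stddef.h>

#define PUSHBACK_MAX 64

/* A toy input stream: a string plus a stack of pushed-back characters. */
struct input {
    const char *src;               /* unread part of the input        */
    char pushback[PUSHBACK_MAX];   /* characters returned by "unput"  */
    size_t depth;                  /* number of pushed-back chars     */
};

/* Reads the next character, or -1 at end of input. */
int input_get(struct input *in)
{
    if (in->depth > 0)
        return (unsigned char)in->pushback[--in->depth];
    if (*in->src == '\0')
        return -1;
    return (unsigned char)*in->src++;
}

/* Toy version of unput(c): c becomes the next character read.
 * Returns 0 on success, -1 if the pushback stack is full. */
int input_unput(struct input *in, char c)
{
    if (in->depth == PUSHBACK_MAX)
        return -1;
    in->pushback[in->depth++] = c;
    return 0;
}

/* Toy version of yyless(n): keep the first n characters of text and
 * give the rest back to the input, last character first so that they
 * are read again in their original order. Requires n <= *leng. */
void toy_yyless(struct input *in, char *text, size_t *leng, size_t n)
{
    for (size_t i = *leng; i > n; i--)
        input_unput(in, text[i - 1]);
    text[n] = '\0';
    *leng = n;
}
```

If the matched text is "foo" and the remaining input is "bar", calling toy_yyless with n = 1 leaves "f" as the matched text, and the next characters read are 'o', 'o' and then 'b'.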

  19. Let us run a lex program
