Implementing a Scanner

[Figure from the slide "Scanner as a DFA": a DFA with states 0 to 5; transition labels " ", letter, digit, (, >, =; recognized tokens ident, number, lpar, gtr, geq.]



    1. Implementing a Scanner. SC4312 Compiler Construction, Lecture 2b. Originally prepared by Dr. Songsak Channarukul; modified by Dr. Kwankamol Nongpong.

    2. Scanner as a DFA. A scanner is a big DFA.

    3. Scanning. Scanners tend to be built in one of two ways: ad hoc (handwritten), using goto or nested case statements (for more details see Figure 2.11), or as a table-driven DFA.

    4. Scanning. The ad hoc approach generally yields the fastest, most compact code by doing lots of special-purpose things, though good automatically generated scanners come very close. A table-driven DFA is what scanner generators like lex and scangen produce: lex (flex) emits C code, while scangen emits numeric tables and a separate driver (for details see Figure 2.12).

    5. The Rule (almost universal). The scanner always accepts the longest possible token from the input. One legitimate token can be a prefix of another token, so the scanner cannot tell whether a longer token is possible without peeking at one or more upcoming characters (lookahead).
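
The longest-match rule can be illustrated with a minimal Python sketch. It assumes a tiny set of fixed tokens in which ">" is a prefix of ">="; the function name `longest_token` and the table `TOKENS` are hypothetical and chosen only for this illustration.

```python
# Hypothetical fixed tokens; ">" is a prefix of ">=".
TOKENS = {">": "gtr", ">=": "geq", "(": "lpar"}
MAX_LEN = max(len(lexeme) for lexeme in TOKENS)

def longest_token(text, pos=0):
    """Return (kind, lexeme) for the longest token starting at pos, or None."""
    best = None
    # Try candidates of increasing length and remember the longest hit.
    for end in range(pos + 1, min(len(text), pos + MAX_LEN) + 1):
        lexeme = text[pos:end]
        if lexeme in TOKENS:
            best = (TOKENS[lexeme], lexeme)
    return best
```

For the input ">=1" the sketch prefers geq over gtr, because it keeps looking past the first match instead of stopping there.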

    6. Lookahead. Take Pascal, for example: when you have a 3 and you see a dot, you need to peek at the character beyond the dot. 3.14 is a single token designating a real number; 3..5 is three tokens designating a range.
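
The Pascal case can be sketched in Python as follows. This is an illustration under assumptions, not a full Pascal number scanner: the function name `scan_number` and the kind names "integer" and "real" are hypothetical, and exponents are ignored.

```python
DIGITS = "0123456789"

def scan_number(text, pos=0):
    """Scan an integer or real literal starting at pos; return (kind, lexeme)."""
    end = pos
    while end < len(text) and text[end] in DIGITS:
        end += 1
    if end == pos:
        raise ValueError("expected a digit")
    # A dot belongs to the number only if a digit follows it, so the scanner
    # must look at two characters: the dot and the character beyond it.
    if (end + 1 < len(text) and text[end] == "."
            and text[end + 1] in DIGITS):
        end += 1
        while end < len(text) and text[end] in DIGITS:
            end += 1
        return ("real", text[pos:end])
    return ("integer", text[pos:end])
```

On "3..5" the sketch returns only the integer 3 and leaves both dots in the input, so a later step can recognize the range operator.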

    7. DFA Implementation (Hand-Coded). A DFA can be implemented by hand-coding its states in the source code.
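
A hand-coded DFA might look like the following Python sketch. The token names ident, number, lpar, gtr, and geq come from the DFA on the "Scanner as a DFA" slide; the assignment of state numbers to branches, the ASCII-only letters, and the names `next_token` and `tokenize` are assumptions made for this example.

```python
import string

LETTERS = set(string.ascii_letters)
DIGITS = set(string.digits)

def next_token(text, pos=0):
    """Return (kind, lexeme, new_pos), or None at end of input."""
    # State 0: skip blanks until a token starts.
    while pos < len(text) and text[pos] == " ":
        pos += 1
    if pos == len(text):
        return None
    start, ch = pos, text[pos]
    pos += 1
    if ch in LETTERS:                     # state 1: letter {letter | digit}
        while pos < len(text) and (text[pos] in LETTERS or text[pos] in DIGITS):
            pos += 1
        kind = "ident"
    elif ch in DIGITS:                    # state 2: digit {digit}
        while pos < len(text) and text[pos] in DIGITS:
            pos += 1
        kind = "number"
    elif ch == "(":                       # state 3
        kind = "lpar"
    elif ch == ">":                       # state 4, moving to state 5 on "="
        kind = "gtr"
        if pos < len(text) and text[pos] == "=":
            pos += 1
            kind = "geq"
    else:
        raise ValueError(f"unexpected character {ch!r} at {start}")
    return kind, text[start:pos], pos

def tokenize(text):
    """Collect (kind, lexeme) pairs until the input is exhausted."""
    tokens, pos = [], 0
    while True:
        result = next_token(text, pos)
        if result is None:
            return tokens
        kind, lexeme, pos = result
        tokens.append((kind, lexeme))
```

Each branch of the if/elif chain plays the role of one state, which is the same idea as the nested case statements mentioned for ad hoc scanners.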

    8. DFA Implementation (Table-Driven). A DFA can be implemented as a matrix of δ, its transition function, with one row per state and one column per input symbol.
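
A table-driven version can be sketched in Python with a transition matrix and a small driver loop. The token names follow the DFA on the "Scanner as a DFA" slide; the state numbering, the grouping of characters into classes, and the names `DELTA`, `ACCEPT`, `char_class`, and `next_token` are assumptions for this illustration.

```python
import string

# Character classes: the columns of the transition matrix.
BLANK, LETTER, DIGIT, LPAR, GT, EQ, OTHER = range(7)
ERR = -1  # marks a missing transition

def char_class(ch):
    if ch == " ":
        return BLANK
    if ch in string.ascii_letters:
        return LETTER
    if ch in string.digits:
        return DIGIT
    return {"(": LPAR, ">": GT, "=": EQ}.get(ch, OTHER)

# DELTA[state][char_class] gives the next state.
DELTA = [
    # blank letter digit  (    >    =   other
    [0,     1,     2,     3,   4,   ERR, ERR],  # 0: start
    [ERR,   1,     1,     ERR, ERR, ERR, ERR],  # 1: ident
    [ERR,   ERR,   2,     ERR, ERR, ERR, ERR],  # 2: number
    [ERR,   ERR,   ERR,   ERR, ERR, ERR, ERR],  # 3: lpar
    [ERR,   ERR,   ERR,   ERR, ERR, 5,   ERR],  # 4: gtr
    [ERR,   ERR,   ERR,   ERR, ERR, ERR, ERR],  # 5: geq
]
ACCEPT = {1: "ident", 2: "number", 3: "lpar", 4: "gtr", 5: "geq"}

def next_token(text, pos=0):
    """Return (kind, lexeme, new_pos), or None at end of input."""
    state, start = 0, pos
    while pos < len(text):
        nxt = DELTA[state][char_class(text[pos])]
        if nxt == ERR:
            break
        if nxt == 0:          # still skipping blanks: the token starts later
            start = pos + 1
        state = nxt
        pos += 1
    if state == 0:
        if pos < len(text):
            raise ValueError(f"unexpected character {text[pos]!r} at {pos}")
        return None
    return ACCEPT[state], text[start:pos], pos
```

The driver stops only when no transition exists, and every non-start state in this small automaton is accepting, so it returns the longest possible token without having to back up. Changing the language means editing the tables, not the driver.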

    9. Scanner Generator. A scanner generator is a program that reads a specification file and generates the source code of a scanner in a target language. Compiler generators usually include a scanner generator (e.g., lex, flex) and a parser generator (e.g., yacc, bison). In this class, we will use a compiler generator, Coco/R, which generates both the scanner and the parser in C++, C#, Java, and other languages.

    10. Overview of Coco/R. Coco/R takes an attributed grammar of a source language as input. Its output includes the Scanner, Parser, and related classes. Other semantic classes (e.g., symbol table, code generator) have to be hand-coded. The main program creates a Parser object and starts the compilation process from there.

    11. Generated Scanner. The scanner generated by Coco/R is implemented as a DFA. Therefore, the lexical rules must be specified by a regular EBNF grammar. Tokens must be made up of characters from the extended ASCII set (256 values). The scanner can be made case-sensitive or case-insensitive.

    12. EBNF in Coco/R.
          =     separates the sides of a production
          .     terminates a production
          |     separates alternatives
          ( )   groups alternatives
          [ ]   specifies an option
          { }   specifies an iteration (zero or more)

    13. Structure of Coco/R Specification. The Coco/R specification has the following structure:
          Cocol = [Imports]
                  "COMPILER" ident
                  [GlobalFieldsAndMethods]
                  ScannerSpecification
                  ParserSpecification
                  "END" ident .

    14. Scanner Specification. The scanner specification consists of six optional parts:
          ScannerSpecification = ["IGNORECASE"]
                                 ["CHARACTERS" {SetDecl}]
                                 ["TOKENS" {TokenDecl}]
                                 ["PRAGMAS" {PragmaDecl}]
                                 {CommentDecl}
                                 {WhiteSpaceDecl}.

    15. Character Sets. Character sets are defined to be used in later sections. Examples:
          digit    = "0123456789".
          hexDigit = digit + "ABCDEF".
          letter   = 'A' .. 'Z'.
          eol      = '\r'.
          noDigit  = ANY - digit.
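
The set operations behind these declarations can be mirrored with Python's built-in sets. In Coco/R, + is set union, - is set difference, .. denotes a character range, and ANY is the whole character set. The sketch below assumes the 256-value extended ASCII set mentioned on the "Generated Scanner" slide; the Python variable names are illustrative only.

```python
# ANY: every character of the assumed 256-value extended ASCII set.
ANY = {chr(code) for code in range(256)}

digit = set("0123456789")
hex_digit = digit | set("ABCDEF")                            # digit + "ABCDEF"
letter = {chr(c) for c in range(ord("A"), ord("Z") + 1)}    # 'A' .. 'Z'
eol = {"\r"}
no_digit = ANY - digit                                       # ANY - digit
```

With these definitions, "7" is in digit and hex_digit but not in no_digit, and "F" is in hex_digit as well as in letter.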

    16. Tokens. Tokens may be divided into two groups. Literals have a fixed representation in the source language (e.g., "while", ">="). Token classes have a certain structure that must be explicitly declared by a regular expression in EBNF.

    17. Tokens Specification. A token specification is as follows:
          TokenDecl   = Symbol ['=' TokenExpr '.'].
          TokenExpr   = TokenTerm {'|' TokenTerm}.
          TokenTerm   = TokenFactor {TokenFactor} ["CONTEXT" '(' TokenExpr ')'].
          TokenFactor = Symbol | '(' TokenExpr ')' | '{' TokenExpr '}' | '[' TokenExpr ']'.
          Symbol      = ident | string | char.

    18. Examples.
          ident  = letter {letter | digit | '_'}.
          number = digit {digit} | "0x" hexDigit hexDigit hexDigit hexDigit.
          float  = digit {digit} '.' {digit} ['E' ['+'|'-'] digit {digit}].
          while  = "while".
          public = "public".
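
For comparison, the three token classes can be written as regular expressions with Python's `re` module. This is a rough translation, not what Coco/R generates: it assumes the character sets from the "Character Sets" slide (so letter covers only A to Z), and the names `IDENT`, `NUMBER`, and `FLOAT` are illustrative.

```python
import re

# Character sets, assumed to match the "Character Sets" slide.
LETTER = "[A-Z]"
DIGIT = "[0-9]"
HEX = "[0-9A-F]"

# {x} in EBNF becomes (x)*, [x] becomes (x)?, and | stays |.
IDENT = re.compile(f"{LETTER}({LETTER}|{DIGIT}|_)*")
NUMBER = re.compile(f"{DIGIT}{DIGIT}*|0x{HEX}{HEX}{HEX}{HEX}")
FLOAT = re.compile(rf"{DIGIT}{DIGIT}*\.{DIGIT}*(E[+-]?{DIGIT}{DIGIT}*)?")
```

One difference is worth keeping in mind: Python's `re` tries alternatives from left to right and does not guarantee the longest match, so `NUMBER.match("0x00FF")` stops after the leading "0". Using `fullmatch` to classify a complete lexeme avoids the issue.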

    19. Comments. Comments in programming languages are usually hard to specify with regular expressions, and nested comments cannot be specified with them at all. Coco/R allows us to specify comments easily as follows:
          COMMENTS FROM "//" TO eol
          COMMENTS FROM "(*" TO "*)" NESTED
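
Nested comments need a counter, which is exactly what a finite automaton lacks. The Python sketch below shows one way a scanner could skip a nested (* ... *) comment by tracking the nesting depth; the function name `skip_nested_comment` and its parameters are hypothetical, and this is not the code Coco/R generates.

```python
def skip_nested_comment(text, pos=0, opener="(*", closer="*)"):
    """Return the index just past the (possibly nested) comment at pos."""
    if not text.startswith(opener, pos):
        raise ValueError("no comment starts at this position")
    depth = 0
    while pos < len(text):
        if text.startswith(opener, pos):
            depth += 1                  # one level deeper
            pos += len(opener)
        elif text.startswith(closer, pos):
            depth -= 1                  # one level closed
            pos += len(closer)
            if depth == 0:
                return pos              # the outermost comment has ended
        else:
            pos += 1
    raise ValueError("unterminated comment")
```

Because the depth can grow without bound, no fixed number of DFA states can track it, which is why the scanner handles comments with dedicated code.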

    20. White Spaces. White space is not relevant to a source program, so the scanner must discard it. The specification of white space in Coco/R is as follows:
          IGNORE '\t' + '\r' + '\n'

    21. The Main Class of a Compiler.
          using System;

          public class Compiler {
              public static void Main(string[] arg) {
                  // The first command-line argument names the source file.
                  Scanner scanner = new Scanner(arg[0]);
                  Parser parser = new Parser(scanner);
                  parser.Parse();
                  Console.WriteLine(parser.errors.count + " errors detected.");
              }
          }
