
Compiler design

Compiler design. Lecture 2: Lexical analysis. Lecturer: Batool Almasri. Lexical analysis is the first phase of a compiler; the lexical analyzer is also known as the scanner. It breaks the input into the smallest meaningful sequences, called tokens, which the parser uses for syntax analysis.


Presentation Transcript


  1. Compiler design. Lecture 2: Lexical analysis. Lecturer: Batool Almasri

  2. Lexical analysis • Lexical analysis is the first phase of a compiler; the lexical analyzer is also known as the scanner. • It breaks the input into the smallest meaningful sequences, called tokens, which the parser uses for syntax analysis. • Token classes include identifiers, keywords, operators, punctuation marks, and so on. • It removes: • 1. white space (tabs, blanks, newlines) • 2. comments.

  3. Lexical analysis • The part of the input stream that qualifies as a token is called a lexeme. • Example: in C, the character sequence if qualifies as a keyword, so "if" is the lexeme in this case. • The lexical analyzer keeps track of newline characters so that it can report the line number of any error in the source program.
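The line-tracking idea can be sketched in a few lines of Python. This is only an illustration: the function name `find_illegal_characters` and the set of characters treated as legal are assumptions made for the example, not part of the lecture.

```python
def find_illegal_characters(source, legal_extra="_=;(){}<>+-*/\"%,"):
    """Return (line_number, character) for every character the scanner rejects."""
    errors = []
    line = 1  # line numbers are 1-based, as in typical compiler messages
    for ch in source:
        if ch == "\n":
            line += 1  # a newline is legal; it only advances the line counter
        elif not (ch.isalnum() or ch.isspace() or ch in legal_extra):
            errors.append((line, ch))
    return errors


program = "if (i == j)\n    z = 0;\nelse\n    z = 1 @ 2;\n"
for line, ch in find_illegal_characters(program):
    print(f"line {line}: illegal character {ch!r}")
```

Because the counter is advanced on every newline, the stray `@` is reported on line 4 without the scanner having to re-read the input.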

  4. Lexical analysis • The goal of lexical analysis is to divide the program text into its words, which in compiler terminology are called tokens. • E.g.: if (x==y) then printf("hello %d", a); else print("%d", b); • A token class corresponds to a set of strings: • Identifier: a string of letters and digits starting with a letter • Integer: a non-empty string of digits • Keyword: "else" or "if" or "begin" or … • Whitespace: a non-empty sequence of blanks, newlines, and tabs
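The four token classes listed on this slide can be written down as regular expressions and tried out with Python's `re` module. The sketch below is one possible encoding, not the lecture's own; the names `KEYWORDS`, `TOKEN_PATTERN`, and `classify` are hypothetical, and the keyword set is limited to the three keywords the slide names.

```python
import re

KEYWORDS = {"if", "else", "begin"}  # assumed set; the slide's list is open-ended

# One named group per token class; alternatives are tried left to right.
TOKEN_PATTERN = re.compile(
    r"(?P<WHITESPACE>[ \t\n]+)"               # non-empty run of blanks, tabs, newlines
    r"|(?P<IDENTIFIER>[A-Za-z][A-Za-z0-9]*)"  # a letter, then letters or digits
    r"|(?P<INTEGER>[0-9]+)"                   # non-empty string of digits
)


def classify(text):
    """Split text into (token_class, lexeme) pairs, or raise on an unknown character."""
    tokens = []
    pos = 0
    while pos < len(text):
        match = TOKEN_PATTERN.match(text, pos)
        if match is None:
            raise ValueError(f"no token class matches at position {pos}: {text[pos]!r}")
        token_class, lexeme = match.lastgroup, match.group()
        # Keywords look like identifiers, so they are separated out afterwards.
        if token_class == "IDENTIFIER" and lexeme in KEYWORDS:
            token_class = "KEYWORD"
        tokens.append((token_class, lexeme))
        pos = match.end()
    return tokens


print(classify("if x1 42"))
```

Matching keywords as identifiers first and reclassifying them afterwards is a common shortcut; it keeps the pattern small at the cost of a set lookup per identifier.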

  5. The lexical analyzer reads the character stream of the source code, checks for legal tokens, and passes each token to the syntax analyzer on demand.
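The "on demand" hand-off can be modelled with a Python generator: nothing is scanned until the consumer asks for the next token. This is a simplified sketch; the function name `token_stream` and the pattern used to split lexemes are assumptions for illustration.

```python
import re


def token_stream(source):
    """Yield lexemes one at a time; scanning advances only when the caller asks."""
    # \w+ groups letters, digits, and underscores; any other non-space
    # character is yielded on its own. This is a deliberate simplification.
    for match in re.finditer(r"\w+|[^\w\s]", source):
        yield match.group()


stream = token_stream("z = z + 1;")
first = next(stream)   # the "parser" demands one token
second = next(stream)  # and then another
rest = list(stream)    # finally it drains the remainder
print(first, second, rest)
```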

  6. Lexical analysis • The lexical analyzer classifies program substrings according to their role (token class) and communicates the tokens to the parser. • if (i==j) • z=0; • else • z=1; • The lexical analyzer reads this as: • \tif (i==j)\n\t\tz=0;\n\telse\n\t\tz=1; • \t: tab (white space), \n: newline • The lexical analyzer removes all white space characters (blanks, tabs, newlines) from the source code.
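To see the whitespace removal concretely, the slide's character string can be fed to a small pattern that matches lexemes and simply never matches white space. The string below is the slide's example written with lowercase escapes; the pattern and the variable names are assumptions for this sketch.

```python
import re

raw = "\tif (i==j)\n\t\tz=0;\n\telse\n\t\tz=1;"

# Identifiers, integers, the two-character operator ==, then any other single
# non-space character. White space matches no alternative, so it is skipped.
lexemes = re.findall(r"[A-Za-z][A-Za-z0-9]*|[0-9]+|==|\S", raw)

whitespace_count = sum(1 for ch in raw if ch.isspace())
print(lexemes)
print(whitespace_count, "white space characters discarded")
```

Joining the lexemes back together gives the original text with every tab, blank, and newline removed, which is exactly what the parser gets to see.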

  7. Regular languages • To define regular languages we generally use regular expressions; each regular expression denotes a set of strings (a language). • There are two base regular expressions: • 1. Single character: 'c' = {"c"}. This is an expression, and what it denotes is a language containing one string. • 2. Epsilon: ε = {""}. This again denotes a language containing a single string, this time the empty string. Keep in mind that ε is not the empty language, which contains no strings at all.
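The difference between epsilon and the empty language is easy to check by modelling languages as Python sets of strings. The variable names here are chosen for the example only.

```python
single_char = {"c"}      # language of the regular expression 'c'
epsilon = {""}           # language of epsilon: one string, the empty string
empty_language = set()   # the empty language: no strings at all

print(len(single_char), len(epsilon), len(empty_language))
print("" in epsilon, "" in empty_language)
```

Epsilon has size one because the empty string is still a string; the empty language has size zero.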

  8. Regular languages • Three compound regular expressions: • 1. Union: A+B, written in lex as A|B, means either A or B. • 2. Concatenation: AB means a string from A followed by a string from B. • 3. Iteration: A* is the Kleene star (closure); A* equals the union of A^i over all i ≥ 0, where A^i is A concatenated with itself i times and A^0 = ε. • Postfix plus (positive closure): A+ = AA*. • E.g. (a)* means a can occur 0 or more times. • E.g. (a)+ means a can occur 1 or more times; at least one a must appear.
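The three compound operations can be tried out on small finite languages represented as Python sets. Since A* is infinite whenever A contains a non-empty string, the sketch below truncates it to at most n repetitions; all function names are hypothetical helpers for this illustration.

```python
def union(a, b):
    """A | B: every string that is in A or in B."""
    return a | b


def concat(a, b):
    """AB: a string from A followed by a string from B."""
    return {x + y for x in a for y in b}


def power(a, i):
    """A^i: A concatenated with itself i times; A^0 is {''}."""
    result = {""}
    for _ in range(i):
        result = concat(result, a)
    return result


def star_up_to(a, n):
    """Finite approximation of A*: the union of A^0 through A^n."""
    result = set()
    for i in range(n + 1):
        result |= power(a, i)
    return result


def plus_up_to(a, n):
    """Finite approximation of A+ = AA*: between 1 and n repetitions."""
    return concat(a, star_up_to(a, n - 1))


A, B = {"a"}, {"b"}
print(sorted(union(A, B)))
print(sorted(concat(A, B)))
print(sorted(star_up_to(A, 3)))
print(sorted(plus_up_to(A, 3)))
```

The only difference between the last two results is the empty string, which belongs to the truncated A* but not to the truncated A+.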

  9. Representing symbols using regular expressions • 1. letter = [A-Z] • 2. digit = 0|1|2|3|4|5|6|7|8|9 • 3. sign = '+' | '-'
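These three definitions translate almost directly into Python's `re` syntax, where `+` must be escaped because it is also the "one or more" operator. The composite pattern `SIGNED_INTEGER` is an assumption added for illustration; the slide defines only the three symbol classes.

```python
import re

LETTER = "[A-Z]"
DIGIT = "(0|1|2|3|4|5|6|7|8|9)"
SIGN = r"(\+|-)"

# Hypothetical composite: an optional sign followed by one or more digits.
SIGNED_INTEGER = re.compile(f"{SIGN}?{DIGIT}+")

for text in ["+42", "-7", "15", "4a", "+"]:
    print(text, bool(SIGNED_INTEGER.fullmatch(text)))
```

`fullmatch` requires the whole string to match, so `4a` and a lone `+` are rejected.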

  10. Example of regular expressions

  11. How to recognize tokens? • Finite state automata

  12. Finite automata • Finite automata are at the heart of how lex turns its input into a lexical analyzer. • A finite automaton has: • 1. one start state • 2. possibly many final (accepting) states • 3. a name labeling each state • 4. directed edges labeled with symbols • Two types: • non-deterministic (NFA) • deterministic (DFA)
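A deterministic automaton with these ingredients can be simulated with a transition table. The sketch below recognizes identifiers (a letter followed by letters or digits); the state names `q0` and `q1`, the table layout, and the helper `symbol_class` are assumptions for the example, and only the deterministic case is covered.

```python
def symbol_class(ch):
    """Map a character to the edge label used in the transition table."""
    if ch.isalpha():
        return "letter"
    if ch.isdigit():
        return "digit"
    return "other"


START = "q0"          # the single start state
ACCEPTING = {"q1"}    # the final states
TRANSITIONS = {       # directed edges: (state, symbol) -> next state
    ("q0", "letter"): "q1",
    ("q1", "letter"): "q1",
    ("q1", "digit"): "q1",
}


def accepts(text):
    """Run the DFA over text; a missing edge means the input is rejected."""
    state = START
    for ch in text:
        state = TRANSITIONS.get((state, symbol_class(ch)))
        if state is None:
            return False
    return state in ACCEPTING


for text in ["x1", "1x", "", "abc"]:
    print(repr(text), accepts(text))
```

Each (state, symbol) pair has at most one successor, which is what makes the automaton deterministic; an NFA would map a pair to a set of possible next states instead.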

  13. Finite state automata

  14. Deterministic Finite Automata (DFAs)

  15. Non-Deterministic Finite Automata (NFAs)

  16. Example of regular expression

  17. What is a regular expression?
