Join Dr. Hussien Sharaf and Dr. Mohammad Nassef from Cairo University on a journey to uncover the data structures of compilers, covering topics like Lexical Analysis, Symbol Tables, Parse Trees, and more. Explore how compilers utilize data structures for efficient code generation.
Cairo University • FCI Compilers CS419 Lecture 3: Data Structures of a Compiler
Dr. Hussien Sharaf, Dr. Mohammad Nassef
Department of Computer Science, Faculty of Computers and Information, Cairo University
Welcome to a journey to
Agenda
• Data Structures of Compilers
• Lexical Analysis
• Languages and Regular Expressions (REs)
• Examples
• REs in C#
• Appendix for revision
Some Data Structures
1. Token
2. Symbol table
3. Literal table
4. Parse tree
5. Semantic parse tree
6. Intermediate code
1. Token
• Single-symbol lookahead: in most languages the scanner needs to produce only one token at a time.
• In that case no collection or array of tokens is needed; a single global variable can hold the current token.
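The one-token-at-a-time idea can be sketched as follows. This is a minimal illustration, not the lecture's implementation: the token pattern, the `source` string, and the names `next_token`, `token`, and `pos` are all assumptions made for the example.

```python
import re

# Hypothetical token pattern: an identifier, an integer, or any single
# non-space character. Leading whitespace is skipped.
_TOKEN_RE = re.compile(r"\s*([A-Za-z_]\w*|\d+|\S)")

source = "x = 5;"   # sample input (assumed)
pos = 0             # current read position in the source
token = None        # the single global "current token"


def next_token():
    """Advance the global `token` to the next token, or None at end of input."""
    global pos, token
    match = _TOKEN_RE.match(source, pos)
    if match is None:
        token = None
    else:
        token = match.group(1)
        pos = match.end()
    return token


# A parser would call next_token() whenever it has consumed the current token.
seen = []
while next_token() is not None:
    seen.append(token)
```

Only one token is alive at any moment; the `seen` list exists purely to make the behaviour observable in the example.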
2. Symbol Table
• Stores information associated with identifiers.
• For variables: name, type, address, size (for arrays), etc.
• For functions: name, type of return value, parameters, address, etc.
• Sample code:
• int x, y;
• char c[10];
• x = 5;
2. Symbol Table (cont’d)
• Also stores user-defined data types such as structs, enums, and classes.
• The symbol table is modified by the scanner, the parser, and the semantic analyzer.
• The information in the symbol table is used by the intermediate code generator and the machine code generator.
• It is mostly implemented as a hash table for efficiency, because access time is O(k) in the key length k (independent of the number of stored symbols on average) and space consumption is not a concern.
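A hash-table-backed symbol table for the sample declarations might look like the sketch below. The class names, the `TYPE_SIZES` values (4-byte `int`, 1-byte `char`), and the sequential address assignment are assumptions chosen for illustration; a real compiler derives them from the target machine.

```python
from dataclasses import dataclass
from typing import Dict, Optional

# Assumed element sizes in bytes; real sizes depend on the target machine.
TYPE_SIZES = {"int": 4, "char": 1}


@dataclass
class Symbol:
    name: str
    type: str
    address: int     # offset assigned by this sketch, not a real machine address
    length: int = 1  # number of elements; greater than 1 for arrays


class SymbolTable:
    """Symbol table backed by a Python dict, which is a hash table."""

    def __init__(self) -> None:
        self._entries: Dict[str, Symbol] = {}
        self._next_address = 0

    def declare(self, name: str, type_: str, length: int = 1) -> Symbol:
        if name in self._entries:
            raise KeyError(f"duplicate declaration of {name!r}")
        symbol = Symbol(name, type_, self._next_address, length)
        self._next_address += TYPE_SIZES[type_] * length
        self._entries[name] = symbol
        return symbol

    def lookup(self, name: str) -> Optional[Symbol]:
        return self._entries.get(name)


# Entries for the sample code: int x, y; char c[10];
table = SymbolTable()
table.declare("x", "int")
table.declare("y", "int")
table.declare("c", "char", length=10)
```

Later phases would call `table.lookup("x")` to retrieve the type and address when translating a statement such as `x = 5;`.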
3. Literal table
• Stores the constants and strings used in the program.
• Reduces memory size by reusing constants and strings.
• Can be combined with the symbol table in some implementations.
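The reuse of constants and strings can be sketched with a small interning table. The class name `LiteralTable` and its `intern` method are hypothetical; the point is only that a repeated literal maps to the same slot instead of being stored twice.

```python
class LiteralTable:
    """Stores each distinct literal once and hands out its index."""

    def __init__(self):
        self._index = {}   # (type, value) -> position in self._values
        self._values = []  # literals in first-seen order

    def intern(self, value):
        # Key on the type as well, so that 1 and 1.0 stay separate entries.
        key = (type(value), value)
        if key not in self._index:
            self._index[key] = len(self._values)
            self._values.append(value)
        return self._index[key]

    def __len__(self):
        return len(self._values)


literals = LiteralTable()
first = literals.intern("hello")
number = literals.intern(10)
again = literals.intern("hello")  # reuses the existing entry
```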
4. Parse tree
• A dynamically allocated, pointer-based tree structure.
[Figure: sample tree]
5. Semantic parse tree
• Usually the same parse tree is used, and annotations are added to each node.
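A pointer-based tree whose nodes are later annotated could be sketched as below. The `Node` class, the `annotate_types` pass, and the choice of a type annotation for the expression `4 + 2` are assumptions for illustration; the slides do not specify which annotations are attached.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Node:
    label: str
    children: List["Node"] = field(default_factory=list)
    annotations: Dict[str, str] = field(default_factory=dict)


def annotate_types(node: Node) -> Node:
    """Add a 'type' annotation bottom-up, reusing the same tree."""
    for child in node.children:
        annotate_types(child)
    if node.label.isdigit():
        node.annotations["type"] = "int"
    elif node.children:
        child_types = {c.annotations.get("type") for c in node.children}
        if child_types == {"int"}:
            node.annotations["type"] = "int"
    return node


# Parse tree for the expression 4 + 2, then the semantic pass over it.
tree = Node("+", [Node("4"), Node("2")])
annotate_types(tree)
```

The parser builds `tree` once; the semantic pass only fills in the `annotations` dictionaries and does not copy or rebuild any node.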
6. Intermediate code
• The structure of the code is kept as simple as possible, usually as three-address code.
• Each instruction allows at most three addresses (variables).
• Each instruction is added as an entry in a linked list, which allows dynamic growth.
[Figure: linked list of instruction entries (Var1, Var2, op, Var3), terminated by NULL]
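The linked list of three-address instructions can be sketched as follows. The field names (`result`, `arg1`, `op`, `arg2`), the temporary `t1`, and the source statement `a = b + c * d` are assumptions for the example; `None` plays the role of the NULL terminator.

```python
from dataclasses import dataclass
from typing import Iterator, Optional


@dataclass
class Instr:
    result: str
    arg1: str
    op: str
    arg2: str
    next: Optional["Instr"] = None  # None marks the end of the list


class CodeList:
    """Singly linked list of instructions that grows by appending at the tail."""

    def __init__(self) -> None:
        self.head: Optional[Instr] = None
        self.tail: Optional[Instr] = None

    def append(self, result: str, arg1: str, op: str, arg2: str) -> Instr:
        instr = Instr(result, arg1, op, arg2)
        if self.tail is None:
            self.head = instr
        else:
            self.tail.next = instr
        self.tail = instr
        return instr

    def __iter__(self) -> Iterator[Instr]:
        node = self.head
        while node is not None:
            yield node
            node = node.next


# a = b + c * d is split so that no instruction uses more than three addresses.
code = CodeList()
code.append("t1", "c", "*", "d")
code.append("a", "b", "+", "t1")
lines = [f"{i.result} = {i.arg1} {i.op} {i.arg2}" for i in code]
```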
Scanning
• A scanner reads a stream of characters and groups them into units that are meaningful with respect to the source language, called tokens.
• It produces a stream of tokens for the next phase of the compiler.
[Figure: stream of characters → scanner → stream of tokens]
What is Lexical Analysis
• Lexical analysis (scanning) is the task of reading a stream of input characters and dividing it into tokens (words) that belong to a certain language.
• Responsibility: the scanner accepts a stream and splits it into tokens according to rules defined by the source code language.
[Figure: the scanner splits the stream a[index]=4+2; into the tokens a, [, index, ], =, 4, +, 2, ;]
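The splitting of `a[index]=4+2;` into tokens can be sketched with a hand-written, character-by-character scanner. The token class names (`IDENTIFIER`, `NUMBER`, `SYMBOL`) and the rules below are simplifying assumptions, not the rules of any particular language.

```python
def scan(text):
    """Split text into (class, lexeme) pairs, skipping whitespace."""
    tokens = []
    i = 0
    while i < len(text):
        ch = text[i]
        if ch.isspace():
            i += 1
        elif ch.isalpha() or ch == "_":
            # Identifier: a letter or underscore, then letters, digits, underscores.
            start = i
            while i < len(text) and (text[i].isalnum() or text[i] == "_"):
                i += 1
            tokens.append(("IDENTIFIER", text[start:i]))
        elif ch.isdigit():
            # Number: a run of digits.
            start = i
            while i < len(text) and text[i].isdigit():
                i += 1
            tokens.append(("NUMBER", text[start:i]))
        else:
            # Anything else is treated as a one-character symbol.
            tokens.append(("SYMBOL", ch))
            i += 1
    return tokens


tokens = scan("a[index]=4+2;")
lexemes = [lexeme for _, lexeme in tokens]
```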
Example 1: For the code fragment below, choose the correct number of tokens in each class that appear in the code fragment.
[Code fragment not included]
• Ws? 11
• Ks? 2
• Is? 4
• Ns? 2
• Os? 7
Example 2: For the code fragment below, choose the correct number of tokens in each class that appear in the code fragment.
[Code fragment not included]
• Ws? 9
• Ks? 1
• Is? 3
• Ns? 2
• Os? 9
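Counting tokens per class can be automated with a short regular-expression sketch. Several things here are assumptions: that W, K, I, N, and O stand for whitespace, keyword, identifier, number, and other; that each run of whitespace counts as one token; the keyword set; and the input fragment, which is a made-up example and not the fragment from the exercises.

```python
import re
from collections import Counter

# One named group per assumed token class; order matters, so keywords are
# tried before identifiers and two-character operators before single ones.
TOKEN_RE = re.compile(
    r"(?P<W>\s+)"
    r"|(?P<K>\b(?:if|else|while)\b)"
    r"|(?P<I>[A-Za-z_]\w*)"
    r"|(?P<N>\d+)"
    r"|(?P<O>==|\+\+|\S)"
)


def count_classes(text):
    """Return how many tokens of each class appear in text."""
    return Counter(match.lastgroup for match in TOKEN_RE.finditer(text))


fragment = "if (x == 10) y = 2;"  # hypothetical fragment for illustration
counts = count_classes(fragment)
```

For this fragment the classes are: one keyword (`if`), two identifiers (`x`, `y`), two numbers (`10`, `2`), five other tokens (`(`, `==`, `)`, `=`, `;`), and six whitespace runs.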