
SYMBOL TABLES & CODE GENERATION FOR EXECUTABLES

Learn about the importance of symbol tables in compilers that produce executables and their use in code generation. Explore hash-based symbol table implementation and code generation for addition expressions.

mclelland




Presentation Transcript


  1. SYMBOL TABLES & CODE GENERATION FOR EXECUTABLES. The next step in what we didn’t cover in the course …

  2. SYMBOL TABLES. Compilers that produce an executable (or the representation of an executable in object module format), as opposed to a program in an intermediate language, need to make use of a symbol table. In fact, for optimization purposes, all compilers do.

  3. The symbol table records information about the identifiers in the source program, such as their name, type, number of dimensions, space assignment, etc.

  4. To illustrate the use of symbol tables, let’s consider a simple compiler in which symbol_stack consists of integers. The integer associated with an identifier on the stack is the index of that identifier’s entry in the symbol table.

  5. Our symbol table entries will provide the name of the identifier and the offset assigned to it in the data segment. • Negative numbers will be used on the symbol stack as codes to denote the registers AX, BX, etc.

  6. As identifiers are encountered in the source code, their names are packed into an array, which we will call id_stack, defined as: char id_stack[1000]; • Since strings in C all end in a 00h byte, we only need to know where on id_stack a name begins in order to retrieve it.
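Here is a minimal sketch of packing names onto id_stack. It assumes two hypothetical names that do not appear on the slides: a counter id_top, which marks the first free byte, and a helper store_name. Each name is copied together with its terminating 00h byte. Only its starting index is returned, and that index is all a symbol table entry needs to keep.

```c
#include <string.h>

#define ID_STACK_SIZE 1000

char id_stack[ID_STACK_SIZE];
int id_top = 0;   /* hypothetical: index of the first free byte on id_stack */

/* Hypothetical helper: copy name, including its terminating 00h byte,
   onto id_stack and return the index at which it begins, or -1 if
   id_stack has no room left. */
int store_name(const char *name)
{
    size_t len = strlen(name) + 1;   /* +1 for the 00h terminator */
    if (id_top + len > ID_STACK_SIZE)
        return -1;
    int start = id_top;
    memcpy(&id_stack[start], name, len);
    id_top += (int)len;
    return start;
}
```

For example, start with an empty id_stack and store "X1", then "ZZ". The calls return 0 and then 3, and &id_stack[3] reads back as "ZZ".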

  7. The symbol table entry for a name does not contain the name itself. It holds a pointer to the beginning of the name on id_stack instead. • The reason is that the symbol table is an array of symbol table entries, so storing the name in the entry would force every entry to reserve space for the largest legal name size.

  8. When an identifier is encountered in the source code, the compiler has to search the symbol table to find the entry, if any, for it. • Various methods have been investigated for making this search more efficient, such as the use of binary trees.

  9. But the method of choice has been to derive a number, called a hash code, from an identifier. All identifiers with the same hash code are then linked in a list, which we will refer to as a hash-list.

  10. One method for evaluating a hash code is to add up the ASCII codes of the individual characters of the identifier, • and then to take as the hash code the remainder of this sum after division by a prime number, such as 127.

  11. The following is sample code for this purpose:

        int hash(char * name)
        {
            int hash_value = 0;
            int i = 0;
            while (name[i] != '\0') {
                hash_value += name[i];
                ++i;
            }
            return hash_value % 127;
        }

      In this scheme there are 127 hash-lists.
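The worked example below uses ASCII codes ('A' = 65, 'B' = 66). It shows why hash-lists are needed. A sum of character codes ignores the order of the characters, so any two identifiers that are anagrams of each other always land on the same hash-list.

```latex
\begin{aligned}
\mathrm{hash}(\texttt{AB}) &= (65 + 66) \bmod 127 = 131 \bmod 127 = 4 \\
\mathrm{hash}(\texttt{BA}) &= (66 + 65) \bmod 127 = 131 \bmod 127 = 4
\end{aligned}
```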

  12. A simple symbol table could be defined as follows:

        typedef struct {
            int name_index;
            int offset;
            int hash_link;
        } symbol_table_entry;

        symbol_table_entry symbol_table[1000];

  13. Here name_index is the index into id_stack at which the name begins, • offset is the offset in the data segment assigned to the identifier, and • hash_link is the index of the symbol table entry for the next identifier encountered, if any, with the same hash code.

  14. The entries symbol_table[0] through symbol_table[126] are reserved for the heads of the 127 hash-lists.

  15. For example, suppose X1 is the first identifier encountered in the source with hash code (say) 30. Then an entry for it will be made at symbol_table[30]. • If an identifier ZZ with the same hash code 30 is encountered later, an entry for ZZ will be made at the next free index beyond the reserved heads (that is, at 127 or above). The hash_link in the entry for X1 will then be changed from null to point to the entry for ZZ.
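The X1/ZZ scenario can be sketched in C as follows. This is a sketch under stated assumptions, not code from the slides:

- NO_LINK (-1) is a hypothetical stand-in for "null".
- next_free is a hypothetical counter that starts just past the 127 reserved heads.
- set_head and append_to_hash_list are hypothetical helpers.
- A colliding identifier is assumed to be appended at the end of its hash-list. A third identifier with hash code 30 would therefore be linked from ZZ's entry, not from X1's.

```c
#define TABLE_SIZE      1000
#define NUM_HASH_LISTS  127
#define NO_LINK         (-1)   /* hypothetical stand-in for a null hash_link */

typedef struct {
    int name_index;
    int offset;
    int hash_link;
} symbol_table_entry;

symbol_table_entry symbol_table[TABLE_SIZE];
int next_free = NUM_HASH_LISTS;   /* hypothetical: first index after heads 0..126 */

/* Hypothetical helper: place the first identifier with hash code h
   in its reserved head slot. */
void set_head(int h, int name_index, int offset)
{
    symbol_table[h].name_index = name_index;
    symbol_table[h].offset = offset;
    symbol_table[h].hash_link = NO_LINK;
}

/* Hypothetical helper: add a colliding identifier to the hash-list headed
   at index h. It takes the next free overflow entry and links it from the
   current last entry of the list. Returns the new index, or -1 if full. */
int append_to_hash_list(int h, int name_index, int offset)
{
    if (next_free >= TABLE_SIZE)
        return -1;
    int last = h;
    while (symbol_table[last].hash_link != NO_LINK)   /* walk to the end of the list */
        last = symbol_table[last].hash_link;
    int fresh = next_free++;
    symbol_table[fresh].name_index = name_index;
    symbol_table[fresh].offset = offset;
    symbol_table[fresh].hash_link = NO_LINK;
    symbol_table[last].hash_link = fresh;   /* e.g. X1's link now points at ZZ */
    return fresh;
}
```

Suppose X1 has been placed with set_head(30, ...). A later call append_to_hash_list(30, ...) for ZZ returns 127 and sets symbol_table[30].hash_link to 127.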

  16. Within the rules section of the Lex definition file, the regular expression and associated code for an identifier may take a form such as the following:

        {letter}({letter}|{digit}|"_")*    { yylval = find(yytext); return identifier; }

      Here the find function returns the index into symbol_table of the entry for the identifier, creating an entry if one doesn’t already exist.

  17. The find function begins as follows:

        int find(char * name)
        {
            int j;
            j = hash(name);

      and proceeds according to the flow-diagram on the next slide.
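The flow diagram referred to here is not included in this transcript. Below is a minimal sketch of a complete find. It is consistent with the preceding slides but not taken from them. It rests on these assumptions:

- EMPTY (-1) marks an unused hash-list head.
- NO_LINK (-1) stands for a null hash_link.
- Each identifier gets one 2-byte word in the data segment (WORD_SIZE).
- init_symbol_table, fill_entry, next_free and next_offset are hypothetical names.

The sketch works as follows:

1. Walk the hash-list for the identifier's hash code.
2. Compare names on id_stack with strcmp.
3. If the name is absent, fill the empty head, or link a new entry at the end of the list.

```c
#include <string.h>

#define ID_STACK_SIZE   1000
#define TABLE_SIZE      1000
#define NUM_HASH_LISTS  127
#define NO_LINK         (-1)   /* hypothetical: null hash_link */
#define EMPTY           (-1)   /* hypothetical: head slot not yet used */
#define WORD_SIZE       2      /* hypothetical: one 16-bit word per identifier */

typedef struct {
    int name_index;   /* where the name begins on id_stack, or EMPTY */
    int offset;       /* offset assigned in the data segment */
    int hash_link;    /* next entry with the same hash code, or NO_LINK */
} symbol_table_entry;

char id_stack[ID_STACK_SIZE];
int id_top = 0;                     /* first free byte on id_stack */
symbol_table_entry symbol_table[TABLE_SIZE];
int next_free = NUM_HASH_LISTS;     /* first free overflow entry */
int next_offset = 0;                /* next free data-segment offset */

int hash(const char *name)
{
    int hash_value = 0;
    for (int i = 0; name[i] != '\0'; ++i)
        hash_value += name[i];
    return hash_value % NUM_HASH_LISTS;
}

/* Mark every hash-list head as empty; call once before the first find(). */
void init_symbol_table(void)
{
    for (int j = 0; j < NUM_HASH_LISTS; ++j) {
        symbol_table[j].name_index = EMPTY;
        symbol_table[j].hash_link = NO_LINK;
    }
}

/* Fill entry j for name: pack the name onto id_stack and assign an offset.
   Returns j, or -1 (changing nothing) if id_stack is full. */
static int fill_entry(int j, const char *name)
{
    size_t len = strlen(name) + 1;
    if (id_top + len > ID_STACK_SIZE)
        return -1;
    memcpy(&id_stack[id_top], name, len);
    symbol_table[j].name_index = id_top;
    symbol_table[j].offset = next_offset;
    symbol_table[j].hash_link = NO_LINK;
    id_top += (int)len;
    next_offset += WORD_SIZE;
    return j;
}

/* Return the symbol_table index for name, creating an entry if needed.
   Returns -1 if symbol_table or id_stack is full. */
int find(const char *name)
{
    int j = hash(name);
    if (symbol_table[j].name_index == EMPTY)   /* first name with this hash code */
        return fill_entry(j, name);
    for (;;) {
        if (strcmp(&id_stack[symbol_table[j].name_index], name) == 0)
            return j;                          /* already in the table */
        if (symbol_table[j].hash_link == NO_LINK)
            break;                             /* reached the end of the hash-list */
        j = symbol_table[j].hash_link;
    }
    if (next_free >= TABLE_SIZE)
        return -1;
    int fresh = fill_entry(next_free, name);
    if (fresh < 0)
        return -1;
    next_free++;
    symbol_table[j].hash_link = fresh;         /* link the new entry in */
    return fresh;
}
```

"AB" and "BA" share hash code 4, so after init_symbol_table() the calls behave as follows:

- The first call find("AB") fills symbol_table[4].
- find("BA") creates symbol_table[127] with offset 2 and links it from entry 4.
- Repeated calls return the same indexes.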

  18. Code Generation Using the Symbol Table. Let’s consider the code that the Yacc definition file of our simple compiler requires for addition. To avoid complications, let’s assume that the code for our arithmetic expressions uses register AX only.

  19. So on the symbol stack, positive numbers are indexes of entries for identifiers in symbol_table, and (say) -1 is used as a code for AX:

        expression : expression '+' term    { C code as described below }

      The C code should check whether $1 and $3 are positive or negative, and generate appropriate object code for each of the four cases.

  20. Case where $1 and $3 are both positive: generate machine code corresponding to

        mov AX, symbol_table[$1].offset
        add AX, symbol_table[$3].offset

      and set $$ = -1.

  21. Case where $1 is negative and $3 is positive: generate machine code corresponding to

        add AX, symbol_table[$3].offset

      and set $$ = -1.
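Below is a minimal sketch of the C code that might sit in the action for expression : expression '+' term. It assumes the following, none of which comes from the slides:

- gen_add and GEN_UNSUPPORTED are hypothetical names.
- The generated text uses NASM-style [offset] memory operands.
- symbol_table is declared locally so the block stands alone.

The sketch covers the two cases above and one more. When $1 is an identifier and $3 is already in AX, it uses the commutativity of addition. When both operands are in AX, the case cannot be compiled correctly with AX as the only register, so it is reported instead. The test for an identifier is "non-negative" rather than "positive", because symbol_table[0] is a legitimate hash-list head.

```c
#include <stdio.h>
#include <string.h>

#define AX_CODE          (-1)    /* symbol-stack code for AX, as on the slides */
#define GEN_UNSUPPORTED  (-100)  /* hypothetical error code */

typedef struct {
    int name_index;
    int offset;
    int hash_link;
} symbol_table_entry;

symbol_table_entry symbol_table[1000];

/* Generate code for  expression : expression '+' term.
   left and right are the symbol-stack values $1 and $3: a non-negative
   value is a symbol_table index, AX_CODE means the value is already in AX.
   The assembly text is written to out; the return value is what the
   action should assign to $$. */
int gen_add(int left, int right, char *out, size_t n)
{
    if (left >= 0 && right >= 0) {
        /* both operands in memory: load the left one, add the right one */
        snprintf(out, n, "mov AX, [%d]\nadd AX, [%d]\n",
                 symbol_table[left].offset, symbol_table[right].offset);
    } else if (left == AX_CODE && right >= 0) {
        /* left operand already in AX: just add the right one */
        snprintf(out, n, "add AX, [%d]\n", symbol_table[right].offset);
    } else if (left >= 0 && right == AX_CODE) {
        /* right operand in AX; addition commutes, so add the left one */
        snprintf(out, n, "add AX, [%d]\n", symbol_table[left].offset);
    } else {
        /* any other combination (e.g. both in AX): not supported by an
           AX-only scheme, so emit nothing and report it */
        if (n > 0)
            out[0] = '\0';
        return GEN_UNSUPPORTED;
    }
    return AX_CODE;   /* the sum is now in AX, so $$ = -1 */
}
```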
