System Software Unit-1 (Language Processors): A Toy Compiler
Prepared by: Bhavin Dalsaniya, MEFGI-MCA student
The Front End
• The front end performs lexical analysis, syntax analysis and semantic analysis of the source program.
• Each kind of analysis involves the following functions:
  • Determine the validity of a source statement.
  • Determine the 'content' of a source statement.
  • Construct the intermediate code (IC) of a source statement for use by subsequent analysis functions.
The word 'content'
• The word 'content' has a different meaning in lexical, syntax and semantic analysis.
• In lexical analysis, the content is the lexical class to which each lexical unit belongs.
• In syntax analysis, it is the syntactic structure of a source statement.
• In semantic analysis, the content is the meaning of a statement.
After analysis of 'content'
• Each analysis generates information in the form of:
  • Tables of information
  • A description of the source statement
• Subsequent analysis uses this information for its own purposes and either adds information to these tables and descriptions or constructs its own.
• For example, syntax analysis uses the information generated by lexical analysis and constructs a representation of the syntactic structure of the source statement.
• Semantic analysis uses the information generated by syntax analysis and constructs a representation of the meaning of the statement.
• The tables and descriptions at the end of semantic analysis form the IR (Intermediate Representation) of the front end.
• The following diagram makes this clearer.
Diagram of the front end of the toy compiler
  Source program
    → Lexical analysis (scanning): reports lexical errors, produces tokens
    → Syntax analysis (parsing): reports syntax errors, produces trees
    → Semantic analysis: reports semantic errors, produces the IC
  All three phases use the symbol table, the constant table and other tables.
  The tables together with the IC form the IR.
1. Lexical Analysis (Scanning)
• Lexical analysis identifies the lexical units in a source statement.
• It then classifies the units into different lexical classes, e.g. ids, constants, reserved ids, etc., and enters them into different tables.
• Lexical analysis builds a descriptor, called a token, for each lexical unit.
• A token contains two fields: class code and number in class.
• The class code identifies the class to which a lexical unit belongs.
• The number in class is the entry number of the lexical unit in the relevant table.
• We depict a token as Code #no.
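The token-building step can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `tokenize`, the regular expression, and the `OPERATOR_TABLE` contents are hypothetical, and the operator entry numbers are chosen to match the token string shown in the example for a:=b+i;.

```python
import re

# Hypothetical operator table. Only these three entry numbers are assumed;
# they mirror the token string in the deck's example for a:=b+i;
OPERATOR_TABLE = {"+": 3, ":=": 5, ";": 10}


def tokenize(statement, symbol_table):
    """Return a list of (class_code, number_in_class) tokens.

    symbol_table is a list of identifier names; entry numbers start at 1.
    Identifiers not yet in the table are appended to it.
    """
    tokens = []
    # Optional blanks, then either an identifier or one of the operators.
    pattern = re.compile(r"\s*(?:([A-Za-z_]\w*)|(:=|\+|;))")
    pos = 0
    while pos < len(statement):
        match = pattern.match(statement, pos)
        if match is None:
            if statement[pos:].strip() == "":
                break  # only trailing blanks remain
            raise ValueError(f"lexical error at position {pos}")
        ident, op = match.groups()
        if ident is not None:
            if ident not in symbol_table:
                symbol_table.append(ident)
            tokens.append(("Id", symbol_table.index(ident) + 1))
        else:
            tokens.append(("Op", OPERATOR_TABLE[op]))
        pos = match.end()
    return tokens
```

Calling `tokenize("a:=b+i;", ["i", "a", "b"])` yields one token per lexical unit, each holding a class code and the entry number in the corresponding table.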
Example
• Declarations: i : integer; a, b : real;
• Statement: a := b + i;
• Symbol table:
  No.  Symbol  Type
  1    i       int
  2    a       real
  3    b       real
  4    i*      real
  5    temp    real
• Token string for the statement: Id #2  Op #5  Id #3  Op #3  Id #1  Op #10
• Intermediate code:
  1. Convert (Id,#1) to real, giving (Id,#4)
  2. Add (Id,#4) to (Id,#3), giving (Id,#5)
  3. Store (Id,#5) in (Id,#2)
2. Syntax Analysis (Parsing)
• Syntax analysis processes the string of tokens built by lexical analysis to determine the statement class, e.g. assignment statement, if statement, etc.
• It then builds an IC which represents the structure of the statement.
• The IC is passed to semantic analysis to determine the meaning of the statement.
• A tree form is chosen for the IC because a tree can represent the hierarchical structure of a PL statement appropriately.
• Tree for the statement a := b + i;

        :=
       /  \
      a    +
          / \
         b   i

• Tree for the declaration a, b : real

       real
       /  \
      a    b
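A minimal sketch of how such a tree might be built for one statement class is given below. It assumes a tiny grammar of the form id := id { + id } ; and works on a list of lexemes rather than tokens to stay self-contained; the function name `parse_assignment` and the nested-tuple tree layout are illustrative choices, not part of the original material.

```python
def parse_assignment(lexemes):
    """Parse  id := id { + id } ;  and return a nested-tuple tree.

    A leaf is an identifier name; an interior node is (operator, left, right).
    """
    pos = 0

    def expect(expected=None):
        """Consume one lexeme: a specific one, or any identifier if None."""
        nonlocal pos
        if pos >= len(lexemes):
            raise SyntaxError("unexpected end of statement")
        lexeme = lexemes[pos]
        if expected is not None and lexeme != expected:
            raise SyntaxError(f"expected {expected!r}, found {lexeme!r}")
        if expected is None and not lexeme.isidentifier():
            raise SyntaxError(f"expected an identifier, found {lexeme!r}")
        pos += 1
        return lexeme

    target = expect()              # left-hand side identifier
    expect(":=")
    tree = expect()                # first operand of the expression
    while pos < len(lexemes) and lexemes[pos] == "+":
        expect("+")
        tree = ("+", tree, expect())   # left-associative addition
    expect(";")
    if pos != len(lexemes):
        raise SyntaxError("text after end of statement")
    return (":=", target, tree)
```

For the lexemes of a := b + i; the result is the tuple `(":=", "a", ("+", "b", "i"))`, which has the same shape as the assignment tree on this slide. A statement that does not fit the pattern raises `SyntaxError`, which corresponds to reporting a syntax error.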
3. Semantic Analysis
• Semantic analysis identifies the sequence of actions necessary to implement the meaning of a source statement.
• When semantic analysis determines the meaning of a subtree in the IC, it adds information to a table or adds an action to the sequence of actions.
• It then modifies the IC to enable further semantic analysis.
• The analysis ends when the tree has been completely processed.
Example of Semantic Analysis
• Source statement: a := b + i;
• Analysis steps:
  1. Add type information to the nodes of the tree (tree A).
  2. In an assignment, the right-hand side expression is evaluated first.
  3. Before the addition, convert i from int to real (tree B).
  4. Perform the addition and store the result in temp (tree C).
  5. Store temp into a.
• The trees below show the IC after each step.

  A)       :=
         /    \
    a, real    +
             /   \
       b, real   i, int

  B)       :=
         /    \
    a, real    +
             /   \
       b, real   i*, real

  C)       :=
         /    \
    a, real    temp, real
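The walk over the tree can be sketched as follows. This is a simplified illustration, not the deck's algorithm: it assumes only the types int and real, handles only '+' and ':=', and uses a hypothetical `analyse` function with a list-based symbol table. The action strings follow the wording of the three-step intermediate code in the earlier example.

```python
def analyse(tree, symbols):
    """Walk an assignment tree and return the list of actions (the IC).

    symbols is a list of [name, type] entries; entry numbers start at 1.
    New entries are appended for converted values and temporaries.
    """
    actions = []

    def entry(name):
        for number, (sym, _) in enumerate(symbols, start=1):
            if sym == name:
                return number
        raise LookupError(f"undeclared identifier {name!r}")

    def new_entry(name, type_):
        symbols.append([name, type_])
        return len(symbols)

    def to_real(number):
        name, type_ = symbols[number - 1]
        if type_ == "real":
            return number              # no conversion needed
        converted = new_entry(name + "*", "real")
        actions.append(
            f"Convert (Id,#{number}) to real, giving (Id,#{converted})")
        return converted

    def evaluate(node):
        if isinstance(node, str):      # leaf: an identifier
            return entry(node)
        _, left, right = node          # only '+' is handled in this sketch
        left_no, right_no = evaluate(left), evaluate(right)
        # Mixed int/real operands: convert both to real before adding.
        if "real" in (symbols[left_no - 1][1], symbols[right_no - 1][1]):
            left_no, right_no = to_real(left_no), to_real(right_no)
        temp = new_entry("temp", symbols[left_no - 1][1])
        actions.append(
            f"Add (Id,#{right_no}) to (Id,#{left_no}), giving (Id,#{temp})")
        return temp

    _, target, expr = tree
    value = evaluate(expr)             # right-hand side is evaluated first
    target_no = entry(target)
    if symbols[target_no - 1][1] == "real":
        value = to_real(value)
    actions.append(f"Store (Id,#{value}) in (Id,#{target_no})")
    return actions
```

Starting from a symbol table holding i (int), a (real) and b (real), the tree for a := b + i produces a conversion, an addition and a store, and the table gains the entries i* and temp.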
The Back End
• The back end performs two tasks:
  • Memory allocation
  • Code generation
• Memory allocation: memory allocation is a simple task given the presence of the symbol table.
• The memory requirement of an identifier is computed from its type, length and dimensionality, and memory is allocated to it.
• The address of the memory area is entered in the symbol table.
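As a rough sketch of this step, the function below computes each identifier's memory requirement from its type and dimensions and records the address in its symbol-table entry. The byte sizes in `TYPE_SIZE`, the dictionary layout of an entry, and the name `allocate` are assumptions made for illustration; alignment is ignored.

```python
# Assumed sizes in bytes per type; a real target defines its own.
TYPE_SIZE = {"int": 4, "real": 8}


def allocate(symbols, base_address=0):
    """Assign an address to every symbol-table entry.

    Each entry is a dict with 'name', 'type' and an optional 'dims' tuple
    (array dimensions). The computed 'address' is stored in the entry, and
    the first free address after the last entry is returned.
    """
    address = base_address
    for entry in symbols:
        elements = 1
        for extent in entry.get("dims", ()):
            elements *= extent         # total number of array elements
        size = TYPE_SIZE[entry["type"]] * elements
        entry["address"] = address     # enter the address in the table
        address += size
    return address
```

Entries are laid out one after another, so a scalar occupies one element of its type and an array occupies the product of its dimensions.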
The Back End (continued)
• Code generation: code generation uses knowledge of the target architecture, i.e. knowledge of the instructions and addressing modes of the target computer, to select the appropriate instructions.
• The important issues in code generation are:
  • Determine where the intermediate results should be kept, i.e. in memory locations or in machine registers.
  • Determine which instructions should be used for type conversion operations.
  • Determine which addressing modes should be used for accessing variables.
Programming Language Grammar
• A language L can be considered to be a collection of valid sentences.
• Each sentence can be looked upon as a sequence of words, and each word as a sequence of letters or graphic symbols acceptable in L.
• A language specified in this manner is known as a formal language.
• Terminal symbols:
  • The alphabet of L, denoted by the Greek symbol Σ, is the collection of symbols in its character set.
  • We will use lower case letters a, b, c, etc. to denote symbols in Σ.
  • A symbol in the alphabet is known as a terminal symbol (T) of L.
  • The alphabet can be represented using the mathematical notation of a set, e.g. Σ = {a, b, c, …, z, 0, 1, 2, …, 9}
Programming Language Grammar (continued)
• Here the symbols '{', ',' and '}' are part of the notation. We call them metasymbols to differentiate them from the terminal symbols.
• Strings:
  • A string is a finite sequence of symbols. We will represent strings by Greek symbols α, β, γ, etc.
  • α = axy is a string over Σ.
  • The length of a string is the number of symbols in it; α above has length 3.
  • Note that the absence of any symbol is also a string, the null string ε.
  • The concatenation operation combines two strings into a single string.
Programming Language Grammar (continued)
• Nonterminal symbols:
  • A nonterminal symbol (NT) is the name of a syntax category of a language, e.g. noun, verb, etc.
  • An NT is written as a single capital letter or as a name enclosed between <…>, e.g. A or <Noun>.
  • During grammatical analysis, a nonterminal symbol represents an instance of the category. Thus, <Noun> represents a noun.
• Productions:
  • A production, also called a rewriting rule, is a rule of the grammar.
  • A production has the form: nonterminal symbol ::= string of Ts and NTs
Programming Language Grammar (continued)
• Each grammar G defines a language L_G. G contains an NT called the distinguished symbol or start NT of G. Unless otherwise specified, we use the symbol S as the distinguished symbol of G.
• A valid string α of L_G is obtained by using the following procedure:
  1. Let α = 'S'.
  2. While α is not a string of terminal symbols:
     a. Select an NT appearing in α, say X.
     b. Replace X by a string appearing on the RHS of a production of X.
• Grammar
• Derivation
• Reduction
• Parse Tree
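The procedure for obtaining a valid string can be sketched directly in Python. The grammar below is a made-up example using the noun and verb categories mentioned earlier; the names `GRAMMAR` and `derive`, the choice of always expanding the leftmost NT, and the `max_steps` guard are assumptions of this sketch, since the procedure itself allows any NT to be selected.

```python
import random

# Hypothetical grammar: keys are NTs, values list the RHS alternatives.
# Each RHS is a list of symbols; any symbol that is not a key is a terminal.
GRAMMAR = {
    "S": [["<Noun>", "<Verb>"]],
    "<Noun>": [["dogs"], ["birds"]],
    "<Verb>": [["bark"], ["fly"]],
}


def derive(grammar, start="S", choose=random.choice, max_steps=100):
    """Derive one valid string; return every intermediate form of alpha."""
    alpha = [start]                    # let alpha = 'S'
    steps = [list(alpha)]
    for _ in range(max_steps):
        nonterminals = [i for i, sym in enumerate(alpha) if sym in grammar]
        if not nonterminals:           # alpha is a string of terminals
            return steps
        i = nonterminals[0]            # select an NT (here: the leftmost)
        alpha[i:i + 1] = choose(grammar[alpha[i]])   # replace it by an RHS
        steps.append(list(alpha))
    raise RuntimeError("derivation did not terminate within max_steps")
```

Passing `choose=lambda alternatives: alternatives[0]` makes the derivation deterministic; with the default `random.choice` each call may produce a different valid string of the language.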