Presentation on converting PASCAL source code to Microsoft intermediate language (MSIL), including lexical analyzer, symbol table design, and parser implementation for platform-independent execution.
DCSPM: Develop and Compile Subset of PASCAL Language to MSIL
Master Project
Abdullah Sheneamer, MSCS Graduate Candidate
Fall 2012
Abdullah Sheneamer Master project presentation
Outline
- Introduction to MSIL
- Related Works
- Why PASCAL to MSIL
- PASCAL Compiler
- Lexical Analyzer Design
- Symbol Table Design
- Parser and MSIL Design
- Improvements
- Evaluations
- Lessons Learned
- Future Work
- Conclusion
Introduction to MSIL
Microsoft intermediate language (MSIL) is the lowest-level human-readable programming language defined by the Common Language Infrastructure (CLI) specification and used by the .NET Framework. MSIL includes instructions for loading, storing, initializing, and calling methods on objects, as well as instructions for arithmetic and logical operations.
Related Works
“The Design and Implementation of C-like Language Interpreter” [XX11]: The authors design and implement a C-like language interpreter in C++ based on the idea of modularity. The lexical analyzer reads character strings from the source program, splits them into separate words, and constructs the internal representation of each word, that is, a TOKEN. The basic idea of the lexical analyzer design is, first, to determine the start and end positions of a word and, second, once the word is separated, to determine its attribute.
“Simple Calculator Compiler Using Lex and YACC” [Upad11]: The author describes how to develop a simple compiler for a procedural language using Lex (Lexical Analyzer Generator) and YACC (Yet Another Compiler-Compiler). Lex helps write programs whose control flow is directed by instances of regular expressions in the input stream.
Why PASCAL to MSIL
- Allow PASCAL programs to run on the .NET platform
- Study how compilers work in the .NET environment
- Let legacy PASCAL code run on modern machines
- MSIL is platform independent
- JIT compilers can be optimized for specific machines and architectures
PASCAL Compiler
Compilation process: the compiler takes PASCAL source code and produces Microsoft intermediate language (MSIL).
Execution process: MSIL must be converted to CPU-specific native code, usually by a just-in-time (JIT) compiler. Native code is code compiled to run on a particular processor (such as an Intel x86-class processor) and its instruction set.
Compilation Process
[Diagram of the compiler components: PASCAL Source Code → Lexical Analysis → Parser & MSIL → MSIL Code Output, with a Symbol Table and an Error Handler]
Lexical Analyzer Design
After reading the next character from the input stream:
State 0: identify the current token and decide the next state.
State 1: handle identifiers and keywords.
State 2: handle numbers.
State 3: handle one-character and two-character tokens.
States 4 and 5: handle comments: skip the rest of a line that starts with "//" (state 4), or skip everything between "/*" and "*/" (state 5).
Lexical Analyzer Design (cont.)
Begin: (1) lexbuf = ""; (2) state = 0.
State 0 (INITIAL):
- White space: no action.
- Letter, @, or _: place it in lexbuf; go to state 1 (ID).
- Digit: place it in lexbuf; go to state 2 (NUM).
State 1 (ID):
- Letter or digit: place it in lexbuf.
- Anything else: (1) return the last character to the input stream; (2) search for lexbuf in the symbol table; (3) insert it as an ID if not found, otherwise get its row number p; (4) build the token [code = symbol[p].token, attr = p]; (5) enqueue the token and set lexbuf = "".
State 2 (NUM):
- Digit: place it in lexbuf.
- Anything else: (1) return the last character to the input stream; (2) build the token [code = NUM, attr = value]; (3) enqueue the token and set lexbuf = "".
Lexical Analyzer Design (cont.)
State 3 (One or Two Char):
- Sequence is "//": state = 4.
- Sequence is "/*": lexbuf = ""; state = 5.
- Character that completes a two-character token: (1) place it in lexbuf; (2) get the code for the two-character token in lexbuf; (3) build the token [code = obtained code, attr = -1]; (4) lexbuf = ""; state = 0; (5) return the token to the parser.
- Unrelated character: (1) return the last character to the input stream; (2) build the token [code = ASCII(first character in lexbuf), attr = -1]; (3) lexbuf = ""; state = 0; (4) return the token to the parser.
State 4 (single-line comment):
- New line: lexbuf = ""; state = 0.
- Anything else: place it in lexbuf.
State 5 (multi-line comment):
- Sequence is "*/": lexbuf = ""; state = 0.
- Anything else: place it in lexbuf.
4/10/2012
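The state machine above can be sketched in Python as follows. This is a minimal, hypothetical sketch, not the DCSPM implementation: the codes 256 (ID) and 257 (NUM) come from the symbol-table slide, while the keyword and two-character codes, the `lex` name, and the choice to collect all tokens in a list instead of returning them to the parser one at a time are assumptions. Leaving the current character unread plays the role of "return the last character to the input stream".

```python
# Token codes 256 (ID) and 257 (NUM) come from the slides; the keyword and
# two-character codes below are hypothetical values chosen for this sketch.
ID, NUM = 256, 257
KEYWORDS = {"begin": 258, "end": 259, "if": 260, "then": 261, "else": 262}
TWO_CHAR = {":=": 300, "==": 301, "<=": 302, ">=": 303, "<>": 304}


def lex(src):
    """Scan src with the six-state machine; return (tokens, symbol_table)."""
    symbols = list(KEYWORDS.items())  # row p -> (lexeme, token code)
    tokens = []                       # queue of (code, attr) pairs
    state, lexbuf, i = 0, "", 0
    src += "\n"                       # a final newline flushes the last token
    while i < len(src):
        ch = src[i]
        if state == 0:                # INITIAL: pick the next state
            if ch.isalpha() or ch in "@_":
                lexbuf, state = ch, 1
            elif ch.isdigit():
                lexbuf, state = ch, 2
            elif not ch.isspace():    # white space: no action
                lexbuf, state = ch, 3
            i += 1
        elif state == 1:              # ID: identifiers and keywords
            if ch.isalnum():
                lexbuf += ch
                i += 1
            else:                     # leave ch unread ("return it to the input")
                p = next((r for r, (lx, _) in enumerate(symbols) if lx == lexbuf), None)
                if p is None:         # not found: insert as a new identifier
                    symbols.append((lexbuf, ID))
                    p = len(symbols) - 1
                tokens.append((symbols[p][1], p))
                lexbuf, state = "", 0
        elif state == 2:              # NUM: integer literals
            if ch.isdigit():
                lexbuf += ch
                i += 1
            else:
                tokens.append((NUM, int(lexbuf)))
                lexbuf, state = "", 0
        elif state == 3:              # one- or two-character tokens
            pair = lexbuf + ch
            if pair == "//":
                state, i = 4, i + 1
            elif pair == "/*":
                lexbuf, state, i = "", 5, i + 1
            elif pair in TWO_CHAR:
                tokens.append((TWO_CHAR[pair], -1))
                lexbuf, state, i = "", 0, i + 1
            else:                     # unrelated char: emit the single character
                tokens.append((ord(lexbuf), -1))
                lexbuf, state = "", 0
        elif state == 4:              # single-line comment: skip to newline
            if ch == "\n":
                lexbuf, state = "", 0
            i += 1
        else:                         # state 5: skip until "*/"
            if src.startswith("*/", i):
                lexbuf, state, i = "", 0, i + 2
            else:
                i += 1
    return tokens, symbols
```

For example, `lex("begin a:=12; // note\nend")[0]` yields `[(258, 0), (256, 5), (300, -1), (257, 12), (59, -1), (259, 1)]`: keywords carry their own code with their symbol-table row as the attribute, the new identifier `a` gets code 256 and row 5, and `;` is emitted with its ASCII code.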
Symbol Table Design
• Every keyword is a token and has a unique integer code.
• The identifier token has code 256.
• The number token has code 257.
• Every special character is a token whose integer code equals its ASCII value.
• Two-character tokens have their own unique codes.
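The code-assignment rules above fit in one lookup function. In this hypothetical Python sketch, 256 and 257 come from the slide; the keyword and two-character token lists, the `token_code` name, and the choice to number keywords and two-character tokens from 258 upward are assumptions, since the slides do not list them.

```python
ID_CODE, NUM_CODE = 256, 257  # fixed codes from the slides

# Hypothetical token lists: the slides do not list the keywords or two-character
# tokens, nor say where their codes start; numbering from 258 is an assumption.
KEYWORDS = ["program", "var", "begin", "end", "if", "then", "else", "while", "do"]
TWO_CHAR_TOKENS = [":=", "<=", ">=", "<>", "=="]


def build_code_table():
    """Assign each keyword and two-character token its own unique integer code."""
    table = {}
    for offset, lexeme in enumerate(KEYWORDS + TWO_CHAR_TOKENS):
        table[lexeme] = 258 + offset
    return table


CODES = build_code_table()


def token_code(lexeme):
    """Return the integer token code for a non-empty lexeme."""
    if lexeme in CODES:
        return CODES[lexeme]        # keyword or two-character token
    if lexeme.isdigit():
        return NUM_CODE             # every number shares code 257
    if lexeme[0].isalpha() or lexeme[0] in "@_":
        return ID_CODE              # every identifier shares code 256
    if len(lexeme) == 1:
        return ord(lexeme)          # special character: its ASCII value
    raise ValueError(f"unknown token {lexeme!r}")
```

Because ASCII codes are all below 128, single-character codes cannot collide with 256, 257, or the keyword and two-character codes, so one integer is enough for the parser to tell every token kind apart.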
Parser and MSIL Design
The parser implements most of the PASCAL grammar in BNF [22], including nested if/else statements and if statements with logical expressions.
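A minimal sketch of how a recursive-descent parser can emit MSIL for nested if/else while it parses. The grammar subset, the `Parser`/`compile_pascal`/`local` names, the regex tokenizer (standing in for the state-machine lexer), and the label scheme are all hypothetical; the slides do not show DCSPM's parser code. Each `if` evaluates its condition with `ceq`, then `brfalse` jumps past the `then` part; an `else` adds an unconditional `br` over the `else` part.

```python
import re


def local(op, n):
    """ldloc/stloc, using the short form (e.g. stloc.0) for slots 0-3."""
    return f"{op}.{n}" if n < 4 else f"{op}.s {n}"


class Parser:
    """Recursive-descent parser for a hypothetical Pascal subset:
    stmt := 'begin' stmt {';' stmt} 'end'
          | 'if' cond 'then' stmt ['else' stmt] | ident ':=' expr
    cond := ['('] expr '==' expr [')']    expr := term {'+' term}
    term := ident | number
    """

    def __init__(self, src):
        self.toks = re.findall(r":=|==|\w+|[^\s\w]", src)
        self.pos, self.labels = 0, 0
        self.slots = {}  # variable name -> local slot number
        self.code = []   # emitted MSIL: one instruction or label per entry

    def peek(self):
        return self.toks[self.pos] if self.pos < len(self.toks) else None

    def advance(self):
        self.pos += 1
        return self.toks[self.pos - 1] if self.pos <= len(self.toks) else None

    def expect(self, tok):
        if self.advance() != tok:
            raise SyntaxError(f"expected {tok!r} at token {self.pos - 1}")

    def new_label(self):
        self.labels += 1
        return f"L{self.labels}"

    def slot(self, name):
        return self.slots.setdefault(name, len(self.slots))

    def stmt(self):
        if self.peek() == "begin":
            self.advance()
            self.stmt()
            while self.peek() == ";":
                self.advance()
                if self.peek() == "end":  # tolerate ';' just before 'end'
                    break
                self.stmt()
            self.expect("end")
        elif self.peek() == "if":
            self.advance()
            self.cond()
            self.expect("then")
            else_label, end_label = self.new_label(), self.new_label()
            self.code.append(f"brfalse {else_label}")  # condition false: skip 'then'
            self.stmt()
            if self.peek() == "else":  # an else binds to the nearest if
                self.advance()
                self.code += [f"br {end_label}", f"{else_label}:"]
                self.stmt()
                self.code.append(f"{end_label}:")
            else:
                self.code.append(f"{else_label}:")
        else:
            name = self.advance()
            self.expect(":=")
            self.expr()
            self.code.append(local("stloc", self.slot(name)))

    def cond(self):
        paren = self.peek() == "("
        if paren:
            self.advance()
        self.expr()
        self.expect("==")
        self.expr()
        self.code.append("ceq")  # pushes 1 if the operands are equal, else 0
        if paren:
            self.expect(")")

    def expr(self):
        self.term()
        while self.peek() == "+":
            self.advance()
            self.term()
            self.code.append("add")

    def term(self):
        tok = self.advance()
        if tok.isdigit():
            n = int(tok)
            self.code.append(f"ldc.i4.{n}" if n <= 8 else f"ldc.i4 {n}")
        else:
            self.code.append(local("ldloc", self.slot(tok)))


def compile_pascal(src):
    """Parse one statement (optionally ending in '.') and return MSIL lines."""
    p = Parser(src)
    p.stmt()
    if p.peek() == ".":
        p.advance()
    if p.peek() is not None:
        raise SyntaxError(f"unexpected token {p.peek()!r}")
    return p.code + ["ret"]
```

Because each `if` parses its own optional `else` before returning, an `else` binds to the nearest `if`. For `compile_pascal("if a == 0 then if b == 1 then c := 1 else c := 2.")`, the else branch (label `L3`) is emitted inside the inner if, ahead of the outer if's exit label `L1`.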
Parser and MSIL Design (Cont.)
Parser and MSIL Design (Cont.)
Improvements
Two improvements in the DCSPM compiler:
1- Lexical analysis improvement (symbol table data structure)
  • ArrayList (initial version)
  • Dictionary (improved version)
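Searching a .NET ArrayList (for example with `IndexOf`) is a linear scan, while a `Dictionary<TKey, TValue>` lookup by key takes close to constant time on average. The hypothetical Python sketch below illustrates the difference with two classes sharing one operation, `lookup_or_insert`, which performs the "search lexbuf, insert as ID if not found, return row p" step of the ID state. It counts comparisons instead of timing, so the result does not depend on the machine; the names and the instrumentation are assumptions, not DCSPM code.

```python
class ListSymbolTable:
    """Rows kept in a list and searched front to back, like ArrayList.IndexOf:
    each lookup costs O(n) comparisons."""

    def __init__(self):
        self.rows = []          # row p -> (lexeme, token code)
        self.comparisons = 0    # instrumentation for the comparison below

    def lookup_or_insert(self, lexeme, code=256):
        for p, (name, _) in enumerate(self.rows):
            self.comparisons += 1
            if name == lexeme:
                return p
        self.rows.append((lexeme, code))
        return len(self.rows) - 1


class DictSymbolTable:
    """Same rows plus a hash index (lexeme -> row number), like a .NET
    Dictionary: each lookup is O(1) on average."""

    def __init__(self):
        self.rows = []
        self.index = {}

    def lookup_or_insert(self, lexeme, code=256):
        p = self.index.get(lexeme)
        if p is None:
            p = len(self.rows)
            self.rows.append((lexeme, code))
            self.index[lexeme] = p
        return p


names = [f"v{i}" for i in range(100)]
list_table, dict_table = ListSymbolTable(), DictSymbolTable()
for name in names + names:  # insert 100 identifiers, then look each up again
    # both tables must hand out the same row numbers
    assert list_table.lookup_or_insert(name) == dict_table.lookup_or_insert(name)
print(list_table.comparisons)  # 10000 = 4950 (inserts) + 5050 (lookups)
```

With n distinct identifiers the list version needs on the order of n² comparisons in total, while the dictionary version does a constant amount of hashing work per lookup on average.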
Improvements (Cont.)
2- MSIL Code Output Improvement
Simple Pascal code:
    begin
      a:=0;
      b:=1;
      c:=2;
      if( a== 0) then
      begin
        a:= b+c;
      end;
    end;
    end.
MSIL listing 1:
    IL_0000: ldc.i4.0
    IL_0001: stloc.0
    IL_0002: ldc.i4.1
    IL_0003: stloc.1
    IL_0004: ldc.i4.2
    IL_0005: stloc.2
    IL_0006: ldloc.0
    IL_0007: ldc.i4.1
    IL_0008: ceq
    IL_000a: stloc.3
    IL_000b: ldloc.3
    IL_000c: brfalse.s IL_0012
    IL_000e: ldloc.1
    IL_000f: ldloc.2
    IL_0010: add
    IL_0011: stloc.0
    IL_0012: ret
MSIL listing 2:
    IL_0000: ldc.i4.0
    IL_0001: stloc.0
    IL_0002: ldc.i4.1
    IL_0003: stloc.1
    IL_0004: ldc.i4.2
    IL_0005: stloc.2
    IL_0006: ldloc.0
    IL_0007: ldc.i4.1
    IL_0008: ceq
    IL_000a: ldc.i4.0
    IL_000b: ceq
    IL_000d: stloc.3
    IL_000e: ldloc.3
    IL_000f: brtrue.s IL_0015
    IL_0011: ldloc.1
    IL_0012: ldloc.2
    IL_0013: add
    IL_0014: stloc.0
    IL_0015: ret
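To trace what the two listings above compute, here is a hypothetical Python sketch of a tiny evaluation-stack interpreter covering only the opcodes they use (`ldc.i4.N`, `ldloc.N`, `stloc.N`, `add`, `ceq`, `brfalse.s`, `brtrue.s`, `ret`). It is not how the CLR runs IL (the JIT compiles it to native code); it only models the stack semantics, assuming locals start at zero. It also derives each listing's size in bytes from the offset of its final 1-byte `ret`.

```python
# The two listings from the slide, copied verbatim.
LISTING_1 = """
IL_0000: ldc.i4.0
IL_0001: stloc.0
IL_0002: ldc.i4.1
IL_0003: stloc.1
IL_0004: ldc.i4.2
IL_0005: stloc.2
IL_0006: ldloc.0
IL_0007: ldc.i4.1
IL_0008: ceq
IL_000a: stloc.3
IL_000b: ldloc.3
IL_000c: brfalse.s IL_0012
IL_000e: ldloc.1
IL_000f: ldloc.2
IL_0010: add
IL_0011: stloc.0
IL_0012: ret
"""

LISTING_2 = """
IL_0000: ldc.i4.0
IL_0001: stloc.0
IL_0002: ldc.i4.1
IL_0003: stloc.1
IL_0004: ldc.i4.2
IL_0005: stloc.2
IL_0006: ldloc.0
IL_0007: ldc.i4.1
IL_0008: ceq
IL_000a: ldc.i4.0
IL_000b: ceq
IL_000d: stloc.3
IL_000e: ldloc.3
IL_000f: brtrue.s IL_0015
IL_0011: ldloc.1
IL_0012: ldloc.2
IL_0013: add
IL_0014: stloc.0
IL_0015: ret
"""


def parse(listing):
    """Turn 'IL_xxxx: opcode [operand]' lines into (offset, opcode, operand)."""
    program = []
    for line in listing.strip().splitlines():
        label, instr = line.split(":", 1)
        op, *rest = instr.split()
        program.append((int(label[3:], 16), op, rest[0] if rest else None))
    return program


def run(listing, num_locals=4):
    """Execute the listing on an evaluation stack; return the final locals."""
    program = parse(listing)
    index_of = {off: i for i, (off, _, _) in enumerate(program)}
    stack, loc, pc = [], [0] * num_locals, 0  # locals assumed to start at zero
    while True:
        _, op, arg = program[pc]
        pc += 1
        if op.startswith("ldc.i4."):
            stack.append(int(op[7:]))
        elif op.startswith("ldloc."):
            stack.append(loc[int(op[6:])])
        elif op.startswith("stloc."):
            loc[int(op[6:])] = stack.pop()
        elif op in ("add", "ceq"):
            b, a = stack.pop(), stack.pop()
            stack.append(a + b if op == "add" else int(a == b))
        elif op in ("brfalse.s", "brtrue.s"):
            nonzero = stack.pop() != 0
            if nonzero == (op == "brtrue.s"):  # brtrue: jump if nonzero; brfalse: if zero
                pc = index_of[int(arg[3:], 16)]
        elif op == "ret":
            return loc
        else:
            raise NotImplementedError(op)


def code_size(listing):
    """Bytes of IL: both listings end in a 1-byte ret, so size = its offset + 1."""
    return parse(listing)[-1][0] + 1


for name, listing in [("listing 1", LISTING_1), ("listing 2", LISTING_2)]:
    print(name, run(listing), code_size(listing), "bytes")
# listing 1 [0, 1, 2, 0] 19 bytes
# listing 2 [0, 1, 2, 1] 22 bytes
```

Both listings leave locals 0 to 2 (a, b, c) with the same values and differ only in the temporary local 3; listing 1 uses a single `ceq` with `brfalse.s` and is 3 bytes smaller than listing 2, which negates the comparison with a second `ceq` and branches with `brtrue.s`.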
Evaluations
1- ArrayList data structure vs. Dictionary data structure
Evaluations (cont.)
Complexity of ArrayList vs. Dictionary
Evaluations (cont.)
2- Parser phase test
Evaluations (cont.)
3- Initial and improved nested if/else MSIL code
Evaluations (cont.)
Size of initial and improved nested if/else MSIL code
Lessons Learned
• ILDASM.EXE: disassembles a compiled .NET assembly into human-readable IL. Location: C:\Program Files\Microsoft SDKs\Windows\v7.0A\bin
• ILASM.EXE: assembles human-readable IL into an executable .NET assembly. Location: C:\WINDOWS\Microsoft.NET\Framework\v1.1.4322 or C:\Windows\Microsoft.NET\Framework\v2.0.50727
• DateTime and TimeSpan:
    DateTime start = DateTime.Now;
    lex();
    TimeSpan elapsed = DateTime.Now - start;
    speed = "Time Elapsed of Lexical Analysis: " + elapsed.TotalMilliseconds + "ms";
• Stopwatch class (System.Diagnostics), intended for measuring elapsed time:
    System.Diagnostics.Stopwatch stopwatch = new System.Diagnostics.Stopwatch();
    stopwatch.Start();
    lex();
    stopwatch.Stop();
    speed = "Time Elapsed of Lexical Analysis: " + stopwatch.Elapsed.TotalMilliseconds + "ms";
Lessons Learned (cont.)
Nested if/else logic statement
Future Work
Many Pascal statements and data structures are not yet supported, and the corresponding MSIL generation remains to be done:
1- complex case statements
2- if statements with complex, multi-level conditions
3- assert statements
4- exit statements
5- goto statements
6- repeat statements
7- next statements
8- complex one-dimensional arrays
9- two-dimensional arrays
10- queue data structures
11- stack data structures
Conclusion
The DCSPM compiler lets legacy Pascal programs run on modern machines, and the MSIL it produces is platform independent. MSIL code is verified for type safety by the runtime and can be executed in any environment that supports the CLI (Common Language Infrastructure).
A one-dimensional array has two cases when compiled to MSIL. In the first, when the array has one or two elements, the generated MSIL looks the same as the MSIL of other statements (if/else/while, etc.).
The initial lexical analyzer uses an ArrayList for the symbol table, and the improved lexical analyzer uses a Dictionary. I tested both versions with the Stopwatch class.
Conclusion (cont.)
A batch file, timer.cmd, measures the execution time of the generated MSIL.
The improved nested if/else MSIL runs faster than the initial nested if/else MSIL, although both produce the same results.
The experience gained in this project can serve as a foundation for developing a new programming language.
Demo & Questions
http://cs.uccs.edu/~gsc/pub/master/asheneam/src/COMPILER/bin/Debug/
Bibliography
• [MC5tk]: http://msdn.microsoft.com/en-us/library/c5tkafs1(v=vs.71).aspx
• [XX11]: Xiaohong Xiao and You Xu, “The Design and Implementation of C-like Language Interpreter,” Proceedings of the 2nd International Symposium on Intelligence Information Processing and Trusted Computing (IPTC), pp. 104-107, 2011.
• [Upad11]: Mohit Upadhyaya, “Simple Calculator Compiler Using Lex and YACC,” Proceedings of the 3rd IEEE International Conference on Electronic Computer Technology (ICECT), Vol. 6, pp. 182-187, 8-10 April 2011.
• [DLNYM]: H. M. Deitel, P. J. Deitel, J. Listfield, T. R. Nieto, C. Yaeger, and M. Zlatkina, C# How to Program.
• [L97]: Kenneth C. Louden, Compiler Construction: Principles and Practice.
• [MN11]: D. S. Malik and P. S. Nair, Data Structures Using Java.
• [L06]: Peter Linz, An Introduction to Formal Languages and Automata, Fourth Edition.
• [ASU11]: Alfred V. Aho, Monica S. Lam, Ravi Sethi, and Jeffrey D. Ullman, Compilers: Principles, Techniques, and Tools, 2nd Edition.
• [AL09]: Abdul Sattar and Torben Lorenzen, “Develop a Compiler in Java for a Compiler Design Course.”
• [Assembly11]: James T. Streib, Guide to Assembly Language: A Concise Introduction, Springer, London/New York, 2011.
• [WFRBE89-90]: Dr. Gerald Wildenberg, “Using a Stack Assembler Language in a Compiler Course,” St. John Fisher College, Rochester, NY; Bristol Polytechnic, England, 1989-1990.
Bibliography (cont.)
• [LS56]: Serge Lidin, Expert .NET 2.0 IL Assembler, Berkeley, CA.
• [CodeProject]: http://www.codeproject.com/Articles/3778/Introduction-to-IL-Assembly-Language
• [MHt8e]: http://msdn.microsoft.com/en-us/library/ht8ecch6(v=vs.71)
• [K08]: Pro C# 2008 and the .NET 3.5 Platform, Fourth Edition.
• [CodeMSIL]: http://www.codeguru.com/csharp/.net/net_general/il/article.php/c4635/MSIL-Tutorial.htm
• [WikiPascal]: http://en.wikipedia.org/wiki/Pascal_(programming_language)
• [PagesCs]: http://pages.cs.wisc.edu/~fischer/cs536.s08/lectures/Lecture02.4up.pdf
• [MArraylist]: http://msdn.microsoft.com/en-us/library/system.collections.arraylist.aspx
• [MKx37]: http://msdn.microsoft.com/en-us/library/kx37x362.aspx
• [WikiExpr]: http://en.wikipedia.org/wiki/Microsoft_Visual_Studio_Express#Visual_C.23_Express
• [DllAssem]: http://dll-repair-tools.com/dll-files/fusiondll-the-assembly-manager
• [learnExp]: http://www.learnvisualstudio.net/start-here/lesson-1-1-installing-visual-c-2010-express-edition/
• [SeasPascal]: http://www.seas.gwu.edu/~hchoi/teaching/cs160d/pascal.pdf
• [GeekClass]: http://geekswithblogs.net/BlackRabbitCoder/archive/2011/06/16/c.net-fundamentals-choosing-the-right-collection-class.aspx
• [DotArray]: http://www.dotnetperls.com/arraylist
• [Ecma]: http://www.ecma-international.org/publications/files/ECMA-ST/ECMA-335.pdf