Decompilation of .NET bytecode
Stephen Horne, Trinity Hall
Computer Science Part II Project Progress Report
http://hal.trinhall.cam.ac.uk/~srh38/project
10th February 2004
The .NET framework
• .NET and the Common Language Runtime
  • Microsoft’s answer to Java
  • CLR is .NET equivalent of the JVM
  • Lots of useful metadata provided in assemblies
[Diagram: C#, J#, Managed C++ and VB .NET sources each pass through their own compiler (C# compiler, J# compiler, Managed C++ compiler, VB .NET compiler) to produce CIL and Metadata for the Common Language Runtime]
• What about reversing the compilation process?
  • Sometimes we want to recover source from a binary
    • Language translation
    • Lost source recovery
    • Checking for malicious code
  • Obvious legal and ethical ramifications
Slide 2
Structure of a decompiler
[Diagram of the decompiler pipeline: Executable → Front end → UDM → Back end → Source]
• Front end
  • Reads in bytecode
  • Divides into basic blocks
  • Produces low-level intermediate code and an unstructured control-flow graph
• UDM
  • Data-flow analysis
  • Control-flow analysis
  • Produces a structured control-flow graph and high-level intermediate code
• Back end
  • Code generation
Slide 3
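The front-end step of dividing bytecode into basic blocks can be sketched as follows. This is a minimal illustration in Python, not the project's implementation: the instruction tuples `(offset, opcode, branch_target)`, the opcode sets and the function names `find_leaders` and `build_cfg` are all assumptions made for the example. The offsets and opcodes are those of the sample method used in this presentation.

```python
# Sketch only: the tuple layout and all names below are hypothetical.
UNCONDITIONAL = {"br.s"}
CONDITIONAL = {"blt.s", "bge.s"}
TERMINATORS = {"ret"}
BRANCHES = UNCONDITIONAL | CONDITIONAL

# (offset, opcode, branch target or None) for the sample method.
CODE = [
    (0x00, "ldc.i4.0", None), (0x01, "stloc.0", None),
    (0x02, "ldc.i4.0", None), (0x03, "stloc.1", None),
    (0x04, "br.s", 0x23),
    (0x06, "ldc.i4.3", None), (0x07, "ldloc.1", None),
    (0x08, "mul", None), (0x09, "ldarg.0", None), (0x0A, "bge.s", 0x12),
    (0x0C, "ldloc.0", None), (0x0D, "ldc.i4.1", None),
    (0x0E, "sub", None), (0x0F, "stloc.0", None), (0x10, "br.s", 0x16),
    (0x12, "ldloc.0", None), (0x13, "ldc.i4.1", None),
    (0x14, "add", None), (0x15, "stloc.0", None),
    (0x16, "ldloc.0", None), (0x17, "call", None),
    (0x1C, "ldloc.1", None), (0x1D, "blt.s", 0x06),
    (0x1F, "ldloc.1", None), (0x20, "ldc.i4.1", None),
    (0x21, "add", None), (0x22, "stloc.1", None),
    (0x23, "ldloc.1", None), (0x24, "ldarg.0", None), (0x25, "blt.s", 0x06),
    (0x27, "ldloc.0", None), (0x28, "stloc.2", None), (0x29, "br.s", 0x2B),
    (0x2B, "ldloc.2", None), (0x2C, "ret", None),
]


def find_leaders(code):
    """A leader is the first instruction, any branch target, and any
    instruction that follows a branch or a return."""
    leaders = {code[0][0]}
    for i, (_, op, target) in enumerate(code):
        if op in BRANCHES:
            leaders.add(target)
        if (op in BRANCHES or op in TERMINATORS) and i + 1 < len(code):
            leaders.add(code[i + 1][0])
    return sorted(leaders)


def build_cfg(code):
    """Return (blocks, edges), both keyed by the offset of each block's leader."""
    leaders = find_leaders(code)
    leader_set = set(leaders)
    blocks, current = {}, None
    for ins in code:
        if ins[0] in leader_set:
            current = ins[0]
            blocks[current] = []
        blocks[current].append(ins)
    edges = {}
    for idx, leader in enumerate(leaders):
        _, op, target = blocks[leader][-1]
        fallthrough = leaders[idx + 1] if idx + 1 < len(leaders) else None
        if op in UNCONDITIONAL:
            candidates = [target]
        elif op in CONDITIONAL:
            candidates = [target, fallthrough]  # taken branch first
        elif op in TERMINATORS:
            candidates = []
        else:
            candidates = [fallthrough]
        edges[leader] = [c for c in candidates if c is not None]
    return blocks, edges


blocks, edges = build_cfg(CODE)
for leader in sorted(blocks):
    targets = ", ".join(f"IL_{t:04x}" for t in edges[leader])
    print(f"IL_{leader:04x} -> {targets or 'exit'}")
```

Keying blocks by the offset of their first instruction keeps the sketch independent of any particular node numbering; a real front end would presumably decode operands from the assembly rather than take them from a hand-written table.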
Example decompilation
Process
• Divide code into basic blocks and create CFG
• Data-flow analysis
  • Register copy propagation
• Control-flow analysis
  • Divide graph into intervals
  • Loops induced by back-edges within intervals
  • Nesting of intervals → nesting of loops
  • Conditionals found by common follow nodes
  • Order of nodes → nesting of conditionals
• Generate code from structured CFG
CIL bytecode (one group per basic block)
  IL_0000: ldc.i4.0
  IL_0001: stloc.0
  IL_0002: ldc.i4.0
  IL_0003: stloc.1
  IL_0004: br.s IL_0023

  IL_0006: ldc.i4.3
  IL_0007: ldloc.1
  IL_0008: mul
  IL_0009: ldarg.0
  IL_000a: bge.s IL_0012

  IL_000c: ldloc.0
  IL_000d: ldc.i4.1
  IL_000e: sub
  IL_000f: stloc.0
  IL_0010: br.s IL_0016

  IL_0012: ldloc.0
  IL_0013: ldc.i4.1
  IL_0014: add
  IL_0015: stloc.0

  IL_0016: ldloc.0
  IL_0017: call Math::Abs(int32)
  IL_001c: ldloc.1
  IL_001d: blt.s IL_0006

  IL_001f: ldloc.1
  IL_0020: ldc.i4.1
  IL_0021: add
  IL_0022: stloc.1

  IL_0023: ldloc.1
  IL_0024: ldarg.0
  IL_0025: blt.s IL_0006

  IL_0027: ldloc.0
  IL_0028: stloc.2
  IL_0029: br.s IL_002b

  IL_002b: ldloc.2
  IL_002c: ret
Control-flow graph
[Figure: control-flow graph with nodes Entry, 1 to 9 and Exit, one numbered node per basic block]
Slide 4
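The interval-based loop detection named in the process list can be illustrated with a short sketch. It assumes the textbook Allen-Cocke interval construction and a derived sequence of graphs, which may differ in detail from what the project does. The `CFG` dictionary is written out by hand from the branch targets in the sample bytecode, keyed by the offset of each block's first instruction, and the function names `find_intervals` and `loops_by_level` are hypothetical.

```python
# Sketch only: CFG is transcribed by hand from the sample bytecode.
CFG = {
    0x00: [0x23],
    0x06: [0x12, 0x0C],
    0x0C: [0x16],
    0x12: [0x16],
    0x16: [0x06, 0x1F],
    0x1F: [0x23],
    0x23: [0x06, 0x27],
    0x27: [0x2B],
    0x2B: [],
}


def find_intervals(succ, entry):
    """Partition a graph into intervals; each interval lists its header first.
    Assumes every node is reachable from the entry node."""
    preds = {n: set() for n in succ}
    for n, targets in succ.items():
        for t in targets:
            preds[t].add(n)
    headers, known_headers, placed, result = [entry], {entry}, set(), []
    while headers:
        header = headers.pop(0)
        members, member_set = [header], {header}
        grew = True
        while grew:
            grew = False
            for n in succ:
                if n in member_set or n in placed or n == entry:
                    continue
                # A node joins once all of its predecessors are in the interval.
                if preds[n] and preds[n] <= member_set:
                    members.append(n)
                    member_set.add(n)
                    grew = True
        placed |= member_set
        result.append(members)
        # Nodes entered from this interval but not absorbed start new intervals.
        for n in succ:
            if n not in placed and n not in known_headers and preds[n] & member_set:
                headers.append(n)
                known_headers.add(n)
    return result


def loops_by_level(succ, entry):
    """Return (level, header) for each interval containing a back-edge to its
    own header. Higher levels come from collapsed graphs, so they are outer loops."""
    found, level = [], 1
    while True:
        intervals = find_intervals(succ, entry)
        for members in intervals:
            if any(members[0] in succ[n] for n in members):
                found.append((level, members[0]))
        if len(intervals) == len(succ):
            return found  # nothing collapsed, so no further levels exist
        # Derived graph: one node per interval, named after its header.
        owner = {n: members[0] for members in intervals for n in members}
        derived = {members[0]: [] for members in intervals}
        for n, targets in succ.items():
            for t in targets:
                a, b = owner[n], owner[t]
                if a != b and b not in derived[a]:
                    derived[a].append(b)
        succ, level = derived, level + 1


for level, header in loops_by_level(CFG, 0x00):
    print(f"level {level}: loop headed by block IL_{header:04x}")
```

On this graph the sketch reports a level-1 loop headed by the block at IL_0006 and a level-2 loop headed by the block at IL_0023, which corresponds to a do-while loop nested inside an outer loop.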
Current status
• Features implemented:
  • Analysis for basic conditional and looping structures
  • Control flow graph generation
  • C# code generation
  • Almost half the CIL instruction set
  • Decompiles very basic applications
• Remaining tasks (lots!):
  • Local variable names
  • Basic language features (arrays, switching, breaks etc.)
  • Advanced features (custom indexers, operator overloading, properties)
  • Object oriented features
• Extensions:
  • Decompilation for other stack-based architectures (e.g. Java)
  • Code generation for other languages (e.g. VB .NET)
  • Graphical user interface
Original
  public static int ControlExample(int x)
  {
      int y = 0;
      for (int i = 0; i < x; i++)
      {
          do
          {
              if (3 * i < x)
                  y--;
              else
                  y++;
          } while (Math.Abs(y) < i);
      }
      return y;
  }
Decompiled
  public static Int32 ControlExample(Int32 x)
  {
      Int32 local0;
      Int32 local1;
      Int32 local2;
      local0 = 0;
      local1 = 0;
      while (local1 < x)
      {
          do
          {
              if (((3 * local1) < x))
              {
                  local0 = (local0 - 1);
              }
              else
              {
                  local0 = (local0 + 1);
              }
          } while (Math.Abs(local0) < local1);
          local1 = (local1 + 1);
      }
      local2 = local0;
      return local2;
  }
Slide 5
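The fully parenthesised expressions in the decompiled listing, such as (3 * local1) and (local0 - 1), are the kind of output that symbolic evaluation of the CIL stack would produce. The sketch below illustrates that idea for a single basic block. It is an assumption about one possible approach, not a description of the project's data-flow analysis, and the tuple format, the opcode tables and the name `decompile_block` are hypothetical. Calls are assumed to take exactly one argument, and `ldarg.0` is mapped to `x` because the sample method is static.

```python
# Sketch only: covers just the opcodes that occur in the sample method.
BINARY = {"add": "+", "sub": "-", "mul": "*"}
COMPARE_BRANCH = {"blt.s": "<", "bge.s": ">="}


def decompile_block(instructions, arg_names=("x",)):
    """Symbolically execute one basic block and return C#-like statements.
    Each instruction is a tuple: (opcode,) or (opcode, operand)."""
    stack, statements = [], []
    for op, *operand in instructions:
        suffix = op.rsplit(".", 1)[-1]
        if op.startswith("ldc.i4."):
            stack.append(suffix)                    # integer constant
        elif op.startswith("ldloc."):
            stack.append("local" + suffix)          # local variable read
        elif op.startswith("ldarg."):
            stack.append(arg_names[int(suffix)])    # argument read
        elif op in BINARY:
            right, left = stack.pop(), stack.pop()
            stack.append(f"({left} {BINARY[op]} {right})")
        elif op.startswith("stloc."):
            statements.append(f"local{suffix} = {stack.pop()};")
        elif op == "call":
            argument = stack.pop()                  # assumption: one argument
            stack.append(f"{operand[0]}({argument})")
        elif op in COMPARE_BRANCH:
            right, left = stack.pop(), stack.pop()
            statements.append(
                f"if ({left} {COMPARE_BRANCH[op]} {right}) goto IL_{operand[0]:04x};"
            )
        else:
            raise ValueError(f"opcode not handled in this sketch: {op}")
    return statements


# Block starting at IL_000c in the sample method.
print(decompile_block([("ldloc.0",), ("ldc.i4.1",), ("sub",), ("stloc.0",)]))
# Block starting at IL_0006 in the sample method.
print(decompile_block(
    [("ldc.i4.3",), ("ldloc.1",), ("mul",), ("ldarg.0",), ("bge.s", 0x12)]
))
```

The second call yields a conditional jump rather than an if statement; turning such jumps into the structured if, while and do-while forms shown in the decompiled listing is the job of control-flow analysis.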