Cross-language Program Slicing in the .NET Framework

Krisztián Pócza, Mihály Biczó, Zoltán Porkoláb. Eötvös Loránd University, Hungary, Faculty of Informatics, Department of Programming Languages and Compilers

The structure of this presentation • Motivation • Theoretical introduction • Earlier work • Slicing - .NET Framework • Technical outlook • Architecture & algorithm • Practical experiences • Summary

Motivation • What does the term ‘program slicing’ really cover? • Mapping mental abstractions of programmers • Easing program maintenance tasks (debugging) • There is a need to integrate slicing into modern debuggers • Real world applications are composed of several modules written in different languages

Theoretical introduction • Slicing means finding all those statements that might directly or indirectly affect the values of the variables in a set V • The result depends on the program location • The slicing problem is defined by a pair C=(p,V), where p denotes the program location • C is called the slicing criterion • In the classical case…

Theoretical introduction • Static vs. dynamic slicing – what’s the difference? • The previous (static) case disregarded the input of the program • If the input is considered, we talk about dynamic slicing • The dynamic slicing criterion is a triple C=(I,o,V), where I is the program input and o is an occurrence of a statement • Occurrence

Theoretical introduction • Static vs. dynamic slicing – example:

Theoretical introduction • Example program and its control flow graph • CFG is the basis of more advanced concepts • Remarks: • Intuitive representation • Control dependence • Data dependence
    1  sum = 0
    2  mul = 1
    3  a = 1
    4  b = read()
    5  while (a <= b) {
    6    sum = sum + a
    7    mul = mul * a
    8    a = a + 1
    9  }
    10 write(sum)
    11 write(mul)
[Figure: control flow graph of the example program with nodes START, 1, 2, 3, 4, 5, 6, 7, 8, 10, 11 and STOP; the branches leaving node 5 are labeled T and F]

Theoretical introduction • Post-dominator (m post-dominates n) • m, n are nodes in the CFG • Every path from n to the STOP node goes through m • Control dependence (m is control dependent on n) • there exists a path p from n to m in the CFG, • m is a post-dominator for every node in p except n, and • m is not a post-dominator for n • Data dependence (m is data dependent on n) • there is a path p from n to m in the CFG, • there is a variable v with v ∈ def(n) and v ∈ ref(m), and • for all nodes k ≠ n of path p, v ∉ def(k) holds • Program Dependence Graph (PDG) • System Dependence Graph (SDG)
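The post-dominator and control dependence definitions above can be tried out on the example program's CFG. The following is a minimal Python sketch, not the authors' implementation: the edge list is reconstructed from the program listing, the function names are made up for illustration, and post-dominance is treated as reflexive so that the test "m is not a post-dominator for n" can be applied literally.

```python
# Control flow graph of the example program (node = line number).
# The edges are an assumption reconstructed from the listing.
CFG = {
    "START": [1],
    1: [2], 2: [3], 3: [4], 4: [5],
    5: [6, 10],          # loop test: T branch to 6, F branch to 10
    6: [7], 7: [8], 8: [5],
    10: [11], 11: ["STOP"],
}


def post_dominators(cfg, stop):
    """Iterate pdom(n) = {n} | intersection of pdom(s) over successors s."""
    nodes = set(cfg) | {stop}
    pdom = {n: set(nodes) for n in nodes}
    pdom[stop] = {stop}
    changed = True
    while changed:
        changed = False
        for n in nodes - {stop}:
            new = {n} | set.intersection(*(pdom[s] for s in cfg[n]))
            if new != pdom[n]:
                pdom[n] = new
                changed = True
    return pdom


def control_dependents(cfg, pdom):
    """Map n to the nodes m that post-dominate a successor of n but not n."""
    deps = {}
    for n, succs in cfg.items():
        found = set()
        for s in succs:
            found |= pdom[s] - pdom[n]
        if found:
            deps[n] = found
    return deps


pdom = post_dominators(CFG, "STOP")
deps = control_dependents(CFG, pdom)
```

Under these assumptions only the loop test (node 5) has dependents, namely the three statements of the loop body.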

Earlier work • Dynamic slicing • Different input, different execution branches • Different input, different dynamic slice (see dynamic slicing criterion) • How to track execution paths? • Generating call trace • Needs running program against specified input values • Log execution path in a comfortable format (typically plain text)

Earlier work • Dynamic slicing • Studied previously through real world applications by the Java community • Java Platform Debugging Architecture (JPDA) • Java Virtual Machine Debug Interface (JVMDI) • Java Debug Wire Protocol (JDWP) • Java Debug Interface (JDI) • Only since JDK 1.3! • Custom solutions (JVM hacking)

Slicing - .NET Framework • Architecture of the .NET Framework • BCL (Base Class Library): one library for all languages • CTS (Common Type System) and CLS (Common Language Specification): the CLS is a subset of the CTS • CLR (Common Language Runtime): managed code lattice

Slicing - .NET Framework • Key concept of the .NET Framework: language interoperability • Cross-language debugger • Cross-language program slicers • identify bugs more precisely and at a much earlier stage [Diagram labels: .NET debugging capabilities, Program slicing, Software quality; arrows labeled "simplifies" and "improves"]

Technical outlook • Earlier active scripting • Now script engines compile and interpret code for CLR • .NET Debugging Services API • Debug every code compiled to IL • Debugging capabilities for all modern languages

Technical outlook • The CLR supports two debugging modes: • In-process • Inspecting the run-time state of an application • Collecting profiling information • Out-of-process • The debugger runs in a separate process • Providing common debugger functionality like stepping, breakpoints, etc. • The CLR Debugging Services are implemented as a set of 70+ COM interfaces

Technical outlook • CLR Debugging Architecture [Figure: components of the CLR debugging architecture: CLR, design-time interface, symbol manager, publisher, profiler]

Technical outlook • Design-time interface • Responsible for handling debugging events • Implemented separately from the CLR • The host application resides in a different process • Has a separate thread for receiving debugger events • When a debug event occurs (assembly loaded, thread started, breakpoint reached, etc.) the application halts and the debugger thread notifies the debugging service through callback functions

Technical outlook • Symbol manager • Interprets the program database (PDB) files • PDB files contain debugging information • Enables the unique identification of program elements like classes, functions, variables and statements • Program database can also be used to retrieve their original position in source code

Technical outlook • Publisher • Enumerates all running managed processes in the system • Profiler • Tracks application performance and resources used by running managed processes

Architecture & algorithm • Phase 1: Source code → Beautification → Recompile in Debug mode → Generate Call Trace → Call trace • Phase 2: Dynamic slicing algorithm → Cross-language slice

Architecture & algorithm • Phase 1 produces the input for Phase 2 • Phase 1 steps: • Source code beautification: • Parsing code: Marcel Debreuil’s library using ANTLR • Writing back to a custom alignment: sequence point = line • Recompile in debug mode: • csc /debug+ … • vbc /debug+ …

Architecture & algorithm • Phase 1 further steps: • Generate call trace: • Using .NET Debugging Services • Find the entry point • Place a breakpoint • Call the Step/Step In operation until the end • The Stepper is derived from MDbg’s source • Call trace: • Not a step: it is the output of Phase 1 and the input of Phase 2

Architecture & algorithm • Phase 2 steps: • Dynamic slicing algorithm • Input: call trace and beautified source code • Output: cross-language slice • Language independent/platform independent • No .NET specific features • Cross-language slice: • Not a step: it is the output of Phase 2

Usage of Debugger • Generates output like:
    module load
    breakpoint hit
    st MainClass.cs 10 M
    st MainClass.cs 11 M
    st MainClass.cs 12 M
    st MainClass.cs 13 M
    st MainClass.cs 14 M
    module load
    st MainClass.cs 20 M,R
    st MainClass.cs 22 M,R
    st Functions.cs 10 M,R,A
    st Functions.cs 11 M,R,A
    st MainClass.cs 23 M,R
    st Functions.cs 15 M,R,P
    st Functions.cs 16 M,R,P
    …
• Demo program
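A call trace in this shape is easy to load for Phase 2. The Python sketch below is illustrative only: the slides do not specify the trace format, so reading each `st` line as "file, line number, comma-separated call stack" is an assumption, and `Step` and `parse_trace_line` are hypothetical names.

```python
from typing import NamedTuple, Optional, Tuple


class Step(NamedTuple):
    file: str                 # source file of the executed line
    line: int                 # line number (one sequence point per line)
    stack: Tuple[str, ...]    # ASSUMED meaning: abbreviated call stack


def parse_trace_line(text: str) -> Optional[Step]:
    """Parse one 'st <file> <line> <stack>' record; skip everything else."""
    parts = text.split()
    if len(parts) != 4 or parts[0] != "st":
        return None           # e.g. 'module load', 'breakpoint hit'
    _, file, line, stack = parts
    return Step(file, int(line), tuple(stack.split(",")))


SAMPLE = """module load
breakpoint hit
st MainClass.cs 10 M
st MainClass.cs 22 M,R
st Functions.cs 10 M,R,A
st MainClass.cs 23 M,R
"""

# Keep only the step records, in execution order.
steps = [s for s in map(parse_trace_line, SAMPLE.splitlines()) if s is not None]
```

The sketch splits on whitespace, so it would need adjusting for file names that contain spaces.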

Basics of Dynamic Slicing • Call trace + source code → dynamic slice • Intra-procedural/inter-procedural • Define and reference a variable • Example program:
    1  int n = askUser();
    2  int i = 0;
    3  int sum = 0;
    4  int prod = 1;
    5  while (i < n)
    6  {
    7    sum += i;
    8    prod *= i;
    9    i++;
    10 }
    11 Console.WriteLine(sum);

Basics of Dynamic Slicing • Control Dependence Graph (not CFG) • Control Dependence Edges [Figure: control dependence graph of the example program with nodes Start, 1, 2, 3, 4, 5, 11, 7, 8, 9]

Action and Variable Store • Action • Its value can be Def or Ref • An Action is always stored together with the variable it belongs to • Variable Store (VarStore) • (variable, Action) pairs • The Action is the last action on the variable • Method-wide • Updated dynamically while the dynamic slice is computed

Intra-procedural operation • Backward algorithm • First items in VarStore: (sliceVar, Ref) • When encountering a statement: • Variable with Ref Action is defined in the statement: • Statement added to slice • In Varstore: Ref -> Def • Referenced variables are added to VarStore with Ref Action • Variable with Def Action is redefined: • Nothing to do • Would be killed

Intra-procedural operation • When adding a statement: • Add to LoopCond the statements that the current statement is control dependent on • That is, the condition or loop test statement whose body contains the statement • When a statement is encountered: • Always check whether it is in LoopCond • If yes: • Add its referenced variables to VarStore • Add it to the slice • Add its parents to LoopCond

Intra-procedural operation • Remember the example program? • Call trace of the example program: 1,2,3,4,(5,7,8,9)^n,5,11 (the loop body is repeated n times) • Slicing criterion: (<n=2>, 11^1, {sum}), where 11^1 is the first occurrence of statement 11 • Example run on next slide
    1  int n = askUser();
    2  int i = 0;
    3  int sum = 0;
    4  int prod = 1;
    5  while (i < n)
    6  {
    7    sum += i;
    8    prod *= i;
    9    i++;
    10 }
    11 Console.WriteLine(sum);

Intra-procedural operation
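The backward intra-procedural pass with VarStore and LoopCond can be approximated on the example program as follows. This is a minimal Python sketch under stated assumptions, not the authors' implementation: the `DEFS`, `REFS` and `PARENTS` tables are hand-written for the eleven-line example, the criterion statement itself is counted as part of the slice, and `dynamic_slice` and `trace_for` are hypothetical helper names.

```python
# Variables defined / referenced per statement (hand-derived from the example).
DEFS = {1: {"n"}, 2: {"i"}, 3: {"sum"}, 4: {"prod"}, 5: set(),
        7: {"sum"}, 8: {"prod"}, 9: {"i"}, 11: set()}
REFS = {1: set(), 2: set(), 3: set(), 4: set(), 5: {"i", "n"},
        7: {"sum", "i"}, 8: {"prod", "i"}, 9: {"i"}, 11: {"sum"}}
# Control dependence parents: the loop body depends on the loop test.
PARENTS = {7: {5}, 8: {5}, 9: {5}}


def trace_for(n):
    """Call trace 1,2,3,4,(5,7,8,9)^n,5,11 of the example program."""
    return [1, 2, 3, 4] + [5, 7, 8, 9] * n + [5, 11]


def dynamic_slice(trace, position, variables):
    """Walk the trace backward from trace[position] for the given variables."""
    var_store = {v: "Ref" for v in variables}   # (variable, Action) pairs
    loop_cond = set(PARENTS.get(trace[position], ()))
    result = {trace[position]}                  # assumption: keep the criterion line

    def add(stmt):
        result.add(stmt)
        for v in REFS[stmt]:                    # referenced variables become Ref
            var_store[v] = "Ref"
        loop_cond.update(PARENTS.get(stmt, ()))

    for stmt in reversed(trace[:position]):
        wanted = {v for v in DEFS[stmt] if var_store.get(v) == "Ref"}
        if wanted:                              # defines a variable that is needed
            for v in wanted:
                var_store[v] = "Def"            # Ref -> Def
            add(stmt)
        elif stmt in loop_cond:                 # condition a slice member depends on
            add(stmt)
    return sorted(result)


trace = trace_for(2)
slice_n2 = dynamic_slice(trace, len(trace) - 1, {"sum"})
slice_n0 = dynamic_slice(trace_for(0), len(trace_for(0)) - 1, {"sum"})
```

With input n=2 the sketch keeps lines 1, 2, 3, 5, 7, 9 and 11 and drops the two `prod` statements; with n=0 the loop body never runs and only lines 3 and 11 remain, which illustrates how the dynamic slice depends on the input.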

Extend to inter-procedural • Starts in the same way as intra-procedural • What happens when the last line of a function is reached (going backward)? • New VarStore, new LoopCond • They have to be maintained until the function start • Context • Indexing data structure

Extend to inter-procedural • At the last line of a function: • Identify the calling line • Identify all output parameters • Select those that have a Ref Action in VarStore • If there are none → disregard the function • New Context + recursive call of DynSlice • When the calling line is reached: • Identify the used input parameters • Update the current Context (VarStore and LoopCond)

Creation of Indexing Data Structure • Used for identifying the calling line • It would be slow to go through the call trace at the last line of every call • A unique ID is given to every unique function call (run) in the call trace • Runs do not have to be contiguous (1,1,1,1,1,2,2,3,3,2,4,4,2,2,5,… ) • While building the structure, store these runs (using a Hashtable) • At every start, store the previous end • The query for the calling line of a function is a single operation
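One way to build such an index is a single forward pass over the call trace. The Python sketch below is an illustration under assumptions, not the authors' data structure: it takes the call-stack depth of every trace step as input, assumes the depth grows by at most one between consecutive steps, and uses a plain dict in place of the Hashtable; `build_index` and `call_site` are hypothetical names.

```python
def build_index(depths):
    """Assign a run ID to every trace step and remember each run's call site.

    depths[i] is the ASSUMED call-stack depth (1 = entry function) of step i.
    Returns (run_ids, call_site), where call_site[run] is the trace index of
    the step executed just before the run started ("the previous end").
    """
    run_ids = []
    call_site = {}
    stack = []        # run IDs of the calls that are currently active
    next_id = 1
    for pos, depth in enumerate(depths):
        while len(stack) > depth:      # one or more calls have returned
            stack.pop()
        if len(stack) < depth:         # a new call (run) starts here
            stack.append(next_id)
            if pos > 0:
                call_site[next_id] = pos - 1
            next_id += 1
        run_ids.append(stack[-1])
    return run_ids, call_site


# Depths chosen so that the run IDs reproduce the sequence on the slide.
depths = [1, 1, 1, 1, 1, 2, 2, 3, 3, 2, 3, 3, 2, 2, 3]
run_ids, call_site = build_index(depths)
```

After the pass, finding the calling line of a run is a single dictionary lookup, for example `call_site[4]` for the fourth run.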

Practical experiences

Summary • Language features studied: • Value types • Basic program constructions • Static method calls • Language features to be studied: • Reference types • Non-static method calls • Delegates, properties, foreach, lock, using • Generics, anonymous methods, yield* keyword (in C#)

Q&A Thank you for your attention! Krisztián Pócza kpocza@kpocza.net Mihály Biczó mihaly.biczo@axelero.hu Zoltán Porkoláb gsd@elte.hu
