500 likes | 660 Views
Laboratory of Software Analysis Lesson 1. Mariano Ceccato, FBK-Irst Trento, Italy ceccato@fbk.eu. Alesssandro Marchetto FBK-Irst Trento, Italy marchetto@fbk.eu. Overview. Objectives Course dependences Content / Course material / tools used Exam (discussion) Legacy systems
E N D
Laboratory of Software AnalysisLesson 1 Mariano Ceccato, FBK-Irst Trento, Italy ceccato@fbk.eu Alesssandro Marchetto FBK-Irst Trento, Italy marchetto@fbk.eu
Overview • Objectives • Course dependences • Content / Course material / tools used • Exam (discussion) • Legacy systems • Reverse engineering, re-structuring, re-engineering • Program transformations (TXL) • Past projects • This year: “three small projects”
Objectives This course has two objectives: • providing the practical skills involved in software analysis and testing. Some techniques/approaches described during the theoretical lessons of the basic course (Software Analysis and Testing) will be applied to real cases of software systems to be re-engineered and tested. • introducing “Empirical studies in Software engineering”
Dependences Not mandatory but … ---> Programming I and II, Software Engineering, Software Analysis and Testing. ---> It is important to kwon (a little): - OO programming, in particular Java (base level). - UML (class diagram, …). - WEB technologies: HTML, JSP, … (Just a little) - Theoretical aspects of testing. - …
Content Code analysis and transformations • Theoretical aspects (already seen in Software analysis and testing). • The TXL programming language. • Practice: application of some techniques to software systems. Code Obfucation • Theoretical aspects. • Techniques to protect against the steal of intellectual property • Practice: application of obfuscation on actual code. Software testing • Theoretical aspects (already seen in Software analysis and testing). • Acceptance testing, GUI testing, Test-first, “Design for testing”, … • Tools: FIT, FITNESSE, JUnit, ABBOT, Robot … Empirical studies in Software engineering • Theoretical aspects (what is an ES?, How to design/conduct an ES?) • Analysis and interpretation (how to draw conclusions) • Execution of two empirical studies.
Material / Tools • Slides • Papers • Manuals of tools http://selab.fbk.eu/ceccato/courses/lsa2009/ Languages and Tools: • TXL: code analysis and transformations • JUnit, Fit, Fitnesse, Abbot, Robot: Testing tools • …
Examination • During the course we will work on a project. • The examination will consist of a discussion. • Admission to the examination requires (at least) the production • of some documents that we will see during the year. • Examples of small projects: • Recovering the Architecture (class diagram) of a system. • Maintenance intervention / re-implementation of a system • Porting a C program in Java • Testing • Empirical study: C++ vs. Java • …
Negative aspects Positive aspects Legacy systems Characteristics: • They were implemented years ago ( 1970) • Their technology became obsolete (obsolete languages, language • styles, hardware, …) • They have been maintained for a long time ( 30 years) • Their structure is deteriorated and does not facilitate understanding • Their documentation (if it exists) became obsolete • Original authors are not available • They contain business rules not recorded elsewhere • They can not be easily replaced (importart!) • They represent a large investment • … Each maintenance intervention is Extremely difficult!
Legacy dilemma What should we do with legacy code? • to build the new system from scratch. • trying to understand the legacy code and to reconstitute it in a new form. First step “reverse engineering”
Reverse Engineering • Reverse engineering is the process of taking something (a device, an electrical component, a car, a software, …) apart and analyzing its working in details, usually with the intention to construct a new device or program that does the same thing. • Reverse engineering is used often by military, in order to copy other nations’ technology.
Military Reverse Engineered projects Examples of military reverse-engineered projects include: • Soviet Union reverse-engineered Tu-4 Bull bomber from United States Boing B-29. • Soviet Union personal computer AGATHA was reverse-engineered from the Apple II. • North Korea reverse-engineered the Russian missile Scud Bs to make their own Scud Mod A. “Boing B-29”
(Software) Reverse engineering • Reverse engineering is a process that helps understanding a software system. It is a process of examination, of extracting information, not a process of change or replication. Software ----------> “Abstract representation” Software ---------->
Forward and Reverse Engineering • Forward engineering is the traditional process of moving from high-level abstractions to the physical implementation of a system. Design Implementation Requirements • Reverse engineering is “the inverse” of Forward engineering Design Implementation Requirements “Abstract Code Representation” Code
Reverse Engineering Tools Pretty printersand code viewers Diagram generators (software views: flowcharts, data flow diagrams, call graph diagrams, …) Embedded comments extractors (ex. Javadoc) Software metrics tools (Locs, methods/functions, cohesion, coupling…) Design recovery tools (ex. Rational Rose, Omondo, VisualUML: UML diagram extractor) Others …
Restructuring • Restructuring is the transformation from one representation to • another at the same relative abstraction level - while preserving • the system external behavior (functionality and semantics). • Examples: • Code level: - from an unstructured (“spaghetti”) form to a • structured form (“goto-less”) • - conversion of set of “if-statements” into a • “case structure”. • Design level: to improve or change data structures (arrays to Lists, • files system to DBMS …) or to improve algorithms • (for example: time complexity).
Re-engineering • Re-engineering is the examination (reverse engineering) of a system to reconstitute it (forward engineering) in a new form. • This process may include modifications with respect to new requirements not met by the original system (Semantics cannot be preserved). • The re-engineering process takes many forms, depending on its • objectives. Sample objectives are: • code migration/porting (ex. C to C++) • reengineering code for reuse • reengineering code for security • …
Program analysis • Program analysis is the (automated) inspection of a program • to infer some properties. Usually, properties are inferred without • running the program (static analysis). • Examples are: • Type analysis (type inference) • Dead code analysis • Clone analysis • Pointer Analysis • …
Program Transformations • Program transformation is the act of changing one program into another. Two cases: L is different from L’ L is equal to L’ transformation P P’ source language L target language L’ • Examples: • Pascal to C porting • Goto elimination (Pascal language)
TXL • TXL is a programming language specifically designed to support software analysis and program transformation. x4 := b + c; y2 := x4 – 2; Loop x := a + b; y := x; a := y + 3; z := x4 + a * y2; End loop Loop x := a + b; y := x; a := y + 3; x := b + c; y := x – 2; z := x + a * y; End loop Code motion optimization Example: moves all loop-independent assignment statements outside of loops.
Project Year 2004 “Porting C to Java” • Porting of the Chull program (C code) in Java. • Chull determines the convex hull of a set of points in 3D. • Chull is not a trivial C code: (4161 LOCs, 31 functions, 3 struct, pointers, …). “Convex hull in 2D”
Project Steps 2004 “Semi-automated procedure” • Instrumentation of Chull using TXL. Writing Testcases such that branch covered is reached. • Reverse engineering of Chull using TXL: Call graph, dependences between functions and data structure. • Object identification (clustering and concept analysis). • OO design in UML (only class diagram). • Java code generation. Chull’ (partially TXL) • Testing of Chull’ with testcases generated at point (1) to show that Chull[i] = Chull’[i]
Code instrumentation • To determine whether or not each branch is traversed, we can place a ‘counter’ (instrumentation) on each branch. Then we have to run the program with inputs. • To have branch coverage we have to check if “count” is equal to (1, 1, …, 1). read x, y start count(1) := 1 ‘Program instrumented’ z := 1 count = (0, 0, … 0) true If (x >y) count(3) := 1 exit false N.B count is an array where each element is assigned to 0. … count(2) := 1
Project Year 2005“Maintenance intervention” is implemented by code fragments spread across several classes • Adding a new “crosscutting functionality” (persistence history) to the “Jconsole” java program . • Jconsole: 27 java files, 1385 LOCs. • Two ways for adding a crosscutting functionality to a system: 1) Changing (almost) all the java classes. 2) Adding an aspect (AOP) in the language AspectJ.
AspectJ example Suppose to have to add ‘logging’ for all methods of a Java program. (Logger.entry(string) and Logger.exit(string)) /** Java */ Public class Main{ public void foo() { Logger.entry(“foo()”) …. Something … Logger.exit(“foo()”) } public void foo(int i) { Logger.entry(“foo(int)”) …. Something … Logger.exit(“foo(int)”) } public static void main(String [] args) { Logger.entry(“main()”) …. Something … Logger.exit(“main()”) } /** AspectJ */ Public class Main{ public void foo() { …. Something … } public void foo(int i) { …. Something … } public static void main(String [] args) { …. Something … } Public aspect autolog{ pointcut publicMethods(): …. Before(): publicMethods() {Logger.entry …} After(): publicMethods() {Logger.exit …} }
Project Year 2006“a real SE experiment” • We have conducted a real software engineering experiment: stereotyped UML class diagrams (“Conallen” proposal) vs. Pure UML class diagrams • What are stereotypes? • What is a software engineering experiment (or software engineering empirical study)? Web Applications context
Stereotypes • The designer’s of UML recognized that the language is not always perfect for every situation/domain. • UML has defined a mechanism to allow certain domains to extend the semantics of specific model elements. The extension mechanism allows the inclusion of new attributes, different semantics and additional constraints. • Stereotypes form an extension to UML. • Stereotypes are adornments or icons having a well-defined semantics. Used instead of classes in the class diagram …
Empirical studies in SE • Software engineering is the result of opinions and anecdotal evidences and not the result of empirical evidence... • For example no one has demonstrated that OO techniques are better that structured techniques, but everyone uses OO ... • Empirical studies (experiments) are useful to try to answer some research questions. “technique A is better than B?”
How to conduct an empirical study? Suppose that we have to “demonstrate” this hypothesis: “technique A is better than B” Procedure: • Participants (students, professionals, etc) are divided into two groups (Group 1 and Group 2). • Group 1 will execute the task with technique A while Group 2 with technique B. • Data of the experiment are collected and metrics are measured. • The hypothesis of the experiment is evaluated statistically using data collected and metrics.
Empirical study 1: “Conallen” vs. Pure UML “Conallen notation” Pure UML Which is more useful during understanding and maintenance?
Project Year 2007 • Porting “Extract class diagram” program to Java using TXL • Empirical study 1: Testcases (“Fit tables”) can be used to clarify requirements? • Empirical study 2: Conallen vs. WebML. When doing a comprehension task is more useful Conallen or WebML? Three projects:
Fit tables • A Fit table is a way of expressing the business logic using a simple (input-output) HTML table. • Fit tables are “added to the requirements” and are used as acceptance test cases. • Customers and Analysts create Fit tables using a tool like Word, Excel, or even a text editor. input output
We did an example to understand a little bit better Fit tables Sports Magazine Website • A sports magazine decides to add a new feature to its Website that will allow users to view top football teams based on their ratings. • Rating = ((10000*(won*3+drawn)) (3*played))/100) • The analyst can express the change requirement in the traditional way: - natural language, use cases, …. or - using natural language + Fit tables “new feature added”
Fit tables can be used to clarify requirements? Natural languagevs. Fit tables + natural language “Only natural language” “Fit table + natural language” • A user can search for top N football teams based on rating. • The rating is defined … • A user can search for top N football teams based on rating.
Empirical study 2 Questionnaire Group 1 + • When doing a comprehension task which is the notation more useful? Conallen Questionnaire Web appls + Group 2 WEBML
Project Year 2008:Obfuscation • Obfuscation transforms a program into a new program which: • Has the same semantics • Is harder to reverse engineer
JSnapScreen 0.1 • http://sourceforge.net/projects/jsnapscreen • Open source java project (2k LoC) • It takes snapshoot of the current screen
The end … Next lessons …
“Obfuscated C contest Winner” return IOCCC is a competition to see who can write the most unreadable, but legal C program. return
return Code viewers 1) Textual representation (colors) 2) Graphical representation (colors)
A picture is worth a thousand words … Imagix tool Main Window ( call graph) ‘C code’ Calls Functions Variables return
return CVF 3.0 CVF 3.0 is a automated program Flow chart generator. It can perform automated reverse engineering of program code into programming flowcharts. It works with: C, C++, VC++, VB, VBA, VBScript, ASP, Visual C#, Visual Basic .NET, Visual J# .NET, VC++.NET, ASP.NET, Java, JSP, JavaScript, Delphi, PowerBuilder and Perl.
return UML Class Diagram Recovery
Type inference Inferred Types Program P a:=4; c:=a+b; Push(x, T); Push(y, T); d:=Pop(T); a: integer c:real, b:real T: queue d:real “Language without declarations” return
Dead code … 20: FOR I=1 TO 10 30: V[I] = V[I] +1; 40: PRINT V[I] 50: ENDFOR 60: PRINT X; 70: GOTO 100 80: CALL F1; 90: CALL F2; 100 END Suppose: No jumps to the lines 80 and 90! Never executed return
return Clones … 20: FOR I=1 TO 10 30: V[I] = V[I] +1; 40: PRINT V[I] 50: ENDFOR 60: PRINT X; 70: CALL F; … 100: FOR J=1 TO 10 110: W[J] = W[J] +1; 120: PRINT W[J] 130: ENDFOR • Example: clone analysis Clones: … Lines 20-50 and 100-130; …