1 / 50

Laboratory of Software Analysis Lesson 1

Laboratory of Software Analysis Lesson 1. Mariano Ceccato, FBK-Irst Trento, Italy ceccato@fbk.eu. Alesssandro Marchetto FBK-Irst Trento, Italy marchetto@fbk.eu. Overview. Objectives Course dependences Content / Course material / tools used Exam (discussion) Legacy systems

shel
Download Presentation

Laboratory of Software Analysis Lesson 1

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Laboratory of Software AnalysisLesson 1 Mariano Ceccato, FBK-Irst Trento, Italy ceccato@fbk.eu Alesssandro Marchetto FBK-Irst Trento, Italy marchetto@fbk.eu

  2. Overview • Objectives • Course dependences • Content / Course material / tools used • Exam (discussion) • Legacy systems • Reverse engineering, re-structuring, re-engineering • Program transformations (TXL) • Past projects • This year: “three small projects”

  3. Objectives, dependences, material, exam.

  4. Objectives This course has two objectives: • providing the practical skills involved in software analysis and testing. Some techniques/approaches described during the theoretical lessons of the basic course (Software Analysis and Testing) will be applied to real cases of software systems to be re-engineered and tested. • introducing “Empirical studies in Software engineering”

  5. Dependences Not mandatory but … ---> Programming I and II, Software Engineering, Software Analysis and Testing. ---> It is important to kwon (a little): - OO programming, in particular Java (base level). - UML (class diagram, …). - WEB technologies: HTML, JSP, … (Just a little) - Theoretical aspects of testing. - …

  6. Content Code analysis and transformations • Theoretical aspects (already seen in Software analysis and testing). • The TXL programming language. • Practice: application of some techniques to software systems. Code Obfucation • Theoretical aspects. • Techniques to protect against the steal of intellectual property • Practice: application of obfuscation on actual code. Software testing • Theoretical aspects (already seen in Software analysis and testing). • Acceptance testing, GUI testing, Test-first, “Design for testing”, … • Tools: FIT, FITNESSE, JUnit, ABBOT, Robot … Empirical studies in Software engineering • Theoretical aspects (what is an ES?, How to design/conduct an ES?) • Analysis and interpretation (how to draw conclusions) • Execution of two empirical studies.

  7. Material / Tools • Slides • Papers • Manuals of tools http://selab.fbk.eu/ceccato/courses/lsa2009/ Languages and Tools: • TXL: code analysis and transformations • JUnit, Fit, Fitnesse, Abbot, Robot: Testing tools • …

  8. Examination • During the course we will work on a project. • The examination will consist of a discussion. • Admission to the examination requires (at least) the production • of some documents that we will see during the year. • Examples of small projects: • Recovering the Architecture (class diagram) of a system. • Maintenance intervention / re-implementation of a system • Porting a C program in Java • Testing • Empirical study: C++ vs. Java • …

  9. A little of Terminology …

  10. Negative aspects Positive aspects Legacy systems Characteristics: • They were implemented years ago ( 1970) • Their technology became obsolete (obsolete languages, language • styles, hardware, …) • They have been maintained for a long time ( 30 years) • Their structure is deteriorated and does not facilitate understanding • Their documentation (if it exists) became obsolete • Original authors are not available • They contain business rules not recorded elsewhere • They can not be easily replaced (importart!) • They represent a large investment • … Each maintenance intervention is Extremely difficult!

  11. Legacy dilemma What should we do with legacy code? • to build the new system from scratch. • trying to understand the legacy code and to reconstitute it in a new form. First step “reverse engineering”

  12. Reverse Engineering • Reverse engineering is the process of taking something (a device, an electrical component, a car, a software, …) apart and analyzing its working in details, usually with the intention to construct a new device or program that does the same thing. • Reverse engineering is used often by military, in order to copy other nations’ technology.

  13. Military Reverse Engineered projects Examples of military reverse-engineered projects include: • Soviet Union reverse-engineered Tu-4 Bull bomber from United States Boing B-29. • Soviet Union personal computer AGATHA was reverse-engineered from the Apple II. • North Korea reverse-engineered the Russian missile Scud Bs to make their own Scud Mod A. “Boing B-29”

  14. (Software) Reverse engineering • Reverse engineering is a process that helps understanding a software system. It is a process of examination, of extracting information, not a process of change or replication. Software ----------> “Abstract representation” Software ---------->

  15. Forward and Reverse Engineering • Forward engineering is the traditional process of moving from high-level abstractions to the physical implementation of a system. Design Implementation Requirements • Reverse engineering is “the inverse” of Forward engineering Design Implementation Requirements “Abstract Code Representation” Code

  16. Reverse Engineering Tools Pretty printersand code viewers Diagram generators (software views: flowcharts, data flow diagrams, call graph diagrams, …) Embedded comments extractors (ex. Javadoc) Software metrics tools (Locs, methods/functions, cohesion, coupling…) Design recovery tools (ex. Rational Rose, Omondo, VisualUML: UML diagram extractor) Others …

  17. Restructuring • Restructuring is the transformation from one representation to • another at the same relative abstraction level - while preserving • the system external behavior (functionality and semantics). • Examples: • Code level: - from an unstructured (“spaghetti”) form to a • structured form (“goto-less”) • - conversion of set of “if-statements” into a • “case structure”. • Design level: to improve or change data structures (arrays to Lists, • files system to DBMS …) or to improve algorithms • (for example: time complexity).

  18. Re-engineering • Re-engineering is the examination (reverse engineering) of a system to reconstitute it (forward engineering) in a new form. • This process may include modifications with respect to new requirements not met by the original system (Semantics cannot be preserved). • The re-engineering process takes many forms, depending on its • objectives. Sample objectives are: • code migration/porting (ex. C to C++) • reengineering code for reuse • reengineering code for security • …

  19. Relationships

  20. Program analysis • Program analysis is the (automated) inspection of a program • to infer some properties. Usually, properties are inferred without • running the program (static analysis). • Examples are: • Type analysis (type inference) • Dead code analysis • Clone analysis • Pointer Analysis • …

  21. Program Transformations • Program transformation is the act of changing one program into another. Two cases: L is different from L’ L is equal to L’ transformation P P’ source language L target language L’ • Examples: • Pascal to C porting • Goto elimination (Pascal language)

  22. TXL • TXL is a programming language specifically designed to support software analysis and program transformation. x4 := b + c; y2 := x4 – 2; Loop x := a + b; y := x; a := y + 3; z := x4 + a * y2; End loop Loop x := a + b; y := x; a := y + 3; x := b + c; y := x – 2; z := x + a * y; End loop Code motion optimization Example: moves all loop-independent assignment statements outside of loops.

  23. Past Projects and Project of this year

  24. Project Year 2004 “Porting C to Java” • Porting of the Chull program (C code) in Java. • Chull determines the convex hull of a set of points in 3D. • Chull is not a trivial C code: (4161 LOCs, 31 functions, 3 struct, pointers, …). “Convex hull in 2D”

  25. Project Steps 2004 “Semi-automated procedure” • Instrumentation of Chull using TXL. Writing Testcases such that branch covered is reached. • Reverse engineering of Chull using TXL: Call graph, dependences between functions and data structure. • Object identification (clustering and concept analysis). • OO design in UML (only class diagram). • Java code generation. Chull’ (partially TXL) • Testing of Chull’ with testcases generated at point (1) to show that Chull[i] = Chull’[i]

  26. Code instrumentation • To determine whether or not each branch is traversed, we can place a ‘counter’ (instrumentation) on each branch. Then we have to run the program with inputs. • To have branch coverage we have to check if “count” is equal to (1, 1, …, 1). read x, y start count(1) := 1 ‘Program instrumented’ z := 1 count = (0, 0, … 0) true If (x >y) count(3) := 1 exit false N.B count is an array where each element is assigned to 0. … count(2) := 1

  27. Project Year 2005“Maintenance intervention” is implemented by code fragments spread across several classes • Adding a new “crosscutting functionality” (persistence history) to the “Jconsole” java program . • Jconsole: 27 java files, 1385 LOCs. • Two ways for adding a crosscutting functionality to a system: 1) Changing (almost) all the java classes. 2) Adding an aspect (AOP) in the language AspectJ.

  28. AspectJ example Suppose to have to add ‘logging’ for all methods of a Java program. (Logger.entry(string) and Logger.exit(string)) /** Java */ Public class Main{ public void foo() { Logger.entry(“foo()”) …. Something … Logger.exit(“foo()”) } public void foo(int i) { Logger.entry(“foo(int)”) …. Something … Logger.exit(“foo(int)”) } public static void main(String [] args) { Logger.entry(“main()”) …. Something … Logger.exit(“main()”) } /** AspectJ */ Public class Main{ public void foo() { …. Something … } public void foo(int i) { …. Something … } public static void main(String [] args) { …. Something … } Public aspect autolog{ pointcut publicMethods(): …. Before(): publicMethods() {Logger.entry …} After(): publicMethods() {Logger.exit …} }

  29. Project Year 2006“a real SE experiment” • We have conducted a real software engineering experiment: stereotyped UML class diagrams (“Conallen” proposal) vs. Pure UML class diagrams • What are stereotypes? • What is a software engineering experiment (or software engineering empirical study)? Web Applications context

  30. Stereotypes • The designer’s of UML recognized that the language is not always perfect for every situation/domain. • UML has defined a mechanism to allow certain domains to extend the semantics of specific model elements. The extension mechanism allows the inclusion of new attributes, different semantics and additional constraints. • Stereotypes form an extension to UML. • Stereotypes are adornments or icons having a well-defined semantics. Used instead of classes in the class diagram …

  31. Empirical studies in SE • Software engineering is the result of opinions and anecdotal evidences and not the result of empirical evidence... • For example no one has demonstrated that OO techniques are better that structured techniques, but everyone uses OO ... • Empirical studies (experiments) are useful to try to answer some research questions. “technique A is better than B?”

  32. How to conduct an empirical study? Suppose that we have to “demonstrate” this hypothesis: “technique A is better than B” Procedure: • Participants (students, professionals, etc) are divided into two groups (Group 1 and Group 2). • Group 1 will execute the task with technique A while Group 2 with technique B. • Data of the experiment are collected and metrics are measured. • The hypothesis of the experiment is evaluated statistically using data collected and metrics.

  33. Empirical study 1: “Conallen” vs. Pure UML “Conallen notation” Pure UML Which is more useful during understanding and maintenance?

  34. Project Year 2007 • Porting “Extract class diagram” program to Java using TXL • Empirical study 1: Testcases (“Fit tables”) can be used to clarify requirements? • Empirical study 2: Conallen vs. WebML. When doing a comprehension task is more useful Conallen or WebML? Three projects:

  35. Fit tables • A Fit table is a way of expressing the business logic using a simple (input-output) HTML table. • Fit tables are “added to the requirements” and are used as acceptance test cases. • Customers and Analysts create Fit tables using a tool like Word, Excel, or even a text editor. input output

  36. We did an example to understand a little bit better Fit tables Sports Magazine Website • A sports magazine decides to add a new feature to its Website that will allow users to view top football teams based on their ratings. • Rating = ((10000*(won*3+drawn)) (3*played))/100) • The analyst can express the change requirement in the traditional way: - natural language, use cases, …. or - using natural language + Fit tables “new feature added”

  37. Fit tables can be used to clarify requirements? Natural languagevs. Fit tables + natural language “Only natural language” “Fit table + natural language” • A user can search for top N football teams based on rating. • The rating is defined … • A user can search for top N football teams based on rating.

  38. Empirical study 2 Questionnaire Group 1 + • When doing a comprehension task which is the notation more useful? Conallen Questionnaire Web appls + Group 2 WEBML

  39. Project Year 2008:Obfuscation • Obfuscation transforms a program into a new program which: • Has the same semantics • Is harder to reverse engineer

  40. JSnapScreen 0.1 • http://sourceforge.net/projects/jsnapscreen • Open source java project (2k LoC) • It takes snapshoot of the current screen

  41. The end … Next lessons …

  42. “Obfuscated C contest Winner” return IOCCC is a competition to see who can write the most unreadable, but legal C program. return

  43. return Code viewers 1) Textual representation (colors) 2) Graphical representation (colors)

  44. A picture is worth a thousand words … Imagix tool Main Window ( call graph) ‘C code’ Calls Functions Variables return

  45. return CVF 3.0 CVF 3.0 is a automated program Flow chart generator. It can perform automated reverse engineering of program code into programming flowcharts. It works with: C, C++, VC++, VB, VBA, VBScript, ASP, Visual C#, Visual Basic .NET, Visual J# .NET, VC++.NET, ASP.NET, Java, JSP, JavaScript, Delphi, PowerBuilder and Perl.

  46. return UML Class Diagram Recovery

  47. Type inference Inferred Types Program P a:=4; c:=a+b; Push(x, T); Push(y, T); d:=Pop(T); a: integer c:real, b:real T: queue d:real “Language without declarations” return

  48. Dead code … 20: FOR I=1 TO 10 30: V[I] = V[I] +1; 40: PRINT V[I] 50: ENDFOR 60: PRINT X; 70: GOTO 100 80: CALL F1; 90: CALL F2; 100 END Suppose: No jumps to the lines 80 and 90! Never executed return

  49. return Clones … 20: FOR I=1 TO 10 30: V[I] = V[I] +1; 40: PRINT V[I] 50: ENDFOR 60: PRINT X; 70: CALL F; … 100: FOR J=1 TO 10 110: W[J] = W[J] +1; 120: PRINT W[J] 130: ENDFOR • Example: clone analysis Clones: … Lines 20-50 and 100-130; …

More Related