Summarizing the Content of Large Traces to Facilitate the Understanding of the Behaviour of a Software System • Abdelwahab Hamou-Lhadj, Timothy Lethbridge • ICPC 2006, Athens, Greece
Motivation • Software engineers need to explore traces • To understand an unexpected behaviour • For general understanding • Traces tend to be excessively large and complex • Hard to understand • Tools are needed • Studies conducted at QNX and Mitel confirm this
Limitations of Existing Techniques • Existing tools have key limitations: 1. Manual exploration has to be done bottom-up: • Start from a full, complex trace • Apply filters and searches to uncover needed information • Often difficult to perform 2. They do not interoperate well • Limiting sharing of data and techniques
Trace Summarization • Goal: Permit top-down or middle-out exploration of traces • Top-down: start with a small summary, then selectively expand • Middle-out: start with a much simplified trace, then selectively contract and expand • An initial higher-level trace view enables quicker comprehension • Searching within the hidden details would still be available
What is a Trace Summary? • Definition of a text summary (K. Spärck Jones): “a derivative of a source text condensed by selection and/or generalization on important content” • Summarizing traces is analogous to summarizing text • Select the most important information, or remove the information of least importance • Generalize by treating similar things as the same • Show only one instance of an iteration or recursion • Treat similar elements and patterns as if they were the same
Content Selection in Traces • Find implementation details and remove them • Calls to well-known libraries, classes and functions that are of little interest at the desired level of abstraction • Math functions, string comparison, user interface calls (perhaps), etc. • Automatically-detected utilities • discussed later
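To make the selection step concrete, here is a minimal sketch (in Java, not the authors' tooling) that drops trace events whose callee matches a user-supplied list of implementation-detail prefixes; the CallEvent record and all names in it are illustrative assumptions.

```java
// Minimal content-selection sketch: filter out calls into well-known libraries.
// CallEvent and the prefixes below are illustrative, not the authors' format.
import java.util.List;
import java.util.stream.Collectors;

public class ContentSelection {

    /** One call event of a trace; field names are assumptions for this sketch. */
    record CallEvent(String caller, String callee, int depth) {}

    /** Keep only events whose callee does not start with a known detail prefix. */
    static List<CallEvent> removeImplementationDetails(List<CallEvent> trace,
                                                       List<String> detailPrefixes) {
        return trace.stream()
                .filter(e -> detailPrefixes.stream()
                        .noneMatch(p -> e.callee().startsWith(p)))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<CallEvent> trace = List.of(
                new CallEvent("Classifier.build", "Instances.numAttributes", 1),
                new CallEvent("Classifier.build", "java.lang.Math.log", 2),
                new CallEvent("Classifier.build", "java.lang.String.equals", 2));
        // Math and String calls are treated as implementation details and removed.
        List<String> details = List.of("java.lang.Math", "java.lang.String");
        System.out.println(removeImplementationDetails(trace, details));
    }
}
```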
Content Generalization in Traces • Replace specific content with more abstract information • Show only one instance of an iteration or recursion • Treat similar sequences of events as if they were the same by varying a similarity function used to compare sequences of calls • E.g. ABC, ABBC, ABABABC --> ABC • Identify patterns found in many traces • A library of these can be built, so each time one works with a similar trace, known patterns can be flagged • Can be replaced with a user-defined label
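The sketch below illustrates the simplest case of generalization, assuming "similar" means "identical": consecutive repetitions of a call subsequence are collapsed to a single occurrence, so ABBC and ABABABC both reduce to ABC. The configurable similarity function mentioned above would relax the exact-match comparison; this sketch does not attempt that.

```java
// Collapse consecutive exact repetitions of call subsequences (one instance
// of an iteration is kept). A real similarity function would merge sequences
// that are merely similar; here only identical repeats are collapsed.
import java.util.ArrayList;
import java.util.List;

public class ContentGeneralization {

    static List<String> collapseRepetitions(List<String> calls, int maxPatternLen) {
        List<String> out = new ArrayList<>(calls);
        boolean changed = true;
        while (changed) {
            changed = false;
            for (int len = 1; len <= maxPatternLen && !changed; len++) {
                for (int i = 0; i + 2 * len <= out.size(); i++) {
                    // If the next 'len' calls repeat the previous 'len' calls, drop the repeat.
                    if (out.subList(i, i + len).equals(out.subList(i + len, i + 2 * len))) {
                        out.subList(i + len, i + 2 * len).clear();
                        changed = true;
                        break;
                    }
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(collapseRepetitions(List.of("A", "B", "B", "C"), 2));                // [A, B, C]
        System.out.println(collapseRepetitions(List.of("A", "B", "A", "B", "A", "B", "C"), 2)); // [A, B, C]
    }
}
```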
Trace Summarization Process • Step 1: Set the parameters for the summarization process • When to stop the process • How much detail is desired • Known implementation details (libraries etc.) • Known patterns • Similarity function to use • Other algorithm parameters • Step 2: Run the selection and generalization • Step 3: Output the result in a format that can be manipulated by the analyst
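As a rough sketch of what Step 1 could look like in code, the parameter object below bundles the settings listed above; the field names, types, and default values are assumptions for illustration, not taken from the authors' tool.

```java
// Illustrative parameter object for the summarization process (Step 1).
// Field names and example values are assumptions, not the authors' settings.
import java.util.List;

public class SummarizationParameters {

    record Parameters(
            double exitRatio,                 // when to stop / how much detail is desired
            List<String> detailPrefixes,      // known implementation details (libraries etc.)
            List<List<String>> knownPatterns, // previously catalogued patterns to flag
            double similarityThreshold) {}    // similarity cut-off for merging sequences

    public static void main(String[] args) {
        Parameters p = new Parameters(
                0.10,
                List.of("java.lang.", "java.util."),
                List.of(List.of("open", "read", "close")),
                0.8);
        System.out.println(p);
        // Step 2 would run selection and generalization with these settings;
        // Step 3 would output the summary for the analyst to expand or contract.
    }
}
```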
Trace Summarization Process (Cont.) • After Step 3, the maintainer can evaluate the result, and if not satisfied: • Adjust the parameters and run the process again, or • Manually manipulate the output • Contract the trace further • Expand various branches
A Key Step: Detecting Utilities • A utility: • Is something called from several places • Can be packaged in a non-utility module • Is used to facilitate implementation rather than being a core part of the architecture
Utilityhood Metric • U(r) = (Fanin(r) / N) × ( log( N / (Fanout(r) + 1) ) / log(N) ) • N: size of the static call graph built from the system under study • U(r) ranges from 0 (not a utility) to 1 (most likely to be a utility)
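A small sketch that evaluates the utilityhood formula above for a routine, given its fan-in and fan-out in the static call graph; the example counts are made up purely to show the two extremes of the metric.

```java
// Utilityhood U(r) = (Fanin(r) / N) * (log(N / (Fanout(r) + 1)) / log(N)),
// where N is the number of routines in the static call graph.
public class Utilityhood {

    static double utilityhood(int fanin, int fanout, int n) {
        return ((double) fanin / n)
                * (Math.log((double) n / (fanout + 1)) / Math.log(n));
    }

    public static void main(String[] args) {
        int n = 1642; // e.g. the number of public methods in the Weka case study
        // A leaf routine called from very many places scores close to 1 ...
        System.out.printf("likely utility:   %.3f%n", utilityhood(1500, 0, n));
        // ... while a routine with few callers and many callees scores close to 0.
        System.out.printf("unlikely utility: %.3f%n", utilityhood(2, 200, n));
    }
}
```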
Automatic Detection of Utilities • [Figure: an example call graph over routines r1–r7, illustrating utilities detected at narrower scopes]
Some Considerations • To detect utilities, it is important to have available the static call graph • Instead of the dynamic call graph • The dynamic graph will give a false impression of the extent to which something is a utility • Polymorphic calls can be resolved using various approaches in the literature • Hard to determine the scope of a utility if the system architecture is not clear • Architecture recovery techniques can be a useful adjunct
Case Study • Target System: • Weka System: Machine learning algorithms • Object-oriented, written in Java • 10 packages, 147 classes, 1642 public methods, and 95 KLOC. • Process Description: • Instrument the system • Run the system by selecting a software feature • Generate a static call graph from the Weka structure • Apply the trace summarization algorithm
Setting the Algorithm Parameters • Exit condition: • The number of distinct subtrees of the summary reaches 10% of the number of subtrees of the initial trace • Implementation Details: • Accessor methods, constructors, methods of inner classes, user-defined utilities
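For illustration, the sketch below encodes the settings on this slide: a rough predicate for the listed implementation details and the 10% exit condition. The naming heuristics (getter/setter prefixes, the '<init>' constructor name, '$' for inner classes) and the class names used in main are my own approximations, not the authors' exact rules.

```java
// Case-study settings sketch: classify implementation details and test the
// exit condition. Heuristics and example names are approximations.
public class CaseStudySettings {

    /** Rough check for accessor methods, constructors, and inner-class methods. */
    static boolean isImplementationDetail(String className, String methodName) {
        boolean accessor = methodName.startsWith("get") || methodName.startsWith("set");
        boolean constructor = methodName.equals("<init>");   // bytecode name for constructors
        boolean innerClassMethod = className.contains("$");  // compiled inner-class naming
        return accessor || constructor || innerClassMethod;
    }

    /** Exit condition: the summary keeps at most 10% of the distinct subtrees. */
    static boolean exitConditionMet(long subtreesInSummary, long subtreesInTrace) {
        return subtreesInSummary <= 0.10 * subtreesInTrace;
    }

    public static void main(String[] args) {
        System.out.println(isImplementationDetail("weka.core.Instances", "setClassIndex"));    // true (accessor)
        System.out.println(isImplementationDetail("weka.classifiers.J48", "buildClassifier")); // false
        System.out.println(exitConditionMet(800, 10_000)); // true: 8% is within the 10% target
    }
}
```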
Quantitative Results • Manual manipulation was performed using a trace analysis tool called SEAT
Validation • The summary was converted into a UML sequence diagram • Participants: Nine software engineers with good to excellent knowledge of Weka • Evaluation focused on the ability of the summary to represent the main events of the trace in the subjective view of the participants
Main Questions Asked Q1. How would you rank the quality of the summary with respect to whether it captures the main interactions of the traced scenario? Q2. If you designed or had to design a sequence diagram (or any other behavioural model) for the traced feature while you were designing the Weka system, how similar do you think that your sequence diagram would be to the extracted summary? Q3. In your opinion, how effective can a summary of a trace be in software maintenance?
Feedback of the Participants • Participants rated each question on a scale from ‘Very poor’ (score of 1) to ‘Excellent’ (score of 5)
Observations • Participants agreed that • The summary is a good high-level representation of the traced feature • Summaries can help understand the dynamics of a poorly documented system • The level of detail needed varies from one participant to another • E.g. P3 (an expert) commented that more details might be needed • A tool must therefore allow manipulation of the level of detail displayed
Conclusions • Summarizing large traces can help understand the features of a system and the causes of problems • The approach • Uses a mix of selection and generalization • Facilitates quick iteration to arrive at the most useful summary in the eyes of the user • Detecting utilities with a utilityhood metric is important • Case study results show that the method is promising
Future Directions • Experiments with many other traces to further validate the approach • Improve tools to speed iteration and interactivity • Explore variants on the approach to utility detection