The Visual Code Navigator: An Interactive Toolset For Source Code Investigation
F. Boerboom, A. Janssen, G. Lommerse, F. Nossin, L. Voinea, A. Telea
Eindhoven University of Technology, the Netherlands

Outline
The Visual Code Navigator (VCN):
• an environment for interactive visualization of industry-size source code projects
• tuned for C/C++ code bases stored in CVS
• targets understanding code evolution and code structure
• based on three views with complementary purposes
How can we extract facts from source code? What can the VCN source code views show?

Fact extraction
• Notoriously difficult problem… Requirements (roughly):
• completeness:
  - extracts all elements & cross-refs from source code
  - extracts correct information
  - complies with the latest C/C++ standard
  - includes preprocessor facilities
• tolerance:
  - handles incomplete/incorrect/ambiguous code
• efficiency:
  - memory/speed efficient on industry-size code bases
• availability:
  - can be built from source code, preferably cross-platform

Existing fact extractors
Testing:
- get the tool as binary/source; try to build it
- analyze very large systems (>0.5 MLOC)
- select extremely messy C/C++ code
- try with/without includes (incomplete)
- check output for size, correctness, completeness, throughput
- investigate the causes of limitations
Legend: ++ very good, + good, o could be better, - limited, -- unacceptable/missing, ? insufficiently tested

Conclusions
• Many surprises:
  • most tools extract interface data quite well
  • … but fail badly at parsing implementation (function bodies)
  • tolerance and completeness are mutually exclusive
  • completeness and performance are likewise hard to combine
  • GLR-grammar-based tools are by far the best
• Overall, we found just one reasonably good tool: Columbus
• However, it is:
  • closed-source
  • limited in some technical respects (template handling)
  • quite slow (1 hr 20 min for ~150,000 LOC)
How can we do better than the above tools?

EFES: Our own C/C++ fact extractor
• We chose to build our own extractor:
  • based on the Elkhound C/C++ GLR parser
  • uses a modified preprocessor, for tolerance
  • extends the parser, for tolerance of incomplete/incorrect code and for handling templated code
  • uses compression techniques to compact/speed up output
• So far:
  • tests on very large projects (>200 MLOC) look good
  • we are 3..7 times faster than Columbus
  • we produce the ‘bare’ info, no metrics yet
Hard, but an unavoidable endeavour

EFES Architecture
• source: any C/C++ project, possibly incomplete/incorrect code
• preprocessor: libcpp, also used by GNU CPP
• parser: Elsa, which uses the Elkhound GLR parser generator
• type checker: disambiguates code with type information
• filter: limits output to a set of interest (e.g. files, scopes, …)
• output generator: efficiently writes the output information to a file
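The stage order above can be pictured as a chain of functions, each consuming the previous stage's output. This is a minimal Python sketch, not EFES code: the real stages are libcpp, Elsa/Elkhound and Elsa's type checker, while `fake_front_end`, `make_filter` and `text_output` are hypothetical stand-ins. It shows how a filter placed before the output generator drops unwanted facts (here, those from a system header) before anything is written.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Set


@dataclass
class Fact:
    kind: str    # e.g. "function", "variable", "typedef"
    name: str
    file: str
    line: int
    column: int


def run_pipeline(data, stages: List[Callable]):
    """Feed the output of each stage into the next one."""
    for stage in stages:
        data = stage(data)
    return data


def fake_front_end(paths: Iterable[str]) -> List[Fact]:
    # Hypothetical stand-in for preprocessor + parser + type checker:
    # it simply emits one function fact per input file.
    return [Fact("function", "main", path, 1, 1) for path in paths]


def make_filter(files_of_interest: Set[str]) -> Callable:
    """Filter stage: keep only facts located in the files of interest."""
    def filter_stage(facts: List[Fact]) -> List[Fact]:
        return [f for f in facts if f.file in files_of_interest]
    return filter_stage


def text_output(facts: List[Fact]) -> str:
    """Output generator stage: one line per fact."""
    return "\n".join(f"{f.kind} {f.name} {f.file}:{f.line}:{f.column}" for f in facts)


result = run_pipeline(
    ["app.cpp", "/usr/include/stdio.h"],
    [fake_front_end, make_filter({"app.cpp"}), text_output],
)
print(result)
```

Keeping the filter as a separate stage means the front end never needs to know what the user is interested in; only the filter's parameters change.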

EFES Enhancements
Several enhancements to ‘standard’ fact extraction:
• preprocessor: enhanced CPP to produce exact location information (needed later for construct visualization & comparison)
• parser & type checker: enhanced Elsa to:
  - parse incorrect code with extra grammar rules; errors are caught at scope level
  - extend Elsa’s template support
  - add checkpoints at top-form level to trap internal errors
• filter: novel element; reduces output size dramatically, e.g. by skipping standard header information
• output generator: added compact binary output; reduces output size tenfold and increases output speed fivefold
• project concept: lets users customize extraction (C++ dialect, filtering, parser strictness, what to output, etc.)
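The slides do not describe EFES's binary format. As a hypothetical illustration of why a compact binary output can be much smaller than text, the sketch below interns repeated strings (construct kinds, file names) into a table written once, and stores every number as an unsigned LEB128 varint. The record layout `(kind, name, file, line, column)` is an assumption.

```python
from typing import Dict, List, Tuple

Fact = Tuple[str, str, str, int, int]  # assumed layout: (kind, name, file, line, column)


def write_varint(out: bytearray, value: int) -> None:
    """Unsigned LEB128: 7 bits per byte, high bit set on all but the last byte."""
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return


def read_varint(data: bytes, pos: int) -> Tuple[int, int]:
    """Decode one varint starting at `pos`; return (value, next position)."""
    result, shift = 0, 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def encode(facts: List[Fact]) -> bytes:
    strings: Dict[str, int] = {}
    body = bytearray()
    for kind, name, file, line, col in facts:
        for s in (kind, name, file):
            # Each distinct string is stored once and referenced by its index.
            write_varint(body, strings.setdefault(s, len(strings)))
        write_varint(body, line)
        write_varint(body, col)
    out = bytearray()
    write_varint(out, len(strings))
    for s in strings:  # dicts keep insertion order, which is index order here
        raw = s.encode("utf-8")
        write_varint(out, len(raw))
        out += raw
    write_varint(out, len(facts))
    return bytes(out + body)


def decode(data: bytes) -> List[Fact]:
    pos = 0
    n_strings, pos = read_varint(data, pos)
    table = []
    for _ in range(n_strings):
        length, pos = read_varint(data, pos)
        table.append(data[pos:pos + length].decode("utf-8"))
        pos += length
    n_facts, pos = read_varint(data, pos)
    facts = []
    for _ in range(n_facts):
        fields = []
        for _ in range(5):
            value, pos = read_varint(data, pos)
            fields.append(value)
        k, n, f, line, col = fields
        facts.append((table[k], table[n], table[f], line, col))
    return facts


facts = [("function", "main", "app.cpp", 10, 1), ("variable", "count", "app.cpp", 12, 5)]
blob = encode(facts)
print(len(blob), decode(blob) == facts)
```

The savings grow with repetition: in a real extraction the same file names and kinds recur thousands of times, and small line/column numbers fit in one or two bytes.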

Performance & Results
[Figure: extraction performance, Columbus vs. EFES]
We are 3..7 times faster
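A back-of-envelope reading of these numbers, using only the Columbus timing quoted earlier (1 hr 20 min for ~150,000 LOC) and the reported 3..7 times speedup. It assumes, which the slides do not state, that the speedup applies to that same workload.

```python
# Numbers quoted on the slides.
COLUMBUS_LOC = 150_000      # ~150,000 LOC
COLUMBUS_MINUTES = 80       # 1 hr 20 min
SPEEDUPS = (3, 7)           # "3..7 times faster"

columbus_loc_per_s = COLUMBUS_LOC / (COLUMBUS_MINUTES * 60)

# Assumption: the speedup holds for this same ~150,000 LOC workload.
efes_minutes = {s: COLUMBUS_MINUTES / s for s in SPEEDUPS}
efes_loc_per_s = {s: columbus_loc_per_s * s for s in SPEEDUPS}

print(f"Columbus: {columbus_loc_per_s:.2f} LOC/s")
for s in SPEEDUPS:
    print(f"EFES at {s}x: ~{efes_minutes[s]:.1f} min, ~{efes_loc_per_s[s]:.0f} LOC/s")
```

Columbus works out to about 31 LOC/s; under the stated assumption EFES would need roughly 11 to 27 minutes for the same code.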

Conclusions
• We’ve built a powerful C/C++ fact extractor that:
  • works on large projects (>200 MLOC)
  • handles incorrect/incomplete code well
  • extracts virtually all raw information there is
  • is 3..7 times faster than a known commercial solution
• Desired additions:
  • distil raw information into more interesting facts (metrics, patterns, etc.)
  • add a query layer atop the basic extractor
  • add an interactive visualization layer atop the query layer
An evolving project

Visualization
• We now have our extracted facts:
  • variables, types, functions, classes…
  • cross-references between all these
  • location information (file, line, column) of each construct
• We would like to show them to the user and answer questions:
  • how is the code structured?
  • how are programming constructs distributed?
  • how has the code changed over time?
  • how are the typical function signatures used in a project?
  • …and so on
Several visualization tools
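One possible in-memory shape for these facts, sketched in Python with a hypothetical `Construct` record (the field names are assumptions, not the EFES output schema). The two helpers answer simple versions of the questions above: how constructs of each kind are distributed within a file, and which constructs refer to a given one.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import List


@dataclass(eq=False)  # identity semantics: constructs are compared by object, not value
class Construct:
    kind: str    # "variable", "type", "function", "class", ...
    name: str
    file: str    # location information of the construct
    line: int
    column: int
    refs: List["Construct"] = field(default_factory=list)  # outgoing cross-references


def distribution(constructs: List[Construct], file: str) -> Counter:
    """How are programming constructs distributed? Count kinds within one file."""
    return Counter(c.kind for c in constructs if c.file == file)


def referrers(constructs: List[Construct], target: Construct) -> List[Construct]:
    """Which constructs refer to `target`? Inverts the cross-reference lists."""
    return [c for c in constructs if any(r is target for r in c.refs)]


point = Construct("class", "Point", "geo.h", 3, 1)
origin = Construct("variable", "origin", "geo.cpp", 5, 7, refs=[point])
dist = Construct("function", "dist", "geo.cpp", 8, 1, refs=[point])
facts = [point, origin, dist]

print(distribution(facts, "geo.cpp"))
print([c.name for c in referrers(facts, point)])
```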

1) Syntactic view: 1 version, N files (code view)
• Basic idea:
  • combine a classical text editor with a pixel-based text display (e.g. SeeSoft) in a single view
  • let users smoothly navigate between the two
  • blend syntactic structures over code text using cushions
[Figure: syntax tree; source code + cushion texture; result; cushion profile f(x) over x, with border size]
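The cushion profile f(x) in the figure maps distance from a construct's border to brightness. The exact profile is given in the paper, not on the slides, so the parabolic shape below is an assumption, as is combining nested constructs by multiplying their factors. The sketch shades one text row at a time in a 1D, row-based setting.

```python
from typing import List, Tuple


def cushion_profile(x: float, border: float) -> float:
    """Hypothetical cushion profile f(x): 0 at the construct border (x = 0),
    rising parabolically to 1 at x = border, and flat (1) inside."""
    if border <= 0 or x >= border:
        return 1.0
    t = 1.0 - max(x, 0.0) / border
    return 1.0 - t * t


def row_shading(row: int, spans: List[Tuple[int, int]], border: float) -> float:
    """Brightness of one text row under nested constructs given as inclusive
    (first_row, last_row) spans. Every enclosing construct darkens the row near
    its own borders; the factors multiply, so nested borders stay distinguishable."""
    shade = 1.0
    for first, last in spans:
        if first <= row <= last:
            # +0.5 measures the distance from the row's center, not its edge.
            dist = min(row - first, last - row) + 0.5
            shade *= cushion_profile(dist, border)
    return shade


# A function (rows 0-10) containing a loop (rows 1-4), border size 2 rows.
spans = [(0, 10), (1, 4)]
print([round(row_shading(r, spans, 2.0), 3) for r in range(12)])
```

A larger `border` gives softer, wider cushions; a smaller one leaves more of each construct's interior at full brightness for reading the text.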

Syntactic view: Classical code editor…

Syntactic view: Blend in structure cushions…

Syntactic view: More structure cushions…

Syntactic view: Zoom out on 10 files, ~7000 LOC

Syntactic view: Zoom out, structure cushions only

Cushions vs. ‘syntax highlighting’
• classical syntax highlighting is actually lexical highlighting
• we generalize and enhance syntax highlighting
[Figure: syntax highlighting vs. structure cushions]
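The difference can be made concrete: lexical highlighting colors a character from its token alone, whereas structure coloring needs the syntax tree to find the construct enclosing that character. A minimal sketch under assumptions: the `Node` type, offsets and palette are hypothetical, and each character takes the color of its innermost enclosing construct.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    kind: str                 # construct type, e.g. "function", "for", "if"
    start: int                # first character offset (inclusive)
    end: int                  # last character offset (exclusive)
    children: List["Node"] = field(default_factory=list)


def innermost(node: Node, offset: int) -> Optional[Node]:
    """Deepest syntax-tree node whose range contains `offset`."""
    if not (node.start <= offset < node.end):
        return None
    for child in node.children:
        hit = innermost(child, offset)
        if hit is not None:
            return hit
    return node


# Hypothetical palette: one color per construct type.
COLORS = {"function": "blue", "for": "orange", "if": "green"}


def structure_color(root: Node, offset: int) -> str:
    node = innermost(root, offset)
    if node is None:
        return "none"
    return COLORS.get(node.kind, "gray")


# A function containing a for loop that contains an if statement (offsets are made up).
root = Node("function", 0, 40, [Node("for", 10, 38, [Node("if", 20, 36)])])
print([structure_color(root, off) for off in (5, 15, 25, 50)])
```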

Syntactic view: Navigation user points the mouse at some code location…

Syntactic view: Spot cursor …and brings the text in focus above the structure

Syntactic view: Structure cursor …over a whole syntactic construct, if desired.
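The two cursors differ only in how text opacity is chosen for each row. The slides do not give the fall-off, so this sketch assumes a linear fade for the spot cursor (parameters `inner` and `outer` are hypothetical) and an all-or-nothing mask for the structure cursor; `blend` composites text over the cushion color.

```python
from typing import Tuple


def spot_alpha(row: int, mouse_row: int, inner: float, outer: float) -> float:
    """Hypothetical spot cursor: text is fully opaque within `inner` rows of the
    mouse and fades linearly to 0 (structure cushions only) at `outer` rows."""
    d = abs(row - mouse_row)
    if d <= inner:
        return 1.0
    if d >= outer:
        return 0.0
    return (outer - d) / (outer - inner)


def structure_alpha(row: int, construct: Tuple[int, int]) -> float:
    """Hypothetical structure cursor: text is opaque over the whole construct
    (first_row, last_row) under the mouse and hidden elsewhere."""
    first, last = construct
    return 1.0 if first <= row <= last else 0.0


def blend(text_rgb, cushion_rgb, alpha):
    """Composite text over the cushion background with opacity `alpha`."""
    return tuple(alpha * t + (1 - alpha) * c for t, c in zip(text_rgb, cushion_rgb))


print([spot_alpha(r, 10, 2, 6) for r in range(8, 18)])
```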

Syntactic view - Conclusions
• Two main uses:
  • Overview:
    • good for showing up to 10,000-15,000 LOC on one screen
    • colors code by construct type
    • easy to spot presence/distribution of constructs in code
  • Detail:
    • good for quickly browsing a single source file
    • gives structural context information
    • typical question: “where was that function with that doubly-nested for?”

2) Symbol view: N files, 1 version (interface view)
• Displays public symbols in source files
• Nested by scope rules (global, namespace, method, argument)
• Visualized using a cushion treemap, colored by symbol type
[Figure: treemap of ‘public’ symbols in files; legend: arguments, functions, fields, typedefs, global vars, files]

Symbol view - Details
• Treemap node size computation:
  - leaves: function bodies use the number of LOC in the declaration; other leaves use the number of LOC or sizeof()
  - non-leaves: sum of children
• Shading:
  - hue: construct type (typedef, function, argument, …)
  - saturation: construct nesting (global/class scope)
• Targeted questions:
  - “what kind of symbols are in a library’s headers?”
  - “how are namespaces used in interface headers?”
  - “does a header have a simple / uniform structure or not?”
  - “are there heavy functions from a parameter-passing view?”
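The size and shading rules above translate into a short recursive function. In this sketch the `Symbol` type, the hue table and the saturation fall-off are hypothetical, and the slide's "LOC or sizeof()" is resolved by assumption as "sizeof() when known, else LOC". Layout (how the sizes become rectangles) is left out.

```python
import colorsys
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Symbol:
    kind: str                     # "file", "function", "body", "argument", "typedef", ...
    loc: int = 0                  # lines of code (used for leaves)
    sizeof: Optional[int] = None  # sizeof() of a type or variable, if known
    nesting: int = 0              # 0 = global scope, 1 = class scope, ...
    children: List["Symbol"] = field(default_factory=list)


def node_size(s: Symbol) -> int:
    """Treemap size: non-leaves sum their children; function-body leaves use LOC;
    other leaves use sizeof() when known, else LOC (an assumed resolution of
    the slide's 'LOC or sizeof()')."""
    if s.children:
        return sum(node_size(c) for c in s.children)
    if s.kind == "body":
        return s.loc
    return s.sizeof if s.sizeof is not None else s.loc


# Hypothetical hue per construct type (0..1 around the color wheel).
HUES = {"function": 0.6, "argument": 0.3, "typedef": 0.1, "field": 0.8, "variable": 0.0}


def node_color(s: Symbol) -> Tuple[float, float, float]:
    """Hue encodes construct type; saturation decreases with nesting depth."""
    hue = HUES.get(s.kind, 0.5)
    saturation = max(0.2, 1.0 - 0.4 * s.nesting)
    return colorsys.hsv_to_rgb(hue, saturation, 1.0)


draw = Symbol("function", nesting=1, children=[
    Symbol("argument", sizeof=4, nesting=1),  # e.g. an int parameter
    Symbol("body", loc=12, nesting=1),
])
header = Symbol("file", children=[draw, Symbol("typedef", loc=1)])
print(node_size(header))
```

Because non-leaf sizes are sums, a function with many large by-value arguments grows visibly, which is what the "heavy functions from a parameter-passing view" question looks for.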

Symbol view: Example
[Figure: C global namespace, C++ std namespace, symbols in a file, brushed file]

3) Evolution view: M files, N versions
Basic idea: CVSscan tool [Voinea & Telea, ACM SoftVis’05]
[Figure: CVSscan with time (version) axis, file axis, source code details]

Evolution view: M files, N versions
• extends the CVSscan tool [Voinea & Telea, ACM SoftVis’05]
• stacks several stripped-down file evolution views above each other
• line color = construct type
• helps spotting cross-file correlations (e.g. large changes)
[Figure: stacked file evolution views; file axis, time (version) axis; colors: comments, function bodies, strings, function headers]

Evolution view - Results
• We look for:
  • large size jumps = large code changes
  • size jumps correlated across several files at the same version = cross-system changes
  • less ‘wavy’ patterns = more stable files
  • horizontal patterns = unchanged code
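The first two patterns can also be checked numerically from per-version file sizes. This is a hypothetical sketch, not part of the VCN: it uses LOC per version as a proxy for change, flags a jump when a file's size changes by at least `threshold` lines between consecutive versions, and reports versions where several files jump together.

```python
from typing import Dict, List


def size_jumps(history: Dict[str, List[int]], threshold: int) -> Dict[str, List[int]]:
    """Per file: versions v whose size differs from version v-1 by >= threshold LOC."""
    return {
        name: [v for v in range(1, len(sizes)) if abs(sizes[v] - sizes[v - 1]) >= threshold]
        for name, sizes in history.items()
    }


def cross_system_changes(history: Dict[str, List[int]], threshold: int, min_files: int = 2) -> List[int]:
    """Versions at which at least `min_files` files jump in size together."""
    counts: Dict[int, int] = {}
    for versions in size_jumps(history, threshold).values():
        for v in versions:
            counts[v] = counts.get(v, 0) + 1
    return sorted(v for v, n in counts.items() if n >= min_files)


# Hypothetical LOC per version for three files.
history = {
    "a.cxx": [100, 102, 180, 181],
    "b.cxx": [50, 50, 120, 121],
    "c.h": [20, 20, 20, 21],
}
print(size_jumps(history, threshold=30))
print(cross_system_changes(history, threshold=30))
```

Size alone misses edits that keep the line count constant, which is one reason the visual view, showing per-line construct colors, remains useful.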

Evaluation
• Method & materials:
  - VTK C++ library (1 MLOC, 100 versions)
  - 3 users with C++ but no VTK knowledge
  - 1 user with C++ and VTK knowledge (evaluator)
  - quantitative and qualitative questions to be answered on VTK with and without VCN
• Questions (table rating each view, Evo / Stx / Sym, as preferred/first tool or optional/second tool):
  - are files fine/coarse grained?
  - what is the typical class interface structure?
  - what is the typical class implementation structure?
  - find & describe a few large evolution changes
  - what is the typical macro usage/frequency?
  - what is the typical comment usage/frequency?

Evaluation
• Results:
  • VCN allowed getting answers (much) faster than pure classical source code browsing
  • views are complementary and serve different tasks in different ways
  • a single view is usually not enough
  • a fine-tuned, fast, integrated system is essential!
  • users are reluctant to work with poor/suboptimal tools
[Figure: usage workflow linking start, evolution view, symbol view (interface?), syntax view (implementation?), and text editor (fine insight)]

Implementation
• Syntactic view:
  • cushions: OpenGL textures, superimposed, not blended
  • careful cushion border design (see paper)
• Symbol view:
  • cushion treemap: OpenGL fragment programs
  • essential for interactive, fast navigation!
• Evolution view:
  • column cushions: OpenGL textures
  • several LOC per pixel: solved by software antialiasing
• Efficient tool design is essential for smooth navigation in large code bases, and thus important for user acceptance
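When a file has more lines than the view has pixel rows, each pixel must summarize several lines. The slides only say "software antialiasing"; the box filter below, which averages the colors of the lines covered by each pixel row and weights partially covered lines by their overlap, is an assumed technique, sketched in Python.

```python
from typing import List, Tuple

Color = Tuple[float, float, float]


def downsample_lines(colors: List[Color], pixels: int) -> List[Color]:
    """Box-filter N per-line colors into `pixels` rows: each pixel row is the
    average of the lines it covers, weighted by how much of each line falls
    inside it (so fractional lines-per-pixel ratios are handled too)."""
    n = len(colors)
    scale = n / pixels            # lines per pixel row
    out: List[Color] = []
    for p in range(pixels):
        lo, hi = p * scale, (p + 1) * scale
        acc = [0.0, 0.0, 0.0]
        total = 0.0
        i = int(lo)
        while i < n and i < hi:
            weight = min(hi, i + 1) - max(lo, i)  # overlap of line i with [lo, hi)
            if weight > 0:
                for k in range(3):
                    acc[k] += weight * colors[i][k]
                total += weight
            i += 1
        out.append((acc[0] / total, acc[1] / total, acc[2] / total))
    return out


# Four alternating black/white lines squeezed into two pixel rows.
lines = [(0.0, 0.0, 0.0), (1.0, 1.0, 1.0), (0.0, 0.0, 0.0), (1.0, 1.0, 1.0)]
print(downsample_lines(lines, 2))
```

Averaging instead of picking one line per pixel keeps thin but frequent constructs (e.g. comment lines) visible as a tint rather than letting them flicker in and out as the view is resized.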

Conclusions
• VCN: multi-view visual environment for understanding source code and its evolution
  • Syntax view: 1 version, N files (compiler)
  • Symbol view: 1 version, N files (linker)
  • Evolution view: M files, N versions
• Dense pixel displays are essential for viewing large datasets
• Cushion techniques are effective for visualizing various kinds of visual nesting (syntax, symbol, file, …)
• Working to extend & generalize the VCN
  • What to do when M, N exceed a few hundred?
Check it out: www.win.tue.nl/~lvoinea/VCN.html

