Documentation Generators: Internals of Doxygen
John Tully
Motivation: Why document code?
• Required by company / administrators who examine code
• Typically in software engineering, code isn’t worked on continuously
• Co-workers (or “smart” users) look at source code for a better understanding of functionality
• Code is modified / built upon by others
Motivation: Why Automated?
• Docs are much more likely to be up to date: developers only need to update comments in code (editing Word docs / LaTeX files isn’t enjoyable)
• Reuse of comments = ½ the work
• Auto formatting
• More importantly, automatic cross-referencing, if the automated system is sophisticated
• Huge advantage: what if recipients want different formats?
  - Management team (or dumb readers)
  - End users
  - Software testers or other software developers
Problem / Goals
• Create a simple, easy-to-use, portable, highly configurable automatic documentation generator for a variety of output formats
• Real motivation: the creator (Dimitri van Heesch) used Qt, an application development framework (popular for creating GUIs)
• He wrote docs by hand to look exactly like the Qt documentation, then tried doc++ to do this automatically (not configurable enough)
Source: sourceforge
Other Doc Generators
• Several generators support more languages than Doxygen *
  - ROBODoc, TwinText, Natural Docs
  - VB, Pascal, .NET, Perl, JavaScript, SQL
• Many (including Doxygen) allow addition of new languages
• Discussion: ease of use of these tools?
• As far as output formats go, Doxygen is the most versatile
• Also seems to be best for diagram generation
  - Dependency graphs, inheritance / collaboration diagrams
• Certain tools may do some things that Doxygen can’t (control flow, data flow), but they’re language-specific
* Source for info on other generators: Wikipedia
Doxygen Information Flow
[Diagram. Inputs: source code; Doxyfile (config file), which can be edited with the Doxywizard GUI; tag file, Doxytag program, and external docs; custom headers, footers, and images. Outputs of Doxygen: HTML pages; man pages; RTF files (MS-Word formatted); LaTeX files (to ps or pdf); XML (to the dox xml parser, for custom output).]
Source: Doxygen “Getting Started” Page
1. Documentation
2. Configuration
3. Parsing / Execution
4. Output Options
[Each of these four slides repeats the information flow diagram.]
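Steps 2 and 4 meet in the Doxyfile: the same config file that names the inputs also switches each output format on or off. The sketch below is a hypothetical Python helper (`render_doxyfile` is not part of Doxygen) that writes such a file. The tag names are real Doxygen configuration options; the project name, paths, and chosen values are placeholders.

```python
# Hypothetical helper: render a minimal Doxyfile from a Python dict.
# The tag names are real Doxygen options; all values are placeholders.

def render_doxyfile(options):
    """Return Doxyfile text with one 'TAG = value' line per option."""
    lines = []
    for tag, value in options.items():
        if isinstance(value, bool):
            text = "YES" if value else "NO"      # booleans are YES / NO
        elif isinstance(value, (list, tuple)):
            text = " ".join(str(item) for item in value)  # lists are space separated
        else:
            text = str(value)
        lines.append(f"{tag:<20} = {text}")
    return "\n".join(lines) + "\n"


options = {
    "PROJECT_NAME": "demo",
    "INPUT": ["src", "include"],
    "OUTPUT_DIRECTORY": "docs",
    "GENERATE_HTML": True,
    "GENERATE_LATEX": True,
    "GENERATE_MAN": False,
    "GENERATE_RTF": False,
    "GENERATE_XML": True,
    "GENERATE_TAGFILE": "docs/demo.tag",
}

doxyfile_text = render_doxyfile(options)
print(doxyfile_text)
```

Saving the text as `Doxyfile` and running `doxygen Doxyfile` would then produce only the formats set to YES. In practice most users start from the template written by `doxygen -g` or from Doxywizard rather than generating the file themselves.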
Doxygen Internals
• Doxygen Information Flow: high-level view of steps taken by the user to document code
• Doxygen Internals: overview of data flow as source files are parsed by Doxygen
• Closer look at the Parsing / Execution phase mentioned previously
Doxygen Internals: Data Flow
[Diagram: Config Parser, C Preprocessor, Language Parser, Data Organizer, Documentation Parser, Source Parser, Output Generators. Source: Doxygen “internals” page]
• Config parser: processes configuration file
  - Written in flex
  - Simple (since config options are pretty simple: just 5 types)
• C Preprocessor: fairly similar to standard preprocessor
  - Written in flex; uses yacc-based parser for expression evaluation
• Language parser: converts input buffer into a tree of entries
  - Basically breaks up code into smaller modules that will be reorganized later
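To make "a tree of entries" concrete, here is a toy sketch in Python. It is not Doxygen's parser, which is a flex scanner: this version uses three regular expressions, handles only class declarations and simple function declarations in a C++-like subset, and the `Entry` type and its fields are invented for illustration.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Entry:
    """One node in the tree: a file, class, or function (hypothetical model)."""
    kind: str
    name: str
    bases: list = field(default_factory=list)
    children: list = field(default_factory=list)


# Deliberately simple patterns; real C++ needs far more than this.
CLASS_RE = re.compile(r"^\s*class\s+(\w+)\s*(?::\s*(.*?))?\s*\{")
FUNC_RE = re.compile(r"^\s*[\w:<>\*&\s]+?\b(\w+)\s*\([^)]*\)\s*(?:const\s*)?;")
CLOSE_RE = re.compile(r"^\s*\};")


def parse(buffer, filename="input.h"):
    """Convert the input buffer into a tree of entries rooted at the file."""
    root = Entry("file", filename)
    stack = [root]                      # innermost open scope is stack[-1]
    for line in buffer.splitlines():
        m = CLASS_RE.match(line)
        if m:
            # "public Shape, private Base" -> ["Shape", "Base"]
            bases = [b.split()[-1] for b in (m.group(2) or "").split(",") if b.strip()]
            entry = Entry("class", m.group(1), bases)
            stack[-1].children.append(entry)
            stack.append(entry)
            continue
        if CLOSE_RE.match(line) and len(stack) > 1:
            stack.pop()                 # end of the current class scope
            continue
        m = FUNC_RE.match(line)
        if m:
            stack[-1].children.append(Entry("function", m.group(1)))
    return root


SOURCE = """\
class Shape {
public:
    virtual double area() const;
};

class Circle : public Shape {
public:
    double area() const;
    double radius() const;
};

int main();
"""

tree = parse(SOURCE)
for child in tree.children:
    print(child.kind, child.name, child.bases, [c.name for c in child.children])
```

The parser only records what it sees, scope by scope. Resolving that `Circle` derives from a class named `Shape` defined elsewhere is left to a later stage.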
Doxygen Internals: Data Flow
[Diagram: Config Parser, C Preprocessor, Language Parser, Data Organizer, Documentation Parser, Source Parser, Output Generators]
• Data organizer: gathers information from these entries
  - Builds dictionaries of classes, files, variables, functions, groups, etc.
  - Determines relationships between these entities
• Documentation parser: finds comment blocks / tags in entities; feeds results to output generators
• Source parser (if source is documented): cross-referencing, syntax highlighting
• Output generators: take data, which was found by the language parser and organized / cross-referenced by the data organizer, and generate output in the specified format
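The data organizer step can be sketched as a single walk over the entry tree. The Python below is an illustration under assumptions, not Doxygen's data structures: entries are plain dicts, the `organize` function and the `index` / `derived` names are made up, and the only relationship computed is inheritance.

```python
from collections import defaultdict

# Hypothetical entry tree, shaped like the output of a language parser:
# each entry is a dict with a kind, a name, optional bases, and children.
tree = {
    "kind": "file", "name": "shapes.h", "children": [
        {"kind": "class", "name": "Shape", "bases": [], "children": [
            {"kind": "function", "name": "area", "children": []},
        ]},
        {"kind": "class", "name": "Circle", "bases": ["Shape"], "children": [
            {"kind": "function", "name": "area", "children": []},
            {"kind": "function", "name": "radius", "children": []},
        ]},
    ],
}


def organize(root):
    """Fill per-kind dictionaries keyed by qualified name, plus inheritance links."""
    index = defaultdict(dict)      # kind -> {qualified name -> entry}
    derived = defaultdict(list)    # base class -> classes that inherit from it

    def visit(entry, scope):
        qualified = f"{scope}::{entry['name']}" if scope else entry["name"]
        index[entry["kind"]][qualified] = entry
        for base in entry.get("bases", []):
            derived[base].append(qualified)
        # Files do not open a naming scope; classes do.
        child_scope = qualified if entry["kind"] == "class" else scope
        for child in entry["children"]:
            visit(child, child_scope)

    visit(root, "")
    return index, derived


index, derived = organize(tree)
print(sorted(index["class"]))
print(sorted(index["function"]))
print(dict(derived))
```

With dictionaries like these, an output generator can look up `Circle`, list its members, and link to `Shape` without walking the tree again.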
Debugging Doxygen
• For a thorough understanding of doxygen’s source code, an understanding of flex is important
• Flex is a lexical analyzer generator: it generates scanners (programs that recognize lexical patterns in text)
• The executable is built from these scanners, which execute C code when patterns are found
• Flex has a debugging option to output matched rules when they’re found, which makes it easy to follow the steps doxygen is taking
• On the internals page, a tool (script) is available which turns on debugging for any flex file
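The idea of pattern rules with attached actions, plus a debug trace of which rule fired, can be imitated in a few lines of Python. This is only an analogy for what a flex-generated scanner does when built with debugging enabled (flex's `-d` option): the rule set is invented, and the trace format only loosely resembles flex's own messages.

```python
import re

# Each rule pairs a regular expression with an action, as in a flex
# specification. The rule set itself is made up for illustration.
RULES = [
    ("doc_comment", re.compile(r"/\*\*.*?\*/", re.S), lambda text: ("DOC", text)),
    ("identifier", re.compile(r"[A-Za-z_]\w*"), lambda text: ("ID", text)),
    ("whitespace", re.compile(r"\s+"), lambda text: None),   # matched, then dropped
    ("other", re.compile(r"."), lambda text: ("CHAR", text)),
]


def scan(buffer, debug=False):
    """Tokenize buffer; with debug=True, print every rule as it fires."""
    tokens, trace, pos = [], [], 0
    while pos < len(buffer):
        # Like flex: prefer the longest match, break ties by rule order.
        best = None
        for name, pattern, action in RULES:
            match = pattern.match(buffer, pos)
            if match and (best is None or match.end() > best[1].end()):
                best = (name, match, action)
        if best is None:
            raise ValueError(f"no rule matches at offset {pos}")
        name, match, action = best
        line = f"--accepting rule {name} ({match.group()!r})"
        trace.append(line)
        if debug:
            print(line)
        token = action(match.group())    # run the rule's action
        if token is not None:
            tokens.append(token)
        pos = match.end()
    return tokens, trace


tokens, trace = scan("/** Area. */ double area();", debug=True)
print(tokens)
```

Reading the trace top to bottom shows the order in which the scanner consumed the input, which is the same kind of information the flex debug output gives when stepping through doxygen's scanners.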
Limitations / Future Work
• Limitations / “wish list” of doxygen discussed during demo
  - More languages, more output formats, better template files
• Several improvements to internals also mentioned:
• Language Parser:
  - One scanner / parser per language (currently one huge combined scanner)
  - Modularize parsing (documentation blocks vs. code)
  - Parse preprocessor input for extended documentation (i.e. #defines)
Limitations / Future Work
• Output generators: interesting future work
  - Instead of using data structures (generated by the data organizer) to produce XML, use XML as an intermediate language
  - More info could be extracted by various output generators; with a more understandable intermediate language, it is easy to create better end tools
    - Interactive source browsers
    - Configurable class diagram generators
    - Computing code metrics
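As an illustration of the "XML as intermediate language" idea, the sketch below computes a small code metric from XML alone, with no access to the parser's internal data structures. The sample document is a simplified stand-in modeled on the general shape of Doxygen's XML output (`compounddef`, `sectiondef`, `memberdef`); real output carries many more elements, and the classes and the `metrics` function are invented.

```python
import xml.etree.ElementTree as ET

# Simplified sample; real Doxygen XML has many more elements and attributes.
SAMPLE = """\
<doxygen>
  <compounddef id="classShape" kind="class">
    <compoundname>Shape</compoundname>
    <sectiondef kind="public-func">
      <memberdef kind="function" id="a1"><name>area</name>
        <briefdescription><para>Area of the shape.</para></briefdescription>
      </memberdef>
    </sectiondef>
  </compounddef>
  <compounddef id="classCircle" kind="class">
    <compoundname>Circle</compoundname>
    <basecompoundref refid="classShape">Shape</basecompoundref>
    <sectiondef kind="public-func">
      <memberdef kind="function" id="b1"><name>area</name>
        <briefdescription></briefdescription>
      </memberdef>
      <memberdef kind="function" id="b2"><name>radius</name>
        <briefdescription><para>Radius.</para></briefdescription>
      </memberdef>
    </sectiondef>
  </compounddef>
</doxygen>
"""


def is_documented(member):
    """True if the member has a non-empty brief description."""
    brief = member.find("briefdescription")
    return brief is not None and bool("".join(brief.itertext()).strip())


def metrics(xml_text):
    """Per class: number of functions and how many of them are documented."""
    root = ET.fromstring(xml_text)
    result = {}
    for compound in root.iter("compounddef"):
        if compound.get("kind") != "class":
            continue
        name = compound.findtext("compoundname")
        functions = [m for m in compound.iter("memberdef") if m.get("kind") == "function"]
        documented = [m for m in functions if is_documented(m)]
        result[name] = {"functions": len(functions), "documented": len(documented)}
    return result


report = metrics(SAMPLE)
for name, counts in report.items():
    print(name, counts)
```

A source browser or diagram generator could be built the same way: each tool reads the XML and extracts what it needs, so none of them has to be compiled into the documentation generator itself.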