220 likes | 359 Views
CDT DOM Proposal. Doug Schaefer IBM Tech Lead, Eclipse CDT. December 2004. Title slide. Where are we (IBM) coming from?. We’re part of the Model-Driven Development (MDD) team in the Rational Software Division of the IBM Software Group
E N D
CDT DOM Proposal Doug SchaeferIBM Tech Lead, Eclipse CDT December 2004 Title slide
Where are we (IBM) coming from? • We’re part of the Model-Driven Development (MDD) team in the Rational Software Division of the IBM Software Group • CDT is the core C++ component of Rational Software Architect • The C++ component includes visualization of code as UML as well as transformation of UML to C++ • We need an accurate, complete DOM that allows programmatic code change • Since do both Java and C++ we’d like the architectures of the JDT and CDT to be “similar”.
What have we done until now? • Started work on parser in Dec. 2002 • Considered a number of options but settled on a handwritten parser • Needed smart control over ambiguities in C/C++ • Needed to support a number of clients with different needs • Similar in approach to newer versions of gcc. • First contributed in 1.1 for CModel replacing previous JavaCC-based parser • Added Indexing and Search as client in 1.2 • Replaced previous ctags-based index (not without issue) • Numerous clients in 2.0, including content assist, F3, type browsers
Parsing Architecture 1.0 • We had concern over scalability of AST • Settled on a callback-based architecture • Clients created only the data that they needed • Client would pass in a requestor to the parser and the parser would pass in parse info as the parse progressed • Despite the names, the IAST* parse info does not form a proper AST • Client would also be responsible for passing in ScannerInfo • Compiler command line arguments that affect parsing • Include paths, macro defs • Needed to properly parse
Parsing Architecture 1.0 – What worked well • Pretty accurate parse information (usually) • Content assist worked really well for C and C++ • Accurate out line view • Accurate search results, F3 • Ability to generate type info for class and hierarchy browsers • Flexibility for clients • Enabled us to pile on the clients in 2.0
Parser Architecture 1.0 – What didn’t work well • Performance • Assumed parse times would be quick, < 1 sec • Finding parse times usually in 2-4 second range • Index takes a lot of time, memory, and CPU power to generate • Content assist times out regularly • Hard to provide accurate results consistently • F3 and content assist often produce no results • Need for accurate ScannerInfo often hard to satisfy • Scalability to large projects • Problems only worsen when project grows • Index times longer than already long build times • ScannerInfo different for different files in project
Why a DOM? • Although flexible, the callback mechanism was hard to define • Unable to provide all information that every possible client would require • Often unable to determine parser context for certain constructs • Clients had added complexity to manage their own data • Parser had added complexity to provide enough data • Parse mode proliferation • Need to address scalability • Cut down on the amount of parsing we need to do • Can we reuse parse results? • Need data structures to help programmatically make changes to the source code
DOM Architecture • The DOM is composed of the parser/scanner and three levels of data • 1) Physical AST with mapping back through macro expansions to source • 2) Logical Scope/Binding tree with cross translation unit indexing • 3) AST Rewriter • Firm up interfaces for DOM creation • Navigate from Core Model to DOM • Hide parser from clients (use core model) • Allows us to play with how the DOM is created to improve scalability
Goals for 3.0 • To reduce the fixed cost of a parse (preprocess, scan & syntax matching) and to allow for lazy semantic evaluation • Improve performance & reduce memory footprint of navigation features • To provide a “complete” physical AST which can make our clients aware of preprocessor macro expansions in source code • To provide better support for C • Link-time resolution cross references • Tailored implementation of parser/semantic bindings
Physical AST - IASTNode • Given a file to parse, the AST Framework shall return an IASTTranslationUnit which then can be traversed or visited • The IASTNode interface hierarchy represent constructs in the C/C++ grammar. • long x; /* IASTSimpleDeclaration with 1 IASTDeclarator */ • long y(); /* IASTSimpleDeclaration with 1 IASTFunctionDeclarator */ • int f() { return sizeof( int ); } /* IASTFunctionDefinition */ • Physical tree is unaware of any semantic knowledge • Declaration before use (C++) • Scoping • Type compatibility • lValue vs. rValue • Allows for quick generation of syntax tree • Only slight overhead to cost of scan & preprocess
Logical Tree - IBinding • Logical elements are higher-level constructs that map onto a physical IASTNode • For 3.0 : IASTName#resolveBinding() • long x; /* IASTName for x resolves to an IVariable */ • long y(); /* IASTName for y resolves to an IFunction */ • int f() { return sizeof( int ); } /* IASTName for f resolves to an IFunction*/ • Beyond 3.0 – IASTBinaryExpression could bind to an user defined operator • Semantic errors return an IProblemBinding describing the error • IASTTranslationUnit can be asked for all declarations or references of a particular IBinding • Bindings can be resolved completely lazily on request or in a full traversal of the AST • Indexer would require full binding resolution • Most other clients do not require all bindings to be resolved
Macro Awareness in the Physical AST • Nearly all parser clients in the CDT are concerned with offsets • Selection • Search Markers • Navigation • Compare • Within a preprocessor macro expansion, our IScanner implementation massages the offsets as tokens arrive • However, the CDT 2.x AST Nodes are unaware as to whether or not they are a result of a macro expansion • This deficiency affects different clients differently.
Introducing IASTNodeLocation • Every node in the AST derives from IASTNode • Every IASTNode provides a mechanism for resolving its location in the physical AST • External Header Files • Resources • Working Copies • Macro Expansions • getNodeLocations() returns an array of IASTNodeLocation, as a node may span multiple locations (when macros are involved) • Interpretation of macro expansion locations are up to the client
Better support for C • Support for link-time bindings • resolving function references in C • Will greatly aid navigation/search features for this style of C code • Refactoring/programmatic edit support will be difficult • Definite candidate for “Preview” pane in refactoring • Without full build-model to reference, resolution is heuristic-based • Stricter emphasis upon providing custom implementation for C & GCC • AST & supporting data structures will be more compact memory wise • Syntax parsing will be more accurate • Algorithms for semantic analysis in C are far less rigorous • Should yield better performance for nearly all clients
Parser Language Variants • Interfaces • org.eclipse.cdt.core.dom.ast – Base interfaces that apply to both ANSI C & ISO C++ • org.eclipse.cdt.core.dom.ast.c – C99 specific sub-interfaces • org.eclipse.cdt.core.dom.ast.cpp – ISO C++ 98 specific sub-interfaces • org.eclipse.cdt.core.dom.ast.gnu – GNU extensions that apply to both C & C++ • org.eclipse.cdt.core.dom.ast.gnu.c – GNU extensions that apply to C • org.eclipse.cdt.core.dom.ast.gnu.cpp – GNU extensions that apply to C++ • Implementations • org.eclipse.cdt.internal.core.parser2 – supporting infrastructure • org.eclipse.cdt.internal.core.parser2.c – C99 & GCC support • org.eclipse.cdt.internal.core.parser2.cpp – ISO C++ 98 & G++ support • Other variants may subclass C or C++ Source Code Parser and choose which GNU extensions they wish to enable through a configuration interface
Physical Tree Hierarchy • C++ extends the common AST • GNU C++ extends the C++ • C++ and C are unrelated outside of the common AST package
Logical Tree Hierarchy • C extends common • C++ extends common • Theoretically, new bindings could be added for future variants • e.g. GCC Signatures
AST Rewriting • AST Rewriter accepts requests to change AST • Add (Insert/Set), Remove, Replace • The new code can be a new AST Node or a String • AST Rewrite gathers change requests and then executes • Analyzes the change requests for validity • Produces text edits that can be applied to documents to affect the change • The analysis can decide not to do a change because it is too hard • E.g. when macros are involved
DOM as a Service • We need to take some effort to minimize parsing within Eclipse • C and C++ are mature languages : hence, larger source code bases • Multiple clients parsing on resource-change events can cripple the system • A complex web of #include’s throughout the code base is difficult to optimize per-parse without having knowledge of previous parses • Same with templates … • The Indexer is already parsing continually, we should be able to leverage that information for all other clients that require saved-file parse trees • Since parsing can be a processor and memory-intensive operation, it is difficult for the indexer to co-ordinate its priority vs. other parser clients for system resources • users requests an Open Declaration which competes against a running indexer for memory & CPU • Eventual Goal • AST Service w/Index • Incremental Parse