WhereTheHeck: A Web Based Source Code Navigator
Dario Menasce, Stefano Magni, Francesco Prelz, Luciano Barone, Luca Dell'Agnello, Emanuele Leonardi
CHEP2000 - Dario Menasce, I.N.F.N. Milano
Motivations for the project
• Large collaborations in HEP have vast amounts of code maintained by developers scattered around the globe
• Tools exist that use a WEB browser to navigate through the code (via hyperlinks) from remote locations (Light, LXR), but none of them provides sophisticated search functions to locate specific instances of language tokens (variable names, class names, common blocks etc.)
• Light covers FORTRAN and C++ but has no search function
• LXR has a search function, albeit limited to literal search: no regular expression support is provided, and it only works for C
This unsettled situation prompted us to try a different approach to the problem of providing remote access to source code:
• We focus more on the search capabilities of the source code navigator than on its hyperlink connectivity (powerful language parsers under the hood)
• C and particularly C++ dominate current software efforts, but vast amounts of FORTRAN code still linger around as legacy code (we aim to provide search functionality for all three languages)
• Portability is an important issue, together with scalability (the navigator is based on PERL and JavaScript, both open software)
Why a WEB based source code navigator? A typical scenario in HEP
[Diagram: users connecting to the experiment's official source code repository]
• Access to the repository usually requires remote login: not always feasible
• Users need to know the whereabouts of the code, such as the location and structure of the repository: this takes practice and a map
• Users often need to locate the occurrences of a particular token (e.g. a variable) within the entire name-space of a reconstruction or simulation code: hyperlinks are insufficient
• Users need an easy to use control panel as a navigation steering-wheel: WEB browsers are an optimal choice
General philosophy and guidelines for the WhereTheHeck project
• The only navigation interface required is a WEB browser (no need to reinvent the wheel)
• Languages supported are FORTRAN, C and C++ (with provision for others to come)
• The package is entirely written in PERL and JavaScript (scales well and ensures portability)
• The parsers are f2c for FORTRAN and gcc (egcs) for C and C++ (piggy-backing on existing parser technology expertise)
• Input of the navigation package is a directory tree of source code files that have not yet been preprocessed (there is no provision to directly access any code management format)
• Output of the navigation package is a set of HTML web pages created on the fly: no static HTML files exist, so all information is up-to-date by definition
• Output is produced very fast (there is a specialized database under the hood)
The project has four major components
• A parser: a set of scripts capable of extracting the list of tokens from source code, together with their location (file, line, column, type etc.)
• A tokens database: tokens extracted by the parser are stored in a database for fast access (slow for updates, extremely fast for retrieval)
• A WEB interface: a control panel (query forms) and an output panel; HTML pages are created on the fly, on user demand, in real time
• An installation package: a UNIX tar file containing the whole WhereTheHeck source code. Installation and configuration scripts take care of the customization process on the user's local machine (usually the official experiment's repository computer with a WEB server)
The structure of WhereTheHeck
[Diagram components: the experiment's official source code repository (/root, /subdir1, /subdir2, ...), the parser, the database manager, the CGI search engine (PERL), the HTML formatter, the browser client (Netscape) and the dynamically created HTML page]
• Token list: token name; token type (variable, class, ...); token qualifier (reference, modified location, ...); token position (file, line, column)
• The database files: linked lists; extremely fast access during read; reasonably slow during creation
• Off-line processing is done on the remote repository (at the Lab)
The parser
To avoid reinventing the wheel, we used as parsers our customized versions of two freely available compilers in widespread use: f2c for FORTRAN and gcc for C and C++.
For both of them we modified the parts which perform the lexical analysis of the source code: compiling code with them gives, as a by-product, the full list of tokens for each language.
Example: f2c processes line 1238 of FindTrack.f in the source code repository,
    ...
          If ( Px .Gt. 100.) Then
          Endif
    ...
and adds the following entry to the database of all tokens:
• Token name: Px
• Source file: /disk1/menasce/analysis/FindTrack.f
• Line number: 1238
• Start column: 12
• End column: 13
• Token type: real variable
• Token qualifier: referenced
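The token entry shown on this slide can be pictured as a small record type. The sketch below is only an illustration in Python (the tool itself is written in PERL); the class name, the field names and the helper `token_length` are hypothetical, and only the field values come from the slide.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TokenRecord:
    """One entry of the token list (class and field names are hypothetical)."""
    name: str          # token name, e.g. "Px"
    source_file: str   # path of the file containing the token
    line: int          # 1-based line number
    start_column: int  # column of the first character of the token
    end_column: int    # column of the last character of the token
    token_type: str    # e.g. "real variable"
    qualifier: str     # e.g. "referenced" or "modified"


# The example entry from the slide.
px = TokenRecord(
    name="Px",
    source_file="/disk1/menasce/analysis/FindTrack.f",
    line=1238,
    start_column=12,
    end_column=13,
    token_type="real variable",
    qualifier="referenced",
)


def token_length(tok: TokenRecord) -> int:
    """Number of characters spanned by the token (columns are inclusive)."""
    return tok.end_column - tok.start_column + 1
```

Storing both the start and the end column makes it possible to highlight exactly the token in the generated page, without re-parsing the line.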
FORTRAN has a rather simple syntax and is correspondingly easy to parse (so to speak...). C, on the other hand, is much more complex, and C++ is a real nightmare. This has important consequences:
• The FORTRAN parser is already almost complete and working
• For C we are nearing completion (for a limited set of syntactic elements)
• For C++ we customized the parser for a very limited set of tokens (classes, methods)
There is a rather long list of technical difficulties which hampered a straightforward use of the f2c and gcc parsers:
• The lexer part of those parsers is not documented anywhere
• We wanted a customization in the form of a patch, so that later versions of the official compiler could be easily patched to provide the functionality we need
• The code developers work with is usually filled with precompiler directives (#define ...). As a result, the token location found by a parser (which works on already preprocessed files) does not match the original position within the source code.
Macro definition
Original source code snippet:
    #define F(x,y,z) (((x)&(y)) | ((~x) &(z)))
    ………
       Var = F(a,beta,c) + zeta
    123456789-123456789-123456789-123456789-12345
In the original source code the zeta variable is placed in columns 24 to 27.
After precompiler expansion has occurred (this is what the compiler sees as source code):
    ………
    ………
       Var = (((a)&(beta)) | ((~a) &(c))) + zeta
    123456789-123456789-123456789-123456789-12345
Now zeta appears in columns 41 to 44.
To solve the problem, we developed an inverse precompiler (uncpp). Given the columns of a token as determined by the compiler, uncpp recovers the original columns within the source code: these quantities are then stored in the tokens database.
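The column correction can be illustrated with a small calculation. The sketch below is not the uncpp implementation, which the slides do not describe; it is a minimal Python illustration that handles a single macro call on one line. The function name `original_column` and its parameters are hypothetical, and the call start column (10) is inferred from the column numbers quoted on the slide.

```python
def original_column(expanded_col, call_start, call_len, expansion_len):
    """Map a 1-based column in the preprocessed line back to the source line.

    Hypothetical helper handling one macro call per line:
      call_start     column where the macro call starts in the original line
      call_len       length of the macro call as written, e.g. "F(a,beta,c)"
      expansion_len  length of the text the precompiler substituted for it
    """
    expansion_end = call_start + expansion_len - 1
    if expanded_col < call_start:
        # Text before the macro call is not shifted.
        return expanded_col
    if expanded_col <= expansion_end:
        # A token produced by the expansion has no column of its own in the
        # source; this sketch attributes it to the start of the macro call.
        return call_start
    # Text after the call is shifted by the growth of the expansion.
    return expanded_col - (expansion_len - call_len)


call = "F(a,beta,c)"
expansion = "(((a)&(beta)) | ((~a) &(c)))"

# zeta is reported by the compiler at columns 41 to 44.
start = original_column(41, 10, len(call), len(expansion))
end = original_column(44, 10, len(call), len(expansion))
```

With the numbers from the slide, the expansion is 17 characters longer than the call, so columns 41 and 44 map back to 24 and 27.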
The tokens database
We designed an ad hoc database, implemented as a PERL module
• Basically it is a multiply sorted linked list, featuring a very fast retrieval time: it takes less than a second to retrieve the location of a token among 4 million entries
• Tokens can be searched for by specifying an arbitrarily complex regular expression (following PERL's implementation of regexp)
• Regexps have a concise yet powerful syntax: the WEB input form can be made extremely simple, and a short regexp conveys lots of information, pinpointing tokens accurately
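A regular expression search over token names, optionally restricted by qualifier, might look like the sketch below. It is an illustration only: it uses a Python dictionary instead of the multiply sorted linked list described on the slide, Python's `re` syntax is close to but not identical with PERL's, and the class `TokenIndex`, its methods and the file names are all hypothetical.

```python
import re
from collections import defaultdict


class TokenIndex:
    """Toy in-memory token index (hypothetical; not the WhereTheHeck format)."""

    def __init__(self):
        # token name -> list of (source_file, line, qualifier)
        self._by_name = defaultdict(list)

    def add(self, name, source_file, line, qualifier):
        self._by_name[name].append((source_file, line, qualifier))

    def search(self, pattern, qualifier=None):
        """Return all occurrences whose token name matches the regexp."""
        rx = re.compile(pattern)
        hits = []
        for name in sorted(self._by_name):
            if not rx.search(name):
                continue
            for source_file, line, qual in self._by_name[name]:
                if qualifier is None or qual == qualifier:
                    hits.append((name, source_file, line, qual))
        return hits


index = TokenIndex()
index.add("zmin", "geom/a.f", 10, "modified")      # made-up entries
index.add("zmax", "geom/a.f", 11, "referenced")
index.add("pz", "geom/b.f", 5, "modified")
index.add("zmax_old", "geom/b.f", 7, "modified")

# Tokens containing "zmin" or "zmax", only where the value gets modified.
hits = index.search(r"zmin|zmax", qualifier="modified")
```

The regexp is matched against token names only, so the cost of a query depends on the number of distinct names rather than on the size of the source tree.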
The WEB interface and the search engine (CGI)
The WEB interface consists of an HTML page containing a JavaScript input form
• It is essentially an abstraction layer to the token database: regardless of the language of the source code being browsed, the input form and the generated WEB output pages always have the same appearance. Future extensions to additional languages have little effect on the infrastructure (even the database has an abstraction layer to the parser)
• HTML output pages are created on demand by means of CGI scripts (PERL)
• Output consists of HTML formatted pages with the source code lines numbered and color coded, and the requested token highlighted in red
• Only two types of token have associated hyperlinks: subroutine and function references, and include files
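The output formatting described above (numbered lines, requested token in red) can be sketched as follows. This is a Python illustration rather than the PERL CGI code, which the slides do not show; the function `render_source` and the exact HTML markup are assumptions.

```python
import html


def render_source(lines, hit_line, start_col, end_col):
    """Return an HTML fragment with numbered lines and one token in red.

    Hypothetical helper: columns are 1-based and inclusive, as in the
    token records stored by the parser.
    """
    out = ["<pre>"]
    for number, text in enumerate(lines, start=1):
        if number == hit_line:
            # Escape the three pieces separately so that the highlight
            # markup itself is not escaped.
            before = html.escape(text[:start_col - 1])
            token = html.escape(text[start_col - 1:end_col])
            after = html.escape(text[end_col:])
            body = f'{before}<span style="color:red">{token}</span>{after}'
        else:
            body = html.escape(text)
        out.append(f"{number:5d}  {body}")
    out.append("</pre>")
    return "\n".join(out)


source = [
    "      If ( Px .Gt. 100.) Then",
    "      Endif",
]
# Highlight the token in columns 12 to 13 of the first line.
page = render_source(source, hit_line=1, start_col=12, end_col=13)
```

Because the page is built from the source file at request time, it always reflects the current content of the repository.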
The WEB interface and the search engine (CGI), continued
• Users are sometimes interested in finding out where a particular pattern of characters is located even if it is not part of the language (like a comment line)
• An input form accepts a regular expression from the user
• A match is then attempted against every source file in the specified directory tree: the original files are scanned, no token database is used!
• This option ensures coverage of almost any possible request a user might have
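The plain-text search described on this slide amounts to scanning every file under a directory with a regular expression. A minimal Python sketch follows (the actual engine is a PERL CGI script); the function name `grep_tree` is hypothetical.

```python
import os
import re


def grep_tree(root, pattern):
    """Scan every file under root and return (path, line_number, text) hits.

    Hypothetical helper: the original files are read directly and no
    token database is involved.
    """
    rx = re.compile(pattern)
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in sorted(filenames):
            path = os.path.join(dirpath, filename)
            try:
                # errors="replace" keeps the scan going on odd encodings.
                with open(path, "r", errors="replace") as handle:
                    for number, text in enumerate(handle, start=1):
                        if rx.search(text):
                            hits.append((path, number, text.rstrip("\n")))
            except OSError:
                # Unreadable files are skipped rather than aborting the scan.
                continue
    return hits
```

A call such as `grep_tree(root, r"^C.*zmin")` would find FORTRAN comment lines mentioning zmin, which a token search cannot see because comments produce no tokens.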
The installation and configuration package
This entire tool is available via download from the WEB
• To install and locally configure the system only two scripts are needed:
  • INSTALL: takes care of locally compiling the customized versions of gcc and f2c and the additional PERL modules
  • CONFIGURE: provides the local configuration of the tool via a user driven menu; takes care of adapting the tool to the local WEB server and other associated tasks
• To create the navigation structure for a particular project a single script is needed:
  • wth.pl: menu driven; given the path of the directory tree containing the code to hyperlink, it generates the tokens database needed for WEB navigation and any other ancillary files
Demo available online at http://almifo1e.mi.infn.it/W_Main.html
Now, let's see a working example
This is the main entry point for the interface to project mcfast (a large simulation program).
WEB browser multiframe pop-up window
Find where a token containing the string zmin or the string zmax is located in the whole source code of the project mcfast, but only in places where its value gets modified: use of a simple regular expression
Even simple strings can be searched for, either as plain strings or as regular expressions
22) /vtx28/winner/btev/mcfast/v2_6_2/mcfast/src/geom/load_beampipe.f
Conclusions
• We have developed a WEB based source code navigator using a novel approach
• Focus is on search-find capabilities rather than hyperlinked navigation
• FORTRAN browsing capabilities are already fully implemented
• C is on its way to completion; C++ has limited capabilities
• The possible connectivities that can be implemented once a database of token pointers has been made available are still all to be explored. The basic infrastructure (parsers, database manager, search engine) is all in place at this very moment
• This tool has recently been made available (beta-test, on a best-effort basis) to a limited set of experimental groups for evaluation
• For future developments, particularly in the C++ sector, we definitely envisage help from software professionals