Source Code Exploration with Google • Wayne State University Denys Poshyvanyk, Maksym Petrenko, Andrian Marcus, Xinrong Xie, Dapeng Liu Presented by: Roli Shrivastava
HISTORY • Global Regular Expression Print (g/re/p) • Existing Integrated Development Environment (IDE) file searches • Both are based on regular expression matching Limitations of GREP and IDEs • Support only specific development or maintenance tasks • Not in the mainstream of software development practice • Case sensitive • Limited interaction with potential users
MOTIVATION • To understand large and new parts of software systems. • People search code for: • Concept location in source code • Impact analysis • Change propagation • Debugging • Comprehension of software in general • Hence, to support them, we need fast and accurate tools and techniques.
PROPOSAL OF PAPER • New approach to Source Code Exploration • Integration of Google Desktop Search + IBM’s Eclipse Development Environment. • Known as Google Eclipse Search (GES)
EXISTING APPROACH • Searching based on Information Retrieval (IR) indexing techniques • IR allows formulation of queries with multiple words • More popular than regular expression matching Problems: • Computational efficiency • Online re-indexing of the software
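A minimal sketch of what IR-style indexing with multi-word queries can look like. This is not how GDS or GES is implemented; the tokenizer (lowercase, letters and digits only), the file names, and the file contents are assumptions made for illustration.

```python
import re
from collections import defaultdict

def tokenize(text):
    """Split text into lowercase word tokens (assumed tokenizer: letters and digits only)."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(files):
    """Map each token to the set of file names that contain it."""
    index = defaultdict(set)
    for name, text in files.items():
        for token in tokenize(text):
            index[token].add(name)
    return index

def search(index, query):
    """Return the files that contain every word of the query (implicit AND)."""
    terms = tokenize(query)
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result

# Hypothetical file names and contents, for illustration only.
files = {
    "ClassNode.java": "private static int DEFAULT_WIDTH = 100;",
    "NoteNode.java": "private static int DEFAULT_HEIGHT = 40; // default size",
}
index = build_index(files)
print(search(index, "default width"))
```

The index is built once and each query is answered by set lookups, which is why keeping the index up to date (re-indexing) becomes the main cost when the source code changes.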
GES • Allows developers to search software projects in a manner similar to searching the internet or their own desktops. • Searches within projects / working sets of files • Uses natural language queries • GES has the advantages of GDS + Eclipse’s extensibility. • GES is based on an IR indexing technique • Idea is also integrated with MS Visual Studio • Uses GDS to index and search source code files and project files • Is as efficient as GDS • Re-indexes as the search space changes
LIMITATIONS OF GDS!! • GDS is not a project-specific search • Searches files in the entire system • Needs an internet browser • Awkward !! • User has to switch between the IDE and the browser Solution is definitely GES
GDS + ECLIPSE !!! • On-the-Fly preprocessing and indexing of the context • Continual indexing • maintains and updates current location changes • Accurate results • Immediate response for queries • History of searches • Advanced Search Options • Project specific search • Sorting of the results • Relevance • Dates
ADVANTAGES • Features specific to IR-based searching • multiple term queries • natural language queries • Boolean operators • ranking of search results • Scalability and high reliability of the proven search engine (i.e., GDS) • important for massive file repositories, such as large-scale software systems • Display of and access to the search results within Eclipse’s IDE • its native interfaces provide direct links between the search results and the actual source code in the editor.
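To illustrate "ranking of search results", here is a small sketch that orders files by how often the query words occur in them. The slides do not describe the ranking GDS actually uses, so this term-frequency score is only an assumed stand-in, and the file names and contents are hypothetical.

```python
import re
from collections import Counter

def score(text, query):
    """Count how often the query words occur in the text (simple term-frequency score)."""
    counts = Counter(re.findall(r"[a-z0-9]+", text.lower()))
    return sum(counts[term] for term in re.findall(r"[a-z0-9]+", query.lower()))

def rank(files, query):
    """Return (file name, score) pairs with a non-zero score, best match first."""
    scored = [(name, score(text, query)) for name, text in files.items()]
    hits = [pair for pair in scored if pair[1] > 0]
    # Sort by descending score; break ties alphabetically for a stable order.
    return sorted(hits, key=lambda pair: (-pair[1], pair[0]))

# Hypothetical snippets, for illustration only.
files = {
    "A.java": "edge edge class",
    "B.java": "class diagram",
    "C.java": "node",
}
print(rank(files, "edge class"))
```

A ranked list lets the developer start with the most likely file instead of scanning every match, which is the benefit the slide attributes to IR-based searching.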
SYSTEM REQUIREMENTS • To run GES, you will need: • Eclipse SDK 3.2 or higher; • Google Desktop Search (GDS) 2.0 or higher; • Java Runtime Environment (JRE) 1.5 or higher.
GES DESIGN & IMPLEMENTATION • GES is similar to File Search in Eclipse. • Type a query into the GES dialog box. • Specify the scope of the search • workspace • selected resources • enclosing projects • working sets • After the query is run, the results are displayed in the GES Search Results tab. • Results can be explored by browsing in the editor.
PILOT CASE STUDY • Performed on Violet (http://www.horstmann.com/violet/) • Violet is a cross-platform UML editor written in Java • Has 65 classes + 448 methods + 9000 LOC Approach: • A request for a new feature • GOAL: “introduce a user-defined arrow type for the class diagram”.
QUERIES FOR PCS-I • Q2 : “arrow class diagram” OOPS… Did not return any matches • Q3: “edge class diagrams” Worked
RESULTS • 11 files as search results • UseCaseDiagramGraph • StateDiagramGraph • SequenceDiagramGraph • StateTransitionEdge • ObjectDiagramGraph • NoteNode • ObjectNode • FieldNode • ImplicitParameterNode • ClassDiagramGraph • CallNode.
ANALYSIS OF RESULTS • ClassDiagramGraph had the relevant result. To verify this finding: • ‘draw’ and ‘getPath’ methods in ‘ArrowHead’ are modified. • Related methods in ArrowHeadEditor file are also modified successfully.
GES vs. FILE SEARCH Problem: a “concept location task” in Violet Goal: “to locate the place in the source code which specifies the width of the class diagrams” Target: the value is saved in the DEFAULT_WIDTH variable
GES BEHAVIOR • Q1: “default width” “Bingo” in the first step itself…!!
FILE SEARCH BEHAVIOR Q1: “default width” “OOPSS !!! No results” Q2: “default” “yes …. Hmmm closer” Q3: “width” “yes… Much Closer”
FILE SEARCH can be made BETTER?? • In this particular case … • “Default *Width” would have worked fine. • Gives the same result as GES in the 1st attempt Drawback: • To construct such expressions, the programmer needs additional information about identifiers • Impractical to construct such complex expressions all the time (this was a relatively simple expression) • What happens if the expression is more complex ?? !!!
FILE SEARCH vs. GES RESULTS • File Search query had to be modified to get to the result • Narrow down the result by performing the search within the query • GES gave results in the first query itself. • GES is faster than File Search. • GES requires investigating fewer LOC. • GES returns a ranked list of results. • Developers learn relevant information faster with GES than with File Search.
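A small sketch of why the plain phrase fails while a wildcard expression succeeds. The source line is a hypothetical example, and the regular expression `default.*width` is only assumed to correspond roughly to a wildcard expression such as “Default *Width”; it is not taken from the paper.

```python
import re

# Hypothetical source line, for illustration only.
line = "private static int DEFAULT_WIDTH = 100;"

# A literal, case-insensitive search for the phrase fails: the identifier
# contains an underscore where the query contains a space.
literal_hit = "default width" in line.lower()

# A pattern that allows any characters between the two words succeeds.
pattern = re.compile(r"default.*width", re.IGNORECASE)
pattern_hit = pattern.search(line) is not None

print(literal_hit, pattern_hit)
```

The pattern only works because its author already suspects how the identifier is written, which is the drawback named on this slide.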
STILL NOT SURE !! • Authors say “This study has a proof-of-concept role, we do not generalize these conclusions”. • Need more detailed case study to extend the results.
OTHER CASE STUDIES • Needed a bigger project than Violet • Queries were run on • P4 2.8 GHz with 1 GB of RAM • GES plug-in • File Search in Eclipse 3 • Art of Illusion: 3D modeling studio • Written in Java • Has 442 classes, 20 interfaces, 100838 LOC • Eclipse version 3.1 + complete sources • 20000 files • 2 million LOC
METHODOLOGY • 10 queries were run on each system • The average response time was measured for GES and File Search
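The measurement can be sketched as a small timing harness. The slides do not say how the authors timed their queries, so the harness, the stand-in search function, and the query strings below are assumptions for illustration only.

```python
import time

def average_response_time(search, queries):
    """Run each query once and return the mean wall-clock time in seconds."""
    total = 0.0
    for query in queries:
        start = time.perf_counter()
        search(query)
        total += time.perf_counter() - start
    return total / len(queries)

# Stand-in for a real search backend (hypothetical): records the queries it receives.
seen = []
def fake_search(query):
    seen.append(query)
    return []

# Hypothetical queries; the study ran 10 on each system.
queries = ["default width", "edge class diagrams"]
avg = average_response_time(fake_search, queries)
print(f"average response time: {avg:.6f} s")
```

Running the same query list against two backends and comparing the two averages mirrors the comparison between GES and File Search described on this slide.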
DERIVED RESULTS !!! • GES is more effective in terms of response time • GES scales up very well with the size of the search space
LIMITATIONS • GES uses GDS • GDS’s background indexing • Only when user’s computer is idle • User has to wait for the (re)-indexing of the file. • None of the GDS APIs handles this issue.
Q: Is this really an issue?? A: As this is a one-time step, it only affects the first search on a software system
CONCLUSION • Integrating GDS into Eclipse • Improves source code searching • Produces an easier-to-adopt approach • GES allows searches in • all the source code • associated documentation • Faster than File Search • Queries do not take into account the format of the identifiers in the source code
RELATED WORKS • JIRiSS: an Eclipse plug-in for Source Code Exploration (Information Retrieval based Software Search for Java) http://mercury.cs.wayne.edu/~vip/publications/Poshyvanyk.ICPC.2006.JIRiSS.pdf JIRiSS includes other advanced features • automatically generated software vocabulary • advanced query formulation options • including spell-checking as well as fragment-based search. • Information Retrieval: a book by C. J. van Rijsbergen http://www.dcs.gla.ac.uk/Keith/Preface.html