Overview and Evaluation of Java Component Search System SPARS-J

Reishi Yokomori**, Hideo Nishi**, Fumiaki Umemori**, Tetsuo Yamamoto*, Makoto Matsushita**, Shinji Kusumoto**, Katsuro Inoue** (*Japan Science and Technology Agency, **Osaka University)


Presentation Transcript


  1. Overview and Evaluation of Java Component Search System SPARS-J Reishi Yokomori**, Hideo Nishi**, Fumiaki Umemori**, Tetsuo Yamamoto*, Makoto Matsushita**, Shinji Kusumoto**, Katsuro Inoue** *Japan Science and Technology Agency **Osaka University

  2. Outline • Motivation and research aim • SPARS-J • SPARS-J (Outline) • Ranking method • System architecture • Experimental evaluation for SPARS-J • Conclusion and Future work

  3. Motivation • A software library is a fount of wisdom. • Reusing software components improves productivity and quality. • Examples of components: source code, documents, ... • Maintenance is easier with a library. • However, software collections are not utilized effectively. • Developers do not know that desirable components exist. • Although there are many components, they are not organized. • We need a system that manages components and searches for suitable ones.

  4. Research aim • We build a system that • searches for components that suit the user's request • manages component information • Targets • Intranet • Closed software development environment inside a company • Internet • Source code from many open-source software communities • SourceForge, Jakarta Project, etc.

  5. Outline • Motivation and research aim • SPARS-J • SPARS-J (Outline) • Ranking method • System architecture • Experimental evaluation for SPARS-J • Conclusion and Future work

  6. SPARS-J (Software Product Archive, analysis and Retrieval System for Java) • SPARS-J is a Java source code search system that • analyzes and extracts components automatically • Component: the source code of a class or interface • builds a database based on the analysis • Use relations, similar components, metrics, ... • provides keyword search • Three ranking methods: KR, CR, KR+CR • Analysis information • Components using (used by) the component • Package hierarchy

  7. Ranking search results • Component Rank (CR) • Components used repeatedly (by important components) • Ranking based on use relations between components • Keyword Rank (KR) • Components suited to a user request • Frequency of word appearance (adapted TF-IDF) • Class names, method names, etc. have special importance • KR+CR Rank (KR+CR) • Integrated ranking • Components ranked highly in both KR and CR are very important • Integration by the Borda count method

  8. System architecture of SPARS-J (Building a Database) • Library (Java source files) • provided to Component analysis • Component analysis • extracts components • indexes each word that appears • extracts use relations • clusters similar components • calculates Component Rank • stores the results in the Database • Database • Component Information • Indexes • Use-Relation • Clustered Component Graph • Component Rank • Also shown in the diagram: Component retrieval, User interface

  9. System architecture of SPARS-J (Searching Components) • User • sends a Query and receives the Result through the User interface • User interface • analyzes query • Analysis condition • Keywords • displays search results • Additional Information • Source Code • Use Relation • Similar Components • Metrics • etc. • Component retrieval • searches components from Indexes • sorts components by CR, KR, KR+CR • Database • Component Information • Indexes • Use-Relation • Clustered Component Graph • Component Rank • Other diagram labels: Component analysis, Components List, Request, Information

  10. Screenshot (Top page)

  11. Screenshot (Search results)

  12. Screenshot (Source code)

  13. Screenshot (Similar components)

  14. Screenshot (Using the component)

  15. Screenshot (Used by the component)

  16. Screenshot (Package browsing)

  17. Outline • Motivation and research aim • SPARS-J • SPARS-J (Outline) • Ranking method • System architecture • Experimental evaluation for SPARS-J • Conclusion and Future work

  18. Experimental Evaluation • Comparison of the ranking methods in SPARS-J • We investigate which ranking method is best. • CR vs. KR vs. CR+KR • Comparison with other search engines • We verify SPARS-J's effectiveness as a software component search engine. • vs. Google, Namazu • Application of SPARS-J in an actual development environment • We confirm that SPARS-J is useful for managing and understanding software.

  19. Experiment 1: Comparison of ranking methods in SPARS-J • Purpose of Experiment • We investigate which of the three ranking methods in SPARS-J is best. • CR (based on use relations) • KR (based on TF-IDF) • CR+KR (integrating CR and KR) • Preparation • Database built from publicly available Java source code • About 140,000 files from JDK, SourceForge, etc. • Keywords • 10 queries that assume the development of a simple system

  20. Experiment 1: Comparison of ranking methods in SPARS-J • Criterion of Evaluation • Precision of components in the top 10 results: • The percentage of suitable components • Users tend to look only at the higher-ranked results. • High precision means that there are many useful components within the range the user actually looks at. • Ndpm: • The percentage of component pairs whose relative order differs between two rankings. • We define the user's ideal ranking in advance and calculate ndpm against it. • A quantitative indicator of the distance from the ideal ranking • Ndpm considers all the components in a search result. • The distance becomes large when required components are ranked low.
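
The two criteria on slide 20 can be sketched in Java as follows. This is a minimal illustration, not the evaluation code used in the experiment. The slide does not give the ndpm formula, so the sketch assumes the usual definition of the normalized distance-based performance measure: over all pairs that the ideal ranking orders strictly, a reversed pair counts 1 and a pair tied by the system counts 0.5. The class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper; the names are illustrative and not part of SPARS-J.
class RankingMetricsSketch {

    // Fraction of the first k results that were judged suitable.
    static double precisionAtK(List<String> ranked, Set<String> suitable, int k) {
        int inspected = Math.min(k, ranked.size());
        if (inspected == 0) return 0.0;
        int hits = 0;
        for (int i = 0; i < inspected; i++) {
            if (suitable.contains(ranked.get(i))) hits++;
        }
        return (double) hits / inspected;
    }

    // ideal[i] and system[i] are the rank positions (1 = best) of component i.
    // Assumed definition: reversed pair = 1, pair tied by the system = 0.5,
    // normalized by the number of pairs the ideal ranking orders strictly.
    static double ndpm(int[] ideal, int[] system) {
        double distance = 0.0;
        int preferredPairs = 0;
        for (int i = 0; i < ideal.length; i++) {
            for (int j = i + 1; j < ideal.length; j++) {
                int want = Integer.signum(Integer.compare(ideal[i], ideal[j]));
                if (want == 0) continue; // the ideal ranking has no preference here
                preferredPairs++;
                int got = Integer.signum(Integer.compare(system[i], system[j]));
                if (got == 0) distance += 0.5;
                else if (got != want) distance += 1.0;
            }
        }
        return preferredPairs == 0 ? 0.0 : distance / preferredPairs;
    }

    public static void main(String[] args) {
        Set<String> suitable = new HashSet<>(Arrays.asList("a", "c"));
        System.out.println(precisionAtK(Arrays.asList("a", "b", "c", "d"), suitable, 4));
        System.out.println(ndpm(new int[] {1, 2, 3}, new int[] {1, 3, 2}));
    }
}
```

Under this definition ndpm is 0 when the system ranking agrees with the ideal one and 1 when it is exactly reversed, so smaller values are better.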

  21. Result (Experiment 1) • Charts: Ndpm, Precision

  22. Consideration (Experiment 1) • Using a paired-difference t-test, we confirmed that the following differences are significant at the 5% level. • Precision: KR, CR+KR ≫ CR • Ndpm: CR, CR+KR ≫ KR • Characteristics of each method • CR • CR generally ranks components in a desirable order. • Higher-ranked components are important but often have no relevance to the keyword. • KR • KR generally favors components with strong relevance to the keyword. • In a required component, the keyword does not always appear with high frequency. • CR+KR • CR+KR gives good results for both precision and ndpm. • CR+KR combines the strengths of both rankings. • We use CR+KR as the default ranking method.

  23. Experiment 2: Comparison with other search engines • Purpose of Experiment • We verify SPARS-J's effectiveness as a software component search engine. • SPARS-J • Database from 140,000 files (same as Experiment 1) • We use CR+KR as the ranking method. • Google • Well-known web search engine • Queries were entered at www.google.co.jp • Namazu • Full-text search system for documents. • Namazu uses TF-IDF to rank documents. • Database from 140,000 files (same files as SPARS-J) • Preparation • Keywords: 10 queries (same as Experiment 1) • Criterion of Evaluation: Precision of the top 10 results

  24. Result (Experiment 2) • Chart: Precision of the top 10 results

  25. Consideration (Experiment 2) • Using a paired-difference t-test, we confirmed that the following differences are significant at the 5% level. • Precision: SPARS-J ≫ Namazu ≫ Google • (*) SPARS-J (CR, KR, CR+KR) ≫ Namazu • Consideration of Results • Google • The results contain many pages that are not explanations of Java source code. • Performance depends on how much descriptive text exists. • Namazu • Since the dataset consists only of source code, the result is better than Google's. • Without using the characteristics of Java programs, it cannot give good results. • For searching software components, SPARS-J is more useful than the other search engines.

  26. Experiment 3: Application of SPARS-J in an actual development environment • Purpose of Experiment • We confirm that SPARS-J is useful for managing and understanding software resources. • Criterion of Evaluation • Qualitative evaluation of SPARS-J • Preparation • We set up SPARS-J at a company. • Seven employees used SPARS-J for two weeks. • They are all engaged in software development and maintenance. • We carried out a questionnaire survey about SPARS-J.

  27. Result (Experiment 3) • Rating scale: 5 (useful or used repeatedly) to 1 (useless or seldom used)

  28. Consideration (Experiment 3) • Highly rated questionnaire items • Reference by package browser • Reference by similar components • Reference by components using (used by) the class • Readability of the component list view and source code view • Activities made possible by SPARS-J • Listing the applications that use a certain component • Impact analysis when re-editing components

  29. Consideration (Experiment 3) • Other comments • Response is very quick, and we felt no stress. • Since nothing needs to be installed on the client, sharing software components is easy. • SPARS-J can support maintenance work effectively. • Software components are easier to grasp.

  30. Conclusion and Future work • Conclusion • We constructed SPARS-J, a software component search system. • Search engine for Java source code • Ranks components in consideration of their characteristics. • Provides useful related information. • We verified the validity of SPARS-J through experimental evaluation. • SPARS-J is useful for searching software components. • SPARS-J is very helpful for grasping and managing components. • Future work • Quantitative evaluation beyond ranking performance • Support for other kinds of software components

  31. Outline • Motivation and research aim • SPARS-J • Outline • System architecture • Ranking method • Each part • Analysis part • Retrieval part • User Interface • Experiment • Conclusion and Future work

  32. Component analysis part • Extracts components and their information from Java source files • The process • Extract a component • Index the component • Extract use relations • Cluster similar components • Rank components based on use relations (CR method)

  33. Extract and index a component • Extracting a component • Find a class or interface block in a Java source file • Location information in the file (start line number, end line number) • Indexing • Extract index keys from the component • Index key: a word and its kind • Reserved words are not extracted • Count how often each word is used • Example: public final class Sort { /*quicksort*/ private static void quicksort(…) { int pivot; : quicksort(…); quicksort(…); } } • Table columns in the slide: Index key, frequency
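
The indexing step on slide 33 can be sketched as a word count that skips reserved words. This is a rough illustration under stated assumptions: it tokenizes with a regular expression instead of a Java parser, it uses an abbreviated reserved-word list, and it does not record the kind of each word (class name, method name, comment, and so on), which the slide says is part of the index key. `WordIndexSketch` and `countWords` are hypothetical names.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of the indexing step; not the SPARS-J indexer.
class WordIndexSketch {
    // Abbreviated reserved-word list (assumption: a real indexer would use the full list).
    static final Set<String> RESERVED = new HashSet<>(Arrays.asList(
            "public", "private", "protected", "static", "final", "class", "interface",
            "extends", "implements", "void", "int", "return", "new", "super", "this"));

    private static final Pattern WORD = Pattern.compile("[A-Za-z_][A-Za-z0-9_]*");

    // Counts every identifier-like word, including words inside comments.
    static Map<String, Integer> countWords(String source) {
        Map<String, Integer> frequency = new TreeMap<>();
        Matcher m = WORD.matcher(source);
        while (m.find()) {
            String word = m.group();
            if (!RESERVED.contains(word)) {
                frequency.merge(word, 1, Integer::sum);
            }
        }
        return frequency;
    }

    public static void main(String[] args) {
        // The slide's example with the elided parameter lists left empty.
        String src = "public final class Sort { /*quicksort*/ private static void quicksort() {"
                + " int pivot; quicksort(); quicksort(); } }";
        System.out.println(countWords(src)); // prints {Sort=1, pivot=1, quicksort=4}
    }
}
```

The count of 4 for `quicksort` includes the comment, the declaration, and the two calls.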

  34. Extract use relations • Extract use relations among components using semantic analysis • Make a component graph from the use relations • Node: component • Edge: use relation • Example: public class Test extends Data { : public static void main(…) { : Sort.quicksort(super.array); : } } • Kinds of use relation in the example: inheritance (Test → Data), field access (Test → Data, via super.array), method call (Test → Sort) • Component graph nodes: Data, Sort, Test

  35. Similar component • A similar component is a copied component or a component with minor modifications • We merge similar components into a single component • The merged component has all the use relations that the components had before merging • Figure: a component graph with nodes A to G becomes a clustered component graph in which A and D are merged into AD, and B and F into BF

  36. Clustering components • We measure characteristic metrics to decide which components to merge • The difference ratio of each metric between components • Metrics • Complexity • The number of methods, cyclomatic complexity, etc. • represents a structural characteristic • Token composition • The number of appearances of each token • represents a surface characteristic
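
One way to read "the difference ratio of each component metrics" on slide 36 is sketched below. The slide gives neither the formula nor the threshold, so both are assumptions here: the ratio is taken as the absolute difference divided by the larger value, and two components count as similar only when every metric stays within a caller-supplied threshold. All names are hypothetical.

```java
// Hypothetical sketch of a metrics-based similarity test; not the SPARS-J rule.
class SimilaritySketch {

    // Relative difference of one non-negative metric, in [0, 1]; two zeros are identical.
    static double differenceRatio(double a, double b) {
        double max = Math.max(Math.abs(a), Math.abs(b));
        return max == 0.0 ? 0.0 : Math.abs(a - b) / max;
    }

    // Components are treated as similar when every metric differs by at most the threshold.
    static boolean similar(double[] metrics1, double[] metrics2, double threshold) {
        if (metrics1.length != metrics2.length) {
            throw new IllegalArgumentException("metric vectors must have the same length");
        }
        for (int i = 0; i < metrics1.length; i++) {
            if (differenceRatio(metrics1[i], metrics2[i]) > threshold) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Illustrative metric vectors: {number of methods, cyclomatic complexity, token count}.
        double[] original = {10, 4, 120};
        double[] copy = {10, 4, 118};
        System.out.println(similar(original, copy, 0.05)); // prints true
    }
}
```

Requiring every metric to pass keeps a structural match (complexity) from hiding a large surface difference (token composition), and vice versa.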

  37. Ranking based on use relation • Component Rank (CR) • Reusable components have many use relations • There are many examples of their use • General-purpose components • Sophisticated components • We measure use relations quantitatively and rank components • A component used by many components is important • A component used by an important component is also important • Reference: Katsuro Inoue, Reishi Yokomori, Hikaru Fujiwara, Tetsuo Yamamoto, Makoto Matsushita, Shinji Kusumoto: "Component Rank: Relative Significance Rank for Software Component Search", ICSE, Portland, OR, May 6, 2003.

  38. Propagating weights • Ad-hoc weights are assigned to each node • Values in the figure (nodes A, B, C and their edges): 0.34, 0.33, 0.17, 0.17, 0.33, 0.33, 0.33

  39. Propagating weights • The node weights are re-defined by the incoming edge weights • Values in the figure (nodes A, B, C and their edges): 0.33, 0.17, 0.175, 0.175, 0.5, 0.17, 0.5

  40. Propagating weights • We get new node weights • Values in the figure (nodes A, B, C and their edges): 0.25, 0.25, 0.345, 0.175, 0.5, 0.175, 0.345

  41. Propagating weights • We get a stable weight assignment • next-step weights are the same as the previous ones • Component Rank: order of nodes sorted by the weight • Values in the figure (nodes A, B, C and their edges): 0.4, 0.2, 0.2, 0.2, 0.4, 0.2, 0.4
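
The propagation on slides 38 to 41 can be sketched as a simple iteration: each node splits its weight equally over its outgoing use relations, and each node's new weight is the sum of what arrives on its incoming edges. The transcript does not state the edges of the example graph, so the graph below (A uses B and C, B uses C, C uses A) is an assumption chosen because it is consistent with the numbers on the slides. The sketch also assumes every node has at least one outgoing edge; how the real system handles other cases is not covered here.

```java
import java.util.Arrays;

// Hypothetical sketch of weight propagation; not the SPARS-J implementation.
class ComponentRankSketch {

    // uses[i] lists the nodes that node i uses (its outgoing edges).
    static double[] propagate(int[][] uses, double[] initial, double eps, int maxIter) {
        double[] weights = initial.clone();
        for (int iter = 0; iter < maxIter; iter++) {
            double[] next = new double[weights.length];
            for (int i = 0; i < uses.length; i++) {
                for (int target : uses[i]) {
                    // Node i distributes its weight equally over its outgoing edges.
                    next[target] += weights[i] / uses[i].length;
                }
            }
            double change = 0.0;
            for (int i = 0; i < weights.length; i++) {
                change = Math.max(change, Math.abs(next[i] - weights[i]));
            }
            weights = next;
            if (change < eps) break; // stable: next-step weights equal the previous ones
        }
        return weights;
    }

    public static void main(String[] args) {
        // Assumed example graph: index 0 = A, 1 = B, 2 = C.
        int[][] uses = { {1, 2}, {2}, {0} };
        double[] start = {0.34, 0.33, 0.33};
        System.out.println(Arrays.toString(propagate(uses, start, 1e-12, 10_000)));
    }
}
```

With this graph the weights settle near 0.4 for A, 0.2 for B, and 0.4 for C, which matches the stable assignment shown on slide 41.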

  42. Outline • Motivation and research aim • SPARS-J • Outline • System architecture • Ranking method • Each part • Analysis part • Retrieval part • User Interface • Experiment • Conclusion and Future work

  43. Component retrieval part • Searches the database for components and ranks them • The process • Search components • Rank them to suit the user request (KR) • Aggregate two ranks (CR and KR)

  44. Search components • Search query • Words the user enters • The kind of an index word, package name • Components that match the given query are retrieved from the database

  45. Ranking suited to a user request • Keyword Rank (KR) • Components that contain the words given by the user are retrieved • Rank components using a value calculated from index word weights • Index word weight is high for • A word used frequently in the component • A word contained only in particular components • A word that represents the component's function, such as a class name • Sort by the sum of the weights of all given words • TF-IDF weighting, as used by full-text search engines

  46. Calculation of KR value • Calculate the weight Wct of word t in component c • TFi: the frequency with which word t occurs as kind i in component c • IDF: the total number of components / the number of components containing word t • kwi: the weight of kind i • The KR value is the sum of Wct over all given words
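
The formula itself did not survive in this transcript, so the following Java sketch only shows one plausible way to combine the quantities defined on slide 46. It assumes Wct = (sum over kinds i of kwi × TFi) × IDF, with IDF taken as the plain ratio given on the slide; the actual SPARS-J weighting may differ, for example by taking a logarithm. The class, method, and kind names are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of a kind-weighted TF-IDF; the combination rule is an assumption.
class KeywordRankSketch {

    // tfByKind: kind of word -> frequency of word t as that kind in component c (TFi).
    // kindWeight: kind of word -> weight of that kind (kwi); unknown kinds default to 1.0.
    static double wordWeight(Map<String, Integer> tfByKind, Map<String, Double> kindWeight,
                             int totalComponents, int componentsWithWord) {
        if (componentsWithWord == 0) return 0.0;
        double idf = (double) totalComponents / componentsWithWord;
        double weightedTf = 0.0;
        for (Map.Entry<String, Integer> e : tfByKind.entrySet()) {
            weightedTf += kindWeight.getOrDefault(e.getKey(), 1.0) * e.getValue();
        }
        return weightedTf * idf;
    }

    // KR value of a component: the sum of the weights of all given words.
    static double krValue(double[] wordWeights) {
        double sum = 0.0;
        for (double w : wordWeights) sum += w;
        return sum;
    }

    public static void main(String[] args) {
        Map<String, Integer> tf = new HashMap<>();
        tf.put("className", 1);
        tf.put("methodName", 3);
        Map<String, Double> kw = new HashMap<>();
        kw.put("className", 10.0); // illustrative weights, not SPARS-J's values
        kw.put("methodName", 2.0);
        System.out.println(wordWeight(tf, kw, 100, 4));
    }
}
```

Giving class names a larger kwi than method-body words reflects the slide 45 point that a word representing the component's function should weigh more.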

  47. Aggregate two ranks • Aggregate the two ranks KR and CR • Aggregation method • Borda count method, known as a voting system • Used for single-seat or multiple-seat elections • This form of voting is extremely popular for determining awards • SPARS-J • Ranks components by both KR and CR • Using KR and CR, finds components that suit the user's request and are reusable and sophisticated
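
A minimal Java sketch of Borda count aggregation over the KR and CR orderings follows. It assumes the common scoring rule in which, among n candidates, the top position earns n - 1 points and the last earns 0, and it breaks ties by name so that the output is deterministic. Neither detail is stated on the slides, and `BordaCountSketch` is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of Borda count aggregation; not the SPARS-J implementation.
class BordaCountSketch {

    // Each ranking lists component names from best to worst.
    static List<String> aggregate(List<String> byKr, List<String> byCr) {
        Map<String, Integer> points = new HashMap<>();
        addPoints(points, byKr);
        addPoints(points, byCr);
        List<String> merged = new ArrayList<>(points.keySet());
        merged.sort((a, b) -> {
            int byPoints = Integer.compare(points.get(b), points.get(a)); // more points first
            return byPoints != 0 ? byPoints : a.compareTo(b);              // assumed tie-break
        });
        return merged;
    }

    // Position 0 earns n - 1 points, the last position earns 0.
    private static void addPoints(Map<String, Integer> points, List<String> ranking) {
        int n = ranking.size();
        for (int pos = 0; pos < n; pos++) {
            points.merge(ranking.get(pos), n - 1 - pos, Integer::sum);
        }
    }

    public static void main(String[] args) {
        List<String> byKr = Arrays.asList("A", "B", "C");
        List<String> byCr = Arrays.asList("C", "A", "B");
        System.out.println(aggregate(byKr, byCr)); // prints [A, C, B]
    }
}
```

In the example, A collects 2 + 1 = 3 points, C collects 0 + 2 = 2, and B collects 1 + 0 = 1, so a component that does reasonably well in both rankings beats one that tops only a single ranking.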
