
Development of a Software Search Engine for the World Wide Web

Development of a Software Search Engine for the World Wide Web. Ken-ichi Matsumoto — 松本健一 Akito Monden — 門田暁人 Toshiyuki Kamei — 亀井俊之 Haruaki Tamada — 玉田春昭 Naoki Ohsugi — 大杉直樹 Software Engineering Laboratory Nara Institute of Science and Technology.





Presentation Transcript


  1. Development of a Software Search Engine for the World Wide Web Ken-ichi Matsumoto — 松本健一 Akito Monden — 門田暁人 Toshiyuki Kamei — 亀井俊之 Haruaki Tamada — 玉田春昭 Naoki Ohsugi — 大杉直樹 Software Engineering Laboratory Nara Institute of Science and Technology

  2. Needs for Software Search from WWW
     A developer searches the WWW for three reasons:
     • Search for better implementations: "Is there any other implementation for this function?"
     • Search for examples: "What is a typical usage of this library component?"
     • Search for unaware components: "Is there any useful library for my program?"

  3. Goal
     Construct a software search engine for developers that:
     • Collects various resources related to software development from the WWW, e.g. source code, executables, tips, developers’ blogs, etc.
     • Provides a flexible query interface
     • Recommends useful resources
     In this presentation, we focus on Java programs.

  4. System Architecture
     [Diagram components: WWW, Collection, Software Resource Repository, Analysis, Resource Summary Repository, Retrieval, Interface, Users]
     • Query: submitted by users
     • Result: pointers to resources (URLs) and recommendations

  5. Three Major Features of Our Search Engine
     • 1: Finding a typical usage of a component. The user gives the name of a component M; the software search engine returns a set of components that use M.
     • 2: Finding a similar component. The user gives an implementation of a component M; the engine returns components that have similar functionality to M.
     • 3: Get a recommendation. The user gives an unfinished program M; the engine returns a set of components useful for M.

  6. Three Major Features of Our Search Engine
     • 1: Finding a typical usage of a component. The user gives the name of a component M; the software search engine returns a set of components that use M.
     • 2: Finding a similar component. The user gives an implementation of a component M; the engine returns components that have similar functionality to M.
     • 3: Get a recommendation. The user gives an unfinished program M; the engine returns a set of components useful for M.
     We employ Software Birthmark, Similarity Evaluation, and Collaborative Filtering to implement these features.

  7. Software Birthmark
     A set of characteristics of a program*:
     • Constant Values in Field Variables (CVFV birthmark)
     • Sequence of Method Calls (SMC birthmark)
     • Inheritance Structure (IS birthmark)
     • Used Classes (UC birthmark)
     • etc.
     Useful for detecting software theft (plagiarism). Also useful for detecting a set of programs that have similar functionality (UC birthmark and SMC birthmark).
     [Figure: a program p is mapped to its birthmarks CVFV(p), SMC(p), IS(p), and UC(p).]
     * H. Tamada, M. Nakamura, A. Monden, and K. Matsumoto, “Design and evaluation of birthmarks for detecting theft of Java programs,” In Proc. IASTED Int’l Conf. on Software Engineering, pp.569-575, Feb. 2004.

  8. Example of Software Birthmark for Java
     The UC birthmark is the set of classes that a program uses.

         import java.util.Iterator;
         import java.lang.reflect.Array;

         public class ArrayIterator extends Object implements Iterator {
             private Object array;
             private int index = 0;

             public ArrayIterator(Object array) {
                 // Class.isArray() is an instance method, so call it on the argument's class
                 if (!array.getClass().isArray()) {
                     throw new IllegalArgumentException("not array type");
                 }
                 this.array = array;
             }

             public Object next() {
                 return Array.get(array, index++);
             }

             public boolean hasNext() {
                 return index < Array.getLength(array);
             }
             ...

     UC birthmark of ArrayIterator:
         java.lang.reflect.Array
         java.lang.Class
         java.lang.IllegalArgumentException
         java.lang.Object
         java.lang.String
         java.util.Iterator
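
A rough way to see what a UC birthmark contains is to collect the class names that appear in a class's declarations. The sketch below is only an approximation written for illustration: the class `UcBirthmarkSketch` and its method `usedClasses` are hypothetical names, and it relies on reflection, so it sees superclasses, interfaces, field types, and method or constructor signatures, but not classes used only inside method bodies (such as `java.lang.reflect.Array` in `ArrayIterator`). A real birthmark extractor would have to analyse the bytecode itself.

```java
import java.lang.reflect.Constructor;
import java.lang.reflect.Field;
import java.lang.reflect.Method;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical helper: approximates a UC birthmark from a class's declared signatures. */
final class UcBirthmarkSketch {

    /** Returns the sorted names of classes referenced by the declarations of {@code c}. */
    static Set<String> usedClasses(Class<?> c) {
        Set<String> used = new TreeSet<>();
        add(used, c.getSuperclass());
        for (Class<?> i : c.getInterfaces()) add(used, i);
        for (Field f : c.getDeclaredFields()) add(used, f.getType());
        for (Constructor<?> k : c.getDeclaredConstructors()) {
            for (Class<?> p : k.getParameterTypes()) add(used, p);
            for (Class<?> e : k.getExceptionTypes()) add(used, e);
        }
        for (Method m : c.getDeclaredMethods()) {
            add(used, m.getReturnType());
            for (Class<?> p : m.getParameterTypes()) add(used, p);
            for (Class<?> e : m.getExceptionTypes()) add(used, e);
        }
        return used;
    }

    /** Records a type by name, unwrapping arrays and skipping primitives and void. */
    private static void add(Set<String> used, Class<?> t) {
        if (t == null) return;                         // no superclass (Object, interfaces)
        while (t.isArray()) t = t.getComponentType();  // String[][] -> String
        if (!t.isPrimitive()) used.add(t.getName());   // int, boolean, void are not classes
    }
}
```

For example, `UcBirthmarkSketch.usedClasses(java.util.ArrayList.class)` includes `java.util.AbstractList` and `java.util.List`, because they are the superclass and an implemented interface.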

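  9. Similarity between Two Components
     • The similarity between UC birthmarks i and j is computed with the correlation coefficient:
         sim(i, j) = \frac{\sum_{u \in U} (R_{u,i} - \bar{R}_i)(R_{u,j} - \bar{R}_j)}{\sqrt{\sum_{u \in U} (R_{u,i} - \bar{R}_i)^2} \, \sqrt{\sum_{u \in U} (R_{u,j} - \bar{R}_j)^2}}
       where
         U: the set of all classfiles
         \bar{R}_i = (number of classfiles used by i) / |U|
         R_{u,i} = 0 if u does not use class i, 1 if u uses class i
     • Other computations are also available, e.g. vector (cosine) similarity, adjusted cosine, etc.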
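
The correlation coefficient and the cosine alternative mentioned above can be sketched in a few lines. This is a minimal illustration, assuming each UC birthmark is encoded as a 0/1 vector over the same ordered universe of classes; the class name `BirthmarkSimilarity` and the convention of returning 0 when a vector is constant are assumptions of this sketch, not details from the slides.

```java
/** Hypothetical helper: similarity between two 0/1 usage vectors of equal length. */
final class BirthmarkSimilarity {

    /** Pearson correlation coefficient; returns 0 when either vector has no variance. */
    static double pearson(int[] x, int[] y) {
        check(x, y);
        double meanX = mean(x), meanY = mean(y);
        double cov = 0, varX = 0, varY = 0;
        for (int u = 0; u < x.length; u++) {
            double dx = x[u] - meanX, dy = y[u] - meanY;
            cov += dx * dy;
            varX += dx * dx;
            varY += dy * dy;
        }
        double denom = Math.sqrt(varX * varY);
        return denom == 0 ? 0 : cov / denom;
    }

    /** Vector (cosine) similarity; returns 0 when either vector is all zeros. */
    static double cosine(int[] x, int[] y) {
        check(x, y);
        double dot = 0, normX = 0, normY = 0;
        for (int u = 0; u < x.length; u++) {
            dot += x[u] * y[u];
            normX += x[u] * x[u];
            normY += y[u] * y[u];
        }
        double denom = Math.sqrt(normX * normY);
        return denom == 0 ? 0 : dot / denom;
    }

    private static double mean(int[] v) {
        double sum = 0;
        for (int value : v) sum += value;
        return sum / v.length;
    }

    private static void check(int[] x, int[] y) {
        if (x.length == 0 || x.length != y.length) {
            throw new IllegalArgumentException("vectors must have the same non-zero length");
        }
    }
}
```

With the illustrative vectors {1, 1, 0, 1} and {1, 0, 0, 1}, `pearson` gives about 0.58 and `cosine` about 0.82; with {1, 1, 0, 1} and its complement {0, 0, 1, 0}, `pearson` gives -1.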

  10. Example (1): Search for typical usages • Data source: rt.jar (9206 class files) • Search for typical usages of “java.util.BitSet”
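
At its simplest, this kind of search is a lookup over UC birthmarks: a component uses M if M appears in its set of used classes. The sketch below assumes the birthmarks have already been extracted into a map from component name to used-class names; `UsageSearch`, `componentsUsing`, and the map layout are hypothetical, and the sketch only filters. How the engine decides which usages are typical is not described in the slides.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical helper: finds components whose UC birthmark contains a given class. */
final class UsageSearch {

    /**
     * Returns the names of all components that use {@code className},
     * in the iteration order of {@code birthmarks}.
     */
    static List<String> componentsUsing(String className, Map<String, Set<String>> birthmarks) {
        List<String> hits = new ArrayList<>();
        for (Map.Entry<String, Set<String>> entry : birthmarks.entrySet()) {
            if (entry.getValue().contains(className)) {
                hits.add(entry.getKey());
            }
        }
        return hits;
    }
}
```

Called with "java.util.BitSet" and a map built from the class files of a library, it would return every class file whose birthmark lists `java.util.BitSet`.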

  11. Example (2): Search for Similar Component • Data source: a part of bcel5.1 (100 class files) • Search for classfiles similar to “ArithmeticInstruction”

  12. Collaborative Filtering (CF)
     • Filtering: means selecting preferred items from a large collection of items.
     • Collaborative: means using the other users’ preferences to filter items.
     [Figure: from a large collection of items (A to T), items F and K are selected for the target user by using the other users’ preferences ("F is good!", "K is cool!").]

  13. Two Steps in CF
     • Evaluate similarities between the target user and the other users.
     • Estimate the preference using the other users’ preferences for the target item and their similarities.

                 Item 1           Item 2           Item 3           Item 4       Item 5
       User A    5 (prefer)       5 (prefer)       1 (not prefer)   3 (even)     ? (target)
       User B    5 (prefer)       5 (prefer)       1 (not prefer)   3 (even)     5 (prefer)       Similar user
       User C    5 (prefer)       5 (prefer)       1 (not prefer)   5 (prefer)   5 (prefer)       Similar user
       User D    1 (not prefer)   1 (not prefer)   3 (even)         5 (prefer)   1 (not prefer)   Dissimilar user

     Estimate for the target: 5 (prefer)

  14. CF for Software Components
     • Evaluate similarities between the target component and the other components based on UC birthmark.
     • Estimate the usefulness using the other components’ UC birthmarks for the target classfile and their similarities.

                      Class 1   Class 2   Class 3   Class 4   Class 5
       Component A    1         1         0         1         ? (target)
       Component B    1         0         0         1         1            Similar component
       Component C    1         1         0         0         1            Similar component
       Component D    0         0         1         0         0            Dissimilar component

       0: not used, 1: used

     Estimate for the target: 1 (useful)
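
The two steps above can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: similarity is the correlation coefficient over the classes other than the target one, and the estimate is a similarity-weighted average over components with positive similarity. The slides do not say which estimator is used, so that choice, like the names `ComponentCf`, `similarity`, and `estimate`, is hypothetical.

```java
/** Hypothetical sketch of collaborative filtering over a 0/1 component-by-class usage matrix. */
final class ComponentCf {

    /** Pearson correlation of two usage rows, ignoring column {@code skip}; 0 if undefined. */
    static double similarity(int[] x, int[] y, int skip) {
        int n = 0;
        double sumX = 0, sumY = 0;
        for (int k = 0; k < x.length; k++) {
            if (k == skip) continue;
            sumX += x[k];
            sumY += y[k];
            n++;
        }
        if (n == 0) return 0;
        double meanX = sumX / n, meanY = sumY / n;
        double cov = 0, varX = 0, varY = 0;
        for (int k = 0; k < x.length; k++) {
            if (k == skip) continue;
            double dx = x[k] - meanX, dy = y[k] - meanY;
            cov += dx * dy;
            varX += dx * dx;
            varY += dy * dy;
        }
        double denom = Math.sqrt(varX * varY);
        return denom == 0 ? 0 : cov / denom;
    }

    /**
     * Estimates how useful class {@code targetClass} is for component {@code target}:
     * step 1 computes the similarity to every other component, step 2 averages their
     * usage of the target class, weighted by similarity (positive similarities only).
     */
    static double estimate(int[][] uses, int target, int targetClass) {
        double weighted = 0, total = 0;
        for (int c = 0; c < uses.length; c++) {
            if (c == target) continue;
            double sim = similarity(uses[target], uses[c], targetClass);
            if (sim <= 0) continue;          // dissimilar components do not vote
            weighted += sim * uses[c][targetClass];
            total += sim;
        }
        return total == 0 ? 0 : weighted / total;
    }
}
```

With the rows for Components A to D as given in the slide (the unknown cell of Component A can hold any placeholder, since the target column is skipped), Components B and C each get a similarity of about 0.58 to A and Component D gets -1, so `estimate` returns 1.0 for the target cell.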

  15. Example (3): Get Recommendations
     • Data source: a part of bcel5.1 (100 class files)
     • Search for recommendations for “ArithmeticInstruction”
     [Result figure not preserved; surviving label: "Actually used".]

  16. Summary
     • Three features of a software search engine
         • Providing typical usage of a component
         • Providing a similar component
         • Making a recommendation
     • Three key technologies
         • Software Birthmark
         • Similarity Evaluation
         • Collaborative Filtering
