XSnippet: Mining For Sample Code Naiyana Tansalarak and Kajal Claypool Presented by: Shan Li CISC864
Topics • Overview of Research • Purposes • Contributions • Approaches • Details of the Approaches
Overview of Research • Purpose: • To provide sample code so that new developers can learn a technology quickly • Approach • Mining sample code from existing software systems
Overview of Research cont. • Steps in the approach • Range of queries: generalized or specialized • Ranking heuristics: context-sensitive or context-independent • For example: constructor function / constructor function of DOM • Mining algorithms • BFSMINE algorithm, restricted to the scope of a single method • Extensions to the BFSMINE algorithm
Approaches: The Snippet Mining Process • Figure 1: A high-level view of the snippet mining process
Approaches cont. • The goal of snippet mining is to mine, from a given code sample repository, all code snippets that satisfy a given user query Q. • The SelectionAgent pre-selects a set of code model instances (cmi) using a B+ tree index defined on all types declared or referred to in the code sample repository. • The MiningAgent invokes the BFSMINE algorithm for every code model instance.
Approaches cont. • The BFSMINE algorithm traverses a code model instance and outputs a set of paths P that represent the final code snippets returned to the user. • On completion of the BFSMINE phase, the MiningAgent passes the collection of paths P to the PruningAgent.
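The hand-off between the three agents can be sketched as a small pipeline. This is only an illustration under stated assumptions, not the authors' implementation: a plain dict stands in for the B+ tree index, the miner is passed in as a function, and pruning is reduced to dropping duplicate paths because the slides do not give the pruning rules. All function names here (`select`, `mine`, `prune`, `run_query`) are hypothetical.

```python
from typing import Callable, Dict, List, Set

Path = List[str]  # hypothetical: a path is a list of node labels


def select(index: Dict[str, Set[str]], query_type: str) -> Set[str]:
    """SelectionAgent step: find code model instances that mention query_type.

    Assumption: a dict from type name to instance ids stands in for the
    B+ tree index described on the slide.
    """
    return index.get(query_type, set())


def mine(instances: Set[str], query_type: str,
         miner: Callable[[str, str], List[Path]]) -> List[Path]:
    """MiningAgent step: run the miner once per code model instance."""
    paths: List[Path] = []
    for cmi in sorted(instances):  # sorted only to make the order repeatable
        paths.extend(miner(cmi, query_type))
    return paths


def prune(paths: List[Path]) -> List[Path]:
    """PruningAgent step (assumed): keep the first copy of each path."""
    seen = set()
    unique: List[Path] = []
    for path in paths:
        key = tuple(path)
        if key not in seen:
            seen.add(key)
            unique.append(path)
    return unique


def run_query(index: Dict[str, Set[str]], query_type: str,
              miner: Callable[[str, str], List[Path]]) -> List[Path]:
    """Select, then mine, then prune, in the order shown in Figure 1."""
    return prune(mine(select(index, query_type), query_type, miner))
```

Passing the miner as a parameter keeps the sketch independent of how BFSMINE itself is written.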
Approaches cont. • Queries • A query returns all snippets s containing code that instantiates a type tq: • (1) all code that instantiates tq • (2) code in which the instantiation of tq depends on the code context, e.g. via a static method • The following example
Approaches cont. • Queries • Type-based instantiation query • tq is instantiated from any type in the context CT(m) • T(s) denotes the lexically visible types in the code snippet s, and CT(m) denotes the type context of the method m • CT(m): the set of all inherited types, all types visible in the scope of the method, and the types of all local fields
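The two query kinds can be sketched as filters over a list of snippets. The slides do not spell out the exact matching rule between T(s) and CT(m), so the sketch assumes a snippet matches when the two sets share at least one type; that rule, the `Snippet` record, and both function names are hypothetical.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Set


@dataclass(frozen=True)
class Snippet:
    code: str
    instantiates: str               # the type the snippet creates
    visible_types: FrozenSet[str]   # T(s): lexically visible types


def generalized_query(snippets: List[Snippet], tq: str) -> List[Snippet]:
    """Context-independent query: every snippet that instantiates tq."""
    return [s for s in snippets if s.instantiates == tq]


def type_based_query(snippets: List[Snippet], tq: str,
                     method_context: Set[str]) -> List[Snippet]:
    """Type-based query against CT(m).

    Assumption: a snippet matches when T(s) and CT(m) overlap.
    """
    return [s for s in generalized_query(snippets, tq)
            if s.visible_types & method_context]
```

The type-based filter is written on top of the generalized one, which mirrors the move from context-independent to context-sensitive retrieval.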
Approaches cont. • Queries • Parent-based instantiation query • s denotes a snippet, CP(s) the parent context of the snippet, and CP(m) the parent context of the method m. • CP(m): the set containing the superclass extended by the source class C that contains m, as well as all interfaces implemented by C.
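The definition of the parent context translates directly into a set construction. The sketch below assumes a simple `SourceClass` record (hypothetical) that holds at most one superclass and any number of interfaces; the overlap test in `shares_parent` is an assumed matching rule, since the slide defines CP(s) and CP(m) but not how they are compared.

```python
from dataclasses import dataclass
from typing import Optional, Set, Tuple


@dataclass(frozen=True)
class SourceClass:
    name: str
    superclass: Optional[str] = None
    interfaces: Tuple[str, ...] = ()


def parent_context(containing_class: SourceClass) -> Set[str]:
    """CP: the superclass plus all interfaces of the containing class."""
    context = set(containing_class.interfaces)
    if containing_class.superclass is not None:
        context.add(containing_class.superclass)
    return context


def shares_parent(snippet_class: SourceClass,
                  method_class: SourceClass) -> bool:
    """Assumed rule: CP(s) and CP(m) have at least one parent in common."""
    return bool(parent_context(snippet_class) & parent_context(method_class))
```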
Approaches cont. • Source Code Model • A graph representation of the structure of source code • Nodes: type node, object node, method node • Edges: inheritance, implement, composite, method, assignment, or parameter edge
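A minimal data structure for such a graph might look like the following. The node and edge kinds come from the slide; the class names, the edge direction, and the `incoming` helper are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Set, Tuple


class NodeKind(Enum):
    TYPE = "type"
    OBJECT = "object"
    METHOD = "method"


class EdgeKind(Enum):
    INHERITANCE = "inheritance"
    IMPLEMENT = "implement"
    COMPOSITE = "composite"
    METHOD = "method"
    ASSIGNMENT = "assignment"
    PARAMETER = "parameter"


@dataclass(frozen=True)
class Node:
    kind: NodeKind
    name: str


@dataclass
class CodeModel:
    # Each edge is (source, kind, destination); the direction is an assumption.
    edges: List[Tuple[Node, EdgeKind, Node]] = field(default_factory=list)

    def add_edge(self, src: Node, kind: EdgeKind, dst: Node) -> None:
        self.edges.append((src, kind, dst))

    def nodes(self) -> Set[Node]:
        """All nodes that appear at either end of an edge."""
        return {n for src, _, dst in self.edges for n in (src, dst)}

    def incoming(self, node: Node) -> List[Tuple[Node, EdgeKind]]:
        """Edges that point at node, as (source, kind) pairs."""
        return [(src, kind) for src, kind, dst in self.edges if dst == node]
```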
Approaches cont. • BFSMINE Algorithm • Given a user query, the goal of the BFSMINE algorithm is to determine, for all instances nq, the types and eventually the code segments that instantiate the node nq and hence the query type tq, where Domain(nq) = {tq}.
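The idea of walking the graph breadth-first from the query node can be sketched as below. This is a simplified stand-in, not the BFSMINE algorithm itself: it assumes the graph is given as a dict from each node to the nodes that can produce it, walks backward from the target, and cuts paths off at a hypothetical `max_len` bound.

```python
from collections import deque
from typing import Dict, List


def bfs_paths(predecessors: Dict[str, List[str]], target: str,
              max_len: int = 4) -> List[List[str]]:
    """Return simple paths that end at target, shortest first.

    A path is complete when its first node has no unvisited predecessors
    or when it already holds max_len nodes.
    """
    complete: List[List[str]] = []
    queue = deque([[target]])
    while queue:
        path = queue.popleft()
        head = path[0]
        # Skip nodes already on the path so cycles cannot loop forever.
        preds = [p for p in predecessors.get(head, []) if p not in path]
        if not preds or len(path) >= max_len:
            if len(path) > 1:  # a lone target is not a snippet
                complete.append(path)
            continue
        for pred in preds:
            queue.append([pred] + path)
    return complete
```

Because the queue is processed in breadth-first order, shorter paths are reported before longer ones.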
Approaches cont. • Extensions to BFSMINE
Personal Comments • Strengths • User-defined queries • Results range from context-independent retrieval to various degrees of context-sensitive retrieval • The BFSMINE algorithm, based on a graph that represents the source code model, allows mining across method boundaries • Ranking heuristics (length, frequency, context) for providing best-fit code snippets • Multiple code samples for the same query • Context-independent retrieval (length / frequency) • Context-sensitive retrieval (context)
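The three ranking heuristics can be sketched as sort keys. The slides name the heuristics but not their direction or scoring, so the choices here are assumptions: shorter snippets first, more frequent snippets first, and larger overlap with the developer's context first. All function names are hypothetical.

```python
from collections import Counter
from typing import List, Set, Tuple


def rank_by_length(snippets: List[str]) -> List[str]:
    """Context-independent: fewer lines first (assumed preference)."""
    return sorted(snippets, key=lambda s: len(s.splitlines()))


def rank_by_frequency(snippets: List[str]) -> List[str]:
    """Context-independent: snippets that occur more often come first.

    Duplicates are collapsed into one entry.
    """
    counts = Counter(snippets)
    return sorted(counts, key=lambda s: -counts[s])


def rank_by_context(snippets: List[Tuple[str, Set[str]]],
                    context: Set[str]) -> List[str]:
    """Context-sensitive: more types shared with the context first.

    Each snippet is paired with the set of types it uses.
    """
    ordered = sorted(snippets, key=lambda item: -len(item[1] & context))
    return [code for code, _ in ordered]
```

Python's sort is stable, so snippets with equal scores keep their original order.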
Personal Comments • Potential weaknesses • Results • Is it possible to provide semantic ranking? • Why? The returned code snippets probably have no logical connection among them; each is just a chunk of code • Validation approach • To show that code snippets help developers, the authors use a group test: two groups work under the same conditions, except that one group uses code snippets and the other does not. • Limited?