Multi-Abstraction Concern Localization Tien-Duy B. Le, Shaowei Wang, and David Lo School of Information Systems Singapore Management University
Motivation • Concern Localization • Locating code units that match a text description • Text descriptions: bug reports or feature requests • Code units: classes or methods' source code • Documents are compared based on the words (IR) or topics (topic modeling) they contain, i.e., at a single level of abstraction (the word/topic level)
Motivation • A word can be abstracted at multiple levels of abstraction. Example hierarchy: • Level N: European Continent • … • Level 3: Western Europe • Level 2: Netherlands • Level 1: North Brabant, Eindhoven
Multi-Abstraction Concern Localization • A bug report or feature request is compared against source code at every abstraction level (Level 1, Level 2, Level 3, …, Level N)
Multi-Abstraction Concern Localization • Locating code units that match a textual description • By comparing documents at multiple abstraction levels • By leveraging multiple topic models • 3 main components: • Text preprocessing • Hierarchy creation • Multi-abstraction retrieval technique
Overall framework • Inputs: concerns and a method corpus • Preprocessing → Hierarchy Creation → abstraction hierarchy (Level 1, Level 2, …, Level N) • Abstraction hierarchy + standard retrieval technique → Multi-Abstraction Retrieval → ranked methods per concern
Hierarchy Creation • We apply Latent Dirichlet Allocation (LDA) a number of times • LDA (with default settings) accepts: • Number of topics K • A set of documents • LDA returns: • K topics, each a probability distribution over words • The probability of each topic t appearing in each document d
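As a concrete illustration, one LDA application takes K and a document set and returns per-document topic probabilities. The sketch below uses scikit-learn's LatentDirichletAllocation as an assumed stand-in for the authors' LDA implementation; the documents are invented examples:

```python
# Sketch: a single LDA application (assumes scikit-learn; the paper's
# actual LDA implementation and settings may differ).
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

docs = [
    "parser throws null pointer exception on empty input",
    "parser fails to resolve generic type parameters",
    "weaver emits wrong bytecode for around advice",
]

K = 2  # number of topics
counts = CountVectorizer().fit_transform(docs)
lda = LatentDirichletAllocation(n_components=K, random_state=0)
doc_topics = lda.fit_transform(counts)  # shape: (num_docs, K)

# Each row is a probability distribution over the K topics,
# i.e., the probability of each topic appearing in that document.
print(doc_topics.shape)        # (3, 2)
print(doc_topics.sum(axis=1))  # each row sums to ~1.0
```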
Hierarchy Creation • Each application of LDA creates a topic model with K topics • Each topic model is assigned to the documents and corresponds to one abstraction level • Abstraction hierarchy of height L • Height = number of topic models • Created by L applications of LDA
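The hierarchy creation above can be sketched as repeated LDA runs with different topic counts, one per level (scikit-learn assumed; the topic counts 2/3/4 and the documents are illustrative, not the paper's configuration):

```python
# Sketch: building an abstraction hierarchy of height L by applying
# LDA L times, once per level (scikit-learn assumed).
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

docs = [
    "parser throws null pointer exception on empty input",
    "parser fails to resolve generic type parameters",
    "weaver emits wrong bytecode for around advice",
    "incremental compiler crashes when classpath changes",
]

counts = CountVectorizer().fit_transform(docs)

topic_counts = [2, 3, 4]  # hypothetical K per abstraction level
hierarchy = []            # level i -> doc-topic probability matrix
for K in topic_counts:
    lda = LatentDirichletAllocation(n_components=K, random_state=0)
    hierarchy.append(lda.fit_transform(counts))

# Height of the hierarchy = number of topic models created.
print(len(hierarchy))                   # 3
print([m.shape[1] for m in hierarchy])  # [2, 3, 4]
```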
Multi-Abstraction Vector Space Model • Multi-Abstraction Vector Space Model (VSM) • Standard VSM + Abstraction Hierarchy • In the standard Vector Space Model • A document is represented as a vector of weights • Each element corresponds to a word • Its value is the word's term frequency-inverse document frequency (tf-idf) weight
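A minimal standard-VSM sketch of the weighting step: one tf-idf weight per vocabulary word. This uses a common tf·idf variant (tf × log(N/df)); the slides do not specify the exact weighting formula, so treat it as an assumption:

```python
# Sketch: a document as a vector of tf-idf weights, one per word
# (one common tf-idf variant; the paper's exact formula may differ).
import math
from collections import Counter

docs = [
    ["parser", "null", "pointer", "exception"],
    ["parser", "generic", "type", "parameters"],
    ["weaver", "bytecode", "advice"],
]

vocab = sorted({w for d in docs for w in d})
N = len(docs)
# document frequency: number of documents containing each word
df = {w: sum(1 for d in docs if w in d) for w in vocab}

def tfidf_vector(doc):
    """One weight per vocabulary word: term frequency times log(N/df)."""
    tf = Counter(doc)
    return [tf[w] * math.log(N / df[w]) for w in vocab]

vec = tfidf_vector(docs[0])
print(len(vec))  # one element per vocabulary word
```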
Multi-Abstraction Vector Space Model • We extend document vectors • Added elements: • Topics of the topic models in the abstraction hierarchy • Their values are the probabilities of the topics appearing in the document • Example: • Document vector has length 10 • Abstraction hierarchy has 3 topic models with 50, 100, and 150 topics • Extended document vector has size 10 + (50+100+150) = 310
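The extension in the example above can be sketched directly: the tf-idf vector of length 10 is concatenated with the per-level topic probabilities (numpy assumed; the weights and distributions are random placeholders, only the sizes come from the slide):

```python
# Sketch: extending a document vector with topic probabilities from
# each level of the abstraction hierarchy (sizes from the slide's
# example; values are random placeholders).
import numpy as np

rng = np.random.default_rng(0)

# Standard VSM vector of length 10 (tf-idf weights; illustrative).
tfidf_vec = rng.random(10)

# Topic probabilities from 3 topic models of sizes 50, 100, 150
# (each normalized to sum to 1, like a topic distribution).
def random_distribution(k):
    p = rng.random(k)
    return p / p.sum()

level_probs = [random_distribution(k) for k in (50, 100, 150)]

# Extended vector: tf-idf weights followed by per-level topic probabilities.
extended = np.concatenate([tfidf_vec] + level_probs)
print(extended.size)  # 10 + (50 + 100 + 150) = 310
```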
Experiments • Dataset: • 285 faulty versions of AspectJ extracted from iBugs • Evaluation metric: • Mean Average Precision (MAP)
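Mean Average Precision over a set of concerns can be computed as follows. This is the standard MAP definition, sketched in plain Python, not the authors' evaluation script; the method names are invented:

```python
# Sketch: Mean Average Precision (MAP) over ranked retrieval results
# (standard definition; not the paper's evaluation code).
def average_precision(ranked, relevant):
    """Average of precision@k at each rank where a relevant item appears,
    divided by the total number of relevant items."""
    hits, precisions = 0, []
    for rank, item in enumerate(ranked, start=1):
        if item in relevant:
            hits += 1
            precisions.append(hits / rank)
    return sum(precisions) / len(relevant) if relevant else 0.0

def mean_average_precision(queries):
    """queries: list of (ranked_methods, relevant_methods) pairs."""
    return sum(average_precision(r, rel) for r, rel in queries) / len(queries)

# Toy example: two concerns with known buggy methods.
queries = [
    (["m1", "m2", "m3", "m4"], {"m1", "m3"}),  # AP = (1/1 + 2/3) / 2
    (["m5", "m6", "m7"], {"m6"}),              # AP = 1/2
]
print(round(mean_average_precision(queries), 4))  # 0.6667
```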
Empirical Result • With a hierarchy of height 4 (H4), the MAP improvement is 19.36% • MAP improves as the height of the abstraction hierarchy increases
Empirical Result • Number of concerns at various improvement levels: the improvement is positive for most of the concerns
Conclusion • We propose a multi-abstraction concern localization framework • We also propose a multi-abstraction vector space model • Our experiments on 285 AspectJ bugs show a MAP improvement of up to 19.36%
Future work • Extend experiments by investigating: • Different numbers of topics in each level of the hierarchy • Different hierarchy heights • Different topic models
Future work • Analyze the effects of document lengths: • For different numbers of topics • For different hierarchy heights • Experiment with Panichella et al.'s method [1] to infer good LDA configurations for our approach • [1] A. Panichella, B. Dit, R. Oliveto, M. Di Penta, D. Poshyvanyk, and A. De Lucia. How to effectively use topic models for software engineering tasks? An approach based on genetic algorithms. (ICSE 2013)
Thank you! Questions? Comments? Advice? {btdle.2012, shaoweiwang.201, davidlo}@smu.edu.sg