
Presented by Paul Heintzelman and Bob Mazzi 12/2/08

This presentation explores methods for recovering traceability links between code and documentation, evaluating IR models, tool support, and case studies, and discusses the benefits and challenges of the process.


Presentation Transcript


  1. Recovering Traceability Links between Code and Documentation
  By Giuliano Antoniol, Member, IEEE, Gerardo Canfora, Member, IEEE, Gerardo Casazza, Member, IEEE, Andrea De Lucia, Member, IEEE, and Ettore Merlo, Member, IEEE.
  Presented by Paul Heintzelman and Bob Mazzi 12/2/08

  2. Introduction
  • Why do we document?
    • Program comprehension
    • Maintenance
    • Requirements tracing
    • Impact analysis
    • Reuse of existing code
  • Kinds of documentation
    • Design documents
    • Program manuals
    • Maintenance logs
    • Comments
  • What we learn from documentation
    • Domain knowledge
    • Laws and regulations

  3. Traceability Link Recovery Method - Process
  • Code path
    • Parse for list of identifiers
    • Parse compound identifiers
    • Text normalization
  • Document path
    • Convert to lower case
    • Remove stop words
    • Convert plurals to singulars
  • Classification
    • Build index of relationships
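
  As a rough illustration of the identifier splitting and text-normalization steps above, here is a minimal Python sketch. The stop-word list, the camel-case/underscore splitting rule, and the plural-to-singular heuristic are illustrative assumptions, not the authors' actual toolkit (which also handled Italian).

```python
import re

# Illustrative English stop words; the toolkit's real list is not reproduced here.
STOP_WORDS = {"the", "a", "an", "of", "to", "and", "is", "in", "for"}

def split_compound(identifier):
    """Split compound identifiers, e.g. 'currentAccount' or 'current_account'."""
    words = []
    for part in re.split(r"[_\s]+", identifier):
        # Break on camel-case boundaries and digit runs.
        words.extend(re.findall(r"[A-Z]?[a-z]+|[A-Z]+(?![a-z])|\d+", part))
    return words

def singularize(word):
    """Naive plural-to-singular mapping (an assumption, not the paper's rule)."""
    if word.endswith("ies"):
        return word[:-3] + "y"
    if word.endswith("s") and not word.endswith("ss"):
        return word[:-1]
    return word

def normalize(text):
    """Lower-case, drop stop words, and map plurals to singulars."""
    tokens = []
    for raw in re.findall(r"[A-Za-z_]+", text):
        for word in split_compound(raw):
            word = word.lower()
            if word not in STOP_WORDS:
                tokens.append(singularize(word))
    return tokens

print(normalize("getCustomerAccounts updates the list of customer accounts"))
```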

  4. Tool Support
  • Built a toolkit to support the proposed method
  • Comments in code were disregarded
  • Used public domain utilities with custom tools
  • Built to work in both English and Italian

  5. Probabilistic IR Model
  • Model computes a ranking score
    • Probability that a document is related to a specific piece of code
  • Model is judged vs. a Bayesian network
  • Similarity(Di, Q) = P(Di|Q)
  • Using Bayes' rule: P(Di|Q) = P(Q|Di)P(Di)/P(Q)
    • P(Q) is constant
    • P(Di) is assumed constant
  • So the ranking reduces to evaluating P(Q|Di)
    • P(Q|Di) = P(W1, W2, … Wm | Di)
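
  Chaining the steps above into a single line, in the slide's own notation: similarity(Di, Q) = P(Di|Q) = P(Q|Di)·P(Di)/P(Q) ∝ P(Q|Di) = P(W1, W2, …, Wm | Di). The proportionality holds because P(Q) is the same for every document and P(Di) is assumed uniform, so documents can be ranked by P(Q|Di) alone.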

  6. Smoothing
  • Full model: P(Q|Di) = P(W1, W2, … Wm | Di)
    • Very large number of word combinations
    • Each combination is rare
    • Works poorly on sparse data
  • Use a unigram model
    • P(W1, W2, … Wm | Di) ≈ P(W1|Di) · P(W2|Di) · … · P(Wm|Di)
    • Less accurate
    • Requires less data
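
  A minimal Python sketch of scoring a query against one document under the unigram approximation. Add-one (Laplace) smoothing is used purely as an illustration of handling unseen words; the paper's actual smoothing scheme is more involved, and the example section contents are invented.

```python
import math
from collections import Counter

def unigram_log_prob(query_tokens, doc_tokens, vocab_size, alpha=1.0):
    """log P(Q | D) = sum_i log P(w_i | D) under a unigram model.

    Add-alpha (Laplace) smoothing keeps unseen query words from zeroing out
    the whole product; it stands in here for the paper's smoothing step.
    """
    counts = Counter(doc_tokens)
    total = len(doc_tokens)
    log_p = 0.0
    for w in query_tokens:
        p = (counts[w] + alpha) / (total + alpha * vocab_size)
        log_p += math.log(p)
    return log_p

# Rank documentation sections for one class's normalized identifiers.
sections = {"booking": ["room", "booking", "date"], "billing": ["invoice", "rate"]}
vocab = {w for tokens in sections.values() for w in tokens}
query = ["room", "booking"]
ranked = sorted(sections, key=lambda s: unigram_log_prob(query, sections[s], len(vocab)), reverse=True)
print(ranked)  # sections ordered by how likely they are to generate the query
```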

  7. Vector Space IR Model
  • Maps each document to a vector
    • idf(j) = |Documents| / |Documents containing the jth term|
    • Vector element: d(ij) = tf(ij) * log(idf(j))
  • Maps query terms to a vector in the same way
  • Compares vectors and returns the closest matches
    • Uses cosine distance
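
  A small sketch of the vector space scoring described above, with tf-idf weighting and cosine similarity; the example token lists are invented for illustration, and the query (one class's normalized identifiers) is weighted with the same idf values as the documents.

```python
import math
from collections import Counter

def document_frequencies(documents):
    """df[t] = number of documents containing term t."""
    df = Counter()
    for doc in documents:
        df.update(set(doc))
    return df

def tfidf_vector(tokens, df, n_docs):
    """Sparse tf-idf vector: weight(t) = tf(t) * log(n_docs / df(t))."""
    tf = Counter(tokens)
    return {t: tf[t] * math.log(n_docs / df[t]) for t in tf if df.get(t)}

def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

# documents: one token list per documentation section; query: tokens from one class
documents = [["hotel", "room", "booking"], ["customer", "invoice"], ["room", "rate"]]
query = ["room", "booking"]
df, n = document_frequencies(documents), len(documents)
scores = [cosine(tfidf_vector(query, df, n), tfidf_vector(d, df, n)) for d in documents]
print(scores)  # higher score = closer match to the query
```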

  8. Case Study Design
  • First study used the LEDA library
    • Freely available C++ library
    • 95 KLOC
    • 208 classes
    • 88 pages of documentation
  • Second study used an application for hotel management
    • 20 KLOC
    • 95 classes
    • 16 functional requirements
    • System documentation

  9. Case Study - LEDA
  • Details of LEDA
    • Freely available C++ library
    • 95 KLOC
    • 208 classes / 98 documented
    • 88 pages of documentation
  • Traceability easy due to implementation
    • Used simplified identifier rules
  • Validation
    • Manually built a 208x88 traceability matrix to compare to tool results
  • Results
    • Poor precision due to 110 classes with no corresponding entries in the manual
    • Vector model reaches 100% recall in 12 cuts vs. the 17 needed for the probabilistic model
    • Probabilistic model has consistently higher precision

  10. Case Study - Albergate
  • Details of Albergate
    • Java application
    • 20 KLOC
    • Relational database
    • 95 classes
    • 16 functional requirements
    • System documentation
  • Traceability more difficult due to implementation
    • Documentation less refined
    • Not a released, in-production system
    • Application written in Italian, with complex grammar
  • Validation
    • Manually built a 16x60 traceability matrix to compare to tool results
  • Results (simplified process, as for LEDA)
    • Low precision and recall with the probabilistic model
    • Low precision, but significantly better recall, with the vector model
  • Results (full implementation)
    • Vector model reaches 100% recall in 7 cuts vs. the 6 needed for the probabilistic model
    • Models have consistent precision

  11. Case Study Analysis
  • Comparison of probabilistic vs. vector models
    • Both models are suitable for traceability link recovery
    • Probabilistic model achieves high recall sooner
    • Vector model starts with lower recall but improves quickly as cut size increases
  • Probabilistic model best suited for
    • Code with less well defined identifier naming
    • Projects where a larger investment in identifier parsing is justified
  • Vector model best suited for
    • Code with consistent use of identifiers between documents and code
    • Projects where simplified identifier parsing is desired

  12. Evaluation
  The methods presented in this paper have been evaluated using the following criteria:
  • Performance of Information Retrieval (IR) models with respect to standard tools
  • Effectiveness of IR models in helping a software engineer
  • Effort savings with respect to the granularity of the document space
  • Effectiveness of a fixed cut versus a variable cut

  13. Evaluation - IR Compared to GREP
  • GREP searches can be done in at least two ways
    • Search for a specific identifier (A)
    • Search for two or more identifiers (A or B)
  • Issues
    • Many identifiers may not exist in the documentation
    • No ranking for relevance
    • More items returned, of limited relevance

  14. Evaluation - Benefits of IR in a Traceability Recovery Process
  • Experiment to evaluate manual traceability vs. an automated process using the developed tools
  • Two groups of 4 students (3 undergrad + 1 grad, and 3 grad + 1 undergrad) were assigned the task of manually reconstructing a traceability matrix for Albergate
  • In addition to the requirements documents, group A was given the ranked list obtained from the probabilistic tests
  • Results of this experiment show that the group using the ranked list discussed in this paper produced better results than the other approaches tested

  15. Evaluation - Considerations on Effort Saving and Document Granularity
  • Previous sections have been evaluated using recall and precision
  • These metrics do not measure "effort" (the amount of time a developer needs to achieve a given result)
  • We propose a new metric, the Recovery Effort Index (REI)
    • Retrieved = number of documents retrieved
    • Available = total number of documents available
  • Savings can be assumed to be the portion of available documents that do NOT have to be searched for relevance
  • Proving this assumption is an area of future work
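
  The slide defines the two quantities but the formula itself did not survive the transcript; presumably the index is simply the retrieved fraction of the document space, REI = Retrieved / Available, so that the assumed effort saving is the remaining fraction 1 - REI (the documents that never have to be inspected).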

  16. Evaluation - Retrieving a Variable Number of Documents
  • Other approaches used a fixed number of documents for a query
  • The question "Is a fixed number the best approach?" was asked
  • Two approaches were explored
    • A purely variable number returned, based on calculated relevance
    • A mix of a fixed size and a variable number
  • There was an improvement in effort, measured by REI, for a reduced number of returned documents
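
  A minimal Python sketch contrasting a fixed cut with a purely variable (threshold-based) cut, as discussed above; the threshold, cut size, and document names are arbitrary illustrative values.

```python
def fixed_cut(ranked, k):
    """Fixed cut: always return the top-k documents, regardless of score."""
    return ranked[:k]

def variable_cut(ranked, threshold):
    """Variable cut: return every document whose similarity reaches the threshold."""
    return [(doc, score) for doc, score in ranked if score >= threshold]

# ranked: (document, similarity) pairs, best match first.
ranked = [("FR-03", 0.81), ("FR-11", 0.54), ("FR-07", 0.12)]
print(fixed_cut(ranked, 2))                 # always two documents to inspect
print(variable_cut(ranked, threshold=0.5))  # fewer when few documents look relevant
```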

  17. Related Work - SOA (Reasons for Rebuilding Traceability)
  • Requirement Tracking: J. Konclin and M. Bergen, 1988; B. Ramesh and V. Dhar, 1992; F.A.C. Pinhero and J.A. Goguen, 1996
  • Program Comprehension: B. Shneiderman and R. Mayer, 1979; R. Brooks, 1983; I. Vessey, 1985; S. Letovsky, 1986; N. Pennington, 1987; N. Pennington, 1987; D.L. Potter, J. Sinclair, D. Till, 1991; A.V. Mayrhauser and A. Vans, 1993; E. Soloway and K. Ehrlich, 1994; A.V. Mayrhauser and A. Vans, 1994; A.V. Mayrhauser and A.M. Vans, 1996
  • Reuse of existing software: S.P. Arnold and S.L. Stepowey, 1987; W.B. Frakes and B.A. Nejmeh, 1987; B.A. Burton, et al., 1987; G. Caldiera and V.R. Basili, 1991; G. Canfora, et al., 1994; M. Pighin, 2001
  • Maintenance: T. Biggerstaff, 1989; C. Rich and R. Waters, 1990; G. Caldiera and V.R. Basili, 1991; S. Rugaber and R. Clayton, 1993; T. Biggerstaff, et al., 1993; G.C. Murphy, et al., 1995
  • Impact Analysis: R.S. Arnold and S.A. Bohner, 1993; R.J. Turver and M. Munro, 1994; M.J. Fyson and C. Boldyreff, 1998; G. Antoniol, et al., 2000
  • This Paper

  18. Related Work - SOA
  • Traceability: W.B. Frakes and R. Baeza-Yates, 1992; J. Weidl and H. Gall, 1998; G. Antoniol, et al., 1999; G. Antoniol, et al., 2000; G. Antoniol, et al., 2000; G. Antoniol, et al., 2000; G. Antoniol, et al., 2000
  • Reasons for Rebuilding Traceability: from prior slide
  • Probabilistic Analysis: T.H. Cormen, et al., 1990; I.H. Witten and T.C. Bell, 1991; L. Bain and M. Engelhardt, 1992; T.M. Cover and J.A. Thomas, 1992; R. Rosenfeld, 1994; G.C. Murphy, et al., 1995; D. Gusfield, 1997; R. DeMori, 1998
  • Information Retrieval: Y. Maarek, D. Berry, and G. Kaiser, 1991; W.B. Frakes and R. Baeza-Yates, 1992; W.B. Frakes and R. Baeza-Yates, 1992; D. Harman, 1992
  • Parsing and Semantic Processing: G. Adamson and G. Boreham, 1974; R.C. Angell, et al., 1983; G. Salton and C. Buckley, 1988; H. Ney and U. Essen, 1991; R. Kuhn and R.D. Mori, 1993
  • Vector Analysis: D. Harman, 1992
  • This Paper

  19. Conclusions
  • IR has been presented and evaluated as a tool to reduce the effort of recovering traceability links
  • IR does offer promise in reducing effort
  • Different IR models may offer advantages in different situations
  • IR is sensitive to less-than-ideal text normalization

  20. Limitations
  • Both the vector and probabilistic models should be evaluated on larger systems to confirm validity and scalability
  • Text normalization may be limited by the developers' knowledge of the domain
    • "Auto" and "automobile" may map correctly while "car" may not
  • Effort-saving metrics should be explored
    • It is assumed that if 50% of the code can be set aside for a particular task, the effort drops by 50%; this is unlikely in real-world situations
  • Ignoring comments in code may be underutilizing a valuable resource
    • Many programmers and shops practice commenting in order to self-document code
    • Verbiage in comments may use the keyword being searched for, while the code itself may use different or somewhat arbitrary identifiers
  • Text normalization tools will work with different degrees of effectiveness in different languages
  • In some situations the effort to develop an answer through IR may exceed the effort to perform the change without IR

  21. Comments – Future Work
  • Experimentation to determine optimal cut sizes should be performed
    • This may involve either fixed or variable cut sizes
  • Improvements in smoothing techniques may improve the results of IR
  • IR may develop into a worthwhile tool to support impact analysis
  • Developing a set of rules for when IR should and should not be applied may have value

  22. Discussion
  • Lack of focus and clarity
  • Why are groups A and B different?
  • Was group B allowed to use grep?
    • They never actually compare grep to the IR methods
  • A number of unjustified assumptions
    • Used the format of the documentation to select text normalization
