This paper discusses the use of information retrieval techniques for requirements tracing in IV&V, with a focus on building a special-purpose requirements tracing tool called RETRO. The paper also presents experimental results and potential applications of the tool.
Robust Requirements Tracing Via Internet Tech: Improving an IV&V Technique • SAS 2004, July 20, 2004 • Alex Dekhtyar, Jane Hayes, Senthil Sundaram, Ganapathy Chidambaram, Sarah Howard • Department of Computer Science, University of Kentucky
Outline • Requirements Tracing and Information Retrieval • Methods • Metrics • RETRO • Experimental Results • NASA Research information • Technology Readiness Level • Potential applications • Ease of finding, or availability of, data or case studies • Barriers to research or application • Future work
Who Is Who • Sponsor: NASA IV&V Center, Fairmont, WV • Principal Investigators: Alexander Dekhtyar, Jane Hayes • Ph.D. Student: Senthil Karthekian Sundaram* • M.S. Student: Sarah Howard • Past Undergraduate Students: James Osborne*, Rijo Jose Thozhal • Subcontractor: SAIC • * Supported by the NASA grant
The Problem: How can we automate requirements tracing during IV&V? • Relevance to NASA: alleviate the work of NASA IV&V analysts; improve the quality of IV&V for NASA software • Importance/Benefits: improve analyst productivity on one of the most time-consuming IV&V tasks
Approach • Use Information Retrieval Techniques for Requirements Tracing: TF-IDF, Thesaurus, Probabilistic IR, LSI; Analyst Feedback; Metrics • Build RETRO (REquirements TRacing On-target): special-purpose requirements tracing tool; standalone version; integrated with SAIC’s SuperTracePlus • Evaluate performance: MODIS, LOFAR, CM-1 datasets
Approach: IR for Requirements Tracing [Diagram: a Requirements Document and a Design Document pass through three numbered steps labeled representation, matching algorithm, and analyst feedback (Yes/No on candidate links)]
Methods • TF-IDF: TF = Term Frequency; IDF = Inverse Document Frequency (gives more weight to rare terms) • Latent Semantic Indexing (LSI): term x document => "factor" x document, with # "factors" << # terms • Enhancements: Thesaurus, Feedback Processing, Filtering
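As a rough illustration of the TF-IDF method named above, the sketch below weights terms by term frequency times inverse document frequency and ranks high-level/low-level requirement pairs by cosine similarity. It is not RETRO's implementation: the whitespace tokenizer, the `log(N / df)` form of IDF, the use of cosine similarity, and the function names `tfidf_vectors`, `cosine`, and `candidate_links` are all assumptions made for this example.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (no stemming or stop-word removal)."""
    return text.lower().split()


def tfidf_vectors(docs):
    """Return one {term: tf * idf} dict per document in `docs`."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        # Assumed IDF form: log(N / df); rare terms get larger weights.
        vectors.append({t: tf[t] * math.log(n_docs / df[t]) for t in tf})
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse term-weight dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def candidate_links(high, low, threshold=0.0):
    """Return (high_index, low_index, score) triples scoring above `threshold`,
    sorted by descending score."""
    vectors = tfidf_vectors(high + low)
    high_vecs, low_vecs = vectors[:len(high)], vectors[len(high):]
    links = []
    for i, hv in enumerate(high_vecs):
        for j, lv in enumerate(low_vecs):
            score = cosine(hv, lv)
            if score > threshold:
                links.append((i, j, score))
    return sorted(links, key=lambda link: -link[2])
```

Calling `candidate_links(high, low)` with two lists of requirement texts yields a ranked candidate link list; raising `threshold` corresponds to the filtering enhancement listed on the slide.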
Metrics • N: number of low-level requirements • M: number of high-level requirements • Hits: number of correct candidate links • Strikes: number of false positives • Misses: number of missed links
$$\text{Precision} = \frac{\text{Hits}}{\text{Hits} + \text{Strikes}}, \qquad \text{Recall} = \frac{\text{Hits}}{\text{Hits} + \text{Misses}}$$
$$\text{Selectivity} = \frac{\text{Hits} + \text{Strikes}}{M \cdot N}$$
• AvgH = average relevance of Hits • AvgS = average relevance of Strikes • DiffR = AvgH - AvgS
• Lag(Hit) = number of Strikes for the same high-level requirement with higher relevance than the Hit • Lag = average of Lag(Hit) over all Hits
• Breakpoint = (threshold, Precision, Recall) such that Precision = Recall
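The definitions of Precision, Recall, Selectivity, and Breakpoint above can be sketched in a few lines of Python. The data layout (links as `(high, low)` index pairs, scores in a dict) and the names `trace_metrics` and `find_breakpoint` are assumptions of this example. Because Precision and Recall rarely coincide exactly on a finite list of thresholds, the sketch returns the threshold where they are closest, which is a simplification rather than the authors' stated procedure.

```python
def trace_metrics(candidates, true_links, n_low, m_high):
    """Compute Precision, Recall and Selectivity for a set of candidate links.

    candidates, true_links: iterables of (high, low) pairs.
    n_low, m_high: N and M from the slide (numbers of low- and high-level
    requirements).
    """
    candidates, true_links = set(candidates), set(true_links)
    hits = len(candidates & true_links)      # correct candidate links
    strikes = len(candidates - true_links)   # false positives
    misses = len(true_links - candidates)    # links not retrieved
    precision = hits / (hits + strikes) if candidates else 0.0
    recall = hits / (hits + misses) if true_links else 0.0
    selectivity = (hits + strikes) / (m_high * n_low)
    return {"precision": precision, "recall": recall, "selectivity": selectivity}


def find_breakpoint(scored, true_links, thresholds, n_low, m_high):
    """Return (threshold, precision, recall) where |precision - recall| is
    smallest over the given thresholds (assumed approximation of the
    Precision = Recall condition).

    scored: dict mapping (high, low) pairs to relevance scores.
    """
    best = None
    for t in thresholds:
        kept = {link for link, score in scored.items() if score >= t}
        m = trace_metrics(kept, true_links, n_low, m_high)
        point = (t, m["precision"], m["recall"])
        if best is None or abs(point[1] - point[2]) < abs(best[1] - best[2]):
            best = point
    return best
```

For example, with three candidates of which two are correct and three true links overall, Precision and Recall are both 2/3; with M = 2 and N = 3 the Selectivity is 3/6 = 0.5.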
Metrics • Precision: signal-to-noise • Recall: “coverage” • Selectivity: improvement in # of comparisons vs. exhaustive search • AvgH, AvgS, DiffR, Lag: separation between Hits and Strikes in candidate link lists • Breakpoints: effects of filtering
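The separation metrics (AvgH, AvgS, DiffR, Lag) can be illustrated with a small sketch. It assumes candidate links are stored as a dict from `(high, low)` pairs to relevance scores and that Lag(Hit) counts only Strikes attached to the same high-level requirement as the Hit; the function name `separation_metrics` is hypothetical.

```python
def separation_metrics(scored, true_links):
    """Compute AvgH, AvgS, DiffR and Lag for a scored candidate link list.

    scored: dict mapping (high, low) pairs to relevance scores.
    true_links: set of (high, low) pairs that are correct.
    """
    hits = {link: s for link, s in scored.items() if link in true_links}
    strikes = {link: s for link, s in scored.items() if link not in true_links}

    def mean(values):
        values = list(values)
        return sum(values) / len(values) if values else 0.0

    avg_h = mean(hits.values())
    avg_s = mean(strikes.values())

    # Lag(Hit): Strikes for the same high-level requirement that are ranked
    # above the Hit, i.e. have a strictly higher relevance score.
    lags = []
    for (high, _), score in hits.items():
        lags.append(sum(1 for (h, _), s in strikes.items()
                        if h == high and s > score))

    return {"AvgH": avg_h, "AvgS": avg_s, "DiffR": avg_h - avg_s,
            "Lag": mean(lags)}
```

A larger DiffR and a smaller Lag both indicate that Hits sit above Strikes in the candidate lists, so an analyst meets correct links before false positives.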
RETRO Architecture [Diagram components: documents, Build Representation, IR toolbox, Filter, Feedback processor, Analyst]
RETRO + SuperTracePlus [Diagram components: requirements documents, SFEP, RETRO Build Representation, RETRO IR Toolbox, RETRO Feedback, STP Interactive Link Analysis, STP Report Generation, Analyst Review, Traceability Reports; each component is marked as belonging to STP or RETRO]
The Universe of Tests: method x thesaurus x feedback x threshold • method: TF-IDF, LSI* • thesaurus: Yes, No • feedback: Top 1, Top 2, Top 3, Top 4 • threshold: [0.0…0.5] • * LSI: number of dimensions + representation built from low-level documents, from high+low-level documents, or from high-level and low-level documents separately
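One way to picture this test universe is as a cross product of the four experimental factors. The sketch below enumerates such a grid. The slide gives only the threshold range [0.0…0.5], so the step of 0.1 used here is an assumption, as are the name `enumerate_configurations` and the reading of the slide as a full cross product; the LSI-specific settings (number of dimensions, document sets) are left out.

```python
from itertools import product

METHODS = ("TF-IDF", "LSI")
THESAURUS = (True, False)
FEEDBACK = ("Top 1", "Top 2", "Top 3", "Top 4")
# Assumed step of 0.1 across the [0.0, 0.5] range shown on the slide.
THRESHOLDS = tuple(round(0.1 * k, 1) for k in range(6))


def enumerate_configurations():
    """Return every (method, thesaurus, feedback, threshold) combination."""
    return [
        {"method": m, "thesaurus": th, "feedback": fb, "threshold": t}
        for m, th, fb, t in product(METHODS, THESAURUS, FEEDBACK, THRESHOLDS)
    ]
```

Under these assumptions the grid has 2 x 2 x 4 x 6 = 96 configurations, each of which would be run against a dataset and scored with the metrics defined earlier.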
Datasets • MODIS • 20 high-level • 49 low-level • 41 true links • CM-1 • ~200 high-level • ~300 low-level • # true links - under construction
Breakpoint [Chart: MODIS, TF-IDF, Thesaurus, Top 2 Feedback; filtering at iteration 0]
Above 70% [Chart: MODIS, TF-IDF, Thesaurus, Top 2 Feedback]
Above 70% [Chart: MODIS, comparing feedback traces]
NASA Research Information • Technology Readiness Level: 6.5 for RETRO (integrated with existing software system; engineering feasibility demonstrated; limited documentation available; most functionality available for demonstration and test; most software bugs removed) • Potential applications: tracing bug reports to code; identifying related/duplicate bug reports • Ease of finding, or availability of, data or case studies: data available; the issue is the answer set • Barriers to research or application: answer set availability; IV&V analysts for human factors studies • Publications: paper accepted to RE 2004; one journal paper submitted, one in progress
Next Steps, Conclusions, Plans, Ideas • IR methods work: need to implement more • Productize RETRO (Check!) • Data integration with existing tools (Check!) • Other IV&V problems may be alleviated • Study “human factors”