Explore the need for a natural-language-based system that extracts information from documents for automating software testing. Learn about SIFT (Specification Information From Text) and its unique features, alternative extraction methods, system design, and experimental results.
Information Extraction from Documents for Automating Software Testing by Patricia Lutsky • Presented by Ramiro Lopez
Outline • Why is there a need for a natural-language-based system for extracting information from documents? • Alternative ways of extracting information from documents • System design and implementation details • Experimental results
Motivation for SIFT • What is SIFT? SIFT stands for Specification Information From Text. • Various documents in Software Engineering are written in natural language. • Examples: Requirements and Specification Documents, User Manuals. • Software Engineering Documents tend to be written in a very particular way with specific sections and subsections, i.e., they are semi-structured.
What does SIFT do? • SIFT is essentially an automated testing tool • It extracts specification-level information, generates tests with that information and adds them to the set of existing test cases • The tests are then run to check that the system conforms to the documentation
Alternative ways for extracting information from documents • Use a controlled language for requirements specifications • Parse natural language texts about testing entirely and generate test scripts • Extract specific facts on system specifications, but no specific testable facts
What is unique about SIFT? • Extracts specific testable facts from semi-structured documents • Uses XML, which separates content information from presentation formats, to give the document a consistent structure • Does not pursue full-text understanding, thus avoiding issues related to the endless ways of saying the same thing
How to use SIFT • Identify concepts that can be extracted for testing • Examine a document to find out how it is organized and to find the different sentence types • Encode sentence types in a grammar • Create XML tags to give the document a consistent structure
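As a rough illustration of the last step, the sketch below shows how XML tags might give a manual a consistent structure so that candidate sentences can be located by argument. The tag names (`manual`, `routine`, `argument`, `description`) and the routine name `EXAMPLE_ROUTINE` are assumptions made for this example; the slides do not show SIFT's actual tag set.

```python
import xml.etree.ElementTree as ET

# Hypothetical tagged fragment of a reference manual.
# Tag names and EXAMPLE_ROUTINE are illustrative, not SIFT's real schema.
DOCUMENT = """
<manual>
  <routine name="EXAMPLE_ROUTINE">
    <argument name="BUFQUO">
      <description>The maximum value you can specify with the BUFQUO argument is 65355.</description>
    </argument>
  </routine>
</manual>
"""

def argument_sentences(xml_text):
    """Return (routine, argument, sentence) triples found in the tagged text."""
    root = ET.fromstring(xml_text.strip())
    results = []
    for routine in root.iter("routine"):
        for argument in routine.iter("argument"):
            sentence = argument.findtext("description", default="").strip()
            results.append((routine.get("name"), argument.get("name"), sentence))
    return results

triples = argument_sentences(DOCUMENT)
```

Because the structure is explicit, the extraction step only has to read sentences from known places instead of understanding the whole text.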
Example of how a sentence is processed • Natural-language specification: The maximum value you can specify with the BUFQUO argument is 65355 • The parser first translates this to a canonical sentence, The maximum value for BUFQUO is 65355, and then to a canonical form, (maximum_value BUFQUO 65355) • The canonical form (maximum_value BUFQUO 65355) is then mechanically converted into actual code, a test case, and added to the system
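The pipeline on this slide can be sketched in a few lines. This is a minimal sketch, not SIFT's implementation: a regular expression stands in for the parser, and the function names `to_canonical` and `to_test_cases` and the boundary-value test layout are assumptions made for illustration.

```python
import re

# Hypothetical pattern for one sentence type; SIFT's real grammar is not shown in the slides.
MAX_VALUE_PATTERN = re.compile(
    r"The maximum value you can specify with the (?P<arg>\w+) argument is (?P<value>\d+)"
)

def to_canonical(sentence):
    """Return a (relation, argument, value) tuple, or None if the sentence does not match."""
    match = MAX_VALUE_PATTERN.search(sentence)
    if match is None:
        return None
    return ("maximum_value", match.group("arg"), int(match.group("value")))

def to_test_cases(fact):
    """Turn a maximum_value fact into two boundary-value test descriptions."""
    relation, arg, limit = fact
    if relation != "maximum_value":
        raise ValueError(f"unsupported relation: {relation}")
    return [
        {"argument": arg, "value": limit, "expect": "accepted"},
        {"argument": arg, "value": limit + 1, "expect": "rejected"},
    ]

sentence = "The maximum value you can specify with the BUFQUO argument is 65355"
fact = to_canonical(sentence)
tests = to_test_cases(fact)
```

Here the extracted fact drives one test at the documented limit and one just above it, which is one plausible reading of "mechanically converted into a test case".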
Example of a rule in a grammar • Suppose you have two structurally equivalent sentences: The box is on the counter. The glass is under the counter. • They would be translated into a rule in a grammar as follows: NounPhrase is Preposition NounPhrase
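The rule above can be tried out with a toy recognizer. The lexicon and the function names below are assumptions for this example, and a noun phrase is simplified to a determiner followed by a noun.

```python
# Toy lexicon, assumed for illustration only.
DETERMINERS = {"the", "a", "an"}
NOUNS = {"box", "glass", "counter"}
PREPOSITIONS = {"on", "under"}

def parse_noun_phrase(tokens, pos):
    """Parse 'Determiner Noun' starting at pos; return (noun, next_pos) or None."""
    if pos + 1 < len(tokens) and tokens[pos] in DETERMINERS and tokens[pos + 1] in NOUNS:
        return tokens[pos + 1], pos + 2
    return None

def parse_sentence(sentence):
    """Match the rule: NounPhrase is Preposition NounPhrase."""
    tokens = sentence.lower().rstrip(".").split()
    subject = parse_noun_phrase(tokens, 0)
    if subject is None:
        return None
    first_noun, pos = subject
    if pos + 1 >= len(tokens) or tokens[pos] != "is" or tokens[pos + 1] not in PREPOSITIONS:
        return None
    preposition = tokens[pos + 1]
    obj = parse_noun_phrase(tokens, pos + 2)
    if obj is None or obj[1] != len(tokens):
        return None
    return (preposition, first_noun, obj[0])

first = parse_sentence("The box is on the counter.")
second = parse_sentence("The glass is under the counter.")
```

Both sentences are accepted by the same rule and differ only in the words that fill its slots, which is why a single grammar rule can cover many sentences of one type.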
When can SIFT be used • Use on long-term projects where documentation will go through many versions • Use on semi-structured documents that are organized in a predefined way • Use on documents written in a consistent style • Use on domains that have many similar semantic entities (example: methods that have arguments)
Experimental Results • SIFT was used to extract information from an operating system’s reference manual • The total number of tests identified by the developers was 174 • SIFT was able to find 25 of the 174, or about 14%
Final thoughts • It is only a proof-of-concept testing tool, but it has potential to save developers time on trivial test cases • I think the natural-language approach is error-prone and costly because people may not follow a consistent writing style • Deciding on a standard template that limits the choices of structure in a document might be more useful, since people will be forced to follow the standard and it is less likely that tests will be missed because of an inconsistent writing style