SMART operates on free-text documentation to acquire system models from requirements text, reducing manual processing and improving the quality and accuracy of the system architecture.
SMART: System Model Acquisition from Requirements Text
Technion – Israel Institute of Technology
System Model Acquisition from Requirements Text
• Operates on free-text documentation, such as business process specifications or user requirements
• Results depend critically on the quality of the processed documentation
• Based on Object-Process Methodology (OPM), which has two semantically equivalent modalities:
  • Textual: Object-Process Language (OPL)
  • Graphic: Object-Process Diagram (OPD)
System Model Acquisition from Requirements Text
• Significantly reduces the quantity of material that needs to be processed manually
• Reduces the initial level of conceptual complexity
• Graphic manipulation (OPD) is much easier than text editing
• Quality, accuracy, and conciseness of the system architecture are higher due to the discipline OPM introduces
• Capable of automatic generation of UML diagrams
SMART - System Diagram
[OPD figure. Labels: SMART, OPCAT, Categorization Engine, OPL Generator, System Model Acquisition, System Requirements Unstructured Text, System Architecting Team, System Model]
System Model Acquisition In-zoomed
[OPD figure. Labels: SMART, System Requirements Unstructured Text, Category Extraction, Categorization Engine, Category List (raw, edited), System Architecting Team, List Editing, Relation Formulating, Relation Set, OPL Generator, OPL Sentence Generating, OPL Sentence Set, OPCAT, OPD Constructing, System Model]
SMART – Procedural Steps
• Automatic Extraction of Categories from Unstructured Text
• Manual Editing of Categories
• Automatic Search of OPM Relations
• Automatic Generation of OPL Sentences
• Manual Editing of the Results
Automatic Extraction of Categories from Unstructured Text
• Categorization engine written in Common LISP
• Categories = idiomatic phrases (word sequences) reflecting the underlying topics in a given corpus of documents
• Based on heuristics
• Could combine external ontologies/taxonomies/thesauri
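The slides do not describe the engine's heuristics, so the following is only a minimal Python sketch of the general idea of treating recurring word sequences as candidate categories. The frequency threshold, the stop-word list, and the names `STOP_WORDS`, `tokenize`, and `extract_categories` are all assumptions made for illustration; they are not taken from the Common LISP engine.

```python
import re
from collections import Counter

# Hypothetical stop-word list; the real engine's heuristics are not described.
STOP_WORDS = {"the", "a", "an", "of", "in", "into", "to", "and", "or",
              "is", "are", "be", "by", "for", "on", "with", "that"}


def tokenize(text):
    """Lower-case the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def extract_categories(documents, max_len=3, min_count=2):
    """Return (phrase, count) pairs for word sequences that recur in the corpus.

    A phrase is a candidate category if it is 1..max_len words long, does not
    start or end with a stop word, and occurs at least min_count times.
    """
    counts = Counter()
    for doc in documents:
        tokens = tokenize(doc)
        for n in range(1, max_len + 1):
            for i in range(len(tokens) - n + 1):
                phrase = tokens[i:i + n]
                if phrase[0] in STOP_WORDS or phrase[-1] in STOP_WORDS:
                    continue
                counts[" ".join(phrase)] += 1
    return [(p, c) for p, c in counts.most_common() if c >= min_count]


corpus = [
    "The document repository stores actual documents.",
    "Actual documents are cached into the document repository.",
]
categories = dict(extract_categories(corpus))
```

In this toy corpus, "document repository" and "actual documents" both recur and survive the filter, while one-off sequences such as "repository stores" are dropped. The resulting raw list is what the manual editing step would then prune and classify.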
Manual Editing of Categories
• Selecting categories that can serve as things in the OPM model, and classifying them as either objects or processes
• Clustering alternative formulations of the selected OPM things based on their semantic similarity
• Optionally adding OPM things that did not show up among the extracted categories
Automatic Search of OPM Relations
• Utilizes a set of configurable, predefined templates:
  • A template consists of two things and the relation between them, expressed in alternative ways
• Utilizes second-order regular expressions defined on any lexical or grammatical attribute (part-of-speech, capitalization, punctuation)
• Runs as a finite-state automaton over a suffix-tree index of tokens
• Compares word sequences instead of character strings
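As a rough illustration of matching on word sequences rather than character strings, the sketch below scans a tokenized sentence for the pattern "thing, relation phrasing, thing". It is a deliberately simplified linear scan: it uses neither a suffix-tree index nor second-order regular expressions, and the template contents and the names `TEMPLATES`, `match_at`, and `find_relations` are hypothetical.

```python
import re

# Hypothetical template set: relation name -> alternative token-level phrasings.
TEMPLATES = {
    "cached into": [("cached", "into"), ("are", "cached", "into"),
                    ("is", "cached", "into")],
    "consists of": [("consists", "of"), ("is", "composed", "of")],
}


def tokenize(text):
    """Lower-case the text and split it into word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def match_at(tokens, pos, pattern):
    """True if the token sequence `pattern` occurs in `tokens` at index `pos`."""
    return tuple(tokens[pos:pos + len(pattern)]) == tuple(pattern)


def find_relations(sentence, things, templates=TEMPLATES):
    """Find (thing, relation, thing) triples by comparing word sequences."""
    tokens = tokenize(sentence)
    thing_patterns = [tuple(tokenize(t)) for t in things]
    found = []
    for i in range(len(tokens)):
        for left in thing_patterns:
            if not match_at(tokens, i, left):
                continue
            j = i + len(left)  # position right after the first thing
            for relation, phrasings in templates.items():
                for phrasing in phrasings:
                    if not match_at(tokens, j, phrasing):
                        continue
                    k = j + len(phrasing)  # position right after the relation
                    for right in thing_patterns:
                        if match_at(tokens, k, right):
                            found.append((" ".join(left), relation,
                                          " ".join(right)))
    return found


relations = find_relations(
    "Actual documents are cached into document repositories.",
    ["Actual Documents", "Document Repositories"],
)
```

Here the alternative phrasing "are cached into" is normalized to the relation "cached into", which is the role the slides assign to a template's alternative formulations. A production matcher would also test token attributes such as part-of-speech or capitalization, which this sketch omits.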
Automatic Generation of OPL Sentences
• Every extracted natural language sentence is translated straightforwardly into OPL
• The outcome is reformulated to better reflect the underlying relations:
  • Custom relations are transformed into processes (cached into => Caching)
  • Complex relations are transformed into two equivalent simple sentences (Actual Documents Cached into Document Repositories => (1) Caching requires Actual Documents, (2) Caching yields Document Repositories)
• Transformations do not modify the underlying semantics of the NL sentences
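The two reformulations above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the OPL Generator itself: the gerund derivation is a naive suffix rule, and the names `IRREGULAR_GERUNDS`, `to_process_name`, and `to_opl` are hypothetical.

```python
# Hypothetical override table for verbs the naive suffix rule gets wrong.
IRREGULAR_GERUNDS = {"copied": "Copying"}


def to_process_name(relation):
    """Turn a custom relation such as 'cached into' into a process name.

    Naive rule (an assumption for this sketch): take the first word,
    strip a trailing 'ed', and append 'ing'.
    """
    verb = relation.split()[0].lower()
    if verb in IRREGULAR_GERUNDS:
        return IRREGULAR_GERUNDS[verb]
    stem = verb[:-2] if verb.endswith("ed") else verb
    return (stem + "ing").capitalize()


def to_opl(source, relation, target):
    """Split one complex relation into two simple OPL-style sentences."""
    process = to_process_name(relation)
    return [
        f"{process} requires {source}",
        f"{process} yields {target}",
    ]


sentences = to_opl("Actual Documents", "cached into", "Document Repositories")
for sentence in sentences:
    print(sentence)
```

For the slide's own example this reproduces the two sentences "Caching requires Actual Documents" and "Caching yields Document Repositories". Both sentences refer to the same two things and the same derived process, so the split does not add or remove any relation from the original sentence.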
Manual Editing of the Results
• Non-semantic corrections: needed where the extraction did not capture all of the existing or implied relations
• Additions and eliminations: semantically modify the original output
• Scaling is applied to simplify the results without losing details
Benefits
• Significant reduction in time and resources
• Minimizes effort
• Focus on the system overview (the "big picture")
• High-quality results
• Minimizes time-to-market
Future Research Directions
• Tested on EEC IST-2001-38100 GRACE (Grid Retrieval and Categorization Engine)
• To be utilized for system design in EEC IST-202-507126 COCOON (Building Knowledge-driven and Dynamically Networked Communities within European Healthcare Systems)
• Looking for a commercial pilot application