Reverse Engineering State Machines by Interactive Grammar Inference
Neil Walkinshaw, Kirill Bogdanov, Mike Holcombe, Sarah Salahuddin
State Machines
• Used to model software behaviour, for documentation, inspection / review, model-based testing and model checking
• Only useful if complete and up-to-date
• Usually not the case, due to time constraints and software evolution
[Figure: example state machine with transitions load, edit, save as, ok, close and exit]
Reverse Engineering State Machines
• Static analysis – analysis of source code (symbolic execution, flow analyses, ...); inevitably considers executions that are infeasible in practice
• Dynamic analysis – infer the model from sample executions; favoured for accuracy
• In dynamic analysis, two states are considered equal if the traces that follow them are similar
• Variants of the k-tails algorithm [Biermann and Feldman, 1972] are the most common reverse-engineering algorithms
Traditional Approach
• For any point in a trace, its k-tail is the sequence of the next k events (or function calls)
• Point x is considered equivalent to point y if their k-tails are equal
• Example trace, with k = 2: <load,edit,edit,edit,save_as,ok,edit,edit>
• Points with equal 2-tails are merged, and the non-determinism this introduces is then removed
[Figure: the example trace as a chain of states, and the much smaller machine obtained after merging and removing non-determinism]
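Below is a minimal Python sketch of the k-tails idea above. It is not Biermann and Feldman's original formulation. It rests on three assumptions: traces are tuples of event names, all traces share one initial state, and non-determinism is removed by merging the target states of same-labelled transitions. The function name `k_tails` and its return format are illustrative.

```python
from collections import defaultdict, deque


def k_tails(traces, k):
    """Sketch of k-tails: merge trace points whose next k events match, then
    merge further until deterministic. Returns sorted (source, event, target)
    transitions with merged states renumbered from 0."""
    parent = {}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra == rb:
            return False
        parent[rb] = ra
        return True

    # One state per point (t, i): the point just before event i of trace t.
    edges = []
    for t, trace in enumerate(traces):
        for i in range(len(trace) + 1):
            parent[(t, i)] = (t, i)
            if i > 0:
                edges.append(((t, i - 1), trace[i - 1], (t, i)))

    # Assumption: every trace starts in the same initial state.
    for t in range(1, len(traces)):
        union((0, 0), (t, 0))

    # Merge points whose k-tails are equal (tails are truncated at the trace end).
    first_with_tail = {}
    for t, trace in enumerate(traces):
        for i in range(len(trace) + 1):
            tail = tuple(trace[i:i + k])
            if tail in first_with_tail:
                union(first_with_tail[tail], (t, i))
            else:
                first_with_tail[tail] = (t, i)

    # Remove non-determinism: one state + one event must lead to a single state.
    changed = True
    while changed:
        changed = False
        seen = {}
        for src, event, dst in edges:
            key = (find(src), event)
            if key in seen:
                changed |= union(seen[key], dst)
            else:
                seen[key] = dst

    # Renumber merged states breadth-first from the initial state.
    out = defaultdict(dict)
    for src, event, dst in edges:
        out[find(src)][event] = find(dst)
    start = find((0, 0))
    ids, queue, transitions = {start: 0}, deque([start]), []
    while queue:
        state = queue.popleft()
        for event in sorted(out[state]):
            target = out[state][event]
            if target not in ids:
                ids[target] = len(ids)
                queue.append(target)
            transitions.append((ids[state], event, ids[target]))
    return sorted(transitions)
```

Here is how it behaves on the slide's trace with k = 2. The points after load, after the first edit and after ok all have the 2-tail <edit,edit>, so they are merged. Removing non-determinism then collapses the remaining edit points. `k_tails([("load","edit","edit","edit","save_as","ok","edit","edit")], 2)` returns `[(0, 'load', 1), (1, 'edit', 1), (1, 'save_as', 2), (2, 'ok', 1)]`.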
Problems
• Too expensive if the result is to be correct and complete: it needs the complete set of executions up to a certain length
• Passive – all executions need to be presented at once
• If the provided traces are only partial (probable for a non-trivial system), the resulting model is untrustworthy
• Difficult to tell how complete the model is – what's missing?
[Figure: the example state machine (load, edit, save as, ok, close, exit) beside a partial inferred model (load, edit, save_as, ok)]
Regular Grammar Inference
• Given a set of valid and (optionally) invalid sentences from a language, infer its grammar
• Regular grammars can be represented as deterministic finite state machines
• The problem of regular grammar inference is therefore equivalent to that of reverse engineering state machines
• Several sophisticated grammar inference techniques exist
• They effectively address many of the problems that arise with current reverse-engineering approaches
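To make the grammar/state-machine correspondence concrete, here is a small Python sketch. It represents a regular grammar as a deterministic machine and checks whether that machine is consistent with a sample, meaning it accepts every valid sentence and rejects every invalid one. The `EDITOR` machine and its state names are hypothetical. It was chosen only to agree with the example executions on the QSM slides.

```python
from typing import Dict, Iterable, Sequence, Set, Tuple

# (state, symbol) -> next state; a missing entry means the sentence is rejected.
Transitions = Dict[Tuple[str, str], str]


def accepts(delta: Transitions, start: str, accepting: Set[str],
            sentence: Sequence[str]) -> bool:
    """Run the deterministic machine; accept only if it ends in an accepting state."""
    state = start
    for symbol in sentence:
        if (state, symbol) not in delta:
            return False
        state = delta[(state, symbol)]
    return state in accepting


def consistent(delta: Transitions, start: str, accepting: Set[str],
               positives: Iterable[Sequence[str]],
               negatives: Iterable[Sequence[str]]) -> bool:
    """True if the machine accepts all valid and rejects all invalid sentences."""
    return (all(accepts(delta, start, accepting, s) for s in positives)
            and not any(accepts(delta, start, accepting, s) for s in negatives))


# Hypothetical editor grammar used for illustration only.
EDITOR: Transitions = {
    ("init", "load"): "open",
    ("open", "edit"): "open",
    ("open", "save_as"): "dialog",
    ("dialog", "ok"): "open",
    ("open", "close"): "closed",
    ("closed", "exit"): "end",
}
```

Grammar inference searches for a small machine that passes `consistent` for the given sample. The sample alone does not pin down a unique answer, which is why the choice of which states to merge matters.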
Benefits of Adapting Grammar Inference Techniques
• Active techniques: they do not require all executions to be presented at once, and they interact with an oracle to identify missing information
• More efficient: they can process large sample sets
• Reasonably accurate given sparse sets of executions
• More sophisticated heuristics to accurately identify equivalent states
Query-Driven State Merging (QSM)
• Devised by Dupont et al.
• Combines the benefits listed on the previous slide: active, efficient, and reasonably accurate for sparse sets of sample executions
• Guaranteed to produce the correct machine if the set of sample executions is characteristic, i.e. it covers every transition in the target grammar and contains enough positive and negative samples to distinguish between different states (preventing false merges)
• Questions aim to elicit a characteristic sample from the oracle
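The first condition for a characteristic sample, covering every transition of the target, is easy to check when a target machine is known, as it is in an evaluation with generated machines. The Python sketch below reports the transitions that no sample exercises. It does not check the second condition, having enough samples to distinguish states. `uncovered_transitions` and the `EDITOR` machine are hypothetical illustrations.

```python
from typing import Dict, Iterable, Sequence, Set, Tuple

Transitions = Dict[Tuple[str, str], str]


def uncovered_transitions(delta: Transitions, start: str,
                          samples: Iterable[Sequence[str]]) -> Set[Tuple[str, str]]:
    """Transitions (state, event) of the target that no sample execution uses."""
    covered = set()
    for sample in samples:
        state = start
        for event in sample:
            if (state, event) not in delta:
                break  # the sample leaves the machine here (e.g. a negative sample)
            covered.add((state, event))
            state = delta[(state, event)]
    return set(delta) - covered


# Hypothetical target machine used for illustration only.
EDITOR: Transitions = {
    ("init", "load"): "open",
    ("open", "edit"): "open",
    ("open", "save_as"): "dialog",
    ("dialog", "ok"): "open",
    ("open", "close"): "closed",
    ("closed", "exit"): "end",
}
```

For example, the executions <load, close, exit> and <load, edit, edit, edit, close, exit> never use save_as or ok. `uncovered_transitions(EDITOR, "init", ...)` would report those two transitions, which a question-asking learner would still need to elicit.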
Query-Driven State Merging (QSM)
• Sample executions:
<load, close, exit>
<load, edit, edit, save_as, ok, close, exit>
<load, edit, edit, edit, close, exit>
• Generate a “Prefix Tree Acceptor” from them
[Figure: the prefix tree acceptor for the three executions]
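A prefix tree acceptor has one state per distinct prefix of the sample executions, so shared prefixes share states. The Python sketch below builds one as a dictionary from (state, event) to next state, with state 0 as the root. `build_pta` is a hypothetical helper, not part of the tool described on these slides.

```python
def build_pta(traces):
    """Prefix tree acceptor: dict (state, event) -> next state; state 0 is the root."""
    delta = {}
    next_state = 1
    for trace in traces:
        state = 0
        for event in trace:
            if (state, event) not in delta:
                # A new prefix: create a fresh state for it.
                delta[(state, event)] = next_state
                next_state += 1
            state = delta[(state, event)]
    return delta


samples = [
    ("load", "close", "exit"),
    ("load", "edit", "edit", "save_as", "ok", "close", "exit"),
    ("load", "edit", "edit", "edit", "close", "exit"),
]
pta = build_pta(samples)
```

The three executions contain 16 events. Because all of them start with <load> and the last two share <load, edit, edit>, the tree needs only 12 transitions (13 states).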
Query-Driven State Merging (QSM)
• Attempt to merge two states
• Produce questions: executions that are valid in the merged machine but not in the unmerged version, e.g. <close,exit>?, <edit,edit...>?, <load,load,close,exit>?
• If all questions are answered yes, merge the states
• Otherwise, add the rejected questions to the graph as negative executions
• Active, efficient, and accepts negative information about the model
[Figure: the prefix tree acceptor after the attempted merge, with rejected executions added as negative paths]
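The following is a simplified Python sketch of one merge step as described above, not Dupont et al.'s exact procedure. Several parts are assumptions that do not come from the slides. Behaviour is taken to be prefix-closed, so a trace is valid exactly when every event has a transition. Merged states are folded together until the machine is deterministic. Questions are the shortest executions, up to `max_len` events, that the merged machine accepts and the unmerged one rejects; the slide's questions are complete executions instead. `try_merge`, `questions` and the `oracle` callback are hypothetical names.

```python
from collections import deque


def build_pta(traces):
    """Prefix tree acceptor: dict (state, event) -> next state; state 0 is the root."""
    delta, next_state = {}, 1
    for trace in traces:
        state = 0
        for event in trace:
            if (state, event) not in delta:
                delta[(state, event)] = next_state
                next_state += 1
            state = delta[(state, event)]
    return delta


def accepts(delta, trace):
    """Prefix-closed acceptance from state 0: valid iff every event has a transition."""
    state = 0
    for event in trace:
        if (state, event) not in delta:
            return False
        state = delta[(state, event)]
    return True


def merge(delta, a, b):
    """Merge states a and b, then fold targets until the machine is deterministic."""
    parent = {}

    def find(x):
        while parent.get(x, x) != x:
            x = parent[x]
        return x

    def union(x, y):
        rx, ry = find(x), find(y)
        if rx == ry:
            return False
        parent[max(rx, ry)] = min(rx, ry)  # smallest id wins, so state 0 stays initial
        return True

    union(a, b)
    changed = True
    while changed:
        changed = False
        seen = {}
        for (src, event), dst in delta.items():
            key = (find(src), event)
            if key in seen:
                changed |= union(seen[key], dst)
            else:
                seen[key] = dst
    return {(find(s), e): find(d) for (s, e), d in delta.items()}


def questions(original, merged, max_len):
    """Shortest executions (up to max_len events) valid in `merged` but not `original`."""
    found, queue = [], deque([((), 0)])
    while queue:
        trace, state = queue.popleft()
        if len(trace) == max_len:
            continue
        for (src, event), dst in sorted(merged.items()):
            if src == state:
                extended = trace + (event,)
                if accepts(original, extended):
                    queue.append((extended, dst))
                else:
                    found.append(extended)
    return found


def try_merge(delta, a, b, oracle, negatives, max_len=3):
    """One QSM-style step; returns (machine, negatives) after consulting the oracle."""
    merged = merge(delta, a, b)
    if any(accepts(merged, n) for n in negatives):
        return delta, negatives  # the merge would accept a known-invalid execution
    for q in questions(delta, merged, max_len):
        if not oracle(q):
            return delta, negatives + [q]  # keep the unmerged machine, record a negative
    return merged, negatives
```

Here is the sketch applied to the prefix tree acceptor of the three sample executions. Merging the root with the state reached after load turns load into a self-loop. With `max_len=2` that merge raises the questions <close>, <edit> and <load,load>. An oracle that rejects <close> blocks the merge and records <close> as a negative execution, which also rules out any later merge that would accept it.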
Implementation
• Eclipse TPTP is used to record traces
• Each sequence of method calls becomes a trace such as <load,edit...>
• Questions can be answered either manually or as tests run directly against the system
• The number of questions generated can be varied
• The QSM component accepts simple text files of strings, prefixed with “+” (valid) or “-” (invalid)
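The slides only say that the input strings are prefixed with “+” or “-”. This Python sketch therefore assumes one execution per line, with events separated by whitespace, and blank lines ignored. `parse_samples` is a hypothetical reader, not the QSM component's actual input code.

```python
from typing import List, Tuple

Trace = Tuple[str, ...]


def parse_samples(text: str) -> Tuple[List[Trace], List[Trace]]:
    """Split '+'/'-' prefixed lines into positive and negative executions.
    Assumed layout: one execution per line, events separated by whitespace."""
    positives: List[Trace] = []
    negatives: List[Trace] = []
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        sign, events = line[0], tuple(line[1:].split())
        if sign == "+":
            positives.append(events)
        elif sign == "-":
            negatives.append(events)
        else:
            raise ValueError(f"line {lineno}: expected a '+' or '-' prefix")
    return positives, negatives
```

For example, `parse_samples("+ load edit close exit\n- load load\n")` yields one positive execution and one negative execution.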
Evaluation
• JHotDraw case study, generated from recorded traces (described in the paper)
• Generated random state machines, subject to certain constraints (minimal, deterministic, etc.)
• Three sets of 10 random machines (5, 25 and 50 states)
• Random paths over these machines form the initial set of traces
• Measured the accuracy of the final machine and the number of questions required
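The Python sketch below outlines the random-machine part of such an evaluation. Every name and parameter in it is hypothetical, including `random_machine`, `random_walk`, `accuracy` and `p_extra`. The generator guarantees determinism and reachability from state 0 but does not enforce minimality. The accuracy measure used here is the fraction of random event sequences that the target and the inferred machine classify the same way. It is one plausible choice, not necessarily the paper's metric.

```python
import random


def random_machine(n_states, alphabet, rng, p_extra=0.3):
    """Random deterministic machine, every state reachable from state 0.
    Minimality is NOT enforced by this sketch."""
    delta = {}
    # Spanning tree: give each new state an incoming edge from an earlier state.
    for target in range(1, n_states):
        free = [(s, a) for s in range(target) for a in alphabet if (s, a) not in delta]
        delta[rng.choice(free)] = target
    # Extra random transitions (loops, back edges) on still-unused labels.
    for s in range(n_states):
        for a in alphabet:
            if (s, a) not in delta and rng.random() < p_extra:
                delta[(s, a)] = rng.randrange(n_states)
    return delta


def accepts(delta, trace):
    """Prefix-closed acceptance from state 0: valid iff every event has a transition."""
    state = 0
    for event in trace:
        if (state, event) not in delta:
            return False
        state = delta[(state, event)]
    return True


def random_walk(delta, max_len, rng):
    """Random path from state 0, stopping early at a state with no outgoing events."""
    state, trace = 0, []
    for _ in range(max_len):
        events = sorted(e for (s, e) in delta if s == state)
        if not events:
            break
        event = rng.choice(events)
        trace.append(event)
        state = delta[(state, event)]
    return tuple(trace)


def accuracy(target, inferred, alphabet, rng, n_tests=1000, max_len=6):
    """Fraction of random event sequences both machines classify the same way."""
    agree = 0
    for _ in range(n_tests):
        seq = [rng.choice(alphabet) for _ in range(rng.randint(1, max_len))]
        agree += int(accepts(target, seq) == accepts(inferred, seq))
    return agree / n_tests
```

A set of walks such as `[random_walk(m, 10, rng) for _ in range(20)]` would play the role of the initial traces. The machine inferred from them is then scored against `m` with `accuracy`.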
Current and Future Work
• Identify data constraints associated with states, using tools such as Daikon
• Automatically answer queries:
• Static analysis – use call graph analysis to automatically propose negative / impossible executions
• Automated test generation
• Heuristics – can certain questions be safely ignored?
Conclusions
• Preliminary results show the technique is reasonably accurate and efficient
• It can potentially be almost entirely automated
• It automatically generates tests (questions), many of which can be eliminated by static analysis anyway
• Grammar inference is a useful source of ideas for dynamic analysis and reverse engineering