
Reverse Engineering State Machines by Interactive Grammar Inference

Neil Walkinshaw, Kirill Bogdanov, Mike Holcombe, Sarah Salahuddin


Presentation Transcript


  1. Reverse Engineering State Machines by Interactive Grammar Inference • Neil Walkinshaw, Kirill Bogdanov, Mike Holcombe, Sarah Salahuddin

  2. State Machines • Used to model software behaviour • Documentation • Inspection / review • Model-based testing • Model checking [Figure: example state machine with transitions load, edit, save as, ok, close, exit]

  3. State Machines • Used to model software behaviour • Documentation • Inspection / review • Model-based testing • Model checking • Only useful if complete and up-to-date • Usually not the case due to time constraints and software evolution [Figure: example state machine with transitions load, edit, save as, ok, close, exit]

  4. Reverse Engineering State Machines • Static analysis – analysis of source code • symbolic execution, flow analyses, ... • Inevitably considers executions that are infeasible in practice • Dynamic analysis – infer model from sample executions • Favoured for accuracy • States considered equal if their subsequent traces are similar • Variants of the k-tails algorithm [Biermann, Feldman 1972] are the most common reverse-engineering algorithms

  5. Traditional Approach • For any point in a trace, its k-tail is the sequence of the next k events or function calls • Point x is considered equivalent to point y if their k-tails are equal • Example trace: <load, edit, edit, edit, save_as, ok, edit, edit> [Figure: the example trace drawn as a sequence of transitions]

  6. Traditional Approach • For any point in a trace, its k-tail is the sequence of the next k events or function calls • Point x is considered equivalent to point y if their k-tails are equal • Example trace: <load, edit, edit, edit, save_as, ok, edit, edit> • K=2 [Figure: the example trace drawn as a sequence of transitions]

  7. Traditional Approach • For any point in a trace, its k-tail is the sequence of the next k events or function calls • Point x is considered equivalent to point y if their k-tails are equal • Example trace: <load, edit, edit, edit, save_as, ok, edit, edit> • K=2 [Figure: the example trace, and the machine obtained by merging equivalent points, with transitions load, edit, save_as, ok]

  8. Traditional Approach • For any point in a trace, its k-tail is the sequence of the next k events or function calls • Point x is considered equivalent to point y if their k-tails are equal • Example trace: <load, edit, edit, edit, save_as, ok, edit, edit> • K=2 • Remove non-determinism [Figure: the example trace, the merged machine, and the machine after non-determinism is removed, with transitions load, edit, save_as, ok]
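
The k-tail equivalence described on these slides can be sketched in Python as follows. This is a minimal illustration, not the authors' tool: the function names `k_tails` and `merge_by_k_tail` are hypothetical, and treating the shortened tails near the end of the trace as distinct from full-length tails is an assumption.

```python
from collections import defaultdict

def k_tails(trace, k):
    """Return the k-tail (the next k events) for every point in the trace.

    Point i sits just before event i, so a trace of n events has n + 1 points.
    Tails near the end of the trace are shorter than k (an assumption).
    """
    return [tuple(trace[i:i + k]) for i in range(len(trace) + 1)]

def merge_by_k_tail(trace, k):
    """Give points with equal k-tails the same state id and collect transitions."""
    ids = {}  # k-tail -> state id
    state_of = [ids.setdefault(tail, len(ids)) for tail in k_tails(trace, k)]
    transitions = defaultdict(set)  # (state, event) -> set of target states
    for point, event in enumerate(trace):
        transitions[(state_of[point], event)].add(state_of[point + 1])
    return state_of, transitions

trace = ["load", "edit", "edit", "edit", "save_as", "ok", "edit", "edit"]
state_of, transitions = merge_by_k_tail(trace, k=2)
for (state, event), targets in sorted(transitions.items()):
    print(state, event, sorted(targets))
```

For the example trace with k=2, points 1, 2 and 6 all have the tail <edit, edit> and collapse into one state. That state ends up with several different `edit` successors, which is the non-determinism that slide 8 removes.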

  9. Problems • Too expensive if result is to be correct and complete: • Need complete set of executions up to a certain length • Passive – all executions need to be presented at once • If the provided traces are only partial (likely for a non-trivial system), the resulting model is untrustworthy • Difficult to tell how complete the model is – what’s missing? [Figure: the full example state machine (load, edit, save as, ok, close, exit) next to the partial machine inferred from the trace (load, edit, save_as, ok)]

  10. Regular Grammar Inference • Given a set of valid and (optionally) invalid sentences from a language, infer its grammar. • Regular grammars can be represented as deterministic finite state machines • Problem of regular grammar inference equivalent to that of reverse engineering state machines • Several sophisticated grammar inference techniques • Effectively address many problems that arise with current reverse-engineering approaches

  11. Benefits of Adapting Grammar Inference Techniques • Active techniques • Do not require set of executions to be presented at once • Interact with an oracle to identify missing information • More efficient • Can efficiently process large sample sets. • Reasonably accurate given sparse sets of executions • More sophisticated heuristics to accurately identify equivalent states

  12. Query-Driven State Merging (QSM) • Devised by Dupont et al. • Combines benefits mentioned on previous slide • Active, efficient, reasonably accurate for sparse sets of sample executions • Guaranteed to produce correct machine if set of sample executions is characteristic: • Must cover every transition in the target grammar • Enough positive and negative samples to differentiate between different states (to prevent false merges) • Questions aim to elicit characteristic sample from oracle

  13. Query-Driven State Merging (QSM) • Sample executions: <load, close, exit>, <load, edit, edit, save_as, ok, close, exit>, <load, edit, edit, edit, close, exit> • Generate “Prefix Tree Acceptor” [Figure: prefix tree acceptor built from the three sample executions]
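
A prefix tree acceptor for the three sample executions on this slide could be built roughly as below. This is a sketch under assumptions: the name `build_pta`, the dictionary representation, and marking only the end of each trace as accepting or rejecting are illustrative choices, not details taken from the talk.

```python
def build_pta(positive, negative=()):
    """Build a prefix tree acceptor from positive (and optional negative) traces.

    State 0 is the root. Traces that share a prefix share the states for
    that prefix. Returns (transitions, accepting, rejecting), where
    transitions maps (state, event) -> state.
    """
    transitions, accepting, rejecting = {}, set(), set()

    def walk(trace):
        state = 0
        for event in trace:
            # A tree with n edges has n + 1 nodes, so the next free id is n + 1.
            state = transitions.setdefault((state, event), len(transitions) + 1)
        return state

    for trace in positive:
        accepting.add(walk(trace))
    for trace in negative:
        rejecting.add(walk(trace))
    return transitions, accepting, rejecting

samples = [
    ("load", "close", "exit"),
    ("load", "edit", "edit", "save_as", "ok", "close", "exit"),
    ("load", "edit", "edit", "edit", "close", "exit"),
]
pta, accepting, rejecting = build_pta(samples)
print(len(pta), "transitions,", len(accepting), "accepting states")
```

All three traces share the `load` transition out of the root, and the second and third also share the two `edit` transitions that follow it, so the tree branches only where the traces diverge.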

  14. Query-Driven State Merging (QSM) • Attempt merge • Produce questions (executions valid in this machine, but not in unmerged version): <close, exit>? <edit, edit...>? <load, load, close, exit>? [Figure: the machine after the attempted merge]

  15. Query-Driven State Merging (QSM) • Attempt merge • Produce questions (executions valid in this machine, but not in unmerged version) • If all questions are answered yes, merge nodes • Else add the rejected questions to the graph as negative executions • Active • Efficient • Accepts negative information about model [Figure: the machine after the merge]
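
The merge-and-question step can be illustrated with the simplified sketch below. It is not the QSM algorithm as published by Dupont et al.: it merges one pair of states, folds successors to keep the machine deterministic, and then enumerates by brute force every execution up to `max_len` events that the merged machine accepts but the unmerged one does not. All function names, the `max_len` bound and the `oracle` rule are hypothetical, and compatibility of accepting and rejecting states is not checked.

```python
def build_pta(traces):
    """Prefix tree acceptor: state 0 is the root, trace ends are accepting."""
    transitions, accepting = {}, set()
    for trace in traces:
        state = 0
        for event in trace:
            state = transitions.setdefault((state, event), len(transitions) + 1)
        accepting.add(state)
    return transitions, accepting

def merge(transitions, keep, drop):
    """Merge `drop` into `keep`; also merge successors that would clash."""
    parent = {}
    def find(state):
        while state in parent:
            state = parent[state]
        return state
    merged, pending = dict(transitions), [(keep, drop)]
    while pending:
        a, b = (find(s) for s in pending.pop())
        if a == b:
            continue
        parent[b] = a
        folded = {}
        for (src, event), dst in merged.items():
            src, dst = find(src), find(dst)
            other = folded.setdefault((src, event), dst)
            if other != dst:  # two targets for one event: merge them too
                pending.append((other, dst))
        merged = folded
    return merged, find

def accepts(transitions, accepting, trace, root=0):
    state = root
    for event in trace:
        state = transitions.get((state, event))
        if state is None:
            return False
    return state in accepting

def questions(original, orig_acc, merged, merged_acc, root, max_len):
    """Executions accepted after the merge but not before it."""
    found = []
    def walk(state, prefix):
        if state in merged_acc and not accepts(original, orig_acc, prefix):
            found.append(prefix)
        if len(prefix) < max_len:
            for (src, event), dst in merged.items():
                if src == state:
                    walk(dst, prefix + (event,))
    walk(root, ())
    return sorted(found)

pta, acc = build_pta([
    ("load", "close", "exit"),
    ("load", "edit", "edit", "save_as", "ok", "close", "exit"),
    ("load", "edit", "edit", "edit", "close", "exit"),
])
merged, find = merge(pta, keep=0, drop=1)  # root merged with the state after load
qs = questions(pta, acc, merged, {find(s) for s in acc}, find(0), max_len=4)

def oracle(trace):
    # Hypothetical stand-in for the user or a test run: "load must come first".
    return trace[0] == "load"

negatives = [q for q in qs if not oracle(q)]
print("questions:", qs)
print("merge accepted" if not negatives else f"merge rejected, negatives: {negatives}")
```

With `max_len=4`, merging the root with the state reached after `load` yields the questions <close, exit> and <load, load, close, exit>, two of the three shown on slide 14; the <edit, edit...> question only appears with a larger bound. Any question the oracle rejects would be kept as a negative execution, and the merge is abandoned.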

  16. Implementation • Use Eclipse TPTP to record traces • Sequence of method calls → <load,edit...> • Questions can be answered manually • Or run as tests directly against the system • Can vary number of questions generated • QSM component accepts simple text files of strings (prefixed with “+” and “-”)
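
Reading such a sample file might look like the sketch below. The slide only says that strings are prefixed with “+” and “-”; the one-trace-per-line layout, the whitespace-separated event names and the function name `parse_samples` are assumptions made for illustration.

```python
def parse_samples(text):
    """Split '+'/'-' prefixed lines into positive and negative traces.

    Assumed layout (not confirmed by the source): one trace per line,
    a leading '+' or '-', then event names separated by whitespace.
    """
    positive, negative = [], []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        sign, events = line[0], tuple(line[1:].split())
        if sign == "+":
            positive.append(events)
        elif sign == "-":
            negative.append(events)
        else:
            raise ValueError(f"expected '+' or '-' prefix: {line!r}")
    return positive, negative

example = """
+ load close exit
+ load edit edit edit close exit
- close exit
"""
positive, negative = parse_samples(example)
print(positive)
print(negative)
```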

  17. Evaluation • JHotDraw case study: model generated from recorded traces • Described in paper • Generated random state machines • Subject to certain constraints – minimal, deterministic etc. • Three sets of 10 random machines (5, 25, 50 states) • Random paths over these machines = initial set of traces • Measured accuracy of final machine, and number of questions required
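
The random-machine part of this setup can be approximated as below. This is only a sketch: the generator produces deterministic machines but does not enforce minimality or the other constraints mentioned on the slide, and the names `random_machine` and `random_walk`, the alphabet, the out-degree and the walk length are all hypothetical parameters.

```python
import random

def random_machine(n_states, alphabet, out_degree, rng):
    """Random deterministic machine: (state, event) -> state.

    Each state gets `out_degree` distinct outgoing events, so no state has
    two transitions on the same event. Minimality is NOT guaranteed.
    """
    machine = {}
    for state in range(n_states):
        for event in rng.sample(alphabet, out_degree):
            machine[(state, event)] = rng.randrange(n_states)
    return machine

def random_walk(machine, length, rng, start=0):
    """Follow randomly chosen transitions from `start` to produce one trace."""
    state, trace = start, []
    for _ in range(length):
        choices = [(e, d) for (s, e), d in machine.items() if s == state]
        if not choices:
            break  # dead end; cannot happen when out_degree >= 1
        event, state = rng.choice(choices)
        trace.append(event)
    return tuple(trace)

rng = random.Random(0)  # fixed seed so the experiment is repeatable
machine = random_machine(5, ["a", "b", "c"], out_degree=2, rng=rng)
walks = [random_walk(machine, 6, rng) for _ in range(10)]
for walk in walks:
    print(walk)
```

The walks would serve as the initial positive traces handed to the inference step, with the generating machine acting as the oracle and as the reference for measuring accuracy.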

  18. Current and Future Work • Identify data constraints associated with states • Can use tools such as Daikon • Automatically answer queries • Static analysis – using call graph analysis to automatically propose negative / impossible executions • Automated test generation • Heuristics – can certain questions be safely ignored?

  19. Conclusions • Preliminary results show the technique is reasonably accurate and efficient • Can potentially be almost entirely automated • Automatically generates tests (questions), many of which can be eliminated by static analysis anyway • Grammar inference is a useful source of ideas for dynamic analysis and reverse engineering
