Symbolic Finite State Transducers: Algorithms and Applications
Margus Veanes, Pieter Hooimeijer, Benjamin Livshits, David Molnar, Nikolaj Bjørner
Formal languages are well-studied.
a*b+ accepts abb (✔) and rejects aaaa (✘).
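The membership example can be reproduced with an ordinary regular-expression library. This is a minimal sketch using Python's standard `re` module; the helper name `accepts` is made up for illustration.

```python
import re

# The regular language from the slide: zero or more a's followed by one or more b's.
PATTERN = re.compile(r"a*b+")

def accepts(word: str) -> bool:
    """Return True if the whole word belongs to the language a*b+."""
    return PATTERN.fullmatch(word) is not None
```

Here `accepts("abb")` holds, while `accepts("aaaa")` does not because the word contains no b.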
Chart: POPL (2001–2011), number of papers mentioning “automata”.
What about transformation?
• Compute image: ✔ abb ↦ {baa}, ✘ aaaa (transitions labeled b/a, a/b, b/a)
• Check properties:
  • Equivalence
  • Composition
Chart: POPL (2001–2011), number of papers mentioning “automata” versus “transducers”.
Talk Outline Background Approach Case Studies
Background
“Fast and Precise Sanitizer Analysis with Bek”
Idea: Develop a language for commonly used string transformations. Prove properties about those transformations.
Code:
t := iter(c in s)[b := false;] {
  case (!b && c in "['\"\\]"): b := false; yield('\\', c);
  case (c == '\\'): b := !b; yield(c);
  case (true): b := false; yield(c);
};
FSTs: transitions labeled b/a, a/b, b/a
Gap between code and FSTs, bridged by:
1. domain-specific languages
2. more expressive transducers
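To make the behavior of the Bek program concrete, here is a rough Python rendering of the same loop. It is a sketch under one stated assumption: the character class in the first case is read as the two quote characters `'` and `"`. The function name `escape_quotes` is hypothetical and not part of Bek.

```python
def escape_quotes(s: str) -> str:
    """Python sketch of the Bek iterator: escape quotes that are not already escaped."""
    out = []
    b = False  # True when the previous character was an unescaped backslash
    for c in s:
        if not b and c in "'\"":   # assumption: the class means the two quote characters
            b = False
            out.append("\\")       # yield('\\', c)
            out.append(c)
        elif c == "\\":
            b = not b
            out.append(c)          # yield(c)
        else:
            b = False
            out.append(c)          # yield(c)
    return "".join(out)
```

A bare quote gains a backslash, while a quote that already follows an unescaped backslash passes through unchanged. The single Boolean `b` is the only state, which is what makes the program a candidate for translation into a finite-state transducer.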
Symbolic Finite State Transducers Idea: • Equip transitions with formulae • Allow the use of any decidable theory
Definition: Symbolic Finite State Transducer (SFT)
- states
- start state
- final states
- transitions, each labeled with a predicate and an output
Background theory:
• predicates
• label theory
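The components of the definition can be sketched as a small data structure. This is an illustrative model only, not the authors' implementation: predicates and outputs are plain Python functions rather than formulae in a decidable theory, and the names `Rule`, `SFT`, `run`, and `UPPER` are invented for the example.

```python
from dataclasses import dataclass
from typing import Callable, List, Set

@dataclass
class Rule:
    src: int                        # source state
    guard: Callable[[str], bool]    # predicate over the input character
    output: Callable[[str], str]    # output string, as a function of the input character
    dst: int                        # target state

@dataclass
class SFT:
    start: int
    finals: Set[int]
    rules: List[Rule]

    def run(self, word: str) -> Set[str]:
        """Return every output the transducer can produce for the given word."""
        configs = {(self.start, "")}
        for ch in word:
            configs = {
                (r.dst, acc + r.output(ch))
                for (state, acc) in configs
                for r in self.rules
                if r.src == state and r.guard(ch)
            }
        return {acc for (state, acc) in configs if state in self.finals}

# Two symbolic rules cover the entire character set: one for a..z, one for the rest.
UPPER = SFT(start=0, finals={0}, rules=[
    Rule(0, lambda ch: "a" <= ch <= "z", lambda ch: ch.upper(), 0),
    Rule(0, lambda ch: not ("a" <= ch <= "z"), lambda ch: ch, 0),
])
```

The point of the symbolic labels shows up in `UPPER`: a classical transducer would need one transition per character, whereas here two guarded rules suffice regardless of alphabet size.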
Closure under composition: SFT A (in → out) and SFT B (in → out) compose into a single SFT A∘B, which feeds A's output to B as input.
Requirement:
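One way to picture composition is a product construction in which B's guard is evaluated on A's output. The sketch below makes simplifying assumptions that the slides do not state: every rule emits exactly one character, guards and outputs are Python functions, and no satisfiability check prunes dead product rules. All names (`run`, `compose`, `A`, `B`, `AB`) are hypothetical.

```python
from itertools import product

# A rule is a tuple (src, guard, out, dst); guard and out take one input character.
# Assumption for this sketch: every rule outputs exactly one character.

def run(rules, start, finals, word):
    """Return the set of outputs produced for word."""
    configs = {(start, "")}
    for ch in word:
        configs = {(dst, acc + out(ch))
                   for (state, acc) in configs
                   for (src, guard, out, dst) in rules
                   if src == state and guard(ch)}
    return {acc for (state, acc) in configs if state in finals}

def compose(rules_a, rules_b):
    """Pair every rule of A with every rule of B; states become pairs."""
    composed = []
    for (pa, ga, fa, qa), (pb, gb, fb, qb) in product(rules_a, rules_b):
        # The composed guard substitutes A's output into B's guard.
        guard = lambda ch, ga=ga, gb=gb, fa=fa: ga(ch) and gb(fa(ch))
        out = lambda ch, fa=fa, fb=fb: fb(fa(ch))
        composed.append(((pa, pb), guard, out, (qa, qb)))
    return composed

# A upper-cases ASCII letters; B rewrites 'A' to '4' and copies everything else.
A = [(0, lambda ch: True, lambda ch: ch.upper() if "a" <= ch <= "z" else ch, 0)]
B = [(0, lambda ch: ch == "A", lambda ch: "4", 0),
     (0, lambda ch: ch != "A", lambda ch: ch, 0)]
AB = compose(A, B)
```

Running `AB` from state `(0, 0)` on a word gives the same outputs as running A and then B. In a symbolic setting the composed guard would be a formula, and checking it for satisfiability is where a decision procedure for the background theory comes in.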
Single-valued equivalence Definition: 1
Algorithm:
• Construct 2-output product transducer
• Find conflicts (dft):
  • output length
  • output value
Complexity: depends on the complexity of the decision procedure and the number of rules
Key restriction: single-valuedness
• Transducer A is single-valued if, for all inputs, A has at most one output.
Note: This definition permits non-determinism, e.g., two distinct transitions that both carry the label b/[].
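The difference between single-valued and deterministic can be seen in a tiny example. The transducer below is invented for illustration: from state 0 it has two overlapping rules on `b`, both emitting nothing, so it is nondeterministic, yet every accepted input yields exactly one output. The check only samples a few words; it is not the decision procedure from the talk.

```python
# A rule is a tuple (src, guard, out, dst); guard and out take one input character.

def run(rules, start, finals, word):
    """Return the set of outputs produced for word."""
    configs = {(start, "")}
    for ch in word:
        configs = {(dst, acc + out(ch))
                   for (state, acc) in configs
                   for (src, guard, out, dst) in rules
                   if src == state and guard(ch)}
    return {acc for (state, acc) in configs if state in finals}

# Two rules with the same guard and the same label b/[]:
# one stays in state 0, the other guesses that this b is the last one.
RULES = [
    (0, lambda ch: ch == "b", lambda ch: "", 0),
    (0, lambda ch: ch == "b", lambda ch: "", 1),
]
START, FINALS = 0, {1}

def single_valued_on(words):
    """Bounded check: at most one output for each sampled word."""
    return all(len(run(RULES, START, FINALS, w)) <= 1 for w in words)
```

For the word `bbb` several runs exist, but only the one that moves to state 1 on the last `b` ends in a final state, so the output set has a single element.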
idempotence, subsumption, equivalence, commutativity, ...
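Three of these properties can be stated as one-line checks over sample inputs. This is a testing sketch, not the symbolic analysis the talk describes: it only examines the listed samples, and the helper names are made up. Python's standard `html.unescape` serves as a stand-in for an HTML decoder.

```python
import html

def is_idempotent(f, samples):
    """f(f(x)) == f(x) on every sample."""
    return all(f(f(x)) == f(x) for x in samples)

def commute(f, g, samples):
    """f(g(x)) == g(f(x)) on every sample."""
    return all(f(g(x)) == g(f(x)) for x in samples)

def agree(f, g, samples):
    """f(x) == g(x) on every sample (a bounded stand-in for equivalence)."""
    return all(f(x) == g(x) for x in samples)

SAMPLES = ["", "abc", " padded ", "&lt;", "&amp;lt;"]
```

On these samples `str.upper` is idempotent, but `html.unescape` is not: decoding `&amp;lt;` once gives `&lt;`, and decoding again gives the bare less-than sign. A symbolic analysis aims to find such a witness without enumerating inputs.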
Case Studies
• Location Privacy
• HTMLdecode
• Malware Fingerprinting
• Image Blurring
HTMLdecode
Diagram: Decode applied to encoded forms of the less-than sign.
The Task: Prove that HTMLdecode is not idempotent.
The Metric: Running time.
The Problem: Unicode defines 1,114,112 code points.
Three Participating Representations
• SFT (Eager)
• SFT+Registers (Eager)
• SFT+Registers (Lazy)
Chart: transducer size against maximum number of digits, for SFT and SFT + Symbolic State Space. Values shown: 6.6M and 51.
Chart: idempotence checking time against maximum number of digits (2 to 6), for SFT, SFT + REG (eager), and SFT + REG (lazy).
Conclusion
• Introduced Symbolic Finite State Transducers over any decidable background theory
• Presented decidability and complexity results
• Comes with a scalable and robust* implementation