This paper discusses the use of Extended Regular Expressions (ERE) for efficient runtime verification of software systems. The authors propose a monitoring algorithm that processes events in O(2^2m^2) space and time. The paper also explores the challenges and lower bounds of the ERE monitoring problem.
Testing Extended Regular Language Membership Incrementally by Rewriting
Grigore Rosu, Mahesh Viswanathan
University of Illinois at Urbana-Champaign, USA

Increasing Software Reliability
• Current solutions
  • Human review of code and testing
    • Most used in practice
    • Usually ad hoc, intensive human support
  • (Advanced) Static analysis
    • Often scales up
    • False positives and negatives, annotations
  • (Traditional) Formal methods
    • Model checking and theorem proving
    • General, good confidence, do not always scale up

Runtime Verification and Monitoring
Idea: Let the system run and observe its execution trace. If the trace violates, or appears to violate, the requirements, then report an error or guide the program to avoid or to hit the error.

Runtime Verification and Monitoring
• PathExplorer, developed jointly with Havelund
  • Used on 70,000 lines of C++ code (K9 Rover)
  • Found a deadlock in ~10 seconds
  • Confirmed a data race suspicion
• Runtime Verification Workshop
  • ’01 France (CAV), ’02 Denmark (CAV), ’03 USA (CAV)
  • ’04 Spain (ETAPS), …

PathExplorer - Overview
[Figure: the running program sends events over a socket to the observer]
(Joint work with Klaus Havelund of NASA Ames)

PathExplorer - the Observer
paxmodules
  module datarace = ‘java pax.Datarace’;
  module deadlock = ‘java pax.Deadlock’;
  module temporal = ‘java pax.Temporal spec’;
  module ERE = ‘java pax.Ere spec’;
end
[Figure: a dispatcher forwards the event stream to the datarace and deadlock modules (predictive analysis) and to the temporal and ERE modules (specification-based monitoring); each module can emit warnings]

Why (Extended) Regular Expressions?
• Ordinary programmers and software engineers understand and use regular expressions
  • Perl, Python, etc.
• Safety policies are often regular patterns on sequences of states/events:
  • (idle* open (read + write)* close)*
• Complementation is needed to say what should not happen:
  • ¬(any* start1 (¬end1)* start2 any*)

Extended Regular Expressions (ERE)
• Regular expressions with complement
  R ::= Φ | ε | A | R + R | R · R | R* | ¬R
• Language of an ERE
  L(Φ) = ∅
  L(ε) = {ε}
  L(A) = {A}
  L(R + R’) = L(R) ∪ L(R’)
  L(R · R’) = {ww’ | w ∈ L(R), w’ ∈ L(R’)}
  L(R*) = (L(R))*
  L(¬R) = Σ* \ L(R)
• Intersection: R ∩ R’ := ¬(¬R + ¬R’)

ERE Membership Problem
• Given w ∈ Σ* and R, is it the case that w ∈ L(R)?
• Patterns in strings; many applications
  • Programming languages (Perl, Python)
  • Molecular biology (Knight-Myers ’95)
  • Monitoring
• Efficient solutions are of great practical interest
• From now on, n is the length of the word/trace w and m is the size of the ERE R
  • n is typically much larger than m

What is known (I)
• If R does not contain negations, then
  • Transform R into an NFA of size O(m) (Aho ’90)
    • Solution in time O(nm) and space O(m)
    • Improved by Myers ’92 (JACM): time/space O(nm / log n)
  • Transform R into a DFA of size O(2^m) (Aho ’90)
    • Solution in time O(nm) and space O(2^m)
    • Note: a transition in a DFA takes time logarithmic in the number of states, hence O(m) per event
• Negations and their nesting make the membership problem highly non-trivial

Problems with Negation (I)
• How to complement an NFA?
• Just complementing the set of final states is wrong!
[Figure: an NFA A over {a, b} and the NFA A’ obtained from A by complementing its set of final states]
L(A) = {ab}    L(A’) = {ab, a, ε}

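The figure's automata are not recoverable from the transcript, so the following is only a sketch with a hypothetical five-state NFA (state names `q0` to `q4` are assumptions) chosen to reproduce the two languages quoted on the slide. It shows that flipping the final states of a nondeterministic automaton does not complement its language: `ab` is accepted both before and after the flip.

```python
from itertools import product

# Hypothetical NFA A: reading "a" from q0 nondeterministically picks q1 or q2.
DELTA = {
    ("q0", "a"): {"q1", "q2"},
    ("q1", "b"): {"q3"},
    ("q2", "b"): {"q4"},
}
STATES = {"q0", "q1", "q2", "q3", "q4"}
START = "q0"
FINAL = {"q3"}  # only the q1 branch leads to acceptance

def accepts(word, final):
    """Simulate the NFA by tracking the set of states reachable so far."""
    current = {START}
    for symbol in word:
        current = set().union(*(DELTA.get((q, symbol), set()) for q in current))
    return bool(current & final)

def language(final, max_len=3):
    """All accepted words over {a, b} up to length max_len."""
    words = ("".join(p) for n in range(max_len + 1) for p in product("ab", repeat=n))
    return {w for w in words if accepts(w, final)}

lang_a = language(FINAL)                 # L(A)
lang_flipped = language(STATES - FINAL)  # L(A'), final states complemented

print(sorted(lang_a))        # ['ab']
print(sorted(lang_flipped))  # ['', 'a', 'ab']
```

A correct complement would reject `ab` and accept words such as `b` or `aa`; the flipped NFA does neither, which is why determinization is needed first.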
Problems with Negation (II)
• DFAs can be complemented safely by just complementing the set of final states, but
  • NFA -> DFA implies exponential state blowup!
  • For k nested negations, 2^(2^(…(2^m)…)) states (a tower of k exponentials)
• This makes the automata-based solution to the membership problem non-elementary in the context of (nested) negations

What is known (II)
• Dynamic programming algorithm (Hopcroft-Ullman ’79)
  • Time O(n^3 m) and space O(n^2 m)
• Special synchronized alternating automata
  • (Yamamoto ’02): intersection but not negation
  • (Kupferman-Zuhovitzky ’02): general ERE
  • Time O(n^2 m) and space O(nm + kn^2), where k is the number of negations and intersections
• The algorithms above store the word; this is unacceptable in many practical situations

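To make the contrast with monitoring concrete, here is a minimal sketch of a dynamic-programming membership test in the spirit of the first bullet. It is not the textbook's code: the tuple encoding of EREs and the helper names (`sym`, `alt`, `cat`, `star`, `neg`, `member`) are assumptions made for illustration. It memoizes, for every subexpression r and every pair of positions i ≤ j, whether w[i:j] is in L(r), which gives O(n^2 m) table entries with O(n) work each, and it needs the whole word in memory.

```python
from functools import lru_cache

EMPTY, EPS = ("empty",), ("eps",)          # Φ and ε

def sym(a): return ("sym", a)
def alt(r1, r2): return ("alt", r1, r2)    # R1 + R2
def cat(r1, r2): return ("cat", r1, r2)    # R1 · R2
def star(r): return ("star", r)            # R*
def neg(r): return ("not", r)              # ¬R

def member(regex, word):
    """Decide word ∈ L(regex) by memoizing all (subexpression, i, j) triples."""
    @lru_cache(maxsize=None)
    def match(r, i, j):
        kind = r[0]
        if kind == "empty":
            return False
        if kind == "eps":
            return i == j
        if kind == "sym":
            return j == i + 1 and word[i] == r[1]
        if kind == "alt":
            return match(r[1], i, j) or match(r[2], i, j)
        if kind == "cat":
            return any(match(r[1], i, k) and match(r[2], k, j)
                       for k in range(i, j + 1))
        if kind == "star":
            # Empty slice, or a non-empty first chunk followed by more chunks.
            return i == j or any(match(r[1], i, k) and match(r, k, j)
                                 for k in range(i + 1, j + 1))
        return not match(r[1], i, j)       # "not": complement on the same slice
    return match(regex, 0, len(word))

# ¬(any* a a any*): the word must not contain two consecutive a's.
any_ = star(alt(sym("a"), sym("b")))
no_aa = neg(cat(any_, cat(sym("a"), cat(sym("a"), any_))))
print(member(no_aa, "abab"))    # True
print(member(no_aa, "abaab"))   # False
```

Negation is trivial here because each table entry is a plain Boolean, but the table is indexed by positions in the stored word, so events cannot be discarded as they arrive.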
Desired Behavior - Monitoring
Algorithms that process and then discard each event are desired in practice, since words or execution traces can be extremely long.
[Figure: the running program sends events over a socket to the observer]

Challenges and Talk Overview
• What is the lower space/time bound of the ERE monitoring problem (to process one event)?
  • Ω(2^(c√m)) for space
• What is a reasonable upper bound for the ERE monitoring problem (to process one event)?
  • Rewriting algorithm in O(2^2m^2) space/time

Lower Bound for ERE Monitoring (I)
• Consider the language (Chandra-Kozen-Stockmeyer ’81 in alternation; Kupferman-Vardi ’98 in model checking)
  Lk = {u # w # u’ $ w | w ∈ {0,1}^k and u, u’ ∈ {0,1,#}*}
• We show that
  • There is an ERE Rk of size Θ(k^2) with L(Rk) = Lk
  • Any monitoring algorithm for Lk needs Ω(2^k) space
• So we can conclude that the space lower bound for ERE monitoring is Ω(2^(c√m))

Lower Bound for ERE Monitoring (II)
Lk = {u # w # u’ $ w | w ∈ {0,1}^k and u, u’ ∈ {0,1,#}*}
Rk is assembled from three requirements:
• There should be exactly one $ symbol:
  (¬$)* $ (¬$)*
• There should be some sequence of 0, 1, #, followed by a # and then by a w:
  (0+1+#)* # …
• Each letter in w should appear after $ at exactly the same position:
  ∩_{i=0..k-1} [ (0+1)^i 0 (0+1)^(k-i-1) # (0+1+#)* $ (0+1)^i 0 (0+1)^(k-i-1)
               + (0+1)^i 1 (0+1)^(k-i-1) # (0+1+#)* $ (0+1)^i 1 (0+1)^(k-i-1) ]
Putting the pieces together:
  Rk = [(¬$)* $ (¬$)*] ∩ [(0+1+#)* # (∩_{i=0..k-1} [ … ])]
Note that the size of Rk is Θ(k^2) and L(Rk) = Lk

Lower Bound for ERE Monitoring (III)
Lk = {u # w # u’ $ w | w ∈ {0,1}^k and u, u’ ∈ {0,1,#}*}
• Let A be a monitor for Lk
• When A reads symbol $, it should “remember” exactly those w that have been seen so far
• There are 2^(2^k) possible distinct situations to remember, so A needs at least 2^k memory to encode each of these situations

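The counting step can be written out as below. This is a reconstruction of the standard argument from the quantities on the slides, not text from the talk; the constants c and c' are placeholders, and the last line assumes that the size of Rk is at most quadratic in k. For k = 2, for instance, there are 4 candidate words w, 16 possible sets of words seen so far, and hence at least 4 bits to store.

```latex
\begin{align*}
\bigl|\{0,1\}^k\bigr| &= 2^k
  && \text{candidate words } w \\
\bigl|\{\, S \mid S \subseteq \{0,1\}^k \,\}\bigr| &= 2^{2^k}
  && \text{sets of words seen before } \$ \\
\text{memory of } A &\ge \log_2 2^{2^k} = 2^k
  && \text{bits needed to tell the sets apart} \\
m = |R_k| \le c'k^2 \;\Rightarrow\; k \ge \sqrt{m/c'}
  &\;\Rightarrow\; \text{space} = \Omega\bigl(2^{c\sqrt{m}}\bigr)
  && \text{with } c = 1/\sqrt{c'}
\end{align*}
```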
Idea of an Event-Consuming Algorithm
• “Consume” each event as it arrives, generating a new ERE monitoring requirement
• Use the notion of derivative
  • R{a} is the ERE that should hold after seeing event a, in order for R to hold now
• Algorithm A stores an ERE R, and
  • when an event a arrives, it replaces R by R{a}
  • at the end of the trace, A checks whether ε ∈ R
• How can we generate R{a} efficiently?
• How can we store R{a} compactly?

ERE Syntax
• Sorts Ere and Event; subsort Event < Ere
• Operations
  Φ : -> Ere
  ε : -> Ere
  _+_ : Ere Ere -> Ere [assoc comm id: empty]
  _ _ : Ere Ere -> Ere [assoc id: nil]
  _* : Ere -> Ere
  ¬_ : Ere -> Ere

Derivatives
• Related work: Antimirov and Mosses
• Operations
  _{_} : Ere Event -> Ere
  _?_:_ : Bool Ere Ere -> Ere
  ε∈_ : Ere -> Bool
• Equations
  (R1 + R2){a} = R1{a} + R2{a}
  (R1 R2){a} = R1{a} R2 + (ε ∈ R1) ? R2{a} : Φ
  (R*){a} = R{a} R*
  (¬R){a} = ¬(R{a})
  ε{a} = Φ
  Φ{a} = Φ
  b{a} = (b == a) ? ε : Φ
Obvious!

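The derivative equations translate almost line by line into a small event-consuming monitor. The sketch below is in Python rather than Maude, and its tuple encoding and function names (`nullable`, `derive`, `monitor`) are assumptions made for illustration. The slide only declares the ε-membership operation; the clauses of `nullable` are the standard ones and are likewise an assumption about what the talk used. No simplification is applied, so the stored expression grows with every event.

```python
EMPTY, EPS = ("empty",), ("eps",)          # Φ and ε

def sym(a): return ("sym", a)
def alt(r1, r2): return ("alt", r1, r2)    # R1 + R2
def cat(r1, r2): return ("cat", r1, r2)    # R1 R2
def star(r): return ("star", r)            # R*
def neg(r): return ("not", r)              # ¬R

def nullable(r):
    """ε ∈ R: does the language of r contain the empty word?"""
    kind = r[0]
    if kind in ("eps", "star"):
        return True
    if kind in ("empty", "sym"):
        return False
    if kind == "alt":
        return nullable(r[1]) or nullable(r[2])
    if kind == "cat":
        return nullable(r[1]) and nullable(r[2])
    return not nullable(r[1])              # "not"

def derive(r, a):
    """R{a}: the ERE that must hold after event a for r to hold now."""
    kind = r[0]
    if kind in ("empty", "eps"):
        return EMPTY
    if kind == "sym":
        return EPS if r[1] == a else EMPTY
    if kind == "alt":
        return alt(derive(r[1], a), derive(r[2], a))
    if kind == "cat":
        head = cat(derive(r[1], a), r[2])
        tail = derive(r[2], a) if nullable(r[1]) else EMPTY
        return alt(head, tail)
    if kind == "star":
        return cat(derive(r[1], a), r)
    return neg(derive(r[1], a))            # "not"

def monitor(r, trace):
    """Consume the events one by one, then check ε ∈ R on what is left."""
    for event in trace:
        r = derive(r, event)
    return nullable(r)

# The safety policy from the motivation slide: (idle* open (read + write)* close)*
policy = star(cat(star(sym("idle")),
                  cat(sym("open"),
                      cat(star(alt(sym("read"), sym("write"))), sym("close")))))

print(monitor(policy, ["idle", "open", "read", "write", "close"]))  # True
print(monitor(policy, ["open", "read"]))                            # False
```

Only the current expression is kept between events; the trace itself is never stored.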
Three Important Simplifying Rules
• Without any other rules, R{a1}{a2}…{an} can grow to unbounded size
• Simplifying rules
  Φ R = Φ
  R + R = R
  R1 R + R2 R = (R1 + R2) R
• Let R be the rewriting system defined so far

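The effect of the simplifying rules can be observed with a small experiment. The sketch below is a Python approximation, not the Maude system from the talk: the names `s_alt` and `s_cat` are assumptions, sums are normalized by flattening, deduplicating and sorting instead of by matching modulo AC, and the factoring rule only looks at the outermost right factor of each concatenation. Φ and ε are treated as identities of sum and concatenation, following the `id:` attributes in the signature.

```python
from functools import reduce

EMPTY, EPS = ("empty",), ("eps",)              # Φ and ε

def sym(a): return ("sym", a)
def star(r): return ("star", r)
def neg(r): return ("not", r)
def raw_alt(r1, r2): return ("alt", r1, r2)    # sum, no simplification
def raw_cat(r1, r2): return ("cat", r1, r2)    # concatenation, no simplification

def alternatives(r):
    """Flatten a sum into a set of summands (AC, R + R = R, Φ dropped)."""
    if r[0] == "alt":
        return alternatives(r[1]) | alternatives(r[2])
    return set() if r == EMPTY else {r}

def build_alt(terms):
    """Rebuild a right-nested sum in a canonical (sorted) order."""
    ordered = sorted(terms, key=repr)
    if not ordered:
        return EMPTY
    result = ordered[-1]
    for term in reversed(ordered[:-1]):
        result = ("alt", term, result)
    return result

def s_cat(r1, r2):
    """Concatenation with Φ R = Φ and ε as identity."""
    if r1 == EMPTY:
        return EMPTY
    if r1 == EPS:
        return r2
    return r1 if r2 == EPS else ("cat", r1, r2)

def s_alt(r1, r2):
    """Sum with R + R = R and R1 R + R2 R = (R1 + R2) R."""
    by_suffix, rest = {}, set()
    for term in alternatives(r1) | alternatives(r2):
        if term[0] == "cat":
            by_suffix.setdefault(term[2], []).append(term[1])
        else:
            rest.add(term)
    for suffix, prefixes in by_suffix.items():
        rest |= alternatives(s_cat(reduce(s_alt, prefixes), suffix))
    return build_alt(rest)

def nullable(r):
    kind = r[0]
    if kind in ("eps", "star"):
        return True
    if kind in ("empty", "sym"):
        return False
    if kind == "alt":
        return nullable(r[1]) or nullable(r[2])
    if kind == "cat":
        return nullable(r[1]) and nullable(r[2])
    return not nullable(r[1])

def derive(r, a, alt, cat):
    """R{a}, building sums and concatenations with the given constructors."""
    kind = r[0]
    if kind in ("empty", "eps"):
        return EMPTY
    if kind == "sym":
        return EPS if r[1] == a else EMPTY
    if kind == "alt":
        return alt(derive(r[1], a, alt, cat), derive(r[2], a, alt, cat))
    if kind == "cat":
        tail = derive(r[2], a, alt, cat) if nullable(r[1]) else EMPTY
        return alt(cat(derive(r[1], a, alt, cat), r[2]), tail)
    if kind == "star":
        return cat(derive(r[1], a, alt, cat), r)
    return neg(derive(r[1], a, alt, cat))

def run(r, word, alt, cat):
    for a in word:
        r = derive(r, a, alt, cat)
    return r

def size(r):
    """Number of nodes in the expression tree."""
    return 1 + sum(size(c) for c in r[1:] if isinstance(c, tuple))

a_star = star(sym("a"))
print(size(run(a_star, "aaaaa", raw_alt, raw_cat)))  # grows with the trace
print(size(run(a_star, "aaaaa", s_alt, s_cat)))      # stays at size(a_star)
```

With the simplifying constructors, a* is its own derivative with respect to a, so the monitor's state stays constant on this input, while the unsimplified expression keeps accumulating Φ and ε subterms.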
Theorems - 1
• R terminates modulo AC of _+_ and A of _ _
  • φ(R{a}) = (φ(R) + 1)^2 (linear ordering didn’t work)
  • Problem for the termination competition?
  • Tested using CiME (thanks to Xavier Urbain)
• R is ground Church-Rosser modulo AC of _+_ and A of _ _
  • Hard to show
  • Non-linear TRS (R1 R + R2 R = (R1 + R2) R)

Theorems - 2
• L(R{a}) = {w | aw ∈ L(R)} for all EREs R
• a1a2…an ∈ L(R) iff ε ∈ R{a1}{a2}…{an}
• R{a1}{a2}…{an} requires O(2^2m^2) space and O(n 2^2m^2) time, where m = |R|
  • Hard proof
  • Current proof in proceedings has a (little) error
  • Can be fixed

Experiments and Conjectures
• Implemented the algorithm above in Maude
  • Generate all EREs R of size m and all possible evolutions R{a1}{a2}…{an}
• Encouraging results
  • For |R| = 12, we got |R{a1}…{an}| ≤ 108
• Conjectures:
  • The ERE monitoring rewriting algorithm runs in space O(2^m) and in time O(n 2^m)
  • These are also the lower bounds for ERE membership

Conclusion and Future Work
• Exponential complexity is unavoidable when negation is added to regular expressions (EREs)
• A few rewriting rules provide the best trace membership algorithm known for EREs!
• We have also generated minimal DFAs using the presented algorithm plus circular coinduction
• Algorithm shown to work in space O(2^2m^2) but conjectured to run in O(2^m) space
  • Claim based on experimental results
  • Proving the conjecture can have a big impact!